This generative model allows you to sketch out a scene with a few words, it then leverages an LLM to flesh out the details, with the ultimate goal of feeding those details to a downstream visual image generation model.
It is almost, but not quite, entirely the inverse of image captioning models.
This offers the closest experience to an image generation tool that's usable by people with visual impairments.

huggingface.co/spaces/lllyasvi…

#a11y #genai #llm

Tamas G reshared this.