Seeking ideas for text to image generation

sunethmaw6 · June 7, 2024, 4:39pm

Hello everyone,

I am a university student currently working on my final year research project, and I am looking for ideas related to text-to-image generation. My goal is to identify a meaningful research gap in this field that I can explore.

So far, I have reviewed several existing models and techniques, such as DALL-E and GAN-based methods, but I am struggling to pinpoint a specific area that has not been extensively covered. I am particularly interested in topics that could contribute to improving image quality, semantic coherence, or other aspects of text-to-image models.

Could anyone recommend some potential research areas or gaps in the current literature that I could investigate? Any suggestions, papers, or resources would be greatly appreciated.

Thank you in advance for your help!

grandell1234 · June 7, 2024, 4:45pm

There are two main kinds of image models: GANs and diffusion models. Currently, most people use diffusion for their models, such as recent versions of DALL-E and Stable Diffusion, as they generally give the best results. One idea would be to do more research on GANs, as work on them has slowed down significantly. Another idea would be to create image upscalers using GANs; I don’t believe I have ever seen a GAN upscaler before, as the ones I have seen are mostly diffusion models. I made a GAN a while ago, and some code and results can be found here: GitHub - grandell1234/S.C.O.R.P: Text-To-Image GAN Model

sunethmaw6 · June 7, 2024, 6:41pm

Does creating image upscalers using GANs mean inputting a low-quality image and generating a high-quality image from it, like Super-Resolution Generative Adversarial Networks (SRGANs)? Could you explain a bit more about it?

grandell1234 · June 7, 2024, 6:59pm

Yeah, or taking it from, say, 512x512 to 1280x1280, adding pixels where required.
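For reference, "adding pixels where required" can be illustrated with a non-learned baseline. The sketch below is not a GAN and is not taken from the linked repository; it is a plain nearest-neighbour upscaler, the kind of simple method a learned upscaler would normally be compared against. The function name `upscale_nearest` and the tiny 2x2 input are assumptions chosen for illustration; going from 2 to 5 pixels per side uses the same 2.5x ratio as 512 to 1280.

```python
def upscale_nearest(image, new_width, new_height):
    """Upscale a 2D grid of pixel values by nearest-neighbour sampling.

    `image` is a list of rows; each output pixel copies the source pixel
    whose position it maps back to, so no new detail is invented.
    """
    old_height = len(image)
    old_width = len(image[0])
    result = []
    for y in range(new_height):
        # Map the output row back to a source row (integer arithmetic).
        src_y = min(old_height - 1, y * old_height // new_height)
        row = []
        for x in range(new_width):
            src_x = min(old_width - 1, x * old_width // new_width)
            row.append(image[src_y][src_x])
        result.append(row)
    return result


# Hypothetical 2x2 "image" scaled by 2.5x, the same ratio as 512 -> 1280.
tiny = [[0, 1],
        [2, 3]]
big = upscale_nearest(tiny, 5, 5)
```

A GAN-based upscaler such as SRGAN replaces the copy step with a generator network that predicts plausible detail for the new pixels, which is where the research questions lie.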

Myango · June 8, 2024, 12:47am

I recently ran into a problem where DALL-E can’t divide the rendered image into exact regions, for example a rectangle into 8 equal boxes, and then generate an image for each box. The odd thing is that I’ve seen one GPT accomplish it. The other obvious issue is rendering text and counting. I think DALL-E should add a text-object layer on top of the generated image and merge the two prior to output. If the coordinates can be matched, this issue is resolved completely by combining the text logic with the image logic. That would allow an instruction to create art at given X, Y coordinates of an image. Once this is mastered, commands can place text. Even HTML can accomplish this simple concept of a background image under text. DALL-E needs to understand its canvas space in order to take a command like:
Create 8 boxes with random art numbered 1-8, center text.

We can’t even get DALL-E to return 8 exactly sized boxes of art yet.
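The coordinate idea above can be sketched outside the image model. The following is a minimal illustration, not anything DALL-E does internally: it assumes a hypothetical 1024x512 canvas split into a 4x2 grid, and the helper names `grid_boxes` and `box_center` are made up for this example. It computes the exact pixel rectangle of each of the 8 boxes and the point where a centered number label would go.

```python
def grid_boxes(width, height, cols, rows):
    """Return (left, top, right, bottom) for each grid cell, row by row.

    Integer arithmetic on cell edges guarantees the cells tile the canvas
    exactly, with no gaps or overlaps.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


def box_center(box):
    """Return the (x, y) point at the middle of a box, for centered text."""
    left, top, right, bottom = box
    return ((left + right) // 2, (top + bottom) // 2)


# Assumed canvas: 1024x512 pixels, 4 columns by 2 rows = 8 boxes.
boxes = grid_boxes(1024, 512, cols=4, rows=2)

# Map each label 1-8 to the point where its text would be centered.
labels = {i + 1: box_center(b) for i, b in enumerate(boxes)}
```

With rectangles like these, each box could be filled by a separately generated image and the numbers drawn on top by ordinary text rendering, which is the "merge a text layer prior to output" approach described in the post.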

…
