Image Generation through OpenAI

I have a model that takes a PDF file and summarizes it. Now I want to couple this model with a text-to-image generation model that should generate images based on the summaries coming from the text summarization model.

The real problem I am facing is this: I implemented it with a Stable Diffusion model, but when there are a lot of text summaries, generation scales badly with the number of summaries (roughly O(n²) in my case) and ends up taking about 20 hours to produce around 20 images. So I am thinking of switching to the DALL-E API (for which I would need to spend some dollars from my own pocket), but I am not sure whether it will actually help with the time. I am running this on an MPS (Apple Silicon) GPU.

Can anyone give me a recommendation for reducing the time taken, by any means: another solution apart from Stable Diffusion or DALL-E, or a hardware change (I already know an NVIDIA GPU would be the ultimate fix for this), but before going that route, is there any other custom solution that could be developed to handle this?

I am open to any thoughts, so please think out loud here. I am waiting for your responses.


The DALL-E API now has a rate limit of 500 images per minute at the tier-1 account level, and you can launch the requests in parallel with code. A huge datacenter and a model optimized for throughput mean your hundreds of API calls have little impact on that “O”. Generation time is around 10 seconds per image.
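For what it's worth, here is a minimal sketch of launching the requests in parallel with the official `openai` Python SDK and a thread pool. The model name, image size, worker count, and example summaries are just placeholders, and there is no retry or rate-limit handling:

```python
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical summaries produced by the PDF summarization step
summaries = [
    "A quarterly report on renewable energy adoption...",
    "An overview of supply chain disruptions in 2023...",
]

def generate_image(summary: str) -> str:
    """Request one image for a summary and return its URL."""
    response = client.images.generate(
        model="dall-e-3",     # or "dall-e-2" for lower cost
        prompt=summary,
        size="1024x1024",
        n=1,
    )
    return response.data[0].url

# Fire off the requests concurrently; keep max_workers well below the rate limit
with ThreadPoolExecutor(max_workers=8) as pool:
    urls = list(pool.map(generate_image, summaries))

for url in urls:
    print(url)
```

With only 20 images at ~10 seconds each, even sequential calls finish in a few minutes; the thread pool mainly matters once you scale to hundreds of summaries.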

The cost is $0.02–$0.12 per image, depending on which model, image size, and detail settings you request.

The images are artworks, though, not reliable data diagrams suitable for most PDF-sourced business presentations.