I just generated an image over the API and the quality is really bad (comparable to `low` on a bad day), even though I set it to `high` and was charged for high (6,208 output tokens):
quality: high
size: 1536x1024
text input: 105t
output: 6,208t
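For reference, those numbers are consistent with the roughly $0.25 figure below under gpt-image-1's published per-token pricing. The rates used here ($5 per 1M text input tokens, $40 per 1M image output tokens) are assumptions based on the pricing page at the time and may change, so check the current rates before relying on them:

```python
# Rough cost check for the request above. The per-token rates are
# assumptions based on gpt-image-1's published pricing and may change.
TEXT_INPUT_PER_M = 5.00     # USD per 1M text input tokens (assumed)
IMAGE_OUTPUT_PER_M = 40.00  # USD per 1M image output tokens (assumed)

input_tokens = 105
output_tokens = 6_208

cost = (input_tokens * TEXT_INPUT_PER_M
        + output_tokens * IMAGE_OUTPUT_PER_M) / 1_000_000
print(f"${cost:.4f}")  # about $0.25, as stated below
```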
Please, if you don’t have the computational capacity for high renderings at any given moment (which is totally understandable with your growth rate), tell us with an error message and/or status code, so we don’t spend 0.25 USD on a low-quality image that isn’t usable.
On top of that, if I use this in a batch environment, I would need another AI on top just to judge every output and decide whether it’s garbage (like this one) or actually usable.
That’s not really efficient from any perspective.
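Short of running a second model as a judge, a cheap local heuristic can at least flag the worst duds in a batch. A common sharpness proxy is the variance of the image’s Laplacian: mushy, low-detail renders tend to score far lower than crisp ones. A minimal dependency-free sketch (the pass/fail threshold would need tuning on your own images, and this is only a rough filter, not a real quality judge):

```python
def laplacian_variance(img):
    """Sharpness proxy: variance of a 3x3 Laplacian over a grayscale
    image given as a list of rows of pixel values (0-255)."""
    h, w = len(img), len(img[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (-4 * img[y][x] + img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

# Demo: a crisp checkerboard vs. a box-blurred copy of it.
size = 32
sharp = [[255.0 * ((x + y) % 2) for x in range(size)] for y in range(size)]

def box_blur(img, k=5):
    h, w = len(img), len(img[0])
    r = k // 2
    return [[sum(img[(y + dy) % h][(x + dx) % w]
                 for dy in range(-r, r + 1)
                 for dx in range(-r, r + 1)) / (k * k)
             for x in range(w)] for y in range(h)]

blurry = box_blur(sharp)
print(laplacian_variance(sharp) > laplacian_variance(blurry))  # True
```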
Ultra-realistic 16:9 cinematic bright midday shot of Expo City Dubai, alive with innovation and movement. A young family explores under the towering, artistic shade of Al Wasl Dome, while behind them, autonomous shuttles glide past smart pavilions with dynamic facades. Children interact with touchscreen displays near kinetic art, and multicultural groups walk beneath digital signage welcoming the world. Everything gleams with purpose and progress—Expo City as the beating heart of Dubai’s sustainable, experiential future.
And this was the complete result (just zoomed in on the details before):
Don’t get me wrong, I know the image generation from OpenAI is the best out there; it just seems that under high load the results “vary” a bit too much on the lower side.
At this price point I would expect a high-quality result if I request and pay for it, or else a 503 / 429.
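If the API did signal overload with a 429/503 instead of silently degrading, batch code could handle it with standard exponential backoff. A hypothetical sketch (the fake endpoint below stands in for whatever image-generation call your client makes; it is not a real SDK function):

```python
import time

def with_retries(call, max_attempts=4, base_delay=1.0, retryable=(429, 503)):
    """Retry `call` (a zero-arg function returning (status, payload))
    with exponential backoff on retryable HTTP status codes."""
    for attempt in range(max_attempts):
        status, payload = call()
        if status not in retryable:
            return status, payload
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return status, payload

# Demo with a fake endpoint that is overloaded twice, then succeeds.
responses = iter([(503, None), (429, None), (200, "image-bytes")])
status, payload = with_retries(lambda: next(responses), base_delay=0.0)
print(status, payload)  # 200 image-bytes
```

With such a contract, the batch job simply waits and retries instead of paying full price for an unusable render.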
I’ll try to remember to bring it up at the next meeting with OpenAI.
BTW, for anyone listening, the forum Regulars have meetings with OpenAI too occasionally. Great reason to become a TL3 member here. Best way to achieve it? Helping others and keeping the mods’ job easy!
Seriously, though, thanks for passing it on. Squeaky wheels get the grease!
ETA: All that said, that prompt may be a bit too “thin” for gpt-image-1? Try playing with slightly longer prompts. Input tokens aren’t as expensive and sometimes they help.
That would be great if you could make them aware of this issue; that’s all I wanted to achieve with this post.
I can tell you from over 5,000 rendered images that this prompt style and length normally works great, and that it’s only the render quality that fluctuates from time to time.
And from a technical point of view that makes sense: it’s not the elements or objects in the picture that are the problem, it’s the render quality of the details (faces, background, etc.), and the results look just like what you get from the lower quality settings.
I have a feeling there’s some A/B testing or quality throttling going on. E.g. we make a request and pay for a high-quality image, but some other enterprise customer is hogging the compute, so they take our money and hand us a low-quality dud instead of accepting a long response time or timing out while waiting for compute.
The twisting and contorting of faces at medium but still recognizable scale is a real issue with gpt-4o image gen / gpt-image-1, whatever it’s called.
The balance of features and requirements should be judged against other models that are known for representing the style of depiction needed. If you don’t know what that would be, you can go exploring the random generations of various providers at https://lmarena.ai/
I re-prompted the concept so it offers the provider representable details. Its readiness to render people facing in all directions works to its advantage:
A bright midday photograph taken at Expo City Dubai. A young family in the foreground explores the pavilion under the towering, artistic shade of Al Wasl Dome, while behind them, autonomous shuttles glide past smart pavilions with dynamic facades. The background portrays children interacting with touchscreen displays near kinetic art, or multicultural groups walking beneath digital signage which welcomes the world. Everything is detailed in the photo, with modern purpose and progress.