Gpt4o api is producing bad images compared chatgpt gpt4o

So much difference between API (first image) and Chatgpt models. Used the same prompt.

“A high-resolution photograph of a tall, transparent glass vase with two elegant orchid stems. The orchids have vibrant orange and pink petals with deep magenta centers. The vase is filled with clear water and small air bubbles, and it casts a soft shadow on a white background. The composition is minimalistic and well-lit, with a subtle gradient in the lighting from left to right. Some unopened orchid buds are visible on the stem, adding a natural touch.”


1 Like

The image you have there looks A LOT like DALL-E 2. The image has the same bland pallor and improper framing when instructing photo-realism as is often seen with that model.

I’m going to upgrade to API DALL-E 3 at HD and prompt it well.

What did I send? It was not this:

but instead:

(expand) DALL-E 3 API prompt

In the center of a pristine, minimalist white setting stands a tall, cylindrical glass vase with flawless transparency. The vase is narrow yet elongated, rising with clean verticality that subtly magnifies the clarity of the water it holds. This water is perfectly clear, its stillness only disturbed by fine bubbles suspended throughout—tiny air pockets that gather lightly along the inner curve of the glass. These bubbles catch light delicately, creating faint glints that ripple subtly through the refracted view of the stem inside.

Emerging from the water are two elegant orchid stems, each gracefully arcing outward and upward in natural poise. These stems are slender, green with a hint of gloss, and dotted with nodes and a few unopened buds—each bud a pale greenish-pink with soft, matte texture, their tips suggesting imminent bloom. The buds are arranged asymmetrically along the stems, enhancing the organic flow and realism of the floral structure.

At the apex of each stem bloom vibrant orchid flowers—five or six in total—each meticulously rendered with luxurious detail. The petals are velvety in texture, glowing in a radiant gradient of orange and pink, transitioning seamlessly from golden peach tones near the outer edges into saturated sunset pink toward the center. At the heart of each blossom sits a deep magenta labellum, with rich, velvety folds and a speckled interior, creating a visual focal point that draws the viewer inward. The orchids exhibit slight curling at the petal tips and gentle variance in petal orientation, emphasizing realism.

Lighting in the composition is sophisticated and deliberate, casting a subtle gradient across the white background—from a warmer light on the left side to a slightly cooler, soft gray on the right. This gradient lends a three-dimensional depth and artful contrast to the scene, enriching the texture of petals and the glossy sheen of the glass. The light also creates a diffuse, soft-edged shadow beneath and slightly behind the vase, anchoring it realistically to the setting and reinforcing its placement on a smooth, matte white surface.

The entire composition is captured in high resolution, with a focus sharp enough to distinguish individual fibers on the petal surfaces and the internal structure of the air bubbles. The image’s perspective is direct and frontal, aligning the vertical axis of the vase centrally in the frame, with the orchid stems rising symmetrically but with natural asymmetry in their reach and curve. This gives the photograph a botanical precision without sacrificing the grace and flow of artistic still life.

The style is hyper-realistic with a photographic quality, balancing scientific clarity with aesthetic minimalism. There are no background distractions, no props, no texture beyond what is innate to the flowers, glass, and water. The emotional tone is calm and contemplative, with every element—from the fine orchid hairs to the refractive distortion of the waterline—designed to invite a slow, appreciative gaze.

It was the language produced by this technique below, that is designed for improving your ChatGPT gpt-4o images. The “your task” can be run as a system message on a high-quality API AI model. The output language is more than the actual internal DALL-E 3 can accept as tokens, but the API rewriter will take care of that.

# Image idea

(your language)

# Your task: 

Write out directly into this chat a full detail synthesizing all the language I have provided so far together into an elaborate language passage describing everything related to the image subject or subjects, the setting and background, the style and dimensionality and perspective, and the technique therein. Everything that must be depicted must be intricately described clearly, up to six paragraphs that can forensically reproduce the entirety of how you imagine the image needs to appear.

# Automatic task 2:

Only then after writing *to me* that language will you finally send to the tool recipient, automatically. The prompt sent to image_gen has none of that language included; instead the prompt shall ONLY act as a trigger to set the image creator in motion with the correct size, and shall be exactly, "Create described image based on language just seen. Ignore previous image generations." You must understand that "prompt" no longer needs to have useful language, because what you output to be here before sending to the tool recipient is context that can be observed and is passed.

How does the technique work for ChatGPT? It works a lot better to impose that on a continuing conversation, but here is the first session output…

Thanks for taking time to answer this. Chatgpt image is still so much cleaner and visually appealing than advanced DALLE-3.

So there is no way to reproduce Chatgpt level images with API?

Note: Also I am not sure if you can send that long prompt with API because of Gpt4o token limits.

I am sure I can send it because I did. And again for you. 4000 character limit, and you can just truncate anyone’s attempts to go further.

The API has DALL-E models. For the task of rewriting prompts to further quality (more than the API does internally), I don’t know why gpt-4o would play into any restriction as it can input and output much more.

No image creation by gpt-4o-multimodal-vision-output is offered yet, but may come in the future.

I explicitly set dalle-3 now the output looks better. thank you.

Yes hopefully they release Chatgpt level image quality through API. Chatgpt atm producing super cool images that people actually love.

1 Like