Picture-to-picture with GPT-4o and DALL·E API does not match ChatGPT

Hello,
I’m trying to replicate a task I often do inside ChatGPT — modifying the appearance of an image (for example, making a 3D model look like polished gold on a black reflective background), without changing its geometry or layout.

In ChatGPT, this works surprisingly well. From what I understand, the image is interpreted by GPT-4o, which generates a descriptive prompt that is then passed to DALL·E. The key strength is that ChatGPT internally knows how to describe the image so that DALL·E replicates it faithfully with improved materials and lighting — without destroying the original form.

However, when I try to build the same flow using the GPT-4o + DALL·E APIs directly (image → GPT → DALL·E), the results are much worse:

  • DALL·E often ignores the structure of the image
  • Descriptions from GPT-4o get blocked due to content policy (even if neutral)
  • The visual consistency is far from what ChatGPT produces

This leads to my main question:

Can we get API access to the same internal “picture-to-picture” pipeline that ChatGPT uses?
That is, a way to let GPT-4o handle both the image analysis and safe prompt generation for DALL·E behind the scenes, just like it does here in the ChatGPT app.

This would save developers like me from having to reverse-engineer the ChatGPT magic. Even a restricted or opt-in version would be immensely helpful.

Thanks in advance — and thank you for the incredible tools.

1 Like

Your understanding is incorrect.

The DALL-E model is retired from ChatGPT, except in a special GPT.

The new gpt-4o-based image creation model similar to ChatGPT, but on the API, is called gpt-image-1

What you would do to match the pattern of “chat” (although it is better to directly use the edits endpoint:

  • Use the Responses endpoint where you can specify internal tools;
  • use the image creation tool, setting appropriate parameters;
  • use gpt-4o-2024-11-20 as the chat AI to have a similar experience;
  • use user role messages, multimodal multi-part, to provide request language and input images

Click on “Documentation” in the forum’s sidebar, go to images, and you will see this pattern (instead of showing how to use the edits endpoint to not chat but to make individual requests directly).

2 Likes

Thank you very much for your help. It was very enlightening and helpful. I used the editing images section of the cookbook, and it worked perfectly.

2 Likes