Hello,
I’m trying to replicate a task I often do inside ChatGPT — modifying the appearance of an image (for example, making a 3D model look like polished gold on a black reflective background), without changing its geometry or layout.
In ChatGPT, this works surprisingly well. From what I understand, the image is interpreted by GPT-4o, which generates a descriptive prompt that is then passed to DALL·E. The key strength is that ChatGPT internally knows how to describe the image so that DALL·E replicates it faithfully with improved materials and lighting — without destroying the original form.
However, when I try to build the same flow with the GPT-4o and DALL·E APIs directly (image → GPT-4o description → DALL·E), the results are much worse:
- DALL·E often ignores the structure of the image
- Descriptions generated by GPT-4o get blocked by the content policy filter, even when they are neutral
- The visual consistency is far from what ChatGPT produces
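For reference, here is roughly the flow I'm using, simplified into a sketch (the prompt wording, model names, and size parameter are just what I've been experimenting with, not anything official):

```python
import base64


def encode_image(path: str) -> str:
    """Base64-encode a local image for GPT-4o's vision input."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def restyle(image_path: str, style: str) -> str:
    """Two-step flow: GPT-4o describes the image, DALL·E 3 re-renders it.

    Returns the URL of the generated image.
    """
    from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the env

    client = OpenAI()

    # Step 1: ask GPT-4o for a faithful, structure-preserving description.
    description = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Describe this image precisely enough that an image model "
                          "can reproduce its exact geometry and layout, "
                          f"but rendered as {style}.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(image_path)}"}},
            ],
        }],
    ).choices[0].message.content

    # Step 2: hand the description to DALL·E 3 as a plain text prompt.
    # This is where structure gets lost: DALL·E only ever sees text.
    result = client.images.generate(model="dall-e-3",
                                    prompt=description,
                                    size="1024x1024")
    return result.data[0].url


# Example: restyle("model.png", "polished gold on a black reflective background")
```

The core problem is visible in the code itself: DALL·E never sees the original pixels, only GPT-4o's text description, so any geometry not captured in words is lost.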
This leads to my main question:
Can we get API access to the same internal “picture-to-picture” pipeline that ChatGPT uses?
That is, a way to let GPT-4o handle both the image analysis and the safe prompt generation for DALL·E behind the scenes, just like it does here in the ChatGPT app.
This would save developers like me from having to reverse-engineer the ChatGPT magic. Even a restricted or opt-in version would be immensely helpful.
Thanks in advance — and thank you for the incredible tools.