How to send image and get image as output from GPT-4o model using API

Using chatGPT UI we can attached a reference image, add a text prompt and get the output image. How we can achieve similar using API?

Hey! What you’re referring to is only possible through the ChatGPT web interface and not via the API. Unfortunately, as of now, the ChatGPT API only supports generating images from text prompts or analyzing images to extract information but it doesn’t allow using an image as a reference for generation like in the web version. it may come soon I’m waiting for it as well.

With gpt-image-1 you can now do this using either the Responses API or the image edits endpoint with reference images like:

import base64
from openai import OpenAI
client = OpenAI()

prompt = """
Generate a photorealistic image of a gift basket on a white background 
labeled 'Relax & Unwind' with a ribbon and handwriting-like font, 
containing all the items in the reference pictures.
"""

result = client.images.edit(
    model="gpt-image-1",
    image=[
        open("body-lotion.png", "rb"),
        open("bath-bomb.png", "rb"),
        open("incense-kit.png", "rb"),
        open("soap.png", "rb"),
    ],
    prompt=prompt
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("gift-basket.png", "wb") as f:
    f.write(image_bytes)

What he’s referring to is entirely possible using the API.

However one has to submit to the ID verification where you send your government ID picture along with taking video of yourself to a third-party sketch company withpersona.com to unlock the needed AI model gpt-image-1 on the images edits API.

And in fact, while dall-e-2 can infill and outfill exactly where you have drawn an alpha channel mask, gpt-image-1 (the gpt-4o based model being discussed), can only re-imagine the image with subtle changes, and can also completely ignore lack of a drawn mask - which is its job..

Image input, and auto-mask for my API app is only to outfill, but the prompt box has an instruction to add more stuffed animals:

Edited result received based on a “reference” image (which could be the best description of the tech):

(the prompt and subject is the kind needed not to get blocked by the moderation done, also)

But checkout the top-right corner of your first image.

Is that a cat levitating below the tree/above the other cat on bench?

Now that’s what I call quality.

The input was by dall-e-3, wide.

And, but check out the fact that the flying cats are now gone, along with the lake turning into a meadow, with no lake to look at, a park bench now observing the picnic, or as seen through only a 512px input version, along with everything else different, reframed, all that is outside of the masked alpha channel area in red.

Unchanged is the 32 bit RGBA mask used with dall-e-2, sent through the mask file form field.

OpenAI continues with mistruths: