How to generate a new image using an existing image as input (like ChatGPT does with GPT-4o)?

I’ve been trying to replicate what ChatGPT does when I upload an image (e.g. a food photo like fried rice) and ask it to “make this look good for a social media thumbnail.”

In ChatGPT, it literally gives me back a new, enhanced image — brighter colors, better lighting, blurred background — looks great.

But when I try this through the OpenAI API, I’m not getting the same result. I’ve tried:

  • images.edit
  • images.create_variation → just creates random variations, not enhancements
  • gpt-4o with chat.completions.create() → I can send the image and a prompt, but it just replies with text… not a new image.

I want to send an image and a prompt like “Make this food image look vibrant and professional for a thumbnail,” and get back a new, enhanced image — just like how ChatGPT returns one when you upload.

How do I actually get back an edited image from the GPT-4o API?
Do I need to enable some setting, or use a different endpoint?

Any working example or guidance would be super appreciated 🙏


You would use the image edits endpoint, but would specify the gpt-image-1 model. This requires a different set of form-data parameters than dall-e-2, essentially treating it like a different API.

It can accept multiple images, and it works from a prompt description instead of a mask, because it can only regenerate the entire image.

An organization must complete personal government ID verification, including a selfie video of the person, to use this model.


Here is an example procedural (non-async) Python script that makes a highest-quality request with the kind of description needed for synthesis, also using the more costly input_fidelity parameter. For the three input images demonstrated, the total is about $0.44 per API call.

"""
Edit images with /v1/images/edits (model gpt-image-1) via httpx.
"""
import os
import base64
from pathlib import Path
import logging
import httpx  # third-party, but already installed as a dependency of the `openai` package

logging.basicConfig(level=logging.INFO)

# user inputs
PROMPT = """
The first image is me in a pink dress.
The second image is another model wearing a light paisley dress and top.
The third image is a satin cover-up top.
Produce a new image where I am wearing the dress from the second image and the top from the third image.
"""
INPUT_PATHS = [
    "images/input-woman.jpg",  # file name, or with directories, relative or absolute
    "images/input-dress.webp",
    "images/input-top.jpg",
    ]  # add more paths as needed
OUTPUT_FILE = "new-ai-image.png"
params = {
    "model": "gpt-image-1",
    "prompt": PROMPT.strip(),      # text prompt
    "quality": "high",             # "high" | "medium" | "low"
    "size": "auto",                # "1024x1024" | "1536x1024" | "1024x1536" | "auto"
    "output_format": "png",        # "png" | "jpeg" | "webp"
    # "output_compression": 95,    # lossy jpeg/webp only (0-100)
    "background": "opaque",        # "opaque" | "transparent"
    "input_fidelity": "high",      # extra cost for better copying of input images
    "stream": "false",             # "true" streams chunks
    # "user": "myCustomer",
}
url = "https://api.openai.com/v1/images/edits"

def get_auth_headers() -> dict:
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY env var is missing")
    return {"Authorization": f"Bearer {api_key}"}

# build multipart parts
multipart_parts = []
for p in map(Path, INPUT_PATHS):
    ext = p.suffix.lower().lstrip(".")
    mime = {
        "png":  "image/png",
        "jpg":  "image/jpeg",
        "jpeg": "image/jpeg",
        "webp": "image/webp",
    }.get(ext, "image/png")           # default to image/png
    multipart_parts.append(("image[]", (p.name, p.open("rb"), mime)))

multipart_parts.extend((k, (None, str(v))) for k, v in params.items())

try:
    with httpx.Client(timeout=240.0) as client:
        resp = client.post(url, files=multipart_parts, headers=get_auth_headers())
        resp.raise_for_status()
except httpx.HTTPStatusError as e:
    logging.error("OpenAI API error [%s]: %s", e.response.status_code, e.response.text)
    raise
except httpx.RequestError as e:
    logging.error("Request error: %s", e)
    raise

# decode image
b64 = resp.json()["data"][0]["b64_json"]
img_bytes = base64.b64decode(b64)

# find an unused output name, incrementing a name suffix to prevent overwrite
out_path = Path(OUTPUT_FILE)
out_path.parent.mkdir(parents=True, exist_ok=True)
if out_path.exists():
    stem, suffix = out_path.stem, out_path.suffix or ".png"
    idx = 2
    while (candidate := out_path.with_name(f"{stem}{idx}{suffix}")).exists():
        idx += 1
    out_path = candidate

out_path.write_bytes(img_bytes)
print(f"Image written to {out_path.resolve()}")

Also possibly helpful (they are sometimes so hard to find): Generate images with high input fidelity

Scroll to the bottom for examples like your request


Thank you for the response.

I’ve encountered a strange issue.

I’m trying to use the client.images.edit() method with input_fidelity, like this:

result = client.images.edit(
    model="gpt-image-1",
    image=input_img,
    prompt=prompt,
    input_fidelity="low",
    quality="medium",
    output_format="jpeg"
)

This works perfectly fine on Google Colab, and I’m using version openai==1.97.1 there.

However, when I run the exact same script on my local machine, also with openai==1.97.1, I get this error:

TypeError: Images.edit() got an unexpected keyword argument 'input_fidelity'

I’ve verified that both environments are using the same SDK version.

Anything else I have missed?

It is very easy to have multiple versions of Python libraries installed. If you do not have administrator or root rights when you run a pip upgrade, the upgrade is applied to the site-packages under your user profile directory instead. Additionally, you may be using venv, conda, or any number of container techniques to partition libraries, and the environment you upgraded may not be the one that executes your API code.

The latest API SDK should know this parameter; old versions, however, block client-side before a request is even made, exactly as you show. (Dang, I hate that perpetually-obsolete library, where the only way to limp your way through to the API is to know that a JSON key in the API spec is too new, and use a special parameter to get it sent.)
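If you can’t upgrade right away, that special parameter is the SDK’s extra_body escape hatch, which, as far as I can tell, merges extra fields into the outgoing request without client-side validation. A minimal sketch, assuming your installed version at least knows the other gpt-image-1 edit parameters (client, input_img, and prompt as in your snippet):

result = client.images.edit(
    model="gpt-image-1",
    image=input_img,
    prompt=prompt,
    quality="medium",
    # not validated by an old SDK; passed through as a form field
    extra_body={"input_fidelity": "low"},
)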

Python can answer for you, the same way you are making API calls:

  1. Check the openai version:

import openai
print(openai.__version__)
## output such as "1.97.0"

  2. Have the library report its location:

print(openai.__file__)

You’ll either get the Python installation or some other obvious user location.

  3. Look at the path the environment is using, to see the python{version}/site-packages directories that are explored to find libraries, in order:

import sys; print(sys.path)

You’ll receive a printout of the path list, which can be set by the system, by the user, by Python environment commands, etc. The first match in the path is used.

  4. If you have multiple versions of the library, start uninstalling in a terminal or shell, first with the rights of the user, then with the rights of the system administrator:

pip uninstall openai

Or, if you just want to force the newest version onto the correct location, you can do that too.
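For example (shell commands, a sketch; the key point is to invoke pip through the same interpreter that runs your script):

python -m pip install --upgrade --force-reinstall openai
python -c "import openai; print(openai.__version__, openai.__file__)"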

Enjoy editing.


Protip: discard OpenAI libraries, make requests yourself, live happy.

  • This forum post of mine incidentally has an SDK-free request example I shared, and since it is RESTful (or, specifically for “edits”, multipart/form-data), you can also talk to an AI about code instead of about weekly changes to a stupefying library that pretraining can never keep up with.

Hi @Donnie_legg

Yes, I had the same experience, so I tried calling the API directly.
The input_fidelity parameter worked.

ENDPOINT = "https://api.openai.com/v1/images/edits"
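For reference, a minimal single-image sketch of that direct call, using httpx like the script above (the file name, prompt, and output path are placeholders):

import os
import base64
import httpx

ENDPOINT = "https://api.openai.com/v1/images/edits"
headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
files = [("image", ("food.jpg", open("food.jpg", "rb"), "image/jpeg"))]
data = {
    "model": "gpt-image-1",
    "prompt": "Make this food photo look vibrant and professional for a thumbnail.",
    "input_fidelity": "high",
    "quality": "medium",
}
resp = httpx.post(ENDPOINT, headers=headers, files=files, data=data, timeout=240.0)
resp.raise_for_status()

# the response carries the image as base64
with open("edited-thumbnail.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"][0]["b64_json"]))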