I’ve observed that generating images with the GPT Image 1 model (gpt-image-1), with text overlays and small icons or illustrations, costs about $1.6 and uses around 3300 tokens for 4 images. While reasonable for limited use, this adds up quickly at scale.
I’m considering switching to image editing instead of full generation, making small changes like updating text or swapping icons on a base image.
My main questions:
Is image editing cheaper than full generation?
Does editing consume fewer tokens (especially for minor changes)?
Any tips to optimize cost when generating or editing multiple image variations?
The costs you describe are a bit higher than what I see in my own usage.
One thing to note is that you are also billed for inputs, so if you use many images as the input base to generate an image, it will cost more than a text-prompted image.
In this sense, editing should cost more than a text-prompted generation because there is no real mask. It is a “soft mask” that only tells the model where to apply changes, and the model still generates a whole new image based on the input one. So it doesn’t save costs, and if you pay close attention, the rest of the image is slightly changed too.
One “tip”: if you don’t need a more refined approach with a long conversation, the Images endpoint only charges for image generation, while the Responses API also charges you for “vision”, as it internally converts the image into a description for better interaction (notice it gives a description there).
The best approach is to run a few controlled batches of experiments and keep track of usage and costs. If the difference is negligible, perhaps the Responses API makes sense, but if it becomes a problem, the Images API may save you a few bucks.
In the docs there is a more detailed guide explaining the differences and advantages for each one, but basically:
Choosing the right API
If you only need to generate or edit a single image from one prompt, the Image API is your best choice.
If you want to build conversational, editable image experiences with GPT Image or display partial images during generation, go with the Responses API.
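The controlled experiments suggested above could be tracked with something like the following. This is only a minimal sketch: `UsageLog` and its methods are hypothetical helpers, the token counts are made up for illustration, and the per-million-token prices are assumptions that should be checked against the current pricing page.

```python
from dataclasses import dataclass, field

# ASSUMED prices in USD per 1M tokens; verify against the pricing page.
PRICES_PER_MILLION = {"text_in": 5.00, "image_in": 10.00, "image_out": 40.00}


@dataclass
class UsageLog:
    """Hypothetical helper: accumulates token counts per labelled batch."""
    rows: list = field(default_factory=list)

    def record(self, label, text_in=0, image_in=0, image_out=0):
        # Store one request's token counts under an experiment label.
        self.rows.append({"label": label, "text_in": text_in,
                          "image_in": image_in, "image_out": image_out})

    def cost(self, label):
        # Sum the estimated cost of every request recorded under this label.
        total = 0.0
        for row in self.rows:
            if row["label"] == label:
                for kind, price in PRICES_PER_MILLION.items():
                    total += row[kind] * price / 1_000_000
        return total


log = UsageLog()
# Illustrative numbers only; in practice, copy them from each API response
# or from the usage dashboard.
log.record("generate", text_in=200, image_out=4160)
log.record("edit", text_in=200, image_in=323, image_out=4160)

for label in ("generate", "edit"):
    print(f"{label}: ${log.cost(label):.4f}")
```

With the same output size and quality, the edit batch can only come out equal or higher here, because the input image adds tokens on top of the same generation cost.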
I am using the gpt-image-1 model, but it is considerably more expensive than what is listed on the official API pricing page. I have already spent $2.28 on just 7 images, which is around $0.32 per image. So if I generate 1000 images, it will cost me around $320. Isn’t that costly? The total token consumption is about 5100 input tokens and 1300 output tokens. I am not using image-to-image, but I can still see image-to-image in one of my generations. How?
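For reference, the projection above can be reproduced in a few lines. This sketch uses only the figures quoted in the post ($2.28 for 7 images) and assumes the per-image cost stays constant at scale, which may not hold if prompts, sizes, or quality settings vary.

```python
# Figures quoted in the post.
spent_usd = 2.28
images_generated = 7

# Average cost per image so far.
per_image = spent_usd / images_generated

# Naive linear projection, assuming every image costs the same.
projected_1000 = per_image * 1000

print(f"per image: ${per_image:.3f}")
print(f"1000 images: ${projected_1000:.2f}")
```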
Sure…

```python
import os
import base64
from openai import OpenAI

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")

client = OpenAI(api_key=api_key)

prompt = """ My prompt is so detailed. It requests that the image be sleek, with simple, relevant illustrations, text overlay as given, etc. (Sorry that I can't share the exact prompt)
"""

try:
    # Generate the image
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1024x1024"
    )
    # The image is returned as base64; decode it and write it to disk
    image_base64 = result.data[0].b64_json
    image_bytes = base64.b64decode(image_base64)
    with open("output.png", "wb") as f:
        f.write(image_bytes)
    print("Image successfully saved as 'output.png'.")
except Exception as e:
    print(f"An error occurred: {e}")
```
The prompt seems pretty normal, no image inputs, just text.
Yes.
Well, unless your description is absurdly huge it shouldn’t affect costs much.
I assume you are not specifying quality, so it falls back to the default, which seems to end up as high (best quality).
On second thought, that might explain it. A landscape size is nearly double the size of a 1024x1024 image. But it is strange, since the maximum resolution for gpt-image-1 is 1536x1024, which is only 1.5 times the pixel count of 1024x1024.
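To see how much quality and size can move the price, here is a rough sketch of the output cost per image. The token counts and the $40 per 1M output tokens figure are assumptions based on what the gpt-image-1 documentation listed at the time, so verify them before relying on the numbers. `output_cost` is a hypothetical helper, not part of the SDK.

```python
# ASSUMED output tokens per image for gpt-image-1, by quality and size.
OUTPUT_TOKENS = {
    "low": {"1024x1024": 272, "1024x1536": 408, "1536x1024": 400},
    "medium": {"1024x1024": 1056, "1024x1536": 1584, "1536x1024": 1568},
    "high": {"1024x1024": 4160, "1024x1536": 6240, "1536x1024": 6208},
}

# ASSUMED price in USD per 1M image output tokens.
OUTPUT_PRICE_PER_MILLION = 40.00


def output_cost(quality, size):
    """Estimate the output cost in USD of one image (input tokens excluded)."""
    return OUTPUT_TOKENS[quality][size] * OUTPUT_PRICE_PER_MILLION / 1_000_000


for quality in OUTPUT_TOKENS:
    for size in ("1024x1024", "1536x1024"):
        print(f"{quality:>6} {size}: ${output_cost(quality, size):.3f}")
```

If these figures hold, passing an explicit `quality="low"` or `quality="medium"` to `client.images.generate(...)` would be the simplest cost lever, and a high-quality landscape image would cost roughly 1.5 times a square one rather than double.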
Right now, I have only one prompt, and it has a detailed description. I also asked ChatGPT to do a little calculation of the cost. Here is what ChatGPT said: " * Input tokens cost $10 per 1,000,000 tokens
This is for you, and I’ve been cranking away on some background info for a while, but I figured it would serve well as its own post - tracking when OpenAI feels like documenting what remains undocumented.