Image generation: high fidelity editing

We’ve improved image generation in the API. Editing with faces, logos, and fine-grained details is now much higher fidelity with features preserved. Edit specific objects, create marketing assets with your logo, or adjust facial expressions, poses, and outfits on people. A guide on getting started: Generate images with high input fidelity.

Fantastic!

Would we expect this to improve “mask editing” performance? (e.g., more like a ‘hard mask’ than ‘soft’?)

No changes to masking today (although we’re working on overall improvements)!

I have no doubt you are :) Sounds good. Thanks for the response, and keep up the great work over there.

This applies only to the edits endpoint, and it concerns the vision input.

High input fidelity costs more for the “vision” part of the image input used for replication, as the documentation now describes:

For GPT Image 1, we calculate the cost of an image input the same way as described above, except that we scale down the image so that the shortest side is 512px instead of 768px. The price depends on the dimensions of the image and the input fidelity.

Conventional input image charges, then:

When input fidelity is set to low, the base cost is 65 image tokens, and each tile costs 129 image tokens. When using high input fidelity, we add a set number of tokens based on the image’s aspect ratio in addition to the image tokens described above.

  • If your image is square, we add 4096 extra input image tokens.
  • If it is closer to portrait or landscape, we add 6144 extra tokens.
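For anyone who wants to estimate this before sending a request, here is a minimal sketch of the token count described above. It is not an official calculator. The 512px tile size, the "never upscale" rule, and treating only exactly square images as "square" are my assumptions; the quoted text does not spell them out. The function name and constants are hypothetical.

```python
import math

BASE_TOKENS = 65          # base cost, from the quoted docs
TOKENS_PER_TILE = 129     # per-tile cost, from the quoted docs
SHORT_SIDE_PX = 512       # shortest side is scaled down to 512px (quoted docs)
TILE_PX = 512             # ASSUMPTION: tiles are 512px squares
EXTRA_SQUARE = 4096       # high-fidelity surcharge, square input (quoted docs)
EXTRA_NON_SQUARE = 6144   # high-fidelity surcharge, portrait/landscape (quoted docs)


def input_image_tokens(width: int, height: int, input_fidelity: str = "low") -> int:
    """Estimate input image tokens for one image passed to the edits endpoint."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if input_fidelity not in ("low", "high"):
        raise ValueError("input_fidelity must be 'low' or 'high'")

    # Scale down so the shortest side is 512px (ASSUMPTION: never scale up).
    scale = min(1.0, SHORT_SIDE_PX / min(width, height))
    scaled_w = round(width * scale)
    scaled_h = round(height * scale)

    # Count the tiles needed to cover the scaled image.
    tiles = math.ceil(scaled_w / TILE_PX) * math.ceil(scaled_h / TILE_PX)
    tokens = BASE_TOKENS + TOKENS_PER_TILE * tiles

    if input_fidelity == "high":
        # ASSUMPTION: "square" means exactly equal sides; the docs say "closer to".
        tokens += EXTRA_SQUARE if width == height else EXTRA_NON_SQUARE
    return tokens


if __name__ == "__main__":
    for size in [(1024, 1024), (1536, 1024)]:
        for fidelity in ("low", "high"):
            print(size, fidelity, input_image_tokens(*size, fidelity))
```

Under these assumptions a 1024x1024 input is one tile, so the fixed high-fidelity surcharge dwarfs the base image tokens.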

More precisely, the additional cost per image is then:

  • exactly square: $0.041
  • non-square: $0.062

(“closer to portrait” is very fishy language when it comes to billing expectations)

That roughly doubles the cost of a medium-quality generation with a single input image at 1024x1024 in and out, or roughly triples it with two input images.
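To show where the doubling and tripling come from, here is a rough sketch of the arithmetic. The prices and the output token count are assumptions on my part, not figures from this thread: $10 per 1M input image tokens, $40 per 1M output image tokens, and 1056 output tokens for a medium-quality 1024x1024 image. Text prompt tokens are ignored. Check the current pricing page before relying on any of it.

```python
# ASSUMED prices and output size (not stated in this thread).
INPUT_IMAGE_USD_PER_TOKEN = 10 / 1_000_000
OUTPUT_IMAGE_USD_PER_TOKEN = 40 / 1_000_000
OUTPUT_TOKENS_MEDIUM_1024 = 1056

# From the quoted docs: a 1024x1024 input scales to 512x512, taken here as one tile.
LOW_FIDELITY_INPUT_TOKENS = 65 + 129
HIGH_FIDELITY_EXTRA_SQUARE = 4096


def request_cost(n_inputs: int, high_fidelity: bool) -> float:
    """Estimated USD cost of one edit with n square 1024x1024 input images."""
    per_image = LOW_FIDELITY_INPUT_TOKENS
    if high_fidelity:
        per_image += HIGH_FIDELITY_EXTRA_SQUARE
    input_cost = n_inputs * per_image * INPUT_IMAGE_USD_PER_TOKEN
    output_cost = OUTPUT_TOKENS_MEDIUM_1024 * OUTPUT_IMAGE_USD_PER_TOKEN
    return input_cost + output_cost


if __name__ == "__main__":
    baseline = request_cost(1, high_fidelity=False)
    one_high = request_cost(1, high_fidelity=True)
    two_high = request_cost(2, high_fidelity=True)
    print(f"low fidelity, 1 input:  ${baseline:.4f}")
    print(f"high fidelity, 1 input: ${one_high:.4f} ({one_high / baseline:.2f}x)")
    print(f"high fidelity, 2 inputs: ${two_high:.4f} ({two_high / baseline:.2f}x)")
```

At the assumed input rate, the surcharge alone is about $0.041 for a square image and about $0.0614 for a non-square one, so the non-square figure quoted above looks rounded up.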

Incredible.

If someone had asked me 2 years ago whether this would be possible, I would have said “absolutely not”.

Props to the OpenAI team.