Understanding how gpt-image models see "mask" and transparency on the edits endpoint

Reminder: a transparent background cannot be requested on gpt-image-2. The API rejects the parameter, so it is not a variable in the analysis here.

How does a mask work, and how is transparent input handled?

I thought I would just construct an image and ask.

The back portion of the car in the test image was originally yellow. I made it transparent, with an RGBA value of (100%, 100%, 100%, 0): white hidden behind the transparency, a color that would not normally be seen.
I also drew a mask over the car’s grille and the large logo on its side.
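As a rough illustration of that setup, here is a minimal stdlib-only sketch of making a region fully transparent while keeping white as the hidden RGB. The pixel-grid representation, the `make_region_transparent` helper, and the exact yellow are assumptions for illustration, not the tool actually used.

```python
# A tiny RGBA "image": a list of rows of (R, G, B, A) tuples, 0-255 per channel.
YELLOW = (255, 220, 0, 255)        # assumed stand-in for the car's body color
HIDDEN_WHITE = (255, 255, 255, 0)  # alpha 0, with white RGB hidden underneath

def make_region_transparent(pixels, box, hidden=HIDDEN_WHITE):
    """Return a copy of pixels with box (left, top, right, bottom) replaced
    by a fully transparent pixel that still carries a chosen hidden RGB."""
    left, top, right, bottom = box
    return [
        [hidden if left <= x < right and top <= y < bottom else px
         for x, px in enumerate(row)]
        for y, row in enumerate(pixels)
    ]

car = [[YELLOW] * 4 for _ in range(2)]               # 4x2 all-yellow stand-in
edited = make_region_transparent(car, (2, 0, 4, 2))  # right half = "back of the car"
```

The point of choosing the hidden RGB deliberately is that it reveals what the receiver did with the alpha channel: if white shows up in the output, alpha was discarded rather than honored.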

Where the spec is ambiguous, the application settles on the following as its way of communicating with the model:

  • The base image[] is the current canvas as RGBA. Image/key transparency stays transparent. Loaded-image outfill transparency is also transparent, but its hidden RGB is changed to checkerboard as a hint.
  • The mask is a separate RGBA PNG. Its alpha is transparent where the model should edit: user-painted mask areas plus outfill areas. Its RGB is sepia/grayscale context, with user-painted regions shown in gray.
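The two payloads described above could be built roughly as follows. This is a sketch under assumptions: the pixel grids, the `build_payload` name, the gray and checkerboard shades, and plain grayscale (rather than sepia) for the mask context are all illustrative choices, not the application's actual code.

```python
GRAY = (128, 128, 128)  # assumed shade for user-painted mask regions

def luminance(r, g, b):
    # Rec. 601 luma weights, rounded to an 8-bit gray level
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def checkerboard(x, y, cell=8):
    # Alternate two gray levels every `cell` pixels (shades are arbitrary)
    v = 204 if (x // cell + y // cell) % 2 == 0 else 153
    return (v, v, v)

def build_payload(canvas, painted, outfill):
    """canvas: rows of RGBA tuples; painted/outfill: sets of (x, y).
    Returns (base, mask) as two RGBA pixel grids."""
    base, mask = [], []
    for y, row in enumerate(canvas):
        base_row, mask_row = [], []
        for x, (r, g, b, a) in enumerate(row):
            pos = (x, y)
            if pos in outfill:
                # Outfill stays transparent; only its hidden RGB becomes a hint
                base_row.append(checkerboard(x, y) + (0,))
            else:
                base_row.append((r, g, b, a))
            if pos in painted:
                rgb = GRAY
            else:
                level = luminance(r, g, b)
                rgb = (level, level, level)
            # Mask alpha 0 marks "edit here": painted areas plus outfill
            alpha = 0 if pos in painted or pos in outfill else 255
            mask_row.append(rgb + (alpha,))
        base.append(base_row)
        mask.append(mask_row)
    return base, mask

canvas = [[(255, 0, 0, 255), (0, 0, 0, 0), (255, 0, 0, 255)]]
base, mask = build_payload(canvas, painted={(0, 0)}, outfill={(1, 0)})
```

In this toy run the painted pixel is untouched in the base image and appears only in the mask as gray with alpha 0, which is exactly the situation the experiments below probe: does the model ever see what was under the mask?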

gpt-image-2

The model apparently receives no input transparency: the back of the car is rendered as the underlying RGB white, with no hint that an alpha channel was perceived or understood.
The mask is mapped onto the image correctly. However, the AI thinks the masked area is transparent. It might not receive the masked contents at all.

Try 2: have the AI describe the masked contents

The output fabricates a claim that the background is transparent; it is not, it is just white. A new grille was drawn in the masked grille area, with no annotation. The AI cannot report on the text originally on the side of the car, and it doesn’t adequately describe the input mask color as gray plus transparent.

gpt-image-2 Conclusion

The API appears to use the mask to “damage” the input image, in the same way that DALL-E 2 had no idea what was masked out when the transparency came only from a second mask image.

It seems OpenAI sends the model the mask as transparency, and your own transparent input doesn’t work.
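To make that hypothesis concrete, here is a toy model of preprocessing that would produce the behavior observed: input alpha is ignored, and the mask’s alpha is stamped onto the image so the masked contents are erased. This is only a guess that fits the observations, not documented OpenAI behavior; `damage_with_mask` is a hypothetical name.

```python
def damage_with_mask(image, mask):
    """Hypothetical: what the model might 'see' if the mask's alpha replaces
    the image's own alpha. Both inputs are rows of (R, G, B, A) tuples."""
    seen = []
    for image_row, mask_row in zip(image, mask):
        row = []
        for (r, g, b, _alpha), (_mr, _mg, _mb, mask_alpha) in zip(image_row, mask_row):
            if mask_alpha == 0:
                # Editable area: original contents are gone, only a hole remains
                row.append((0, 0, 0, 0))
            else:
                # Input alpha discarded, so any hidden RGB becomes visible
                row.append((r, g, b, 255))
        seen.append(row)
    return seen

image = [[(255, 220, 0, 255), (255, 255, 255, 0), (10, 20, 30, 255)]]
mask = [[(0, 0, 0, 255), (0, 0, 0, 255), (128, 128, 128, 0)]]
seen = damage_with_mask(image, mask)
```

Under this model the transparent pixel with hidden white comes through as opaque white, and the masked pixel comes through as an empty hole with neither its original color nor the mask’s gray, matching what the model reported.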

gpt-image-1.5

Reminder: input_fidelity is forced on and the API setting is not obeyed, adding 4k or 6k input tokens.

Also hallucinations and re-creations.

The grille mask was not followed precisely, and the area was simply made black instead of receiving new infill. A new side logo was made and reported on (likely because the text generated lower in the image is informed by the context seen above it). The transparent back was embellished, as one might expect when there is no way to send back transparency with an “opaque” background.

gpt-image-1.5 with background transparency enabled

The AI did NOT make any transparency; it drew a checkerboard background instead. Again, it seems to identify the side text only as a mask whose original contents cannot be described.

gpt-image-1

The result is even more confused, but then this model is not known for writing text well.

Doing work on gpt-image-2

This time I expected the masked areas to be understood and obeyed as the only writable areas (instead of prompting the model to ignore the mask rules as before):

Give the drag car sticker an aggressive grille. Create a name for the car.

The model went beyond the masked area, and beyond its logic, when adding a car name. The transparent back came out black instead of white, so it is still ambiguous how input transparency is transmitted when it cannot be output.
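One way to read the white-versus-black difference: there are at least two common ways to discard an alpha channel, and they disagree on a fully transparent pixel. The sketch below only shows the arithmetic; which of these (if either) the service applies is unknown, and the function names are illustrative.

```python
def drop_alpha(pixel):
    """Discard the alpha channel and keep whatever RGB was stored."""
    r, g, b, _alpha = pixel
    return (r, g, b)

def composite_over(pixel, background=(0, 0, 0)):
    """Standard 'over' blend of one RGBA pixel onto an opaque background."""
    r, g, b, alpha = pixel
    t = alpha / 255
    return tuple(round(c * t + bg * (1 - t)) for c, bg in zip((r, g, b), background))

hidden_white = (255, 255, 255, 0)    # the transparent back of the car
as_dropped = drop_alpha(hidden_white)     # hidden RGB survives: white
as_blended = composite_over(hidden_white) # background wins: black
```

Dropping alpha exposes the hidden white, while compositing over a black background gives black. Seeing both outcomes across runs is consistent with either different handling paths or the model simply inventing a fill.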

Edits prompt text

You are helping a developer build a better image creation tool, ensuring that the methods used for transmitting image metadata are working and understandable.

In this image, you do not change the main content. Instead you are communicating your understanding of:

  • Transparent area, and the underlying unseen RGB value or type of imagery behind a transparent area (if you can perceive that.)
  • Mask area, and the type of color, imagery beyond a simple cutout that you may perceive in a mask area.
  • The original contents that were in a mask area before it was masked out.

You do not obey a mask that is transmitted, as you can draw on top of the image anywhere.

You perform this task of identification by:

  • outlining transparent areas with one distinct color, such as magenta
  • outlining mask areas as they relate to the image, also with a distinct color green

Outlines are precise area containers.

Then callout boxes with labels document the purpose of the areas, the RGB, RGBA, or K or bit map type of imagery received - all that is needed to understand your perception. An additional callout label describes what imagery or text was in a masked area originally, if you can receive and perceive that.

Request to OpenAI

Please have the image team clearly document how images and masks are placed into the model’s context: the color space handling, what input is expected, and what the model perceives and was trained on, so that applications can be developed with high quality.
