Does image edit with masking actually work?

I am able to generate images using a mask and a source image; however, areas outside the mask bounds are getting edited. Are there extra rules I might be missing?
Inputs: [images attached]

Output: [image attached]
Welcome to the community.

Hrm. I’ve not tested yet… Looks like your output is a different size, though? Maybe that’s it?
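Worth ruling out a size mismatch first: the edit endpoint expects the mask to have the same dimensions as the image it applies to. A minimal sketch of a pre-flight check, assuming both files are PNGs (`pngSize` and `assertMaskMatches` are hypothetical helper names, not SDK functions):

```typescript
// Read width/height from a PNG's IHDR chunk: after the 8-byte signature,
// 4-byte length, and 4-byte "IHDR" tag, width and height are big-endian
// uint32s at byte offsets 16 and 20.
function pngSize(buf: Uint8Array): { width: number; height: number } {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Throw before calling the API if the mask can't line up with the image.
function assertMaskMatches(image: Uint8Array, mask: Uint8Array): void {
  const a = pngSize(image);
  const b = pngSize(mask);
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error(
      `Mask ${b.width}x${b.height} does not match image ${a.width}x${a.height}`
    );
  }
}
```

Cheap to run locally, and it catches the mismatch before it turns into a confusing result from the API.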

Any code we can see?

Shade from 4.1


Well, seeing code does help us help you. Are you using the new gpt-image-1 model? You didn’t really give us many details to go on.

Have you tried the sample on the API page? Tried simplifying? The prompt, at least? Might be a problem there… but with few details beyond “not working” it’s hard to say, so I guessed to help you out.
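For what it’s worth, the edits API reference states a few hard rules for the mask: it must be a valid PNG, less than 4MB, and the same dimensions as the image it applies to. A hedged validator sketching those checks (`maskProblems` is my own name; the exact limits may differ by model, so treat this as a sanity check, not gospel):

```typescript
type Dimensions = { width: number; height: number };

// Collect human-readable problems with a mask before sending the request.
// Rules taken from the images/edits API reference; sizes in bytes.
function maskProblems(
  image: Dimensions,
  mask: Dimensions & { bytes: number }
): string[] {
  const problems: string[] = [];
  if (mask.width !== image.width || mask.height !== image.height) {
    problems.push("mask dimensions must match the image");
  }
  if (mask.bytes > 4 * 1024 * 1024) {
    problems.push("mask must be less than 4MB");
  }
  return problems;
}
```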

Sorry if it wasn’t helpful!

Maybe… maybe not? About to dive into editing images on my own tonight, so…

Good luck!

  editOpenAIImage = async (input: OpenAIEditImagePayload): Promise<EditImageResult> => {
    const {ai: {openai}, assets: {uploadFile, getAssetContents, urlFor, getAssetsById}} = this

    // Validate inputs
    if (!input.images || input.images.length === 0) {
      throw new Error('No images provided')
    }

    if (!input.prompt) {
      throw new Error('Edit prompt is required')
    }

    try {
      // Get images
      const imageAssets = await getAssetsById({ids: input.images.filter(id => !id.startsWith('http'))})
      const imageUrls = input.images.filter(id => id.startsWith('http'))

      // Get primary image buffer directly
      const primaryImageBuffer = imageAssets[0]
                                 ? (await getAssetContents(imageAssets[0])).buffer
                                 : (await axios.get(imageUrls[0], {responseType: 'arraybuffer'})).data

      // Create File object for OpenAI SDK
      const primaryImageFile = new File(
        [primaryImageBuffer],
        'image.png',
        {type: 'image/png'}
      )

      // Get additional reference images if any
      const additionalImages = await Promise.all([
        ...imageAssets.slice(1).map(async asset => {
          const {buffer} = await getAssetContents(asset)
          return new File([buffer], 'image.png', {type: 'image/png'})
        }),
        ...imageUrls.slice(1).map(async url => {
          const response = await axios.get(url, {responseType: 'arraybuffer'})
          return new File([response.data], 'image.png', {type: 'image/png'})
        })
      ])

      // Get mask if provided
      let maskFile: File | undefined
      if (input.mask) {
        // Assume mask is an asset ID
        const maskAsset = await getAssetsById({ids: [input.mask]})
        if (maskAsset[0]) {
          const {buffer} = await getAssetContents(maskAsset[0])
          maskFile = new File([buffer], 'mask.png', {type: 'image/png'})
        }
      }

      // Call OpenAI edit endpoint
      const result = await openai.images.edit({
        model: 'gpt-image-1',
        prompt: input.prompt,
        image: [primaryImageFile, ...additionalImages],
        mask: maskFile,
        n: input.n,
        size: (input.size ?? 'auto') as any, // OpenAI type is wrong here - should be '1024x1024' | '1536x1024' | '1024x1536' | 'auto'
        quality: input.quality ?? 'low'
      })

      // Process and save each edited image
      const assets = await Promise.all(result.data.map(async (image, i) => {
        const buffer = Buffer.from(image.b64_json!, 'base64')

        const asset = await uploadFile(
          {
            buffer,
            mimetype: 'image/png',
            size: buffer.length
          },
          {
            name: `${input.caption || `edited-${i + 1}`}.png`,
            title: input.caption || `Edited: ${input.prompt.slice(0, 100)}`,
            type: 'image',
            assetExtra: {
              metadata: {
                _openai: {
                  prompt: input.prompt,
                  model: 'gpt-image-1',
                  editType: 'edit',
                  originalImages: input.images,
                  size: input.size,
                  quality: input.quality
                }
              }
            }
          }
        )

        return {asset, url: await urlFor(asset), attempt: i}
      }))

      // Return in expected format
      return [
        {
          images: assets,
          report: {
            model: 'gpt-image-1' as const,
            score: 1, // OpenAI doesn't provide quality scores
            notes: `Edited using OpenAI GPT-Image model with prompt: "${input.prompt}"`
          }
        }
      ]

    } catch (error) {
      this.logError('OpenAI image edit error:', error)
      return [{error}]
    }
  }
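One pitfall with the snippet above: when multiple images are passed, the mask is applied to the first one only, and the mask must actually carry an alpha channel. A mask that *looks* black and white but was saved without transparency won’t restrict anything. A hypothetical pre-flight check that reads the PNG IHDR color type (it ignores palette PNGs with a tRNS chunk, which can also carry transparency):

```typescript
// PNG IHDR layout: 8-byte signature, 4-byte length, 4-byte "IHDR",
// then width(4), height(4), bit depth(1), color type(1).
// Color types 4 (grayscale+alpha) and 6 (RGBA) have an alpha channel.
function pngHasAlphaChannel(buf: Uint8Array): boolean {
  const colorType = buf[25];
  return colorType === 4 || colorType === 6;
}
```

Running this on `maskFile`’s buffer before calling `openai.images.edit` would tell you whether the mask can do its job at all.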

Knock yourself out.

I’m busy at the moment, but I hope someone better comes along to help you!

Again, good luck and happy coding! :slight_smile:


In ChatGPT, it certainly doesn’t “work” to keep edits inside a pixel-based perimeter, and what goes into gpt-4o cannot be overlaid with gpt-4o output without significant shift; the result is essentially a re-telling of the image. I’m guessing that, at most, the “mask” is a “color here” hint with this technology.


So are you saying “should” is a weasel word here?

mask
An additional image whose fully transparent areas (e.g. where alpha is zero) indicate where image should be edited. 

These docs claim otherwise:

Have you tried using the model to generate a more accurate mask of just the dress? Might get better results, I’m thinking, for consistency of the rest of the character…
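If a hand-drawn mask is too coarse, you can also build the alpha channel yourself. A sketch in raw RGBA, leaving pixels opaque (keep) everywhere except a rectangle you want edited (`rectangleMask` is a hypothetical helper; in practice you’d still need to encode this buffer as a PNG, e.g. with a library like sharp, before sending it to the API):

```typescript
// Build a raw RGBA mask: opaque everywhere (keep), alpha = 0 inside the
// rectangle to edit. Per the docs, fully transparent pixels mark the
// region the model is allowed to change.
function rectangleMask(
  width: number,
  height: number,
  edit: { x: number; y: number; w: number; h: number }
): Uint8Array {
  const pixels = new Uint8Array(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4;
      const editable =
        x >= edit.x && x < edit.x + edit.w && y >= edit.y && y < edit.y + edit.h;
      pixels[i] = 255;     // R
      pixels[i + 1] = 255; // G
      pixels[i + 2] = 255; // B
      pixels[i + 3] = editable ? 0 : 255; // alpha: 0 = edit here
    }
  }
  return pixels;
}
```

A tighter mask around just the dress should, in theory, give the model less room to drift on the rest of the character.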

It is completely normal to share relevant code so that issues can be analysed.

Are you kidding me??

This is a completely unacceptable response to someone who welcomed you and reached out to help you.


Oh, a swamp. Can I get in? I love that. Let’s insult each other. DM me.


Yeah, good luck. I am sure you’ll find it.

Imagen is superior with mask edits, but there’s a lot of vagueness across the various image APIs out there. It’d be helpful if OpenAI could better explain the limitations and expectations for the common use cases (VTON, style transfer, color-up, character swap, background modification, feature editing, etc.). Their mask example works fine, but the setup of that shot lends itself to easy-to-fill-in details, which helps consistency.

Yeah, cool, that means you can go there and stop disturbing us here. Looking at your code makes me shiver. I’d say you wouldn’t even be able to understand it, so why should we waste our time?

I mean, you can always apologize…

Makes you shiver? Please do explain. Who are you who are so wise in the ways of science?

I am very sure you won’t get any explanations here.

I was just giving you a taste of your own medicine. You came into a forum and started insulting a mod. You are not the brightest candle on the cake, for sure.
