Discrepancy between Chat and Responses API Image Generation tool output

Hi everyone,

I’m running into a big difference in image generation quality between Chat and the OpenAI Responses API (gpt-5) with ImageGeneration tool (gpt-image-1), even when using identical prompts and images.

Goal:

I’m building a workflow to generate packshot-quality images of clothing items and accessories. The input is usually a photo of a garment (either taken by a user in their closet or sourced from a website where the item is shown on a model). The desired output is a clean, e-commerce–style product photo — like what you’d see on professional retail sites.

Setup:

  • Model: gpt-5 via the Responses API

  • Tool config:

    {
      "type": "image_generation",
      "model": "gpt-image-1",
      "size": "auto",
      "quality": "high",
      "output_format": "png",
      "background": "transparent",
      "moderation": "low"
    }
    
  • Same prompt and same input image each time

  • Input image passed via content[] as

    {
      "type": "input_image",
      "image_url": "<base64img>",
      "detail": "high"
    }
    

We’re explicitly using the Responses API with the Image Generation tool to replicate the behavior of Chat as closely as possible.

Issue

When I run the prompt in Chat, the results are beautiful — sharp lighting, realistic textures, clean backgrounds, and accurate product details.

But when I run the exact same prompt and image through the API, the quality drops a lot:

  • Details about the garment are wrong

  • Lighting and shadowing are inconsistent

  • Image looks less professional

I need to process thousands of images, so I can’t rely on Chat manually — I really need API-level consistency that matches Chat’s quality. Interestingly, my colleague, who’s been generating a larger volume of images via Chat (with the same model and prompt), consistently gets better-quality outputs than I do. This makes me think it has something to do with personalization.

Questions:

  • Has anyone else noticed this difference between Chat and the Responses API results?

  • Are there hidden differences in how the Chat interface calls the image tool vs how the API would call the image tool (e.g., preprocessing, better/personalized automatic prompt expansion, system context)?

Below is an example (you can see the Chat result looks great, the API result added a zipper and got the collar shape wrong). Thanks for any insights!