How to Achieve Multi-turn Image Editing When Manually Managing Conversation State?

  • Currently, OpenAI provides two approaches for managing conversation state: “Manually manage conversation state” and “OpenAI APIs for conversation state”. However, when using the image_generation (gpt-image-1) tool in a multi-turn conversation, only the previous_response_id method (i.e., the API-managed approach) is supported.

  • This creates a conflict: in the manual state management approach, assistant messages do not support file attachments, making it impossible to reference previously generated images in a continued dialogue.

  • Can only the generated image be placed in the user message (content: [{ type: "input_text", text: content }, { type: "input_image", image_url: "https://xxx.png" }])?

I would appreciate any kind reply, best regards!


You still must use store: true server-side persistence. There is no true way to send generated outputs back as inputs. It is similar to audio on Chat Completions, which also requires an ID, except there you at least get a short timeout instead of the only option being “stored essentially forever unless you clean up the IDs yourself”.

OpenAI seems to imply that developers are untrustworthy and would abuse assistant multishot examples to train the model into bad outputs. That is despite image generation being a different AI that takes over in an undocumented, context-consuming way.

You can see how to pass back the ID you previously received here, “using image ID”.

https://platform.openai.com/docs/guides/tools-image-generation#multi-turn-editing


Thank you for your reply.

According to the example in the documentation using the image ID, an error will be reported: 400 Item 'ig_xxxxxx' of type 'image_generation_call' was provided without its required 'reasoning' item: 'rs_xxxxxx'.

import OpenAI from "openai";
const openai = new OpenAI();

const response = await openai.responses.create({
  model: "gpt-5",
  input:
    "Generate an image of gray tabby cat hugging an otter with an orange scarf",
  tools: [{ type: "image_generation" }],
});

const imageGenerationCalls = response.output.filter(
  (output) => output.type === "image_generation_call"
);

const imageData = imageGenerationCalls.map((output) => output.result);

if (imageData.length > 0) {
  const imageBase64 = imageData[0];
  const fs = await import("fs");
  fs.writeFileSync("cat_and_otter.png", Buffer.from(imageBase64, "base64"));
}

// Follow up

const response_fwup = await openai.responses.create({
  model: "gpt-5",
  input: [
    {
      role: "user",
      content: [{ type: "input_text", text: "Now make it look realistic" }],
    },
    {
      type: "image_generation_call",
      id: imageGenerationCalls[0].id,
    },
  ],
  tools: [{ type: "image_generation" }],
});

const imageData_fwup = response_fwup.output
  .filter((output) => output.type === "image_generation_call")
  .map((output) => output.result);

if (imageData_fwup.length > 0) {
  const imageBase64 = imageData_fwup[0];
  const fs = await import("fs");
  fs.writeFileSync(
    "cat_and_otter_realistic.png",
    Buffer.from(imageBase64, "base64")
  );
}

It needs to be modified as follows, carrying the image_generation_call ID, the reasoning ID, and the message ID:

// Follow up

const response_fwup = await openai.responses.create({
  model: "gpt-5",
  input: [
    {
      role: "user",
      content: [{ type: "input_text", text: "Now make it look realistic" }],
    },
    {
      id: "rs_xxxx", // reasoning ID
      type: "reasoning",
      summary: [],
    },
    {
      id: "ig_xxxx", // image generation ID
      type: "image_generation_call",
    },
    {
      id: "msg_xxxx", // message ID
      type: "message",
      status: "completed",
      role: "assistant",
      content: [
        {
          type: "output_text",
          annotations: [],
          logprobs: [],
          text: "",
        },
      ],
    },
  ],
  tools: [
    {
      type: "image_generation",
      quality: "medium",
      size: "1024x1024",
      output_format: "jpeg",
    },
  ],
});
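The trio of items above can be collected mechanically from the first response's output instead of being hard-coded. Below is a sketch in Python (the JavaScript version is analogous); it operates on plain dicts for illustration, whereas with the SDK objects you would read item.type and item.id attributes. The type names match what the Responses API returns.

```python
# Collect the reasoning, image_generation_call, and assistant message items
# from a response's output list, shaped for forwarding in the next request.

FORWARDED_TYPES = {"reasoning", "image_generation_call", "message"}

def items_to_forward(output_items: list[dict]) -> list[dict]:
    forwarded = []
    for item in output_items:
        if item.get("type") not in FORWARDED_TYPES:
            continue
        if item["type"] == "reasoning":
            # Only the ID is needed; the summary can be empty.
            forwarded.append({"id": item["id"], "type": "reasoning", "summary": []})
        elif item["type"] == "image_generation_call":
            # Reference by ID only; drop the base64 result to keep the payload small.
            forwarded.append({"id": item["id"], "type": "image_generation_call"})
        else:
            # Pass the assistant message item through as-is.
            forwarded.append(item)
    return forwarded
```

The follow-up request's input then becomes the new user message plus whatever this helper returns.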

Additionally, I have a question: We currently do not have Zero Data Retention enabled. How long will these three IDs be stored by OpenAI (1 hour, at least 30 days, permanently, or something else)? Thank you.


If you do not send store: false, you are implying “store those response IDs for me, as if I’d want to chat later reusing response_id as a chat-history mechanism.”

While the documentation says “30 days default”, that wording is not being delivered on. (Checks again.) Nothing has EVER expired from the responses log that I didn’t systematically delete myself.

You will need to record the response IDs yourself and purge them once reuse of the images is no longer required. If you have simply been making Responses API calls without thinking about this, you likely already have plenty of orphaned responses whose IDs you never tracked, and with no listing method there is no way to purge them programmatically.


Here’s the situation: a default parameter meant to scarf up your data leaves OpenAI unexpectedly persisting an unwanted collection forever, while denying you full management access to it. The tedious workaround is a cleanup “goodbye kiss” to this endpoint after each call in your own function or application, or calling the image edits endpoint directly for image creation, with known expense, refusing any of the internal tools.
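The record-and-purge bookkeeping can be sketched as below. The deletion call itself would be the SDK's responses delete method; here the deleter is injected as a plain callable so the bookkeeping can be shown (and tested) without network access, and so any error handling you need can be wrapped around it.

```python
# Minimal bookkeeping for the "record and purge" approach: track every
# response ID you create, then delete them all once reuse is no longer
# needed. `delete_fn` stands in for the SDK's response deletion call.

from typing import Callable

class ResponseTracker:
    def __init__(self, delete_fn: Callable[[str], None]):
        self._delete = delete_fn
        self._ids: list[str] = []

    def record(self, response_id: str) -> None:
        """Remember a response ID right after each responses.create call."""
        self._ids.append(response_id)

    def purge(self) -> int:
        """Delete all tracked responses; return how many were removed."""
        count = 0
        while self._ids:
            self._delete(self._ids.pop())
            count += 1
        return count
```

Note this only helps with responses you track going forward; it does nothing for the orphans already sitting in the log.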


+1 to this question. I’ve tried adapting the multi-turn / using image ID example from the docs to accommodate store=False but have not been successful.

This snippet (adapted from the docs example) works with store unspecified:

import openai
import base64

response = openai.responses.create(
    model="gpt-5.2",
    input="Draw a random short word in green font.",
    tools=[{"type": "image_generation"}],
)

image_generation_calls = [
    output
    for output in response.output
    if output.type == "image_generation_call"
]

# Follow up

response_fwup = openai.responses.create(
    model="gpt-5.2",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": (
                        "Now, change the font to blue. Keep the word "
                        "and everything else the same."
                    ),
                }
            ],
        },
        {
            "type": "image_generation_call",
            "id": image_generation_calls[0].id,
        },
    ],
    tools=[{"type": "image_generation"}],
)

If I try adding

store=False,
include=["reasoning.encrypted_content"],

to my invocations, I get

NotFoundError: Error code: 404 - {'error': {'message': "Item with id 'ig_...' not found. Items are not persisted when `store` is set to false. Try again with `store` set to true, or remove this item from your input.", 'type': 'invalid_request_error', 'param': 'input', 'code': None}}

as expected, because there’s no image to reference.

How should we pass in the image data? I tried changing the input item to

{
    "type": "image_generation_call",
    "id": image_generation_calls[0].id,
    "result": image_generation_calls[0].result,
    "status": image_generation_calls[0].status,
},

which appears to be a valid input item according to their SDK types, but I get the same error here.

Are there any recommendations about what should be done here? Would we format the base64 data into an input_image and put that in a user message? How do we coherently represent the model’s response in the input items in subsequent conversational turns?
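For the first of those questions, here is a sketch of what the input_image route would look like: wrap the base64 result in a data URL and send it inside the next user message. To be clear, this only shows the payload shape; whether gpt-image-1 then treats the attached image as the edit source as faithfully as the ig_... reference does is an open question, which is part of what this thread is asking.

```python
# Carry the image forward with store=False by embedding the base64 result
# as a data URL in an input_image part of the next user message.
import base64

def follow_up_input(result_b64: str, instruction: str) -> list[dict]:
    data_url = f"data:image/png;base64,{result_b64}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": instruction},
                {"type": "input_image", "image_url": data_url},
            ],
        }
    ]
```

The second question, how to coherently represent the assistant's side of the turn, remains awkward: the best available stand-in seems to be an assistant message describing the image in text, since the image itself can only re-enter as user input.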