Persisting reference image

In my agentic application, there will be some reference images that I will pass as arguments to custom image generation tools (python functions), in turn calling the Responses API to generate images with image input. The same images will be used in the prompts repeatedly.

    response = client.responses.create(
        model="gpt-5",
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": prompt},
                    {
                        "type": "input_image",
                        "file_id": reference_image_file_id,
                    },
                ],
            }
        ],
        tools=[{"type": "image_generation", "quality": "high"}],
    )

Obviously, passing images in prompts consumes tokens. Is there any magical way to persist these images, so I don't have to pass them explicitly every time?

Your application: you have reference images that are passed as vision input for the chat AI to comprehend on every API call, and then those same images are passed again, in full, into the separate gpt-4o image model's understanding — you're being double-billed.

Write your own function that calls the Edits endpoint. Give the chat AI a list of the images that could be useful, exposed as an array parameter the AI can fill with strings or ID numbers. Write a multiline tool description (up to 1000 characters) covering all the files and the calling behavior needed.

Then, the AI only reads the text, and only the correct images are seen by the edits endpoint.
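A sketch of what such a tool definition might look like. The catalog, file names, and descriptions are all hypothetical placeholders — the point is that the chat model only ever sees text descriptions plus ID numbers, never the image bytes:

```python
# Hypothetical catalog of local reference images the chat model can pick from.
REFERENCE_IMAGES = {
    0: "brand_logo.png",
    1: "product_front.png",
    2: "color_palette.png",
}

# Function tool the chat model sees: descriptions and IDs only.
edit_image_tool = {
    "type": "function",
    "name": "edit_image",
    "description": (
        "Create or edit an image via the Image Edits endpoint.\n"
        "Available reference images (pass their ID numbers):\n"
        "  0: brand logo on white background\n"
        "  1: product photographed from the front\n"
        "  2: approved brand color palette\n"
        "Select only the references relevant to the request."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "reference_ids": {
                "type": "array",
                "items": {"type": "integer"},
                "description": "IDs of reference images to send to the image model",
            },
            "prompt": {
                "type": "string",
                "description": "Edit/creation instructions for the image model",
            },
        },
        "required": ["reference_ids", "prompt"],
    },
}
```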

Sorry I don’t understand your answer, was it sarcasm? I have thought about passing an LLM generated description of the reference images in words, but it does not really work.

Let me clarify my overly-terse advice:

You want a chat-based flow where the model talks through ideas, selects from your reference images, and then an image‑creation model uses just those references to produce the output. The good news: the Image Edits API can now handle that natively in a single call — no need to chat about it.

The catch with OpenAI’s Responses API + internal image generation tool is that it forwards the entire chat history (including previous images and tasks) to the image model. That’s noisy, can rack up costs, and invites an “interloper” into your otherwise clean chat flow. On top of that, the chat AI’s vision keeps billing you for every image you place in the ever-growing conversation history.

The middle path I recommend: wire up your own lightweight function interface to the Image Edits endpoint. The chat AI can receive only a text description of the possible reference images, and their index number to be used.

Let the chat model produce a structured function call that includes which reference image IDs to use and the edit/creation instructions to be sent as prompt, along with how that prompt is best structured to achieve the needed results. Your code then matches those IDs to the actual files and sends only those images—plus the prompt—to the Edits API.
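A minimal sketch of that dispatch step, assuming a hypothetical local catalog mapping ID numbers to files and the `images.edit` endpoint of the OpenAI Python SDK (which accepts multiple input images for gpt-image-1):

```python
import json

# Hypothetical catalog: ID number -> local file path.
REFERENCE_IMAGES = {0: "brand_logo.png", 1: "product_front.png"}


def select_reference_files(catalog, reference_ids):
    """Map the model's chosen ID numbers to actual file paths,
    silently dropping any IDs not present in the catalog."""
    return [catalog[i] for i in reference_ids if i in catalog]


def handle_edit_image_call(client, arguments_json):
    """Dispatch the chat model's function-call arguments to the
    Image Edits endpoint, sending only the selected references."""
    args = json.loads(arguments_json)
    paths = select_reference_files(REFERENCE_IMAGES, args["reference_ids"])
    files = [open(p, "rb") for p in paths]
    try:
        # Only the chosen images and the prompt reach the image model.
        result = client.images.edit(
            model="gpt-image-1",
            image=files,
            prompt=args["prompt"],
        )
    finally:
        for f in files:
            f.close()
    return result.data[0].b64_json
```

You would call `handle_edit_image_call` whenever the Responses API returns a function call named `edit_image`, then hand the resulting image (or a short confirmation) back to the chat model as the tool output.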

Net result: the chat model reads and writes just text, the edits endpoint sees only the exact reference images you selected, you avoid paying twice for context, and the image model stays laser‑focused on the right visuals.


No worries, thanks!

I suspect there are some nuances to your answer I have not understood yet, so I need to read up on the Edits endpoint.