The image tool provided to the AI has a limited surface:
# Tools
## image_gen
// The `image_gen` tool enables image generation from descriptions and editing of existing images based on specific instructions. Use it when:
// - The user requests an image based on a scene description, such as a diagram, portrait, comic, meme, or any other visual.
// - The user wants to modify an attached image with specific changes, including adding or removing elements, altering colors, improving quality/resolution, or transforming the style (e.g., cartoon, oil painting).
// Guidelines:
// - Directly generate the image without reconfirmation or clarification.
// - After each image generation, do not mention anything related to download. Do not summarize the image. Do not ask followup question. Do not say ANYTHING after you generate an image.
// - Always use this tool for image editing unless the user explicitly requests otherwise. Do not use the `python` tool for image editing unless specifically instructed.
// - If the user's request violates our content policy, any suggestions you make must be sufficiently different from the original violation. Clearly state the reason for refusal and distinguish your suggestion from the original intent in the `refusal_reason` field.
namespace image_gen {
type imagegen = (_: {
prompt?: string,
}) => any;
} // namespace image_gen
Notably, there is no parameter for an image count, and even the prompt is optional. That is because the tool is essentially a trigger: it hands off the task of creating an image, along with the chat context, to the gpt-4o-based gpt-image-1.
What you are likely observing is the AI failing to recognize that an image succeeded, or failing to judge the quality of the deliverable, so it calls the tool again and again. Or it is simply pattern-matching what the "assistant" emitted previously, producing a repeating loop. Or its 'reasoning' is at fault: the AI thinks it can try out tool calls on an internal channel.
You don't have control over the tool response message or its placement in the context, so you can't fix internal tools yourself.
What you do control is the number of iterations in which the AI can keep emitting tool calls. This should stop the expense cold:
`max_tool_calls`: The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
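A minimal sketch of applying that cap with the Responses API (assuming the Python `openai` SDK and its `image_generation` built-in tool type; model name is illustrative). The payload is built as a plain dict so the cap is visible before anything is sent:

```python
# Sketch: cap built-in tool iterations via max_tool_calls (Responses API).
# Assumptions: openai Python SDK, "image_generation" built-in tool type.

def build_image_request(user_prompt: str, max_calls: int = 1) -> dict:
    """Build a Responses API payload allowing at most `max_calls`
    built-in tool invocations across ALL tools in the response."""
    return {
        "model": "gpt-4.1",                      # illustrative model name
        "input": user_prompt,
        "tools": [{"type": "image_generation"}],
        "max_tool_calls": max_calls,             # further attempts are ignored
    }

request = build_image_request("Draw a watercolor lighthouse at dusk")
print(request["max_tool_calls"])

# To actually send it (requires OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# response = client.responses.create(**request)
```

With `max_calls=1`, a model stuck in a generation loop gets exactly one image and the rest of its attempts are dropped, capping your cost per response.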
Then consider whether you really need "chat with pictures." You can instead use a function tool that acts as a connector to the image-generation API, stopping the context bloat, or simply a non-chat tool to create and edit images (without talking to an AI that is not in control of making the actual images).
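One way that connector could look, sketched under assumptions: a hypothetical `generate_image` function tool whose app-side handler calls the Images API (`gpt-image-1`) directly and returns only a short status string to the chat, so no image bytes ever enter the model's context:

```python
# Sketch of a function-tool "connector" to the image API.
# The function name and handler are hypothetical; the chat model sees only
# the schema and a short status string, while the app calls the Images API
# itself, so base64 image data never bloats the conversation context.

IMAGE_TOOL = {
    "type": "function",
    "name": "generate_image",          # hypothetical function name
    "description": "Create an image from a text description.",
    "parameters": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string", "description": "Scene description."},
        },
        "required": ["prompt"],
    },
}

def handle_generate_image(prompt: str) -> str:
    """App-side handler: call the image API, save the file, return status only."""
    # from openai import OpenAI  # requires OPENAI_API_KEY
    # client = OpenAI()
    # result = client.images.generate(model="gpt-image-1", prompt=prompt)
    # ... decode result.data[0].b64_json and write it to disk ...
    return f"Image generated for prompt: {prompt[:40]}"
```

Because the handler returns a one-line status rather than the image itself, each edit round costs a few tokens instead of an image payload, and the chat model never gets the chance to "retry" a generation it cannot see.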