Question about image rate limits with gpt-5.3-chat when using previous_response_id

Hi everyone,

I ran into an image rate limit while testing gpt-5.3-chat-latest in a Twitch stream integration, and I’m trying to understand exactly how the per-minute input-image limit works when using previous_response_id.

Setup

I run a Python middleware that connects Twitch chat and game screenshots to the OpenAI Responses API. The bot has two main interaction types:

  1. Text responses

    • Twitch chat → LLM response

    • No images included

  2. Screenshot reactions

    • A screenshot from the game is sent with a prompt

    • The model comments on what it sees

Both routes use previous_response_id so the model maintains conversational context.
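For reference, the two call paths look roughly like this. This is a minimal sketch in OpenAI Python SDK style; client setup and error handling are omitted, the helper names are my own, and the actual API call is shown commented out so only the payload shapes matter here:

```python
# Sketch of the two request shapes my middleware builds for the Responses API.
# These helpers just construct the `input` payload; the real call is below.

def text_turn(message: str) -> list:
    """Text-only Twitch chat turn: no images attached."""
    return [{"role": "user", "content": [{"type": "input_text", "text": message}]}]

def screenshot_turn(prompt: str, image_data_url: str) -> list:
    """Screenshot reaction turn: prompt plus one game screenshot (data URL)."""
    return [{
        "role": "user",
        "content": [
            {"type": "input_text", "text": prompt},
            {"type": "input_image", "image_url": image_data_url},
        ],
    }]

# resp = client.responses.create(
#     model="gpt-5.3-chat-latest",
#     input=screenshot_turn("What do you see on screen?", data_url),
#     previous_response_id=last_response_id,  # threads the conversation
# )
```

Both helpers feed the same `previous_response_id` chain, so text turns and screenshot turns share one conversation thread.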

What I observed

When using gpt-5.3-chat-latest, I started hitting this error:

Rate limit reached for input-images per min
Limit: 10
Used: 4
Requested: 7

The confusing part is that the request that triggered the error did not include any images. It was a text-only message.

However, earlier in the conversation thread there had been several screenshot reactions.

Behavior difference between models

During the same stream I switched the model to:

gpt-5.1-chat-latest

After switching, I sent many more screenshots over time and never hit the same image rate limit.

This makes me suspect one of the following:

  1. gpt-5.3-chat has a lower image-per-minute limit than gpt-5.1

  2. When using previous_response_id, earlier image turns may be reprocessed as part of the context window, causing a later text-only request to count as multiple image inputs

  3. Some other internal behavior specific to gpt-5.3’s multimodal context handling
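If hypothesis 2 is right, the numbers in the error actually line up. This is just my own back-of-the-envelope arithmetic (the count of 7 prior screenshots is an assumption inferred from the error, not something I logged):

```python
# Hypothetical arithmetic for hypothesis 2: images from earlier turns in the
# thread are re-counted when the thread is continued via previous_response_id.
prior_screenshots_in_thread = 7   # assumption: screenshots sent earlier in the thread
new_images_in_request = 0         # the failing request was text-only

requested = prior_screenshots_in_thread + new_images_in_request
already_used_this_minute = 4
limit_per_minute = 10

assert requested == 7                                            # matches "Requested: 7"
assert already_used_this_minute + requested > limit_per_minute   # 4 + 7 = 11 > 10 → rate limited
```

That would explain why a request with zero attached images reported "Requested: 7".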

Questions

I’m hoping someone from OpenAI or another developer familiar with this can clarify:

  1. Are input image limits different per model, specifically between gpt-5.1 and gpt-5.3?

  2. When using previous_response_id, can earlier images in the thread count again toward the image/minute limit if they are included in context?

  3. Is the 10 images/min limit expected for this model, and is it temporary or likely to change?

  4. For systems that mix text chat and occasional screenshots, is the recommended approach to:

    • separate text and image interactions into different conversation threads, or

    • avoid continuing screenshot threads with previous_response_id?

Context for usage

This system runs during a live stream, so stability is important. A screenshot may be sent every 90–120 seconds, but normal chat messages continue in between.

Understanding how the image rate limit interacts with conversation threading would help determine the best architecture.

Thanks!