I am converting PDFs to markdown by rendering each page as a JPEG and feeding it to the API like this (Python):
```python
from openai import OpenAI

client = OpenAI()

# parse_prompt (the system instructions) and base64_image (the page
# rendered as a JPEG, base64-encoded) are built earlier in the pipeline.
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.0,
    messages=[
        {
            "role": "system",
            "content": parse_prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": "high",
                    },
                },
            ],
        },
    ],
)
```
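For context, `base64_image` comes from rasterizing each PDF page, roughly like this (a simplified sketch; PyMuPDF and the 150 DPI setting are used purely for illustration, not necessarily what my pipeline does):

```python
import base64

import fitz  # PyMuPDF, used here for illustration; any rasterizer works

def page_to_base64_jpeg(pdf_path: str, page_number: int, dpi: int = 150) -> str:
    """Render one PDF page to JPEG bytes and base64-encode them."""
    with fitz.open(pdf_path) as doc:
        pix = doc[page_number].get_pixmap(dpi=dpi)
        return base64.b64encode(pix.tobytes("jpeg")).decode("utf-8")

base64_image = page_to_base64_jpeg("input.pdf", page_number=0)
```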
- Total input tokens: 1,495.
- Output tokens vary, but always fall between 400 and 700.
- One specific page of a particular document takes >160 seconds 20% of the time, and <9 seconds 80% of the time.
- All other pages of this particular document always take <9 seconds.
- We see similar behavior with other documents: each has one particular page that, on average, takes dramatically longer to process.
- Output token counts are no higher on the slow requests.
- I have verified with httpx event hooks that the bottleneck is OpenAI's upstream processing, not my code (simplified sketch below).
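Simplified, the timing hooks look like this (hook names are my own; the idea is just to pass a custom `httpx.Client` to the OpenAI SDK and stamp request/response times):

```python
import time

import httpx
from openai import OpenAI

def log_start(request: httpx.Request) -> None:
    # Stamp the send time on the outgoing request.
    request.extensions["start"] = time.monotonic()

def log_elapsed(response: httpx.Response) -> None:
    # Fires when response headers arrive; for non-streaming calls this
    # is effectively the full upstream processing time.
    start = response.request.extensions.get("start")
    if start is not None:
        print(f"{response.status_code} after {time.monotonic() - start:.1f}s")

client = OpenAI(
    http_client=httpx.Client(
        event_hooks={"request": [log_start], "response": [log_elapsed]}
    )
)
```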
Why would this happen? What should we do to reduce this high latency?