Intermittent high latency on gpt-4o completions with one specific image

I am converting PDFs to Markdown by rendering each page as a JPEG and feeding it to the API like this (Python):

from openai import OpenAI

client = OpenAI()

# parse_prompt (the system prompt) and base64_image (the base64-encoded JPEG
# of one PDF page) are prepared earlier in the pipeline.
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.0,
    messages=[
        {
            "role": "system",
            "content": parse_prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": "high",
                    },
                },
            ],
        },
    ],
)
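
For context, base64_image comes from rendering each PDF page to a JPEG and base64-encoding it. A rough sketch of that step, using pdf2image purely as an illustration (the file name and dpi value here are placeholders, and the exact renderer isn't the issue):

import base64
import io
from pdf2image import convert_from_path  # needs poppler installed

# Render every page of the PDF to a PIL image, then base64-encode it as JPEG.
pages = convert_from_path("document.pdf", dpi=200)
base64_images = []
for page in pages:
    buf = io.BytesIO()
    page.save(buf, format="JPEG")
    base64_images.append(base64.b64encode(buf.getvalue()).decode("utf-8"))
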
  • Total input tokens: 1,495.
  • Output tokens vary but are always between 400 and 700.
  • One specific page of a particular document takes >160 seconds about 20% of the time and <9 seconds the other 80%.
  • All other pages of that document always take <9 seconds.
  • We see similar behavior with other documents: each has a particular page that takes dramatically longer on average to process.
  • Output token counts are not higher on the slow requests.
  • I have verified with httpx event hooks that the bottleneck is upstream processing on OpenAI's side, not my code (see the timing sketch after this list).
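
This is roughly how I wired up the timing (a minimal sketch; the hook names and the explicit 300 s timeout are my own choices, not anything the SDK requires):

import time
import httpx
from openai import OpenAI

def on_request(request: httpx.Request) -> None:
    # Stamp the send time so the response hook can compute elapsed time.
    request.extensions["start_time"] = time.monotonic()

def on_response(response: httpx.Response) -> None:
    # Fires once response headers arrive, before the body is downloaded.
    start = response.request.extensions.get("start_time")
    if start is not None:
        elapsed = time.monotonic() - start
        print(f"{response.request.url}: headers after {elapsed:.1f}s")

client = OpenAI(
    http_client=httpx.Client(
        event_hooks={"request": [on_request], "response": [on_response]},
        # Explicit, generous timeout; the openai SDK may also apply its own per request.
        timeout=httpx.Timeout(300.0, connect=10.0),
    )
)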

Why would this happen, and what can we do to reduce the latency?