I am converting PDFs to markdown by rendering each page as a JPEG and feeding it to the API like this (Python):
```python
from openai import OpenAI

client = OpenAI()

# parse_prompt (the system instructions) and base64_image (the page
# rendered as a JPEG, base64-encoded) are built earlier in the pipeline.
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.0,
    messages=[
        {
            "role": "system",
            "content": parse_prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": "high",
                    },
                },
            ],
        },
    ],
)
```
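For context, `base64_image` comes from rasterizing each PDF page, roughly like this (a simplified sketch; PyMuPDF and the 150 DPI setting are used purely for illustration, not necessarily what my pipeline does):

```python
import base64

import fitz  # PyMuPDF, used here for illustration; any rasterizer works

def page_to_base64_jpeg(pdf_path: str, page_number: int, dpi: int = 150) -> str:
    """Render one PDF page to JPEG bytes and base64-encode them."""
    with fitz.open(pdf_path) as doc:
        pix = doc[page_number].get_pixmap(dpi=dpi)
        return base64.b64encode(pix.tobytes("jpeg")).decode("utf-8")

base64_image = page_to_base64_jpeg("input.pdf", page_number=0)
```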
- Total input tokens: 1,495.
- Output tokens vary, but always fall between 400 and 700.
- One specific page of a particular document takes >160 seconds 20% of the time, and <9 seconds 80% of the time.
- All other pages of this particular document always take <9 seconds.
- We see similar behavior with other documents: each has one particular page that, on average, takes dramatically longer to process.
- Output token counts are no higher on the slow requests.
- I have verified with httpx event hooks that the bottleneck is OpenAI's upstream processing, not my code (simplified sketch below).
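Simplified, the timing hooks look like this (hook names are my own; the idea is just to pass a custom `httpx.Client` to the OpenAI SDK and stamp request/response times):

```python
import time

import httpx
from openai import OpenAI

def log_start(request: httpx.Request) -> None:
    # Stamp the send time on the outgoing request.
    request.extensions["start"] = time.monotonic()

def log_elapsed(response: httpx.Response) -> None:
    # Fires when response headers arrive; for non-streaming calls this
    # is effectively the full upstream processing time.
    start = response.request.extensions.get("start")
    if start is not None:
        print(f"{response.status_code} after {time.monotonic() - start:.1f}s")

client = OpenAI(
    http_client=httpx.Client(
        event_hooks={"request": [log_start], "response": [log_elapsed]}
    )
)
```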
Why would this happen? What should we do to reduce this high latency?