I am working on an application that converts a PDF into images and sends them to OpenAI in a single call, along with a question that needs to be answered based on those images. This setup works fine for PDFs that are about 60 pages long. However, for PDFs with around 80 pages, the OpenAI API frequently (but not always) returns a 500 error:
{'error': {'message': 'The server had an error while processing your request. Sorry about that!', 'type': 'server_error', 'param': None, 'code': None}}
Sometimes, even when the call is successful, it takes an unusually long time. In one instance, it took 4342.31 seconds to complete.
Token count doesn’t seem to be the issue. Here are more details:
OpenAI pricing details for 2133x1200 images:
Total tokens: 1105
Total price: $0.002763
Is there a limit to the number of images that can be sent in a single request? I couldn’t find any such limit in the documentation. I am using the gpt-4o model.
Any pointers for debugging or fixing this issue would be greatly appreciated.
The server has to downscale any image whose shortest side exceeds 768 pixels. You can do that resizing yourself so your request doesn’t trigger that round of server-side processing: 1600x1200 → 1024x768.
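A minimal sketch of that pre-resize, assuming the documented preprocessing (longest side capped at 2048 px, then shortest side scaled to 768 px); the function name is mine, and the actual pixel work would be done with an image library such as Pillow:

```python
def target_size(w: int, h: int, shortest: int = 768, longest: int = 2048) -> tuple[int, int]:
    # First fit the image within 2048 px on its longest side, then scale so
    # the shortest side is at most 768 px, mirroring the server-side resize.
    scale = min(1.0, longest / max(w, h))
    w, h = w * scale, h * scale
    scale = min(1.0, shortest / min(w, h))
    return round(w * scale), round(h * scale)

# target_size(1600, 1200) gives (1024, 768), matching the example above.
```

With Pillow you would then call something like `img.resize(target_size(*img.size))` before encoding the page image.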
Image URLs (as opposed to images sent in base64) may benefit from server-side caching. That can help with repeated calls, for example retrying the same request after a failure, but consider the initial request, where a web fetcher has to successfully download 86 images. Try the alternate method.
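Switching to the base64 method is just a matter of embedding each page as a data URL in the `image_url` content part; a sketch (the helper name is mine):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Embed the image bytes directly in the request instead of a URL, so the
    # API does not have to fetch ~86 files over HTTP before it can start.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Used as one content part of a chat message, per the vision API format:
# {"type": "image_url", "image_url": {"url": to_data_url(png_bytes)}}
```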
detail:high splits each image into tiles and encodes all of the tiles to tokens, which is yet another processing step. You can instead make your content understood by doing your own sectioning into 512x512 crops and using detail:low. The images could be interleaved with text such as “upper right of page 12”.
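Computing those 512x512 crop regions, with a position label to interleave as a text part, might look like this (a sketch; the labeling scheme is an assumption, not something from the thread):

```python
def tile_boxes(w: int, h: int, tile: int = 512):
    # Yield (label, (left, top, right, bottom)) crop boxes so each crop fits
    # in a single 512x512 detail:low tile; the label can be sent as a text
    # content part right before the corresponding image part.
    for row, top in enumerate(range(0, h, tile)):
        for col, left in enumerate(range(0, w, tile)):
            label = f"row {row + 1}, column {col + 1}"
            yield label, (left, top, min(left + tile, w), min(top + tile, h))
```

Each box can be passed straight to an image library’s crop call (e.g. Pillow’s `img.crop(box)`).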
Don’t send so much. Send plain text from your own extraction, or from vision OCR requests run on single images.
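The per-page OCR variant would mean one small request per page, then asking the actual question over the concatenated text. A sketch of one such request body, assuming the standard chat-completions vision format (the prompt wording and function name are mine):

```python
def ocr_request_payload(data_url: str, model: str = "gpt-4o") -> dict:
    # One single-image OCR request; run one per page, concatenate the
    # returned text, then ask the question over plain text in a final call.
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text on this page."},
                {"type": "image_url",
                 "image_url": {"url": data_url, "detail": "high"}},
            ],
        }],
    }
```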
We did try reducing the resolution; that didn’t seem to help much.
Sending detail as ‘low’ reduced the API response time, but we still need to check the accuracy.
OpenAI seems to have been returning fewer 500 errors over the past couple of days. Requests that were returning 500 earlier now seem to go through in about 60-70 seconds.
For some of the requests, sending text is not an option because we need to detect elements visually. We’ll have to come up with another approach, such as splitting the images into two sets and combining the results later.
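The splitting itself is straightforward; a sketch of dividing the page images into roughly equal batches (two in our case), with each batch sent as its own request and the answers merged afterwards:

```python
def split_batches(pages: list, n_batches: int = 2) -> list:
    # Split page images into roughly equal, contiguous batches so each
    # request stays under the size that seems to trigger the 500 errors.
    k, m = divmod(len(pages), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + k + (1 if i < m else 0)  # first m batches get one extra page
        batches.append(pages[start:end])
        start = end
    return batches
```

How to combine the per-batch answers is the open question; for element detection it may be as simple as a union of the findings from each batch.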