I am working on an application that converts a PDF into images and sends them to OpenAI in a single call, along with a question that needs to be answered based on those images. This setup works fine for PDFs that are about 60 pages long. However, for PDFs with around 80 pages, the OpenAI API frequently (but not always) returns a 500 error:
{'error': {'message': 'The server had an error while processing your request. Sorry about that!', 'type': 'server_error', 'param': None, 'code': None}}
Sometimes, even when the call is successful, it takes an unusually long time. In one instance, it took 4342.31 seconds to complete.
Token count doesn’t seem to be the issue. Here are more details:
OpenAI pricing details for 2133x1200 images:
Total tokens: 1105
Total price: $0.002763
Is there a limit to the number of images that can be sent in a single request? I couldn’t find any such limit in the documentation. I am using the gpt-4o model.
Any pointers for debugging or fixing this issue would be greatly appreciated.
The server has to downscale any image whose shortest side exceeds 768 pixels. You can do that resizing yourself so your request doesn’t trigger that round of server-side processing: 1600x1200 → 1024x768.
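A minimal sketch of that pre-resize, assuming the documented preprocessing (longest side capped at 2048 px, then shortest side scaled to 768 px); the function name is mine, and the actual pixel work would be done with an image library such as Pillow:

```python
def target_size(w: int, h: int, shortest: int = 768, longest: int = 2048) -> tuple[int, int]:
    # First fit the image within 2048 px on its longest side, then scale so
    # the shortest side is at most 768 px, mirroring the server-side resize.
    scale = min(1.0, longest / max(w, h))
    w, h = w * scale, h * scale
    scale = min(1.0, shortest / min(w, h))
    return round(w * scale), round(h * scale)

# target_size(1600, 1200) gives (1024, 768), matching the example above.
```

With Pillow you would then call something like `img.resize(target_size(*img.size))` before encoding the page image.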
Image URLs (as opposed to images sent in base64) may benefit from server-side caching. That can help with repeated calls, for example retrying the same request after a failure, but consider the initial request, where a web fetcher has to successfully download 86 images. Try the alternate method.
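Switching to the base64 method is just a matter of embedding each page as a data URL in the `image_url` content part; a sketch (the helper name is mine):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Embed the image bytes directly in the request instead of a URL, so the
    # API does not have to fetch ~86 files over HTTP before it can start.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Used as one content part of a chat message, per the vision API format:
# {"type": "image_url", "image_url": {"url": to_data_url(png_bytes)}}
```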
detail:high splits each image into tiles and encodes all of the tiles to tokens, which is yet another processing step. You can instead make your content understood by doing your own sectioning into 512x512 crops and using detail:low. The images could be interleaved with text such as “upper right of page 12”.
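Computing those 512x512 crop regions, with a position label to interleave as a text part, might look like this (a sketch; the labeling scheme is an assumption, not something from the thread):

```python
def tile_boxes(w: int, h: int, tile: int = 512):
    # Yield (label, (left, top, right, bottom)) crop boxes so each crop fits
    # in a single 512x512 detail:low tile; the label can be sent as a text
    # content part right before the corresponding image part.
    for row, top in enumerate(range(0, h, tile)):
        for col, left in enumerate(range(0, w, tile)):
            label = f"row {row + 1}, column {col + 1}"
            yield label, (left, top, min(left + tile, w), min(top + tile, h))
```

Each box can be passed straight to an image library’s crop call (e.g. Pillow’s `img.crop(box)`).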
Don’t send so much. Send plain text from your own extraction, or from vision OCR requests run on single images.
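The per-page OCR variant would mean one small request per page, then asking the actual question over the concatenated text. A sketch of one such request body, assuming the standard chat-completions vision format (the prompt wording and function name are mine):

```python
def ocr_request_payload(data_url: str, model: str = "gpt-4o") -> dict:
    # One single-image OCR request; run one per page, concatenate the
    # returned text, then ask the question over plain text in a final call.
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text on this page."},
                {"type": "image_url",
                 "image_url": {"url": data_url, "detail": "high"}},
            ],
        }],
    }
```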
We did try reducing the resolution; that didn’t seem to help much.
Sending detail as ‘low’ reduced the API response time, but we still need to check the accuracy.
OpenAI seems to have been returning fewer 500 errors over the past couple of days. Requests that were returning 500 earlier now seem to go through in about 60-70 seconds.
For some of the requests, sending text is not an option because we need to detect elements visually. We’ll have to come up with another approach, such as splitting the images into two sets and combining the results later.
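The splitting itself is straightforward; a sketch of dividing the page images into roughly equal batches (two in our case), with each batch sent as its own request and the answers merged afterwards:

```python
def split_batches(pages: list, n_batches: int = 2) -> list:
    # Split page images into roughly equal, contiguous batches so each
    # request stays under the size that seems to trigger the 500 errors.
    k, m = divmod(len(pages), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + k + (1 if i < m else 0)  # first m batches get one extra page
        batches.append(pages[start:end])
        start = end
    return batches
```

How to combine the per-batch answers is the open question; for element detection it may be as simple as a union of the findings from each batch.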