I have built a tool to talk to PDFs using the Responses API. Everything works fine, but if a PDF has too many images the API produces an error:
[message] => An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists.
Unfortunately, many PDFs contain a lot of images: logos, icons, etc. that are unimportant, but also drawings and images that are important.
If I use a GPT or ChatGPT, i.e. without the API, all the files work. What can I do to get the API to accept these PDFs? I have already tried splitting PDFs into individual pages, but sometimes there are too many images on a single page and the API still crashes. Does anyone have a suggestion for solving this?
I found a partial solution: model gpt-4o doesn’t work, but chatgpt-4o-latest does. Unfortunately, web search and file search are not supported with chatgpt-4o-latest. Is there a model that can do both?
When OpenAI built a new tool for talking to complete PDFs, they made it an input file type that can be sent in a user message, similar to how images are attached for vision.
This is distinctly different from using vector stores and the file search tool, which just extracts searchable text from PDFs and can accept other file types.
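To make the distinction concrete, here is a minimal sketch of sending a PDF directly in a user message, based on the openai Python SDK’s Responses API. The file ID and question text are placeholders; the actual call is commented out since it needs an API key:

```python
# Sketch: attach a PDF as a direct file input in a Responses API user
# message (as opposed to indexing it in a vector store for file search).
# "file-abc123" is a placeholder for an ID returned by a prior file upload.

def build_pdf_request(file_id: str, question: str) -> dict:
    """Build the request body for a Responses call with a PDF attached."""
    return {
        "model": "gpt-4o",
        "input": [
            {
                "role": "user",
                "content": [
                    # The PDF itself, referenced by its uploaded file ID
                    {"type": "input_file", "file_id": file_id},
                    # The question to ask about the document
                    {"type": "input_text", "text": question},
                ],
            }
        ],
    }

request = build_pdf_request("file-abc123", "Summarize this PDF.")
# client.responses.create(**request)  # requires an OpenAI API key
```

With this input type, the whole document (text plus rendered pages) lands in the model’s context, rather than being chunked into a searchable index.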
Here’s what to know:

- It extracts the text and provides it to the model.
- It renders each page and sends an image of that page to the model — for the whole document.
- Thus, individual images within a page do not increase consumption.
- The token cost of text + page images can be large.
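To get a feel for why the cost grows quickly, here is a rough per-page estimator. It assumes the pages are billed like high-detail image inputs under gpt-4o’s documented image accounting (scale to fit 2048×2048, scale the shortest side down to 768, then 85 base tokens plus 170 per 512×512 tile); the exact accounting for PDF page renders is my assumption, not something stated in this thread:

```python
import math

def estimate_page_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one rendered PDF page, using the
    documented gpt-4o high-detail image formula (an assumption here)."""
    # 1) Scale down to fit within a 2048x2048 square (never upscale)
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # 2) Scale so the shortest side is at most 768px
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3) 85 base tokens + 170 per 512x512 tile of the result
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

per_page = estimate_page_image_tokens(1536, 2048)  # -> 765 tokens per page
# A 100-page PDF would then cost ~76,500 image tokens before any text.
```

So even a modest document can consume tens of thousands of input tokens once every page is rendered and attached.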
If you are passing the prior response ID, then you continue to grow a conversation. An error is returned if the existing chat plus the newest input grows too large. To discard old messages instead, you would change the truncation parameter.
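A sketch of that parameter in a follow-up request, again as a request body rather than a live call (the response ID is a placeholder):

```python
# Sketch: continue a conversation via previous_response_id while letting
# the API drop older turns when the context window would overflow.
# truncation="auto" replaces the default "disabled", which errors instead.
request = {
    "model": "gpt-4o",
    "previous_response_id": "resp_abc123",  # placeholder ID
    "truncation": "auto",  # discard old items rather than returning an error
    "input": [{"role": "user", "content": "Next question about the PDF."}],
}
# client.responses.create(**request)  # requires an OpenAI API key
```

Note that truncation silently drops older context, so answers about early pages of a long PDF may degrade once it kicks in.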
The chatgpt-4o-latest model wasn’t even working on the endpoint yesterday. It likely isn’t implemented the same way.
So to start exploring the cause, look at the token usage report for an API call and see how much the PDF input is costing.
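The usage figures come back on the response object itself. A small sketch of pulling them out, with made-up numbers standing in for a real response:

```python
# Sketch: inspect the token usage report after a Responses API call.
# The dict below stands in for response.usage from a real call;
# the numbers are invented for illustration.

def summarize_usage(usage: dict) -> str:
    """Format the input/output/total token counts from a usage report."""
    return (f"input={usage['input_tokens']} "
            f"output={usage['output_tokens']} "
            f"total={usage['total_tokens']}")

report = summarize_usage({"input_tokens": 48210,
                          "output_tokens": 350,
                          "total_tokens": 48560})
print(report)
```

If the input token count is close to the model’s context window for a single PDF, that alone explains the failure, and splitting or truncation is the fix.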