Problems with PDF content recognition with gpt-4o-mini (OCR)

I’m creating an assistant capable of performing OCR on PDF documents or images in an ERP software. So far I was getting good results using function_calling and file_search to attach the documents to be analyzed to the threads. However, since last Friday (October 11, 2024) I’m having errors with PDF’s. The model makes up all the data and never “opens” these documents. With images this doesn’t happen. Does anyone know if there was any change with this model?

2 Likes

I’m having the same problem, can anyone help me? thank you! :blush:

1 Like

I believe what you are doing is incredibly inefficient. Although I’m not positive, it seems like you are passing the PDF to be consumed and converted using the vector store.

If you are attempting to perform OCR you can usually extract the text from the PDF. If the text is baked in I would highly recommend using something purposed as an OCR, AND THEN using something like GPT to convert the unstructured data into whatever structure you’d like.

There are also some really good parsers out there that come as both open-source and API

4 Likes

Yes, I’ve noticed the same thing with Mini hallucinating almost everything from reading a PDF stored in a vector store to a json schema structured output.

Simple text conversion isn’t an option because of how irregular the PDF data is.

I have had mini successfully perform a similar operation—the difference is now that I have the data lookup and Structured Output as response_format in a single step. While I haven’t been able to test this, maybe having both in a single step is too much for mini.

Meanwhile, I’ve had gpt-4o-2024-08-06 be able to do both in a single step.

1 Like