Problems with PDF content recognition with gpt-4o-mini (OCR)

joelomar4.stelorder · October 14, 2024, 9:25am

I’m creating an assistant capable of performing OCR on PDF documents or images in an ERP software. So far I was getting good results using function_calling and file_search to attach the documents to be analyzed to the threads. However, since last Friday (October 11, 2024) I’m having errors with PDF’s. The model makes up all the data and never “opens” these documents. With images this doesn’t happen. Does anyone know if there was any change with this model?

celia.stelorder · October 14, 2024, 9:32am

I’m having the same problem, can anyone help me? thank you!

anon10827405 · October 14, 2024, 5:58pm

I believe what you are doing is incredibly inefficient. Although I’m not positive, it seems like you are passing the PDF to be consumed and converted using the vector store.

If you are attempting to perform OCR you can usually extract the text from the PDF. If the text is baked in I would highly recommend using something purposed as an OCR, AND THEN using something like GPT to convert the unstructured data into whatever structure you’d like.

There are also some really good parsers out there that come as both open-source and API

thinktank · October 14, 2024, 7:11pm

Yes, I’ve noticed the same thing with Mini hallucinating almost everything from reading a PDF stored in a vector store to a json schema structured output.

Simple text conversion isn’t an option because of how irregular the PDF data is.

I have had mini successfully perform a similar operation—the difference is now that I have the data lookup and Structured Output as response_format in a single step. While I haven’t been able to test this, maybe having both in a single step is too much for mini.

Meanwhile, I’ve had gpt-4o-2024-08-06 be able to do both in a single step.

Topic		Replies	Views
Retriever Assistant can't read scanned pdfs? API gpt-4 , api	7	2843	July 22, 2024
Train assistant to read PDF with images API gpt-4	8	1663	July 22, 2024
Best practice scanned PDF / What model to use? API chatgpt , plugin-development , api , gpt-4-vision	3	311	February 19, 2025
OCR using API for text extraction API api	9	7181	December 18, 2024
Programatically reproduce gpt-4o file upload API gpt-4o	5	500	December 19, 2024

Problems with PDF content recognition with gpt-4o-mini (OCR)

Related topics