I'm trying to build a RAG system using GPT models for a digital document library.
Some of these docs are scanned PDF files.
I'm trying to figure out how to approach this with OpenAI models, especially since the text extraction was spot on when I tested a few scanned docs in ChatGPT.
I've experimented with other models (Llama 3.2 Vision), but they don't seem to work very well.
Does anyone know how GPT processes these PDFs so precisely? Can this be replicated through the API?
You can make an API call to extract the text from a document, but it needs to be done at the point you create the RAG database. Once it's created, you don't need to do it again.
The vision ability can be invoked from the API by following this example:
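Something along these lines — just a minimal sketch, assuming the pdf2image library (with poppler installed) for rendering pages and a vision-capable model such as gpt-4o:

```python
import base64
import io

from openai import OpenAI
from pdf2image import convert_from_path  # assumption: pdf2image + poppler for page rendering

client = OpenAI()

def extract_text_from_scanned_pdf(pdf_path: str) -> str:
    """Render each PDF page to an image and transcribe it with a vision-capable model."""
    pages = convert_from_path(pdf_path, dpi=200)  # one PIL image per page
    extracted = []
    for i, page in enumerate(pages, start=1):
        # Encode the rendered page as a base64 PNG data URL
        buf = io.BytesIO()
        page.save(buf, format="PNG")
        b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: any vision-capable model works here
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": f"Transcribe all text on page {i} exactly as written."},
                        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    ],
                }
            ],
        )
        extracted.append(response.choices[0].message.content)
    return "\n\n".join(extracted)
```

You'd run this once per document while building the RAG database, then chunk and embed the returned text like any other source.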
What I don't understand is how ChatGPT itself handles a scanned PDF. Does it do the same thing?
Meaning, when given a PDF, does it convert it to images and understand them through vision?
Because:
1- It does it super fast.
2- It seems to keep context of the whole PDF, as if it reads all the pages as one item, even for PDFs that are tens of pages long.
Or does it use some super duper extra OCR capability to process the files efficiently? My experience with vision in other models is that they're good at understanding a single image but struggle when sent multiple images.
This is especially puzzling since there is also a limit on the number of images processed per question (10 images), yet I've uploaded a 70-page scanned PDF to GPT before and it was able to understand it as a whole (it was a lease contract).
So does it run vision 7 times for this file while keeping context?
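Something like this, maybe? (Just a sketch of what I'm imagining: batches of up to 10 page images per request, with the text extracted so far passed back in so the model keeps context across batches.)

```python
from openai import OpenAI

def extract_in_batches(page_images: list[str], batch_size: int = 10) -> str:
    """Hypothetical sketch: page_images are base64 data URLs of each scanned page."""
    client = OpenAI()
    transcript = ""
    for start in range(0, len(page_images), batch_size):
        batch = page_images[start:start + batch_size]
        # Pass the transcript so far as text, plus the next batch of page images
        content = [
            {
                "type": "text",
                "text": "Document transcribed so far:\n"
                        f"{transcript}\n\n"
                        "Continue the transcription with the following pages.",
            }
        ] + [{"type": "image_url", "image_url": {"url": url}} for url in batch]

        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: a vision-capable model
            messages=[{"role": "user", "content": content}],
        )
        transcript += "\n\n" + response.choices[0].message.content
    return transcript.strip()
```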
I'm really confused by the gap between what the docs say and what my experience using GPT shows, so I must be missing something.