How does ChatGPT 'read' a PDF?

PDFs are notoriously difficult to read accurately because there are so many variations in how elements can be encoded. However, ChatGPT does this effortlessly and quickly. How?
Is it using vision every time? Maybe - but how is it so quick on 200+ pages?
Is it using an ensemble method? Perhaps?

Does anyone know?

We don’t really know how OpenAI manages PDFs.

Complete speculation, beware:

They have most likely built, or adopted and modified, a PDF parsing tool: one that combines whatever text elements the PDF contains with some OCR.

If a PDF has highlightable (selectable) text, you can usually find that text data inside the PDF on a row-by-row basis. You can then correlate that with the OCR results. If the OCR says 8999123 but the embedded text says 889123, the text can be used programmatically to “influence” the OCR result: 8999123 appears nowhere in the text layer, so a string-distance test can find which string it is supposed to be.
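To make the distance test concrete, here is a minimal sketch in Python. It is purely illustrative and not how OpenAI does it (nobody outside knows). The function names `levenshtein` and `reconcile` and the `max_distance` threshold are my own assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def reconcile(ocr_token, text_layer_tokens, max_distance=2):
    """Hypothetical helper: prefer the closest text-layer token over the OCR guess.

    max_distance is an assumed threshold; beyond it the OCR token is kept.
    """
    if not text_layer_tokens:
        return ocr_token
    best = min(text_layer_tokens, key=lambda t: levenshtein(ocr_token, t))
    return best if levenshtein(ocr_token, best) <= max_distance else ocr_token


# The OCR read 8999123, but the closest string in the text layer is 889123.
print(reconcile("8999123", ["889123", "Invoice", "Total"]))
```

A real pipeline would presumably also use token positions on the page to decide which strings to compare, rather than searching the whole text layer.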

If the text is “baked in” (part of the page image, with no selectable text layer), then the PDF is most likely treated exactly like an image.

It could be that they run some initial classification steps (orientation, document type, etc.) as well.

They can also run these requests in parallel, which would go some way toward explaining the speed on 200+ page documents.
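As a rough sketch of the idea (again, speculation, not OpenAI's actual pipeline), this is what per-page routing plus parallel processing could look like in Python. The `Page` class, the `min_chars` threshold, and the placeholder `process_page` are all assumptions made for illustration. A real system would call a text parser or an OCR/vision model where the placeholder is.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    text_layer: str  # text embedded in the PDF; empty if the page is a scan


def route(page, min_chars=20):
    """Pick 'text' if the page has a usable text layer, otherwise 'image'."""
    return "text" if len(page.text_layer.strip()) >= min_chars else "image"


def process_page(page):
    # Placeholder work: a real system would parse text or run OCR/vision here.
    return page.number, route(page)


def process_document(pages, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order pages finish in.
        return list(pool.map(process_page, pages))


pages = [
    Page(1, "Invoice number 889123, total due 40.00"),
    Page(2, ""),  # scanned page, text baked into the image
]
print(process_document(pages))
```

Because each page is independent, 200 pages take roughly as long as the slowest batch rather than 200 times a single page, assuming enough workers.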