GPT-4o PDF upload vs API vision

diveflo · May 16, 2024, 8:07am

Hi,

I’m trying to process documents via the API. For this, I convert the pdf to images and send them to the API with my prompt.

Unfortunately, for some of them this API call misses some details.
However, when I upload the PDF to ChatGPT and use the same prompt, it get’s it right.
Does anyone know how the ChatGPT interface does pdf processing vs. the API’s vision capability?

jr.2509 · May 16, 2024, 8:16am

Hi and welcome to the Community!

Is your PDF a scan or machine readable?

diveflo · May 17, 2024, 7:45am

yeah you’re right. I just realized that the PDF only works in ChatGPT if it has embedded text.
Thanks!

russell4 · May 17, 2024, 8:50pm

Wait, is that right? Take this example PDF:

https://pdf.datasheetcatalog.com/datasheets/2300/45014_DS.pdf

As far as Foxit and PyMuPDF can tell, it does not have embedded text, but ChatGPT parses it perfectly. What am I missing?

Topic		Replies	Views
Process scanned pdfs through api API gpt-4 , chatgpt , api , pdf , ocr	2	910	December 12, 2024
Make OpenAI Vision API Match GPT4 Vision API chatgpt	4	3831	December 6, 2023
What is the exact method used by ChatGPT 4o to read PDFs? API pdf	0	499	January 31, 2025
What are the limitations of GPT-4 in analyzing PDF text? Prompting gpt-4	6	30768	March 12, 2024
Retriever Assistant can't read scanned pdfs? API gpt-4 , api	7	2953	July 22, 2024

GPT-4o PDF upload vs API vision

Related topics