GPT-4o PDF upload vs API vision

Hi,

I’m trying to process documents via the API. For this, I convert the pdf to images and send them to the API with my prompt.

Unfortunately, for some of them this API call misses some details.
However, when I upload the PDF to ChatGPT and use the same prompt, it get’s it right.
Does anyone know how the ChatGPT interface does pdf processing vs. the API’s vision capability?

2 Likes

Hi and welcome to the Community!

Is your PDF a scan or machine readable?

1 Like

yeah you’re right. I just realized that the PDF only works in ChatGPT if it has embedded text.
Thanks!

Wait, is that right? Take this example PDF:

https://pdf.datasheetcatalog.com/datasheets/2300/45014_DS.pdf

As far as Foxit and PyMuPDF can tell, it does not have embedded text, but ChatGPT parses it perfectly. What am I missing?

5 Likes