Retriever Assistant can't read scanned pdfs?

tytung2020 · November 8, 2023, 9:11am

It seems the assistant is unable to read scanned pdf, both in the playground and the api. Is anyone else having this problem? It seems despite gpt api can comprehend images, its api still not able to do so.

alden · November 9, 2023, 2:08am

That is correct … I tried it too and got the “No text detected” message. I dont think the retriever can do Vision and OCR on uploaded documents.

I would make sense – since images are not one of the supported formats for retriever. The retriever only supports 16 types of files right now.

rn1 · November 9, 2023, 3:00am

File upload and retrieval too buggy atm, basically non-functional. Hopefully, they already working on fixes.

anon10827405 · November 9, 2023, 3:10am

PDFs are notorious for being difficult & inconsistent to read.

In the meantime you can use GPT-4V to create a more digestible format

tytung2020 · November 9, 2023, 5:30am

the model gpt-4-1106-vision-preview is not available yet for the assistant api.

anon10827405 · November 9, 2023, 5:55am

What I intend to say is do some pre-processing work on the PDF using GPT4V in ChatGpt for example

ankit_naag · July 22, 2024, 9:45am

Does anyone have a solution on this, I created an assistant and send pdf file for processing which had scanned images in it, but the file batch processing getting failed, I also tried first converting the pdf file using an online ocr tool and then upload, but got no luck with that too…

trenton.dambrowitz · July 22, 2024, 10:10am

You are likely better off using gpt-4o’s vision capabilities instead, here is the relevant documentation for sending images to the assistants API.

Topic		Replies	Views
Train assistant to read PDF with images API gpt-4	8	1919	July 22, 2024
How can I retrieve data from a PDF that was created from an image captured by a camera? API assistants-api , assistants-files	3	960	May 4, 2024
Assistant api retriever sometimes cannot read pdf API gpt-4 , api	5	1974	November 29, 2023
Assistant API cant read my PDF.. How come? API api	4	2410	July 20, 2024
Assistant API system files should not be exposed to the user + PDF file parsing is intermittently buggy Feedback api	6	554	March 25, 2024

Retriever Assistant can't read scanned pdfs?

Related topics