It seems the assistant is unable to read scanned pdf, both in the playground and the api. Is anyone else having this problem? It seems despite gpt api can comprehend images, its api still not able to do so.
That is correct … I tried it too and got the “No text detected” message. I dont think the retriever can do Vision and OCR on uploaded documents.
I would make sense – since images are not one of the supported formats for retriever. The retriever only supports 16 types of files right now.
File upload and retrieval too buggy atm, basically non-functional. Hopefully, they already working on fixes.
PDFs are notorious for being difficult & inconsistent to read.
In the meantime you can use GPT-4V to create a more digestible format
the model gpt-4-1106-vision-preview is not available yet for the assistant api.
What I intend to say is do some pre-processing work on the PDF using GPT4V in ChatGpt for example
Does anyone have a solution on this, I created an assistant and send pdf file for processing which had scanned images in it, but the file batch processing getting failed, I also tried first converting the pdf file using an online ocr tool and then upload, but got no luck with that too…
You are likely better off using gpt-4o’s vision capabilities instead, here is the relevant documentation for sending images to the assistants API.