There are particular PDF files that I can upload (either via the API or via the ‘dashboard’ on my OpenAI account) but cannot add to a vector store (either via the API or via the dashboard). On the dashboard, these files are listed in the vector store with status ‘failed’. I can see no obvious reason: the files are not too large and they are valid PDFs. Other PDF files cause no problems.
I attach a screenshot of part of a PDF file that I cannot add to a vector store.
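In case it helps anyone reproduce this, here is a minimal sketch of how the ‘failed’ status could be inspected through the API instead of the dashboard. It assumes the vector store file endpoint (`GET /v1/vector_stores/{vector_store_id}/files/{file_id}`) returns an object with `status` and `last_error` fields, and that the API key is in the `OPENAI_API_KEY` environment variable. The helper names `describe_failure` and `fetch_vector_store_file` are my own, not part of any library.

```python
import json
import os
import urllib.request

API_BASE = "https://api.openai.com/v1"


def describe_failure(vs_file: dict) -> str:
    """Summarise a vector store file object (already parsed from JSON)."""
    status = vs_file.get("status", "unknown")
    if status != "failed":
        return f"status={status}"
    # Assumption: failed files carry a `last_error` object with code/message.
    error = vs_file.get("last_error") or {}
    code = error.get("code", "unknown")
    message = error.get("message", "no message")
    return f"status=failed code={code} message={message}"


def fetch_vector_store_file(vector_store_id: str, file_id: str) -> dict:
    """Fetch one vector store file object over HTTPS (makes a network call)."""
    request = urllib.request.Request(
        f"{API_BASE}/vector_stores/{vector_store_id}/files/{file_id}",
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            # Assumption: this beta header may or may not still be required.
            "OpenAI-Beta": "assistants=v2",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `describe_failure(fetch_vector_store_file(<vector store id>, <file id>))` with the real IDs should then show whatever error code and message the API reports for the failed file, which may say more than the dashboard does.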
Looking into the problematic PDF files, I noticed that they are all scans of documents. In every case, the scan quality is poor, or the scan contains tables and pictures, or there is some other reason why it is plausibly difficult for a computer to read and interpret the scan.
This leads me to suspect that the readability of a PDF is checked when it is added to a vector store, and that the addition fails when the readability is found to be too poor.
Is this indeed the case? I cannot find anything about this in the documentation, so I could be totally wrong.
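One way to test this suspicion locally would be to check whether the failing PDFs contain any extractable text at all, since a pure scan often has only page images and no text layer. Below is a rough sketch, assuming the third-party `pypdf` package is installed. The function names and the threshold of 20 characters per page are arbitrary choices of mine, and this is only a heuristic, not a reproduction of whatever check the vector store actually performs.

```python
from typing import List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extractable text of each page (requires `pip install pypdf`)."""
    # Imported lazily so the heuristic below can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def looks_image_only(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is a scan with no usable text layer.

    Counts non-whitespace characters and compares the per-page average
    against an arbitrary threshold.
    """
    if not page_texts:
        return True
    total_chars = sum(len("".join(text.split())) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page
```

If `looks_image_only(extract_page_texts(path))` is true for the failing files and false for the files that work, that would at least be consistent with the idea that missing or unreadable text is what makes the addition fail.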
My apologies if this question is very naive.