PDFs will fail to upload to a vector store with no apparent reason

polonuim210 · June 30, 2024, 7:49pm

My PDFs are OCR’d and don’t have any sort of password protection on them.

However, ~1/5 of them fail without reason when uploading to a vector store. I just tried to build a store with just one of the problem documents, and the error message is simply ‘An Internal Server Error Occurred’. The store still gets created, with the failed file inside.

I have absolutely no idea how to solve this or how to even begin to understand why they are failing. I’ve tried saving as a different format and saving back to a PDF, but this didn’t work either.

The scarier thing is that some of these files actually used to work last week in other vector stores, and now they aren’t… this is very frustrating and is putting my project in critical condition.

Thanks in advance!

_j · June 30, 2024, 8:16pm

If the PDF has been converted to include searchable text, you can extract that text yourself with code and libraries. This can be a programmatic task, like the one OpenAI does poorly, or it can mean producing plain text files where you optimize the text for AI understanding and manage how it might be chunked by the vector store (even placing small snippets of knowledge under the chunk size into individual files).
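As a rough sketch of the "small snippets under the chunk size" idea: once you have the text out of the PDF (with whatever extraction library you prefer), you can split it on paragraph boundaries so that each piece stays under a token budget, then save each piece as its own text file. The function name `split_into_snippets`, the 800-token default, and the 4-characters-per-token estimate are all my own assumptions for illustration; a real tokenizer would give more accurate counts.

```python
import re


def split_into_snippets(text, max_tokens=800, chars_per_token=4):
    """Split text into paragraph-aligned pieces under a rough token budget.

    The token count is only estimated (chars_per_token is an assumption),
    so leave some headroom below the vector store's real chunk size.
    """
    budget = max_tokens * chars_per_token  # crude character budget per snippet
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    snippets, current = [], ""
    for para in paragraphs:
        # Hard-split any single paragraph that exceeds the budget on its own.
        pieces = [para[i:i + budget] for i in range(0, len(para), budget)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= budget:
                current = candidate  # still fits, keep accumulating
            else:
                snippets.append(current)  # flush and start a new snippet
                current = piece
    if current:
        snippets.append(current)
    return snippets


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for i, snippet in enumerate(split_into_snippets(sample, max_tokens=10)):
        print(i, repr(snippet))
```

Each returned snippet can then be written to its own `.txt` file and uploaded, so the vector store has no reason to split it further.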

Uploading to code interpreter, retrieval, or now vector storage has always been a minefield of nonsense since this endpoint was added. “Structured data” refused if inspected text contents look like a CSV or JSON, JSON rejected unless converted to a non-validating form, PDFs plain ignored without warning that they are image-based, files refused for no apparent reason…

polonuim210 · June 30, 2024, 9:15pm

I actually solved it. I was using the highest chunk size (4096 and 2048) and it was causing ~20% of them to fail (no pattern to it that I could see).

I split the chunks in half, and now all 3000 files are working fine.
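For anyone landing here later, this is roughly what the change looks like as a static chunking strategy. It is a sketch, assuming the two numbers above are the maximum chunk size and the chunk overlap, and assuming the `chunking_strategy` shape and limits (chunk size between 100 and 4096 tokens, overlap no more than half the chunk size) as I understand them from the API reference. The helper names are made up for illustration.

```python
def static_chunking(max_chunk_size_tokens, chunk_overlap_tokens):
    """Build a static chunking_strategy dict, checking the assumed limits."""
    # Assumed limits; verify against the current API reference.
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if not 0 <= chunk_overlap_tokens <= max_chunk_size_tokens // 2:
        raise ValueError("chunk_overlap_tokens must be at most half the chunk size")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


def halved(strategy):
    """Return a new strategy with both the chunk size and the overlap halved."""
    static = strategy["static"]
    return static_chunking(
        static["max_chunk_size_tokens"] // 2,
        static["chunk_overlap_tokens"] // 2,
    )


if __name__ == "__main__":
    original = static_chunking(4096, 2048)  # the settings that were failing
    print(halved(original))                 # the settings that worked
```

The resulting dict is what you would pass as the `chunking_strategy` argument when attaching a file to the vector store.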
