Can assistants with file_search access actual files?

seaofarrows · November 18, 2024, 7:48pm

I have uploaded a bunch of JSON artist profiles and added them to a vector store for my assistant.

When asked about the artists, the assistant finds a relevant set of profiles and responds in Markdown. So far, so good.

PROBLEM
For each artist in the list, the reply includes a photo whose URL is taken from a specific path in the JSON the assistant has been given. This usually works, but not always.

When it fails, the photo URL appears to be hallucinated: the URL does not exist anywhere in the source files.

Also, the broken images usually belong to the same artist, but sometimes the assistant does show the correct image for that artist, so it's not as if that file is simply "broken".

I have verified that there are no failed uploads to the vector store.
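
One way to confirm the hallucination, and to catch it before the reply is rendered, is to check every image URL in the assistant's Markdown against the URLs that actually exist in the uploaded JSON. A minimal sketch, assuming you keep a local copy of the profiles and that the photo sits at a hypothetical nested path `images.photo` (`PHOTO_PATH` below); adjust that to the real structure of the files:

```python
import json
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](url) and captures the URL.
IMAGE_URL = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

# Hypothetical location of the photo inside each profile; replace with the real path.
PHOTO_PATH = ("images", "photo")


def get_path(obj, path):
    """Follow a sequence of keys into nested dicts; return None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def load_profiles(folder: str) -> list[dict]:
    """Load every *.json file in a local folder (the same files uploaded to the vector store)."""
    return [json.loads(p.read_text(encoding="utf-8")) for p in Path(folder).glob("*.json")]


def collect_photo_urls(profiles: list[dict], path=PHOTO_PATH) -> set[str]:
    """Every photo URL that really exists in the source profiles."""
    urls = (get_path(profile, path) for profile in profiles)
    return {url for url in urls if isinstance(url, str)}


def find_unknown_images(markdown: str, known_urls: set[str]) -> list[str]:
    """Image URLs in the assistant's reply that appear in none of the source files."""
    return [url for url in IMAGE_URL.findall(markdown) if url not in known_urls]


# Example data (made up for illustration).
profiles = [
    {"name": "Artist A", "images": {"photo": "https://example.com/img/a-portrait.jpg"}},
    {"name": "Artist B", "images": {"photo": "https://example.com/img/b-portrait.jpg"}},
]
reply = (
    "## Artist A\n![Artist A](https://example.com/img/a-portrait.jpg)\n"
    "## Artist B\n![Artist B](https://example.com/img/b-portrait-2.jpg)\n"
)
print(find_unknown_images(reply, collect_photo_urls(profiles)))
# prints ['https://example.com/img/b-portrait-2.jpg']
```

In practice you would call `load_profiles(...)` on the folder you uploaded from, and either drop or replace any image whose URL comes back from `find_unknown_images`.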

THEORY
Could chunking the file during upload to the vector store have cut through the URL and somehow corrupted the embedding?
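
To reason about whether splitting is even plausible, here is a toy fixed-size chunker with overlap. It is a simplification, not OpenAI's actual chunker: it counts characters rather than tokens, and the profile string is made up. It shows two things: a window shorter than the URL can never hold it whole, and if the overlap is at least as long as the URL, some window always contains the full URL.

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Fixed-size character windows; each window repeats the last `overlap` chars of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


def chunks_containing(chunks: list[str], needle: str) -> list[int]:
    """Indices of the chunks that contain `needle` intact."""
    return [i for i, chunk in enumerate(chunks) if needle in chunk]


url = "https://example.com/img/artist-b-portrait.jpg"  # 45 characters
profile = '{"name": "Artist B", "bio": "Painter and printmaker.", "photo": "' + url + '"}'

# Windows shorter than the URL can never hold it whole.
print(chunks_containing(chunk_text(profile, size=40, overlap=0), url))
# With overlap >= len(url), at least one window always holds the full URL.
print(chunks_containing(chunk_text(profile, size=100, overlap=50), url))
```

If I read the file search docs correctly, the default static strategy uses 800-token chunks with a 400-token overlap. A single URL is far shorter than 400 tokens, so under this simplified model the URL should survive intact in at least one chunk, which would point away from chunking as the cause. Treat that as a hint, not proof.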

QUESTIONS

If it's chunking, a different chunking strategy probably wouldn't help unless the chunk size is larger than the biggest file?
Once the model has identified a file from an embedding, can't it go back to the source of truth (the actual file) for something like a long URL that needs to be correct 100% of the time?
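
On the second question: my understanding (worth checking against the docs) is that file_search hands the model retrieved chunks rather than letting it reopen the whole file, so the dependable source of truth is your own code. One workaround, sketched here under assumptions: have the assistant write a short placeholder such as `{{photo:artist-b}}` instead of the full URL, then substitute the real URL from the local JSON before rendering. The placeholder syntax and the `id` / `photo_url` fields are hypothetical; the idea is that a short ID gives the model much less to copy than a long URL.

```python
import re

# Hypothetical placeholder convention the assistant is instructed to emit instead of a raw URL.
PLACEHOLDER = re.compile(r"\{\{photo:([A-Za-z0-9_-]+)\}\}")


def build_photo_index(profiles: list[dict]) -> dict[str, str]:
    """Map artist id -> photo URL, read straight from the source JSON."""
    return {p["id"]: p["photo_url"] for p in profiles if "id" in p and "photo_url" in p}


def resolve_photos(markdown: str, photo_index: dict[str, str], fallback: str = "") -> str:
    """Replace each {{photo:<id>}} with the real URL, or with `fallback` if the id is unknown."""
    return PLACEHOLDER.sub(lambda m: photo_index.get(m.group(1), fallback), markdown)


# Example data (made up for illustration).
profiles = [
    {"id": "artist-a", "name": "Artist A", "photo_url": "https://example.com/img/a.jpg"},
    {"id": "artist-b", "name": "Artist B", "photo_url": "https://example.com/img/b.jpg"},
]
reply = "## Artist A\n![Artist A]({{photo:artist-a}})\n## Artist B\n![Artist B]({{photo:artist-b}})"
print(resolve_photos(reply, build_photo_index(profiles)))
```

Because the URL never passes through the model, a wrong ID can at worst select the wrong or a missing image, but it cannot produce a URL that does not exist in the source files.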
