The OpenAI console assistant does not use or find some of the files uploaded to its file search section

I created an assistant in the OpenAI console and uploaded several (but not too many) PDF files to the vector store in its file search section. My issue is that the assistant often fails to find information in the uploaded files and instead responds with generic knowledge, ignoring the files.

  • The PDFs are well-organized and clearly separated, with the intention that the assistant will use them.
  • In the assistant’s description, I specified that it should use both its knowledge and the uploaded files.
  • I’ve also tried giving exact instructions for it to use a specific file (e.g., ‘If asked about excursions, use your document: “excursions.pdf” from your memory’).

These attempts have produced partial results, but with a significant margin of error: changes to the assistant’s prompt for other topics can cause it to stop working or stop searching.

Does anyone know how to correctly set up the prompt so that the assistant consistently searches within specific uploaded documents in its vector store? (All documents are pre-uploaded from the console.)

1 Like

Use a smarter model?

The root of the problem is that the AI doesn’t know what information it was trained on, and doesn’t know (and Assistants doesn’t provide) what is in the documents to be found. The AI is ultimately a word predictor: it doesn’t know what it knows until a sentence is produced that is either factual or bogus, and one of the “words” it must predict is whether to use a tool instead of responding.

You can fill in that understanding, including something like the lines below (a sketch of setting them via the API follows the list):

  • myfiles_browser tool has documents from the developer that are part of your skill and purpose
  • you have no pretraining to answer anything about widgets, inc. or company products
  • myfiles_browser has files about widgets, inc. products, policies, knowledge database, company info, that must be searched before any answering.
  • you must give no information about widgets, inc. that is not a citation directly from a document section returned by myfiles_browser. If the search and a query retry didn’t return any results, you don’t know.
  • for any question beyond the simplest, you must invoke a document search and synthesize from any relevant information found from your query.
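Here is a minimal sketch of baking lines like those into the assistant's instructions with the Python SDK; the assistant ID, company name, and exact wording are placeholders to adapt:

```python
from openai import OpenAI

client = OpenAI()

# Grounding instructions along the lines above; "asst_abc123" and
# "Widgets, Inc." are placeholder values, not anything real.
instructions = """You are the support assistant for Widgets, Inc.
- The myfiles_browser tool has documents from the developer that are part of your skill and purpose.
- You have no pretraining to answer anything about Widgets, Inc. or its products.
- myfiles_browser has files about Widgets, Inc. products, policies, knowledge database, and company info that must be searched before answering.
- Give no information about Widgets, Inc. that is not a citation directly from a document section returned by myfiles_browser. If the search and a query retry return no results, you don't know.
- For any question beyond the simplest, invoke a document search and synthesize from any relevant information your query returns.
"""

client.beta.assistants.update(
    "asst_abc123",
    instructions=instructions,
)
```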
3 Likes

OK, I’ll try it. But one of my questions is: isn’t the embedding and semantic search (using the user’s request sentence) done before the phrase even reaches the assistant? For example:

user request → transformation of the request into tokens → semantic search in the OpenAI files vector store → injection of the semantic search result so the assistant responds using the information found.

  1. It would help if someone could confirm the flow that happens when files are loaded in the assistant’s vector store in the OpenAI console, rather than in my own vector database on my server. Is the order I listed above correct? Thank you.
  2. Is “myfiles_browser” an actual keyword that tells the assistant to search its vector store in the console, or was it just an example keyword?

Thanks!

3 Likes

“File search” is only a name for you as the user; myfiles_browser is the tool the assistant can emit a query to. The name dates from when retrieval had more methods to actually go exploring documents, making even more repetitive tool calls. When the AI outputs a query to the tool, the chunks that rank highest in embedding similarity to the emitted language are returned to the thread, and then the AI is called again.

It’s not retrieval-augmented generation; it’s more like a web search where the 20 results are 800 tokens of text each.
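You can see that two-phase flow for yourself by listing the steps of a completed run; a sketch with the Python SDK, where the thread and run IDs are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# List the steps of a completed run in chronological order
# (IDs are placeholders for your own).
steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123",
    order="asc",
)

# A run that searched files shows a "tool_calls" step (the query the model
# emitted) before the "message_creation" step (the model called again with
# the returned chunks in context to write the answer).
for step in steps:
    print(step.type, step.status)
```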

In my use case, it was more efficient to limit it to 2 chunks of 4000 tokens plus 2000 tokens of overlap. Can you suggest another way to save costs while keeping response quality? My assistant runs gpt-4-turbo, using file_search via the Assistants API.
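In case it helps others, those limits map onto API parameters; a sketch assuming the beta Python SDK, with placeholder IDs:

```python
from openai import OpenAI

client = OpenAI()

# Chunking is set when a file is added to a vector store: 4000-token
# chunks with 2000 tokens of overlap, as described above.
client.beta.vector_stores.files.create(
    vector_store_id="vs_abc123",
    file_id="file_abc123",
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 4000,
            "chunk_overlap_tokens": 2000,  # must be at most half the chunk size
        },
    },
)

# Capping results: return at most 2 chunks per search on this assistant.
client.beta.assistants.update(
    "asst_abc123",
    tools=[{"type": "file_search", "file_search": {"max_num_results": 2}}],
)
```

Fewer, larger chunks cut the tokens injected per search, but they also reduce how many distinct documents a single search can draw from.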

1 Like

The other big cost is the length of the conversation itself, which is maintained and re-sent with each run.

You can use the run API parameter truncation_strategy, passing as its value the maximum number of chat turns from the thread to employ.

It has no logic as sophisticated as “the maximum number of tokens I want to load up”. The parameter simply cuts the chat off at a specific point so older turns are forgotten, with no further technique.
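A minimal sketch of that parameter on a run, assuming the Python SDK; the IDs and the turn count are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Only the last 4 thread messages are sent to the model on this run;
# anything older is simply dropped from context.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 4,
    },
)
```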