Understanding the current Assistant Retrieval process

From the Docs:

How it works

The model then decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:

  1. it either passes the file content in the prompt for short documents, or
  2. performs a vector search for longer documents

Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.

This is kind of a mixed message. So does it create vector embeddings, or does it add the whole content of the files to the context window? Judging by the token usage in my profile, I feel like it's just adding the file contents to the prompt.
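For reference, my setup is roughly the following (a sketch; the file name, instructions and model are placeholders for my actual values):

```python
# Sketch of the setup in question, using the Python SDK.
# File name, instructions and model are placeholders for my actual values.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a document for the retrieval tool to use
faq_file = client.files.create(
    file=open("faq.txt", "rb"),
    purpose="assistants",
)

# Create an Assistant with retrieval enabled and the file attached
assistant = client.beta.assistants.create(
    name="FAQ Assistant",
    instructions="Answer questions using the uploaded FAQ documents.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[faq_file.id],
)
```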


So this OpenAI Fcker can see billions of lines of code, but NOT the URL you feed it. WTF, this is 2023…


It depends on the size of the document, tbf, and on what threshold "they" have defined for a document being large enough that it merits a vector database.

Right now it's very expensive to use when it's not doing a vector-db retrieval. I've got a chatbot and sent just a few messages today, and it used over 65k tokens…
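A rough way to check what is actually being stuffed into the prompt is to count the document tokens yourself and compare that with the usage numbers in your account. Something like this sketch with tiktoken (cl100k_base is the encoding used by the current gpt-4 / gpt-3.5-turbo models):

```python
# Rough sanity check: count the tokens in the uploaded document and compare
# that with the usage reported for a single run. cl100k_base is the encoding
# used by gpt-4 / gpt-3.5-turbo.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("faq.txt", "r", encoding="utf-8") as f:
    text = f.read()

doc_tokens = len(enc.encode(text))
print(f"Document is ~{doc_tokens} tokens; if usage jumps by roughly this much "
      "per message, the whole file is probably being pasted into the prompt.")
```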


Are you passing the whole document through, or just some portions of it? It sounds very basic, but using some sentiment classification can let you shorten the amount that you send to the model for generation. With smaller inputs, the quality of generation should improve as well.
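The sketch below is one crude way to do that kind of pre-filtering; it uses plain keyword overlap with the question as a stand-in for a proper classifier, so treat it as an illustration rather than the real thing:

```python
# Crude pre-filter (illustration only): keep just the paragraphs that share
# a few words with the user's question instead of sending the whole document.
# Swap this keyword-overlap check for whatever classifier you actually use.
def keep_relevant(paragraphs: list[str], question: str, min_overlap: int = 2) -> list[str]:
    q_words = {w.lower().strip(".,?!") for w in question.split() if len(w) > 3}
    kept = []
    for para in paragraphs:
        p_words = {w.lower().strip(".,?!") for w in para.split()}
        if len(q_words & p_words) >= min_overlap:
            kept.append(para)
    return kept

# Only the kept paragraphs go into the prompt for generation.
# relevant = keep_relevant(document.split("\n\n"), user_question)
```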

It would be nice if OpenAI allowed different types of RAG frameworks to be plugged into the Assistants API moving forward.


I have two text files, each of which has about 3,500 tokens. There will be a lot more than that in the future; right now it's mostly an FAQ. So I've created an Assistant and uploaded the files there. It seems like they're too short for OpenAI to create embeddings for, so it just passes the contents of these documents into the query.

I'm thinking about chunking the documents myself and hosting them in a vector DB… It's such a black box right now, and it's hard to improve accuracy.
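Roughly what I have in mind is the sketch below; the chunk size and embedding model are my own picks, not anything the Assistants API does internally:

```python
# Sketch of the DIY route: chunk the FAQ myself and embed each chunk.
# Chunk size and embedding model are my own picks, not what the API does internally.
from openai import OpenAI

client = OpenAI()

def chunk_text(text: str, max_chars: int = 1500) -> list[str]:
    """Split on paragraphs, then pack paragraphs into ~max_chars chunks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

with open("faq.txt", "r", encoding="utf-8") as f:
    chunks = chunk_text(f.read())

# One embeddings call for all chunks; results come back in input order
resp = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
index = [(chunk, item.embedding) for chunk, item in zip(chunks, resp.data)]
# `index` can now be pushed into whatever vector DB you host (or kept in memory)
```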

As far as I can tell, Assistants only allow uploading documents for GPT-4. If I can't get more control over how the data is accessed by the LLM, I might even switch to 3.5 Turbo 16k and send the whole content in the query myself. The reasoning of 3.5 is enough for my current task, and it's much cheaper too.
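That fallback would look roughly like this (the file name and example question are placeholders):

```python
# Sketch of the fallback: skip the Assistants API and stuff the FAQ text
# into a plain chat completion against gpt-3.5-turbo-16k.
from openai import OpenAI

client = OpenAI()

with open("faq.txt", "r", encoding="utf-8") as f:
    faq_text = f.read()

resp = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system",
         "content": "Answer the user's question using only the FAQ below.\n\n" + faq_text},
        {"role": "user", "content": "How do I reset my password?"},  # placeholder question
    ],
)
print(resp.choices[0].message.content)
```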


3,500 tokens is not a lot, tbf, and it's understandable why the model might be passing them directly rather than creating embeddings for them. While passing the document by itself will get the job done, some recent studies with LLMs have shown that with bigger context windows, the model tends to forget information in the middle and has greater recall for information at the beginning or the end (I have experienced this at 6k tokens myself).

Keeping this in mind, I usually prefer just making embeddings. While it might be an extra step, it will improve your accuracy for sure.
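At query time that looks roughly like the sketch below, assuming you already have (chunk, embedding) pairs built along the lines of the earlier post; the embedding model and k are my own choices:

```python
# Sketch of the retrieval step at query time, assuming a list of
# (chunk, embedding) pairs like the `index` built earlier in the thread.
import numpy as np
from openai import OpenAI

client = OpenAI()

def top_k_chunks(question: str, index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Embed the question and return the k most similar chunks by cosine similarity."""
    q_emb = client.embeddings.create(
        model="text-embedding-ada-002", input=[question]
    ).data[0].embedding
    q = np.array(q_emb)
    scored = []
    for chunk, emb in index:
        e = np.array(emb)
        score = float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))
        scored.append((score, chunk))
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:k]]

# Only the top-k chunks go into the prompt, which keeps the context short and
# puts the most relevant text where the model is least likely to lose it.
```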