Retrieval Input_Tokens - Assistants API

iFaceTheWind · January 30, 2024, 4:52pm

I’ve been playing around with the Assistants API to build a reservation system for restaurants, in which each client has its own thread to make the experience more customized, as it remembers previous interactions.

Retrieval is enabled with a single file of ~10k tokens that contains the restaurant menu. I’m having serious headaches getting this to production because the input tokens are way higher than expected. I can account for the input tokens from the thread context window + function calling + assistant prompt, but the assistant uses retrieval for every message in a thread, even though 90% of the time it isn’t necessary, and that makes it impossible for me to progress.

Is there a way to limit when the assistant uses retrieval? I tried specifying it in the prompt, but it’s not working. Would the Embeddings API work better for this use case?

Sorry if this makes no sense, very beginner dev here

Macha · January 31, 2024, 12:49am

Hey there and welcome to the community!

Your assumption:

thread context window + function calling + assistant prompt

is very close, with one minor addition:

thread context window + function calling + assistant prompt + the entire contents of the file being retrieved

when using RAG, at least.
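To make that sum concrete, here is a rough back-of-the-envelope sketch. Only the ~10k-token file size comes from this thread; the other figures (history, tool definitions, instructions) are made-up placeholders, and the function name is hypothetical.

```python
def estimate_input_tokens(history_tokens, tools_tokens, instructions_tokens,
                          file_tokens, retrieval_used):
    """Rough per-run input token estimate; every figure is illustrative."""
    total = history_tokens + tools_tokens + instructions_tokens
    if retrieval_used:
        # Assumption from the post above: the whole file lands in the prompt.
        total += file_tokens
    return total

# Hypothetical sizes, except the ~10k-token menu file mentioned in the thread.
without_retrieval = estimate_input_tokens(1500, 300, 200, 10_000, False)
with_retrieval = estimate_input_tokens(1500, 300, 200, 10_000, True)
print(without_retrieval, with_retrieval)
```

With these placeholder numbers, a run that triggers retrieval costs several times the input tokens of one that does not, which is why retrieval firing on every message hurts so much.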

Perhaps my question here is: why do you need retrieval? Retrieval is built for those who need it at least 90% of the time, and in relation to other distinct bits of information. So I’m unsure whether retrieval is the right tool for the job you’re trying to do.

as it remembers previous interactions.

Does it? Keep in mind, you can match a client to a thread if you wish, but there is a cutoff in the context window, and it doesn’t “remember” in a sense, but rather “looks at the fat wad of data in front of its face and makes a reasonable conclusion about how to respond”. Any other memory or persistence would basically be your own DB.
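As a minimal sketch of the "your own DB" part, assuming a SQLite table that maps each client to a thread ID: the table name, the function name, and the `create_thread` callback are all hypothetical, and the callback stands in for whatever call actually creates a thread on your side.

```python
import sqlite3

def get_or_create_thread_id(conn, client_id, create_thread):
    """Return the stored thread ID for a client, creating one if missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS client_threads ("
        "client_id TEXT PRIMARY KEY, thread_id TEXT NOT NULL)"
    )
    row = conn.execute(
        "SELECT thread_id FROM client_threads WHERE client_id = ?", (client_id,)
    ).fetchone()
    if row:
        return row[0]
    # create_thread is a placeholder for the real thread-creation call.
    thread_id = create_thread()
    conn.execute("INSERT INTO client_threads VALUES (?, ?)", (client_id, thread_id))
    conn.commit()
    return thread_id

# Demo with an in-memory database and a fake thread factory.
conn = sqlite3.connect(":memory:")
counter = iter(range(1, 1000))
fake_create = lambda: f"thread_demo_{next(counter)}"
first = get_or_create_thread_id(conn, "client-42", fake_create)
second = get_or_create_thread_id(conn, "client-42", fake_create)
print(first, second)
```

The same table is a natural place to hang anything else you want to persist per client (preferences, past bookings), since the thread itself will eventually hit the context cutoff.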

iFaceTheWind · January 31, 2024, 11:29am

Thank you for your reply Macha!

I think I need retrieval in order to retrieve the information related to the restaurant menu, with prices and allergens, although it would be used only ~10% of the time. Otherwise the assistant wouldn’t be able to answer questions that require that information.

But rethinking based on your reply, a new question arises. Would it be possible to solve this by adding a function call to the assistant that is based on embeddings? So when the assistant is asked about anything related to the menu, it automatically calls the “embedding_function”? This way retrieval could be removed from the assistant’s capabilities and the input_tokens would be drastically lower. Am I making wrong assumptions, or does this even make sense?

Thank you again for taking time to read and reply!

Macha · January 31, 2024, 8:52pm

Yeah, I would try making a custom function and see how that helps!

Now, the input tokens will still be higher whenever it does retrieve the file, but perhaps building a custom tool will reduce the number of times retrieval is called. Give it a try!
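One possible shape for the custom function discussed above, as a sketch only. The menu entries, the `search_menu` name, and the tool description are hypothetical. `toy_embed` is a bag-of-words stand-in and not a real embedding model; in practice you would swap in vectors from an embeddings model and precompute them for the menu chunks. The tool definition follows the function-tool format used for function calling.

```python
import math
from collections import Counter

# Hypothetical menu entries; in practice these would be chunks of the real menu file.
MENU = [
    "Margherita pizza, 9 EUR, allergens: gluten, milk",
    "Grilled salmon with lemon, 17 EUR, allergens: fish",
    "Peanut noodle salad, 11 EUR, allergens: peanuts, soy, gluten",
]

# Function-tool definition to register on the assistant instead of retrieval.
SEARCH_MENU_TOOL = {
    "type": "function",
    "function": {
        "name": "search_menu",
        "description": "Look up menu items, prices and allergens. "
                       "Call only for questions about the menu.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def toy_embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT a real model)."""
    for ch in ",:?":
        text = text.replace(ch, " ")
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search_menu(query, top_k=1, embed=toy_embed):
    """Return the top_k menu entries most similar to the query."""
    q = embed(query)
    ranked = sorted(MENU, key=lambda item: cosine(q, embed(item)), reverse=True)
    return ranked[:top_k]

result = search_menu("Does the salmon contain fish allergens?")
print(result)
```

The point of the design is that only the few matching entries go back to the model as the function output, and only when the model decides a menu question was asked, so most messages never pay for the menu at all.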
