I have a questionnaire file uploaded to a vector store as a whole (which I'm assuming is stored as a single chunk). I'm prompting the LLM to ask the user those questions and collect all the answers (I am using Threads). However, by the time it reaches the end of the questionnaire, it's costing a lot of tokens, 95 percent of which are input tokens. How can I optimize and reduce the number of input tokens? Would splitting the questionnaire file into smaller chunks myself and uploading those to the vector store help?
If you're serious about this, you can create an API that manages the questions and answer types, then have the model interact with that API.
It's quite straightforward: when the user gives an answer, the model sends it to the endpoint, which returns a success response along with the next question.
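For illustration, here's a minimal sketch of such an endpoint, assuming Flask, an in-memory question list, and made-up route names (`/start`, `/answer`). Exposed to the assistant as a function/tool, the model only ever needs the current question in its context rather than the whole questionnaire file:

```python
# Minimal sketch of a questionnaire endpoint, assuming Flask, an in-memory
# question list, and illustrative route names -- not a definitive design.
from flask import Flask, request, jsonify

app = Flask(__name__)

QUESTIONS = [
    "What is your full name?",
    "What is your date of birth?",
    "Do you have any known allergies?",
]

# answers keyed by session_id -> list of {"question": ..., "answer": ...}
SESSIONS = {}

@app.get("/start")
def start():
    # hand the model the first question; nothing is recorded yet
    return jsonify({"next_question": QUESTIONS[0]})

@app.post("/answer")
def answer():
    data = request.get_json()
    answers = SESSIONS.setdefault(data["session_id"], [])
    idx = len(answers)
    if idx < len(QUESTIONS):
        # record the answer to the question currently being asked
        answers.append({"question": QUESTIONS[idx], "answer": data["answer"]})
        idx += 1
    if idx < len(QUESTIONS):
        return jsonify({"status": "ok", "next_question": QUESTIONS[idx]})
    return jsonify({"status": "ok", "next_question": None, "done": True})

if __name__ == "__main__":
    app.run()
```

Each turn the model just forwards the user's answer and relays `next_question` back, so the prompt never has to carry the full questionnaire or the retrieved file chunks.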
And that would avoid the need for a vector store for the questionnaire file?
Yes. It requires a lot more effort, but if your goal is to save money and have more control, it may be a good option.
You could use a combination of the two.
That is a great idea!
Then you could store those answers into a user profile for use later.
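For example, a rough sketch of saving the collected answers to a per-user JSON profile (the directory, file naming, and schema are just assumptions):

```python
# Rough sketch: persist completed questionnaire answers per user as JSON.
# The directory, file naming, and schema here are illustrative assumptions.
import json
from pathlib import Path

PROFILE_DIR = Path("profiles")
PROFILE_DIR.mkdir(exist_ok=True)

def save_profile(user_id: str, answers: list) -> None:
    # write the finished questionnaire to profiles/<user_id>.json
    path = PROFILE_DIR / f"{user_id}.json"
    path.write_text(json.dumps({"user_id": user_id, "answers": answers}, indent=2))

def load_profile(user_id: str):
    # load a saved profile later, e.g. to seed a future thread's context
    path = PROFILE_DIR / f"{user_id}.json"
    return json.loads(path.read_text()) if path.exists() else None
```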
But the problem with that is the questionnaire is too complex to be handled by a plain API; I still need an LLM to some extent. Are you sure there's no other way I can reduce the input token count?