I’ve created an assistant that uses file search to limit answers to only those that I provide in a source file. When I do this, every question I ask triggers a charge of 18,000 tokens. Why so high? I’m using the gpt-4-turbo model.
You can control this through a combination of adjusting the chunk size and the `file_search.max_num_results` parameter. See below for more info:
**Customizing File Search settings**

You can customize how the `file_search` tool chunks your data and how many chunks it returns to the model context.

**Chunking configuration**

By default, `max_chunk_size_tokens` is set to 800 and `chunk_overlap_tokens` is set to 400, meaning every file is indexed by being split into 800-token chunks, with a 400-token overlap between consecutive chunks.

You can adjust this by setting `chunking_strategy` when adding files to the vector store. There are certain limitations to `chunking_strategy`:

- `max_chunk_size_tokens` must be between 100 and 4096 inclusive.
- `chunk_overlap_tokens` must be non-negative and should not exceed `max_chunk_size_tokens / 2`.

**Number of chunks**

By default, the `file_search` tool outputs up to 20 chunks for `gpt-4*` models and up to 5 chunks for `gpt-3.5-turbo`. You can adjust this by setting `file_search.max_num_results` in the tool when creating the assistant or the run.

Note that the `file_search` tool may output fewer than this number for a myriad of reasons:

- The total number of chunks is fewer than `max_num_results`.
- The total token size of all the retrieved chunks exceeds the token "budget" assigned to the `file_search` tool. The `file_search` tool currently has a token budget of:
  - 4,000 tokens for `gpt-3.5-turbo`
  - 16,000 tokens for `gpt-4*` models
Source: https://platform.openai.com/docs/assistants/tools/file-search/customizing-file-search-settings
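In the Python SDK, the chunking strategy is passed when attaching a file to the vector store. Here's a minimal sketch that enforces the documented limits before building the parameter dict; the helper function is my own (not part of the SDK), and the IDs in the commented-out call are placeholders:

```python
def make_chunking_strategy(max_chunk_size_tokens: int, chunk_overlap_tokens: int) -> dict:
    """Build a static chunking_strategy dict, enforcing the documented limits."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096 inclusive")
    if not 0 <= chunk_overlap_tokens <= max_chunk_size_tokens // 2:
        raise ValueError("chunk_overlap_tokens must be non-negative and at most max_chunk_size_tokens / 2")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }

# Smaller chunks mean fewer tokens per retrieved result:
strategy = make_chunking_strategy(400, 100)

# Passed when adding a file to the vector store, e.g.:
# client.beta.vector_stores.files.create(
#     vector_store_id="vs_...",
#     file_id="file-...",
#     chunking_strategy=strategy,
# )
```

Note that chunking is applied at indexing time, so already-indexed files have to be re-added to the vector store for a new strategy to take effect.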
Thank you for that information! I see that by lowering the total number of chunks to 5, I can lower the cost to about 4,000 tokens for a simple query. That still seems high, however, once you start running the numbers on a public website with a pre-sales bot. Any other suggestions for lowering costs, or do we just have to wait for gpt-4 to become as inexpensive as gpt-3.5-turbo?
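For anyone else running those numbers, here's the back-of-the-envelope arithmetic I'm using (a rough worst case that ignores the system prompt and the question itself, and assumes every retrieved chunk is full size):

```python
def retrieval_tokens(max_num_results: int, max_chunk_size_tokens: int = 800) -> int:
    """Worst-case tokens that file_search results add to the model context."""
    return max_num_results * max_chunk_size_tokens

print(retrieval_tokens(20))       # default for gpt-4* models: 16000
print(retrieval_tokens(5))        # max_num_results lowered to 5: 4000
print(retrieval_tokens(5, 400))   # plus smaller chunks: 2000
```

So combining a lower `max_num_results` with a smaller `max_chunk_size_tokens` compounds the savings, at the cost of less retrieved context per answer.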
Hello, I have the same problem, and I think I've found the relevant knob: you can configure a score_threshold that filters chunks by similarity score. But it doesn't work for me; if I set score_threshold to anything above 0, file search returns no chunks at all. Could you check it? Any solution?
A good ranker threshold is about 0.40, depending on the type of documents. You want it to reject turbocharger rebuild instructions when you are researching DNA or whatever else you might ask a chatbot (where it mistakenly uses file search). Still, you might have 1000 chunks so you always get the maximum.
On Assistants, the number of results parameter hasn’t worked for the longest time if it ever did, and it seems nobody at OpenAI cares. You always get 20.
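For anyone who wants to experiment with the threshold anyway, it goes in `ranking_options` on the `file_search` tool definition. A sketch of the tool dict as I understand the Assistants v2 shape; the 0.4 value is just the rule of thumb above, not an official recommendation:

```python
# file_search tool definition with both a result cap and a ranker score threshold.
file_search_tool = {
    "type": "file_search",
    "file_search": {
        "max_num_results": 5,        # reportedly ignored on Assistants, per the above
        "ranking_options": {
            "ranker": "auto",
            "score_threshold": 0.4,  # 0.0-1.0; higher values reject off-topic chunks
        },
    },
}

# Used when creating the assistant or the run, e.g.:
# client.beta.assistants.create(
#     model="gpt-4-turbo",
#     tools=[file_search_tool],
# )
```

If a threshold above 0 returns no chunks at all, it's worth inspecting the run step's retrieval results to see what scores your chunks are actually getting before picking a cutoff.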