Why is token use so high when using file search?

I’ve created an assistant that uses file search so that answers are limited to the content I provide in a source file. When I do this, every question I ask incurs a charge of about 18,000 tokens. Why so high? I’m using the gpt-4-turbo model.

You can control this by adjusting the chunk size in combination with the file_search.max_num_results parameter. See below for more info:

Customizing File Search settings

You can customize how the file_search tool chunks your data and how many chunks it returns to the model context.

Chunking configuration

By default, max_chunk_size_tokens is set to 800 and chunk_overlap_tokens is set to 400, meaning every file is indexed by being split up into 800-token chunks, with 400-token overlap between consecutive chunks.

You can adjust this by setting chunking_strategy when adding files to the vector store (see the sketch after this list). There are certain limitations to chunking_strategy:

  • max_chunk_size_tokens must be between 100 and 4096 inclusive.
  • chunk_overlap_tokens must be non-negative and should not exceed max_chunk_size_tokens / 2.
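
For example, here is a minimal sketch in Python of attaching a file with a smaller static chunking strategy. It assumes the beta Assistants SDK (in newer SDK versions the namespace may be client.vector_stores rather than client.beta.vector_stores), and the 400/200 values and placeholder IDs are illustrative only:

    from openai import OpenAI

    client = OpenAI()

    # Attach an already-uploaded file to a vector store with smaller chunks.
    client.beta.vector_stores.files.create(
        vector_store_id="vs_...",  # placeholder: your vector store ID
        file_id="file-...",        # placeholder: a previously uploaded file
        chunking_strategy={
            "type": "static",
            "static": {
                "max_chunk_size_tokens": 400,  # must be between 100 and 4096
                "chunk_overlap_tokens": 200,   # at most max_chunk_size_tokens / 2
            },
        },
    )

Smaller chunks mean each retrieved result costs fewer tokens, at the price of less surrounding context per chunk.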

Number of chunks

By default, the file_search tool outputs up to 20 chunks for gpt-4* models and up to 5 chunks for gpt-3.5-turbo. You can adjust this by setting file_search.max_num_results in the tool when creating the assistant or the run.
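
For instance, here is a sketch of capping retrieval at 5 chunks when creating the assistant (Python, beta Assistants SDK; the instructions text and placeholder vector store ID are illustrative):

    from openai import OpenAI

    client = OpenAI()

    assistant = client.beta.assistants.create(
        model="gpt-4-turbo",
        instructions="Answer only from the attached files.",
        tools=[{
            "type": "file_search",
            "file_search": {"max_num_results": 5},  # default is 20 for gpt-4* models
        }],
        tool_resources={
            "file_search": {"vector_store_ids": ["vs_..."]},  # placeholder ID
        },
    )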

Note that the file_search tool may output fewer chunks than this for several reasons:

  • The total number of chunks is fewer than max_num_results.
  • The total token size of all the retrieved chunks exceeds the token “budget” assigned to the file_search tool. The file_search tool currently has a token budget of:
      • 4,000 tokens for gpt-3.5-turbo
      • 16,000 tokens for gpt-4* models
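
That budget also explains the numbers in the original question: the worst-case retrieval cost is roughly min(max_num_results × max_chunk_size_tokens, budget). A quick back-of-the-envelope check in Python (the formula is inferred from the defaults above, not something the API reports):

    # Worst-case tokens that file_search injects into the context
    # (inferred from the documented defaults, not an API guarantee).
    def retrieval_tokens(max_num_results: int, max_chunk_size_tokens: int, budget: int) -> int:
        return min(max_num_results * max_chunk_size_tokens, budget)

    # gpt-4-turbo defaults: 20 chunks of 800 tokens, 16,000-token budget
    print(retrieval_tokens(20, 800, 16_000))  # 16000 -> plus prompt and instructions ≈ the observed 18k
    # Lowering max_num_results to 5:
    print(retrieval_tokens(5, 800, 16_000))   # 4000 -> matches the ~4,000 tokens reported below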

Source: https://platform.openai.com/docs/assistants/tools/file-search/customizing-file-search-settings

Thank you for that information! I see that by lowering the number of chunks to 5, I can bring the cost down to about 4,000 tokens for a simple query. That still seems high, though, once you run the numbers for a public-facing website with a pre-sales bot. Any other suggestions to lower costs, or do we just have to wait for gpt-4 to become as inexpensive as gpt-3.5-turbo?

Hello, I have the same problem, and I think I’ve found a related setting: you can configure a score_threshold that filters chunks by similarity score. But it doesn’t work for me: if I set score_threshold to anything above 0, file search returns no chunks at all. Any solution?

A good ranker threshold is about 0.40, depending on the type of documents. You want it to reject, say, turbocharger rebuild instructions when you are researching DNA, or whatever else you might ask a chatbot that mistakenly invokes file search. Still, if you have 1,000 chunks, enough will usually pass the threshold that you get the maximum anyway.
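
For what it’s worth, here is a sketch of applying such a threshold via ranking_options on a run (Python, beta Assistants SDK; the placeholder IDs are illustrative, and the exact field names may differ across SDK versions):

    from openai import OpenAI

    client = OpenAI()

    run = client.beta.threads.runs.create(
        thread_id="thread_...",   # placeholder: your thread ID
        assistant_id="asst_...",  # placeholder: your assistant ID
        tools=[{
            "type": "file_search",
            "file_search": {
                "max_num_results": 5,
                "ranking_options": {
                    "ranker": "auto",
                    "score_threshold": 0.4,  # chunks scoring below this are dropped
                },
            },
        }],
    )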

On Assistants, the number-of-results parameter hasn’t worked for the longest time, if it ever did, and it seems nobody at OpenAI cares. You always get 20.