Why is token use so high when using file search?

I’ve created an assistant that uses file search so that answers are limited to the content I provide in a source file. When I do this, every question I ask incurs a charge of about 18,000 tokens. Why so high? I’m using the gpt-4-turbo model.

You can control this by adjusting the chunk size in combination with the file_search.max_num_results parameter. See below for more info:

Customizing File Search settings

You can customize how the file_search tool chunks your data and how many chunks it returns to the model context.

Chunking configuration

By default, max_chunk_size_tokens is set to 800 and chunk_overlap_tokens is set to 400, meaning every file is indexed by being split up into 800-token chunks, with 400-token overlap between consecutive chunks.

You can adjust this by setting chunking_strategy when adding files to the vector store (see the sketch after this list). There are certain limitations to chunking_strategy:

  • max_chunk_size_tokens must be between 100 and 4096 inclusive.
  • chunk_overlap_tokens must be non-negative and should not exceed max_chunk_size_tokens / 2.
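
For example, here is a minimal sketch in Python of attaching a file with a smaller static chunking strategy. It assumes the beta Assistants SDK (in newer SDK versions the namespace may be client.vector_stores rather than client.beta.vector_stores), and the 400/200 values and placeholder IDs are illustrative only:

    from openai import OpenAI

    client = OpenAI()

    # Attach an already-uploaded file to a vector store with smaller chunks.
    client.beta.vector_stores.files.create(
        vector_store_id="vs_...",  # placeholder: your vector store ID
        file_id="file-...",        # placeholder: a previously uploaded file
        chunking_strategy={
            "type": "static",
            "static": {
                "max_chunk_size_tokens": 400,  # must be between 100 and 4096
                "chunk_overlap_tokens": 200,   # at most max_chunk_size_tokens / 2
            },
        },
    )

Smaller chunks mean each retrieved result costs fewer tokens, at the price of less surrounding context per chunk.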

Number of chunks

By default, the file_search tool outputs up to 20 chunks for gpt-4* models and up to 5 chunks for gpt-3.5-turbo. You can adjust this by setting file_search.max_num_results in the tool when creating the assistant or the run.
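
For instance, here is a sketch of capping retrieval at 5 chunks when creating the assistant (Python, beta Assistants SDK; the instructions text and placeholder vector store ID are illustrative):

    from openai import OpenAI

    client = OpenAI()

    assistant = client.beta.assistants.create(
        model="gpt-4-turbo",
        instructions="Answer only from the attached files.",
        tools=[{
            "type": "file_search",
            "file_search": {"max_num_results": 5},  # default is 20 for gpt-4* models
        }],
        tool_resources={
            "file_search": {"vector_store_ids": ["vs_..."]},  # placeholder ID
        },
    )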

Note that the file_search tool may output fewer chunks than this for several reasons:

  • The total number of chunks is fewer than max_num_results.
  • The total token size of all the retrieved chunks exceeds the token “budget” assigned to the file_search tool. The file_search tool currently has a token budget of:
      • 4,000 tokens for gpt-3.5-turbo
      • 16,000 tokens for gpt-4* models
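
That budget also explains the numbers in the original question: the worst-case retrieval cost is roughly min(max_num_results × max_chunk_size_tokens, budget). A quick back-of-the-envelope check in Python (the formula is inferred from the defaults above, not something the API reports):

    # Worst-case tokens that file_search injects into the context
    # (inferred from the documented defaults, not an API guarantee).
    def retrieval_tokens(max_num_results: int, max_chunk_size_tokens: int, budget: int) -> int:
        return min(max_num_results * max_chunk_size_tokens, budget)

    # gpt-4-turbo defaults: 20 chunks of 800 tokens, 16,000-token budget
    print(retrieval_tokens(20, 800, 16_000))  # 16000 -> plus prompt and instructions ≈ the observed 18k
    # Lowering max_num_results to 5:
    print(retrieval_tokens(5, 800, 16_000))   # 4000 -> matches the ~4,000 tokens reported below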

Source: https://platform.openai.com/docs/assistants/tools/file-search/customizing-file-search-settings

Thank you for that information! I see that by lowering the number of chunks to 5, I can bring the cost down to about 4,000 tokens for a simple query. That still seems high, though, once you run the numbers for a public-facing website with a pre-sales bot. Any other suggestions to lower costs, or do we just have to wait for gpt-4 to become as inexpensive as gpt-3.5-turbo?

Hello, I have the same problem, and I think I’ve found a related setting: you can configure a score_threshold that filters chunks by similarity score. But it doesn’t work for me: if I set score_threshold to anything above 0, file search returns no chunks at all. Any solution?

A good ranker threshold is about 0.40, depending on the type of documents. You want it to reject, say, turbocharger rebuild instructions when you are researching DNA, or whatever else you might ask a chatbot that mistakenly invokes file search. Still, if you have 1,000 chunks, enough will usually pass the threshold that you get the maximum anyway.
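
For what it’s worth, here is a sketch of applying such a threshold via ranking_options on a run (Python, beta Assistants SDK; the placeholder IDs are illustrative, and the exact field names may differ across SDK versions):

    from openai import OpenAI

    client = OpenAI()

    run = client.beta.threads.runs.create(
        thread_id="thread_...",   # placeholder: your thread ID
        assistant_id="asst_...",  # placeholder: your assistant ID
        tools=[{
            "type": "file_search",
            "file_search": {
                "max_num_results": 5,
                "ranking_options": {
                    "ranker": "auto",
                    "score_threshold": 0.4,  # chunks scoring below this are dropped
                },
            },
        }],
    )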

On Assistants, the number-of-results parameter hasn’t worked for the longest time, if it ever did, and it seems nobody at OpenAI cares. You always get 20.