High Costs and Input Tokens with Assistants API File Search

I recently ran an experiment to understand how file size affects costs when using the Assistants API with File Search enabled, using files smaller than 300 KB. The results were surprising: costs did not consistently correlate with file size. This suggests that factors other than file size, possibly internal API processes, significantly influence the costs.

In a basic conversation with three user messages (8 messages in total on the thread), around 80k input tokens were consumed to produce just 400 output tokens. The vector store attached to the thread totaled 404560 kb. That works out to roughly 200 input tokens for every output token, which is perplexing.
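To put those numbers in perspective, here is a minimal sketch of the arithmetic. The token counts are the ones reported above; the two per-million-token prices are placeholder assumptions, not actual OpenAI rates, so substitute the current prices for whichever model you use.

```python
# Token counts reported above for the three-message conversation.
INPUT_TOKENS = 80_000
OUTPUT_TOKENS = 400

# Hypothetical prices in USD per 1M tokens (assumptions, not official rates).
ASSUMED_INPUT_PRICE_PER_M = 10.00
ASSUMED_OUTPUT_PRICE_PER_M = 30.00


def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost) in USD for one conversation."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost, output_cost


# How many input tokens were billed for each output token.
ratio = INPUT_TOKENS / OUTPUT_TOKENS

input_cost, output_cost = estimate_cost(
    INPUT_TOKENS, OUTPUT_TOKENS, ASSUMED_INPUT_PRICE_PER_M, ASSUMED_OUTPUT_PRICE_PER_M
)

# Fraction of the total bill that comes from input tokens alone.
input_share = input_cost / (input_cost + output_cost)

print(f"Input tokens per output token: {ratio:.0f}")
print(f"Estimated cost: ${input_cost + output_cost:.3f} ({input_share:.1%} from input)")
```

Under almost any realistic price pair, nearly the whole bill comes from input tokens, so the length of the answers matters far less than whatever is being injected into the prompt.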

Despite only interacting with a small portion of the file in my tests, the costs were unexpectedly high, raising concerns about the practicality of deploying this feature in user-facing applications.

Furthermore, I ran a simple test with the File Search tool enabled and no files uploaded: a basic greeting, “Hello”, followed by “Question me”. According to the tokenizer tool, those messages should account for only 4 input tokens. However, the Playground reported 1874 input tokens and 50 output tokens, even though my assistant instructions were just 338 tokens.
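The gap can be made explicit with a quick calculation. This is only a sketch: it assumes the instructions and the two messages are each counted once, which may not hold if the reported figure sums several runs on the same thread. The helper name `unexplained_tokens` is made up for illustration.

```python
# Figures reported above for the "Hello" / "Question me" test.
REPORTED_INPUT_TOKENS = 1874
INSTRUCTION_TOKENS = 338  # assistant instructions
MESSAGE_TOKENS = 4        # both user messages, per the tokenizer tool


def unexplained_tokens(reported, *known):
    """Tokens left over after subtracting every known contribution."""
    return reported - sum(known)


overhead = unexplained_tokens(REPORTED_INPUT_TOKENS, INSTRUCTION_TOKENS, MESSAGE_TOKENS)
overhead_share = overhead / REPORTED_INPUT_TOKENS

print(f"Unexplained input tokens: {overhead}")
print(f"Share of reported input: {overhead_share:.1%}")
```

On these assumptions, 1532 of the 1874 input tokens (about 82%) come from something other than my instructions and messages. One guess is that the File Search tool adds its own prompt content, but nothing in my test confirms that.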

These anomalies in token usage and cost calculations are puzzling and potentially prohibitive for cost-effective applications. I’m reaching out to the community for insights or suggestions on how to better manage or understand the costs associated with File Search in the Assistants API. Any shared experiences or tips would be greatly appreciated.


Having the same issue! Thinking of just scrapping the Assistants idea and using LangChain with embeddings instead.

Same, 18K tokens per call. But for us that’s still cheap.

I don’t know what they did to File Search, but it is now very good, at least in the Playground. It finds hard answers across many files.

Update: I repeated the same experiment using GPT-4o, and it now seems to use roughly half as many tokens as before (~40k).
