Assistant Token Usage Random Increase On deployed server

mckay · February 12, 2024, 8:35pm

We have been working on an assistant solution that uses an assistant with several custom tools that pull information from third-party APIs. We have developed a Flask API server that works with a front-end UI to communicate with the Assistant and that houses the custom code functions used by several of the tools. When running this server in a development setting, our token usage has been acceptable. However, when deploying this server as a web app through Azure, token usage has increased by a factor of almost 10. I have reviewed the prompts and runs of the assistant, and they follow the same process; the only difference is that production runs on a Flask server deployed behind Gunicorn. For reference, we were using the gpt-4-1106-preview model when this happened, but we are switching to GPT-3.5 Turbo for the reduced token cost.

Why would token usage increase when the server is deployed, compared with running it locally on localhost in development, if the run process is exactly the same?
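
One way we could narrow this down is to record the token usage that each completed run reports in both environments and compare the two side by side. Below is a minimal sketch of that comparison in Python. The record layout, the `env` label, and the `summarize_usage` helper are hypothetical names chosen for illustration, and the numbers are made-up placeholders, not our actual measurements.

```python
from collections import defaultdict


def summarize_usage(records):
    """Aggregate per-run token usage by environment.

    Each record is a dict with these (hypothetical) keys:
    env, run_id, prompt_tokens, completion_tokens.
    """
    totals = defaultdict(
        lambda: {"runs": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for rec in records:
        bucket = totals[rec["env"]]
        bucket["runs"] += 1
        bucket["prompt_tokens"] += rec["prompt_tokens"]
        bucket["completion_tokens"] += rec["completion_tokens"]

    summary = {}
    for env, bucket in totals.items():
        total = bucket["prompt_tokens"] + bucket["completion_tokens"]
        summary[env] = {
            **bucket,
            "total_tokens": total,
            "avg_tokens_per_run": total / bucket["runs"],
        }
    return summary


# Placeholder records; in practice these would come from whatever
# usage figures the server logs after each run finishes.
records = [
    {"env": "dev", "run_id": "run_a", "prompt_tokens": 900, "completion_tokens": 100},
    {"env": "dev", "run_id": "run_b", "prompt_tokens": 1000, "completion_tokens": 200},
    {"env": "prod", "run_id": "run_c", "prompt_tokens": 9000, "completion_tokens": 500},
    {"env": "prod", "run_id": "run_d", "prompt_tokens": 11000, "completion_tokens": 500},
]

summary = summarize_usage(records)
for env, stats in summary.items():
    print(env, stats)
```

Splitting prompt tokens from completion tokens and counting runs per environment should at least show whether the increase comes from larger prompts per run or from more runs than expected.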
