We have been building an assistant solution that uses an Assistant with several custom tools to pull information from third-party APIs. We developed a Flask API server that works with a front-end UI to communicate with the Assistant and houses the custom functions used by several of the tools. When running this server in a development setting, token usage has been fine. However, since deploying it as a web app through Azure, token usage has increased by almost 10x. I have reviewed the prompts and runs of the assistant, and they follow the same process; the only difference is the production environment, where the Flask server is deployed behind Gunicorn. For reference, we were using the gpt-4-1106-preview model when this happened, but we are switching to gpt-3.5-turbo for the reduced token cost.
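To narrow this down, I am considering logging per-run token usage in both environments, tagged with the worker PID, since multiple Gunicorn workers each holding their own assistant/thread state could multiply calls. A minimal sketch of the logging helper (the `prompt_tokens`/`completion_tokens`/`total_tokens` keys are my assumption of what the completed run's `usage` object exposes, and `log_run_usage` is a hypothetical helper name):

```python
import logging
import os

def log_run_usage(run_usage, run_id, logger=None):
    """Record per-run token usage so dev and production numbers can be compared.

    run_usage: dict-like usage data from a completed run, assumed to carry
    prompt_tokens / completion_tokens / total_tokens.
    """
    logger = logger or logging.getLogger("token-audit")
    prompt = run_usage.get("prompt_tokens", 0)
    completion = run_usage.get("completion_tokens", 0)
    # Fall back to summing the parts if total_tokens is absent.
    total = run_usage.get("total_tokens", prompt + completion)
    # Tag with the worker PID so duplicate runs from separate Gunicorn
    # workers show up as distinct entries in the logs.
    logger.info(
        "pid=%d run=%s prompt=%d completion=%d total=%d",
        os.getpid(), run_id, prompt, completion, total,
    )
    return {
        "run_id": run_id,
        "pid": os.getpid(),
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }
```

Calling this after each run completes in both environments should make it obvious whether production is issuing more runs (e.g. one per worker) or each run is simply consuming more tokens.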
Why would token usage increase when deployed versus running locally on localhost in development, if the run process is exactly the same?