Question about token usage per message in commercial chatbot

mohammedalaadhmy3 · September 17, 2025, 2:41pm

Hello everyone,

I’m building a commercial chatbot app with a group of developers. Right now, each user message — even if it’s just a single word — is consuming around 3,000–4,000 tokens.

The developer insists this is normal because the code sends the entire chat history, the system prompt, and all the functions/tools definitions together with every new message.

My argument is that this approach is not practical and very expensive, and that OpenAI provides solutions inside the platform (like Threads in the Assistants API, storing instructions once in the Assistant, and using function calling properly) to avoid resending everything every time.

The developer completely refuses and says his method is correct and the only practical way.

Could someone from the community clarify what is the standard/best practice here for token efficiency in commercial chatbots? Is it really “normal” for a short user message to cost 3k–4k tokens, or is this a sign of inefficient implementation?

Thanks a lot!

aprendendo.next · September 22, 2025, 2:20am

Welcome to this forum.

In short, yes your developers are right: you pay for all inputs and tools needed for the AI model to generate an answer.

The “memory” you refer to is sent all over again for the model to process, regardless the convenience the Assistants API (which is being deprecated btw) might give you to recover previous context.

Context storage (tools, system instructions and conversation history) and context processing are different things, the LLM needs to reprocess things all over again to generate a proper answer.

You can either limit the turns for the conversation or “forget” older messages to reduce costs, but it is not like you only pay for each new “hello” and the history is “already paid” (it is not), there is usually more things involved.

I recommend both you and your team to take a step back and try to put some extra effort into understanding each other a little better.

sharakusatoh · September 22, 2025, 3:52am

I know how you feel. However, AI models adjust their responses based on context. With only short user messages, the model has almost no information to work with and can only generate random, generic responses. With proper design, higher token usage actually leads to more distinctive, specialized, and contextually accurate answers.

For example, the commercial chatbots I design use around 4,000–5,000 tokens just for the system prompt. Fortunately, the OpenAI API provides a feature called Prompt Caching, which can help reduce token consumption. Of course, results vary depending on the developer’s skill, but in general, token usage is directly proportional to both the chatbot’s performance and its hallucination rate.

Topic		Replies	Views
Token Optimization for Assistants API - Excesive token count API gpt-4 , assistants , assistants-api	2	3148	May 24, 2024
The cumulative token problem and role = system usage, options? API	9	4539	February 16, 2024
How many tokens is normal usage for asking a question? API chatgpt	7	19812	September 6, 2024
Retain past responses in memory without sending them again at every API request API gpt-4 , gpt-35-turbo , chatgpt	11	11620	January 25, 2024
Assistant API token Usage - promt_tokens usage is too high API api-usage , assistants , assistants-api	9	2182	January 12, 2026

Question about token usage per message in commercial chatbot

Related topics