Token usage on Responses API with previous_response_id

Hello All

I use chat (text only) through the API.

I like the Responses API, though it is not really production-grade yet. I understand that, when previous_response_id is given, all the previous inputs are charged. Fair enough.

Now, what I don’t understand is how the input tokens are calculated. I need to know this so that I can budget users’ conversations accordingly.

In the following example, from the 2nd row onwards, the exact same system prompt was used, with every other parameter being EXACTLY the same. The output varies a bit, and that's fine. I am just trying to figure out the proportional increase in input tokens: it doesn't make mathematical sense to me, even if I apply a percentage for every message sent!

For the 1st row below, the input is very low as it was the 1st message and there was no “previous response”.

| inputTokens | outputTokens |
| --- | --- |
| 148 | 775 |
| 1071 | 849 |
| 2068 | 814 |
| 3030 | 905 |
| 4083 | 904 |
| 5135 | 910 |
| 6193 | 937 |
| 7278 | 947 |
| 8373 | 947 |
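
For reference, a call sequence that produces numbers like these looks roughly like the sketch below (assuming the official openai Python SDK; the model name and prompt texts are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "..."                      # the same system prompt every turn (placeholder)
user_messages = ["first question", "..."]  # one entry per user turn (placeholders)

previous_id = None
for turn, text in enumerate(user_messages, start=1):
    response = client.responses.create(
        model="gpt-4.1-mini",  # placeholder model name
        input=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        previous_response_id=previous_id,  # None on the first turn
    )
    previous_id = response.id
    print(turn, response.usage.input_tokens, response.usage.output_tokens)
```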

Can someone help demystify this for me please?

Thanks!

Are you passing the “system” prompt through a role message, or using the instructions parameter?

You usually don't need more than one system (developer) prompt, sent as a role message on the very first turn; it will be preserved on the following turns.

Note, though, that the instructions parameter is volatile (it applies only to the request it is sent with) and will not be carried over, even when using previous_response_id.

What it looks like is that, in addition to the previous conversation, you may be adding a new system prompt of roughly 100–200 tokens on every turn (through a role message or instructions).
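
A minimal sketch of that difference, assuming the openai Python SDK (model name and prompts are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Turn 1: `instructions` applies only to this request; it is not stored server-side.
first = client.responses.create(
    model="gpt-4.1-mini",                      # illustrative model name
    instructions="Answer in formal English.",  # volatile system-message prefix
    input="Summarise the refund policy.",
)

# Turn 2: previous_response_id restores the stored conversation,
# but the instructions above are NOT carried over; repeat them if still needed.
second = client.responses.create(
    model="gpt-4.1-mini",
    instructions="Answer in formal English.",
    input="Now make it shorter.",
    previous_response_id=first.id,
)
```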


You are sending the same system message and/or user messages each time: the actual result works out to 137 tokens of text if it is a single message, or 134 tokens if there are two messages (the message containers take 4 tokens each, and the final prompt for the AI to write takes 3 tokens).

Since the initial input is 148 and each following input adds about that much again, I'd conclude there is only one prompt message per turn, with no constant system message in “instructions” (or one present only as the first chat turn of input).

Since you say you are using the “exact same system prompt”, I conclude you are using the input messages incorrectly. You should not keep sending a system/developer message when using the server-side state, unless you want a chat history full of duplicated system messages, one for every user input!

Either send the “system” message in the input message list only once, when starting a new session, or use the “instructions” API parameter to insert a constant system-message prefix before the chat history on every request.
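
In code, the recommended pattern looks roughly like this sketch (assuming the openai Python SDK; model name and prompts are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# New session: the developer/system message goes into the input list once.
first = client.responses.create(
    model="gpt-4.1-mini",  # illustrative model name
    input=[
        {"role": "developer", "content": "You are a concise support agent."},
        {"role": "user", "content": "Where is my order?"},
    ],
)

# Later turns: only the new user message plus previous_response_id.
# The stored history (including the single developer message) is reused server-side.
follow_up = client.responses.create(
    model="gpt-4.1-mini",
    input=[{"role": "user", "content": "It was placed last Tuesday."}],
    previous_response_id=first.id,
)
print(follow_up.usage.input_tokens, follow_up.usage.output_tokens)
```

Here is your data broken down on the assumption that a fixed prompt block of about 144 tokens is being added every turn: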


| Turn | inputTokens | outputTokens | Δ input vs prev | prev assistant + 4 | newest prompt (wrapper + content) | newest prompt content |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 148 | 775 | unknown | unknown | | |
| 2 | 1071 | 849 | 923 | 779 | 144 | 140 |
| 3 | 2068 | 814 | 997 | 853 | 144 | 140 |
| 4 | 3030 | 905 | 962 | 818 | 144 | 140 |
| 5 | 4083 | 904 | 1053 | 909 | 144 | 140 |
| 6 | 5135 | 910 | 1052 | 908 | 144 | 140 |
| 7 | 6193 | 937 | 1058 | 914 | 144 | 140 |
| 8 | 7278 | 947 | 1085 | 941 | 144 | 140 |
| 9 | 8373 | 947 | 1095 | 951 | 144 | 140 |
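
The derived columns are plain arithmetic over the usage numbers you reported, for example:

```python
# Recomputing the derived columns from the reported usage numbers.
input_tokens  = [148, 1071, 2068, 3030, 4083, 5135, 6193, 7278, 8373]
output_tokens = [775,  849,  814,  905,  904,  910,  937,  947,  947]

for turn in range(1, len(input_tokens)):
    delta = input_tokens[turn] - input_tokens[turn - 1]    # Δ input vs prev
    prev_assistant = output_tokens[turn - 1] + 4            # previous reply + 4-token wrapper
    newest_prompt = delta - prev_assistant                  # newest prompt, wrapper + content
    print(turn + 1, delta, prev_assistant, newest_prompt, newest_prompt - 4)
```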

The conversation state continues to be fed back into the model on every turn. Here's an example with understandable, traceable figures:

Turn 1 input:

system: 96 tokens + 4 token overhead = 100
user 1: 146 tokens + 4 token overhead = 150
prompt: 3 tokens
TOTAL: 253

Turn 1 output:

assistant: 46 tokens output

Turn 2:

system: 96 tokens + 4 token overhead = 100
user 1: 146 tokens + 4 token overhead = 150
assistant 1: the 46 tokens generated before + 4 token overhead = add 50
user 2: 46 tokens of new prompt text + 4 token overhead = add 50
prompt: 3 tokens
TOTAL: 353

Turn 3:

Keep piling on new messages, with no length management offered to you: your only choices are getting an error, or eventually having some messages dropped once you hit the model's maximum input (which can be a million tokens).
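
Under this accounting (a 4-token wrapper per message plus a 3-token final prompt; the exact overhead can vary by model), a rough next-turn input estimator looks like this sketch:

```python
# Rough estimate of the next request's input tokens from the running history:
# every message costs its content plus a 4-token wrapper, and each request
# ends with a 3-token prompt for the assistant to answer.
def estimate_input_tokens(message_token_counts: list[int]) -> int:
    return sum(count + 4 for count in message_token_counts) + 3

print(estimate_input_tokens([96, 146]))          # Turn 1: system + user 1 -> 253
print(estimate_input_tokens([96, 146, 46, 46]))  # Turn 2: + assistant 1 + user 2 -> 353
```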


Note: the only party encouraging you to use this server-side state is OpenAI. You are locking yourself into their platform, trusting them to maintain your data and not lock you out of or ban your organization account. That is besides the fact that it places no limit on the length of the recurring chat input.