Sending the same content doesn't reduce x-ratelimit-remaining-tokens

I noticed an interesting issue. If the request body of a chat API call stays exactly the same, each call returns a different assistant response, but the x-ratelimit-remaining-tokens and x-ratelimit-reset-tokens headers stay the same.

Example request body:

{
	"model": "gpt-4",
	"temperature": 0.5,
	"messages": [
		{
			"role": "system",
			"content": "You are an English teacher"
		},
		{
			"role": "user",
			"content": "Check the spelling for this text: This is a test message."
		}
	]
}

In this case the assistant response varies from call to call, and the token counts in the response itself change accordingly. The header values, however, stay the same. For this example, every call returns:

x-ratelimit-remaining-tokens	9961
x-ratelimit-reset-tokens	234ms
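One way to check this more systematically is to send the same body several times in a row and record the two headers from each call. Below is a minimal sketch using only the Python standard library. It assumes the standard chat completions endpoint URL and that the reset header uses duration strings like `234ms` or `6m0s`. The helper names (`send_chat`, `parse_reset`, `rate_limit_snapshots`, `headers_never_change`) are hypothetical.

```python
import json
import re
import urllib.request
from typing import Callable, Dict, List, Tuple

# Assumed endpoint for chat completions.
URL = "https://api.openai.com/v1/chat/completions"

# The request body from the post, sent unchanged on every call.
BODY = {
    "model": "gpt-4",
    "temperature": 0.5,
    "messages": [
        {"role": "system", "content": "You are an English teacher"},
        {"role": "user", "content": "Check the spelling for this text: This is a test message."},
    ],
}


def send_chat(body: dict, api_key: str) -> Dict[str, str]:
    """Hypothetical helper: POST the body once and return the response headers (lower-cased)."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return {k.lower(): v for k, v in resp.headers.items()}


# Assumed format: one or more <number><unit> pairs, e.g. "234ms", "6m0s", "1.5s".
# "ms" is listed before "m" and "s" so it is matched as a single unit.
_DURATION = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def parse_reset(value: str) -> float:
    """Convert a reset duration string to milliseconds."""
    return sum(float(n) * _UNIT_MS[u] for n, u in _DURATION.findall(value))


def rate_limit_snapshots(send: Callable[[], Dict[str, str]], calls: int) -> List[Tuple[int, float]]:
    """Call `send` repeatedly and collect (remaining tokens, reset in ms) from each response."""
    snaps = []
    for _ in range(calls):
        h = send()
        snaps.append((int(h["x-ratelimit-remaining-tokens"]), parse_reset(h["x-ratelimit-reset-tokens"])))
    return snaps


def headers_never_change(snaps: List[Tuple[int, float]]) -> bool:
    """True if every call reported exactly the same header values."""
    return len(set(snaps)) <= 1
```

Usage would look something like `rate_limit_snapshots(lambda: send_chat(BODY, api_key), 5)`, with `api_key` taken from wherever you store it. The sender is passed in as a callable, so the comparison logic can be exercised without network access. A 234ms reset presumably means the consumed tokens are restored almost immediately. For that reason, the calls should be sent back to back; otherwise identical values could simply reflect a refilled budget.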

Identical requests are probably rare in real usage, so the impact is small, but the behavior is still there.
