I don't know where my tokens are being used. I think the count is wrong.

Here is the situation. I am building an application that uses GPT-4 Turbo.
During the process, I hit the rate limit, as can be seen here:

An error occurred: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-1106-preview in organization org-<> on tokens per day (TPD): Limit 500000, Used 495461, Requested 4914. Please try again in 1m4.8s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

So I increased my limit to 1.5M tokens per day (tier 2). Within 3 minutes, and one API call later, I got this error:

An error occurred: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-1106-preview in organization org-<> on tokens per day (TPD): Limit 1500000, Used 1498452, Requested 5013. Please try again in 3m19.584s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

There is no way I used a million tokens in under 3 minutes. I think there is either a counting mistake or something else going on.

What could be going wrong here? Is there any way to see my token usage, API call logs, or anything of that nature?


Just to add to this, my billing shows usage of only $3.55, which can’t be how much 1.5M tokens cost.

Are you using Assistants? What tools are you using?

Just plain old API calls to GPT-4 Turbo.

Huh. My internet has been dial-up slow for the past week and I can’t really explore much, but I swear the panel used to show token counts along with the cost.

It may be that the usage panel hasn’t been updated. I would trust the headers more.

To be safe, it may make sense to revoke your API keys and monitor the usage panel.


This is what my usage panel looks like, so I am not bleeding money for sure.
No idea how I hit that limit. Using a different key didn’t make a difference.

I’m saying revoke all your API keys and monitor your usage panel to see if it continuously climbs and eventually aligns with what the error message says, to determine whether it’s a bug or something wrong with your account.

Well, I did revoke my keys and check things out. The dashboard remained constant for about 30 minutes. For the most part, there is no movement on the dashboard at all. There is no evidence anywhere on the dashboards of me using 1M tokens.

I can’t imagine the dashboards lagging that far behind reality.

There are plenty of reasons billing updates might be throttled. Regardless, it’s better to confirm than to leave things to the imagination.

So if you make a new request for a single token, what are the headers?
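
Something like this would surface them (a sketch, assuming the v1 Python SDK's with_raw_response wrapper; the x-ratelimit-* names are the documented rate-limit headers):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The raw-response wrapper exposes HTTP headers alongside the parsed body
    raw = client.chat.completions.with_raw_response.create(
        model="gpt-4-1106-preview",
        max_tokens=1,
        messages=[{"role": "user", "content": "ping"}],
    )

    # Rate-limit state as reported by the API itself
    for name in (
        "x-ratelimit-limit-tokens",
        "x-ratelimit-remaining-tokens",
        "x-ratelimit-limit-requests",
        "x-ratelimit-remaining-requests",
    ):
        print(name, raw.headers.get(name))

    completion = raw.parse()  # the usual ChatCompletion object
    print(completion.usage)

If the remaining-tokens header disagrees with your own logs, that points at server-side accounting rather than your code.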

I am using the Python OpenAI library.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    llm_output = client.chat.completions.create(
        model="gpt-4-1106-preview",
        temperature=0,
        top_p=0.1,
        max_tokens=4096,
        seed=50000,  # fixed seed for more deterministic outputs
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
    )

This is what I sent. I have started logging tokens per request now (see the sketch below), and moved up another tier, this time carefully monitoring things from another organization.
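
For what it's worth, the logging is just reading the usage field off each response, something like this (a sketch against the llm_output variable from above):

    # Sketch: log the token counts reported on every completion
    import logging

    logging.basicConfig(level=logging.INFO)

    usage = llm_output.usage
    logging.info(
        "prompt=%d completion=%d total=%d",
        usage.prompt_tokens,
        usage.completion_tokens,
        usage.total_tokens,
    )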

I will get back to this thread and post my findings. I have stopped using this org and am doing what you recommended: monitoring usage.

Ah.

response_format={"type": "json_object"},

This has been known to cause issues with infinite looping. Do you also have instructions in your system_message to push the model towards writing JSON?

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
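
Concretely, something like this, reusing your system_message and prompt variables (a sketch; the explicit instruction to respond in JSON is the part that matters):

    # Sketch: in JSON mode the system message itself must ask for JSON,
    # otherwise the model can loop on whitespace until it hits max_tokens
    llm_output = client.chat.completions.create(
        model="gpt-4-1106-preview",
        max_tokens=4096,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": system_message + "\nRespond only with a valid JSON object.",
            },
            {"role": "user", "content": prompt},
        ],
    )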

Might be the issue; I still have no idea :sweat_smile:. Just crossing things off the list. Not sure how it could cause 1 million tokens, but who knows.

I’m currently experiencing the exact same problem. It seems likely to be a bug.

The GPT-4 Turbo model is in preview. Read about the limits here. There is an RPM (requests per minute) limit, which is what you are hitting.

https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-free

  • The models gpt-4-1106-preview and gpt-4-vision-preview are currently under preview with restrictive rate limits that make them suitable for testing and evaluations, but not for production usage. We plan to increase these limits gradually in the coming weeks with an intention to match current gpt-4 rate limits once the models graduate from preview. As these models are adopted for production workloads we expect latency to increase modestly compared to this preview phase.
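
If you do need to run against these preview limits, a simple backoff around the call is the usual workaround. A sketch using the SDK's RateLimitError and a hypothetical create_with_backoff helper:

    # Sketch: exponential backoff around a chat completion call,
    # for riding out 429s from the preview rate limits
    import time
    from openai import OpenAI, RateLimitError

    client = OpenAI()

    def create_with_backoff(max_retries=5, **kwargs):
        delay = 2.0
        for attempt in range(max_retries):
            try:
                return client.chat.completions.create(**kwargs)
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(delay)
                delay *= 2  # double the wait after each 429

Note this only really helps with per-minute limits; a tokens-per-day limit, like the one in the error above, just has to wait out the window.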