Spending limits per user or per API key

As far as I can tell, there is no way to set a spend limit per API key, or per user. If I’m wrong, please correct me.

Use case: in an organisation with a large number of members, any single member can significantly overspend by mistake. One solution is to keep the organisation’s spend limit low. However, this is impractical if the limit needs to be high for other members of the organisation.

The only solution (as far as I can tell) is to create multiple organisations, which seems highly suboptimal.


A potential solution is to build an API middleware that redirects API requests to OpenAI and records usage. Please check openai-proxy: Control the usage for each api key (github.com)
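To make the middleware idea concrete, here is a minimal sketch of the bookkeeping such a proxy might do before and after forwarding each request. It is not taken from openai-proxy; the names `UsageLedger`, `KeyBudget`, and `BudgetExceeded` are hypothetical, and the per-1k-token prices are placeholders the caller supplies, not real OpenAI prices.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a key has already reached its spend limit."""


@dataclass
class KeyBudget:
    limit_usd: float        # hypothetical per-key cap set by the org admin
    spent_usd: float = 0.0  # running total recorded by the proxy


@dataclass
class UsageLedger:
    budgets: dict = field(default_factory=dict)

    def set_limit(self, key_id, limit_usd):
        self.budgets[key_id] = KeyBudget(limit_usd)

    def check(self, key_id):
        # Call before forwarding a request upstream.
        budget = self.budgets[key_id]
        if budget.spent_usd >= budget.limit_usd:
            raise BudgetExceeded(key_id)

    def record(self, key_id, prompt_tokens, completion_tokens,
               price_in_per_1k, price_out_per_1k):
        # Call after the upstream response reports its token usage.
        cost = (prompt_tokens / 1000 * price_in_per_1k
                + completion_tokens / 1000 * price_out_per_1k)
        self.budgets[key_id].spent_usd += cost
        return cost
```

Because the check happens before the request and the cost is only known afterwards, a key can overshoot its cap by at most one request. A stricter proxy would also cap `max_tokens` per request.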

Hi there, I’m building a product that lets you limit per-user usage (API calls) and track more advanced LLM-specific user analytics. I’m not sure it would fit your problem exactly, but I can take it into account during development! There’s more info at llmetrics.app if you are interested.


That sounds like a useful project!

Great. It is an important need until OpenAI actually makes it happen. We also worked on one (basically just the middleware every piece of software requires), though the pricing logic is difficult because 1. users call GPT-4 in streaming mode or in chat, and 2. users abort or stop requests early.
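The two difficulties mentioned here (streaming, and requests the user stops early) can be sketched as follows. This is only an illustration under stated assumptions: `count_tokens` is a stand-in that splits on whitespace, where a real proxy would use the model’s tokenizer, and `meter_stream` and `should_abort` are hypothetical names. The sketch meters only what was relayed to the client; whether that matches what the upstream bills for an aborted request is not something it knows.

```python
def count_tokens(text):
    # Stand-in tokenizer (assumption): whitespace split.
    # A real proxy would use the tokenizer that matches the model.
    return len(text.split())


def meter_stream(chunks, should_abort=lambda index: False):
    """Relay streamed text chunks and count tokens for what was delivered.

    chunks: iterable of text pieces as they arrive from upstream.
    should_abort: hypothetical callback; returns True once the client
    has disconnected or pressed stop.
    """
    delivered = []
    for index, chunk in enumerate(chunks):
        if should_abort(index):
            break  # stop relaying; only earlier chunks are metered
        delivered.append(chunk)
    text = "".join(delivered)
    return text, count_tokens(text)
```

For example, if a five-chunk response is stopped before the third chunk, only the first two chunks are counted.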

Currently, I am hitting rate limit errors because GPT-4 has a 10k TPM limit and my colleague is using the same OpenAI key. To overcome this, suppose I create multiple organizations, say 5 organizations for 5 users. Then in Org A, Max will use key A, and in Org B, Jax will use key B, so A and B will each have their own separate rate limit. Am I on the right track? Please let me know.

You may do better by funding your account with more credits to get to the next usage limit tier, and consolidating your spending to reflect your importance. You can read about tiers by following the link within your account’s “limits” page.

Organizations don’t just come about at the press of a button. You get one per account (or another ‘personal’ one that might have been created for legacy business accounts), and the only way to add another is to contact OpenAI, usually only when your organization is gone after being taken over by another owner.

The product I am working on will be used by multiple users, and every user will upload PDF documents that might have 400 to 500 pages. Suppose that at 14:25, 5 users upload PDFs. Because of the 10k TPM limit for the GPT-4 model, the API will show a rate-limit warning and go into retry mode once the limit for that minute is reached, so the next user’s request will also go into retry mode. How can I overcome this problem? Can you suggest a solution?
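One client-side way to avoid bouncing off the limit is to pace requests yourself with a token bucket shared by all users of the backend. The sketch below is a generic token bucket, not an OpenAI feature; `TokenBudgetLimiter` is a hypothetical name, and the per-request token counts are assumed to be estimated by the caller before sending.

```python
import time


class TokenBudgetLimiter:
    """Token bucket: allows at most tokens_per_minute tokens per minute on average."""

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.capacity = float(tokens_per_minute)
        self.available = float(tokens_per_minute)
        self.rate = tokens_per_minute / 60.0  # tokens refilled per second
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, tokens):
        """Seconds until a request of this size fits; 0.0 if it fits now."""
        if tokens > self.capacity:
            raise ValueError("request exceeds the per-minute budget; split it")
        self._refill()
        if tokens <= self.available:
            return 0.0
        return (tokens - self.available) / self.rate

    def consume(self, tokens):
        """Reserve tokens; returns False if the caller should wait and retry."""
        if self.wait_time(tokens) > 0:
            return False
        self.available -= tokens
        return True
```

A worker would call `wait_time(n)`, sleep for that long, then call `consume(n)` before sending the request, so uploads from several users queue up instead of all failing in the same minute.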

Would you like your rate limits to instead look something like this?

| Model | Token limits | Request and other limits |
|---|---|---|
| gpt-3.5-turbo | 160,000 TPM | 5,000 RPM |
| gpt-4 | 80,000 TPM | 5,000 RPM |
| gpt-4-1106-preview | 300,000 TPM / 5,000,000 TPD | 5,000 RPM |

Then the answer from OpenAI is “pay up front”.

| Tier | Qualification | GPT-4 TPM |
|---|---|---|
| Tier 1 | $5 paid | 10k |
| Tier 2 | $50 paid and 7+ days since first successful payment | 40k |
| Tier 3 | $100 paid and 7+ days since first successful payment | 80k |
| Tier 4 | $250 paid and 14+ days since first successful payment | 300k |

At the very bottom of your rate limits account page, though, is a request form for an increase outside of the tier system. A doubling is pre-filled for you, which is probably what they are set up to approve, and there is no guidance on what you should fill in. Provide information about your company, users, use case, and escalating use… and why prepay isn’t an option.

Let me briefly describe my case. Let’s say there are 5 users, each with 100 PDFs. Everyone uploads a PDF in the same minute, and in the backend those PDFs go to a custom function that I developed. Processing happens in parallel because 5 users are calling my function, so 5 PDFs are converted to embeddings in the same minute. At this point, will I get a rate limit error because GPT-4 has a 10k TPM limit?

If so, what would you suggest?
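A rough back-of-the-envelope estimate helps here. The sketch below assumes 450 pages per PDF (the middle of the 400 to 500 range mentioned earlier) and about 500 tokens per page, which is a guess and not a measured figure. The TPM values are the GPT-4 tier limits quoted above; the limit that actually applies is that of whichever model the pipeline calls.

```python
import math


def estimate_tokens(num_docs, pages_per_doc, tokens_per_page):
    # Total tokens to push through the API for one batch of uploads.
    return num_docs * pages_per_doc * tokens_per_page


def minutes_to_process(total_tokens, tokens_per_minute):
    # Lower bound on wall-clock minutes when throttled by a TPM limit.
    return math.ceil(total_tokens / tokens_per_minute)


# Assumed workload: 5 PDFs x 450 pages x ~500 tokens per page.
total = estimate_tokens(num_docs=5, pages_per_doc=450, tokens_per_page=500)
for tpm in (10_000, 80_000, 300_000):
    print(tpm, "TPM ->", minutes_to_process(total, tpm), "minutes")
```

Under these assumptions the batch is 1,125,000 tokens, so a 10k TPM limit is exceeded within the first minute and the work has to be spread over roughly 113 minutes, compared with about 15 minutes at 80k TPM and 4 minutes at 300k TPM.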

It is impossible to control spending; it is made for that purpose. I have been waiting for them to respond about the excessive consumption on my API keys, which I generate every morning and leave untouched, and, surprise, the entire balance is consumed until I reach the limit.
Personally, I don’t think it looks good.
Customer support is nonexistent.
But when I post this on the forum, the administrators delete it instantly.

It will not work.
Something strange is happening, and it seems that OpenAI knows it,
but they don’t do anything.
API consumption does not correspond to real consumption.
After investigating repeatedly, I found that the API keys are consumed on their own, without being used.
I have sent about 10 messages to OpenAI but have not received a response, and when I post this on the forum they delete my messages quickly.

You have received a response from OpenAI on this forum.

When I query an index built with GPTVectorStoreIndex, I see the following in the terminal while the query runs:

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4677 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 30 tokens

Now, I understand that the total embedding token usage is 30 tokens because that is for converting my query to embeddings. But my output is just a date or an entity name like Jack Ryan, so why is the total LLM token usage 4677?
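One likely explanation is that LLM token usage counts the whole prompt, not only the answer: in a retrieval query the prompt carries the retrieved document chunks plus the prompt template and the question, so a one-word answer can still cost thousands of tokens. The sketch below only shows that arithmetic. The function name and all numbers are illustrative assumptions, not a breakdown of the 4677 figure in the log.

```python
def llm_tokens_for_query(context_chunk_tokens, template_tokens,
                         question_tokens, answer_tokens):
    # Prompt = retrieved chunks + prompt template + the user's question.
    prompt_tokens = (sum(context_chunk_tokens)
                     + template_tokens + question_tokens)
    # Reported usage = prompt tokens + completion (answer) tokens.
    return prompt_tokens + answer_tokens


# Illustrative numbers only: two retrieved chunks of 1,000 tokens each,
# a 50-token template, a 30-token question, and a 5-token answer.
print(llm_tokens_for_query([1000, 1000], 50, 30, 5))
```

If this is what is happening, retrieving fewer or smaller chunks per query would lower the count.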