TPM limit hit when executing a thread

My company's organization has a 30,000 TPM limit. When I upload a file (around 6,000 tokens) to a thread, the thread hits the TPM limit after 10-12 queries, and even after waiting a long time, rerunning the thread with a new query still hits it. That suggests the thread itself now requires more than 30,000 tokens to be executed. How do I solve this? I am building a data analysis chatbot for businesses to use, so a thread becoming useless is a real problem. When executing a query, I also add an additional prompt/rule in the backend. What can be a solution to this, other than increasing the TPM limit?


The Assistants API can consume additional tokens retrieving data from files you have uploaded. The amount depends on the prompt and the nature of the stored data, so there is no general way to predict it.

The best solution is to load more credit onto the account to raise the rate limit, or perhaps implement your own RAG pipeline so you can control token usage directly.

The Assistants API saves a great deal of additional code, but internally it is somewhat of a black box with regard to token usage.
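If you do roll your own pipeline, one way to keep token usage bounded is to trim the history to a budget before each request. A minimal sketch, using a crude ~4-characters-per-token estimate (all names here are illustrative, not part of any SDK; use a real tokenizer like tiktoken for exact counts):

```python
# Keep a thread's history under a token budget before each request.
# estimate_tokens is a rough heuristic, not an exact tokenizer.

def estimate_tokens(text: str) -> int:
    """Very rough estimate: English text averages ~4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = [
    {"role": "user", "content": "long question " * 200},
    {"role": "assistant", "content": "long answer " * 200},
    {"role": "user", "content": "latest short question"},
]
trimmed = trim_history(history, budget=500)
```

Dropping the oldest turns first keeps the latest question intact; you could also summarize old turns instead of dropping them.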


Welcome to the dev forum @AyushAditya

The 30,000 TPM limit is the Tier-1 limit for gpt-4o models.

There are two ways to go from here:

  1. Upgrade the organization’s tier.
  2. Use a model with higher TPM limits, like gpt-4o-mini.

Also welcome @sps :smiley: I think you clicked the wrong name :rofl:


Fixed it :laughing: Always nice to see you here @Foxalabs


Thank you, :smiley:
Is there any way to optimize the thread by reducing the history, or anything like that?

If you’re using the file_search tool, you can control the number of chunks the tool returns and the maximum number of tokens in each chunk.
Here are the docs for more info
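For reference, a sketch of where those knobs sit in the Assistants API, to the best of my reading of the docs (check the current reference for exact parameter names and defaults; no network call is made here, these are just the payloads):

```python
# When adding files to a vector store, a static chunking strategy caps
# chunk size and overlap (both measured in tokens).
chunking_strategy = {
    "type": "static",
    "static": {
        "max_chunk_size_tokens": 400,   # smaller chunks than the default
        "chunk_overlap_tokens": 200,    # overlap must not exceed half the chunk size
    },
}

# On the assistant itself, max_num_results caps how many chunks
# file_search can feed into the model's context.
file_search_tool = {
    "type": "file_search",
    "file_search": {"max_num_results": 5},  # fewer chunks -> fewer tokens per run
}

# These would then be passed to the SDK, roughly:
#   client.vector_stores.files.create(..., chunking_strategy=chunking_strategy)
#   client.beta.assistants.update(..., tools=[file_search_tool])
```

Tightening both settings is the most direct way to shrink the retrieval overhead that is eating into your 30,000 TPM budget.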