Assistant Started Hitting TPM Limit With No Changes to Implementation

Hi all! I’ve been working on a RAG app that takes a ticket, then reads through some documents to find the proper billing code. It had been working well, but yesterday I stopped getting responses on one of my test accounts. I looked up the thread and found that I was now getting this error:
Run failed

Rate limit reached for gpt-4-turbo-preview in organization org-[org-id] on tokens per min (TPM): Limit 30000, Used 4811, Requested 28049. Please try again in 5.72s. Visit https://platform.openai.com/account/rate-limits to learn more.
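
The numbers in that message can be checked directly. Below is a minimal sketch that parses the quoted text and works out how far over the limit the request was. The regex and the `parse_rate_limit` name are illustrative assumptions, not an official format or API, and the wait estimate assumes the per-minute budget refills evenly at limit / 60 tokens per second.

```python
import re

# Illustrative pattern for the fields in the error text quoted above
# (an assumption based on this one message, not a documented format).
RATE_LIMIT_RE = re.compile(
    r"Limit (?P<limit>\d+), Used (?P<used>\d+), Requested (?P<requested>\d+)\. "
    r"Please try again in (?P<retry>[\d.]+)s"
)


def parse_rate_limit(message: str) -> dict:
    """Extract the TPM figures and compute how far over the limit the request is."""
    match = RATE_LIMIT_RE.search(message)
    if match is None:
        raise ValueError("message does not look like a TPM rate-limit error")
    limit = int(match["limit"])
    used = int(match["used"])
    requested = int(match["requested"])
    overage = used + requested - limit  # tokens beyond the per-minute budget
    return {
        "limit": limit,
        "used": used,
        "requested": requested,
        "overage": overage,
        # Assumed refill model: limit / 60 tokens become available each second.
        "estimated_wait_s": overage / (limit / 60),
        "reported_wait_s": float(match["retry"]),
    }


message = (
    "Rate limit reached for gpt-4-turbo-preview in organization org-[org-id] "
    "on tokens per min (TPM): Limit 30000, Used 4811, Requested 28049. "
    "Please try again in 5.72s."
)
info = parse_rate_limit(message)
print(info["overage"], info["estimated_wait_s"])
```

Here 4811 + 28049 = 32860, which is 2860 tokens over the 30000 limit, and 2860 / 500 tokens per second matches the reported 5.72s wait. The single run alone requested 28049 tokens, nearly the whole per-minute budget.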

I hadn’t changed anything on my implementation and it is only broken on a few threads, so I am confused as to why this is happening.

For clarity, the prompt for the test is: “Explain [billing code]”. If I just ask something that doesn’t require a lookup, e.g. “Hello!”, it seems to work as expected.

Has anyone else run into this? Do I need to stop reusing threads for a user, or avoid using them for so long? I thought the docs said threads were auto-trimmed, but maybe I’m mistaken.

Thanks for any help or clarification!

Additional Note

My implementation is:

  1. The user clicks a code button.
  2. The app automatically creates a message of “Explain the [code number/name] code”.
  3. I append the message to the existing thread.
  4. I run the thread.
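
The four steps above can be sketched roughly as follows. This is a minimal sketch, assuming the OpenAI Python SDK's `client.beta.threads` interface; the `explain_code` function and its `thread_id`, `assistant_id`, and `code` parameters are hypothetical names for illustration, not the actual implementation.

```python
def explain_code(client, thread_id: str, assistant_id: str, code: str):
    """Append an 'Explain ...' message to an existing thread and start a run.

    `client` is assumed to be an OpenAI SDK client (e.g. `openai.OpenAI()`);
    all other names here are hypothetical.
    """
    # Step 2: the app builds the message text for the clicked code button.
    prompt = f"Explain the {code} code"

    # Step 3: append the message to the user's existing thread.
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=prompt,
    )

    # Step 4: run the thread with the assistant.
    return client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
    )
```

Because every click appends to the same thread, the history a run may draw on grows over time, which could plausibly explain why only some long-lived threads hit the limit while a fresh “Hello!” does not.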