I am using the Node.js SDK.
I got a “rate_limit_exceeded” error when running a thread with these settings:
stream: true,
tools: tools,
tool_choice: "required",
parallel_tool_calls: false,
truncation_strategy: {
  type: "last_messages",
  last_messages: 5,
},
After analyzing the logs, I found that after several steps had completed, the error was raised with "prompt_tokens":12125, which is far higher than the "total_tokens":1836 of the previous completed step. This happened mid-stream, so no other prompts were sent to the model.
The strange thing is that, as far as I can tell, everything the run should have produced had already been produced. So instead of raising this error, I think the run should have completed at this point.
This is the progression of the usage through the run:
"usage":{"prompt_tokens":318,"completion_tokens":15,"total_tokens":333,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":362,"completion_tokens":17,"total_tokens":379,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":408,"completion_tokens":155,"total_tokens":563,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":621,"completion_tokens":133,"total_tokens":754,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":783,"completion_tokens":18,"total_tokens":801,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":831,"completion_tokens":120,"total_tokens":951,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":981,"completion_tokens":18,"total_tokens":999,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":1030,"completion_tokens":216,"total_tokens":1246,"prompt_token_details":{"cached_tokens":0}}
"usage":{"prompt_tokens":1277,"completion_tokens":153,"total_tokens":1430,"prompt_token_details":{"cached_tokens":1152}}
"usage":{"prompt_tokens":1459,"completion_tokens":167,"total_tokens":1626,"prompt_token_details":{"cached_tokens":1408}}
"usage":{"prompt_tokens":1653,"completion_tokens":127,"total_tokens":1780,"prompt_token_details":{"cached_tokens":1536}}
"usage":{"prompt_tokens":1812,"completion_tokens":24,"total_tokens":1836,"prompt_token_details":{"cached_tokens":1664}}
"usage":{"prompt_tokens":12125,"completion_tokens":1180,"total_tokens":13305,"prompt_token_details":{"cached_tokens":5760}}
Also, can someone tell me what “cached_tokens” is?
I don’t know if this is a bug or something I don’t understand, but either way I think a “rate_limit_exceeded” event shouldn’t be this fatal. I had no chance to intercept it, perhaps pause the execution for a couple of seconds, and then resume the run.
The run failed and there was no way to resume it, so now I must undo everything it did and start it again, and I will probably hit the same error again!
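For illustration, the pause-and-retry behaviour described above could be sketched roughly as follows. This is only a sketch under assumptions: `runWithRateLimitRetry` and `startRun` are hypothetical names, `startRun` stands in for whatever code creates the run and waits for it to finish, and the check assumes the failed run object exposes `status` and `last_error.code`. It starts a new run after a delay rather than resuming the failed one, so it does not remove the need to undo earlier side effects.

```javascript
// Default delay helper: resolves after `ms` milliseconds.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper (not part of any SDK): calls `startRun` and, if the
// returned run failed with "rate_limit_exceeded", waits and tries again
// with exponential backoff. Any other outcome is returned immediately.
async function runWithRateLimitRetry(
  startRun,
  { maxAttempts = 3, baseDelayMs = 2000, sleep = defaultSleep } = {}
) {
  let run;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // `startRun` is a placeholder: it should create a run and resolve
    // with the final run object once the run has stopped.
    run = await startRun(attempt);

    // Assumption: a rate-limited run reports status "failed" and
    // last_error.code "rate_limit_exceeded".
    const rateLimited =
      run.status === "failed" && run.last_error?.code === "rate_limit_exceeded";

    if (!rateLimited || attempt === maxAttempts) {
      return run;
    }

    // Back off: baseDelayMs, then 2x, 4x, ... before the next attempt.
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  return run;
}
```

With the defaults this waits 2 seconds and then 4 seconds between attempts. The `sleep` option exists only so the delay can be replaced, for example in tests.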
This is all based on my understanding of the whole thing, so if I am wrong about something I would really appreciate it if someone would clarify.
Thanks,
Gado