Maximum tokens limit per request, also applicable to the Assistant API?

vonv · April 12, 2024, 8:05am

We are using the Completions API with streaming and occasionally get the max limit of 4096. We are currently at Usage Tier 3. To make I understand the limit, let me ask a few questions:

The TPM limit set in our account is applicable to all API calls within a minute right?
So if we only issue one request in a minute but the combined token count of the prompt and the response exceeds 4096 then we will get the “finish_reason=length” that means we hit the limit, is that correct?
Is the limit the same for the Assistant API? If so, if the response will be too long that it will exceed 4096, does it mean that for a single run we can get more then one message that collectively represents the response to our single prompt?

Topic		Replies	Views
Token Rate limit estimation clarification API	0	762	December 14, 2023
Doubt on prompt tokens and completion tokens API api	2	1774	April 18, 2024
OpenAI Assistant maximum token per Thread API gpt-4-turbo	11	11956	May 28, 2024
Why is gpt-3.5-turbo-1106 max_tokens limited to 4096? API	3	14345	January 11, 2024
How to understand new model limits? turbo API	3	1444	March 9, 2024

Maximum tokens limit per request, also applicable to the Assistant API?

Related topics