We are using the Completions API with streaming and occasionally hit the maximum limit of 4096 tokens. We are currently at Usage Tier 3. To make sure I understand the limit, let me ask a few questions:
- The TPM limit set on our account applies to all API calls made within a minute, right?
- So if we issue only one request in a minute, but the combined token count of the prompt and the response exceeds 4096, we will get “finish_reason=length”, which means we hit the limit. Is that correct?
- Is the limit the same for the Assistants API? If so, and the response is so long that it exceeds 4096 tokens, does that mean a single run can produce more than one message, which together make up the response to our single prompt?
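
For context, here is a minimal sketch of how the cutoff described above could be detected on a streamed response. It assumes each chunk is a plain dict shaped like a chat-completions streaming chunk (`choices[].delta.content` and `choices[].finish_reason`); the official SDK returns objects rather than dicts, so attribute access would be needed there. The function name `collect_stream` and the sample chunks are hypothetical and only illustrate the idea, with no network call involved.

```python
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Concatenate streamed text and return it with the last finish_reason seen."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            # Each chunk carries an incremental piece of text in "delta".
            delta = choice.get("delta") or {}
            content = delta.get("content")
            if content:
                parts.append(content)
            # finish_reason stays None until the final chunk of the choice.
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Hypothetical chunks standing in for a real stream (assumed shape).
sample_chunks = [
    {"choices": [{"delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": ", world"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "length"}]},
]

text, reason = collect_stream(sample_chunks)
# "length" marks a response that was cut off by a token limit.
was_truncated = reason == "length"
```

With a check like this, a truncated response can be told apart from one that ended normally (`finish_reason` of `"stop"`), which is the case we are trying to understand.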