I am working on a chatbot for which I need a large context window. Since gpt-4(o) has a context window of 128,000 tokens, I decided to use it.
Now I have run into the problem that I often get error messages like this:
RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o in organization org-3cH9ytW9RJ5R0ZJMUH7cfDSj on tokens per min (TPM): Limit 30000, Requested 43385. The input or output tokens must be reduced in order to run successfully.
It took me a while to figure out what the problem is. Apparently, my organization is subject to a rate limit of 30,000 tokens per minute (TPM) for the gpt-4o model, and this TPM limit is entirely separate from the model's context length of 128,000 tokens.
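For context, this is roughly how the error surfaces in my code. A minimal sketch using the official openai Python package; `long_history` is just a placeholder name for my chatbot's accumulated conversation:

```python
from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder for my chatbot's accumulated conversation history;
# in practice it easily grows past 40,000 tokens.
long_history = [{"role": "user", "content": "..."}]

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=long_history,
    )
except RateLimitError as e:
    # HTTP 429: the single request already exceeds the per-minute
    # token budget, even though it would fit comfortably into the
    # 128,000-token context window.
    print(e)
```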
And here is my question: even though the TPM limit and the context length are two different things, doesn't the TPM limit effectively cap the usable context at 30,000 tokens when using gpt-4(o) via the API?
My reasoning is that I can never put more than 30,000 tokens into the model's context window (in fact fewer, since the tokens generated in the answer presumably also count toward the TPM limit), because any larger single request already exceeds the TPM limit and gets rejected.
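To make the arithmetic concrete, here is a small sketch of how I now estimate whether a request can fit under the limit at all before sending it (assuming a recent tiktoken version that ships the o200k_base encoding used by gpt-4o; `TPM_LIMIT`, `OUTPUT_BUDGET`, and `fits_under_tpm` are my own names):

```python
import tiktoken

TPM_LIMIT = 30_000     # my organization's TPM limit for gpt-4o
OUTPUT_BUDGET = 1_000  # hypothetical allowance for the model's answer

enc = tiktoken.get_encoding("o200k_base")  # the encoding gpt-4o uses

def fits_under_tpm(messages: list[dict]) -> bool:
    # Rough estimate: sum the content tokens of all messages.
    # The true count is slightly higher due to per-message overhead.
    prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
    return prompt_tokens + OUTPUT_BUDGET <= TPM_LIMIT
```

With the 43,385 tokens from the error above, this check fails straight away: no matter how I split the budget between prompt and answer, the sum can never stay below 30,000.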
If my thinking here is correct, isn't it somewhat misleading of OpenAI to advertise a context length of 128,000 tokens for gpt-4(o) when the TPM limit makes it impossible to actually use that much of it?
Thank you for your insights, and best wishes from Germany.