Token limits gpt-4o-mini-2024-07-18

I work with the gpt-4o-mini-2024-07-18 model. I want to send a large amount of text for analysis and unification, but I can't figure out what the token limits are when sending it via the API. For example, can I:

  • send 100,000 tokens as input?
  • receive 20,000 tokens as output?

Welcome to the community, @olgak007!

It depends on your usage tier. You can find out more here…

https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-two
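If you want to check your own tier's limits programmatically, every API response carries rate-limit headers. A minimal sketch with the openai Python SDK (header names per the rate-limits guide; the model string is just an example):

```python
# Sketch: reading the per-tier rate limits (RPM/TPM) from response headers.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini-2024-07-18",
    messages=[{"role": "user", "content": "ping"}],
)
print(raw.headers.get("x-ratelimit-limit-tokens"))      # your tier's tokens/min
print(raw.headers.get("x-ratelimit-remaining-tokens"))  # left in this window
completion = raw.parse()  # the normal parsed ChatCompletion object
```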

I've seen these limits. Thanks!
What I don't understand is the maximum size of a single request for input and output. Do I understand correctly:
Maximum input: 128,000 tokens
Maximum output: 16,384 tokens
Total for input and output: 128,000 + 16,384 = 144,384
https://platform.openai.com/docs/models#gpt-4o-mini
Or does using the API impose additional restrictions? For example, no more than 4,000 tokens in total for input and output?

The 128,000 count (125k, in fact) is the total combined input and output. It is the model's context window length: the inference memory that holds both the placed input and the continuation output the model forms. So if you were to allow 3k tokens for a response (a length typical of what the model will produce before the output becomes questionable or drifts against its training), you'd have about 122k of input space left.
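As a rough sketch of that budgeting arithmetic (using the ~125k usable figure above, and tiktoken's o200k_base encoding, which the gpt-4o family uses):

```python
# Sketch: input budget = context window minus tokens reserved for the output.
import tiktoken

CONTEXT_WINDOW = 125_000   # the ~125k usable figure; input AND output share it
RESPONSE_BUDGET = 3_000    # tokens reserved for the reply

enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o family

def fits(text: str) -> bool:
    """Rough check; real requests also spend a few tokens on message framing."""
    return len(enc.encode(text)) <= CONTEXT_WINDOW - RESPONSE_BUDGET

print(CONTEXT_WINDOW - RESPONSE_BUDGET)  # 122000 tokens left for input
```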

In practice, such a large input is not followed well. The attention mechanism works more like retrieval, extracting only certain facts at a time, and reinforcement learning on chat styles means the model focuses on the initial and final messages, since those are what delivered reward during training.
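That's why, for a task like yours, it usually works better to split the document and process it in pieces. A sketch (the chunk size here is an arbitrary assumption, not a documented limit):

```python
# Sketch: split a large document into token-bounded chunks instead of
# one giant prompt, since very long inputs tend to be followed poorly.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def chunk_by_tokens(text: str, max_tokens: int = 8_000) -> list[str]:
    """Split text into consecutive pieces of at most max_tokens tokens each."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]
```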

The output limit is a hard model cap where generation is absolutely cut off, and you can't specify a larger cutoff. I've never approached that value without the output being terminated; the AI is not trained for writing or rewriting book chapters. The output being formed also acts as more input for the next token, generated recursively one at a time.
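You can detect that hard cutoff in the response and ask the model to pick up where it stopped. A hedged sketch with the Python SDK (the prompt and cap values are placeholders):

```python
# Sketch: finish_reason == "length" means the output hit the cap mid-thought.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Rewrite this long text: ..."}]

resp = client.chat.completions.create(
    model="gpt-4o-mini-2024-07-18",
    messages=messages,
    max_completion_tokens=4_096,
)
choice = resp.choices[0]

if choice.finish_reason == "length":
    # Cut off at the cap: resend everything plus the partial answer and
    # ask for the rest (the model itself keeps no state between calls).
    messages += [
        {"role": "assistant", "content": choice.message.content},
        {"role": "user", "content": "Continue exactly where you stopped."},
    ]
    resp = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",
        messages=messages,
        max_completion_tokens=4_096,
    )
```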

Don't think of an API request as a "packet". Think of it as loading that input context into the otherwise stateless model, then running the generation of the language that appears after it (the assistant response) until the AI has finished its thought or followed the instruction and decides to stop, or until your max_completion_tokens value is reached.
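Put concretely, a request is just one stateless pass. A minimal sketch (the system prompt and budget are illustrative):

```python
# Sketch: one stateless pass: load the input context, then generate until
# the model decides to stop or max_completion_tokens is reached.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini-2024-07-18",
    messages=[
        {"role": "system", "content": "You unify terminology across texts."},
        {"role": "user", "content": "<your large text here>"},
    ],
    max_completion_tokens=3_000,  # the response budget reserved earlier
)

# usage shows how the shared context window was actually spent
print(resp.usage.prompt_tokens, resp.usage.completion_tokens)
```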