Trying to understand why I'm hitting the token limit with the API

The limitation seems to come from heavy training, or perhaps even some injected governor, that compels the model to stop writing and wrap up its output. The shortening also looks planned in advance: ask for 40 descriptions, and each one comes out at roughly half the length it would have at 20. Write a prompt that should deterministically turn lines of input into processed lines of output, and you will still be cut off arbitrarily.
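One practical workaround, which sidesteps rather than explains the behavior, is to split a large list request into several smaller ones, so each completion stays below the point where the model starts compressing. A minimal sketch; the prompt wording and the batch size of 10 are arbitrary assumptions, and the resulting prompts would each be sent as a separate API call:

```python
def batch_prompts(items, batch_size=10):
    """Split a long list of items into smaller prompts so the model
    writes full descriptions per batch instead of compressing all 40."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    return [
        "Write one detailed description for each of: " + ", ".join(batch)
        for batch in batches
    ]

items = [f"item-{n}" for n in range(1, 41)]  # the 40 things to describe
prompts = batch_prompts(items)               # 4 requests of 10 items each
```

This costs more round trips and repeats any shared context in every request, but in my experience per-item length is far more stable this way.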

22k tokens of input can be processed (and billed) almost instantly into a hidden state, because prefill runs in parallel thanks to attention masking, but generating the output tokens that follow requires sequential computation per token, which apparently they don't want you to pay for even when you're willing. And through the API you don't get a different model from the one now extensively trained to make ChatGPT less expensive for OpenAI.
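If the cutoff you're seeing is the hard token cap (the API reporting `finish_reason: "length"`) rather than the model wrapping up on its own, you can at least stitch a full answer together by re-requesting a continuation. A sketch under that assumption; `complete` here is a hypothetical caller-supplied function wrapping whatever client you use, returning the generated text and the finish reason:

```python
def generate_full(complete, messages, max_rounds=5):
    """Keep requesting continuations while the model stops on the
    token limit (finish_reason == "length"), up to max_rounds calls."""
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = complete(messages)
        parts.append(text)
        if finish_reason != "length":
            break  # e.g. "stop": the model ended on its own
        # Feed the partial answer back and ask for the rest.
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(parts)
```

Note this doesn't help with the behavior described above: if the model is trained to wrap up, each continuation will also try to wrap up, and you pay for the repeated input context on every round.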