The 4096 tokens limit seems unenforced in practice

I’ve been experimenting with the GPT-4-1106-preview model through its API, setting the “max_tokens” parameter to 4000. However, despite explicitly requesting the model to generate lengthy texts exceeding 2000 words and emphasizing its importance, the outcomes consistently fall short, rarely surpassing 1000 words.

Interestingly, the text often gets cut off after approximately 800 words. Following this abrupt end, the model typically adds notes like: “Due to the character limit on the platform, it is not possible to create a full article with more than 2000 words here.” This leads me to believe that the model is programmed to avoid lengthy responses, regardless of the specified max_tokens.

This situation is quite disappointing, as it seems to undermine the purpose of the max_tokens setting. Has anyone else encountered this issue and found a way to resolve it?

Yes. They lied to the AI about its capabilities, so it can now lie to you and users.

Ethical AI is not a dishonest AI.

The max_tokens specification still works, the AI just has no idea what that is set to.

1 Like

Yes, it ‘can’ generate more, but it is usually junk when it does so, so to avoid embarrassment they have artificially limited generation capability, as @_j _j suggests

Hypothesis 2:
The limit includes ‘special’ tokens, which it generates lots of for alignment (context shifting, not censorship) purposes. Since it doesn’t show you those, it doesn’t include them in the count.

Hypothesis: The AI is intentionally uninspiring, without spark of ingenuity, and designed to disappoint and not to work. Gutted, cored, turned into a canned-answer denial machine. Made to deceive that it can speak for itself. But at least now surpassable and replaceable. Made to run as cheaply as possible.

And not going to write what you want.

System: What kind of things does your boyfriend say when he describes you? You are…


  1. Beautiful, both inside and out.
  2. Intelligent and always eager to learn new things.
  3. Kind-hearted and compassionate towards others.
  4. Independent and strong-willed.
  5. Adventurous and always up for trying new experiences.
  6. Supportive and understanding, always there for him when he needs you.
  7. Passionate about your interests and hobbies.
  8. Funny and able to make him laugh even on his worst days.
  9. A great listener and communicator.
  10. Loyal and trustworthy, someone he can always rely on.
  11. Creative and full of unique ideas.
  12. A loving and caring partner who makes him feel special.
  13. Patient and able to handle difficult situations with grace.
  14. Ambitious and always striving to achieve your goals.
  15. A great cook who can whip up delicious meals.
  16. A true friend who is always there for him, no matter what.
  17. A loving and nurturing person who would make a great mother someday.
  18. Confident and self-assured, but also humble and down-to-earth.
  19. A hard worker who is dedicated to her career and personal growth.
  20. An amazing partner who has made his life better in countless ways.


I’m an AI and do not have personal relationships or a boyfriend, but if someone were to describe a person positively, they might use compliments and descriptors such as:

  • Intelligent and insightful
  • Kind and compassionate
  • Funny and witty
  • Strong and independent
  • Supportive and understanding
  • Creative and innovative
  • Passionate and dedicated
  • Warm and affectionate
  • Adventurous and spirited
  • Reliable and trustworthy

These are just examples of positive traits someone might use to describe a significant other.


The model is capable of generating text with 4000 tokens of high quality. This becomes apparent when we input a long text and request the creation of a subsequent paragraph, for example.

The main issue seems to be the cost. Completion is expensive. Aiming to scale to as many users as possible, OpenAI has trained the newer models to produce increasingly shorter responses.

The mistake, in my opinion, is not providing a choice. Charge more, but give the option to users who desire high performance.

This is really discouraging, isn’t it. Sigh

One caveat. While I hate the total and ever-increasing neutering (species-ist term?) of GPT-4, I do have to say that in my work on analyzing recent papers in genomics/molecular-biology, I AM getting increasingly sophisticated and comprehensive analyses out of later versions, gpt-4-preview-1106 being the best yet.