GPT-4 Turbo long response issues (lazy? Restricted to 1xxx tokens?)

Hello everybody,

The problem:

For the past few days, I have been unable to generate long responses (>1500 tokens) with GPT-4 Turbo, regardless of version (1106 or 0125). Previously this worked perfectly: I would specify in the prompt that I wanted an exhaustive response of more than X words or tokens, and it worked without issue.

Today, it is no longer possible to do this with the same prompt. I get error after error (as shown in my screenshots).

Worse still, they seem to have modified the old versions, not just the latest one, which has made hundreds of modules I built at my company completely inoperable.

Why advertise 4096 output tokens if it can only really produce about 1000? Is this some kind of scam?
And it’s not as if the extra tests and tokens beyond 1000 are free …

Any ideas on how to work around this? Has anyone else experienced this?

Additional Information:

  • I am using the OpenAI API to access GPT-4 Turbo (also tested in the Playground).
  • I have tried different prompt formats and settings.
  • I have tried different versions (1106 and 0125).

I would appreciate any help or advice that you can offer.

OpenAI support was completely out of the loop and took days to respond…

Thank you.


What a scam! They announce 4096 output tokens, yet you can’t exceed 1500…


I don’t know how you got there, but I had it complete for me:

assistant: The maximum number of tokens I shall generate in forming a response is
user: continue
real-0125: around 2048 tokens, which typically translates to approximately 750 words, depending on the complexity and structure of the language used. This limit allows me to provide detailed answers, explanations, and narratives across a wide range of topics, from simple queries to more complex discussions. If you have any specific questions or need information on a particular subject, feel free to ask!

Of course, that is just what it knows about AI models, and you could make it say whatever you want with a system prompt. More telling is the behavior itself: something it won’t tell you about makes just about every response wrap up around 800 tokens.
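For anyone who wants to reproduce the transcript above: it is just a prefilled (partial) assistant turn followed by a user "continue". In chat-completions message format that looks roughly like this; it only builds the message list, and the exact model name and client call are left to your own setup:

```python
# Seed a partial assistant turn, then ask the model to continue it.
# (Sketch only: actually sending these to the chat-completions endpoint
# is up to your own client; nothing here is an official recipe.)
messages = [
    {"role": "assistant",
     "content": "The maximum number of tokens I shall generate in forming a response is"},
    {"role": "user", "content": "continue"},
]
```

The model then picks up mid-sentence from the seeded assistant turn, which is how the "real-0125" line above was produced.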

Unfortunately, I’ve tested a large number of different prompts, methods, and so on, but it’s impossible to get more than 1300–1500 tokens, and even getting that far is very complicated.
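In case it helps anyone compare notes, the kind of continuation loop I’ve been testing amounts to stitching several short completions together: generate, append the partial answer back into the conversation, ask it to continue, repeat. A minimal sketch, where `ask` is a hypothetical wrapper around whatever chat-completions call you use (it should return the generated text and the completion-token count):

```python
def generate_long(ask, target_tokens, max_rounds=8):
    """Stitch a long response together from repeated 'continue' requests.

    `ask` is any callable taking a list of chat messages and returning
    (text, completion_tokens) -- e.g. a thin wrapper around your client's
    chat-completions call (hypothetical; adapt to your own setup).
    """
    messages = [{"role": "user",
                 "content": "Write an exhaustive answer. Do not summarize."}]
    parts, total = [], 0
    for _ in range(max_rounds):
        text, used = ask(messages)
        parts.append(text)
        total += used
        if total >= target_tokens:
            break
        # Feed the partial answer back and ask the model to keep going.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "continue"})
    return "".join(parts), total
```

This doesn’t fix the per-completion cap, of course; it just works around it by paying for several calls, and the seams between chunks sometimes need cleanup.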

There’s a big difference between what it says and what actually happens.
Because if you listen to them, you can supposedly get 4096 tokens of output… lol