Can somebody please explain this math(s)?
Using the API. Tokens: 16384. Model: gpt-3.5-turbo-16k
Prompt: Write 30 paragraphs that summarise the key takeaways of the article below. Each paragraph should be 2-4 sentences in length.
Article:…
Response: This model’s maximum context length is 16385 tokens. However, you requested 21864 tokens (5480 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.
New Prompt: Write 3 paragraphs…
This model’s maximum context length is 16385 tokens. However, you requested 21864 tokens (5480 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.
By my reckoning, a paragraph of 2-4 sentences is around 100 tokens.
The response should easily be able to contain 30 short paragraphs.
What is the point of a large model that can’t even create some summary points for an article?
Maybe subtract the 5480 tokens your messages use from the max_tokens you request?
16k means messages + completion combined, so you won’t get a 16k answer.
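The arithmetic behind this: prompt tokens plus max_tokens must fit inside the 16,385-token window, so the completion budget shrinks as the prompt grows. Here is a minimal sketch of the budget calculation, assuming the tiktoken package (the margin for per-message overhead is a guess):

import tiktoken

CONTEXT_LIMIT = 16385  # gpt-3.5-turbo-16k context window

def completion_budget(prompt: str, margin: int = 50) -> int:
    """Largest max_tokens that still leaves room for the prompt."""
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo-16k")
    prompt_tokens = len(enc.encode(prompt))
    return CONTEXT_LIMIT - prompt_tokens - margin

So a 5,480-token article leaves roughly 16385 - 5480 - 50 = 10,855 tokens for the completion, not the 16,384 that was requested.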
Use a framework to solve it, like this: write a paper about climate change and generate an outline first, which includes 13 chapters, with each chapter consisting of about 1000 words. If you use the API, make repeated calls in the process.
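As a rough sketch of that outline-then-chapters approach using the openai Python package (the prompts and chapter count are illustrative):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo-16k"

def ask(content: str, max_tokens: int = 2048) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": content}],
        max_tokens=max_tokens,
    )
    return resp.choices[0].message.content

# One call for the outline, then one call per chapter, feeding the
# outline back in so the chapters stay consistent with each other.
outline = ask("Write a 13-chapter outline for a paper about climate change.")
chapters = [
    ask(f"Outline:\n{outline}\n\nWrite chapter {n} in about 1000 words.")
    for n in range(1, 14)
]
paper = "\n\n".join(chapters)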
Have you found a way to deal with it? I have tried so much and nothing works so far.
Yeah, did you click the link? It goes to a prompt that works:
It appears to be trained to stop writing at around 1500 words. It’ll wrap up with a conclusion, close out a paragraph, a poem, whatever, at around that point, no matter how you instruct it.
So the trick is to ask it to write several parts of around 1500 words each and then combine them into something bigger.
The whole point is to summarise a large article; the 5480 tokens are the uploaded article itself. If a paragraph is about 100 tokens, there should be space to generate about 100 of them within the token limit.
I couldn’t agree more with @JustinC
The larger context, even for GPT-4-32k, is more about input context, not output length. To get more output, you need to coax it out of the model by hitting the model many times.
But the larger window is great when you need more input context without resorting to excessive truncation of the input.
But I still don’t have any hard data on whether they increased the number of attention heads for the larger context, or if they are diluting the attention over the larger context. Anyone?
I really need more input length to keep context in novel creation.
The output could even be 2-4k.
I made an app called PromptMinutes (Prompt Minutes from meetings etc.) available on the App Store. Paste your API key in the ‘action’ view, and you are good to go. You can record, transcribe, and summarize. My tests show an hour’s worth of audio, transcription, and summary costs approx. USD 0.75. There is also a field to enter your custom prompt.
This is correct. Check out this working example:
[{"role": "system", "content": "Step 1 - List 10 popular questions about generative AI.\nStep 2 - take the 1st question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 3 - take the 2nd question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 4 - take the 3rd question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 5 - take the 4th question from the list from Step 1 and write a 1000 word article using markdown formatting.\nStep 6 - take the 5th question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 7 - take the 6th question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 8 - take the 7th question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 9 - take the 8th question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 10 - take the 9th question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable.\nStep 11 - take the 10th question from the list from Step 1 and write a 1000 word article using markdown formatting, lists and tables where applicable."},
{"role": "user", "content": "Execute steps 1-11"}]
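For what it’s worth, here is a minimal sketch of sending that message list through the API, assuming the openai Python package (messages.json holding the example above is an illustrative name):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The megaprompt message list from the example above.
with open("messages.json") as f:
    messages = json.load(f)

# max_tokens is left unset so the completion can use whatever
# remains of the 16k window after the prompt.
resp = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=messages,
)
print(resp.choices[0].message.content)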
The GPT-3.5-turbo-16k model has a maximum response length of around 1500 tokens, regardless of the prompt used. Even in the Playground, the Maximum Length slider only goes up to 2048 tokens. The model’s increased token limit primarily benefits the input context rather than the output length. To generate longer responses, you can try sending multiple queries in a chat-like format, providing additional context with each subsequent message. However, the model is not designed to produce excessively long responses, and there is currently no way to force a specific response length, such as a 10k response.
I’m not sure that is completely accurate, mosssmo; I can generate a 5000 token response with the 16k model.
Thanks; however, increased contextual memory and output are what most would infer from the 16K token length.
Would you please share the type of prompt used? I am unable to generate that length even with a direct word-count prompt.
Hi, I’m quite new to this world, so don’t mind me too much. I just wanted to comment that it works quite well for me:
- Removing the max_tokens parameter.
- Specifying in the prompt the range of words you want it to use for the answer (or each section of the answer).
The answer I receive usually contains the requested number of words a very high percentage of the time.
In my case I ask it to write an article and I specify the range of words I want each section of the article to have.
For example, for the title I want it to be between 3 and 12 words.
The lead should be at least 100 words. The body should be x words.
And so on with the rest.
In the end I get an answer with the number of words I am looking for.
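In case it helps anyone, a rough sketch of that approach with the openai Python package (the topic and section ranges are made up for illustration):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "Write an article about electric cars with these sections:\n"
    "- Title: 3 to 12 words\n"
    "- Lead: at least 100 words\n"
    "- Body: 800 to 1000 words\n"
    "- Conclusion: 100 to 150 words"
)

# max_tokens is deliberately omitted so the model can use the
# remaining completion budget in the context window.
resp = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)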
Hi Jeffer, how were you able to generate a 5K token response from the API? Because I’m not able to get close…
I’ll locate my prompt when I get back to my laptop, but essentially it was asking for 120 paragraphs on a topic, where each paragraph is 2 to 4 sentences in length.
Let me know how you go. 
For my specific use case this won’t work, but thank you for the insights!
The folks over at Future Fiction Academy seem to have this figured out. It is an iterative method. Even though the output limit seems to be around 1500 words (2048 tokens?), partial outputs work just fine.
For example, they have a prompt where they take the same chapter written twice with two different focuses: one concentrating more on detail and another more on dialog, which produces a ton of text. With the 16k context window for GPT Turbo, they are able to paste both of those into a megaprompt that combines the best aspects of each, with very good results.
It rarely finishes in one go, usually just stopping abruptly mid-sentence. Then you just say “continue” and it keeps going from where it left off, and because the 16K context window can hold the massive amount of text in the initial prompt plus the ~1500-word outputs without filling up, it is able to keep the narrative coherent.
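That loop is easy to automate. A rough sketch with the openai Python package (the draft placeholders and the iteration cap are illustrative):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [{"role": "user", "content":
    "Combine the two chapter drafts below into one chapter, keeping "
    "the best detail from draft A and the best dialog from draft B.\n\n"
    "Draft A: ...\n\nDraft B: ..."}]

parts = []
for _ in range(10):  # hard cap so the loop always terminates
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-16k", messages=messages
    )
    choice = resp.choices[0]
    parts.append(choice.message.content)
    messages.append({"role": "assistant", "content": choice.message.content})
    if choice.finish_reason != "length":  # finished cleanly, not cut off
        break
    messages.append({"role": "user", "content": "Continue."})

full_text = "".join(parts)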
What’s the length they are able to produce with that? Do you know?