Have you tried adding direction on an actual word count? I found that beefed up my outputs sometimes, though it depends. Giving it an actual target in characters, words, tokens, etc. could be worth a try. If it helps, please let us know, because any input on this would help me too!
Thanks Steve. OK, yeah, I have tried word counts, but the model ignores them and returns around 1500 tokens.
$messages = array(
    array("role" => "user", "content" => "Create a complete ebook called 'How to enhance your business using AI'. We are using a 16k OpenAI token model so please write the full ebook using the maximum number of words possible.")
);
So in the context of 16k tokens, the real capability here is being able to continuously send more context through the chat with each subsequent query.
Think of it more as the memory having improved rather than the output having increased. Before, after you had sent six messages or so (depending on their length), ChatGPT would begin to forget the context from chat message #1. If you gave it certain instructions at the beginning, it wouldn't follow them anymore the further you went into the conversation. This was because of the token limit: each time you send a new query, the previous chat history is sent along as context for developing the next answer.
With the limit in place, it will only send the last X words that fit within the token budget. So with the increased token length you can have longer conversations with improved memory, rather than thinking of it as being able to produce more length. It is true that it can produce more length, but you will have to continually seed it for more of what you want, and as the token limit increases it will be able to sustain more coherent, longer conversations.
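The trailing-window behaviour described above can be sketched in a few lines. This is a minimal illustration, not what the API actually does internally: it assumes a rough 4-characters-per-token estimate (a real implementation would count tokens with a tokenizer such as tiktoken), and `trim_history` is just an illustrative name.

```python
# Keep only the most recent messages that fit in the context window,
# mimicking how older chat history falls out of the token limit.

def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages, max_context_tokens):
    """Return the longest suffix of `messages` that fits the budget."""
    kept = []
    used = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > max_context_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "You are a pirate. Always answer in pirate speak. " * 10},
    {"role": "assistant", "content": "Arr, aye!"},
    {"role": "user", "content": "What is the capital of France?"},
]
# With a tiny 30-token budget, the long instruction from message #1
# no longer fits, so only the last two messages survive.
trimmed = trim_history(history, max_context_tokens=30)
```

This is exactly why instructions from the start of a long conversation stop being followed: they simply aren't in the window that gets sent anymore.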
@JustinC that's interesting. Does it mean that the engine's working memory is like a trailing window capped at its max token threshold, based on the sum of all of its outputs, or outputs plus user input prompts?
Big thanks for your explanation JustinC. So I assume this is the same for the larger GPT4 models. Is there talk about enabling longer responses/output rather than just a benefit for the memory/input?
Does anyone know of a way to force longer responses from the 16k model or as you said JustinC, it’s not designed to provide longer responses and there is nothing you can do to force a 10k response for example?
Requesting multiple paragraphs seems to work quite well. For example…
$messages = array(
    array("role" => "user", "content" => "Write 120 paragraphs for an article called 'How to make money using AI'. Each paragraph should be 2-4 sentences in length. This should be a how to guide with practical information.")
);

$data = array(
    "messages" => $messages,
    "model" => 'gpt-3.5-turbo-16k',
    "temperature" => 0.25,
    "max_tokens" => 15500,
    "top_p" => 1,
    "frequency_penalty" => 0,
    "presence_penalty" => 0,
    "stop" => ["STOP"]
);
My guess is that the input and output layers are fixed at 1024 tokens even if there is a context window of 16k or 32k. For a transformer to have bigger layers than before, it would need new training, which is too expensive at the overall size of these models.
OpenAI might use some kind of batching technique to get bigger input windows by chunking the inputs/memory into pieces, and maybe some kind of per-conversation embedding technique. This might be why the GPT-4 models are more expensive than the smaller ones. With a 16k context, I think they need to make several internal requests before returning the response.
Thanks @Jeffers for this thread, @JustinC for a very helpful explanation, and @smuzani for sharing a worked example, which I've copied. Great to see this work with 3.5 (you didn't even need the more expensive 16k version!). Does it work as well using the API?
I'm trying to summarise 'long read' articles, e.g. approx 10k words. Ideally I'm looking for ~1000 words (10% of the original length) as the summary. I'm also asking for the key topics to be extracted. I've been getting some strange results using the 16k API.
Any advice on the pros and cons of using the 16k API versus chunking with the original 3.5 API? The 'mini-summaries' from 3.5 are not always cohesive when stitched back together, but using overlapping input, or tagging mini-summary #1 onto input #2, increases costs and may not improve results much.
Hi! I am having the very same behaviour. I will try to use the new "plugins"/"function" feature to maybe create an approach similar to the new feature in the official ChatGPT chat, where a "generate more" button pops up. Maybe this helps, and someone of you can try it too.
While this makes sense in terms of chat, they have now released 16k on the API, and the API does not keep any chat history. I'm prepared to pay the extra cost to upload a large article, but I need to be able to produce decent output to make it worth it.
Can somebody please explain this math(s)?
Using the API with max_tokens: 16384, model: gpt-3.5-turbo-16k
Prompt: Write 30 paragraphs that summarise the key takeaways of the article below. Each paragraph should be 2-4 sentences in length.
Response: "This model's maximum context length is 16385 tokens. However, you requested 21864 tokens (5480 in the messages, 16384 in the completion). Please reduce the length of the messages or completion."
New prompt: Write 3 paragraphs…
Response: "This model's maximum context length is 16385 tokens. However, you requested 21864 tokens (5480 in the messages, 16384 in the completion). Please reduce the length of the messages or completion."
By my reckoning, a paragraph of 2-4 sentences is around 100 tokens.
The response should easily be able to contain 30 short paragraphs.
What is the point of a large model that can’t even create some summary points for an article?
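The arithmetic behind that error: `max_tokens` reserves completion space up front, so prompt tokens plus `max_tokens` must fit inside the 16,385-token context window. With a 5,480-token prompt and `max_tokens` left at 16,384, the request asks for 21,864 tokens and is rejected before generation, regardless of how few paragraphs the prompt requests. The fix is to derive `max_tokens` from the prompt length. A minimal sketch (`safe_max_tokens` is an illustrative helper, not part of the API):

```python
CONTEXT_LIMIT = 16385  # gpt-3.5-turbo-16k context window, per the error message

def safe_max_tokens(prompt_tokens, context_limit=CONTEXT_LIMIT):
    """Largest completion budget that still fits alongside the prompt."""
    remaining = context_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("Prompt alone exceeds the context window")
    return remaining

# The failing request: 5480 prompt tokens + max_tokens=16384 = 21864 > 16385.
assert 5480 + 16384 > CONTEXT_LIMIT

# Deriving max_tokens from the prompt length makes the request valid:
budget = safe_max_tokens(5480)  # 16385 - 5480 = 10905
```

So the model can produce those 30 short paragraphs; the request just has to leave the completion budget at 10,905 tokens or less instead of hard-coding the full window.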
Use a frame to solve it, like this: write a paper about climate change and generate an outline first, which includes 13 chapters, with each chapter consisting of about 1000 words. If you use the API, make repeated calls in the process.
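The frame approach can be sketched as one prompt per chapter, so each call stays well inside both the context window and the practical output length. All names here are illustrative, and the actual API call is left as a commented stub:

```python
def chapter_prompts(topic, outline, words_per_chapter=1000):
    """Build one prompt per outline entry for separate API calls."""
    prompts = []
    for i, chapter in enumerate(outline, start=1):
        prompts.append(
            f"Write chapter {i} of a paper about {topic}, titled '{chapter}'. "
            f"Aim for about {words_per_chapter} words. "
            "Stay consistent with the overall outline: " + "; ".join(outline)
        )
    return prompts

outline = [f"Chapter {n}" for n in range(1, 14)]  # 13 chapters from the outline step
prompts = chapter_prompts("climate change", outline)

# Each prompt then goes out as its own request, e.g.:
# for p in prompts:
#     text = call_api([{"role": "user", "content": p}])  # hypothetical helper
```

Repeating the outline in every prompt is the cheap way to keep chapters coherent with each other without relying on chat history, which the bare API doesn't keep.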