Longer GPT 3.5-turbo Output


I am using gpt-3.5-turbo-16k through API and I want to summarise some content. The content has about 12k tokens and I want the output to have at least 3k tokens. But the response only gets around 200-500 tokens range. How can I increase the response size? I tried with different prompts mentioning to use specific number of words or tokens. But it didn’t work. Can someone please help me with this?

Thanks in advance

You must not use the wording “summary”. You will get a trained summary length, a surefire recipe for getting output that is like all other short summaries the AI has been fine-tuned on writing.

If you put in 12k tokens, and merely ask for it to be “rewritten” in a new way, the inability for the AI to write long output (also by fine tuning) will also get you a shrunk output. Or you can say “based on the documentation above, write a new article”.

With that much input though to that model, I’ve found that it tends not to perform any kind of restructuring or rewriting on demand, and you get your exact text back for paying for max context.

1 Like

Thank you for the response. To give you more context on the task I do, I’m trying to summarise a meeting transcript and generate meeting minutes from that. Since a transcript is more than 16k tokens, what my idea was to chunk it and summarise each chunk and combine those summaries finally to generate the meeting minutes document. When summarising, I want a thorough summary keeping as much information as possible. I tried the command ‘Paraphrase’ and it improved the results a bit. Do you have any specific way or prompt for this?

You have a good plan. The only thing you might do is to tell the AI that it is working on multipart files, to only write the summary of that one part without a happy ending conclusion so it can be reassembled, and even give it the previous summary upon which it should continue writing without ending.

Around 2k is where your instructions start becoming lost in importance among the other mass of text.

1 Like

Thank you again for the valuable reply. I applied some of the things you told and I could get a bit longer output. The word ‘Paraphrase’ seems to generate a bit longer outputs.

Another concern I have is that even if we have longer summaries, the final meeting minutes document is a bit short. Is there a way to make it longer?

One “trick” that you can use: the older checkpoint AI models haven’t been so heavily trained on curtailing their output length.

You can try gpt-4-0314 for your final product, with its 8k context giving you something like 6000 → 1500+ - which is a lot of words.

1 Like

Thanks for that valuable point. I changed the model to gpt-3.5-turbo-16k-0613 and the response length was increased by a bit. Since I have to process large files, I have to use gpt 3.5 because of its high context window.

gpt-3.5-turbo-16k-0613 is recommended for longer output

1 Like

gpt-3.5-turbo-16k-0613 and gpt-3.5-turbo-16k are the same thing.

1 Like

nah a bit different from response while embedding as format markdown


I had that suspicion too. But I need a higher context that 8k from gpt-4. Since gpt-4-32k isn’t available for everybody, gpt 3.5 16k is the best option.

Thank you. Do you have any other tricks for getting a longer output?

give me example of prompt for a test getting a longer output

also the best tricks for getting a longer output is LLM Method’s

1 Like

Let’s check that assertion that they are different. I use a top_p setting near zero for near deterministic output…


1 Like

try this config in your playground


ignore a max tokens

1 Like

If you leave the temperature or top_p set at 0.5 or higher, you’re going to get different generations every time you run the model.

1 Like

this the best for all models as default

1 Like

Are you sure not this the best for all models?

The models selectable vary greatly in perplexity, and the desired output of your application will also, and even the quality of a particular world language at a particular temperature will change, so there is no “best for all models”, you’re going to have some idea that will inform the choice better…

But the point is that - being the same model, and the one without the date being a “stable name” alias - setting the sampling parameters to where you don’t get random token outputs reveals the sameness of the models if you don’t simply trust the true model name returned from your API call.

1 Like

In the Wilds

a config

1 Like

I used these settings. However, the output is still short. For 12k input, the output is around 6-7k maximum.