Cannot, for the life of me, get a detailed enough response

So, I have a use-case where I’m feeding in a large body of content (around 1500 words) as part of my prompt, and asking the model to basically restructure that content in a certain way that makes it more consumable.

The problem is, the model REFUSES to return a result longer than ~600 words, which means it loses enormous amounts of detail during the transformation, making the whole thing effectively useless.

I’ve tried all the obvious things, like telling it “include ALL information from the original content in the updated content”.
I’ve tried giving it instructions like minimum paragraph counts, or minimum word counts.
I’ve tried doing a first-pass completion where I extract all the key topics from the initial content, and then feed them back in to the second prompt saying “make sure you explicitly discuss all these topics”.

But no matter what I do, it caps out at around 600-650 words and I lose all the details.

Anyone got any ideas on how to address this?
For reference, I’m running “gpt-4-0125-preview” with a 4096 token limit. The responses I get back are 600-700 tokens maximum.
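
For anyone wanting to reproduce this, here’s roughly what my setup looks like (Node `openai` v4 client; the system prompt is trimmed to a placeholder, not my exact wording):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const originalContent = "..."; // the ~1500-word body of content

const completion = await openai.chat.completions.create({
  model: "gpt-4-0125-preview",
  max_tokens: 4096, // plenty of headroom, yet output stalls at ~600-700 tokens
  messages: [
    {
      role: "system",
      content:
        "Restructure the user's content to make it more consumable. " +
        "Include ALL information from the original content.",
    },
    { role: "user", content: originalContent },
  ],
});

console.log(completion.choices[0].message.content);
```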

Just to add to this, I’ve now tried getting the first output (the one around 600 words), and then feeding that BACK IN as part of the prompt, in addition to the original content, and telling the model “take this summarised input, add a bunch of extra detail to it, and return a more detailed version”.

Still caps out at ~650 words…

You are not sharing what model you’re using, or whether this is Completions or Assistants. If Assistants, I would put all the instructions in the Assistant’s instructions and your DATA in the thread message.

But have you ever managed to get more than 600-700 words of output in one request?

Again, without sharing which model you use, how long your instructions are, whether it’s the Assistants API or Completions, etc., it’s hard to (want to) help.
I know that I have a lot of 500+ word answers, but do you want me to go through the dozens of threads I have per day to find a 600+ word response? And how would that help you if you are using Completions and I am using Assistants?

I’m not the one asking the question 🙂 I have been able to get up to 700 words (i.e. ~1000 tokens) in one request, mostly using GPT-4 models, setting max tokens at 4000, and following prompt best practices.

But the question that comes up every so often is whether it’s possible to get output on the order of 1500-2000+ tokens in one go. If you’ve been able to achieve that, it would be great if you could share any insights into the circumstances that made it work.

A couple of things you might try:

  1. Include a whole example in your prompt - a sample input and the expected output. It sounds like you have enough room, token-wise, to do this.

  2. Beware of any words in your prompt instructions that might suggest summarizing, shortening, or condensing. Since those tasks are so common, the LLM’s bias will lean that way, so you need to counteract it in your instructions. For example, “more consumable” suggests shorter and simpler text. I don’t know what your exact prompt is, but try detailed instructions like: “take this text and add headings and sub-headings, number the list items, break up longer paragraphs into several smaller ones”, and so on (see the sketch after this list).

  3. In my brief comparisons between GPT-4 and GPT-4-turbo, I found the former to be superior in following instructions. You are only using 4096 tokens, yet the turbo model accepts many multiples of that. My suspicion is that turbo is optimized for huge pieces of text. If you don’t need that many tokens for your task, turbo might not be the best model. I would try GPT-4 instead and see if it’s better.
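
To make point 2 concrete, here’s one way the instruction wording could look; the exact phrasing is illustrative, not something I’ve tested against this case:

```ts
// Every verb is additive or structural ("add", "number", "break up");
// nothing hints at shortening ("consumable", "concise", "clean up").
const instruction = `
Take the text below and restructure it:
- Add a heading or sub-heading for every topic.
- Number every list item.
- Break long paragraphs into several shorter ones.
- Preserve every fact, figure, and example from the original; do not
  condense, merge, or omit anything.
`;
```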

I hope this helps a bit!

This is a good semi-related post. They’re using a fine-tuned gpt-3.5 to get quality output of 1k+ words consistently…

I’ll be running some of my own tests soon.

Oops. This is an even more recent update!

I really wish OpenAI would make a GPT-4-Turbo-FICTION trained on fiction. There are a few open-source models going that route.

Share the prompt with us. One of my prompts is around 1,500 tokens, and I manage to get more than 4,000 tokens in the output. There might be a problem with your API parameters.

Have you asked ChatGPT for help? Have you experimented with other prompts in the Playground?

By the way,

For reference, I’m running “gpt-4-0125-preview” with a 4096 token limit. The responses I get back are 600-700 tokens maximum.

gpt-4-0125-preview is way more succinct than gpt-4. Try GPT-4 to see the difference as well.

OK - update time.

I THINK I’ve come up with… as good of a solution as possible to this issue. It’s a multi-step process to get what I want; I’ll describe the details here.

I’ve basically taken a concept similar to NVIDIA’s DLAA anti-aliasing technique and applied it to text content.

The basic principle goes something like:

  1. Get the base content and use AI to upscale it to a relatively “low quality, but high detail” version of it. In the DLAA analogy, this is like upscaling a 1080p image to something like 8k.
  2. Use another AI pass to downscale that “low quality, but high detail” content to a higher-quality, intermediate level of detail. The logic is that AI models are good at summarising and at “cleaning up” content, so by blowing the content up really big and then compressing it back down in a second pass, you end up with a much more consistent result than if you tried a single pass from the original content directly to the intermediate version.

What that looks like in the context of text content is the following (a code sketch of all three steps follows the list):

  1. Use the model to break up your base text content into “chapters” or “sections”. Get it to return each section of the base content (verbatim; you don’t want it making any changes), along with a “section heading/title” for each section and a single “document heading/title” for the whole document. The section headings help keep the AI on-task in subsequent steps, and the document title gives the AI a “north star” piece of context to bound its outputs moving forward.

  2. For each chapter/section, you now want the AI to generate a BIG chunk of content for each section. Feed each section back individually, along with its section title and the overall document title for context (but NOT the full original content, otherwise the model breaks out of the bounds of the section it’s meant to be focussing on). So if you have 5 “chapters”, you’ll have five separate calls back to the API, one for each chapter.
    Make sure to prompt the model to include quite a lot of detail in this step; you WANT duplicated information across sections during this phase - it’s fine. You should end this phase with a really detailed block of content for each section (significantly more detailed than you “need”), but probably with lots of duplication, and each section will be quite disjointed from the others; if you tried just concatenating them together, it would be a total mess.

  3. Final step: the downscaling. At this point, you just want to feed all of the detailed blocks of content you generated back into the AI in a single call, with a prompt to recombine them into a single cohesive document. You need to be pretty explicit here, telling it to make the minimum required changes to make them work as a single document, otherwise it just reverts to its base instinct and gives you a super compressed 600-word summary. It’s important here, too, to give each section an index and include its section title when feeding them all back in. The index tells the model what order they need to go in, and the title gives the model bounded context to know how much of each section to preserve and how much to throw away.
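
Here’s a condensed sketch of the three steps (Node `openai` v4 client; every prompt string and JSON shape below is illustrative, not my exact production prompt):

```ts
import OpenAI from "openai";

const openai = new OpenAI();
const MODEL = "gpt-4-0125-preview";
const baseContent = "..."; // the original ~1500-word content

// One JSON-mode chat call. Note: the API requires the word "JSON" to
// appear in the messages when response_format is json_object.
async function jsonCall(system: string, user: string): Promise<any> {
  const res = await openai.chat.completions.create({
    model: MODEL,
    max_tokens: 4096,
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? "{}");
}

// Step 1: split the base content into verbatim sections with titles.
const outline = await jsonCall(
  `Split the user's text into logical sections. Return JSON:
   { "documentTitle": string,
     "sections": [{ "title": string, "content": string }] }
   Copy each section's content VERBATIM; do not rewrite anything.`,
  baseContent
);

// Step 2: "upscale" each section individually - one API call per section.
// Duplication across sections is expected and fine at this stage.
const expanded = await Promise.all(
  outline.sections.map((s: any, i: number) =>
    jsonCall(
      `Document title: "${outline.documentTitle}". Expand the section below
       into a highly detailed treatment of its topic. Return JSON:
       { "index": number, "title": string, "content": string }`,
      JSON.stringify({ index: i, title: s.title, content: s.content })
    )
  )
);

// Step 3: "downscale" - recombine everything in one call. The indices
// keep the order; the titles bound how much of each section to keep.
const final = await jsonCall(
  `Combine the indexed sections below into ONE cohesive document. Make the
   MINIMUM changes needed for flow: remove duplication between sections,
   keep them in index order, and keep each section's level of detail.
   Do NOT summarise. Return JSON: { "document": string }`,
  JSON.stringify(expanded)
);

console.log(final.document);
```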

You’ll obviously need to make tweaks etc based on your own scenario to get this perfect, but I’ve gotten it to the point now where it’s consistently outputting the amount of content, in the amount of detail, that I’m looking for - without creating any kind of disjointed janky final output.

I’m using “gpt-4-0125-preview” - I found base GPT-4 a bit too “independent”; it doesn’t like following the really rigid instructions this task needs. I suspect a lot of that is down to the lack of the response_format: { type: "json_object" } option in base GPT-4. This setting is a MUST in order to get the data in/out consistently structured in a way you can work with here.

Needless to say, there are a lot of steps here and a lot of points at which the model can do dumb stuff. You will need to implement really good validation at each stage, plus error handling.
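
A minimal sketch of the kind of per-stage guard I mean (the retry policy here is just an example, not what I actually run):

```ts
// Parse and validate one stage's JSON output before trusting it;
// retry the call a limited number of times if validation fails.
async function validated<T>(
  call: () => Promise<string>,
  isValid: (parsed: any) => parsed is T,
  retries = 1
): Promise<T> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const parsed = JSON.parse(await call());
      if (isValid(parsed)) return parsed;
    } catch {
      // malformed JSON - fall through and retry
    }
  }
  throw new Error("Model output failed validation after retries");
}
```

For the sectioning stage, for instance, the isValid check would assert that sections is a non-empty array and that every entry has a non-empty title and content.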

A few notes:

  • I tried basically all the suggestions from here; some got me “better” results (as in, 750 words rather than 650), some made it worse.
  • Moving to GPT-4 base actually made it worse, surprisingly enough.
  • Including whole examples in my prompt made effectively no difference (surprisingly).
  • Hilariously - the thing that got me the “best” result, prior to the method documented here, was financially blackmailing the model. I basically told it as part of the prompt that it was generating content that had been requested by a publisher, and the publisher was only going to pay for the content if it was over one thousand words.
    Always use the written version of numbers when dealing with GPT, not the numeric version - it’s MUCH better at understanding “one thousand” vs “1000”.
    This gentle psychological abuse netted me an average of 850-875 words per response, vs the 650/675 I was getting before. Still not 1000, but better. Sadly, cranking up the number the “publisher” was asking for did not result in outputs longer than ~850.
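
If it helps anyone, the framing looked roughly like this (reconstructed wording, not my verbatim prompt):

```ts
// Written-out numbers ("one thousand") seem to register better than "1000".
const publisherFraming = `
You are writing this piece for a publisher. The publisher will only pay
for the content if the final piece is over one thousand words long, so
the output MUST exceed one thousand words.
`;
```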

I literally included the model and max token count in the last line of my post. Really don’t understand why you feel the need to be quite THIS aggressive in your replies.

Hey - certainly apologize if it came out that way! I’m always super interested in these cases and love to try them out myself, so I’m always looking for those kinds of details. You clearly did provide the model - so again, apologies for missing that.

One other thought I would add for your prompt, which I have noticed when working on things like creating summaries: go into a fair amount of detail about the output. Creating a chapter-like layout with detailed descriptions of what you want in sub-sections (including suggestions about the number of words, etc.) might help - see the sketch below.

Also worth considering - did you try adding the input content as a file instead of in the prompt?
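
A hypothetical output spec of that shape (the layout and word budgets here are just examples):

```ts
// Spelling out the layout and rough word budgets gives the model an
// explicit target instead of letting it default to a short summary.
const outputSpec = `
Structure the output as follows:
1. Introduction: two paragraphs, roughly one hundred and fifty words.
2. One chapter per major topic, each with a heading, two to four
   sub-sections with sub-headings, and roughly two hundred and fifty
   words per chapter.
3. Conclusion: one closing paragraph.
`;
```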

Thanks for this detailed explanation. I’ll give it a shot.