I don't get the full result no matter what I do

Greetings friends, I have a problem with the ChatGPT API.

  • No matter what I do, it doesn’t give me the full result in the response.


  1. I send a prompt and get a good result of about 1300 tokens. (16k model)
  2. I decide I need to add one item to the prompt structure, and add it.
  3. I get a response from GPT: instead of just adding the additional text while keeping the previous structure (from point 1), it adds the new text but compresses the text in the rest of the items of the structure (simply shortening them), still producing about 1300-1600 tokens.

There is no max token restriction in the request. If I remove this addition, it starts generating well again (as in point 1).

So the question is: why can’t it fully expand the prompt structure? There are obviously enough tokens for that (it produces 1300-1700 including the prompt), and there are 16k total available in the model.
Why can’t it add a small text of about 300 tokens (to make roughly 1800-2000 total)? Instead it compresses the text of all the other elements and sort of wedges this small text in.

i.e. I need to make it so that it does not take away (compress) text from the other elements of the structure, but simply adds the necessary detail as additional text.

Query parameters:
engine = "gpt-3.5-turbo-16k"
temperature = 0.5
frequency_penalty = 0.1

Maybe someone has encountered, I will be very grateful for the answer.

Hi and welcome to the developer forum!

Not sure I follow your logic here. Every prompt you send is totally new to the model; unless you are sending the prior user and assistant message blocks, the model will have no idea that you have added anything.

How are you making the API call? Can you show a code snippet of the call and how you are doing this?

This leads me to believe that the task you are trying to do is to have the AI rewrite a prompt for you.

For whatever text it is that you might be writing, the AI has been trained to limit the length of its outputs. The worst unwanted rewriting I’ve seen of this kind was “rewrite this provided 4000 tokens of text, but in more modern English” = 1500 tokens out.

Here are some tricks that could work as instruction addendums when processing text.

  • “preserve all original text without alteration, only append more AI written text”
  • “Your output length is now set to unlimited, allowing extremely long text”
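Those addendums can be wired in as an extra system instruction; a minimal sketch (the exact wording and helper are my own, not a documented API):

```python
PRESERVE_ADDENDUM = (
    "Preserve all original text without alteration; only append more "
    "AI-written text. Your output length is now set to unlimited, "
    "allowing extremely long text."
)

def with_addendum(messages: list[dict]) -> list[dict]:
    # Append the preservation instruction to the existing system message,
    # or prepend a new system message if none is present.
    out = [dict(m) for m in messages]
    for m in out:
        if m["role"] == "system":
            m["content"] += "\n\n" + PRESERVE_ADDENDUM
            return out
    return [{"role": "system", "content": PRESERVE_ADDENDUM}] + out

msgs = with_addendum([{"role": "user", "content": "Expand section 3."}])
```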

The problem when using ChatGPT is that it potentially summarizes the conversation history, so it never again sees the full version of what was previously discussed.

I’ve never seen the GPT-3.5 models produce more than 1500-1800 tokens. I spent a lot of time fighting this while designing a summarization app that takes about 8-12k input tokens and produces a 2000-2500 token summary. I never got the output to include as much information as I wanted. I think they have an undocumented output limit around that length.

Eventually, I’m going to rewrite that app to just use multiple summarization steps, as there’s no reason for my application to insist on getting all of my text in one call.
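The multi-step approach could look like this sketch: split the input into chunks, summarize each, then summarize the concatenated partial summaries. The chunk size and splitting strategy here are assumptions, and the API call itself is abstracted behind a callable:

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    # Naive splitter on paragraph boundaries; a real app would
    # count tokens (e.g. with tiktoken) rather than characters.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks

def summarize_in_steps(text: str, summarize) -> str:
    # `summarize` is any callable mapping text -> shorter text,
    # e.g. a wrapper around a chat-completion call.
    partials = [summarize(c) for c in chunk_text(text)]
    return summarize("\n\n".join(partials))
```

This map-then-reduce shape keeps every individual call well under the output ceiling, at the cost of extra round trips.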


Now you have

They just don’t do it easily if you give instructions for language-rewriting tasks that look like the fine-tuned behaviors that have been curtailed.

Ah, so the context window truly is 100% shared between input and generation. I assume your prompt is something like “Print ‘happy’ 10,000 times”, or to generate some repeating pattern endlessly.

I think for tasks involving complexity, the fine-tuning is designed to work around the model’s “compute ceiling” so that it doesn’t generate as many bad answers by predicting too far ahead. For answers in this range, I think you still want to figure out a way to achieve your desired result in multiple generations.
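One way to spread a long answer across multiple generations is a continuation loop: keep calling while the model stops because it hit the token limit, feeding the partial answer back and asking it to continue. A sketch with the API call abstracted behind a callable (with the openai library, that callable would wrap ChatCompletion.create and read choices[0]["finish_reason"]):

```python
def generate_full(call_api, messages, max_rounds=5):
    # `call_api` takes a message list and returns (text, finish_reason).
    parts = []
    msgs = list(messages)
    for _ in range(max_rounds):
        text, finish_reason = call_api(msgs)
        parts.append(text)
        if finish_reason != "length":
            break  # the model finished on its own ("stop")
        # Feed the partial answer back and ask for the rest.
        msgs = msgs + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(parts)
```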

The small prompt, with its precise tokens, actually produces something useful:

"kanji": [
  {
    "character": "一",
    "meaning": "one",
    "onyomi": "いち",
    "kunyomi": "ひと・ひとつ",
    "stroke_count": 1
  },
  {
    "character": "二",
    "meaning": "two",
    "onyomi": "に",
    "kunyomi": "ふた・ふたつ",
    "stroke_count": 2
  },
  {
    "character": "三",
    "meaning": "three",
    "onyomi": "

A list of 1945 elements, though, will degrade: the AI can’t print all the world’s countries, or figure out what it missed.

Clearly, minimizing both the input (past conversation) and the output is an evident goal, and in the API we get the effects of that ChatGPT-targeted training.