The output is not a summary; it is truncation at the maximum remaining context length.
The full input text is not just “rather long” — it is 7879 tokens!
The output you pasted is 306 tokens.
7879 tokens + 306 tokens = 8185 tokens, and the context length of GPT-4 is 8,192 tokens. The handful of tokens left over is consumed by the chat-message formatting, so the model literally ran out of room and was cut off mid-reply.
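You can verify the arithmetic yourself. Here is a minimal sketch using tiktoken (OpenAI's tokenizer library); `transcript.txt` is a placeholder for your input file:

```python
import tiktoken

# cl100k_base is the encoding used by gpt-4 and gpt-3.5-turbo
enc = tiktoken.get_encoding("cl100k_base")

full_input = open("transcript.txt").read()   # placeholder for your text
input_tokens = len(enc.encode(full_input))   # 7879 in your case

context_limit = 8192                         # gpt-4 8k context window
room_for_output = context_limit - input_tokens

# 313 tokens here; a few more go to chat-message formatting,
# which is why the visible reply stopped at ~306 tokens.
print(input_tokens, room_for_output)
```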
The only model that would satisfy this is gpt-3.5-turbo-16k-0613, with a 16,385-token context length and an output not limited to only 4k like the newer models.
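For illustration, such a run looks roughly like this (a sketch using the pre-1.0 openai Python SDK that was current for the 0613 models; the prompt wording is my own assumption, and OPENAI_API_KEY is assumed to be set in the environment):

```python
import openai  # pre-1.0 SDK; reads OPENAI_API_KEY from the environment

full_input = open("transcript.txt").read()  # the 7879-token text from above

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k-0613",
    messages=[
        {"role": "system",
         "content": "Rewrite this transcript, improving the grammar "
                    "and style without shortening it."},
        {"role": "user", "content": full_input},
    ],
    max_tokens=8000,  # 7879 in + 8000 out still fits in 16,385
)
print(response["choices"][0]["message"]["content"])
```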
So we run it. I get the exact effect seen before on such large-context tasks: nearly zero difference. When ~8k of input is presented for rewriting, the only thing the AI does, regardless of instruction, is echo the input right back at you.
So, just as I find your input to have ellipses from being transcribed in chunks of audio, you must process it in chunks of tokens, closer to 700 each, to give the AI maximum capability to improve the quality without any urge to shrink the input. A sketch of that chunked loop follows.
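Here is a minimal sketch of the chunked approach, under the same assumptions as above (splitting on raw token boundaries for brevity; splitting on sentence or paragraph boundaries would serve a real transcript better):

```python
import openai
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
full_input = open("transcript.txt").read()  # placeholder input file

CHUNK_TOKENS = 700  # small enough that rewriting beats compressing
tokens = enc.encode(full_input)

rewritten_parts = []
for start in range(0, len(tokens), CHUNK_TOKENS):
    chunk_text = enc.decode(tokens[start:start + CHUNK_TOKENS])
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k-0613",
        messages=[
            {"role": "system",
             "content": "Rewrite this transcript excerpt, improving the "
                        "grammar and style without shortening it."},
            {"role": "user", "content": chunk_text},
        ],
        max_tokens=1500,  # ample headroom for a ~700-token rewrite
    )
    rewritten_parts.append(response["choices"][0]["message"]["content"])

print("\n".join(rewritten_parts))
```

With each chunk, the model has far more output headroom than input, which removes the pressure to compress and lets it actually rewrite.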