Issues with Truncated Responses

I’m working on creating a prompt to generate multiple choice quizzes for teachers. The prompt is working pretty well (it needs to output HTML, which makes it a bit more difficult); the only problem I have is that the responses seem to truncate themselves arbitrarily.

I’m using the gpt-3.5-turbo-16k model, I set my max_tokens super high (10,000), and my responses are getting truncated at around 3,500 total tokens (including the prompt).

usage: Object
    prompt_tokens: 630
    completion_tokens: 2817
    total_tokens: 3447

Wondering if there’s anything you guys can think of that might cause the response to truncate even without hitting the max_tokens limit. (It stops midway through an HTML tag, so there’s no chance it completed the response.)

This model shouldn’t be trained as heavily on curtailing output length as the newest models, but it still has that behavior where it refuses to write long specified outputs: present it with 100 things to do instead of 10, and it has the foresight to make each individual output very small, and it will still feel the need to stop at item 50 of 100 even when you have it produce item numbers.

The output stops because the AI decided to emit a stop sequence token, closing the assistant chat message. You can see this when the finish_reason in the API response is indeed “stop” rather than “length”. The token that lets the AI end its own output also cannot be demoted with the logit_bias API parameter. The large context may simply grow to the point where the AI can no longer attend to its instructions and decides it is done writing.
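You can check which case you’re hitting by inspecting finish_reason on the returned choice. A minimal sketch using the openai Python package (v1 client), mirroring the setup described above; the prompts here are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    max_tokens=10000,
    messages=[
        {"role": "system", "content": "You write multiple choice quizzes in HTML."},
        {"role": "user", "content": "Write a 50-question multiple choice quiz on biology."},
    ],
)

choice = response.choices[0]
# "stop"   -> the model emitted its end token on its own (the case described above)
# "length" -> generation was actually cut off by the max_tokens limit
print(choice.finish_reason)
print(response.usage)  # prompt_tokens / completion_tokens / total_tokens
```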

I have found that even at 8k of input or total context on this model, the AI can no longer perform adequately. An “improve quality sentence-by-sentence” prompt gets you the same text back, and a “rewrite this” prompt gives you a third of the original length.

I expect you will get much higher quality by asking for fewer individual outputs per response (besides not encountering this problem). That will mean more system prompts and instructions in total, of course.

You can offset that prompting cost by using gpt-3.5-turbo-0613: the same model quality with a 4k context, giving you the same ~4k total tokens the 16k model is effectively acting like anyway, but at half the price.

Then halve the price again by using the new batch feature, where you produce a file containing all the API calls, and they are performed within a 24-hour window.
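Concretely, the batch flow is: write one JSON object per API call to a .jsonl file, upload it, then create a batch job against the chat completions endpoint. A rough sketch with the openai Python package (v1 client); the file name, subjects list, and prompts are illustrative assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()

# One JSONL line per API call; the subjects and prompts here are placeholders.
subjects = ["photosynthesis", "cell division", "genetics"]
with open("quiz_batch.jsonl", "w") as f:
    for i, subject in enumerate(subjects):
        request = {
            "custom_id": f"quiz-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-3.5-turbo-0613",
                "messages": [
                    {"role": "system", "content": "You write multiple choice quizzes in HTML."},
                    {"role": "user", "content": f"Write 10 quiz questions about {subject}."},
                ],
            },
        }
        f.write(json.dumps(request) + "\n")

# Upload the file, then create the batch job with a 24-hour completion window.
batch_file = client.files.create(file=open("quiz_batch.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id)  # poll client.batches.retrieve(batch.id) until status == "completed"
```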

Thanks for the response!!

Just want to clarify here. So basically, in the context of a multiple choice quiz, if we ask it to create 50 questions, by question 25 or so it will lose its own context and just stop generating? Even if we are only at about 600 prompt_tokens and 3,000 completion_tokens?

And you’re recommending that maybe I create multi-step prompts where it loops and creates one question at a time?

If you discover the AI doesn’t write many questions at once:

  • You ask it to write fewer questions at once.

If the AI would write the same questions again because you don’t have many ideas for how to ask differently:

  • Provide the previous “user” input and what the “assistant” wrote as chat history (a sketch of that loop follows below).
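A minimal sketch of that loop, assuming the openai Python package (v1 client): generate a small batch of questions per call and carry the prior exchanges forward so the model can avoid repeating itself. The round count, questions-per-round, and prompts are placeholder choices:

```python
from openai import OpenAI

client = OpenAI()

system = {"role": "system", "content": "You write multiple choice quiz questions in HTML."}
history = []         # prior exchanges, carried forward so the model can avoid repeats
all_questions = []

for round_num in range(5):  # e.g. 5 rounds of 10 questions instead of 50 at once
    user = {
        "role": "user",
        "content": "Write 10 new biology quiz questions, "
                   "different from any you have written so far.",
    }
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=[system, *history, user],
    )
    answer = response.choices[0].message.content
    all_questions.append(answer)
    # Feed this exchange back in so the next round sees what was already written.
    history += [user, {"role": "assistant", "content": answer}]
```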