gpt-3.5-turbo-instruct response being truncated, finish_reason: length

I’m switching from gpt-3.5-turbo-0613 to gpt-3.5-turbo-instruct and I’m running into an issue.

Here’s an example prompt:

  {
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "Using the following statements as your corpus, write a clear, actionable, and factual sentence -- and one sentence only -- using proper grammar, punctuation, and capitalization. Use no more than twenty words and should be as concise as practical. Avoid referencing the task or the text in your response. For example, if the statements are about the benefits of regular exercise, a desirable summary could be: 'Engaging in regular exercise improves physical and mental health, enhancing overall well-being.':\n\nYou get challenging projects to work on and you have a chance to make a real impact.\nDynamic nature of work and challenging projects."
  }

and here’s the response:

  {
    "id": "cmpl-83XGzqj3iDEkZLgZDzTXbcRuPcSXt",
    "object": "text_completion",
    "created": 1695853577,
    "model": "gpt-3.5-turbo-instruct",
    "choices": [
      {
        "text": "\n\nEngage in dynamic, impactful work on challenging projects to hone skills and make",
        "index": 0,
        "logprobs": null,
        "finish_reason": "length"
      }
    ],
    "usage": {
      "prompt_tokens": 122,
      "completion_tokens": 16,
      "total_tokens": 138
    }
  }

The text field is consistently truncated, always with finish_reason "length". This was not a problem with gpt-3.5-turbo-0613 using the same prompt. I’ve gone over the documentation but am not seeing anything I should be doing. Am I missing something?

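With ChatCompletions, max_tokens has no default limit, so a response can run until the model stops or the context length is exhausted.
With the Completions endpoint, the default max_tokens is 16, which is why your response stops at exactly 16 completion tokens with finish_reason "length".
You need to set max_tokens to the longest output you expect if you want more than a few words. That space is reserved from the context length, so the prompt plus max_tokens must fit within it.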
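As a rough sketch of the budgeting described above, the helper below caps a desired output length at whatever the context window leaves after the prompt. The function name pick_max_tokens is hypothetical, and the 4096-token window is an assumption about gpt-3.5-turbo-instruct rather than something stated in this thread; check the model documentation for the actual limit.

```python
def pick_max_tokens(prompt_tokens: int, desired_output_tokens: int,
                    context_window: int = 4096) -> int:
    """Return a max_tokens value that fits in the remaining context.

    context_window=4096 is an assumed limit, not taken from the thread.
    """
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    # Never ask for more than the space left after the prompt.
    return min(desired_output_tokens, remaining)


# The prompt in this thread used 122 tokens; 64 is ample for one sentence.
print(pick_max_tokens(122, 64))  # prints 64
```

The result would be passed as max_tokens in the Completions request, replacing the default of 16.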

An example with options spelled out:

    import openai  # legacy (pre-1.0) openai Python library

    response = openai.Completion.create(
        prompt      = string,  # the prompt text
        model       = model_name,
        temperature = temperature, # start at 0.6
        max_tokens  = max_tokens,  # maximum response length
        stop        = "", # often needed for completions
        top_p       = top_p, # reduction from 1 to 0.95 useful
        presence_penalty = 0.0,  # penalties range from -2.0 to 2.0
        frequency_penalty = 0.0,  # frequency = cumulative score
        n           = 1,  # gets you multiple trials
        stream      = True,  # response arrives as an iterator of chunks
        logit_bias  = {"100066": -1},  # example, '~\n\n' token
        user        = "site_user-id", # optional customer tracking
    )
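To catch this kind of truncation in code, one option is to check finish_reason on each choice. The sketch below is a minimal illustration, assuming a non-streamed response that can be read like a plain dict; truncated_choices is a hypothetical helper name, and the sample is a trimmed copy of the response posted earlier in this thread.

```python
def truncated_choices(response: dict) -> list:
    """Return the indexes of choices cut off by the max_tokens limit."""
    return [
        choice["index"]
        for choice in response.get("choices", [])
        if choice.get("finish_reason") == "length"
    ]


# Trimmed copy of the response from the question.
sample = {
    "choices": [
        {
            "text": "\n\nEngage in dynamic, impactful work on challenging "
                    "projects to hone skills and make",
            "index": 0,
            "logprobs": None,
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 122, "completion_tokens": 16, "total_tokens": 138},
}

print(truncated_choices(sample))  # prints [0]
```

A non-empty result means max_tokens was too small for that choice, so the request is worth repeating with a larger value.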

I see. Thank you very much. I must have missed that.