Token limit during completion

How can I tell when the token context length is exceeded because of the response? If the input alone exceeds it, OpenAI responds with the error code context_length_exceeded. But it seems that if the limit is exceeded while generating a response, the response is just truncated and no error is thrown, which causes a lot of downstream problems.

You could create a script that broadcasts an error message if token length is exceeded.
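One way to do the input-side half of that is to count tokens yourself before sending the request. A minimal sketch, assuming the tiktoken library and treating gpt-3.5-turbo's 4,096-token window as the limit (adjust both to your model; this also ignores the small per-message chat formatting overhead):

import tiktoken

MODEL = "gpt-3.5-turbo"
CONTEXT_LIMIT = 4096          # total window shared by prompt and completion (assumed)
MAX_COMPLETION_TOKENS = 500   # the max_tokens you intend to request

def check_prompt_fits(prompt: str) -> int:
    """Return the prompt's token count, raising if no room is left for the completion."""
    encoding = tiktoken.encoding_for_model(MODEL)
    prompt_tokens = len(encoding.encode(prompt))
    if prompt_tokens + MAX_COMPLETION_TOKENS > CONTEXT_LIMIT:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens; only "
            f"{CONTEXT_LIMIT - prompt_tokens} are left for the completion."
        )
    return prompt_tokens

This only guards the input side, though; it can't tell you that the completion itself ran out of room.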

Welcome to the OpenAI community @ball

Here’s what the chat completion response looks like:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

If the response was cut off because the context length was exceeded during generation, the finish_reason property will have the value length.

If everything goes right, the value will be stop.
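So you can check that field after every call. A minimal sketch, assuming the pre-1.0 openai Python package (which returns the dict-shaped response shown above; newer SDK versions expose the same field as response.choices[0].finish_reason):

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)

finish_reason = response["choices"][0]["finish_reason"]
if finish_reason == "length":
    # The completion was truncated: raise, retry with a shorter prompt or a
    # larger-context model, or ask the model to continue where it stopped.
    raise RuntimeError("Completion truncated: token limit reached during generation")

print(response["choices"][0]["message"]["content"])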

This is the correct answer:

" If the response was cut-off due to exceeded context length during generation, the property finish_reason will have the value length ."

This is helpful. Thank you. Now I need to figure out how finish_reason gets exposed in LangChain. It seems there is some information at github.com/langchain-ai/langchainjs/issues/2099, but I haven’t been able to get to the right data yet. Anyway, thank you for your help.
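For what it's worth, the Python version of LangChain seems to surface it via generation_info when you call .generate() rather than .predict(). A hedged sketch only (I haven't verified the langchainjs equivalent from that issue, and field locations vary between LangChain versions):

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Deliberately small max_tokens so the "length" case is easy to trigger.
llm = ChatOpenAI(model="gpt-3.5-turbo", max_tokens=50)
result = llm.generate([[HumanMessage(content="Write a long story about a dragon.")]])

generation = result.generations[0][0]
finish_reason = (generation.generation_info or {}).get("finish_reason")
if finish_reason == "length":
    print("Response was truncated by the token limit.")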