Bug: Infinite Stream Loop with response_format

Setting the response_format parameter to ‘json_object’ leads to an infinite stream loop.

POST /v1/chat/completions HTTP/1.1
Host: api.openai.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:120.0) Gecko/20100101 Firefox/120.0
Accept: text/event-stream

{
    "response_format": {
        "type": "json_object"
    },
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Tell me a joke"
        },
        {
            "role": "assistant",
            "content": null,
            "tool_calls": [
                {
                    "id": "call_FZP4cgNT6LmG5TAaBfrFAwzO",
                    "type": "function",
                    "function": {
                        "name": "Joke",
                        "arguments": "{\n  \"about\": \"\"\n}"
                    }
                }
            ]
        },
        {
            "tool_call_id": "call_FZP4cgNT6LmG5TAaBfrFAwzO",
            "role": "tool",
            "name": "Joke",
            "content": "json: Why should I use this tool?"
        }
    ],
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "model": "gpt-3.5-turbo-1106",
    "stream": true
}


You have stream set to true, which generates a new data packet for almost every token produced. Are you sure that is not what you are seeing?
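To tell normal per-token streaming apart from a stream that never ends, the client can count chunks and stop at a cap. Below is a minimal sketch, assuming the usual server-sent-event framing of `data: {...}` lines that ends with `data: [DONE]`; the function name `collect_stream` and the `max_chunks` cap are hypothetical and not part of any SDK.

```python
import json

def collect_stream(lines, max_chunks=1000):
    """Join streamed delta content; stop at [DONE] or after max_chunks chunks.

    Returns (text, finished), where finished is False if the cap was hit
    or the input ended without a [DONE] marker.
    """
    parts = []
    chunks = 0
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts), True
        chunks += 1
        if chunks > max_chunks:
            return "".join(parts), False  # runaway stream: give up
        choices = json.loads(payload).get("choices") or []
        delta = choices[0].get("delta", {}) if choices else {}
        parts.append(delta.get("content") or "")
    return "".join(parts), False

# Hand-written sample chunks, not captured from the API.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    'data: [DONE]',
]
text, finished = collect_stream(sample)
print(text, finished)
```

If `finished` comes back False, the stream hit the cap or ended without a terminator. Setting `max_tokens` in the request is another way to bound the output on the server side.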

Thanks for sharing this. This is a known limitation. As a fix, change the system message to something that mentions JSON, such as “You are a helpful assistant designed to output JSON”. This is documented here: https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format and here: https://platform.openai.com/docs/guides/text-generation/json-mode

We’re exploring better ways to prevent this from happening longer term.
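The workaround above can be applied on the client before the request is sent. This is only a sketch: the helper `ensure_json_instruction` and the appended hint sentence are hypothetical, and the payload shape is taken from the request in the original post.

```python
import copy

# Assumed wording; any system text that mentions JSON should serve the same purpose.
JSON_HINT = "You are designed to output JSON."

def ensure_json_instruction(payload, hint=JSON_HINT):
    """Return a payload whose system message mentions JSON when JSON mode is on.

    The input payload is never modified; it is returned as-is when JSON mode is off.
    """
    if (payload.get("response_format") or {}).get("type") != "json_object":
        return payload
    fixed = copy.deepcopy(payload)
    messages = fixed.setdefault("messages", [])
    for message in messages:
        if message.get("role") == "system":
            content = message.get("content") or ""
            if "json" not in content.lower():
                message["content"] = (content + " " + hint).strip()
            return fixed
    # No system message at all: add one at the front.
    messages.insert(0, {"role": "system", "content": hint})
    return fixed

request = {
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke"},
    ],
    "model": "gpt-3.5-turbo-1106",
    "stream": True,
}
fixed_request = ensure_json_instruction(request)
print(fixed_request["messages"][0]["content"])
```

The check looks only at the system message, since the request in the original post already contains the word "json" in a tool message and still looped.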


I’m sure it’s a bug: it keeps generating new tokens forever.

Yes, as confirmed by nikunj. Thanks for your post, it helps refine and build the API.

Regardless of whether the special GPT-4 AI model dumps a whole bunch of nonsense repeating tokens or says “I have no idea what you want me to produce”, you can expect unsatisfactory output with that prompting.

The model can’t magically understand what type of json to produce without further specification, and training AI on some default would only degrade the quality of other specifications.

It looks like you want to chat with an AI. So let’s write a system message that will actually produce useful chat output you can parse, with a fuller demonstration of full-on schema programming:

You are a helpful AI assistant.
Your output is to an API.
Response to user and metadata will be extracted from output json.
Except for tool calls, create only valid json complying to schema.

// json output schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "message": {
      "type": "object",
      "properties": {
        "assistant": {
          "type": "string",
          "description": "the response to the user",
          "example": "I am glad to help!"
        },
        "topic": {
          "type": "string",
          "description": "subject of recent discussion",
          "example": "baseball"
        },
        "user_mood": {
          "type": "string",
          "description": "user is happy, upset, etc",
          "example": "neutral"
        }
      },
      "required": [
        "assistant",
        "topic",
        "user_mood"
      ]
    },
    "length_of_conversation_turns": {
      "type": "number",
      "description": "total user and assistant conversation exchange turns"
    },
    "conversation_turns_on_topic": {
      "type": "number",
      "description": "number of recent turns engaging newest topic"
    }
  },
  "required": [
    "message",
    "length_of_conversation_turns",
    "conversation_turns_on_topic"
  ]
}

You will see that a quality specification like this makes the json model invoked by response_format rather pointless.

(The schema also has the AI report how much of the conversation history you can lop off.)
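On the receiving side, the reply still has to be parsed and checked, since a prompt-level schema is a request rather than a guarantee. Here is a minimal sketch using only the standard library; `parse_reply` is a hypothetical helper, and it checks just the required keys and basic types from the schema above rather than doing full JSON Schema validation.

```python
import json

# Required keys copied from the schema in the system message above.
REQUIRED_TOP = ("message", "length_of_conversation_turns", "conversation_turns_on_topic")
REQUIRED_MESSAGE = ("assistant", "topic", "user_mood")

def parse_reply(raw):
    """Parse model output and return (data, errors); errors is empty when the reply fits."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top level is not an object"]
    errors = [f"missing key: {k}" for k in REQUIRED_TOP if k not in data]
    message = data.get("message")
    if isinstance(message, dict):
        for k in REQUIRED_MESSAGE:
            if k not in message:
                errors.append(f"missing key: message.{k}")
            elif not isinstance(message[k], str):
                errors.append(f"message.{k} is not a string")
    elif "message" in data:
        errors.append("message is not an object")
    for k in REQUIRED_TOP[1:]:
        v = data.get(k)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if k in data and (isinstance(v, bool) or not isinstance(v, (int, float))):
            errors.append(f"{k} is not a number")
    return data, errors

# Hand-written example reply, not real model output.
good = json.dumps({
    "message": {"assistant": "I am glad to help!", "topic": "jokes", "user_mood": "neutral"},
    "length_of_conversation_turns": 1,
    "conversation_turns_on_topic": 1,
})
data, errors = parse_reply(good)
print(data["message"]["assistant"], errors)
```

The `assistant` string is what gets shown to the user. A non-empty `errors` list is a signal to retry or fall back instead of passing broken output along.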