Bug: Infinity Stream Loop with response_format

m.ali · November 26, 2023, 9:32pm

Setting the response_format parameter to ‘json_object’ leads to an infinite stream loop. Request to reproduce:

POST /v1/chat/completions HTTP/1.1
Host: api.openai.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:120.0) Gecko/20100101 Firefox/120.0
Accept: text/event-stream

{
    "response_format": {
        "type": "json_object"
    },
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Tell me a joke"
        },
        {
            "role": "assistant",
            "content": null,
            "tool_calls": [
                {
                    "id": "call_FZP4cgNT6LmG5TAaBfrFAwzO",
                    "type": "function",
                    "function": {
                        "name": "Joke",
                        "arguments": "{\n  \"about\": \"\"\n}"
                    }
                }
            ]
        },
        {
            "tool_call_id": "call_FZP4cgNT6LmG5TAaBfrFAwzO",
            "role": "tool",
            "name": "Joke",
            "content": "json: Why should I use this tool?"
        }
    ],
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "model": "gpt-3.5-turbo-1106",
    "stream": true
}

m.ali · November 26, 2023, 9:35pm

Screen Recording 2023-11-27 at 12.25.15 AM

Foxalabs · November 27, 2023, 1:22am

You have stream set to true, which will generate a new data packet for almost every token produced. Are you sure it is not that?

nikunj · November 27, 2023, 1:45am

Thanks for sharing this. This is a known limitation – change the system message to something that mentions JSON like “You are a helpful assistant designed to output JSON” for a fix. This is documented here: https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format and here: https://platform.openai.com/docs/guides/text-generation/json-mode

We’re exploring better ways to prevent this from happening longer term.
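
A minimal sketch of that workaround, assuming a hypothetical helper named `ensure_json_instruction` (it is not part of any OpenAI library). Before a request with `response_format` set to `json_object` is sent, it checks whether a system message mentions JSON and prepends the suggested instruction if none does. The `max_tokens` cap is an extra precaution added here for illustration, not something the reply above asks for.

```python
import copy

# Wording taken from the suggested fix; any system text mentioning JSON should do.
JSON_HINT = "You are a helpful assistant designed to output JSON."


def mentions_json(messages):
    """Return True if any message's text content contains 'json' (any case)."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, str) and "json" in content.lower():
            return True
    return False


def ensure_json_instruction(payload, max_tokens=500):
    """Return a copy of the request body that is safer to send in JSON mode."""
    fixed = copy.deepcopy(payload)  # never mutate the caller's payload
    response_format = fixed.get("response_format") or {}
    if response_format.get("type") != "json_object":
        return fixed  # JSON mode is off, nothing to do

    messages = fixed.setdefault("messages", [])
    system_messages = [m for m in messages if m.get("role") == "system"]
    if not mentions_json(system_messages):
        # Prepend an explicit instruction so the model knows JSON is expected.
        messages.insert(0, {"role": "system", "content": JSON_HINT})

    # Assumed safety net: bound the output so a runaway generation ends.
    fixed.setdefault("max_tokens", max_tokens)
    return fixed
```

Applied to the request in the first post, this would add a second system message ahead of "You are a helpful assistant." and leave the rest of the body unchanged.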

m.ali · November 27, 2023, 1:14pm

I’m sure it’s a bug: the stream keeps generating new tokens forever.
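
A client can at least protect itself from a stream that never ends. The sketch below assumes the stream arrives as server-sent-event lines of the form `data: {...}`, terminated by `data: [DONE]`, with the text in `choices[0].delta.content`. The function name `read_stream_with_guard` and both limits are hypothetical values chosen for illustration.

```python
import json


def read_stream_with_guard(lines, max_chunks=2000, max_blank_run=50):
    """Collect streamed text, stopping early if the stream looks runaway.

    Returns (text, reason), where reason is 'done', 'blank_run',
    'max_chunks', or 'eof'.
    """
    parts = []
    chunks = 0
    blank_run = 0  # consecutive chunks that contain only whitespace
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return "".join(parts), "done"
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text is None:
            continue  # role-only or tool-call deltas carry no text
        chunks += 1
        parts.append(text)
        blank_run = blank_run + 1 if text.strip() == "" else 0
        if blank_run >= max_blank_run:
            return "".join(parts), "blank_run"
        if chunks >= max_chunks:
            return "".join(parts), "max_chunks"
    return "".join(parts), "eof"
```

When the reason is anything other than 'done', the caller would close the HTTP connection itself rather than wait for the server to finish.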

Foxalabs · November 27, 2023, 4:03pm

Yes, as confirmed by nikunj. Thanks for your post, it helps refine and build the API.

_j · November 27, 2023, 5:24pm

Regardless of whether the special GPT-4 AI model dumps a whole bunch of nonsense repeating tokens or says “I have no idea what you want me to produce”, you can expect unsatisfactory output with that prompting.

The model can’t magically understand what type of json to produce without further specification, and training AI on some default would only degrade the quality of other specifications.

It looks like you want to chat with an AI. So let’s write a system message that will actually produce useful chat output you can parse, plus a bit more demonstration, with full-on schema programming:

You are a helpful AI assistant.
Your output is to an API.
Response to user and metadata will be extracted from output json.
Except for tool calls, create only valid json complying to schema.

// json output schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "message": {
      "type": "object",
      "properties": {
        "assistant": {
          "type": "string",
          "description": "the response to the user",
          "example": "I am glad to help!"
        },
        "topic": {
          "type": "string",
          "description": "subject of recent discussion",
          "example": "baseball"
        },
        "user_mood": {
          "type": "string",
          "description": "user is happy, upset, etc",
          "example": "neutral"
        }
      },
      "required": [
        "assistant",
        "topic",
        "user_mood"
      ]
    },
    "length_of_conversation_turns": {
      "type": "number",
      "description": "total user and assistant conversation exchange turns"
    },
    "conversation_turns_on_topic": {
      "type": "number",
      "description": "number of recent turns engaging newest topic"
    }
  },
  "required": [
    "message",
    "length_of_conversation_turns",
    "conversation_turns_on_topic"
  ]
}

You will see that a quality specification kind of makes the JSON mode invoked by response_format pointless.

(I also show the AI informing you how much you can lop off the conversation history)
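
A rough sketch of how output following that schema might be consumed, assuming the model actually complies with it. The helpers `parse_reply` and `trim_history` are hypothetical names, and the trimming rule (two messages per turn, system message stored separately) is an assumption about how the history is kept, not something the schema guarantees.

```python
import json

# Required keys copied from the schema above.
REQUIRED_TOP = (
    "message",
    "length_of_conversation_turns",
    "conversation_turns_on_topic",
)
REQUIRED_MESSAGE = ("assistant", "topic", "user_mood")


def parse_reply(raw):
    """Parse the model's JSON output and check the schema's required keys."""
    reply = json.loads(raw)
    if not isinstance(reply, dict):
        raise ValueError("reply is not a JSON object")
    missing = [key for key in REQUIRED_TOP if key not in reply]
    message = reply.get("message")
    if not isinstance(message, dict):
        message = {}
    missing += [f"message.{key}" for key in REQUIRED_MESSAGE if key not in message]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return reply


def trim_history(history, reply):
    """Keep only the most recent turns that the model reports as on-topic.

    Assumes history holds alternating user/assistant messages (two per
    turn) and does not include the system message.
    """
    turns = max(int(reply["conversation_turns_on_topic"]), 1)
    return history[-turns * 2:]
```

The text to show the user would then be `reply["message"]["assistant"]`, and the trimmed list is what gets sent as history on the next request.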
