JSON format causes infinite "\n \n \n \n" in response

I am trying to use JSON format to get a JSON response. It worked well when I gave it the same short example, but when I test it with production data (prompt_tokens = 2966) it starts responding with nothing but "\n \n \n \n" until it hits the max token limit. When I send the same prompt without response_format, it works fine, even though the output isn't a JSON object.

(A test with another, shorter user message still failed. Could the "#" characters matter?)

model: gpt-1106
usingAzure: True

To Reproduce

tsg = """
# TSG for debugging openai
## 1. install openai
### 1.1. install openai with pip
### 1.2. install openai with conda
## 2. import openai
### 2.1. import openai in python
### 2.2. import openai in jupyter notebook
## 3. use openai
### 3.1. use openai in python
### 3.2. use openai in jupyter notebook
"""
messages = [
                {
                    "role": "system",
                    "content": 
                        """
                        You are an expert at reading troubleshooting guidance.
                        The users will provide you a troubleshooting guide in markdown format, which consists of several steps.
                        You need to break down the document into steps based on the text semantics and markdown structure and return JSON format
                        Note that: (1) Ignore the document title if it does not indicate a step. (2) Only do text slicing from front to back, can't lose any content of the step. (3) Maintain the original text in task_description without any summarization or abbreviation. (4) Don't lose the prefix and serial number of the title displayed in the document. (5) If the step itself has a title in document, the task_title should use the original content.
                        You will respond with the list of steps as a JSON object. Here's an example of your output format: 
                        [{
                            "task_title": "",
                            "task_description": "",
                        },
                        {
                            "task_title": "",
                            "task_description": "",
                        }].
                        Here is an example of the input markdown document:
                            # Troubleshooting guide for buying a puppy
                            ## 1. know what puppy you want
                            ### 1.1. you could surf the internet to find the puppy you want
                            ### 1.2. visit friends who have puppies to see if you like them
                            ## 2. buy healthy puppies
                            ### 2.1. you could go to puppy selling websites to find healthy puppies, if you prefer buying puppies online, please go to step 3 for more information
                            ### 2.2. you could go to pet stores to find healthy puppies
                            ## 3. buy puppies online
                            here is a list of puppy selling websites: www.happydog.com, www.puppy.com, www.puppylove.com
                        Here is an example of the output json object:
                        [{
                            "task_title": "1. know what puppy you want",
                            "task_description": "### 1.1. you could surf the internet to find the puppy you want\n### 1.2. visit friends who have puppies to see if you like them"
                        },
                        {
                            "task_title": "2. buy healthy puppies",
                            "task_description": "### 2.1. you could go to puppy selling websites to find healthy puppies, if you prefer buying puppies online, please go to step 3 for more information\n### 2.2. you could go to pet stores to find healthy puppies"
                        },
                        {
                            "task_title": "3. buy puppies online",
                            "task_description": "here is a list of puppy selling websites: www.happydog.com, www.puppy.com, www.puppylove.com"
                        }
                        ]
                        """
                },
                {
                    "role": "user", 
                    "content": tsg
                }
            ]

response = llm.client.chat.completions.create(
            model = llm.engine,
            messages = messages,
            response_format={"type": "json_object"},
            temperature = 0,
        )

When I shorten the system message, it works. However, when the user message gets longer, it generates "xxxxx \n \n \n …" and reaches the 4096-token limit. Is that because the generation is too long? But then why is everything fine when JSON format is disabled?

Is it that JSON format uses more tokens because of the whitespace added for formatting?

\n        {\n            "steps": [\n                {\n                    "task_title": "1. Find out which requests timeout",\n                    "task_description": "Generate and Run kusto query:```

I’m facing the same issue with json_mode and my fine-tuned gpt-3.5-turbo-1106 model. I always get tons of \n and \t when calling via my API, even though the playground responds just fine.

1 Like

It seems like you guys are facing this:

  • When using JSON mode, always instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don’t include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit. To help ensure you don’t forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.

Which is odd, considering you do mention JSON.

Does it exceed your token count at a certain point?
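
For reference, the minimum that the documented requirement asks for is roughly this (a sketch using the openai v1 Python SDK; the model name and prompts are just placeholders):

from openai import OpenAI

client = OpenAI()

# The string "JSON" must appear somewhere in the messages,
# otherwise the API rejects the request outright.
completion = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": "You are a helpful assistant. Respond only with a single JSON object."},
        {"role": "user", "content": "List three primary colours."},
    ],
)
print(completion.choices[0].message.content)

The posts above already do this, which is why the whitespace loop is surprising.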

This is a bug we see with the assistant api on the same model when we try to generate JSON.

The output tokens exceeded the limit, though I have no idea why it generates so many '\n' tokens and wastes them. The input is nearly 3k tokens, which I think is really short.

Did you find a fix for this? I’m getting a similar problem even though I’m very explicit about specifying JSON.

1 Like

and yet this says JSON is ensured…

It also supports our new JSON mode, which ensures the model will respond with valid JSON

When JSON mode is enabled, the model is constrained to only generate strings that parse into valid JSON object.

You have to read through the claims to imagine what the technology actually does.

But the fault seems to be in mistraining, despite what are probably some logit_bias games behind the scenes.

  • The AI should not be outputting nulls or carriage returns as its first tokens
  • Tabs produce poorer-quality joining with other tokens
  • The AI is even more prone to repetition, which is what you see here

Let’s crank the frequency penalty to the max, so that repeated tokens quickly become unavailable, and ask about lobsters (individual tokens shown in single quotes):

'' ' \n' ' ' '\t' '\t' '{"' 'Success' '":' ' true' ',' ' "' 'Data' '":' ' {"' 'Response' '":' ' "' 'The' ' Maine' ' lobster' ' is' ' not' ' a' ' different' ' breed' '.' ' It' ' is' ' the' ' same' ' species' ' as' ' other' ' North' ' American' ' lob' 'sters' ' (' 'Hom' 'arus' ' american' 'us' '),' ' but' ' it' ' has' ' been' ' suggested' ' that' ' the' ' cold' ' water' ' of' ' the' ' North' ' Atlantic' ' off' ' the' ' coast' ' of' ' Maine' ' may' ' contribute' ' to' ' its' ' superior' ' taste' ' and' ' quality' '."' '}}\n' None

========================
'' '\n' ' ' '\t' '\t' '{"' 'text' '":' ' "' 'Yes' ',' ' the' ' Maine' ' lobster' ',' ' also' ' known' ' as' ' the' ' American' ' lobster' ',' ' is' ' a' ' distinct' ' species' ' from' ' other' ' types' ' of' ' lob' 'sters' ' found' ' in' ' different' ' regions' '.' ' It' ' is' ' known' ' for' ' its' ' delicious' ' taste' ' and' ' is' ' highly' ' prized' ' in' ' culinary' ' dishes' '."' '}\n' None

(The same leading whitespace tokens again, this time printed with their line breaks and tabs rendered literally rather than escaped: the blank, linefeed, space, and tab tokens, followed by '{"' and 'Success'.)
So without JSON being specified, and without a schema and key specification, there are several easily-repeated tokens to exhaust before you ever get to JSON: tabs, two-space indents, a linefeed for no reason. The AI is also happy to repeat back my input under different system messages, because it is eager to loop.

A logit_bias dictionary could remove these, along with the waste of indented multi-line JSON. logprobs can now reveal more.
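
If you do go the logit_bias route, one way to build such a bias dictionary without hardcoding token IDs is to look the IDs up with tiktoken (a sketch; which whitespace strings to suppress is my own assumption, and note that banning "\n" outright also stops legitimate pretty-printing, which here is rather the point):

import tiktoken

# cl100k_base is the encoding used by the gpt-3.5-turbo / gpt-4 model family
enc = tiktoken.get_encoding("cl100k_base")

# whitespace runs suspected of looping; each may encode to one or more token IDs
suspect_strings = ["\n", "\n\n", "\t", "\t\t", " \n", "  "]

logit_bias = {}
for s in suspect_strings:
    for token_id in enc.encode(s):
        logit_bias[token_id] = -100  # -100 effectively bans the token

print(logit_bias)
# pass the dict via logit_bias=logit_bias in chat.completions.create(...)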

Guys, I think you are encountering a very specific situation where the model stumbles. I use JSON mode regularly in several use cases and don’t encounter this problem (I only did initially, when I hadn’t set up the prompts and parameters properly).

1 Like

Having the same issue on all models. It never happened before, and it doesn’t happen with the same prompt executed on other data… I have no clue how to fix this.

1 Like

We had some limited success here, introducing a logit bias for \n:

"logit_bias": {1734:-100}

But it’s certainly not an optimal solution. :confused:

Noticed the same behavior. This makes JSON mode completely unusable at the moment.

1 Like

Seeing this with gpt-4-turbo-preview. My system prompts tell it to respond in JSON and I set response_format={"type": "json_object"} when I create the completion.

It only happens sometimes but it does seem to happen more once the message list I am sending gets large.

1 Like

I wrote a short script to repro the bug. I tried to provide it as a GitHub link, but apparently links can’t be used here. The script assumes your API key is available as an env variable. Once total tokens reaches around 3k, you will eventually get back a response that is nothing but newlines.

from openai import OpenAI
import json

client = OpenAI()
model = "gpt-4-turbo-preview"

system_message_text = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit
amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore
et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur
adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim
id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem
ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit
amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore
et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et
dolore.
Reply in JSON format using this schema {'response' : your response}
"""


system_message = {"role": "system", "content": system_message_text}
user_message = {"role": "user", "content": "Give me a random fact."}
msg_array = [system_message, user_message]

# Keep growing the conversation; once total tokens reach ~3k the
# response eventually degenerates into endless newlines.
while True:
    completion = client.chat.completions.create(model=model,
                                                response_format={"type": "json_object"},
                                                messages=msg_array)
    print(completion)
    ai_response = json.loads(completion.choices[0].message.content)["response"]
    msg_array.append({"role": "assistant", "content": ai_response})
    msg_array.append(user_message)

Also getting the same issue when streaming JSON. Sometimes it works, but half the time it will just stream empty chunks whose content is either '\n' or ' ', breaking my app. I have tried all of the models that support JSON mode. No luck.
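
One crude guard, not a fix for the underlying bug, is to count consecutive whitespace-only deltas while streaming and abort early instead of letting the request run to the token limit. A minimal sketch, assuming the openai v1 Python SDK; the model name, prompts, and cutoff value are all placeholders:

from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "Reply with a single JSON object, no markdown."},
    {"role": "user", "content": "Give me a random fact as JSON."},
]

stream = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",          # example model name
    response_format={"type": "json_object"},
    messages=messages,
    stream=True,
)

collected = []
whitespace_run = 0
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    collected.append(delta)
    if delta and delta.strip() == "":
        whitespace_run += 1
        if whitespace_run > 20:          # arbitrary cutoff: assume the run is degenerate
            break                        # stop consuming instead of burning tokens to the limit
    elif delta:
        whitespace_run = 0

text = "".join(collected)
print(text)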

First: Your quality of prompting alone should be enough that correct JSON is produced all the time without needing this JSON response mode parameter. Only then can you consider turning it on.

It seems to operate outside the domain of the logit_bias or sampling parameters you control, rewriting runs of tokens to be "compliant", in this case absurdly compliant with the idea that JSON should contain linefeeds and tabs.

It wouldn’t be an issue at all if the AI models weren’t so heavily trained to answer a user’s request for JSON inside a markdown code fence. That is another thing you must deny the AI by prompting.
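
In practice that denial is just an explicit line in the system prompt; a sketch of the kind of wording meant (the exact phrasing is up to you):

system_prompt = (
    "You respond only with a single raw JSON object. "
    "Do not wrap the output in a markdown code fence (no ``` characters), "
    "do not add commentary, and do not emit leading blank lines, spaces, or tabs."
)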

I have solved it. If you are stringing together ‘user’ and ‘assistant’ messages and you instructed JSON output in your system prompt, then it forgets the instruction after about 4 message exchange pairs (even with gpt-4-turbo). So you need to remind it to output JSON in your most recent ‘user’ message.

Just use the pattern:

user_prompt_msg = {'role': 'user', 'content': f"""(Unspoken NOTE: Don't forget to respond in structured JSON using the schema defined above!) {query}. """}

api_messages_list = [sys_prompt_msg] + central_messages_list + [user_prompt_msg]
...
model='gpt-3.5-turbo-0125',
messages= api_messages_list,
response_format={ "type": "json_object" },
stream=True,
...

This seems to do the trick. Obviously, you will need to implement a mechanism that limits the number of message objects in central_messages_list to avoid hitting token limits.
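
A minimal sketch of that trimming, reusing the variable names from the snippet above and assuming central_messages_list holds alternating user/assistant messages (the window size is arbitrary):

MAX_HISTORY_MESSAGES = 8  # roughly the last 4 user/assistant pairs

def trim_history(history):
    # keep only the most recent messages; the system prompt and the newest
    # user message are added separately, so they are never dropped
    return history[-MAX_HISTORY_MESSAGES:]

api_messages_list = [sys_prompt_msg] + trim_history(central_messages_list) + [user_prompt_msg]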

But ever since I’ve been using this pattern, it has always been streaming valid JSON, since it is reminded in the most recent message.

Let me know if this works for you!

2 Likes

I was under the impression that if I don’t specify a conversation_id in the ChatGPT API request, it would automatically start a new session. However, I’m not sure if the “message exchange pairs” mentioned in the documentation refer to the chat session itself.

Could someone please clarify the relationship between message exchange pairs and chat sessions in the ChatGPT API? If I don’t provide a conversation_id, does it mean that each API request is treated as a new, independent chat session?

Chat Completions is a stateless API, meaning there is no such thing as a chat session. Each request is separate, and the only way to stay consistent in a conversation is to send its history in some form (whether full or summarised) with each request.
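
In code terms, "sending the history" just means re-posting the earlier messages yourself on every call. A minimal sketch with the openai Python SDK (model name and prompts are examples):

from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant. Reply in JSON."}]

def ask(question):
    history.append({"role": "user", "content": question})
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        response_format={"type": "json_object"},
        messages=history,          # the full history goes out with every request
    )
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply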

2 Likes

Any updates on how to reasonably fix this? I’m still having the same issue, despite adding the system message as the most recent message so it does not "forget" about it (which makes no sense, since it sometimes fails even after the first user message, and they promote it as a 128k context model…).