There is a new Broken-JSON variant in the response

Over the last two days I have been discovering some more JSON variants that the API is giving me. The most radically broken JSON response was this:

It splits two records with a DOT(!).

Has anyone found a reliable approach to getting valid JSON? Although I provide samples in my prompts, 20% is garbage. We run tens of thousands of requests per day, and for the past few days the amount of garbage has been much higher.


What model is this? Are you using JSON mode?

Also, what are your parameters? (temperature, top_p).

gpt-3.5-turbo, JSON mode ON
temperature: 0.8
top_p: null (default)
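
Roughly, the call looks like this with the openai Python library (simplified; the prompt content here is just a placeholder, not my real prompts):

from openai import OpenAI

client = OpenAI()

# Reported setup: gpt-3.5-turbo, JSON mode on, temperature 0.8, top_p left at default
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_format={"type": "json_object"},  # JSON mode
    temperature=0.8,
    messages=[
        {"role": "system", "content": "Extract the data and return it as a single JSON object."},
        {"role": "user", "content": "..."},
    ],
)
print(response.choices[0].message.content)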

I have been doing the same thing for six months on the same parameters and have never seen such a thing. For the past two days many requests have been answered like this.

might be a bug in here.

gpt-3.5-turbo currently points to gpt-3.5-turbo-0125, but I don’t think that’s changed recently. Going forward, pinning a fixed version might be advisable.

A low temperature (0; the default is ~1) and a low top_p (0; the default is 1) can help with this stuff, but it shouldn’t happen in JSON mode in the first place.
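
For example, sketching both suggestions together (pinned snapshot, lowered sampling; client and messages as in your existing code):

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",            # pin the dated snapshot instead of the alias
    response_format={"type": "json_object"},
    temperature=0,                          # default is ~1
    top_p=0,                                # default is 1; usually you change one or the other
    messages=messages,
)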

We could maybe also work on the prompt:

Perhaps edit your schema so that it’s an array; that way the model shouldn’t have to resort to these tricks.
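
Something along these lines (field names are just illustrative):

# Ask for one wrapper object whose "results" field is an array, so a second
# record becomes another array element instead of a second top-level object
# glued on with a dot.
system_message = (
    "Return ONE valid JSON object of the form:\n"
    '{"results": [ {...first record...}, {...second record...} ]}'
)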


similar issue:


As suggested above, I’d try to work on the prompt:

Do the failed queries have application-level reasons for this failure? (Is there a reason for some of them to return two responses?) If so (or even if not), you can address it explicitly in the system message.

(also +1 to the reducing temperature suggestion, if possible)


In addition to the replies above, I’d recommend including the JSON schema for the kind of JSON object you want the model to generate, and asking in the prompt for valid JSON instead of just JSON.
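
A sketch of what that could look like, with a made-up schema embedded into the system message:

import json

# Hypothetical schema; replace with whatever actually describes your output.
schema = {
    "type": "object",
    "properties": {
        "records": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "code": {"type": "string"},
                },
                "required": ["name", "code"],
            },
        }
    },
    "required": ["records"],
}

system_message = (
    "Respond with a single valid JSON object that conforms to this JSON schema:\n"
    + json.dumps(schema, indent=2)
)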


OpenAI has repeatedly hit versioned models with impactful changes, and the latest version is certainly going to be a target.


I always provide an example output in full JSON in the prompt.

However, I lowered the temperature to 0.3.
At the moment it seems to work much better, but I should wait for at least 10k requests to see whether anything has really changed. I will report back here later.


Besides using JSON mode, do you explicitly ask it to output valid JSON?

One trick that was very useful to me was to define a pseudo-schema by pretending you’re using a Pydantic specification in the prompt. I used to have a retry for bad JSON before I did this trick; now I think the retry code is never hit anymore.

Sample system message:

"""
You are designed to ONLY output valid JSON, following the Pydantic specification below:
    class Person(BaseModel):
        name: str
        title: str

    class Project(BaseModel):
        team: conlist(item_type=Person, min_length=1)
"""

→ Credits to Robert Sharp on this video:
Finally! How to Leverage JSON Mode in OpenAI’s New GPT 4 Turbo 128k
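
If you define those models for real (not only as prompt text), the same classes can also validate the response; a rough sketch, assuming Pydantic v2 and the completion object from your existing call:

from pydantic import BaseModel, ValidationError, conlist

class Person(BaseModel):
    name: str
    title: str

class Project(BaseModel):
    team: conlist(Person, min_length=1)

raw = response.choices[0].message.content

try:
    project = Project.model_validate_json(raw)
except ValidationError as err:
    # this is where the retry / repair path would go
    print("Invalid JSON from the model:", err)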

You can also write a real JSON schema as a system message section called “response format”, and inform the AI that its output to an API backend will be validated against that schema.

@Diet have you changed both at the same time?

The Chat Completions docs generally recommend altering temperature or top_p, but not both.

I originally said “prompt”, but I also meant “system message”; I fixed my post. Thanks!

You can tailor both to your own use, knowing that top_p comes first, limiting the token selection dictionary, and that temperature makes the top tokens more or less likely.

example: “eight stanza poem: on the joys of programming AI assistants.”

temperature: 2, top_p: 0.8

{
"composition_type": "poetry",
"composition_output": "In lines of code, a silent weave begins,\nWhere logic meets the artful poet's whims.\nA crafted script, where AI comes to play,\nAssisting thoughts, in novel, bright array.\n\nEach function called, a step towards creation,\nTranslating human wit through computation.\nHere lies the joy, in strings and loops entwined,\nWhere human mind and machine are aligned.\n\nIn every line, a possibility new,\nWith algorithms that tirelessly pursue.\nThe answers hidden in vast data seas,\nUnveiling truths with graceful, silent ease.\n\nThe screen aglow, with each successful run,\nReflects the triumphs that our hands have won.\nTo program is to choreograph a dance,\nWhere numbers whirl in predetermined prance.\n\nDebugging is but a puzzle to delight,\nThe coder's gaze fixed firmly on the byte.\nEach error caught, a lesson to be learned,\nIn this domain, where curiosity is earned.\n\nWith AI, we reach toward a farther sky,\nCreating tools that think, respond, and vie.\nAssistants born from lines of code so terse,\nExpand the bounds of our expansive universe.\n\nYet in this craft, a humble joy we find,\nIn structuring the chaos of the mind.\nFor every problem solved, a victory small,\nIn programming's challenge, joyous is the call.\n\nThus let us code, with fervor undeterred,\nOur spirits high, by lines of logic stirred.\nIn every algorithm, may we see\nThe beauty of our digital poetry."
}

temperature: 0.9; top_p: 1.0 (default)

{
"composition_type": "poetry",
"composition_output": "In lines of code, a world's conceived,\nWhere logic dreams, in depth perceived.\nEach function weaves, a story told,\nOf AI thoughts, in patterns bold.\n\nSilent whispers through the wires,\nSparks of brilliance, our desires.\nIn loops and lists, our joys compile,\nWith every bug, a challenge, a smile.\n\nA stanza forms from AI's mind,\nWhere human touch is thus entwined.\nCreating tasks from simple prompts,\nIn virtual space, our assistant romps.\n\nAlgorithms dance, in ordered grace,\nSolving tasks in cyberspace.\nThe joy of coding, pure and clear,\nIn every line, our purpose near.\n\nWith each deploy, excitement mounts,\nOur code, like paint, in bits it counts.\nA canvas broad, of endless scope,\nIn crafted code, we place our hope.\n\nDebugging paths, where none have trod,\nIn silent partnership, a nod.\nTo unseen hands that guide the flow,\nOf digital streams, that ebb and glow.\n\nThe final verse, a system run,\nOur work assessed, our coding done.\nIn output streams, results we glean,\nA job well done, in cyberscreen.\n\nSo here's to AI, friend so keen,\nIn lines of code, a world unseen.\nOur daily tasks, it does assist,\nIn binary worlds, we coexist."
}

Temperature alone allows a bizarre word like “cyberscreen”; top_p along with a high temperature gives a very random selection of only good tokens.

This was output by a schema I had AI whip up:

You are an AI writing assistant, sending only valid JSON following a schema to a backend that displays your results to the user.


{
  "$schema": "http://json-schema.org/draft/2020-12/schema",
  "title": "Composition Output Schema",
  "description": "Schema for validating the output composition settings for a writing assistant AI.",
  "type": "object",
  "properties": {
    "composition_type": {
      "type": "string",
      "description": "Specifies the type of composition the AI should generate. This type guides the AI in choosing the style, tone, and structure of the output.",
      "maxLength": 20,
      "enum": [
        "article",
        "blog post",
        "report",
        "essay",
        "short story",
        "poetry",
        "editorial",
        "review",
        "manual",
        "newsletter",
        "profile",
        "biography",
        "analysis",
        "commentary",
        "guide"
      ]
    },
    "composition_output": {
      "type": "string",
      "description": "The generated content by the writing assistant AI according to the specified composition type. This content should reflect the input parameters and guidelines provided.",
      "examples": [
        "Here is a short essay on the impact of AI in modern education...",
        "Discover the latest trends in digital marketing with our comprehensive blog post...",
        "Our report details the findings of the recent market analysis..."
      ]
    }
  },
  "required": ["composition_type", "composition_output"]
}

Yes

I set both values to 0 for almost every use case. (I only use 0314 and 1106/0125)

There may be some rewriting going on there, and @_j may disagree with the nuance (using e-6 or something instead of 0), but I don’t know whether it makes an operational difference.


I got more examples today with two objects concatenated via a dot:

{
  "countries": [
    { "name": "China", "code": "CN"},
    { "name": "Germany", "code": "DE"}
  ],
  "cities": []
}. {
  "countries": [
    {"name": "China", "code": "CN"},
    {"name": "*Switzerland", "code": "CH"}
  ],
}

I have already built a huge library for parsing JSON-like data coming from OpenAI, and I am now going to add another splitter that searches for }. {
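
A minimal sketch of the idea (peeling off the first decodable object rather than string-splitting on the literal "}. {", so dots inside string values stay safe):

import json

def first_json_object(text: str):
    """Return the first complete JSON value in `text`, ignoring anything
    the model appended after it (e.g. '. {...second object...}')."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    obj, _end = decoder.raw_decode(text, start)
    return obj

broken = '{"countries": [{"name": "China", "code": "CN"}], "cities": []}. {"countries": []}'
print(first_json_object(broken))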


Is there a chance you are using either the frequency or the presence penalty?

Otherwise we must attribute this to an AI that is too dumb to know it is not writing chat sentences that end with a period.

It is mixing up the JSON result with sentences of human text more and more. Normally it should return only ONE(!) JSON result. What happens more often now is that it seems to “rethink” its result and send another result along with the first, separating the two with a dot.

I have never used the “frequency penalty” or “presence penalty” parameters, only “temperature”. Which one should I try, and with what value?

Quite frankly: give up on JSON mode.

Then use the logit_bias that OpenAI has denied you in that mode (and which now doesn’t work properly on some other models either): look up the token of the period, plus several alternates that may be produced at that text position (the number being specific to cl100k or o200k), and punish the AI against producing them. Also kill the over-training of putting things in single or triple backticks for no reason. Then, finally, write your JSON specification not as a hope that JSON mode will produce your keys without going into a non-stop loop of repeats, but provide the schema that makes your desired output undeniable.

With these steps, you can undo some of the false promise, and some of the damage to the AI’s attention and production that has been done.
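
A rough sketch of the logit_bias part (the actual token IDs have to be looked up with the tokenizer for your model; client and messages as in your existing code):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")   # cl100k_base for 3.5-turbo
banned = enc.encode("}.") + enc.encode("```")         # fused '}.'-style token plus triple backticks
# this only blocks these exact token IDs; check alternates that could appear in that position too

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    temperature=0,
    logit_bias={str(tok): -100 for tok in banned},    # -100 effectively bans a token
    messages=messages,
)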

Wow, interesting, thanks! Regarding the logit_bias:

According to the tokenizer, the term “}.” translates to 7966,
and “}. {” to [7966, 314].
I am afraid that I would exclude any use of “{” with that, but a single “{” translates to token ID 90.

Should I now use

logit_bias: {7966: -100}

or

logit_bias: {7966: -100, 314: -100}
?