Often you need the output of an LLM to be valid JSON. Even when you include phrases such as "the output must be in JSON format" or variations thereof, the LLM sometimes prefixes its answer with something like "here is the output in json" or a "```json" fence.
That makes downstream parsing of the LLM's output fail (i.e., it is no longer valid JSON).
A simple trick to increase the likelihood of the LLM producing pure JSON output is the "logit_bias" parameter of the OpenAI API. This parameter is a dictionary mapping token IDs to a bias value between -100 and 100: -100 effectively bans a token, 100 effectively forces it, and smaller values make the corresponding characters less or more likely to appear in the output.
For example, the following increases the probability of "{" and "}" and decreases the probability of ``` or ''' in the output.
logit_bias: {
"90": 10, // token ID for "{"
"92": 10, // token ID for "}"
"19317": -10, // token ID for "'''"
"19317": -10, // token ID for "'''"
"74694": -10 // token ID for "```"
}
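As a rough sketch of how this might be wired up (not the author's exact code): the token IDs above are specific to one tokenizer, so it can be safer to compute them with tiktoken rather than hardcode them, then pass the resulting dictionary to a chat completion. The model name and prompts below are placeholders; this assumes the official openai and tiktoken Python packages.

# Sketch: bias a chat completion toward "{"/"}" and away from code fences.
# Token IDs are computed from the cl100k_base tokenizer instead of hardcoded.
import tiktoken
from openai import OpenAI

enc = tiktoken.get_encoding("cl100k_base")

def bias_for(strings_to_bias: dict[str, int]) -> dict[str, int]:
    """Map every token that makes up each string to the requested bias value."""
    bias: dict[str, int] = {}
    for text, value in strings_to_bias.items():
        for token_id in enc.encode(text):
            bias[str(token_id)] = value
    return bias

logit_bias = bias_for({
    "{": 10,     # encourage the opening brace
    "}": 10,     # encourage the closing brace
    "'''": -10,  # discourage triple-quote fences
    "```": -10,  # discourage markdown code fences
})

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Return only a JSON object, with no prose around it."},
        {"role": "user", "content": "Summarize today's tasks as JSON."},
    ],
    logit_bias=logit_bias,
)
print(response.choices[0].message.content)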
MANDATORY: Every single response produced after the "assistant" prompt must begin with the exact characters shown within the single-quotes: '{"function_name": "'
MANDATORY: Every single response then continues with valid JSON output complying with the included JSON schema, and will be validated, allowing no deviation.
Note: There is no user to communicate with directly. AI JSON output response is provided directly to an external API interface backend.
// JSON Schema for assistant response
(real json schema as you’d validate AI output with…)
Using that as the very last line the chat model sees tends to yield pretty good results. After that, it's mostly smooth sailing (apart from the apostrophe issue, which can be overcome by using the right schema hint).
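To make the flow concrete, here is a hedged sketch of how that tail of the system prompt might be assembled and the reply validated. The schema, model name, and user message are stand-ins, not the author's actual ones; only the MANDATORY lines come from the text above.

# Sketch: put the MANDATORY instructions and a schema at the very end of the
# system message, then parse and validate the reply. Illustrative values only.
import json
from jsonschema import validate  # pip install jsonschema
from openai import OpenAI

schema = {
    "type": "object",
    "properties": {
        "function_name": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["function_name", "arguments"],
}

system_prompt = (
    "You are a function-calling backend.\n"
    'MANDATORY: Every single response produced after the "assistant" prompt must begin '
    "with the exact characters shown within the single-quotes: '{\"function_name\": \"'\n"
    "MANDATORY: Every single response then continues with valid JSON output complying "
    "with the included JSON schema, and will be validated, allowing no deviation.\n"
    "Note: There is no user to communicate with directly. AI JSON output response is "
    "provided directly to an external API interface backend.\n"
    "// JSON Schema for assistant response\n"
    + json.dumps(schema)
)

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Look up the weather for Oslo."},
    ],
).choices[0].message.content

data = json.loads(reply)   # raises if the model wrapped the JSON in prose
validate(data, schema)     # raises if the JSON does not match the schema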
Now, of course, if our dear AI providers allowed us to take off the training wheels with these chat models, we’d have a lot more latitude.
Yes, other competitive AI providers (ahem) allow assistant prompt completion, where you can place that JSON opening and a key directly after the internal "assistant" role that is automatically added. It also works a treat on OpenAI completions with -instruct models: write the start of the JSON yourself in the prompt, set an additional stop sequence at the end of the JSON, and pay the output rate only for the varying internal content. Someone at OpenAI just needs to give ChatML that, and to stop treating the developer as an untrustworthy child the way the assistants endpoint does.
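A hedged sketch of that completions-endpoint trick, with an assumed prompt, stop sequence, and single-key JSON shape (none of this is an officially documented pattern): write the opening of the JSON yourself, let the model fill in only the value, and cut it off at the closing quote and brace.

# Sketch: "prefill" the start of the JSON on the legacy completions endpoint
# (gpt-3.5-turbo-instruct) and stop generation early, so you pay output tokens
# only for the variable middle. Illustrative values throughout.
from openai import OpenAI

client = OpenAI()

prefix = '{"function_name": "'  # written by us in the prompt, not billed as output
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=(
        "Decide which function to call for the request below and answer as JSON.\n"
        "Request: get tomorrow's forecast for Oslo\n\n"
        + prefix
    ),
    max_tokens=50,
    stop=['"}'],                # stop once the model closes the string and object
)

full_json = prefix + completion.choices[0].text + '"}'
print(full_json)                # e.g. {"function_name": "get_weather"}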