[Feature Request] Function Calling - Easily enforcing valid JSON schema following

Hi,

I was very excited to see the new function calling feature, then quickly disappointed to see that it doesn’t guarantee valid JSON. This was particularly surprising to me, as I’ve personally implemented a valid-JSON enforcer on a local LLM before; unless OpenAI is doing something very different under the hood, this should be easy for them to implement.

They could enforce any context-free grammar (CFG) quite easily: track which tokens are valid according to the CFG at each sampling step, and dynamically mask tokens (as with logit bias) to force the model to follow the CFG.

This has an additional advantage: it saves compute and reduces response time! When conforming to a JSON schema, there’s often only a single valid option for the next token, in which case consulting the model can be skipped entirely.
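
Here’s a minimal sketch of such a decoding loop. The model and the valid_next_tokens helper are hypothetical stand-ins (this is the technique in general, not OpenAI’s implementation):

import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def constrained_decode(model, valid_next_tokens, eos_id, max_steps=256):
    # model(prefix) -> next-token logits; valid_next_tokens(prefix) -> set of
    # token ids the grammar allows next. Both are assumptions for this sketch.
    prefix = []
    for _ in range(max_steps):
        allowed = valid_next_tokens(prefix)
        if allowed == {eos_id}:
            break  # generation is complete per the grammar
        if len(allowed) == 1:
            # Only one valid continuation: fill it in without calling the model.
            prefix.append(next(iter(allowed)))
            continue
        logits = model(prefix)
        mask = np.full_like(logits, -np.inf)
        mask[list(allowed)] = 0.0  # everything outside the grammar gets -inf
        probs = softmax(logits + mask)
        prefix.append(int(np.random.choice(len(probs), p=probs)))
    return prefix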

Long term I’d love to see support for enforcing arbitrary user specified CFGs, but properly enforcing JSON schemas in the existing API would be great!

A small related triviality: It would be nice to be able to get a pure JSON response without wrapping it in a function call. That wastes a few tokens and requires me to do a (tiny) amount of extra unpacking work.

4 Likes

Even before the new model came out yesterday, I found that using a couple of examples got me roughly 99% accuracy when it comes to generating a specific format, be it JSON or CSV.

In my experience, it really depends on how detailed the JSON schema is. For basic things I find the same, but for more complex things it’s sometimes a little less reliable. This is with 3.5 rather than 4. I’ve not even found a prompt that reliably returns the JSON without any other text as much as 99% of the time, but Function Calling should fix that at least!

Part of the cost of it being unreliable is that I’m forced to implement a manual schema checker on the result. That’s fine, I can do it, but multiplied across every user and every JSON request in their code, that’s a lot of unnecessary boilerplate.
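
For illustration, here’s roughly the boilerplate I mean, using Python’s jsonschema package (just one way to do it):

import json

from jsonschema import validate
from jsonschema.exceptions import ValidationError

schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

def parse_arguments(raw: str) -> dict:
    # Both failure modes have to be handled (e.g. by re-prompting),
    # because the API doesn't guarantee schema adherence.
    try:
        data = json.loads(raw)
        validate(data, schema)
    except (json.JSONDecodeError, ValidationError) as err:
        raise ValueError(f"Model output failed validation: {err}") from err
    return data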

IMO it’s simply ergonomically bad to have an API that takes a schema and PROBABLY follows it, when it could provide a nice guarantee. Needing to pull down results in a structured form for consumption by “dumb” software is an incredibly common/core requirement and should be catered to as well as possible.

A separate but related point: The JSON schema format is quite verbose. I usually prefer to specify my JSON format for the model using TypeScript syntax.

First the example from the docs in JSON Schema format:

{
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
        },
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["location"]
}

Now as TypeScript:

{
    // The city and state, e.g. San Francisco, CA
    location: string;
    unit?: "celsius" | "fahrenheit";
}

The latter is significantly more compact, so it would save a few tokens. The models already seem to understand it well enough. It would be cool if this could be supported in the future.
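
If you want to check the saving yourself, OpenAI’s tiktoken tokenizer makes the comparison easy (exact counts will vary by encoding):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

json_schema_text = '{"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["location"]}'

typescript_text = '''{
    // The city and state, e.g. San Francisco, CA
    location: string;
    unit?: "celsius" | "fahrenheit";
}'''

print(len(enc.encode(json_schema_text)))  # token count for the JSON Schema form
print(len(enc.encode(typescript_text)))   # token count for the TypeScript form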

2 Likes

I’m using 4 for my generation, and my schemas are a bit on the complex side as well, so it might just differ from case to case.

Also, sorry if I’m misunderstanding this, but how would you define a guarantee? GPT is an LLM, and while it might be good at generating a lot of different things, guaranteeing something like valid JSON back might not be feasible, as it is at its core a probability-based model and will throw up an error here and there.

I had to do stuff like this to force it in edge cases:

Extract x and y from user input.

Respond with { "x": "value of x if specified else leave empty", …}

Only answer with valid JSON.

User input like “hi” or “test” really made it hard.
But now I can use plugins in a chat with GPT-3.5 :melting_face:

How would you define a guarantee?

As I mentioned briefly in my original post, you can force an LLM to conform to a context free grammar. As it generates each token, the model produces a probability distribution over all possible next tokens, then samples from that distribution. We can mask out certain tokens, preventing them from being considered so that the next token is sampled only from the remaining subset that we allow. At each step, we only allow tokens that are valid according to the CFG.

Let’s say that our schema is as follows:

{
    // The city and state, e.g. San Francisco, CA
    location: string;
    unit?: "celsius" | "fahrenheit";
}

We can break this up into stages. The first tokens are forced, so we don’t even need to sample the model at all. (Line breaks removed for simplicity.)

{"location": "

Now we consult the model, allowing it to produce arbitrary tokens until it produces an unescaped ending quote.

{"location": "San Francisco, CA"

The unit property is optional, so the model is now allowed to choose between two options. The next token must be either a comma or a closing brace. We force this by masking the tokens.

If it chooses the brace, then we’re done, end of generation.

If it chooses the comma, then the next few tokens are forced.

{"location": "San Francisco, CA", "unit": "

Now we use masking again to restrict the model to generating only “celsius” or “fahrenheit”.

After that, all remaining tokens are forced again.

{"location":"San Francisco, CA", "unit": "celcius"}

Using this approach, the output is forced to be valid according to the schema. It’s a probability-based model, but we force it to sample only from tokens which are valid in the context. We’ve also reduced the number of times we have to call the model, because tokens with only a single valid option can simply be filled in from the schema. This saves compute and speeds up response time.

This process of obtaining the next-valid-tokens mask can be automated quite easily from a JSON schema. It can also be done for any context-free grammar, using the techniques found in tools such as parser generators.
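
For example, the enum-stage mask can be derived mechanically from the allowed strings. A rough sketch with tiktoken, simplified to consider only each string’s canonical tokenization:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def allowed_next_token_ids(allowed_strings, generated_text):
    # Return the token ids that keep the output a prefix of some allowed string.
    ids = set()
    for s in allowed_strings:
        if s.startswith(generated_text) and s != generated_text:
            remainder = s[len(generated_text):]
            ids.add(enc.encode(remainder)[0])
    return ids

print(allowed_next_token_ids(["celsius", "fahrenheit"], ""))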

3 Likes

If I understand it correctly, this can only be done if you have access to the inference code for the model, but GPT-3 through GPT-4 are closed models.
The CFG approach will work for open-source models, where you can directly influence the decoding step.

Yes. I’m proposing that the API should support this.

Firstly, by automatically enforcing the JSON schema in the new Function Calling API. They already have everything they need in the API; they just need to add the enforcement in the backend.

Secondly, and more long term, I’d like them to also add an API option to support an arbitrary user-specified CFG: they would take in something like Backus-Naur form (or a more practical/accessible CFG notation) and enforce compliance.
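
To make that concrete, here’s the kind of grammar I mean, written in the EBNF dialect of the Python lark parsing library (purely illustrative, not a proposed API — lark parses text after the fact, whereas the API would enforce the grammar during generation):

from lark import Lark
from lark.exceptions import UnexpectedInput

grammar = r"""
    start: "{" pair_location ("," pair_unit)? "}"
    pair_location: "\"location\"" ":" ESCAPED_STRING
    pair_unit: "\"unit\"" ":" ("\"celsius\"" | "\"fahrenheit\"")

    %import common.ESCAPED_STRING
    %import common.WS
    %ignore WS
"""

parser = Lark(grammar)
parser.parse('{"location": "San Francisco, CA", "unit": "celsius"}')  # accepted

try:
    parser.parse('{"unit": "kelvin"}')
except UnexpectedInput:
    print("rejected by the grammar")  # missing location, invalid enum value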

2 Likes

While this could be useful in some cases, TypeScript interfaces have fewer features than JSON Schema: for example minimum, maximum, minLength, maxLength, pattern property validation, dependency validations, enum/const, and more.
I do agree that when those aren’t needed, TypeScript interfaces are more concise.

I have resolved this problem with a validator and error messages. I have no problems… most of the time. For a lot of different JSON structures I am using a helper utility instead of trying to put too much content in the system message. I am also using predefined user/assistant messages in order to get the desired outcome.

OpenAI should really pay attention to this. How do we get their attention about it?

As you say, it would actually be easy to enforce the format by only evaluating the tokens that fit the provided schema. OpenAI made GPT-4, so they can certainly do that.

I can’t count how many applications need a JSON output… yet they have to rely on correctors, schema pre-prompts, and some (low-probability?) invalid output, when OpenAI just has to do what you describe.

This was posted in June. Now we’re in October 2023, and still nothing has changed about this…

There’s now a very weak version of this in place. The model can be forced to adhere to JSON syntax, but not to follow a specific schema, so it’s still fairly useless. We still have to validate the returned value, all this change brings is that we don’t have to consider the case of the syntax being invalid. Except that we do, because apparently it can technically still spew out infinite whitespace.

Picking on an employee I’ve seen active here recently: @ted-at-openai, do you think we’ll get a good constrained schema adherence feature sometime soon? It’s clear you’ve got most of the pieces in place now with the constrained JSON syntax, so it feels a little silly that you’re announcing “GPT-4 Turbo is more likely to return the right function parameters” when you should be able to fairly easily make it 100% accurate. By adhering to a standard format, you’d also fix that infinite whitespace issue. This would hugely improve the usability of this API.

1 Like

It’s clear you’ve got most of the pieces in place now with the constrained JSON syntax, so it feels a little silly that you’re announcing “GPT-4 Turbo is more likely to return the right function parameters” when you should be able to fairly easily make it 100% accurate

I completely agree.

Also, with the new JSON mode you still have these drawbacks compared to your method (a minimal usage sketch follows the list):

  • Need to somehow provide the schema in the prompt
  • Write the word “json” somewhere in the prompt (what the?)
  • Deal with whitespace that carries no data
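
For reference, this is roughly what JSON mode usage looks like with the current Python SDK, quirks included:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    # JSON mode guarantees syntactically valid JSON, not schema adherence.
    response_format={"type": "json_object"},
    messages=[
        # The schema has to be described in the prompt, and the word "JSON"
        # must appear somewhere in the messages or the request is rejected.
        {"role": "system", "content": "Reply in JSON with keys location (string) and unit (celsius or fahrenheit)."},
        {"role": "user", "content": "What's the weather in San Francisco?"},
    ],
)
print(response.choices[0].message.content)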

The only rational reason I can see for them not wanting to implement it is model alignment: with schema enforcement, you may be able to force the model to write stuff it was originally prevented from writing.

To be 100% honest, I’ve tried to get an output that did not exactly match the prompt schema, and so far I myself have not been able to.

This full schema enforcement should still be a thing.

Can Strict JSON help in this case? Might not be OpenAI native at this point, but might ease the pain…

On GitHub, search for: tanchongmin/strictjson

The basic idea: “Fit everything into a string, because it works.”

  • You get everything back as a string, and you can then convert it to int, float, code, array, etc. to your liking
  • With strict_text, you can get any kind of answer, including those with lots of ’ or " or { or }
  • You don’t even need to match brackets { or quotation marks ’ in the JSON fields for this to work
  • Fewer features than vanilla Strict JSON (such as list-based constraining, dynamic inputs), but you can always just type it out in the system prompt yourself

Well, I’m glad that this is finally a thing: you can now specify a JSON schema and have it enforced perfectly, 100% of the time.
Except for the newest o1 models.
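
For anyone landing here now, a minimal sketch of the Structured Outputs usage that does this (model name current as of that feature’s launch):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "What's the weather in San Francisco, in celsius?"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather",
            "strict": True,  # constrained decoding against the schema
            "schema": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                # Strict mode requires every key in "required" and
                # "additionalProperties": false; optional fields are
                # expressed as nullable types instead.
                "required": ["location", "unit"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)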