[Feature Request] Function Calling - Easily enforcing valid JSON schema following


I was very excited to see the new function calling feature, then quickly disappointed to see that it doesn’t guarantee valid JSON. This was particularly surprising to me as I’ve personally implemented a valid JSON enforcer on a local LLM before, so unless OpenAI is doing something particularly different, it should be easy for OpenAI to do.

They can enforce any context free grammar quite easily. Simply track which tokens are valid according to the CFG at each sampling step, and dynamically mask tokens (like logit bias) to force the model to follow the CFG.

This has an additional advantage: it saves compute and reduces response time! When conforming to a JSON schema, there’s often only a single valid option for the next token. In this case, consulting the model can be skipped.
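As a rough sketch of what that masking step could look like (the toy vocabulary, greedy decoding, and function name here are my own simplifications, not anything OpenAI has described):

```python
import math

def next_token(logits, allowed_ids):
    """Mask every token the grammar forbids to -inf, then decode from
    what remains. If the grammar allows exactly one token, skip the
    model call entirely -- that's where the compute saving comes from."""
    if len(allowed_ids) == 1:            # forced token: no model needed
        return next(iter(allowed_ids))
    masked = [l if i in allowed_ids else -math.inf
              for i, l in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)

# Toy vocabulary: 0='{', 1='}', 2='"', 3='x'
print(next_token([0.1, 2.5, 1.0, 0.3], {0, 2}))  # model prefers '}' but grammar allows only '{' or '"'
print(next_token(None, {0}))                      # forced: emits '{' without consulting the model
```

A real sampler would draw from the renormalized distribution rather than take the argmax, but the masking idea is the same.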

Long term I’d love to see support for enforcing arbitrary user specified CFGs, but properly enforcing JSON schemas in the existing API would be great!

A small related triviality: It would be nice to be able to get a pure JSON response without wrapping it in a function call. That wastes a few tokens and requires me to do a (tiny) amount of extra unpacking work.


Even before the new model came out yesterday, I found that providing a couple of examples in the prompt was about 99% accurate at generating a specific format, be it JSON or CSV.

In my experience, it really depends how detailed the JSON schema is. For basic things, I find the same, but for more complex things it’s sometimes a little less reliable. This is with 3.5 rather than 4. I’ve not even found a prompt that reliably returns the JSON without any other text as much as 99% of the time, but Function Calling should fix that at least!

Part of the cost of it being unreliable is that I’m forced to implement a manual schema checker on the result. That’s fine, I can do it, but across every user across every JSON request in their code, that’s quite a lot of unnecessary boilerplate.
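For illustration, this is the kind of hand-rolled checker I mean, written against the weather-function schema from the docs (the function name and error messages are my own):

```python
import json

def check_weather_args(raw):
    """Reject anything that doesn't match the weather-function schema:
    a required string 'location', plus an optional 'unit' from a fixed enum."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data.get("location"), str):
        raise ValueError("'location' must be a string")
    if "unit" in data and data["unit"] not in ("celsius", "fahrenheit"):
        raise ValueError("'unit' must be 'celsius' or 'fahrenheit'")
    return data

print(check_weather_args('{"location": "San Francisco, CA"}'))
```

Every consumer of the API ends up writing some variant of this, which is exactly the boilerplate server-side enforcement would eliminate.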

IMO it’s simply ergonomically bad to have an API that takes a schema and PROBABLY follows it, when it could provide a nice guarantee. Needing to pull down results in a structured form for consumption by “dumb” software is an incredibly common/core requirement and should be catered to as well as possible.

A separate but related point: The JSON schema format is quite verbose. I usually prefer to specify my JSON format for the model using TypeScript syntax.

First the example from the docs in JSON Schema format:

    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    "required": ["location"],

Now as TypeScript:

    // The city and state, e.g. San Francisco, CA
    location: string;
    unit?: "celsius" | "fahrenheit";

The latter is significantly more compact, so would save a few tokens. The models already seem to understand it well enough. It would be cool if this could be supported in the future.


I’m using 4 for my generation, and they are a bit on the complex side as well, so it might just differ case to case.

Also, sorry if I’m wrong in understanding this, but how would you define a guarantee? GPT is an LLM, and while it might be good at generating a lot of different things, something like getting JSON back might not be the most feasible, as it is at its core a probability-based model and will throw up an error here and there.

I had to do stuff like this to force it in edge cases:

Extract x and y from the user input.

Respond with { 'x': 'value of x if specified, else leave empty', …}

Only answer with valid json.

User input like “hi” or “test” really made it hard.
But now I can use plugins in a Chat with gpt3.5 :melting_face:

How would you define a guarantee?

As I mentioned briefly in my original post, you can force an LLM to conform to a context free grammar. As it generates each token, the model produces a probability distribution over all possible next tokens, then samples from that distribution. We can mask out certain tokens, preventing them from being considered so that the next token is sampled only from the remaining subset that we allow. At each step, we only allow tokens that are valid according to the CFG.
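Concretely, the masking of the distribution might look like this (toy numbers over a 4-token vocabulary, my own invention for illustration):

```python
def constrain(probs, allowed):
    """Zero out the probability of every grammar-invalid token and
    renormalize, so sampling can only ever draw a valid token."""
    masked = [p if i in allowed else 0.0 for i, p in enumerate(probs)]
    total = sum(masked)
    return [p / total for p in masked]

# Model's raw distribution over 4 tokens; the CFG allows only tokens 1 and 3,
# so all probability mass is redistributed onto those two.
print(constrain([0.5, 0.2, 0.2, 0.1], {1, 3}))
```

The model stays probabilistic; we just restrict which outcomes it is allowed to sample.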

Let’s say that our schema is as follows:

    // The city and state, e.g. San Francisco, CA
    location: string;
    unit?: "celsius" | "fahrenheit";

We can break this up into stages. The first tokens are forced, so we don’t even need to sample the model at all. Line breaks removed for simplicity.

{"location": "

Now we consult the model, allowing it to produce arbitrary tokens until it produces an unescaped ending quote.

{"location": "San Francisco, CA"

The unit property is optional, so the model is now allowed to choose between two options. The next token must be either a comma or a closing brace. We force this by masking the tokens.

If it chooses the brace, then we’re done, end of generation.

If it chooses the comma, then the next few tokens are forced.

{"location": "San Francisco, CA", "unit": "

Now we use masking again to restrict the model to generating only celsius or fahrenheit.

After that, all remaining tokens are forced again.

{"location":"San Francisco, CA", "unit": "celcius"}

Using this approach, the output is guaranteed to be valid according to the schema. It’s still a probability-based model, but we force it to sample only from tokens which are valid in the context. We’ve also reduced the number of times we have to call the model, because tokens with only a single valid option can be filled in directly from the schema. This saves compute and speeds up response time.

This process of obtaining the next-valid-token mask can be automated quite easily from a JSON schema. It can also easily be done for any context free grammar, using the techniques found in tools such as parser generators.
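As a sketch, the staged walkthrough above could be driven by a loop like this (the plan format and the fake model are my own simplifications; a real implementation would derive the forced spans and masks from a parser state rather than a hand-written list):

```python
def generate(model, plan):
    """Walk a plan derived from the schema: ('forced', text) steps are
    emitted without calling the model at all; ('free', stop) steps call
    the model repeatedly until the stop predicate fires."""
    out = ""
    for kind, payload in plan:
        if kind == "forced":
            out += payload                 # schema-forced span: zero model calls
        else:
            while True:
                token = model(out)
                if payload(token):         # e.g. the closing quote ends the span
                    break
                out += token
    return out

# Fake model that emits a city name, then the closing quote.
tokens = iter(["San", " Francisco", ",", " CA", '"'])
plan = [("forced", '{"location": "'),
        ("free", lambda t: t == '"'),
        ("forced", '"}')]
print(generate(lambda prefix: next(tokens), plan))  # -> {"location": "San Francisco"... "CA"}
```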


If I understand it correctly, this can only be done if you have access to the inference code for the model. But GPT-3 through GPT-4 are closed models.
The CFG approach will work for open-source models, where you can actively influence the inference of the model in the decoder layers.

Yes. I’m proposing that the API should support this.

Firstly, by automatically enforcing the JSON schema in the new Function Calling API. They already have everything they need in the API; they just need to add the enforcement in the backend.

Secondly, and more long term, I’d like them to also add an API option to support an arbitrarily specified CFG. They would take in something like Backus-Naur form (or a more practical/accessible CFG notation) and enforce compliance.


While this could be useful in some cases, TypeScript interfaces have fewer features than JSON Schema: for example minimum, maximum, minLength, maxLength, pattern validation, dependency validation, enum/const, and more.
I do agree that when those aren’t needed, TypeScript interfaces are more concise.

I have resolved this problem with a validator and error messages. I have no problems… most of the time. For a lot of different JSON structures I am using a helper utility instead of trying to put too much content in the system message. Also, I am using predefined user/assistant messages in order to get the desired outcome.
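For anyone taking the same approach, a minimal version of that validate-and-retry pattern might look like this (`call_model` and the toy validator are placeholders for your own API client and schema checker, not real library calls):

```python
import json

def get_json(call_model, validate, prompt, max_tries=3):
    """Ask the model, validate the reply, and feed any validation error
    back into the next attempt as extra context."""
    error = None
    for _ in range(max_tries):
        suffix = "" if error is None else f"\nYour last reply was invalid: {error}"
        reply = call_model(prompt + suffix)
        try:
            return validate(reply)
        except ValueError as exc:
            error = str(exc)
    raise RuntimeError(f"no valid JSON after {max_tries} tries: {error}")

def validate(reply):
    """Toy validator: reply must be valid JSON containing an 'x' key."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}")
    if "x" not in data:
        raise ValueError("missing key 'x'")
    return data

# Fake model: fails once, then returns valid JSON on the retry.
replies = iter(["hi", '{"x": "test"}'])
print(get_json(lambda p: next(replies), validate, "Extract x"))
```

It works, but it is exactly the per-user boilerplate that server-side schema enforcement would make unnecessary.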