How would you define a guarantee?
As I mentioned briefly in my original post, you can force an LLM to conform to a context-free grammar (CFG). For each token it generates, the model produces a probability distribution over all possible next tokens, and we sample from that distribution. We can mask out certain tokens so that the next token is sampled only from the remaining subset that we allow. At each step, we only allow tokens that are valid according to the CFG.
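To make the masking step concrete, here is a minimal sketch in Python. The five-token vocabulary, the logits, and the name sample_masked are all made up for illustration; a real model has tens of thousands of tokens and real implementations usually apply the mask to the logits tensor directly.

```python
import math
import random

def sample_masked(logits, allowed, rng=random):
    """Sample a token id from softmax(logits), restricted to the ids in `allowed`."""
    allowed = sorted(allowed)
    if not allowed:
        raise ValueError("the grammar allows no token at this position")
    # Subtract the max before exponentiating, for numerical stability.
    top = max(logits[i] for i in allowed)
    weights = [math.exp(logits[i] - top) for i in allowed]
    # random.choices renormalises the weights over the allowed subset only.
    return rng.choices(allowed, weights=weights, k=1)[0]

# Hypothetical vocabulary and logits, purely for illustration.
vocab = ["{", "}", ",", "hello", '"']
logits = [0.1, 2.0, 1.5, 5.0, 0.3]

# Suppose the grammar only permits a closing brace or a comma here.
allowed = {vocab.index("}"), vocab.index(",")}
token = vocab[sample_masked(logits, allowed)]
```

Even though "hello" has by far the highest logit, it can never be chosen here, because it is not in the allowed set.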
Let’s say that our schema is as follows:
{
// The city and state, e.g. San Francisco, CA
location: string;
unit?: "celsius" | "fahrenheit";
}
We can break this up into stages. The first tokens are forced, so we don’t need to sample from the model at all for them. (Line breaks are removed for simplicity.)
{"location": "
Now we consult the model, allowing it to produce arbitrary tokens until it produces an unescaped ending quote.
{"location": "San Francisco, CA"
The unit property is optional, so the model is now allowed to choose between two options. The next token must be either a comma or a closing brace. We force this by masking the tokens.
If it chooses the brace, then we’re done, end of generation.
If it chooses the comma, then the next few tokens are forced.
{"location": "San Francisco, CA", "unit": "
Now we use masking again to restrict the model to generating only celsius or fahrenheit.
After that, all remaining tokens are forced again.
{"location": "San Francisco, CA", "unit": "celsius"}
Using this approach, the output is forced to be valid according to the schema. It’s a probability-based model, but we only let it sample from tokens that are valid in the current context. We’ve also reduced the number of times we have to call the model, because wherever only a single token is valid we can fill it in straight from the schema. This saves compute and speeds up response time.
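The stages above can be sketched end to end in Python. Everything here is a stand-in: StubModel replays a canned script instead of running an LLM, and I'm assuming the closing quote and each enum value arrive as single tokens, which a real tokenizer would not guarantee.

```python
import json

class StubModel:
    """Stand-in for an LLM: replays scripted tokens and counts calls."""
    def __init__(self, script):
        self.script = list(script)
        self.calls = 0

    def choose(self, prefix, allowed=None):
        """Return the next token; `allowed` is the mask (None means anything goes)."""
        self.calls += 1
        token = self.script.pop(0)
        if allowed is not None and token not in allowed:
            # A real sampler would mask the logits; the stub just picks an allowed token.
            token = allowed[0]
        return token

def generate(model):
    out = '{"location": "'                       # forced, no model call
    while True:                                  # free-form string value
        tok = model.choose(out)
        out += tok
        if tok == '"':                           # assumed: closing quote is its own token
            break
    tok = model.choose(out, allowed=[",", "}"])  # optional property: continue or stop
    out += tok
    if tok == "}":
        return out
    out += ' "unit": "'                          # forced, no model call
    out += model.choose(out, allowed=["celsius", "fahrenheit"])
    out += '"}'                                  # forced, no model call
    return out

model = StubModel(["San", " Francisco", ",", " CA", '"', ",", "kelvin"])
result = generate(model)
parsed = json.loads(result)
```

The stub "wants" to answer kelvin, but the mask only admits the two enum values, so the result still parses and matches the schema. The model is consulted 7 times, while every structural character comes from the schema for free.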
Computing the mask of valid next tokens can be automated quite easily from a JSON schema. It can also be done for any context-free grammar, using the same techniques as tools such as parser generators.
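As a rough illustration of deriving the mask automatically, here is a toy Python sketch for just the enum part of the schema. The subword vocabulary is invented, and the check is a naive "is the text so far still a prefix of some valid value" test; I'd expect a production implementation to track parser state incrementally rather than re-check the whole prefix.

```python
def is_viable_prefix(text, options):
    """True if `text` can still be extended into one of the valid values."""
    return any(option.startswith(text) for option in options)

def valid_mask(vocab, prefix, options):
    """One boolean per vocabulary entry: may this token come next?"""
    return [is_viable_prefix(prefix + tok, options) for tok in vocab]

# Valid values taken from the schema; the vocabulary is hypothetical.
OPTIONS = ["celsius", "fahrenheit"]
VOCAB = ["cel", "sius", "fahr", "enheit", "kel", "vin", "c", "f"]

def allowed_tokens(prefix):
    """Names of the tokens the mask lets through after `prefix`."""
    mask = valid_mask(VOCAB, prefix, OPTIONS)
    return [tok for tok, ok in zip(VOCAB, mask) if ok]
```

With this vocabulary, allowed_tokens("") gives ["cel", "fahr", "c", "f"] and allowed_tokens("cel") gives only ["sius"]. Once the value is complete, the list is empty, which corresponds to the point where the closing quote is forced.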