GPT-5: Custom Lark tool outputs are not guaranteed to conform to the CFG?

Welcome to the community, @Michal_Moskal!

Appreciate you stopping by.


@Michal_Moskal

Thanks.

Since I am not a fan of posts with bad info, I am considering taking it down. However, it could also serve as an example of what not to do.

Your thoughts?


I mean in general it’s the right approach, so I would leave it.

BTW, you can test your grammar syntax offline with the llguidance Python package. If you see crashes there, you can report them on GitHub.


Thanks for the information, and hats off: I just finished reading “LLGuidance: Making Structured Outputs Go Brrr” - wow.

If you don’t mind me asking, without lookbehinds and given the other constraints of the implementation, can we construct similar grammars in a terser way?

start: "<think>" "\n" think_content "</think>" address

think_content: think_char{0,100}
think_char: SAFE_CHAR | SAFE_LT_SEQUENCE
SAFE_LT_SEQUENCE: "<" ( NOT_SLASH_OR_T | "/" (NOT_T | "t" (NOT_H | "h" (NOT_I | "i" (NOT_N | "n" (NOT_K | "k" NOT_GT))))) | "t" (NOT_H | "h" (NOT_I | "i" (NOT_N | "n" (NOT_K | "k" NOT_GT)))) )
                         
SAFE_CHAR:      /[^<]/
NOT_SLASH_OR_T: /[^\/t]/
NOT_T:          /[^t]/
NOT_H:          /[^h]/
NOT_I:          /[^i]/
NOT_N:          /[^n]/
NOT_K:          /[^k]/
NOT_GT:         /[^>]/
                         
address: %json {
  "type": "object",
  "properties": {
    "street": { "type": "string" },
    "city": { "type": "string" },
    "zip": { "type": "number" }
  },
  "additionalProperties": false,
  "required": ["street", "city", "zip"]
}
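For what it's worth, here's a quick offline way to convince yourself the NOT_* cascade does what you intend. As I read it, the expansion encodes what a single negative lookahead would express, so you can cross-check it against sample strings with Python's stdlib `re` (this is only a reference check against my reading of the grammar, not the engine's semantics):

```python
import re

# Hedged sketch: the NOT_* cascade above is, as far as I can tell, the
# lookahead-free expansion of "a '<' that does not begin '<think>' or
# '</think>'".  With ordinary regex lookahead (which the constrained-decoding
# engine avoids), one think_char collapses to:
THINK_CHAR = r"(?:[^<]|<(?!/?think>))"

# think_content with the same {0,100} budget, matched against the whole string.
# Caveat: the budgets differ slightly -- the expanded grammar counts a
# '<'-led sequence as ONE think_char, while this regex counts every character.
think_content = re.compile(rf"{THINK_CHAR}{{0,100}}")

assert think_content.fullmatch("math: 1 < 2 and 3 > 2")     # stray '<' is fine
assert think_content.fullmatch("<b>other tags are fine</b>")
assert not think_content.fullmatch("oops </think> leaked")  # would close early
assert not think_content.fullmatch("nested <think> tag")
assert not think_content.fullmatch("x" * 101)               # over budget
```

If the samples agree with what the grammar accepts, the expansion is probably right; disagreements point straight at the branch of the cascade that's off.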

The key here is the bounded thinking budget: the think_char{0,100} repetition caps the reasoning section at an arbitrary limit before the model emits the JSON matching the schema I need, which is something the API itself doesn't currently support.

Note: this sometimes gets stuck on the street/city/zip strings, producing invalid JSON.
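One thing that might help with the runaway strings (an untested assumption on my part: that the %json subset honors the standard JSON Schema maxLength keyword) is to bound the string properties too, so generation can't wander indefinitely inside a string:

```
address: %json {
  "type": "object",
  "properties": {
    "street": { "type": "string", "maxLength": 64 },
    "city":   { "type": "string", "maxLength": 32 },
    "zip":    { "type": "number" }
  },
  "additionalProperties": false,
  "required": ["street", "city", "zip"]
}
```

The 64/32 limits are arbitrary illustrations; pick whatever fits your data.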


Thanks for pointing us in that direction.

I like the research you are doing and its practical application to LLMs, e.g.

These were unexpected bonuses:

  • Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case—while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly.

  • “Lean Formalization of Extended Regular Expression Matching with Lookarounds” (PDF) by Ekaterina Zhuchko, Margus Veanes and Gabriel Ebner


For those looking for “Regex Decision Procedures in Extended RE#” by Ian Erik Varatalu, Margus Veanes, Ekaterina Zhuchko and Juhan Ernits, the paper is available publicly at


Tossing this idea out for anyone who might want something to do.

Create a free public tool to help others create the CFG, something like

https://platform.openai.com/chat/edit?models=gpt-5&optimize=true

but that is specific to creating the CFG.

It should also generate examples of valid sequences, as seeing generated sequences is sometimes faster for spotting a grammar mistake than doing the parsing in one's head.
