Welcome to the community, @Michal_Moskal!
Appreciate you stopping by.
Thanks.
Since I am not a fan of posts with bad info, I am considering taking it down. However, it could also serve as an example of what not to do.
Your thoughts?
I mean, in general it's the right approach, so I would leave it.
BTW, you can test your grammar syntax offline with the llguidance Python package. If you see crashes there, you can report them on GitHub.
Thanks for the information and hats off, just finished reading LLGuidance: Making Structured Outputs Go Brrr - wow.
If you don't mind me asking, without lookbehinds and given the other constraints of the implementation, can we construct similar grammars in a terser way?
start: "<think>" "\n" think_content "</think>" address
think_content: think_char{0,100}
think_char: SAFE_CHAR | SAFE_LT_SEQUENCE
SAFE_LT_SEQUENCE: "<" ( NOT_SLASH_OR_T | "/" (NOT_T | "t" (NOT_H | "h" (NOT_I | "i" (NOT_N | "n" (NOT_K | "k" NOT_GT))))) | "t" (NOT_H | "h" (NOT_I | "i" (NOT_N | "n" (NOT_K | "k" NOT_GT)))) )
SAFE_CHAR: /[^<]/
NOT_SLASH_OR_T: /[^\/t]/
NOT_T: /[^t]/
NOT_H: /[^h]/
NOT_I: /[^i]/
NOT_N: /[^n]/
NOT_K: /[^k]/
NOT_GT: /[^>]/
address: %json {
  "type": "object",
  "properties": {
    "street": { "type": "string" },
    "city": { "type": "string" },
    "zip": { "type": "number" }
  },
  "additionalProperties": false,
  "required": ["street", "city", "zip"]
}
The key here is the limited thinking budget, which can be set to an arbitrary limit before the model outputs the JSON I need (matching the schema above). The API itself doesn't currently support this.
Note: this sometimes gets stuck on the street/city/zip strings, producing invalid JSON.
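For anyone who wants to poke at the think-block part of the grammar above without calling the API, here is a rough Python sketch. It mirrors SAFE_CHAR, SAFE_LT_SEQUENCE and the {0,100} budget using the standard re module, with no lookarounds. This is only an approximation for experimenting: it is not llguidance, the helper names (not_word, is_valid_think_block) are made up for illustration, and it ignores the JSON address part entirely.

```python
import re


def not_word(word: str) -> str:
    """Regex for text that diverges from `word` at some position (no lookarounds)."""
    miss = f"[^{re.escape(word[0])}]"
    if len(word) == 1:
        return miss
    # Either the first character already differs, or it matches and the rest must differ.
    return f"(?:{miss}|{re.escape(word[0])}{not_word(word[1:])})"


# Mirrors SAFE_CHAR: any character except "<".
SAFE_CHAR = r"[^<]"
# Mirrors SAFE_LT_SEQUENCE: "<" followed by text that cannot complete "<think>" or "</think>".
SAFE_LT_SEQUENCE = f"<(?:[^/t]|/{not_word('think>')}|t{not_word('hink>')})"
THINK_CHAR = f"(?:{SAFE_CHAR}|{SAFE_LT_SEQUENCE})"
# Budget of at most 100 think_char units, as in think_char{0,100}.
THINK_BLOCK = re.compile(f"<think>\n{THINK_CHAR}{{0,100}}</think>")


def is_valid_think_block(text: str) -> bool:
    """True if `text` is exactly one think block accepted by the regex mirror."""
    return THINK_BLOCK.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in [
        "<think>\nif a < b then</think>",
        "<think>\n" + "x" * 101 + "</think>",
        "<think>\nfoo</think>bar</think>",
        "<think>\n<</think></think>",
    ]:
        print(repr(candidate[:40]), is_valid_think_block(candidate))
```

Two things I noticed while playing with this mirror. The budget counts think_char units, so a SAFE_LT_SEQUENCE spends one unit even though it spans several characters. Also, the mirror accepts `<</think>` as thinking content, because the character right after the first `<` is consumed without being checked again. Whether llguidance's lexer treats the original grammar the same way is something I have not verified.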
Thanks for pointing us in that direction.
Like the research you are doing and the practical application for use with LLMs, e.g.
These were unexpected bonuses:
Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case, while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly.
"Lean Formalization of Extended Regular Expression Matching with Lookarounds" (PDF) by Ekaterina Zhuchko, Margus Veanes and Gabriel Ebner
For those looking for "Regex Decision Procedures in Extended RE#" by Ian Erik Varatalu, Margus Veanes, Ekaterina Zhuchko and Juhan Ernits
the paper is available publicly at
Tossing this idea out for anyone who might want something to do.
Create a public free tool to help others create the CFG, something like
https://platform.openai.com/chat/edit?models=gpt-5&optimize=true
but that is specific to creating the CFG.
It should also create examples of valid input sequences, as seeing generated sequences is sometimes a faster way to spot an invalid input than doing the parsing in one's head.
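To make the "generate valid sequences" part a bit more concrete, here is a minimal Python sketch of a random sampler. Everything in it is an assumption for illustration: TOY_GRAMMAR is a hand-written, heavily simplified stand-in for the grammar earlier in the thread (fixed words instead of character classes, fixed address values), and the sampler does not parse Lark or llguidance syntax.

```python
import json
import random

# Hypothetical toy grammar: each key is a nonterminal mapping to a list of alternatives.
# Any symbol that is not a key is treated as a literal string.
# By convention here, the FIRST alternative of each rule must be non-recursive.
TOY_GRAMMAR = {
    "start": [["<think>\n", "thought", "</think>", "address"]],
    "thought": [["word"], ["word", " ", "thought"]],
    "word": [["ok"], ["a<b"], ["hmm"]],
    "address": [['{"street":"', "name", '","city":"', "name", '","zip":', "digits", "}"]],
    "name": [["Main St"], ["Oslo"]],
    "digits": [["12345"], ["90210"]],
}


def sample(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 8) -> str:
    """Expand `symbol` into one random string derivable from TOY_GRAMMAR."""
    if symbol not in TOY_GRAMMAR:
        return symbol  # literal
    alternatives = TOY_GRAMMAR[symbol]
    # Past max_depth, force the first (non-recursive) alternative so expansion terminates.
    chosen = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    return "".join(sample(part, rng, depth + 1, max_depth) for part in chosen)


rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample("start", rng) for _ in range(3)]

if __name__ == "__main__":
    for text in samples:
        thinking, payload = text.split("</think>", 1)
        # The payload after the think block should parse as JSON.
        print(repr(thinking), json.loads(payload))
```

A real tool would need to read the actual grammar rather than a hand-written dict, but even a toy sampler like this gives you strings to eyeball.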