Specifying formal grammar for a domain-specific-language

I have created a simple domain-specific query language, SIMPLE-QL, for which I have a formal grammar. I would like ChatGPT to convert English into this query language, so that if my users forget my grammar but get close, ChatGPT can fix their query for them. I have had some success passing in simple documentation and examples as part of the prompt (see below). It works pretty well with GPT-4 (via ChatGPT), but less well with GPT-3, which is what I have API access to. I’m wondering if I could feed it a formal grammar definition instead? Or is there a better way to structure the documentation and examples? Has anyone tried something similar? Any help/thoughts/pointers appreciated!

Hi ChatGPT, here is the documentation for a query language called SIMPLE-QL:


SIMPLE-QL grammar: <stride> <filters*> <metric> <rate>% by <segments> <time-range>

Examples:

  • daily bookings 3w
  • monthly usa quotes 4y
  • weekly usa auto-quote quote-conversion 32w

<stride> (required)

  • One of: total, annual, quarterly, monthly, weekly, daily
  • If no time grouping is specified it defaults to total.

<time-range> (required)
A flexible way to specify a time range, any of:

  • A specified number of past days (d), weeks (w), months (m) or years (y): 3w, 3y, 18m, 90d
  • A year: 2022, 2021
  • A date range: 2020-01-01 to 2020-06-01

<filters> (optional)

  • an alphanumeric label

<rate> (optional)

  • an alphanumeric label with a % after it

<segments> (optional)

  • one or more alphanumeric labels separated by a comma

Examples:

  • monthly bookings usa% 12m
  • weekly online orders logged-in% 2021
  • total completed-orders in-house% 30d
  • total bookings by process 90d
  • monthly quote-conversion by quote-source 3y
  • weekly usa bookings by office 90d
  • usa quote-conversion by country,language 90d
  • daily revenue 2022

Please return a SIMPLE-QL query for: usa% of monthly revenue for the past 24 months

This returns something that is close, but not quite correct: monthly usa% revenue 24m
It should be: monthly revenue usa% 24m
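Since the mistake here is purely one of token order, one option is to check the model's output in code before showing it to a user. Below is a minimal sketch, assuming the ordering in the grammar line above, that labels may contain hyphens (as in `quote-conversion`), and that the metric is the last plain label before the rate. The function name `check_order` and the error messages are hypothetical; this is not the actual SIMPLE-QL parser.

```python
import re

STRIDES = {"total", "annual", "quarterly", "monthly", "weekly", "daily"}
# Assumption: labels may contain hyphens, as in "quote-conversion".
LABEL = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*$")
RELATIVE = re.compile(r"^\d+[dwmy]$")      # 3w, 18m, 90d
YEAR = re.compile(r"^\d{4}$")              # 2022
DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # 2020-01-01


def check_order(query: str) -> list[str]:
    """Return a list of problems; an empty list means the token order looks valid."""
    tokens = query.split()
    problems = []

    # The time range must be last: "<date> to <date>" or a single token.
    if (len(tokens) >= 3 and tokens[-2] == "to"
            and DATE.match(tokens[-3]) and DATE.match(tokens[-1])):
        tokens = tokens[:-3]
    elif tokens and (RELATIVE.match(tokens[-1]) or YEAR.match(tokens[-1])):
        tokens = tokens[:-1]
    else:
        problems.append("missing or misplaced time range")

    # The stride, if present, must be first.
    if tokens and tokens[0] in STRIDES:
        tokens = tokens[1:]

    # "by <segments>" must sit directly before the time range.
    if "by" in tokens:
        i = tokens.index("by")
        rest = tokens[i + 1:]
        if len(rest) != 1 or not all(LABEL.match(s) for s in rest[0].split(",")):
            problems.append("'by' must be followed by one comma-separated segment list")
        tokens = tokens[:i]

    # What remains is <filters*> <metric> [<rate>%]; the rate must be last.
    rates = [i for i, t in enumerate(tokens) if t.endswith("%")]
    if len(rates) > 1:
        problems.append("more than one rate")
    elif rates and rates[0] != len(tokens) - 1:
        problems.append("rate must come after the metric")
    if not any(LABEL.match(t) for t in tokens):
        problems.append("missing metric")
    return problems


print(check_order("monthly usa% revenue 24m"))  # ['rate must come after the metric']
print(check_order("monthly revenue usa% 24m"))  # []
```

A non-empty result could be fed back to the model as a correction hint, or used to trigger a retry.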

It returned the correct response for me using this prompt with GPT-3.5. However, I can see how responses may not be consistent. One thing that helps is to be clear and complete in your representation. Here is my reformulation of your prompt. Let me know if it works out.
Hi ChatGPT, here is the documentation for a query language called SIMPLE-QL:

SIMPLE-QL grammar: <stride*> <filters**> <metric*> <rate*>% by <segments*> <time-range*>

Examples:

daily bookings 3w
monthly usa quotes 4y
weekly usa auto-quote quote-conversion 32w

<stride*> (required, must be first)

One of: total, annual, quarterly, monthly, weekly, daily
If no time grouping is specified it defaults to total.

<filters**> (optional, must come after <stride*> and before <metric*> if present)

an alphanumeric label

<metric*> (optional, must come after <filters*> and before <rate*> if present)

an alphanumeric label

<rate*> (optional, must come before "% by " and after <metric*> )

an alphanumeric label with a % after it

<segments*> (optional, must come after "% by " and before <time-range*> if included)

one or more alphanumeric labels separated by a comma

<time-range*> (required, must be last)
A flexible way to specify a time range, any of:

A specified number of past days (d), weeks (w), months (m) or years (y): 3w, 3y, 18m, 90d
A year: 2022, 2021
A date range: 2020-01-01 to 2020-06-01

Examples:

monthly bookings usa% 12m
weekly online orders logged-in% 2021
total completed-orders in-house% 30d
total bookings by process 90d
monthly quote-conversion by quote-source 3y
weekly usa bookings by office 90d
usa quote-conversion by country,language 90d
daily revenue 2022

Please return a SIMPLE-QL query for: usa% of monthly revenue for the past 24 months

Please excuse the extra asterisks; the content wasn’t displaying here without them, perhaps because of the rich-text format. The changes I made were adding the “metric” tag to the list of descriptors, putting the descriptors in grammar order, and adding a note on where each one must appear. In addition, if I were you, I would annotate each example with the tags that correspond to its output, so that there is a 1:1 association and no ambiguity.

Even so, GPT-3.5 is no GPT-4, so YMMV.
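To make the 1:1 association between examples and tags concrete, here is a rough sketch that renders each example next to its tag sequence. The helper `annotate`, the `SLOTS` table, and the ` => ` separator are made up for illustration, and the split of labels into filters versus metric is my reading of the examples, not something the original documentation states.

```python
# Each slot is paired with the literal text the grammar puts around it.
SLOTS = [
    ("stride", "{}"),
    ("filters", "{}"),
    ("metric", "{}"),
    ("rate", "{}%"),         # the grammar writes <rate>% with the % outside the tag
    ("segments", "by {}"),   # segments are introduced by the keyword "by"
    ("time-range", "{}"),
]


def annotate(parts: dict[str, str]) -> str:
    """Render a query and its matching tag sequence side by side."""
    query, tags = [], []
    for name, template in SLOTS:
        if name in parts:
            query.append(template.format(parts[name]))
            tags.append(template.format(f"<{name}>"))
    return f"{' '.join(query)} => {' '.join(tags)}"


print(annotate({"stride": "monthly", "metric": "bookings",
                "rate": "usa", "time-range": "12m"}))
# monthly bookings usa% 12m => <stride> <metric> <rate>% <time-range>

print(annotate({"stride": "weekly", "filters": "usa", "metric": "bookings",
                "segments": "office", "time-range": "90d"}))
# weekly usa bookings by office 90d => <stride> <filters> <metric> by <segments> <time-range>
```

Lines in this form could replace the bare examples in the prompt, so the model sees which slot every token fills.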

This is super helpful, thank you. Let me try a few versions based on your feedback. Have you seen other examples like this, where people specify a custom grammar or domain-specific language? The closest thing I can think of is defining programming APIs and having OpenAI convert English into API calls.

No, not exactly. I had GPT-4 cook up a variant of existing XML frameworks and then deployed some content into it successfully, but I didn’t specify the grammar in the prompt, as I’m not familiar enough with how XML is constructed to do so. I’m thinking about trying it myself though.

I think that formatting is generally a sticky area for GPT-3.5. I find that the more structure I give it, the more likely it is to get it right. For example, I asked it to return some data and associated metadata in a tabular format, and that worked out far better than prior attempts. Another thing for you to try may be a multi-step prompt where step 1 is to organize the data in a tabular format and step 2 is to translate the data from the table into your grammar.
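The two-step idea could be sketched roughly as below. This only builds the two prompt strings; it makes no API calls. The function names, the table layout, and the exact wording of the instructions are assumptions for illustration and have not been tested against any model.

```python
SLOT_NAMES = ["stride", "filters", "metric", "rate", "segments", "time-range"]


def step1_prompt(request: str) -> str:
    """Step 1: ask the model to sort the request into one table row per slot."""
    header = "| slot | value |\n| --- | --- |\n"
    rows = "".join(f"| {slot} | ? |\n" for slot in SLOT_NAMES)
    return (
        "Fill in this table for the request below. "
        "Write 'none' for any slot the request does not mention.\n\n"
        f"{header}{rows}\nRequest: {request}"
    )


def step2_prompt(filled_table: str) -> str:
    """Step 2: ask the model to assemble the query from the filled-in table."""
    return (
        "Using only the values in this table, write one SIMPLE-QL query in the order "
        "<stride> <filters> <metric> <rate>% by <segments> <time-range>, "
        "skipping slots marked 'none'.\n\n" + filled_table
    )


print(step1_prompt("usa% of monthly revenue for the past 24 months"))
```

The model's answer to the first prompt would be passed as `filled_table` to the second. Once the table is reliable, step 2 could arguably be done in plain code instead of a second model call.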

I would experiment with removing the definition of the grammar and instead providing a lot more examples.

It’s easy to forget since GPT is so good, but all its learning is by example 🙂

Ideally, you’d fine-tune it, but it’s more complicated and expensive.
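Short of fine-tuning, an examples-only prompt might be assembled like this. It is a sketch only: the queries come from the thread, but the English phrasings of the first two requests are hypothetical (only the last pair appears in the original post), and `few_shot_prompt` is a made-up helper name.

```python
# (English request, SIMPLE-QL query) pairs.
# Assumption: the first two English phrasings are invented for illustration;
# only the last pair is taken from the original post.
EXAMPLES = [
    ("daily bookings for the past 3 weeks", "daily bookings 3w"),
    ("total bookings broken down by process for the past 90 days",
     "total bookings by process 90d"),
    ("usa% of monthly revenue for the past 24 months", "monthly revenue usa% 24m"),
]


def few_shot_prompt(request: str, examples=EXAMPLES) -> str:
    """Build a prompt that teaches the language purely through examples."""
    lines = ["Convert each request into a SIMPLE-QL query.", ""]
    for english, query in examples:
        lines += [f"Request: {english}", f"Query: {query}", ""]
    # Leave the final query open for the model to complete.
    lines += [f"Request: {request}", "Query:"]
    return "\n".join(lines)


print(few_shot_prompt("weekly quotes for 2021"))
```

In practice the example list would be much longer and would cover every slot combination users are likely to ask for.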

FYI, I decided to ask ChatGPT itself!

Prompt:

I have created my own simple query command language (SIMPLE-QL) with a formal grammar. How can I communicate that SIMPLE-QL formal grammar to you so that you can help me convert natural text into my simple query language?

Response:

Great! In order to communicate the formal grammar of your SIMPLE-QL language, you can provide me with a formal specification or documentation that outlines the syntax and semantics of the language.

Here are some tips on how to communicate the SIMPLE-QL formal grammar to me:

  1. Provide a clear and concise description of the syntax and semantics of the language. This should include details such as the allowed keywords, data types, operators, and syntax rules.
  2. If possible, provide a context-free grammar (CFG) or Backus-Naur Form (BNF) notation for your language. This notation can be used to formally describe the syntax of your language and can be easily parsed by a computer.
  3. Provide some example queries in SIMPLE-QL to demonstrate the syntax and semantics of the language.
  4. If your language has any unique or non-standard features, be sure to explain them in detail.

Once I have a clear understanding of the formal grammar of your SIMPLE-QL language, I can help you convert natural text into your query language.

I’m gonna try to create a BNF form of my grammar (currently it’s a nearley.js grammar), feed it in as a prompt, and see how that goes!
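For anyone wanting to try the same thing, here is one guess at what an EBNF-style version might look like, wrapped in a small prompt builder. To be clear, this is reconstructed from the prose documentation earlier in the thread, not from the actual nearley.js grammar, which was never posted. The rule names, the treatment of the metric as the last label before the rate, and the allowance for hyphens in labels are all assumptions.

```python
# Assumption: reconstructed from the prose documentation, not the real grammar.
# "letter" and "digit" are left as undefined terminals.
SIMPLE_QL_EBNF = """\
query      ::= [stride] filter* metric [rate] ["by" segments] time_range
stride     ::= "total" | "annual" | "quarterly" | "monthly" | "weekly" | "daily"
filter     ::= label
metric     ::= label
rate       ::= label "%"
segments   ::= label ("," label)*
time_range ::= number unit | year | date "to" date
unit       ::= "d" | "w" | "m" | "y"
number     ::= digit+
year       ::= digit digit digit digit
date       ::= year "-" digit digit "-" digit digit
label      ::= (letter | digit) (letter | digit | "-")*
"""


def grammar_prompt(request: str) -> str:
    """Embed the grammar sketch in a prompt asking for a single query."""
    return (
        "Here is an EBNF grammar for a query language called SIMPLE-QL:\n\n"
        f"{SIMPLE_QL_EBNF}\n"
        f"Return one SIMPLE-QL query for: {request}"
    )


print(grammar_prompt("usa% of monthly revenue for the past 24 months"))
```

Note that `filter` and `metric` are both plain labels, so the grammar alone cannot tell the model which word is which; examples would probably still be needed alongside it.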

Keen to know if you made any progress with this!

If you have not seen it, this paper may be of interest: [2305.19234] Grammar Prompting for Domain-Specific Language Generation with Large Language Models

…this Chain-of-Thought approach to grammar prompting may be worth looking into.
