A sanity check for future plugins to access private SQL databases

I am not sure why none of you are talking about the fact that the SQL these models produce is VERY bad. We have run extensive tests over a long period now, and around 40% of the SQL produced has common syntax errors:

  1. Window functions are not allowed in WHERE
  2. Column must appear in the GROUP BY clause
  3. More than one row returned by a subquery
    etc., etc.
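
For anyone who hasn't hit error 1, it is easy to reproduce, and the standard fix is mechanical: compute the window function in a CTE or subquery and filter on its alias in the outer query. A minimal sketch against an in-memory SQLite database (the table and data are invented for illustration; window functions need SQLite >= 3.25, and PostgreSQL reports the same mistake as "window functions are not allowed in WHERE"):

```python
import sqlite3

# Assumption: SQLite >= 3.25, which added window-function support.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, val REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (2, 12.0), (3, 11.0)])

# The invalid form the models keep emitting: a window function in WHERE.
bad_sql = "SELECT ts, val FROM readings WHERE LAG(val) OVER (ORDER BY ts) < val"
try:
    conn.execute(bad_sql)
    failed = False
except sqlite3.OperationalError:
    failed = True  # SQLite reports "misuse of window function LAG()"

# The standard fix: compute the window function in a CTE,
# then filter on its alias in the outer query.
good_sql = """
WITH w AS (
    SELECT ts, val, LAG(val) OVER (ORDER BY ts) AS prev_val
    FROM readings
)
SELECT ts, val FROM w WHERE prev_val < val
"""
rows = conn.execute(good_sql).fetchall()  # rows where val rose above its predecessor
```

The rewrite is purely structural, which is part of what makes the model's failure so frustrating: the fix is the same every time.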

We tried EVERYTHING, and we just can’t get rid of these errors, no matter how we design the prompt, fine-tune, or swap models.
You also can’t fix those automatically for many reasons.

Did you also get these errors?

I agree. It is hopeless. For further evidence, skip forward to halfway through this video:

https://c-phrase.com/videos/ChatGPT%20versus%20C-Phrase%20for%20Natural%20Language%20to%20SQL.mp4

Thanks, that is interesting. What is C-Phrase based on?
Do you think the world will ever have reliable NLP-to-SQL?
One idea we had is to create 100-500 queries and their corresponding SQL, and let the AI find out which of these queries best fits the user’s NLP query, then pick that SQL. The problem with this is that you then have to change field names in the hardcoded SQL to fit the user’s query (e.g. company name, a specific number such as col > 240, etc.).
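
That matching idea can be sketched quite directly. Everything below (the candidate pool, table names, placeholder syntax) is hypothetical, and a real system would score candidates with embeddings rather than naive token overlap, but it shows the shape of the template-matching plus parameter-substitution problem:

```python
import re

# Hypothetical candidate pool: NL pattern -> SQL template with named placeholders.
CANDIDATES = [
    ("total revenue for company",
     "SELECT SUM(revenue) FROM sales WHERE company = :company"),
    ("orders above amount",
     "SELECT * FROM orders WHERE total > :threshold"),
]

def pick_template(user_query: str):
    """Score each candidate by naive token overlap with the user's query."""
    q = set(user_query.lower().split())
    return max(CANDIDATES, key=lambda c: len(q & set(c[0].split())))

def fill(sql: str, params: dict) -> str:
    """Substitute the hardcoded placeholders with user-specific values.
    (Illustration only: real code must use bound parameters, not string
    substitution, to avoid SQL injection.)"""
    return re.sub(r":(\w+)", lambda m: repr(params[m.group(1)]), sql)

pattern, template = pick_template("show me orders above 240")
sql = fill(template, {"threshold": 240})
```

This sidesteps the syntax-error problem entirely (the templates are pre-validated), but as you say, it trades away freedom of querying for whatever the candidate pool covers.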

We are making huge efforts to find a solution to this, but nothing really provides real freedom of querying.

How complex is your database?
Have you considered fine-tuning a model?

Of course. It can now. It won’t be 100% accurate, but it doesn’t need to be.
Knowing & having an accurate representation of its confidence helps, a lot.

Out of curiosity what does your database contain?

This is a fascinating discussion, and we too are worried about privacy. I have plugin dev access now. My primary interest is using ChatGPT to access our Confluence data. Now Confluence has announced they will be offering a paid AI product, but I don’t know the cost or if it will be as powerful.

But for the 3rd party ones currently on there, yes, some of them do make sense. I asked for a particular recipe, then the Instacart plugin took the ingredient list and added it to my shopping cart. And I bought them. That was genuinely useful.

And I think OpenTable may also benefit, although it needs some improvements right now.

Well, we don’t need 100% accuracy, but if every other request has a syntax error, we can’t really automate it for our users, since the query just fails.
The errors “window is not allowed in where” and “must appear in the GROUP BY clause” happen on literally every request that involves a window function; it looks like somebody really trained it wrongly.

We are managing multiple DBs (finance, IoT, etc.). It is a complicated system.

I get this strange feeling that we are doing something wrong, because you are the only person I have found online who agrees with me that it just doesn’t work. Everywhere you check, nobody says anything about these specific syntax errors, as if they don’t even exist.

I am not sure why GPT makes these errors while managing to create such complex SQL.

With us, these three errors happen EVERY other query. It is really annoying.

Have you considered a pipeline instead of a single AI agent?

What if you separated the concerns? One to create, one to refine, and one to confirm, with some recursion until it has passed the final AI agent boss (As an example). Any n failed QA tasks can be evaluated for further training.

Come to think of it, that seems like how we perform jobs here. Woah.
Let’s take advantage before AI Rights becomes a thing and we’re required to speak with (AI)HR.
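
A minimal sketch of that create/refine/confirm loop, with the model calls stubbed out (a real pipeline would hit the API; here SQLite's EXPLAIN serves as the cheap "QA boss" that parses a query without running it, and the first stubbed attempt reproduces the window-in-WHERE mistake):

```python
import sqlite3

# Stub standing in for the actual model calls.
def generate_sql(question, feedback=None):
    if feedback is None:  # first attempt: the typical broken output
        return "SELECT val FROM readings WHERE LAG(val) OVER (ORDER BY ts) < val"
    return ("WITH w AS (SELECT ts, val, LAG(val) OVER (ORDER BY ts) AS prev "
            "FROM readings) SELECT val FROM w WHERE prev < val")

def validate(sql, conn):
    """Cheap syntax gate: EXPLAIN parses and plans the query without running it.
    Returns None on success, or the error text to feed back to the model."""
    try:
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.OperationalError as e:
        return str(e)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, val REAL)")

sql, feedback = None, None
for attempts in range(1, 4):  # bounded recursion, as suggested above
    sql = generate_sql("where did the value rise?", feedback)
    feedback = validate(sql, conn)
    if feedback is None:
        break  # passed the final QA gate
```

Queries that still fail after n attempts can then be logged for the further-training set you mention.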

Indicative of a prompt issue. Have you used a rigid prompt development process?

Have you tried specifying the flavor of SQL you are using in the prompt (ex. Postgres)?

Yes, of course; the prompt is very precise and we have tried every possible prompt.
I doubt it is a prompt problem, because it understands the query precisely and knows what I want, but it makes beginner SQL mistakes and syntax errors that are not about understanding the prompt, but about knowing SQL well enough.
You can’t use LAG inside WHERE, for instance; this has nothing to do with my prompt or the DB.

Bill, thanks. See my last comment here; this has nothing to do with the prompt, but with the very basics of SQL.

Ok, so why not just run it through GPT again with a different prompt of “Fix any syntax errors”?

And/or

“Please compare this query with the original intent to determine if it’s appropriate”?

If these syntax errors were fixed, would it actually be an appropriate query?

Are you providing some few-shot examples?

Have you tried asking GPT to produce a mock structure of your database to compare with?
For example, you could ask it for a mock-up structure for a grocery store transaction database, and compare it to your own. There may be some insights found there?
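
As a sketch of that comparison step (the mock schema and table names below are invented): once you have the model's proposed mock-up, a symmetric difference of column sets per table pinpoints exactly where its mental picture diverges from the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The "real" schema (hypothetical grocery-store transactions table).
conn.execute("CREATE TABLE transactions (id INTEGER, store_id INTEGER, amount REAL)")

# Hypothetical mock-up the model proposed for the same database.
mock = {"transactions": {"id", "store_id", "total", "created_at"}}

def columns(conn, table):
    """Column names of a live table, via SQLite's table_info pragma."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

# Symmetric difference per table: columns only one side knows about.
diff = {t: sorted(cols ^ columns(conn, t)) for t, cols in mock.items()}
```

Any column that shows up in the diff (here the model guessed `total`/`created_at` where the real table has `amount`) is a likely source of invalid generated SQL, so it may be worth naming explicitly in the prompt.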

Thanks, that is interesting. What is C-Phrase based on?

It is based on what I am calling a combinatorial approach to natural language interfaces.

See http://umu.diva-portal.org/smash/get/diva2:1181044/FULLTEXT01.pdf
Or https://c-phrase.com/videos/Rehearsal%20for%20IUI-22.mp4

The technique is not ML-based, although I can see LLM work perhaps helping out with
building the lexicon. I think C-Phrase is pretty reliable by now. There is a long history of these types of
efforts, although I must say that my system is based on some concepts from the Minimalist
Program in linguistics, as well as what is called semantic tractability in the literature.

It is like the smooth-brained counterpart to the highly convoluted brain of ChatGPT. But it
fits the NL->SQL problem a lot better.

Of course. It can now. It won’t be 100% accurate

It needs to be very close to 100% to be accepted for use over SQL databases.
For this problem, if you miss by an inch you might as well miss by a mile. Again, look at a comparison between
ChatGPT and C-Phrase in the videos here:

https://c-phrase.com/videos.html

In particular I draw your attention to this video: https://c-phrase.com/videos/ChatGPT%20versus%20C-Phrase%20for%20Natural%20Language%20to%20SQL.mp4

ChatGPT really bombed on this. And I doubt much can be done about it, although I am open to being convinced otherwise.

Still there are some very useful things that LLMs can do for SQL databases. Things like letting people build up high quality database states very quickly. Again see the videos.

Regards,
Michael

Every time a client has told me this, it’s almost always been the prompt. Not 100% of cases, but in almost every case we resolved the issue through prompt engineering. Maybe your case is an exception, but you should publish an example and the outcome that you tend to see.

As they say in the legal business, objection – the prosecution is asserting intimate knowledge that exists only in the defendant’s head. :wink: We don’t really know what reasoning is ongoing. But that was my next question - have you asked GPT how it arrived at these flawed conclusions?

One of the principles of good prompts is to give the model time to reason. Are you doing that?

There was a time when I felt similar sentiments. Then I started using a rigid prompt-tuning process that tests, ranks, and versions every change. I started using analytics to inch closer to reliable and predictable outputs. I did not make much progress until I treated the English side of the system the way I create and manage code. I was flailing and would often forget that I had tried many things. It’s the same sensation as walking in circles in the forest without realizing it.
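
To make that concrete, here is a toy version of such a process (the prompt templates, eval set, and model stub are all invented; in practice the stub would be a real API call and the eval set would come from your own logged queries). Every prompt version is scored against the same fixed eval set, so changes get ranked instead of guessed at:

```python
# Fixed eval set: (question, expected SQL) pairs. Invented for illustration.
EVAL_SET = [
    ("count users", "SELECT COUNT(*) FROM users"),
    ("list user names", "SELECT name FROM users"),
]

# Every prompt change gets a new version entry instead of overwriting the old.
PROMPTS = {
    "v1": "Write SQL for: {q}",
    "v2": "You are a PostgreSQL expert. Write one valid query for: {q}",
}

def run_model(prompt):
    """Stub for the real API call; pretends v2's framing fixes one failure."""
    question = prompt.split(": ")[1]
    if prompt.startswith("You are"):
        return {"count users": "SELECT COUNT(*) FROM users",
                "list user names": "SELECT name FROM users"}[question]
    return {"count users": "SELECT COUNT(*) FROM users",
            "list user names": "SELECT names FROM user"}[question]

def score(version):
    """Fraction of eval cases this prompt version gets exactly right."""
    tpl = PROMPTS[version]
    return sum(run_model(tpl.format(q=q)) == sql for q, sql in EVAL_SET) / len(EVAL_SET)

ranking = sorted(PROMPTS, key=score, reverse=True)
```

The point is not the scoring function (exact string match is crude; executing both queries and comparing result sets is better) but that every change is tested against the same baseline, which is what stops the walking-in-circles sensation.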

Share a prompt or two - maybe we can spot something.

Alternatively, we might say that the developers who implemented AGI in C-Phrase really bombed on that project. :wink:

Who implemented the C-Phrase GPT integration, and has that developer published the approach used? Have they exposed the methodology and allowed it to be peer-reviewed?

For every GPT success story, there seem to be 100 stories that demonstrate it is absolutely worthless. Such stories are rewarded with clicks, or sometimes with product purchases swayed in their own favor. C-Phrase has an interest in putting a finer point on GPT failure, and the video seems to convey a certain glee that it fails so badly. As such, I would tend to avoid this as a measure of failure unless the presentation were squeaky clean and at least attempted to be unbiased.

I see so many cases where new AI implementers get many things wrong. I was one of them.

The TICKIT database is basically open. So is Northwind. So are a lot of databases that approximate a real use case. Do you have any links to examples where ChatGPT robustly translates NL to SQL over databases like these?

Great thoughts. Completely agree.
I was, and possibly still am one of them as well. All part of the process.

@michael.minock I watched your video, and couldn’t see the comparison to GPT.
In fact, I was thinking that a synergy with GPT would work really well with your system.
The visual interactivity is really nice.

But, after looking at the admin interface, it seems like it’s a database in itself. So I’m just more confused about how it’s being compared to GPT.
I’m still exploring though.

I know; I’m curious whether they revealed the prompts they use and the underlying approach.

I’m not sure what you mean by “databases like these”. While I could dig in and learn more about the database, it might be a little more productive for me or others in the community to see how GPT prompts are being used. Typically, that helps to clarify the approach and how it may be unable to achieve the results.

I can’t imagine they haven’t published the prompts that they use.

I do not think this concept would meet regulatory compliance for handling PCI data; better to just connect your database in an environment you control. Still, I could see it being useful for non-sensitive tasks.