thanks wclayf, indeed embedding seems not to apply in my case, because none of the tasks it is used for (search, clustering, recommendations, etc.) actually does what ChatGPT does, i.e. completion of a given input text… or am I wrong?
About fine-tuning: this seems promising, but I am afraid of something. If a new user subscribes to my website, will re-training the model (fine-tuning it) on this added example actually make the model able to answer a given question about this very user? I have strong doubts about this.
You may not even be talking about fine-tuning or embedding. You may be looking at functions which retrieve data directly from your existing databases. In which case, you are talking about text-to-SQL generation:
“Show me recent orders of tennis balls.”
SELECT prodDesc FROM orders WHERE prodDesc LIKE '%tennis balls%' ORDER BY orderDate DESC LIMIT 10 -- assuming an orderDate column
Then return this info to the user.
Sounds like you might want to investigate that avenue.
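To make the flow concrete, here is a minimal sketch in Python. The model call is stubbed out with a hard-coded return value (in practice you would send the user's question plus a schema description to the chat completions API), and the `orders` table, its columns, and the sample data are all hypothetical:

```python
import sqlite3

def model_text_to_sql(question: str) -> str:
    # Stand-in for the LLM call; assume the model returns SQL like
    # this for "Show me recent orders of tennis balls."
    return ("SELECT prodDesc FROM orders "
            "WHERE prodDesc LIKE '%tennis balls%' "
            "ORDER BY orderDate DESC LIMIT 10")

# In-memory demo database with a hypothetical orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (prodDesc TEXT, orderDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("tennis balls, 3-pack", "2023-06-01"),
                  ("tennis racket", "2023-06-02"),
                  ("tennis balls, 12-pack", "2023-06-03")])

# 1) user text -> 2) model -> SQL -> 3) execute -> 4) return rows
sql = model_text_to_sql("Show me recent orders of tennis balls.")
rows = conn.execute(sql).fetchall()
print(rows)  # only the "tennis balls" rows, newest first
```

The last step would then format `rows` into whatever the user-facing response needs to look like.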
I had always thought “text-to-SQL” generation was more of a tool for software developers to use to help them craft SQL, but your example of using it for custom query generation is interesting. However, the challenge there is stopping users from issuing a DELETE or UPDATE to the DB!
I deleted my comment, because I think I might have gotten that wrong. I think I was too narrowly understanding the difference between embedding and fine-tuning, and my hunch is that embedding will work for you. Sorry for the misleading post.
The example I gave will require coding as well as API calls. You simply write your code to NEVER execute UPDATE or DELETE requests from the AI. And, of course, you give your AI instructions in the system message to NEVER issue them.
I am not actually doing this myself as I don’t have a need for it, but having used SQL for the past 30+ years and LLMs for the past 8 months, I see pretty clearly how it works. I actually have code now that executes hard-coded SQL based upon AI responses. There are a bunch of folks here who are actually implementing the methodology I described whom you may want to query about this. I also believe there are already plugins for this.
Why this over embeddings? My understanding from @aymeric75’s original question is that he wishes to allow users to query his existing database. Why embed your existing database when you can use the AI to simply query it?
I see exactly what you mean. Great points. It’s better to use SQL against a corporate database to pull information from it, than to try to “train” AI on that database, when the SQL results will be exactly correct, for queries simple enough to be done via SQL.
Another way to solve the security risk is to perhaps run the SQL on a DB “role” that doesn’t even have any updating privileges at all. This way you don’t have to “trust” the AI to generate “safe” SQL nor do you have to parse it before submission to search for UPDATE, DELETE, etc.
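As a quick illustration of that idea: SQLite has no roles, but its read-only URI mode gives the same effect at the connection level (in Postgres or MySQL you would instead create a role with only SELECT granted). The file name and table are made up for the demo:

```python
import os
import sqlite3
import tempfile

# Build a small demo database with a normal read/write connection
path = os.path.join(tempfile.mkdtemp(), "orders.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE orders (prodDesc TEXT)")
rw.execute("INSERT INTO orders VALUES ('tennis balls')")
rw.commit()
rw.close()

# The connection that AI-generated SQL runs on: opened read-only,
# so reads succeed and any write fails at the database layer.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute("SELECT prodDesc FROM orders").fetchall()

blocked = False
try:
    ro.execute("DELETE FROM orders")
except sqlite3.OperationalError:
    blocked = True  # the DELETE never reaches the data
print(rows, "write blocked:", blocked)
```

With this setup, even a maliciously crafted DELETE coming back from the model simply errors out instead of touching the data.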
I’m sure in the future (or already) there will be entire product lines that query DBs this way, and eventually in 5 or 10 years, the large DB providers will be providing it as native query functions.
The AI “system” prompt might be smart enough (if not now, then soon) to allow you to tell the AI which tables it’s allowed to join to which other tables, and how many joins, or inner-queries (sub-queries, etc.) it’s allowed to use, to make it safer. But definitely until this kind of approach has been thoroughly studied I wouldn’t trust it myself. Too much risk. I just found it novel and interesting.
User submits text request → Request is submitted to model along with message which contains restrictions → Model evaluates and returns SQL statement or error message → if SQL statement returned, that SQL is submitted to model again for double-verification against restrictions → If SQL statement is safe, then it is processed and output returned as html to user.
In addition to the model checks, my code also checks for forbidden statements and only submits SQL to read-only database.
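One possible shape for that "forbidden statements" check, before the SQL ever reaches the database. The keyword list and rules here are illustrative, not exhaustive, which is why the read-only database remains the real safety net:

```python
import re

# Reject anything that could modify data or schema (illustrative list)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|ATTACH|PRAGMA)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not stripped.upper().startswith("SELECT"):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT * FROM orders LIMIT 10"))   # True
print(is_safe_select("DELETE FROM orders"))              # False
print(is_safe_select("SELECT 1; DROP TABLE orders"))     # False
```

Only SQL that passes a check like this would then be submitted to the read-only connection.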
Very impressive conversation and tool. I’ve only ever used GPT-3.5-turbo so far, but am continually shocked because no matter how difficult a coding question I’ve asked it, it’s never failed. This is far better than passing any Turing test. This is Superhuman or Godlike reasoning. Thanks for sharing it.
What I am beginning to discover about gpt-3.5-turbo-16k is that it is “prompt-heavy”. By that I mean that gpt-4 will normally return the results I am looking for with minimal prompting. It just seems to know the right thing to do.
gpt-3.5-turbo-16k is like a dyslexic 12-year-old child prodigy. You have to really explain what you want, then explain it again – and hopefully the model will understand. It also has a tendency to forget what you said in the beginning. But, when you get the prompts right, it can do a fairly decent job at small, specific tasks.
I am not using it as my primary chat completion model, but utilizing it for: