thanks wclayf, indeed embedding seems not to apply in my case, because none of the tasks it is used for (search, clustering, recommendations, etc.) actually does what ChatGPT does, i.e. completion of a given input text… or am I wrong?
About fine-tuning: this seems promising, but I am afraid of something. If a new user subscribes to my website, will re-training the model (fine-tuning it) on this added example actually make the model able to answer a given question about this very user? I have strong doubts about this.
You may not even be talking about fine-tuning or embedding. You may be looking at functions which retrieve data directly from your existing databases. In which case, you are talking about text-to-SQL generation:
“Show me recent orders of tennis balls.”
SELECT prodDesc FROM orders WHERE prodDesc LIKE '%tennis balls%' ORDER BY orderDate DESC LIMIT 10 -- assuming an orderDate column
Then return this info to the user.
Sounds like you might want to investigate that avenue.
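To make the flow concrete, here is a minimal sketch in Python. The model call is stubbed out with a hard-coded return value (in practice you would send the user's question plus a schema description to the chat completions API), and the `orders` table, its columns, and the sample data are all hypothetical:

```python
import sqlite3

def model_text_to_sql(question: str) -> str:
    # Stand-in for the LLM call; assume the model returns SQL like
    # this for "Show me recent orders of tennis balls."
    return ("SELECT prodDesc FROM orders "
            "WHERE prodDesc LIKE '%tennis balls%' "
            "ORDER BY orderDate DESC LIMIT 10")

# In-memory demo database with a hypothetical orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (prodDesc TEXT, orderDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("tennis balls, 3-pack", "2023-06-01"),
                  ("tennis racket", "2023-06-02"),
                  ("tennis balls, 12-pack", "2023-06-03")])

# 1) user text -> 2) model -> SQL -> 3) execute -> 4) return rows
sql = model_text_to_sql("Show me recent orders of tennis balls.")
rows = conn.execute(sql).fetchall()
print(rows)  # only the "tennis balls" rows, newest first
```

The last step would then format `rows` into whatever the user-facing response needs to look like.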
I had always thought “text-to-SQL” generation was more of a tool for software developers to use to help them craft SQL, but your example of using it for custom query generation is interesting. However, the challenge there is stopping users from issuing a DELETE or UPDATE to the DB!
I deleted my comment, because I think I might have gotten that wrong. I think I was too narrowly understanding the difference between embedding and fine-tuning, and my hunch is that embedding will work for you. Sorry for the misleading post.
The example I gave will require coding as well as API calls. You simply write your code to NEVER execute UPDATE or DELETE requests from the AI. And, of course, you give your AI instructions in the system message to NEVER issue them.
I am not actually doing this myself as I don’t have a need for it, but having used SQL for the past 30+ years and LLMs for the past 8 months, I see pretty clearly how it works. I actually have code now that executes hard-coded SQL based upon AI responses. There are a bunch of folks here who are actually implementing the methodology I described whom you may want to query about this. I also believe there are already plugins for this.
Why this over embeddings? My understanding from @aymeric75’s original question is that he wishes to allow users to query his existing database. Why embed your existing database when you can use the AI to simply query it?
I see exactly what you mean. Great points. It’s better to use SQL against a corporate database to pull information from it, than to try to “train” AI on that database, when the SQL results will be exactly correct, for queries simple enough to be done via SQL.
Another way to solve the security risk is to perhaps run the SQL on a DB “role” that doesn’t even have any updating privileges at all. This way you don’t have to “trust” the AI to generate “safe” SQL nor do you have to parse it before submission to search for UPDATE, DELETE, etc.
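As a quick illustration of that idea: SQLite has no roles, but its read-only URI mode gives the same effect at the connection level (in Postgres or MySQL you would instead create a role with only SELECT granted). The file name and table are made up for the demo:

```python
import os
import sqlite3
import tempfile

# Build a small demo database with a normal read/write connection
path = os.path.join(tempfile.mkdtemp(), "orders.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE orders (prodDesc TEXT)")
rw.execute("INSERT INTO orders VALUES ('tennis balls')")
rw.commit()
rw.close()

# The connection that AI-generated SQL runs on: opened read-only,
# so reads succeed and any write fails at the database layer.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute("SELECT prodDesc FROM orders").fetchall()

blocked = False
try:
    ro.execute("DELETE FROM orders")
except sqlite3.OperationalError:
    blocked = True  # the DELETE never reaches the data
print(rows, "write blocked:", blocked)
```

With this setup, even a maliciously crafted DELETE coming back from the model simply errors out instead of touching the data.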
I’m sure in the future (or already) there will be entire product lines that query DBs this way, and eventually in 5 or 10 years, the large DB providers will be providing it as native query functions.
The AI “system” prompt might be smart enough (if not now, then soon) to allow you to tell the AI which tables it’s allowed to join to which other tables, and how many joins, or inner-queries (sub-queries, etc.) it’s allowed to use, to make it safer. But definitely until this kind of approach has been thoroughly studied I wouldn’t trust it myself. Too much risk. I just found it novel and interesting.
User submits text request → Request is submitted to model along with message which contains restrictions → Model evaluates and returns SQL statement or error message → if SQL statement returned, that SQL is submitted to model again for double-verification against restrictions → If SQL statement is safe, then it is processed and output returned as html to user.
In addition to the model checks, my code also checks for forbidden statements and only submits SQL to read-only database.
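One possible shape for that "forbidden statements" check, before the SQL ever reaches the database. The keyword list and rules here are illustrative, not exhaustive, which is why the read-only database remains the real safety net:

```python
import re

# Reject anything that could modify data or schema (illustrative list)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|ATTACH|PRAGMA)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not stripped.upper().startswith("SELECT"):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT * FROM orders LIMIT 10"))   # True
print(is_safe_select("DELETE FROM orders"))              # False
print(is_safe_select("SELECT 1; DROP TABLE orders"))     # False
```

Only SQL that passes a check like this would then be submitted to the read-only connection.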
Very impressive conversation and tool. I’ve only ever used GPT-3.5-turbo so far, but am continually shocked because no matter how difficult a coding question I’ve asked it, it’s never failed. This is far better than passing any Turing test. This is Superhuman or Godlike reasoning. Thanks for sharing it.
What I am beginning to discover about gpt-3.5-turbo-16k is that it is “prompt-heavy”. By that I mean that gpt-4 will normally return the results I am looking for with minimal prompting. It just seems to know the right thing to do.
gpt-3.5-turbo-16k is like a dyslexic 12-year-old child prodigy. You have to really explain what you want, then explain it again – and hopefully the model will understand. It also has a tendency to forget what you said in the beginning. But, when you get the prompts right, it can do a fairly decent job at small, specific tasks.
I am not using it as my primary chat completion model, but utilizing it for: