I am building an Assistant chatbot that gives users answers to questions from our sales database on wine. To avoid having it parse through a 30k token JSON with every function call containing every kind of wine from every country, I want it to intelligently input the parameter based on the user’s question.
Assistant: “On average sauvignon blanc sold 15 bottles per location.”
I have a JSON that has all products, (e.g. “sauvignon blanc”: “SAUVIGNON_BLANC_GLOBAL”) as key value pairs and would like to use fine tuning to “teach” GPT this taxonomy.
Is this possible/appropriate with fine tuning or should I be using another tool?
Also I did read that people were having issues with fine-tuned models as Assistants but I can use completions worst case.
As your use case is fixed, i.e. enquires about your sales database, it would seem to be a good idea to build the solution as a sequence of steps.
Step 1 take the user query
Step 2 convert the query into one or more SQL statements to be performed on a read only duplicate of the actual sales database for security.
Step 3 Run those SQL statements on the data and record the resulting data.
Step 4 Pass the returned SQL results to the model as part of the prompt and include the original query and ask the model to provide output.
Thanks Foxabilo. I should have mentioned wine_sales() is a call to our internal API and I’m adding the category parameter is to cut down on the context tokens.
So is your suggestion to have the Assistant pass through “sauvignon blanc” and then have my function interpret that as “SAUVIGNON_BLANC_GLOBAL”? That’s a possibility, but not ideal, as when we include regions I would have to account for a few different scenarios.
User: “How are sauvignon blanc from CA sales?”
Assistant may then put the user input as “CA sauvignon blanc”, “sauvignon blanc from CA”, “sauvignon blanc CA”, “california sauvignon blanc”, etc.
Would prefer if the Assistant was fine-tuned into understanding that sauvignon blanc from CA can only be “SAUVIGNON_BLANC_CA”.
Whether you embed or use SQL function as suggested by @Foxalabs , it will be up to you to “train” the Assistant to know which wine is which. In your case, it seems to me SQL is the best route. Your system message should define your database tables and fields. Then, you will need to provide instruction on what to access and when.
It could be tedious to set up initially, but once done, if done correctly, it should give you what you are looking for.
Your only other alternative is to embed all the databases into a vector store and then play around with various implementations of cosine similarity, along with system instructions, to “fine-tune” your searches. I’ve found Weaviate to be particularly good at providing a number of options for retrieving the best results.