Hey my man,
I am currently teaching myself how to use Pinecone, what vector DBs are, etc.
I am also pretty new to programming, and I use JavaScript.
You seem pretty much on point with what you are trying to do; however, you are overcomplicating it a little bit.
Vector DB searches are, to the layman, pretty much magic. We won’t talk about the math, but the quality of the search is amazing. You will need to play around with the different distance metrics to check what works best for you. Pinecone offers cosine, euclidean and dotproduct. I think “euclidean” was what I was using, but you might have to do a quick Google search to see which one suits your embedding model.
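If you want to see why the choice of metric matters, here is a tiny sketch in plain Python. The vectors and the `best_match` helper are made up for illustration, they are not anything from Pinecone. It shows that the three metrics can disagree about which stored vector is the "closest" one.

```python
import math

def dot(a, b):
    # Dot product: grows with both alignment and vector length.
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    # Straight-line distance: smaller means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Cosine similarity: compares direction only and ignores length.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 0.0]
candidates = {
    "same_direction": [3.0, 0.0],  # points the same way as the query, but is longer
    "nearby": [1.0, 1.0],          # closer in space, but points elsewhere
}

def best_match(metric):
    # Euclidean is a distance (pick the minimum); the others are scores (pick the maximum).
    if metric == "euclidean":
        return min(candidates, key=lambda name: euclidean(query, candidates[name]))
    score = cosine if metric == "cosine" else dot
    return max(candidates, key=lambda name: score(query, candidates[name]))

for metric in ("cosine", "euclidean", "dotproduct"):
    print(metric, "->", best_match(metric))
```

Cosine and dot product pick `same_direction` here, while euclidean picks `nearby`, so it is worth testing a few real queries against your own data before settling on one.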
If you query a vector DB with:
- What are your hours of operation?
- What days are you open?
- When can I come in next?
All of these queries will land close to the same stored data, such as your hours of operation, even though they are worded differently.
The way you are formatting and processing your data isn’t optimal.
I would reformat your JSON data so that it isn’t in the question-answer format you currently have. For hours of operation I would have something like:
Hours of Operation:
M -
T -
W - …
…
After you have provided this information I would add a couple of tags to it.
[hours of operation, opening hours, availability]
!!!(NOTE: Pinecone allows you to attach metadata to a stored embedding. I just haven’t played with this enough yet, and putting the tags in the embedded text is a functional way to achieve the result. Probably not best practice, but it works well.)!!!
This is what I would add to the vector DB. It is easy to search, and it is obvious what the data is.
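As a rough sketch of that format in Python: the `build_chunk` helper, the `"hours"` id and the `<your hours>` placeholders are all hypothetical names I made up for the example. The record at the end uses the id / values / metadata shape that Pinecone records take, with `values` left empty because no embedding model is called here.

```python
def build_chunk(title, lines, tags):
    """Join a title, body lines and a bracketed tag list into one text block to embed."""
    body = "\n".join(lines)
    tag_line = "[" + ", ".join(tags) + "]"
    return f"{title}:\n{body}\n{tag_line}"

tags = ["hours of operation", "opening hours", "availability"]

hours_chunk = build_chunk(
    "Hours of Operation",
    ["M - <your hours>", "T - <your hours>", "W - <your hours>"],  # placeholders, fill in your own
    tags,
)

# One record per chunk. "values" would hold the embedding of hours_chunk;
# the metadata copy of the tags and text is optional.
record = {
    "id": "hours",
    "values": [],
    "metadata": {"tags": tags, "text": hours_chunk},
}

print(hours_chunk)
```

The tags end up inside the text that gets embedded, which is the workaround described above, and the same tags are also kept in metadata in case you want to filter on them later.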
For the vegetarian food, I would start by writing out your menu. The menu contains all of your different food categories, and I would make each category its own data object.
So Menu
vegetarian
Item 1:
Description:
Ingredients:
Dietary: GF, Veg, etc
Then we want to add data tags
[menu item, vegetarian meals]
Depending on the size of your menu and your acceptable token cost, you may be able to keep this in one embedding.
Alternatively, you could create an embedding for each meal type. Vector DBs let you get back multiple results for a query, ranked in order of relevance.
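Here is a minimal sketch of the "one data object per category" idea in Python. The menu contents, the `category_chunks` helper and the `menu-<category>` ids are all invented for the example, so swap in your own data and naming.

```python
# Hypothetical menu data: every item name and description here is a placeholder.
menu = {
    "vegetarian": [
        {"name": "Item 1", "description": "<description>",
         "ingredients": ["<ingredient>"], "dietary": ["GF", "Veg"]},
    ],
    "desserts": [
        {"name": "Item 2", "description": "<description>",
         "ingredients": ["<ingredient>"], "dietary": ["Veg"]},
    ],
}

def category_chunks(menu):
    """Build one text chunk per menu category, ending with a tag line."""
    chunks = {}
    for category, items in menu.items():
        lines = [f"Menu: {category}"]
        for item in items:
            lines.append(f"{item['name']}:")
            lines.append(f"Description: {item['description']}")
            lines.append(f"Ingredients: {', '.join(item['ingredients'])}")
            lines.append(f"Dietary: {', '.join(item['dietary'])}")
        lines.append(f"[menu item, {category} meals]")
        chunks[f"menu-{category}"] = "\n".join(lines)
    return chunks

chunks = category_chunks(menu)
for chunk_id, text in chunks.items():
    print(chunk_id)
    print(text)
    print()
```

Each entry in `chunks` would get its own embedding, so a question about vegetarian food can pull back only the vegetarian category instead of the whole menu.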
What you then do is let the user ask the chatbot their question, embed the question, query the vector DB, get the id of the best match, and retrieve the matching data from your JSON (the vector itself is not readable text, so you need either this id lookup or a copy of the text stored in metadata).
You then take the question and the retrieved data, build a prompt template, insert both into the template and boom, your chatbot will be mint.
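The whole flow can be sketched in Python like this. To keep it runnable without any API keys, `toy_embed` is a word-count stand-in for a real embedding model and the `index` dict is a stand-in for the vector DB, so neither reflects how Pinecone or OpenAI actually work internally. The document text, ids and prompt wording are also made up.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Your JSON data, keyed by the same ids you store in the vector DB.
documents = {
    "hours": "Hours of Operation:\nM - <hours>\n[hours of operation, opening hours, availability]",
    "menu-vegetarian": "Menu: vegetarian\nItem 1: <description>\n[menu item, vegetarian meals]",
}

# Stand-in for the vector DB: id -> embedding.
index = {doc_id: toy_embed(text) for doc_id, text in documents.items()}

def query_index(question, top_k=1):
    """Embed the question and return the ids of the top_k most similar documents."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda doc_id: cosine(q, index[doc_id]), reverse=True)
    return ranked[:top_k]

PROMPT_TEMPLATE = (
    "Answer the customer's question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)

def build_prompt(question):
    ids = query_index(question)                         # ids come back from the search
    context = "\n\n".join(documents[i] for i in ids)    # text comes from your own data
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt("What are your opening hours?")
print(prompt)
```

The resulting `prompt` string is what you would send to the chat model. Note the toy embedding only matches on shared words, whereas a real embedding model is what gives you the "magic" of matching differently worded questions.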
TLDR:
You aren’t structuring your data right. You are trying to use what is almost a scripted chatbot module to define your answers. This is a little dangerous, as your chatbot isn’t using data to determine its answer but your preset questions and answers. This might produce issues if
- the queried question is too close to the answer,
- the queried question is worded weirdly.
I propose you sit down, think of all the possible questions you want to answer, and then build your data around what information answers those questions, not what the answers to the questions are. Embed that and supply that to the OpenAI API rather than what it should be answering.
Good Luck.
P.s. Hi, I’m Jayden, I really need to break into this industry and get some work experience behind me. If you need a developer, who will work for free, and you have a project where I can learn something please hit me up. I’d love to slave for you.