First of all, I have spent 12+ hours looking into this very problem but I’ve hit a wall, so any help would be much appreciated.
Our company has 10,000+ toy products. Each product has a reasonably in-depth description (imagine what’s on the back of the box, for example).
When I ask OpenAI (text-davinci-003 does the best) the question “Recommend me 3 products for an 8 year old who doesn’t like to go outside”, it yields fantastic results (!!).
Naturally, however, its recommendations aren’t drawn from my products.
Going backwards: if I feed a product description to OpenAI, I can ask “… would this be suitable for an 8 year old who doesn’t like to go outside?” and it answers as you’d expect.
My question is: how do I “search” through my database of products by asking OpenAI a prompt?
Ideally I would not create a model, as I simply do not have the resources to manually build a big enough data-set. If I must go down the model-training route, I’d need OpenAI to help build and train that very model! I’m the only IT guy at the company and I work part-time, so my resources are restricted!
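Concretely, the per-product check I described is just a prompt template along these lines (the function name and wording are purely illustrative, not from any OpenAI SDK):

```python
def suitability_prompt(description: str, question: str) -> str:
    """Wrap one product description in a question for a completions
    model such as text-davinci-003. Wording is illustrative."""
    return (
        "Product description:\n"
        f"{description}\n\n"
        f"Question: {question}\n"
        "Answer Yes or No, with a one-sentence reason:"
    )

prompt = suitability_prompt(
    "A 1000-piece jigsaw puzzle of a medieval castle.",
    "Would this be suitable for an 8 year old who doesn't like to go outside?",
)
```

That part works per product; what I can’t see is how to scale it to the whole catalogue.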
Thank you - I actually got this working (on a small data set). It works okay, but it’s not as good as what the chatbot can do.
For example, I fed the chatbot the description of a puzzle with 1000+ pieces, and then asked if it would be suitable for a 4 year old. The chatbot determined that it would be more appropriate for a 4-year-old to look at puzzles with 20-30 pieces…
Maybe after getting the information with the embedding endpoint, you switch back to the text-davinci-003 model. You would probably need a 500+ token prompt that includes the relevant product information the embedding endpoint delivered to you.
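To make that two-step flow concrete: rank products by cosine similarity against the embedded query, then paste the top hits into the davinci prompt. A minimal sketch with hand-made 3-dimensional vectors standing in for real embedding-endpoint output (the product names, vectors, and prompt wording are all made up; real ada-002 embeddings have 1536 dimensions):

```python
import math

# Toy stand-ins for vectors the embedding endpoint would return.
PRODUCTS = {
    "1000-piece castle jigsaw": [0.9, 0.1, 0.0],
    "indoor chemistry kit":     [0.2, 0.9, 0.1],
    "garden soccer goal":       [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_products(query_vec, k=2):
    """Rank products by cosine similarity to the (embedded) query."""
    ranked = sorted(PRODUCTS, key=lambda n: cosine(query_vec, PRODUCTS[n]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    """Paste the top matches into a completion prompt for davinci."""
    context = "\n".join(f"- {name}" for name in top_products(query_vec))
    return f"Products:\n{context}\n\nQuestion: {question}\nAnswer:"

# Pretend this vector came from embedding the user's question.
query = [0.1, 0.95, 0.05]  # leans toward indoor/creative toys
prompt = build_prompt(
    "Recommend a toy for an 8 year old who doesn't like to go outside",
    query,
)
```

The key point is that only the few retrieved descriptions go into the prompt, not the whole catalogue.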
I have been looking around for a similar solution and am honestly surprised there are not a lot of resources about this, although it should be a concern for many digital businesses. @PaulBellow does your solution propose sending the whole dataset of products in every prompt? That would be very expensive, I assume, right? Maybe it is better to fine-tune a model, but again, as @tank-cal-20 mentioned, that would be an exhaustive task.
Thank you for sharing your knowledge
I’m currently working on something quite similar, but I’m still testing different approaches. My current approach is to describe my database design, and then the assistant can call my custom function with parameters it created to retrieve data from the database. It’s kind of working for simpler questions, but it’s still struggling with more complex ones.
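The plumbing for that approach can be shown without any API traffic: the model hands back a function name plus JSON arguments, and you dispatch them to real code against your data. Everything below — the function `search_products`, its parameters, and the toy “database” — is made up for illustration:

```python
import json

# Illustrative in-memory stand-in for a product database.
DB = [
    {"name": "1000-piece castle jigsaw", "min_age": 9, "indoor": True},
    {"name": "20-piece farm puzzle",     "min_age": 3, "indoor": True},
    {"name": "garden soccer goal",       "min_age": 5, "indoor": False},
]

def search_products(max_age=None, indoor=None):
    """The custom function the assistant is allowed to call."""
    results = DB
    if max_age is not None:
        results = [p for p in results if p["min_age"] <= max_age]
    if indoor is not None:
        results = [p for p in results if p["indoor"] == indoor]
    return results

# Only whitelisted functions may be dispatched.
FUNCTIONS = {"search_products": search_products}

def dispatch(tool_call):
    """Route a model-produced function call to local code."""
    fn = FUNCTIONS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

# Pretend the model produced this call for
# "what do you have for a 4 year old who stays indoors?"
call = {"name": "search_products",
        "arguments": '{"max_age": 4, "indoor": true}'}
matches = dispatch(call)  # → just the 20-piece farm puzzle
```

Whitelisting the callable functions matters: the model’s output should never be able to name arbitrary code.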
Did you have any ideas in mind for how to approach this problem?
Simply upload your data in a machine-readable format (CSV, XML, HTML, etc.) and then turn on the retrieval system; your data will automatically be used as context when answering queries. Check out the documentation for details.
If you have live data that changes regularly, then it sounds like you may need to spend some time building a query you can run on your dataset to produce a smaller subset, then run the model over that; it will cut down your costs.
If the task genuinely needs large amounts of the data, then this may well be the cheapest way of doing it, expensive as it is. But if it is a simple addition task or some trivial account summation, you should consider traditional code for low cost and speed.
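A sketch of both points — pre-filtering to a subset and doing trivial sums in ordinary code (all field names and prices, in cents, are made up):

```python
# Two cheap moves before reaching for a model:
#  1) pre-filter live data down to a relevant subset;
#  2) do trivial arithmetic (totals, counts) in plain code.
orders = [
    {"sku": "jigsaw-1000", "category": "puzzle", "qty": 2, "price_cents": 1999},
    {"sku": "soccer-goal", "category": "sports", "qty": 1, "price_cents": 3450},
    {"sku": "farm-20",     "category": "puzzle", "qty": 3, "price_cents": 725},
]

# 1) A smaller subset to hand to the model as context, instead of everything:
puzzles = [o for o in orders if o["category"] == "puzzle"]

# 2) Trivial summation: no model call needed at all.
puzzle_revenue_cents = sum(o["qty"] * o["price_cents"] for o in puzzles)
```

The subset is what goes into the prompt; the arithmetic never needs to leave your own code.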