Using OpenAI to search a database for products

First of all, I have spent 12+ hours looking into this very problem but I’ve hit a wall, so any help would be much appreciated.

I have 10,000+ toy products in our company. Each product has a reasonably in-depth description (imagine what’s on the back of the box, for example).

When I ask OpenAI (text-davinci-003 works best) the question “Recommend me 3 products for an 8 year old who doesn’t like to go outside”, it yields fantastic results (!!).

Naturally, however, those recommendations aren’t drawn from my own products, because the model knows nothing about them.

Going the other way, if I feed a product description to OpenAI, I can ask “… would this be suitable for an 8 year old who doesn’t like to go outside?” and it answers as you’d expect.

My question is: how do I “search” through my database of products by giving OpenAI a prompt?

Ideally I would not train a model, as I simply don’t have the resources to build a big enough dataset manually. If I must go down the model-training route, I’d need OpenAI to help build and train that very model! I’m the only IT guy at the company and I’m part-time, so my resources are limited!

Can anyone point me in the right direction?

You could generate embeddings of each product description (or a customized description), and then search for semantic similarity between those embeddings and an embedding of the query.
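
A minimal sketch of that idea, assuming you already have one embedding vector per product and one for the query. The ranking uses plain cosine similarity; `top_k_products` and `embed_with_openai` are hypothetical helper names, and the default model name `text-embedding-ada-002` is an assumption (any embeddings model works as long as products and query use the same one). The OpenAI call assumes the `openai` Python package v1+ and an `OPENAI_API_KEY` environment variable.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for empty/zero vectors
    return dot / (norm_a * norm_b)


def top_k_products(
    query_vec: Sequence[float],
    product_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k product IDs whose description embeddings are closest to the query."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in product_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def embed_with_openai(texts: List[str], model: str = "text-embedding-ada-002") -> List[List[float]]:
    """Fetch embeddings from the OpenAI API (model name is an assumption)."""
    from openai import OpenAI  # imported lazily so the ranking code runs without the package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]
```

For 10,000+ products you would embed every description once, store the vectors, and then only embed the shopper’s question at search time. With toy 3-dimensional vectors (real embeddings are much longer), `top_k_products([0.8, 0.5, 0.0], {"puzzle-1000": [0.9, 0.1, 0.0], "board-game": [0.7, 0.6, 0.1], "football": [0.0, 0.2, 0.9]}, k=2)` ranks `board-game` first and `puzzle-1000` second.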

Thank you! I actually got this working (on a small dataset). It works okay, but it’s not as good as what the chatbot can do.

For example, I fed the chatbot the description of a puzzle with 1000+ pieces and asked whether it would be suitable for a 4-year-old. The chatbot determined that a 4-year-old would be better suited to puzzles with 20-30 pieces…

Hey, welcome to the community!

Maybe after retrieving the most relevant products with the embeddings endpoint, you could switch back to the text-davinci-003 model. You would probably need a prompt of 500+ tokens that includes the relevant information about the products the embeddings search returned.
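
A rough sketch of that second step, assuming the embeddings search has already returned a short list of `(name, description)` pairs. The helper name `build_recommendation_prompt`, the instruction wording, and the character budget (a crude stand-in for a real token count) are all hypothetical; the resulting string would then be sent to a completion model such as text-davinci-003.

```python
from typing import List, Tuple


def build_recommendation_prompt(
    question: str,
    products: List[Tuple[str, str]],
    max_chars: int = 2000,
) -> str:
    """Combine the shopper's question with only the retrieved product descriptions.

    `max_chars` caps the header plus product list; it is a hypothetical,
    character-based budget rather than an exact token limit.
    """
    header = (
        "You are helping a toy shop recommend products. "
        "Use only the products listed below. If none are suitable, say so.\n\n"
        "Products:\n"
    )
    entries = []
    used = len(header)
    for i, (name, description) in enumerate(products, start=1):
        entry = f"{i}. {name}: {description.strip()}\n"
        if used + len(entry) > max_chars:
            break  # stop adding products once the budget would be exceeded
        entries.append(entry)
        used += len(entry)
    return header + "".join(entries) + f"\nQuestion: {question}\nAnswer:"
```

Because the model now sees the actual descriptions (piece counts, age ranges, and so on), it can apply the same kind of reasoning as in the puzzle example, but only over your own products.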

Hope this helps!

Hello,
I have been looking around for a similar solution and I’m honestly surprised there aren’t more resources about this, since it should concern many digital businesses. @PaulBellow, does your solution involve sending the whole product dataset in every prompt? I assume that would be very expensive. If so, maybe fine-tuning a model would be better, but again, as @tank-cal-20 mentioned, that would be an exhausting task.
Thank you for sharing your knowledge.
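
To put rough numbers on the cost question: in the embeddings approach only the top few retrieved products go into the completion prompt, not the whole catalogue. The figures below are hypothetical (about 200 tokens per description and 100 tokens of instructions plus question); the point is the scaling, not the exact counts.

```python
def prompt_tokens(num_products: int, tokens_per_description: int, overhead_tokens: int = 100) -> int:
    """Estimate prompt size: fixed instructions/question plus one description per product."""
    return overhead_tokens + num_products * tokens_per_description


# Hypothetical sizes: 200 tokens per description, 100 tokens of overhead.
whole_catalogue = prompt_tokens(10_000, 200)  # every product in every prompt
top_five = prompt_tokens(5, 200)  # only the products the embeddings search returned
```

Under these assumptions the whole catalogue would be about 2,000,100 tokens per question, far more than a completion model’s context window of a few thousand tokens, while the top five come to about 1,100. The embeddings themselves are computed once per product and once per query, so the per-question cost stays small without fine-tuning.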