Using OpenAI to search database for products

First of all, I have spent 12+ hours looking into this very problem but I’ve hit a wall, so any help would be much appreciated.

I have 10,000+ toy products in our company. Each product has a reasonably in-depth description (imagine what’s on the back of the box, for example).

When I ask OpenAI (text-davinci-003 does best) the question “Recommend me 3 products for an 8 year old who doesn’t like to go outside”, it yields fantastic results (!!).

Naturally, however, those recommendations are generic: the model knows nothing about my own products.

Going the other way, if I feed a product description to OpenAI, I can ask “… would this be suitable for an 8 year old who doesn’t like to go outside” and it answers as you’d expect.

My question is: how do I “search” through my database of products by giving a prompt to OpenAI?

Ideally I would not train a model, as I simply do not have the resources to manually build a big enough data set. If I must go down the model training route, I’d need OpenAI to help build and train that very model! I’m the only IT guy at the company and I’m part-time, so my resources are restricted!

Can anyone point me in the right direction?

You could generate embeddings of each description (or a customized description), embed the query the same way, and then rank the products by semantic similarity between their embeddings and the query embedding.
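A minimal sketch of the ranking step, assuming the embeddings have already been computed and stored somewhere. The function names and the toy two-dimensional vectors are made up for illustration; real embeddings have many more dimensions and come from whichever embedding model you choose.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(
    query_vec: Vector, product_vecs: Dict[str, Vector], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k product ids whose embeddings are closest to the query."""
    scored = [
        (product_id, cosine_similarity(query_vec, vec))
        for product_id, vec in product_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (hypothetical data).
demo_vecs = {"puzzle": [1.0, 0.0], "kite": [0.0, 1.0], "board_game": [0.9, 0.1]}
best = top_matches([1.0, 0.0], demo_vecs, k=2)
```

With 10,000 products a plain loop like this is usually fast enough; a vector index only becomes worth it at much larger scale.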


Thank you - I actually got this working (on a small data set). It works okay, but it’s not as good as the chatbot can do.

For example, I fed the chatbot the description of a puzzle with 1000+ pieces, and then asked if it would be suitable for a 4 year old. The chatbot determined that a 4 year old would be better off with puzzles of 20-30 pieces, which is the kind of reasoning the embedding search alone doesn’t give me.

Hey, welcome to the community!

Maybe after getting candidates from the embedding endpoint, you switch back to the text-davinci-003 model. You would probably need a 500+ token prompt that includes the relevant information about the products the embedding endpoint returned.
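Something like the following could assemble that prompt. This is only a sketch: the wording of the instructions, the `name` and `description` field names, and the character cap are all assumptions, and the resulting string would still have to be sent to the completion model separately.

```python
from typing import Dict, List


def build_prompt(
    question: str,
    products: List[Dict[str, str]],
    max_chars_per_product: int = 600,
) -> str:
    """Combine the shortlisted products and the user's question into one prompt."""
    lines = ["Answer using only the products listed below.", ""]
    for i, product in enumerate(products, start=1):
        # Truncate long descriptions so the prompt stays within a sane size.
        description = product["description"][:max_chars_per_product]
        lines.append(f"{i}. {product['name']}: {description}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


# Hypothetical shortlist, e.g. the top matches from an embedding search.
shortlist = [{"name": "Puzzle", "description": "x" * 1000}]
prompt = build_prompt(
    "Is this suitable for a 4 year old?", shortlist, max_chars_per_product=10
)
```

The point of the two steps is that the embedding search narrows 10,000 products to a handful, and the completion model then does the reasoning (piece counts, age suitability) over just those.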

Hope this helps!

I have been looking around for a similar solution and I’m honestly surprised there are not more resources about this, given that it should be a concern for many digital businesses. @PaulBellow, does your solution involve sending the whole dataset of products in every prompt? I assume that would be very expensive, right? Maybe it would be better to fine-tune a model, but again, as @tank-cal-20 mentioned, that would be an exhausting task.
Thank you for sharing your knowledge.

Obviously so much has changed since this last comment. Has anyone solved this? Would be keen to hear your approach / see an example of where you managed to get to?

I’m currently working on something quite similar, but I’m still testing different approaches. My current approach is to describe my database design to the assistant, which can then call my custom function, with parameters it generates, to retrieve data from the database. It’s kind of working for simpler questions, but it’s still struggling with more complex ones.

Did you have any ideas in mind for how to approach this problem?
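For anyone wondering what that looks like in practice, here is a rough sketch of the function-calling pattern. The tool definition follows the JSON-schema shape used for chat-completion tools; the `products` table, its columns, and the `search_products` function are hypothetical and would need to match your own schema. The model only supplies the arguments; the SQL stays fixed and parameterized on your side.

```python
import json
import sqlite3
from typing import List

# Tool description sent to the model (hypothetical function and parameters).
SEARCH_PRODUCTS_TOOL = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Find toys suitable for a given age, optionally indoor-only.",
        "parameters": {
            "type": "object",
            "properties": {
                "age": {"type": "integer", "description": "Child's age in years"},
                "indoor_only": {"type": "boolean"},
                "limit": {"type": "integer"},
            },
            "required": ["age"],
        },
    },
}


def search_products(
    conn: sqlite3.Connection, age: int, indoor_only: bool = False, limit: int = 3
) -> List[str]:
    """Run a fixed, parameterized query; the model never writes SQL itself."""
    sql = "SELECT name FROM products WHERE min_age <= ? AND max_age >= ?"
    params: list = [age, age]
    if indoor_only:
        sql += " AND indoor = 1"
    sql += " ORDER BY name LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


def handle_tool_call(
    conn: sqlite3.Connection, name: str, arguments_json: str
) -> List[str]:
    """Dispatch a tool call whose arguments arrive as a JSON string."""
    if name != "search_products":
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(arguments_json)
    return search_products(
        conn,
        age=int(args["age"]),
        indoor_only=bool(args.get("indoor_only", False)),
        limit=int(args.get("limit", 3)),
    )
```

Keeping the function narrow like this tends to work better for simple questions; more complex ones may need several such functions, or a second pass where the model reasons over the returned rows.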

You could take a look at Assistants for just this case.

What approach would you suggest when using Assistants? Something similar to what I described above or do you have any other suggestions? Thanks!

Simply upload your data in a machine-readable format (CSV, XML, HTML, etc.) and then turn on the retrieval system. Your data will automatically be used as context when answering queries. Check out the documentation for details.

My database has over 50k items that change regularly, so I’m not sure if it’s feasible to upload the whole database. Wouldn’t that be too expensive / use too many tokens?

If you have live data that changes regularly, then it sounds like you may need to spend some time building a query you can run on your dataset to produce a smaller subset, and then run the model over that subset. That will cut down your costs.

If the task genuinely needs large amounts of the data, then this may well be the cheapest way of doing it, expensive as it is. But if it is a simple addition task or some trivial account summation, you should consider traditional code for its low cost and speed.
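As a rough illustration of that pre-filtering step, assuming each product record carries a few structured fields alongside its description. The field names (`min_age`, `max_age`, `in_stock`) and the budget numbers are invented for the example; the idea is just that ordinary code trims the catalogue before any tokens are spent.

```python
from typing import Any, Dict, List


def select_candidates(
    products: List[Dict[str, Any]],
    *,
    age: int,
    max_items: int = 20,
    char_budget: int = 8000,
) -> List[Dict[str, Any]]:
    """Pick in-stock, age-appropriate products until a size budget is reached."""
    chosen: List[Dict[str, Any]] = []
    used_chars = 0
    for product in products:
        if not (product["min_age"] <= age <= product["max_age"]):
            continue  # wrong age range, no need to show it to the model
        if not product["in_stock"]:
            continue
        cost = len(product["description"])
        if used_chars + cost > char_budget:
            break  # stop once the next description would exceed the budget
        chosen.append(product)
        used_chars += cost
        if len(chosen) == max_items:
            break
    return chosen


# Hypothetical catalogue rows.
catalogue = [
    {"name": "Lego", "min_age": 7, "max_age": 14, "in_stock": True, "description": "bricks"},
    {"name": "Kite", "min_age": 6, "max_age": 12, "in_stock": False, "description": "flying"},
    {"name": "Chess", "min_age": 6, "max_age": 99, "in_stock": True, "description": "board!"},
    {"name": "Rattle", "min_age": 0, "max_age": 2, "in_stock": True, "description": "toy"},
]
subset = select_candidates(catalogue, age=8)
```

Because the filter runs against the live data on every request, a catalogue that changes regularly never has to be re-uploaded; only the small subset reaches the model.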

I am also doing a POC on something similar. I would like to connect the model to a product database. Is there any solution for this use case? Please let me know.
Use case:

  1. We need to identify the right product from database through the product image.