Using OpenAI to search database for products

tank-cal-20 · December 28, 2022, 8:21am

First of all, I have spent 12+ hours looking into this very problem but I’ve hit a wall, so any help would be much appreciated.

I have 10,000+ toy products in our company. Each product has a reasonably in-depth description (imagine what’s on the back of the box, for example).

When I ask OpenAI (text-davinci-003 does best) the question “Recommend me 3 products for an 8 year old who doesn’t like to go outside”, it yields fantastic results (!!).

Naturally, however, the recommendations are not drawn from my own products.

Going the other way, if I feed a product description to OpenAI, I can ask “… would this be suitable for an 8 year old who doesn’t like to go outside” and it answers as you’d expect.

My question is: how do I “search” through my database of products by giving OpenAI a prompt?

Ideally I would not train a model, as I simply do not have the resources to manually build a big enough dataset. If I must go down the model training route, I’d need OpenAI to help build and train that very model! I’m the only IT guy at the company and I’m part-time, so my resources are restricted!

Can anyone point me in the right direction?

nunodonato · December 28, 2022, 9:07am

You could generate embeddings of the descriptions (or of customized descriptions), and then rank products by the semantic similarity between those embeddings and the embedding of the query.
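The ranking step of that suggestion might look roughly like the sketch below. It assumes every product description and the query have already been turned into vectors by some embedding model. The three-dimensional vectors, the product ids, and the helper names `cosine_similarity` and `top_matches` are all made up for illustration; real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_matches(query_vector, product_vectors, k=3):
    """Return the ids of the k products most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), product_id)
        for product_id, vector in product_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [product_id for _, product_id in scored[:k]]

# Hypothetical vectors standing in for real embeddings of descriptions.
PRODUCT_VECTORS = {
    "board-game": [0.9, 0.1, 0.0],
    "football": [0.0, 0.2, 0.9],
    "craft-kit": [0.8, 0.3, 0.1],
}

# Stand-in for the embedded customer question.
query = [1.0, 0.2, 0.0]
print(top_matches(query, PRODUCT_VECTORS, k=2))
```

With 10,000+ products a linear scan like this is usually still fast enough; a vector index only becomes worth considering at much larger scale.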

tank-cal-20 · December 28, 2022, 10:45am

Thank you - I actually got this working (on a small data set). It works okay, but the results are not as good as what the chatbot can do.

For example, I fed the chatbot the description of a puzzle with 1000+ pieces, and then asked if it would be suitable for a 4 year old. The chatbot determined that it would be more appropriate for a 4 year old to look at puzzles with 20-30 pieces…

PaulBellow · December 28, 2022, 4:13pm

Hey, welcome to the community!

Maybe after retrieving the relevant products with the embedding endpoint, you switch back to the text-davinci-003 model. You would probably need a 500+ token prompt that includes the relevant information about the products the embedding search returned.
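A minimal sketch of that second step, assuming the embedding search has already returned a short list of (name, description) pairs. The function name `build_prompt`, the prompt wording, and the character budget (a crude stand-in for a real token count) are assumptions for illustration only.

```python
def build_prompt(question, products, max_chars=2000):
    """Assemble a completion prompt from retrieved products and a question.

    `products` is a list of (name, description) pairs, best match first.
    Products are added in order until the next one would exceed the
    character budget.
    """
    header = "Answer using only the products listed below.\n\n"
    footer = f"\nQuestion: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)
    sections = []
    for name, description in products:
        section = f"Product: {name}\nDescription: {description}\n"
        if len(section) > budget:
            break  # stop at the first product that no longer fits
        sections.append(section)
        budget -= len(section)
    return header + "".join(sections) + footer

# Hypothetical results of the embedding search.
retrieved = [
    ("Wooden Farm Puzzle", "A 24-piece wooden puzzle with large pieces."),
    ("Castle Jigsaw", "A 1000-piece jigsaw of a castle."),
]
prompt = build_prompt("Which of these suits a 4 year old?", retrieved)
print(prompt)
```

The resulting string is what would be sent to the completion model, so the model reasons only over the handful of retrieved products rather than the whole catalogue.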

Hope this helps!

madaher · March 13, 2023, 7:10pm

Hello,
I have been looking around for a similar solution and am honestly surprised there are not many resources about this, although it should concern many digital businesses. @PaulBellow, does your solution propose sending the whole dataset of products in every prompt? That would be very expensive, I assume? Maybe it would be better to fine-tune a model, but as @tank-cal-20 mentioned, that would be an exhausting task.
Thank you for sharing your knowledge

estav · November 13, 2023, 11:18am

Obviously so much has changed since the last comment. Has anyone solved this? I would be keen to hear your approach or see an example of how far you managed to get.

syntichsizer · November 13, 2023, 1:59pm

I’m currently working on something quite similar, but I’m still testing different approaches. My current approach is to describe my database design to the assistant, which can then call my custom function with parameters it generates to retrieve data from the database. It’s kind of working for simpler questions, but it’s still struggling with more complex ones.
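As a rough illustration of that pattern (not this poster's actual code), the sketch below pairs a JSON Schema style description of one function with a dispatcher that runs the call against a database. The tool name `search_products`, the table layout, and the hard-coded arguments string are all hypothetical; in a real setup the arguments would be produced by the model.

```python
import json
import sqlite3

# Hypothetical tool description in JSON Schema style, of the kind a
# function-calling model is given so it can choose the arguments itself.
SEARCH_PRODUCTS_TOOL = {
    "name": "search_products",
    "description": "Find products by category and optional maximum price.",
    "parameters": {
        "type": "object",
        "properties": {
            "category": {"type": "string"},
            "max_price": {"type": "number"},
        },
        "required": ["category"],
    },
}

def search_products(conn, category, max_price=None):
    """Run a parameterized query so that model-supplied values are never
    interpolated into the SQL text."""
    sql = "SELECT name, price FROM products WHERE category = ?"
    params = [category]
    if max_price is not None:
        sql += " AND price <= ?"
        params.append(max_price)
    sql += " ORDER BY price"
    return conn.execute(sql, params).fetchall()

def handle_tool_call(conn, name, arguments_json):
    """Dispatch a tool call whose arguments arrive as a JSON string."""
    if name != SEARCH_PRODUCTS_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    return search_products(conn, args["category"], args.get("max_price"))

# Tiny in-memory table standing in for the real product database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Farm Puzzle", "puzzle", 12.0),
        ("Castle Jigsaw", "puzzle", 25.0),
        ("Football", "outdoor", 15.0),
    ],
)

# Hard-coded here; normally the model would generate this string.
rows = handle_tool_call(
    conn, "search_products", '{"category": "puzzle", "max_price": 20}'
)
print(rows)
```

Keeping the exposed functions narrow and parameterized, rather than letting the model write raw SQL, is one way to limit what a badly formed call can do.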

Did you have any ideas in mind for how to approach this problem?

Foxalabs · November 13, 2023, 2:25pm

You could take a look at Assistants for just this case.

syntichsizer · November 13, 2023, 2:49pm

What approach would you suggest when using Assistants? Something similar to what I described above or do you have any other suggestions? Thanks!

Foxalabs · November 13, 2023, 2:51pm

Simply upload your data in a machine-readable format (CSV, XML, HTML, etc.) and then turn on the retrieval system. Your data will automatically be used as context when answering queries. Check out the documentation for details.

syntichsizer · November 14, 2023, 9:39am

My database has over 50k items that change regularly, so I’m not sure if it’s feasible to upload the whole database. Wouldn’t that be too expensive / use too many tokens?

Foxalabs · November 14, 2023, 10:26am

If you have live data that changes regularly, then it sounds like you may need to spend some time building a query you can run on your dataset to produce a smaller subset, and then run the model over that. It will cut down your costs.

If the task genuinely needs large amounts of the data, then running the model over it may well be the cheapest way of doing it, expensive as it is. But if it is a simple addition or some trivial account summation, you should consider traditional code for its low cost and speed.
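Both halves of that advice can be sketched in a few lines, under the assumption that products are available as records with structured fields. The field names (`min_age`, `in_stock`, `price`) and the helpers `shortlist` and `total_price` are hypothetical.

```python
# Hypothetical product records; in practice these come from the live database.
PRODUCTS = [
    {"name": "Farm Puzzle", "min_age": 3, "in_stock": True, "price": 12.0},
    {"name": "Castle Jigsaw", "min_age": 10, "in_stock": True, "price": 25.0},
    {"name": "Craft Kit", "min_age": 4, "in_stock": False, "price": 18.0},
    {"name": "Board Game", "min_age": 4, "in_stock": True, "price": 20.0},
]

def shortlist(products, child_age, limit=20):
    """Cheap structured filter that runs before any model call.

    Keeps only in-stock products whose minimum age fits, capped at `limit`
    so that the text later sent to the model stays small.
    """
    matches = [
        p for p in products
        if p["in_stock"] and p["min_age"] <= child_age
    ]
    return matches[:limit]

def total_price(products):
    """Plain summation: no model is needed for arithmetic like this."""
    return sum(p["price"] for p in products)

candidates = shortlist(PRODUCTS, child_age=4)
print([p["name"] for p in candidates])
print(total_price(candidates))
```

Only the descriptions of `candidates` would then go into the model prompt, while totals and counts stay in ordinary code.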

vsraman85 · November 21, 2023, 3:45pm

Hi,
I am also doing a POC on something similar. I would like to connect the model to a product database. Is there any solution for this use case? Please let me know.
Use case:

We need to identify the right product in the database from a product image.
