If not, then you want to hit the Embeddings API with the text content from your database and store the returned vector in your database. Then, when a user makes a query, you pass that query to the Embeddings API and compare the returned vector against each vector in your database.
My implementation stores the vector as a CSV string in a TEXT field. The vector is used atomically, so there's no need to break it down. Then compare the query vector to each vector in your database and store the results in a temporary table that you can easily search in order.
The vector comparison is done with Cosine Similarity.
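The storage-and-comparison scheme above can be sketched in a few lines of plain Python. This is a minimal illustration, not Ian's actual code: it assumes the embedding vectors have already been fetched from the Embeddings API, and the helper names (`vector_to_csv`, `cosine_similarity`, etc.) are made up for the example.

```python
import math

def vector_to_csv(vec):
    """Serialize an embedding for storage in a TEXT column."""
    return ",".join(str(x) for x in vec)

def csv_to_vector(s):
    """Parse a stored CSV string back into a list of floats."""
    return [float(x) for x in s.split(",")]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors:
    dot(a, b) / (|a| * |b|). Closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

At query time you would parse each stored CSV string back into a vector, score it against the query vector with `cosine_similarity`, and keep the top results.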
Thanks for the response, Ian! The database is fairly static, so it sounds like this might be a viable route. I’m not very familiar with the embeddings API. Are you saying that each row in the database would be assigned a unique vector?
It seems like this might only work if the query was very similar to the product descriptions. Meaning, if the user is telling us “I’m looking for ____ type of product”, we could use the embeddings to find a product in our database that’s the closest match.
Instead we’re asking the user to describe themselves and are hoping GPT can extrapolate to recommend products based on the attributes of the user. Does that make sense? Do you still think the embedding approach would work here?
I’m not sure this will work, but it might be cheaper to operate. Ask ChatGPT to read your product descriptions and then have it pretend to be a user describing attributes of themselves. You can ask it to generate as many (100s?) of these user descriptions as you want. You can even create one or two for each product manually as examples for ChatGPT to iterate on. Create embeddings for each generated user self-description, associate it with the product in question, and push the embedding into something like Pinecone. Then, when a new user comes to your site and provides their self-description, create embeddings for that and query Pinecone with it. Creating and querying with embeddings is cheaper than having to invoke ChatGPT for each user. This way you just have to use it for each new product or product update.
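The matching step described above can be approximated without Pinecone for illustration: a brute-force search over (product, embedding) pairs, where each product may have several generated self-description embeddings and we keep its best score. This is a hypothetical sketch; `rank_products` and the index structure are invented for the example, and a real setup would delegate the search to a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_products(query_vec, description_index):
    """description_index: list of (product_id, embedding) pairs, one entry
    per generated user self-description. Returns (product_id, score) pairs
    sorted best-first, keeping each product's best-matching description."""
    scored = {}
    for product_id, emb in description_index:
        sim = cosine_similarity(query_vec, emb)
        scored[product_id] = max(sim, scored.get(product_id, -1.0))
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

A real vector db does essentially this, just with an approximate-nearest-neighbor index instead of a linear scan.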
That’s interesting. So you’re saying that GPT would then calculate the semantic distance from any new user description and compare it to one in our dataset — and therefore, it’d know corresponding good recommendations?
Sounds like your database is very static, and assuming your database can export to a CSV file, you can turn the CSV into embeddings and run GPT on top of it.
Here is a thread on how to use it on top of CSV files…
Hi Nelson! Thanks for the insight. That sounds like a good approach if my use case was to build a chatbot that could answer questions about our product list, but I’m looking for GPT to make a selection from a product list based on attributes of our user, so it’s slightly different. And I’m not sure it would work in that sense.
Once the user provides their description, you just call the OpenAI embeddings endpoint and then use the result to query Pinecone (or any vector db). That query calculates the semantic distances and can return all the “hits” along with their distance. A Pinecone query (they even have a free tier, and you can host your own vector db very easily) is dirt cheap compared to ChatGPT’s API. Of course you can mix in as much “chat” as you want (e.g. asking the user to provide their description).
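The query-time flow above, hits returned with their scores, can be mimicked with an in-memory stand-in. This is only a sketch of the shape of the interaction, not the Pinecone API itself: `query_index`, the record layout, and `top_k` are assumptions chosen to mirror a typical vector-db query response.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query_index(index, query_vec, top_k=3):
    """index: list of dicts with 'id' and 'values' keys, loosely mirroring
    records in a vector db. Returns the top_k hits with their scores."""
    hits = [
        {"id": item["id"], "score": cosine_similarity(query_vec, item["values"])}
        for item in index
    ]
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:top_k]
```

In the real flow, the embedding for the user's self-description would come from the embeddings endpoint, and `query_index` would be replaced by a query against the hosted index.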