Using GPT to build a Recommender System — Looking for Help

I’m building a web app that will allow users to describe themselves — age, gender, hobbies, etc. — and with this info I’d like to recommend products from a database I’ve built.

GPT tends to perform quite well at this kind of recommendation decision when given a short list of products. The problem is that I can't just feed the full database into a prompt and ask GPT to decide; that would be far too much text.

Additionally, I don't know whether embeddings are the right route. If I were asking users to describe a product they would like, I could use embeddings to determine which products most closely match the user's description, but (as far as I understand it) I couldn't ask GPT to search those embeddings to find the best recommendation based on the user's personal characteristics. Is that correct?
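The matching described above (embed the user's text and each product's description, then rank products by cosine similarity) can be sketched as follows. This is a minimal sketch under stated assumptions: `embed` is a hypothetical toy bag-of-words stand-in so the example runs offline, where in practice you would call a real embedding model (for example OpenAI's embeddings endpoint), and the product catalogue is made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, products: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k products whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(products, key=lambda name: cosine(q, embed(products[name])), reverse=True)
    return ranked[:k]


# Hypothetical product catalogue: name -> description
products = {
    "Trail backpack": "lightweight backpack for hiking and camping trips",
    "Espresso maker": "compact espresso machine for coffee lovers",
    "Yoga mat": "non-slip mat for yoga and home workouts",
}

print(top_k("I want something for hiking and camping", products, k=1))
```

With a profile such as "34, female, likes hiking" as the query, this only works when the product text mentions the same kinds of attributes, which is exactly the gap the question raises.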

And, importantly, with any approach, I’d like to be able to establish a feedback loop to improve the engine over time.

If anyone has any thoughts or experience with this, I’m all ears. For now, I’ve been using a custom-built ML recommender system, but this is very inert and difficult to scale. It seems like there must be a way I could leverage and train an LLM here.

Also, if you’re interested in helping me work on this problem in a larger capacity, please reach out!

I think you will need embeddings to get this done. If it is something really serious, the OpenAI embeddings API is a very good choice.

A vector DB such as ChromaDB could be another good choice. I found it very easy to use and free, and ChatGPT understood how to interact with it pretty well. You can store your documents, add metadata, and retrieve them, and it supports Sentence Transformers and Instructor models for embeddings, which are very good for most use cases.

You will then need to teach ChatGPT your full intent and write a good description for each endpoint; GPT-4 will handle it.

And of course, the retrieval plugin example on GitHub is a very good starting point.

Beyond that, I think it all comes down to what you write in the “description_for_model” and in your endpoint descriptions. Once it has been taught the capabilities of each endpoint and understands the intent, it gets things right pretty consistently, in the sense of crafting the right query for the task.
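For context, `description_for_model` is a field of the ChatGPT plugin manifest (`ai-plugin.json`), which sits alongside the OpenAPI spec that documents each endpoint. A minimal sketch of such a manifest built as a Python dict; the plugin name, URLs, e-mail address, `/query` endpoint and description text are all hypothetical placeholders, not a working plugin.

```python
import json

# Hypothetical manifest for a product-recommendation plugin; every value is a placeholder.
manifest = {
    "schema_version": "v1",
    "name_for_human": "Product Recommender",
    "name_for_model": "product_recommender",
    "description_for_human": "Find products that suit your profile.",
    # The text GPT reads to decide when and how to call the plugin's endpoints.
    "description_for_model": (
        "Recommend products from the catalogue. Turn the user's age, gender and hobbies "
        "into a short search query and call the /query endpoint with it. "
        "Only recommend products returned by the endpoint."
    ),
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
    "logo_url": "https://example.com/logo.png",
    "contact_email": "support@example.com",
    "legal_info_url": "https://example.com/legal",
}

# The manifest is normally served at /.well-known/ai-plugin.json
print(json.dumps(manifest, indent=2))
```

The endpoint descriptions themselves live in the OpenAPI spec referenced under `api`, so both places are worth writing with the model as the reader.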

Semantic search will work well for any task if the query is constructed correctly, IMO, but I also think we shouldn't be confined to any specific solution :slight_smile:. If the data is stored anywhere and GPT-4 can access it via the plugin's endpoint, even if it is a file, there is really no limit to what it can do! But of course a vector DB will give you better performance and fairly precise results that won't exceed the character limits either, which is critical for maintaining the conversation context.
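The point about staying within the character limit can be made concrete: take the ranked search results and add them to the prompt only while they still fit a budget. A minimal sketch; the 120-character budget, the result strings and the prompt wording are arbitrary assumptions, and a real system would usually count tokens (for example with `tiktoken`) rather than characters.

```python
def fit_to_budget(ranked_results: list[str], max_chars: int) -> list[str]:
    """Keep the highest-ranked results, in order, while their combined length fits the budget."""
    kept, used = [], 0
    for text in ranked_results:
        cost = len(text) + 1  # +1 for the newline separating entries in the prompt
        if used + cost > max_chars:
            break
        kept.append(text)
        used += cost
    return kept


# Hypothetical search results, already sorted by similarity
results = [
    "Trail backpack: lightweight backpack for hiking and camping trips",
    "Yoga mat: non-slip mat for yoga and home workouts",
    "Espresso maker: compact espresso machine for coffee lovers",
]

context = "\n".join(fit_to_budget(results, max_chars=120))
prompt = f"Recommend one of these products to the user:\n{context}"
print(prompt)
```

Stopping at the first result that doesn't fit (rather than skipping it and trying smaller ones) keeps the context strictly in rank order.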

You should explore a variety of these solutions with some sort of mock and see which one best fits the purpose.

One approach you could take is to turn this around.

Ask ChatGPT to tell you, for each product, the age range, hobbies, etc. of the people it suits. Or even just have it describe the kind of person who would want the product.

You can then take embeddings of this data and match them against the user's description.
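The "turn it around" step could look like this: for each product, build a prompt asking ChatGPT to describe the person who would want it, then keep the answer as the text you embed. A minimal sketch of the prompt construction only; the prompt wording, the `persona_prompt` helper and the catalogue rows are hypothetical, and sending the prompts to the model is left out.

```python
def persona_prompt(name: str, description: str) -> str:
    """Hypothetical helper: build a prompt asking the model to describe a product's ideal buyer."""
    return (
        "Describe the kind of person who would want this product. "
        "Mention a likely age range, gender (or 'any'), and hobbies, in two sentences.\n"
        f"Product: {name}\n"
        f"Description: {description}"
    )


# Hypothetical catalogue rows: (name, description)
catalogue = [
    ("Trail backpack", "lightweight backpack for hiking and camping trips"),
    ("Espresso maker", "compact espresso machine for coffee lovers"),
]

prompts = {name: persona_prompt(name, desc) for name, desc in catalogue}
print(prompts["Trail backpack"])
```

Each model answer only needs to be generated and embedded once, offline, and stored next to the product, so at query time only the user's description needs a fresh embedding.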

This is an interesting problem.

In the docs, there is a sample use case for recommendations with embeddings.

However, you will have to decide which attributes from the user bio to use for the comparison.
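Deciding which attributes of the user bio to compare can be as simple as choosing a subset of profile fields and rendering them into one query string before embedding it. A minimal sketch; the `profile_to_query` helper, the field names and the `field: value` template are hypothetical.

```python
def profile_to_query(profile: dict, fields: tuple[str, ...] = ("age", "gender", "hobbies")) -> str:
    """Hypothetical helper: render only the chosen profile attributes as text to be embedded."""
    parts = []
    for name in fields:
        value = profile.get(name)
        if value is None:
            continue  # skip attributes the user did not fill in
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        parts.append(f"{name}: {value}")
    return "; ".join(parts)


# Hypothetical user bio; "city" is deliberately not in the chosen fields
user = {"age": 34, "gender": "female", "hobbies": ["hiking", "photography"], "city": "Lisbon"}
print(profile_to_query(user))
```

Leaving out fields such as `city` keeps attributes that don't help with matching from adding noise to the comparison.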

I like this approach as well. I was having difficulty “finding” some information when performing semantic search, so I decided to “convert” the information I was looking for using ChatGPT.

Instead of
query → embeddings → compare to embeddings of products

I used
query → embeddings → compare to embeddings of “converted” products
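The difference between the two pipelines shows up even with a crude similarity measure: a bare product name shares nothing with the user's query, while a "converted" description does. A minimal sketch; `embed` is a hypothetical toy bag-of-words stand-in for a real embedding model, and the converted texts are made-up examples of what ChatGPT might generate.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def scores(query: str, catalogue: dict[str, str]) -> dict[str, float]:
    """Similarity of the query to each product's stored text."""
    q = embed(query)
    return {name: round(cosine(q, embed(text)), 3) for name, text in catalogue.items()}


raw = {"perfume 1": "perfume 1", "perfume 2": "perfume 2"}
# Hypothetical "converted" texts, as if generated by ChatGPT for each product
converted = {
    "perfume 1": "fruity scent for an adventurous person with a busy routine",
    "perfume 2": "warm woody scent for a calm person who enjoys quiet evenings",
}

query = "I am adventurous and have a busy routine"
print(scores(query, raw))
print(scores(query, converted))
```

A real embedding model would not return exactly zero for the bare names, but those scores would say little about fit; the converted texts give the comparison something to work with.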

Mind explaining that? I lost you a bit

Thanks for the reply! Looking into this now

I wrote a tutorial on using OpenAI embeddings to build a recommender system very easily in JS

LMK what you think :slight_smile:
Please reply so I can learn on my end too.

Suppose you have a list of a thousand products: perfume 1, perfume 2, … perfume 1000.

If you take the standard route, you will compare the profile embeddings against the “perfume x” embeddings, which won't help you, because a bare name like “perfume 1” says nothing about who the product suits.

My suggested approach would be to generate some “characteristics” for the perfumes.

These characteristics could be used as:
a “rigid filter”: perfumes 1-500 = male, perfumes 501-1000 = female, that sort of filter.

Or these meta-characteristics could be used as the basis for your embedding queries as well.
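Both uses can be combined: apply the rigid filter first (here, the intended audience), then rank whatever survives by similarity to its characteristics text. A minimal sketch; the perfume records, the `audience` and `traits` fields, the `recommend` helper and the toy bag-of-words `embed` function are all hypothetical stand-ins for generated metadata and a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical records with generated characteristics
perfumes = [
    {"name": "perfume 1", "audience": "unisex", "traits": "fruity scent for an adventurous person with a busy routine"},
    {"name": "perfume 2", "audience": "female", "traits": "floral scent for a romantic evening out"},
    {"name": "perfume 3", "audience": "male", "traits": "fresh citrus scent for an adventurous traveller"},
]


def recommend(query: str, gender: str, items: list[dict], k: int = 2) -> list[str]:
    # Rigid filter: keep only perfumes aimed at this user's gender or at everyone.
    allowed = [p for p in items if p["audience"] in (gender, "unisex")]
    # Semantic step: rank the survivors by similarity to their characteristics text.
    q = embed(query)
    allowed.sort(key=lambda p: cosine(q, embed(p["traits"])), reverse=True)
    return [p["name"] for p in allowed[:k]]


print(recommend("adventurous person, busy routine", "female", perfumes))
```

Filtering before ranking means the similarity step never has to "learn" a hard constraint such as audience, and it shrinks the set that needs to be compared.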

Another example:
Earlier you had:
“perfume 1” (which is not informative)

You could use GPT-3.5 to create:
title: “perfume 1”
“fruity scent”
“ideal public: male and female”
“the perfume for an adventurous person that has a busy routine”
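One way to store these generated fields is a small record type that keeps filterable attributes apart from the free text that gets embedded. A minimal sketch; the `EnrichedProduct` class and its field names are hypothetical, and the values are the example above.

```python
from dataclasses import dataclass


@dataclass
class EnrichedProduct:
    """Hypothetical record: a catalogue entry plus characteristics generated by GPT-3.5."""

    title: str
    scent: str
    audience: list[str]  # kept as metadata for rigid filtering, not embedded
    persona: str

    def embedding_text(self) -> str:
        """Join the descriptive fields into the single string that gets embedded."""
        return f"{self.scent}. {self.persona}"


p = EnrichedProduct(
    title="perfume 1",
    scent="fruity scent",
    audience=["male", "female"],
    persona="the perfume for an adventurous person that has a busy routine",
)
print(p.embedding_text())
```

The title stays out of the embedded text on purpose, since “perfume 1” carries no signal for matching.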

Now the dialog could look like this:
“Hey, I want a perfume, my profile is this and that” => embeddings, which are compared to the embeddings of the perfumes' meta-characteristics, returning a match such as:
“the perfume for an adventurous person that has a busy routine”