Need a recommendation service

Hi all,

I need a recommendation service/model that uses some specific properties per item.

My use case is a system that recommends events for political volunteers. There can be search words entered by the volunteer, and if so, those should have significant influence on the returned events and their order. But often there are no search words and still, it needs to find the best match for the volunteer.

What it does have is several properties. It has the user’s and the event’s location. Every event has exactly 1 interest (canvassing, phone bank, etc.) and the user sets what interests they prefer in their profile. Every event also has 0 – N tags and again the user sets what tags they are interested in in their profile. Every event is for 1+ candidates and the user sets what candidates they are interested in helping in their profile. The user can also optionally write up why they are volunteering and what they hope to accomplish by their efforts. This is added to their profile.

I need a recommendation engine that can use the location (how far away is it, and if it’s online, then distance is less important), whether the event has a matching interest, and how good the match on tags is (in a way that events that over-select tags come up lower). It should also use the optional write-up the volunteer entered, if they did that. And it needs to take into account when the event is, with the optimum being 1 week out.

And… Once a volunteer has signed up for an event, then similar events should rank higher. Similar can be the event candidate, description, interests, tags, day of the week, time of day, duration, etc. A volunteer can also mark an event as no or as interested (which is different from attending) and these settings should also influence what other events are the best match.

And… Once an event is over, the volunteer rates the event and the event manager rates the volunteer. That data needs to be used to rank higher the kinds of events that the volunteer rates highly and the kinds where the manager rates the volunteer as valuable.

And of course, if there are other properties that should be used, tell me what they are and how to use them.

Is there a recommendation engine that can accomplish all this? What I want is the top 30 events (potentially out of 10,000), ranked in order.

And it needs to be a system where all the events are fed to it once, updated as they are changed, and then multiple queries are run against it. Feeding ChatGPT all 10,000 events each time I need the top 30 is not a scalable approach.

Thanks – dave

Hey there!

This sounds like a lot of work, and it’s going to take you a lot of time, but it’s definitely doable. What you are describing is essentially a pitch for building an entire app and potential ecosystem around this particular app.

You are right in that ChatGPT is not really going to be of much help here. Actually, I wouldn’t even think about LLMs here yet for a while, because any use of that in an app like this would be more like a bonus feature than a core mechanism of the system.

To answer your question:

This is typically what tech startups build themselves when it comes to app development. This is the “core”, the essence of everything a system and the company is typically built around. There’s not really a good way to package and distribute core algorithms like this, because typically that’s the proprietary part of any development project and people’s needs are so vastly different.

Therefore, your options are two-fold:

  1. Hire someone (or a team) to build this for you. This is not going to be an overnight project.

or

  2. Learn and build DIY style! It’ll save you money, and you’ll eventually be able to build it exactly how you want it. In order to do so however, this will require a good amount of reading and understanding how to get from this high-level overview to low-level implementation. If you need help on understanding where to get started, ChatGPT is honestly a really good tool to help you bridge gaps in knowledge, and it can point you in the right directions to start.

Here, the problem isn’t necessarily that it’s difficult, but that it’s time consuming.

Definitely option #2 as that will be a great way to learn how to apply AI.

Any suggestions on where/how to get started? My application stack is C#, Blazor, and Azure, so I’d prefer using that as opposed to Python. But if the best way is Python, so be it.

Thanks - Dave

Recommender engines aren’t super hard, but they might take some time to tune.

I’ll use the term product/SKU here for your events.

Here’s how they generally work:

  1. Find similar users by rank
  2. Suggest products to the user that cohort members/peers have purchased, but this user hasn’t.

1. Find Users by Rank

The easiest way to do this is by creating a similarity score:

  1. make a list of all products this user has purchased
  2. compare this to every other user’s list
  3. rank peers by list matches
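
A minimal sketch of the three steps above, assuming each user's history is kept as a set of event IDs. The function name `rank_peers` and the sample data are hypothetical, and "list matches" is read here as the raw count of shared events:

```python
def rank_peers(target_user, histories):
    """Rank every other user by how many events they share with target_user.

    histories maps a user name to the set of event IDs that user attended.
    Returns a list of (user, shared_count) pairs, best match first.
    """
    target = histories[target_user]
    scored = []
    for user, events in histories.items():
        if user == target_user:
            continue
        shared = len(target & events)  # set intersection = list matches
        scored.append((user, shared))
    # Highest overlap first; ties are broken by name so the order is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Made-up sample data for illustration only.
histories = {
    "ann": {"e1", "e2", "e3"},
    "bob": {"e2", "e3", "e4"},
    "cat": {"e3", "e5"},
    "dan": {"e6"},
}
peers = rank_peers("ann", histories)
```

With only 1 to 3 events per user, raw overlap counts will often be tied or zero, which is one reason the embedding variant may be a better fit.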

This can later be optimized by finding clusters and creating cluster archetypes (basically HNSW, to a degree), but could be overkill. You could theoretically use HNSW, tbh, but your vector will grow with the number of products - which will make it difficult in practice :thinking:

1.1. Embedding Adaptation

Instead of comparing a bunch of list items, what you could try instead:

  1. for each user, take their product list - and embed the products (description, keywords, etc)
  2. compute the arithmetic mean, or perhaps a running average of these embeddings
    • enhancement: weigh the embeddings by ratings: poorly rated skus might be multiplied by 0, evicting them completely
    • enhancement: include an embedding of the user profile in the mix
  3. Find users with the highest cosine similarity, and base your prediction on them
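
Here is a rough sketch of that embedding variant in plain Python. The 3-dimensional vectors are toy stand-ins; real embeddings would come from an embedding model and have far more dimensions. The helper names and the choice to use the raw rating as the weight are assumptions for illustration:

```python
import math


def weighted_mean(vectors, weights):
    """Average the vectors, weighting each one (e.g. by the user's rating)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero, no profile can be built")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy event embeddings (assumption: real ones come from an embedding model).
event_vecs = {"e1": [1.0, 0.0, 0.0], "e2": [0.0, 1.0, 0.0], "e3": [0.0, 0.0, 1.0]}

# (event, rating) pairs per user; a rating of 0 evicts the event entirely.
ratings = {
    "ann": [("e1", 5), ("e2", 0)],
    "bob": [("e1", 4), ("e3", 1)],
    "cat": [("e2", 5)],
}

profiles = {
    user: weighted_mean([event_vecs[e] for e, _ in rated], [r for _, r in rated])
    for user, rated in ratings.items()
}

# Most similar peer to ann by cosine similarity of the profile vectors.
best_peer = max(
    (u for u in profiles if u != "ann"),
    key=lambda u: cosine_similarity(profiles["ann"], profiles[u]),
)
```

To include the user profile text, its embedding could simply be appended to the `vectors` list with a weight of its own.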

The embedding approach isn’t actually that different from the matrix approach. What’s the difference? In the classical approach you compute the dot product of the one-hot binary encoding, here you (may) save dimensions by using embedding vectors instead, which will remain constant in dimension.

2. Product Prediction

Easy: You just go through your top ranked user(s) and give n (maybe 5) un-, or under-purchased products. That’s your result.
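
As a sketch, assuming peers are already ranked best-first and histories are sets of event IDs (the function name and sample data are made up):

```python
def suggest_from_peers(target_events, ranked_peers, histories, n=5):
    """Collect up to n events that top-ranked peers attended but the user did not."""
    suggestions = []
    for peer in ranked_peers:
        # Sorted only to make the output deterministic for this example.
        for event in sorted(histories[peer]):
            if event in target_events or event in suggestions:
                continue
            suggestions.append(event)
            if len(suggestions) == n:
                return suggestions
    return suggestions


# Made-up sample data for illustration only.
histories = {
    "ann": {"e1", "e2", "e3"},
    "bob": {"e2", "e3", "e4"},
    "cat": {"e3", "e5"},
}
picks = suggest_from_peers(histories["ann"], ["bob", "cat"], histories, n=2)
```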

However, since you’re working with expiring SKUs, it doesn’t really make sense to recommend old or past products, so the discovery of new products is important.

Here, you can use embeddings again. Take your top picks (from your similar users), and for each of these, find matches after filtering for your hard criteria (especially time, in this case). Those will then be your results.
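
One way this could look, as a sketch under a few assumptions: each candidate event is a dict with hypothetical `id`, `start`, and `vector` keys, the only hard criterion is "starts in the future", and a candidate is scored by its best cosine similarity to any of the top picks:

```python
import math
from datetime import datetime


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_upcoming(top_pick_vectors, candidates, now, limit=30):
    """Drop past events, then rank the rest by similarity to the top picks."""
    if not top_pick_vectors:
        return []
    # Hard filter first: expired events never reach the similarity step.
    upcoming = [c for c in candidates if c["start"] > now]
    scored = [
        (max(cosine_similarity(c["vector"], p) for p in top_pick_vectors), c["id"])
        for c in upcoming
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [event_id for _, event_id in scored[:limit]]


# Toy 2-dimensional vectors and made-up dates for illustration only.
now = datetime(2024, 6, 1)
candidates = [
    {"id": "past", "start": datetime(2024, 5, 1), "vector": [1.0, 0.0]},
    {"id": "near_match", "start": datetime(2024, 6, 8), "vector": [0.9, 0.1]},
    {"id": "far_match", "start": datetime(2024, 6, 9), "vector": [0.1, 0.9]},
]
ranked = rank_upcoming([[1.0, 0.0]], candidates, now)
```

At 10,000 events a linear scan like this is usually fast enough; a vector index only becomes interesting at much larger scale.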

3. Tuning

Depending on how much you value diversity, taking the top 5 product matches of the most similar products to the most similar users might not be the best approach, because you may accidentally get too similar results. E.g.: You’re on amazon and you buy a mechanical pencil. Suddenly you’re flooded with suggestions for pencil lead and nothing else. But that really depends on your data.

What a lot of sites also do, is not only compare by user history, but also compare your cart directly, and then mix or rerank the results. Additionally, you might want to throw sponsored content in there, and who knows what else. I personally think reranking is an ugly solution, but I don’t have a better one. You might want to invoke @curt.kennedy if you wanna go down that route if he has time.

But if you want to keep it absolutely super simple: averaging up the embeddings of a user’s past choices might be enough to compute a ranking for likely future candidates. Then you don’t have to deal with all the peer nonsense - although for new users without a history, finding a peer (based on description) and then suggesting the first n events based on that peer’s history would probably be a nice feature.

First off, thank you.

Second, you’re using several terms I am not familiar with (HNSW, embed the products, cosine similarity, etc.). Is there an online introduction class you recommend? Preferably Microsoft or Pluralsight.

Third, is this using any AI service or complex algorithm? Or am I implementing this all myself, calculating numbers for sets of products and looking for close matches?

And fourth, this is the biggie, this is not for products. This is an app that lists volunteer opportunities for political candidates. Past “purchases” in this case are events that are over. Most volunteers have volunteered at 1 - 3 events, so there is not a large sample set for any single volunteer. And I’ve got thousands of events, each with 2 - 12 volunteers.

So do I look for similarity between these events and match the ones that are close? And similar is a big question because location, day, and time of day can matter a lot. My three volunteer efforts were on a Saturday - was that random or does Saturday matter to me?

And the biggie… this is most important to a new volunteer who has not yet volunteered for anything (i.e. has made zero purchases). How do I recommend to them?

thanks - dave

https://platform.openai.com/docs/guides/embeddings

you can ignore HNSW for now, you’ll come across it later

:slight_smile:

it’s pretty much the same thing. your products just expire. I started writing it and then noticed that I was writing it for products (which is what I used to do), but there’s no real difference other than what I mentioned

If you have two identical events, but on different days, the event on the same day of the week as the user’s prior events will always rank higher than the one on a different day.

That said, I guess that is part of the difficult tuning part.

Do you use peers to determine the what, and exclusively the user’s past to determine the when? Then you have two different rankers, and you’ll need to rerank.
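
If you do end up with two rankers, the simplest rerank is probably a weighted blend. This is only a sketch: it assumes both rankers emit scores on the same 0 to 1 scale, and the 0.7 weight and all names are placeholders you would have to tune:

```python
def blend_rankings(what_scores, when_scores, what_weight=0.7):
    """Merge a 'what' ranker and a 'when' ranker into one ordered list.

    Both inputs map event ID -> score in [0, 1]. Events missing from
    when_scores get a 'when' score of 0.0.
    """
    when_weight = 1.0 - what_weight
    blended = {
        event_id: what_weight * score + when_weight * when_scores.get(event_id, 0.0)
        for event_id, score in what_scores.items()
    }
    # Best blended score first; ties broken by ID for a stable order.
    return sorted(blended, key=lambda e: (-blended[e], e))


# Made-up scores for illustration only.
what_scores = {"a": 0.9, "b": 0.8, "c": 0.2}  # from peers / content similarity
when_scores = {"a": 0.1, "b": 0.9, "c": 1.0}  # from the user's own day/time history
order = blend_rankings(what_scores, when_scores)
```

Here event `b` wins: it is slightly weaker on content than `a` but a far better fit on timing.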

You can theoretically incorporate the data into your vector (day of the week, time of the day, season of the year) - I haven’t gotten that far yet. Here’s a thread: https://community.openai.com/t/temporal-linear-coding-with-embeddings/
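
For what it's worth, one common way to put day and time into a vector is a sine/cosine (cyclical) encoding, so that Sunday sits next to Monday and 23:00 sits next to 00:00. This is a generic sketch and not necessarily what the linked thread describes; the function names and the `time_weight` knob are assumptions:

```python
import math


def cyclical(value, period):
    """Map a repeating value (weekday, hour) onto a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return [math.sin(angle), math.cos(angle)]


def add_time_features(embedding, weekday, hour, time_weight=1.0):
    """Append weekday (0=Monday .. 6=Sunday) and hour (0..23) to an embedding.

    time_weight scales how strongly timing influences similarity compared
    with the text embedding; it has to be tuned by experiment.
    """
    features = cyclical(weekday, 7) + cyclical(hour, 24)
    return list(embedding) + [time_weight * f for f in features]


# Toy 2-dimensional embedding; a Saturday (5) event starting at 10:00.
vector = add_time_features([0.3, 0.7], weekday=5, hour=10)

# Sunday (6) is closer to Monday (0) than Thursday (3) is to Monday.
sunday_to_monday = math.dist(cyclical(6, 7), cyclical(0, 7))
thursday_to_monday = math.dist(cyclical(3, 7), cyclical(0, 7))
```

Averaging these vectors over a user's past events then gives a profile that leans toward Saturdays only as strongly as the history supports, which is about as much as 1 to 3 events can tell you.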

Thank you, this gives me a lot to move forward with.
