Hi all,
Sorry, this is a giant question covering every part of building a recommendation engine. But if I’m on the right track here, and you all can help me with these questions, then I think I’ll have it. I also think the Azure services I will use for this call down into OpenAI.
Use Case
I have written an app that manages events for political parties and campaigns. The most valuable feature in this app is presenting volunteers with prospective events ranked so that the ones they will find most interesting are at the top, just as Google became so compelling because it put what you were looking for as the first result 90% of the time.
There are a couple of significant constraints for this use case. The first is that events expire (once they’ve happened, they’re no longer available). A website showing concerts faces the same issue. I can use signups to past events for the modeling, but then I have to recommend similar events that are in the future.
The second is that I will have little to no “purchase history” for most volunteers. For a new volunteer it’s incredibly important to recommend events they find interesting; otherwise they leave. But they have no history of signing up. What I do have is a fair number of properties they have set, which I can use as features to find their nearest-neighbor (NN) volunteers and see what those volunteers signed up for.
There are two cases where this will be called. One is a search, where the user enters search text and I need to find the best matches. The other is a carousel showing upcoming events of interest - there’s no search string for this case.
I need to walk before I run, and this is all new to me. So I want to create a simple, straightforward recommendation engine, not one that uses the latest and greatest algorithms. I also want to keep this to as few steps as possible and keep the costs low.
My application is written in Blazor and it runs on Azure. So I need to do all this in C# and I’d prefer to use Azure services.
Questions
Q1: I think I need to create a vector for every user and every event. Both have a fair number of boolean and numeric properties (that’s straightforward). And for the text properties I need to create an embedding for each - correct?
Q2: Each event has 1 Interest and 0-N Tags. The correct way to do this is to have a boolean feature for each Interest and each Tag, which the event sets to true or false - correct? If I take this approach, over half the features will be these booleans - will that weight those values more heavily than everything else? And am I making my model way too complex by having these 50 boolean features?
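To make Q2 concrete, here is roughly what I am picturing, as a plain in-memory C# sketch with no Azure calls. The vocabulary entries, the EncodeLabels helper, and the blockWeight parameter are all names I made up for illustration; scaling the whole block down is only my guess at how one might stop the booleans from dominating, and I don't know if it is the right way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical vocabulary: every Interest and Tag the app knows about gets one slot.
string[] vocabulary = { "canvassing", "phone-bank", "rally", "fundraiser", "youth", "outdoors" };

// Multi-hot encoding: the slot for the event's interest and for each of its tags
// is set to blockWeight, every other slot is 0.
// blockWeight (an assumption, not a known best practice) scales the whole block
// relative to the numeric and embedding parts of the event vector.
static float[] EncodeLabels(string[] vocab, string interest, IEnumerable<string> tags, float blockWeight)
{
    var active = new HashSet<string>(tags, StringComparer.OrdinalIgnoreCase) { interest };
    return vocab.Select(v => active.Contains(v) ? blockWeight : 0f).ToArray();
}

// An event with Interest "rally" and Tags "youth" and "outdoors".
float[] rallyVector = EncodeLabels(vocabulary, "rally", new[] { "youth", "outdoors" }, blockWeight: 0.5f);
Console.WriteLine(string.Join(", ", rallyVector));
```

With about 50 Interests and Tags the vocabulary array would just be longer; the shape of the code would not change.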
Q3: Every event has text properties (name, description, parent organization name, etc.). I assume I convert each of these into an embedding - correct? Is there an example anywhere showing how to get embeddings from Azure using C#? I’ve only found Python examples.
Q4: Once I’ve generated these vectors, where do I save them?
Q5: To find recommendations via similar volunteers, is the right approach to find the nearest-neighbor (NN) volunteers who have signed up for events (maybe the closest 5-10), and then, from the events they signed up for, find the NN future events? If so, is there an example of how to do this in C# calling Azure?
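Here is the two-step idea from Q5 as I understand it, sketched in memory with brute-force cosine similarity rather than any Azure service. The Cosine and Recommend helpers, the tuple shapes, and the sample data are all invented for illustration. I am assuming volunteer vectors are only compared with volunteer vectors and event vectors only with event vectors, and that scoring a future event by its best match to any neighbour's past event is a reasonable starting point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two equal-length vectors (1 = same direction, 0 = unrelated).
static double Cosine(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return (na == 0 || nb == 0) ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Step 1: take the k volunteers (with signups) most similar to the new volunteer.
// Step 2: score each future event by its best similarity to any past event
//         those neighbours signed up for, and return the top ids.
static List<string> Recommend(
    float[] newVolunteer,
    IEnumerable<(float[] Features, float[][] PastEventVectors)> volunteersWithSignups,
    IEnumerable<(string Id, float[] Vector)> futureEvents,
    int k, int top)
{
    var pastEvents = volunteersWithSignups
        .OrderByDescending(v => Cosine(newVolunteer, v.Features))
        .Take(k)
        .SelectMany(v => v.PastEventVectors)
        .ToList();

    return futureEvents
        .Select(e => (e.Id, Score: pastEvents.Count == 0 ? 0.0 : pastEvents.Max(p => Cosine(p, e.Vector))))
        .OrderByDescending(x => x.Score)
        .Take(top)
        .Select(x => x.Id)
        .ToList();
}

// Made-up data: two existing volunteers, each with one past signup, and two future events.
var volunteers = new[]
{
    (Features: new float[] { 1, 0, 1 }, PastEventVectors: new[] { new float[] { 1, 0 } }),
    (Features: new float[] { 0, 1, 0 }, PastEventVectors: new[] { new float[] { 0, 1 } }),
};
var future = new[]
{
    (Id: "town-hall", Vector: new float[] { 0.1f, 0.9f }),
    (Id: "door-knocking", Vector: new float[] { 0.9f, 0.1f }),
};

List<string> picks = Recommend(new float[] { 1, 0, 0.8f }, volunteers, future, k: 1, top: 1);
Console.WriteLine(picks[0]); // prints "door-knocking"
```

I assume a real vector store would replace the two OrderByDescending calls with nearest-neighbor queries, which is the part I need an example for.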
Q6: For the case of a search text string, how do I apply it to find the best matches? I still want events they are going to like, but within that set, only the subset that matches the search string. The search text should be matched against all of the embedded text features in the event vectors. Is there an example of how to do this in C# calling Azure?
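For Q6, this is the behaviour I am after, again sketched in memory. I am assuming the search text has already been turned into an embedding by whatever service produces the event text embeddings, and that "matches the search string" can be approximated by a similarity cutoff; SearchThenRank, minQuerySimilarity, and the tuple fields are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two equal-length vectors.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return (na == 0 || nb == 0) ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Keep only events whose text embedding is close enough to the query embedding,
// then order what is left by how well each event fits the volunteer.
static List<string> SearchThenRank(
    float[] queryEmbedding,
    float[] volunteerTaste,
    IEnumerable<(string Id, float[] TextEmbedding, float[] FeatureVector)> events,
    double minQuerySimilarity)
{
    return events
        .Where(e => Cosine(queryEmbedding, e.TextEmbedding) >= minQuerySimilarity)
        .OrderByDescending(e => Cosine(volunteerTaste, e.FeatureVector))
        .Select(e => e.Id)
        .ToList();
}

// Made-up data: tiny 2-dimensional stand-ins for real embeddings.
var searchable = new[]
{
    (Id: "phone-bank-a", TextEmbedding: new float[] { 1, 0 }, FeatureVector: new float[] { 0.2f, 0.8f }),
    (Id: "phone-bank-b", TextEmbedding: new float[] { 0.9f, 0.1f }, FeatureVector: new float[] { 0.9f, 0.1f }),
    (Id: "picnic", TextEmbedding: new float[] { 0, 1 }, FeatureVector: new float[] { 1, 0 }),
};

var hits = SearchThenRank(new float[] { 1, 0 }, new float[] { 1, 0 }, searchable, minQuerySimilarity: 0.8);
Console.WriteLine(string.Join(", ", hits)); // prints "phone-bank-b, phone-bank-a"
```

The picnic fits this volunteer best but is dropped because it does not match the query, which is the ordering of concerns I want.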
Q7: I think I need Hybrid Search because, along with the vectors, distance from the user and datetime (how soon is it) matter. Those are straightforward SQL where clauses, whereas putting them in the vectors would require regenerating the event vectors every day, with a separate set for every user. If that’s right, is there an example of how to do this in C# calling Azure?
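What I mean in Q7 is "filter first with hard rules, then rank the survivors by vector similarity". Below is an in-memory sketch of that; LINQ Where clauses stand in for the SQL where clauses, and DistanceKm, FilterThenRank, the coordinates, and the limits are all made up for illustration. Whether this is what Azure means by Hybrid Search is exactly what I am unsure about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Great-circle distance in kilometres between two lat/lon points (haversine formula).
static double DistanceKm(double lat1, double lon1, double lat2, double lon2)
{
    const double EarthRadiusKm = 6371.0;
    double dLat = (lat2 - lat1) * Math.PI / 180;
    double dLon = (lon2 - lon1) * Math.PI / 180;
    double h = Math.Pow(Math.Sin(dLat / 2), 2)
             + Math.Cos(lat1 * Math.PI / 180) * Math.Cos(lat2 * Math.PI / 180)
             * Math.Pow(Math.Sin(dLon / 2), 2);
    return 2 * EarthRadiusKm * Math.Asin(Math.Sqrt(h));
}

// Dot product, used here as a simple similarity score.
static double Dot(float[] a, float[] b) => a.Zip(b, (x, y) => (double)x * y).Sum();

// Hard filters (future only, within maxDaysAhead, within maxKm of the user),
// then rank whatever is left by similarity to the volunteer's taste vector.
static List<string> FilterThenRank(
    IEnumerable<(string Id, DateTime Start, double Lat, double Lon, float[] Vector)> events,
    float[] taste, double userLat, double userLon, DateTime now, double maxKm, int maxDaysAhead)
{
    return events
        .Where(e => e.Start > now && e.Start <= now.AddDays(maxDaysAhead))
        .Where(e => DistanceKm(userLat, userLon, e.Lat, e.Lon) <= maxKm)
        .OrderByDescending(e => Dot(taste, e.Vector))
        .Select(e => e.Id)
        .ToList();
}

// Made-up data: one past event, one far away, and two nearby upcoming events.
var today = new DateTime(2024, 9, 1);
var candidates = new[]
{
    (Id: "past-rally", Start: today.AddDays(-2), Lat: 39.74, Lon: -104.99, Vector: new float[] { 1, 0 }),
    (Id: "far-away", Start: today.AddDays(3), Lat: 40.71, Lon: -74.01, Vector: new float[] { 1, 0 }),
    (Id: "nearby-weak", Start: today.AddDays(3), Lat: 39.75, Lon: -105.00, Vector: new float[] { 0.2f, 0.8f }),
    (Id: "nearby-strong", Start: today.AddDays(5), Lat: 39.73, Lon: -104.98, Vector: new float[] { 0.9f, 0.1f }),
};

var ranked = FilterThenRank(candidates, new float[] { 1, 0 }, 39.74, -104.99, today, maxKm: 50, maxDaysAhead: 30);
Console.WriteLine(string.Join(", ", ranked)); // prints "nearby-strong, nearby-weak"
```

Because the date and distance checks run at query time, the stored event vectors never need to change as days pass or as different users search.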
And the giant question - is this the right approach? Am I missing anything?
thanks - dave