My goal is to figure out which of the top 20 trending news items on Google Trends relate to which of my 300 podcast episodes (if any).
My approach:
For each episode, take the text summary (roughly 100 words) and send it to the embeddings endpoint, which returns an embedding vector like this:
0.047140226,0.021655217,0.049956247,0.01724814,0.005744687,-0.023809474,0.011503453,0.0070470977,-0.018318228,0.04925224,0.003560509,-0.05153322,-0.030582009,0.020782249,0.03953696,0.015051642,0.03886112,-0.023429312,0.0030325048,0.036946222,0.024006596,0.02066961,-0.03308827,0.0054666046,0.007624382,0.0009187275,0.012376421,0.035172127,0.05415212,0.00096448784,0.03061017,-0.01100361,-0.09957457,0.050660253,-0.020528808,-0.0022228982,0.009201355,-0.018388629,0.025696209,0.01976848,-0.021894578,0.025104845,0.0098631205,0.048660878,0.010 [... truncated for brevity]
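A minimal PHP sketch of that request step. It assumes the endpoint is OpenAI's `/v1/embeddings` API and the model is `text-embedding-ada-002`; the post names neither, so both are assumptions. The helper names `build_embedding_request`, `parse_embedding_response`, and `get_embedding` are hypothetical. It needs PHP 7.3+ (for `JSON_THROW_ON_ERROR`) and the cURL extension.

```php
<?php
// ASSUMPTIONS: endpoint URL and model name are illustrative, not taken from the post.
const EMBEDDINGS_URL = 'https://api.openai.com/v1/embeddings';
const EMBEDDING_MODEL = 'text-embedding-ada-002';

// Build the JSON request body for one piece of text.
function build_embedding_request(string $text, string $model = EMBEDDING_MODEL): string {
    return json_encode(['model' => $model, 'input' => $text], JSON_THROW_ON_ERROR);
}

// Extract the float vector from a response shaped like {"data":[{"embedding":[...]}]}.
function parse_embedding_response(string $json): array {
    $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!isset($decoded['data'][0]['embedding'])) {
        throw new RuntimeException('No embedding in response');
    }
    return array_map('floatval', $decoded['data'][0]['embedding']);
}

// Send the request with cURL and return the embedding vector.
function get_embedding(string $text, string $apiKey): array {
    $ch = curl_init(EMBEDDINGS_URL);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        CURLOPT_POSTFIELDS => build_embedding_request($text),
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($body === false || $status !== 200) {
        throw new RuntimeException("Embeddings request failed (HTTP $status)");
    }
    return parse_embedding_response($body);
}
```

Whatever model is used, the episode summaries and the news items have to be embedded with the same one; vectors from different models are not comparable.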
Store that vector array in my MySQL database as metadata for that episode.
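One way to store the vector is as a JSON array in its own table. The schema, the table name `episode_embeddings`, and the helper functions below are hypothetical. The sketch uses PDO prepared statements and MySQL's `ON DUPLICATE KEY UPDATE`, so re-running it overwrites an episode's vector instead of duplicating it. It assumes PHP 7.3+.

```php
<?php
// HYPOTHETICAL schema (not from the post):
//   CREATE TABLE episode_embeddings (
//     episode_id INT PRIMARY KEY,
//     embedding  JSON NOT NULL   -- a TEXT column also works
//   );

// Serialize a float vector to a JSON string for storage.
function encode_vector(array $vector): string {
    return json_encode(array_values($vector), JSON_THROW_ON_ERROR);
}

// Turn the stored JSON string back into an array of floats.
function decode_vector(string $stored): array {
    return array_map('floatval', json_decode($stored, true, 512, JSON_THROW_ON_ERROR));
}

// Insert or update the embedding for one episode.
function save_episode_embedding(PDO $pdo, int $episodeId, array $vector): void {
    $json = encode_vector($vector);
    $stmt = $pdo->prepare(
        'INSERT INTO episode_embeddings (episode_id, embedding) VALUES (?, ?)
         ON DUPLICATE KEY UPDATE embedding = ?'
    );
    $stmt->execute([$episodeId, $json, $json]);
}

// Load every stored embedding, keyed by episode_id.
function load_episode_embeddings(PDO $pdo): array {
    $vectors = [];
    foreach ($pdo->query('SELECT episode_id, embedding FROM episode_embeddings') as $row) {
        $vectors[(int) $row['episode_id']] = decode_vector($row['embedding']);
    }
    return $vectors;
}
```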
Do that for each episode until all 300 of them have a vector array for their episode summary.
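The "repeat until all 300 are done" step can be made resumable by skipping episodes that already have a stored vector. In this hypothetical sketch the embedding call and the database write are passed in as callables, so it assumes no particular API or schema.

```php
<?php
// HYPOTHETICAL backfill loop: embeds only episodes without a stored vector,
// so it can be re-run safely until every episode is covered.
// $episodes: [episode_id => summary text]
// $stored:   [episode_id => vector] already saved in the database
// $embed(string): array and $save(int, array): void are supplied by the caller.
function backfill_episode_embeddings(array $episodes, array $stored, callable $embed, callable $save): int {
    $count = 0;
    foreach ($episodes as $episodeId => $summary) {
        if (isset($stored[$episodeId])) {
            continue; // already embedded on a previous run
        }
        $save($episodeId, $embed($summary));
        $count++;
    }
    return $count; // number of episodes embedded on this run
}
```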
For each news item, do the same thing, though I’m just caching those vectors in memory since they change frequently.
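A minimal sketch of the in-memory cache, using a hypothetical `get_news_vector()` helper and any `$embed` callable that turns text into a vector. A PHP `static` variable only lives for the current request (or for the lifetime of a CLI script), so this avoids re-embedding the same headline within one run; caching across requests would need something like APCu or a short-lived database table instead.

```php
<?php
// HYPOTHETICAL helper: memoizes news-item vectors for the current request/process.
// $embed is any callable mapping text to a float vector (e.g. a wrapper around the endpoint).
function get_news_vector(string $headline, callable $embed): array {
    static $cache = [];
    $key = md5($headline); // compact key for arbitrary-length text
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $embed($headline); // only call the endpoint on a cache miss
    }
    return $cache[$key];
}
```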
For each news item, loop through each episode and run the following PHP function on my web server (I got this from a different thread here on the forum):
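```php
function dot_product(array $news_item_vector_array, array $podcast_summary_vector_array): float {
    // Vectors of different lengths (e.g. from different models) can't be compared.
    if (count($news_item_vector_array) !== count($podcast_summary_vector_array)) {
        throw new InvalidArgumentException('Vectors must have the same length');
    }
    // Multiply element-wise, then sum the products.
    return array_sum(array_map(fn($x, $y) => $x * $y, $news_item_vector_array, $podcast_summary_vector_array));
}
```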
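For context on why the raw dot product works here: OpenAI's documentation states that its embeddings are normalized to length 1, in which case the dot product is exactly the cosine similarity. If the endpoint in use does not normalize its output (worth checking), dividing by the vector lengths gives the same comparison on a fixed -1 to 1 scale. A hypothetical `cosine_similarity()` helper:

```php
<?php
// Cosine similarity: dot product divided by the product of the vector lengths.
// For unit-length vectors this equals the plain dot product.
function cosine_similarity(array $a, array $b): float {
    $a = array_values($a);
    $b = array_values($b);
    if (count($a) !== count($b) || count($a) === 0) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length');
    }
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction, so treat it as unrelated
    }
    return $dot / sqrt($normA * $normB);
}
```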
Sort the results, and treat any news item / episode pair whose dot_product() is above some threshold as a valid “News Item → Podcast Connection”.
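The compare, sort, and threshold step might look like the following sketch. `find_connections()` and the `[id => vector]` input shape are hypothetical, and `$threshold` is left as a parameter because the post does not settle on a value. With 20 news items and 300 episodes this is 6,000 dot products per refresh. It assumes PHP 7.4+ for arrow functions.

```php
<?php
// HYPOTHETICAL sketch: score every news/episode pair, keep those above $threshold,
// and return them best-first.
// $newsVectors and $episodeVectors are assumed to be [id => float[]] maps.
function find_connections(array $newsVectors, array $episodeVectors, float $threshold): array {
    $matches = [];
    foreach ($newsVectors as $newsId => $newsVec) {
        foreach ($episodeVectors as $episodeId => $episodeVec) {
            // Plain dot product, same as dot_product() above.
            $score = array_sum(array_map(fn($x, $y) => $x * $y, $newsVec, $episodeVec));
            if ($score > $threshold) {
                $matches[] = ['news' => $newsId, 'episode' => $episodeId, 'score' => $score];
            }
        }
    }
    // Highest-scoring connections first.
    usort($matches, fn($a, $b) => $b['score'] <=> $a['score']);
    return $matches;
}
```

Sorting the full result list surfaces the strongest pairs first; the best episode for a single news item can be read from the same list by filtering on its `news` id.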
I can’t believe this, but it seems to be kind of working?
Take this podcast summary for example:
Telling a clear story about your product is a basic entrepreneurial skill. But to build enduring impact, you need to help amplify other stories — those that surround you in your community and your customers.
Marcus Samuelsson has done just this with beloved restaurants such as Hav & Mar and Red Rooster, and through his media group that celebrates the richness of the world's cuisines and the stories embedded within them.
Marcus shares how embracing a diversity of stories has let him create spaces where every individual's narrative is valued, and has opened up new avenues of inspiration for him as an entrepreneur and award-winning chef.
Now, take this example news item, which I regard as a good candidate for a “News Item → Podcast Connection”:
The Red Rooster wins prestigious michelin star award
Well, that gets a dot_product of 0.339879951911
Now, take this other news item, which I regard as completely unrelated to the podcast episode:
The country music singer toby keith has died
Well, that gets a much lower dot_product of 0.0421805131692
At the risk of sounding incredibly naive … is this how embeddings are supposed to be used?