Using the embeddings endpoint to search a YouTube transcript with timestamps
Steps:
- get embeddings working in TypeScript
- embed the transcript with babbage-search-document and store it to disk:
  await embeddings.createEmbeddings(transcriptData.text)
- embed the queries with babbage-search-query:
  await embedQuery(queries, 'babbage-search-query', apiKey!)
- use cosine-similarity to compare queries and docs offline:
  const results = await search(docEmbeddings!.embeddings, queryEmbedding!.embeddings, 3)
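The offline comparison step can be sketched roughly as below. This is not the repo's actual `search` function: `TranscriptChunk`, `EmbeddedChunk`, `cosineSimilarity`, and `topMatches` are hypothetical names, and the shape of the stored embeddings is an assumption based on the output shown in this post. The idea is just to score every stored transcript chunk against the query vector and keep the best few.

```typescript
// Hypothetical shapes: the post does not show the real types.
interface TranscriptChunk {
  text: string;
  start: number;    // seconds into the video
  duration: number; // seconds
}

interface EmbeddedChunk {
  original: TranscriptChunk;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity instead of dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}

// Return the topK chunks most similar to the query embedding, best first.
function topMatches(
  docs: EmbeddedChunk[],
  query: number[],
  topK: number
): { chunk: TranscriptChunk; score: number }[] {
  return docs
    .map((doc) => ({
      chunk: doc.original,
      score: cosineSimilarity(doc.embedding, query),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Since the document embeddings are already on disk, this part needs no API calls: only the query has to be embedded at search time.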
result:
qry: algorand
{
  original: {
    text: 'present this event for Algorand tonight.\n' +
      'Thank you very much for being here. My',
    start: 15.69,
    duration: 5.13
  },
  text: 'present this event for Algorand tonight.\n' +
    'Thank you very much for being here. My'
}
qry: italy
{
  original: {
    text: 'Everybody buonasera, good evening,\nit is a pleasure, honor to host and',
    start: 6.54,
    duration: 9.15
  },
  text: 'Everybody buonasera, good evening,\nit is a pleasure, honor to host and'
}
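Because each match keeps its `start` value in seconds, it can be turned into a link that jumps straight to that moment in the video. The sketch below is an illustration only: `timestampUrl` and `formatTimestamp` are hypothetical helpers that are not in the post, and the video ID is a placeholder. It relies on YouTube's `t` URL parameter, which takes whole seconds.

```typescript
// Hypothetical helper: build a link that opens the video at a chunk's start time.
function timestampUrl(videoId: string, startSeconds: number): string {
  // The t parameter takes whole seconds, so round down and clamp at zero.
  const t = Math.max(0, Math.floor(startSeconds));
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}&t=${t}s`;
}

// Hypothetical helper: format seconds as m:ss for display next to a result.
function formatTimestamp(startSeconds: number): string {
  const total = Math.max(0, Math.floor(startSeconds));
  const minutes = Math.floor(total / 60);
  const seconds = total % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}
```

For the "algorand" match above, `formatTimestamp(15.69)` gives `0:15`, and `timestampUrl("VIDEO_ID", 15.69)` gives a link ending in `&t=15s`.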
source repo (I'm planning to expand this repo a lot, so star it to follow along!)