Searching a YouTube transcript with embeddings

Using the embeddings endpoint to search a YouTube transcript with timestamps.

Steps:

  1. Get embeddings working in TypeScript :)
  2. Embed the transcript with babbage-search-document and store it to disk:
     await embeddings.createEmbeddings(transcriptData.text)
  3. Embed the queries with babbage-search-query:
     await embedQuery(queries, 'babbage-search-query', apiKey!)
  4. Use cosine similarity to compare queries and docs offline :)
     const results = await search(docEmbeddings!.embeddings, queryEmbedding!.embeddings, 3)
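For anyone wondering what the offline comparison in the last step might look like, here is a minimal sketch. It is not the code from the repo: the `EmbeddedChunk` shape and the `cosineSimilarity` and `topMatches` names are hypothetical, and it assumes each transcript chunk is stored alongside its embedding vector.

```typescript
// Hypothetical shapes: the post does not show the types used by `search`.
interface TranscriptChunk {
  text: string;
  start: number;    // seconds from the start of the video
  duration: number; // seconds
}

interface EmbeddedChunk {
  original: TranscriptChunk;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against one query embedding and keep the k best.
function topMatches(docs: EmbeddedChunk[], query: number[], k: number) {
  return docs
    .map((doc) => ({
      original: doc.original,
      score: cosineSimilarity(doc.embedding, query),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Since both sets of embeddings are already on disk, this runs with no further API calls. A linear scan like this should be fine for a single transcript.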

Result:

qry: algorand
{
  original: {
    text: 'present this event for Algorand tonight.\n' +
      'Thank you very much for being here. My',
    start: 15.69,
    duration: 5.13
  },
  text: 'present this event for Algorand tonight.\n' +
    'Thank you very much for being here. My'
}
qry: italy
{
  original: {
    text: 'Everybody buonasera, good evening,\nit is a pleasure, honor to host and',
    start: 6.54,
    duration: 9.15
  },
  text: 'Everybody buonasera, good evening,\nit is a pleasure, honor to host and'
}
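Because each hit carries its `start` time in seconds, it can be turned into a readable timestamp or a link that jumps to that point in the video. A small sketch, assuming the standard YouTube `t=<seconds>s` URL parameter; the helper names and the video ID placeholder are hypothetical and not from the repo.

```typescript
// Zero-pad a number to two digits, e.g. 5 -> "05".
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Format seconds as m:ss, or h:mm:ss once the video passes one hour.
function formatTimestamp(seconds: number): string {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return h > 0 ? `${h}:${pad2(m)}:${pad2(s)}` : `${m}:${pad2(s)}`;
}

// Build a watch URL that starts playback at the given offset.
// `videoId` is a placeholder; the post does not say which video was used.
function youtubeLinkAt(videoId: string, startSeconds: number): string {
  const t = Math.floor(startSeconds);
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}&t=${t}s`;
}

console.log(formatTimestamp(15.69)); // "0:15"
console.log(youtubeLinkAt("VIDEO_ID", 15.69));
```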

Source repo (I'm planning on expanding this repo greatly, star it to follow along!)

Very interested in this!