I would like to do semantic search on audio transcripts. I’ve built a few proofs of concept using embeddings and have some questions.
Embedding the whole transcript seems wasteful (and it often doesn’t fit in the model’s context window anyway). If I create embeddings with simple heuristics like sliding windows, it’s hard to optimize for relevance given how freeform transcripts can be. Is there a method for identifying the optimal text to create an embedding for?
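For reference, the sliding-window heuristic in my proof of concept looks roughly like this (window and stride are in words here for simplicity; the real version would count tokens, and each chunk would then be passed to whatever embedding model is in use):

```python
def sliding_windows(text: str, window: int = 200, stride: int = 100) -> list[str]:
    """Split text into overlapping word-based windows.

    A crude stand-in for token-based chunking: overlap (window > stride)
    reduces the chance a relevant passage is cut in half at a boundary.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunk = " ".join(words[start:start + window])
        if chunk:
            chunks.append(chunk)
        if start + window >= len(words):
            break
    return chunks
```

The overlap helps a little, but picking `window` and `stride` is pure guesswork, which is what prompts the question above.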
The transcript quality is a bit low, but there is separate metadata I can add to it, such as who is speaking, a summary of the conversation to that point, or topic labels. Are there any best practices for cleaning or enriching text before creating an embedding from it?