LangChain would be a good way to go about this. It is a particularly good fit if the files keep changing and you have to recreate the embeddings frequently.
However, if the files are static, you could use a vector database like Pinecone to store the embeddings long term.
Once you have the embeddings, cosine similarity is the way to go for comparing them.
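
To make the cosine similarity step concrete, here is a minimal sketch in plain Python. The vectors are made-up 3-dimensional stand-ins for real embeddings, and the names `cosine_similarity`, `rank_by_similarity`, `docs` and `query` are just illustrative, not part of LangChain or Pinecone.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, docs):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical data).
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

ranking = rank_by_similarity(query, docs)
print(ranking)
```

With real embeddings you would swap the toy vectors for whatever your embedding model returns. A vector database would normally do this ranking for you, so a loop like this is mostly useful for small collections or for sanity checks.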