Improving Semantic Search Engine Accuracy Using OpenAI Embeddings and Llama VectorStoreIndex

You need to reduce the noise.

Instead of grouping everything together just simply embed the reviews and then link each embedding with the product name.

Separate the concerns.

You can combine embeddings many ways. So it’s better to create groupings of single-concern embeddings. Product names are meaningless for this, but you could do some fun things like see how the embedding engine “feels” about the product names.

I’d recommend Weaviate. They offer a database that accepts your schema and also can embed items individually, and in groups

3 Likes