Embedding data with prompting

How do prompts influence embeddings when processing text?

For instance, if we take a piece of text and prepend a prompt like “highlight key symptoms in the following data,” will the embeddings place more emphasis on the symptoms?

Can prompts be used to “focus on and encode” only specific aspects of the input data?

In Retrieval-Augmented Generation (RAG), embeddings are typically created without any prompt. However, some of the top-performing models on the MTEB leaderboard on Hugging Face do include prompts (task instructions) at embedding time.
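For concreteness, here is roughly what I mean. A minimal sketch, assuming sentence-transformers and intfloat/e5-large-v2, whose model card recommends prefixing every input with "query: " or "passage: " (other instruction-tuned models use different templates):

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: intfloat/e5-large-v2, which expects "query: " / "passage: "
# prefixes on every input at embedding time.
model = SentenceTransformer("intfloat/e5-large-v2")

passages = [
    "passage: Patient reports fever, dry cough, and fatigue for three days.",
    "passage: The clinic updated its billing system last month.",
]
query = "query: what symptoms does the patient have?"

doc_emb = model.encode(passages, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

# Cosine similarity between the prefixed query and each prefixed passage
print(util.cos_sim(query_emb, doc_emb))
```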

Would incorporating prompts during the embedding process for a knowledge base enhance its effectiveness?

If you’re aware of any related papers, experiences, or suggestions, I’d greatly appreciate your input.

Thank you!


It’s theoretically possible if the model actually takes the prompt into account. Most embedding models don’t accept prompts, but for those that do, I’d expect the prompt to improve results.

Intuitively, a prompt focuses the model’s “next word prediction,” so it can likewise focus the embedding on the aspects you care about instead of relying on undirected semantics.
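One quick way to sanity-check this yourself is to embed the same passage with and without an instruction prefix and compare its similarity to a symptom-focused query. A rough sketch, assuming a generic sentence-transformers model (all-MiniLM-L6-v2) that was not trained with instructions, so any shift here comes only from the prepended tokens rather than true instruction-following:

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: a generic embedding model with no instruction training.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

text = "Patient reports fever, dry cough, and fatigue; invoice sent to insurer."
instruction = "highlight key symptoms in the following data: "

plain = model.encode(text, normalize_embeddings=True)
prompted = model.encode(instruction + text, normalize_embeddings=True)

query = model.encode("fever and cough symptoms", normalize_embeddings=True)

print("plain    vs query:", util.cos_sim(plain, query).item())
print("prompted vs query:", util.cos_sim(prompted, query).item())
```

With a model that was actually trained to follow instructions, you’d expect the shift to be larger and more consistent than with this generic one.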
