Effect of Prompting on Embeddings and Retention of Data

Hello,

When embedding a piece of text, what effect do prompts have?
1. If we have a piece of medical text and we prepend a prompt such as “pay close attention to the diagnosis in the following data”, will the diagnosis data become more prominent in the resulting vectors?

2. Can we prompt the model to “remember and encode” only selected things in the input data?

In RAG, we usually embed data without any prompt, but some top-performing models on the MTEB leaderboard on Hugging Face add prompts before embedding. Would adding a prompt when embedding the knowledge base data help?

Any paper on this or any experience/suggestion would be fantastic.

Thank you in advance!


Sorry for tagging you; I just wanted to see if you have any ideas.
@anon10827405 @Diet

Yes, but it depends on the model. I wouldn’t try it with OpenAI’s text embedding models, and you can absolutely forget about ada.

A model that has been trained to follow the context - query pattern might indeed be well suited to what you’re trying to do.
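To make that concrete, here is a rough sketch of how a prompt usually gets attached to the text before it goes to the embedding model. The `build_input` helper and both prompt strings are made up for illustration. Whether a prompt actually steers the vector depends on whether the model was trained with prompts like these, so check the model card for the exact format it expects.

```python
def build_input(prompt: str, text: str) -> str:
    """Prepend an instruction prompt to the text that will be embedded.

    Hypothetical helper: real models document their own prompt format.
    """
    prompt = prompt.strip()
    text = text.strip()
    # With no prompt, embed the raw text (the usual RAG default).
    return f"{prompt}\n{text}" if prompt else text


# Illustrative prompts only; they are not taken from any specific model.
QUERY_PROMPT = "Represent this question for retrieving medical records:"
PASSAGE_PROMPT = "Represent this medical record, focusing on the diagnosis:"

query_input = build_input(QUERY_PROMPT, "What was the patient diagnosed with?")
passage_input = build_input(
    PASSAGE_PROMPT,
    "Patient reports fatigue and thirst. Diagnosis: type 2 diabetes.",
)

# These strings are what you would hand to the embedding model.
print(query_input)
print(passage_input)
```

The point is that the same prompt format has to be used consistently: passages embedded with one prompt should be searched with the query prompt the model pairs with it.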

That said, at this time I wouldn’t mix embeddings made with different prompts in the same corpus. You’d likely be better off keeping the embeddings for each prompt in its own index.
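A minimal sketch of the "separate indices" idea, assuming one in-memory index per prompt. Everything here is hypothetical: `toy_embed` is a bag-of-words stand-in so the example runs without a model, and `PromptedIndexes` is just an illustrative name. With a real model you would swap `toy_embed` for its encode call.

```python
import math
from collections import Counter


def toy_embed(text: str) -> dict:
    """Stand-in embedder: L2-normalised bag of words. NOT a real model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class PromptedIndexes:
    """Keeps one index per prompt so differently prompted vectors never mix."""

    def __init__(self):
        self._indexes = {}  # prompt -> list of (doc_id, vector)

    def add(self, prompt: str, doc_id: str, text: str) -> None:
        vector = toy_embed(f"{prompt} {text}")
        self._indexes.setdefault(prompt, []).append((doc_id, vector))

    def search(self, prompt: str, query: str, top_k: int = 3) -> list:
        # Only the index built with the same prompt is searched.
        entries = self._indexes.get(prompt, [])
        q = toy_embed(f"{prompt} {query}")
        scored = sorted(
            ((cosine(q, vec), doc_id) for doc_id, vec in entries),
            reverse=True,
        )
        return [doc_id for _, doc_id in scored[:top_k]]


indexes = PromptedIndexes()
indexes.add("diagnosis:", "d1", "type 2 diabetes mellitus")
indexes.add("diagnosis:", "d2", "fractured left wrist")
indexes.add("medication:", "m1", "metformin 500 mg daily")

print(indexes.search("diagnosis:", "diabetes", top_k=1))
```

Routing a query to the index that matches its prompt keeps the comparison like-for-like, at the cost of embedding the corpus once per prompt.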
