Hi, I have a bunch of short texts from many people about their experiences in different fields. I want to turn them into embeddings using text-embedding-3 and compare their similarities.
Since some of them are short, I want to contextualize them by adding who answered about what (this information is stored in separate columns), but I do not want this background text to be the basis of the embeddings.
Is there a way to provide some kind of “system prompt” that clarifies the context of the text, without the system prompt itself appearing in or directly affecting the embeddings the model generates?
If you have meta information that should not be part of the embedding, i.e. it should not be included in the vector when it is created, then you can (with most vector databases) store that data in metadata fields alongside the vector.
That way, you will be searching on the exact semantics of the important content, and you get back the metadata as an extra field.
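To illustrate the metadata-field idea, here is a minimal sketch in Python. It is not any particular vector database's API: `InMemoryStore`, `Record`, and `toy_embed` are hypothetical names, and `toy_embed` is a letter-count stand-in for a real embedding call such as text-embedding-3-large. The point is only that the vector is computed from the text alone, while the metadata is stored next to it and comes back with each search hit.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    vector: list
    metadata: dict = field(default_factory=dict)


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: counts letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Toy vector store: embeds only the text, keeps metadata alongside."""

    def __init__(self, embed):
        self.embed = embed
        self.records = []

    def add(self, text, metadata):
        # Only `text` is passed to the embedding function;
        # `metadata` is stored untouched and never affects the vector.
        self.records.append(Record(text, self.embed(text), dict(metadata)))

    def search(self, query, top_k=3, where=None):
        # Optional exact-match filter on metadata, applied before ranking.
        where = where or {}
        candidates = [
            r for r in self.records
            if all(r.metadata.get(k) == v for k, v in where.items())
        ]
        q = self.embed(query)
        scored = [(cosine(q, r.vector), r) for r in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemoryStore(toy_embed)
store.add("python is dangerous", {"respondent": "A", "topic": "snakes"})
store.add("python is dangerous", {"respondent": "B", "topic": "computer science"})

# Each hit is (similarity, record); the metadata comes back as an extra field.
for score, record in store.search("dangerous python", where={"topic": "snakes"}):
    print(round(score, 3), record.text, record.metadata)
```

Because the metadata never reaches the embedding function in this sketch, the two records above get identical vectors even though their topics differ. The metadata can be used to filter or label results, but it does not change the vector itself.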
Do you mean there is a way to get an embedding for a string of text with meta information to contextualize the string?
For example, say I want to use text-embedding-3-large to get a 3072-dimensional embedding for the string “python is dangerous”, and I would like to make sure it’s about snakes and not about computer science.
Is it possible to somehow say in the “meta” field that it’s about snakes, without the meta strings themselves being converted to embeddings?