Contextualized embeddings without contextualizing text

shinoza · November 18, 2024, 7:39am

Hi, I have a bunch of short texts from many people about their experiences in different fields and I want to turn them into embeddings using text-embedding-3 and compare the similarities.

Since some of them are short, I want to contextualize them by adding who answered and about what (this is stored in separate columns), but I do not want this background text to be the basis of the embeddings.

Is there a way to provide some kind of “system prompt” that clarifies the context of the text, without the prompt itself appearing in or directly affecting the embeddings the model generates?

I am thinking of something like this:

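from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_embedding(text, context_text):
    # The context is prepended to the text, so it becomes part of the embedded input
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=f"{context_text} '{text}'"
    )
    return response.data[0].embedding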

Thanks!

Foxalabs · November 18, 2024, 7:58am

If you have meta information that should not be part of the embedding, i.e. it should not be included in the vector when it is created, then you can (with most vector databases) store that data in the metadata fields.

That way, you will be searching on the exact semantics of the important content, and you get back the metadata as an extra field.
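As a rough illustration of the idea above, here is a minimal in-memory sketch rather than any particular vector database's API. The names `build_records`, `search`, and `embed_fn` are hypothetical; `embed_fn` stands in for whatever function returns a vector for a string (for example, a wrapper around `client.embeddings.create`). Only the `text` column is embedded, and the other columns ride along as metadata that can be used to filter or annotate results. Note that the metadata does not change the vector itself.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_records(
    rows: List[Dict[str, Any]],
    embed_fn: Callable[[str], Vector],
) -> List[Dict[str, Any]]:
    """Embed only row["text"]; keep every other column as metadata."""
    records = []
    for row in rows:
        metadata = {k: v for k, v in row.items() if k != "text"}
        records.append(
            {
                "text": row["text"],
                "vector": embed_fn(row["text"]),  # metadata is NOT passed to the embedder
                "metadata": metadata,
            }
        )
    return records


def search(
    query_vector: Vector,
    records: List[Dict[str, Any]],
    top_k: int = 3,
    where: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Rank records by cosine similarity, optionally restricted by exact metadata matches."""
    where = where or {}
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vector, r["vector"]),
        reverse=True,
    )
    return ranked[:top_k]
```

With this layout, a call such as `search(query_vector, records, where={"field": "zoology"})` would compare vectors only among rows whose (hypothetical) `field` column matches, and each hit carries its metadata back as an extra field.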

shinoza · November 19, 2024, 3:20am

Thank you for your reply!

Do you mean there is a way to get an embedding for a string of text, with meta information used to contextualize the string?

For example, suppose I want to use text-embedding-3-large to get a 3072-dimension embedding for the string “python is dangerous”, and I would like to make sure it is understood to be about snakes and not about computer science.

Is it possible to somehow say in the “meta” field that it’s about snakes, without the meta text itself being converted into embeddings?

Thanks
