When using embedding models, why ask normal models instead of embedding ones?

oscarcalderon · April 17, 2024, 12:40am

Hi, I’m trying to use an embedding model to work in an isolated fashion, as I want to provide sensitive data that I don’t want to get stored anywhere, so my idea is:

Generate an embedding based on a prompt containing business rules
Store the embedding in my database
Ask the embedding model a question by providing the previously generated embedding + my question (about the data contained in the embedding)

I was looking into platform.openai.com/docs/guides/embeddings and its use cases, and I built my code based on the one about asking questions:

examples/question_answering_using_embeddings

There I can see how to generate the embedding, but at the end they just ask a normal GPT model (like 3.5 or 4.0) a question, and I cannot see at which point the embedding is used or injected into the GPT model.

def ask(
    query: str,
    df: pd.DataFrame = df,
    model: str = GPT_MODEL,
    token_budget: int = 4096 - 500,
    print_message: bool = False,
) -> str:
    """Answers a query using GPT and a dataframe of relevant texts and embeddings."""
    message = query_message(query, df, model=model, token_budget=token_budget)
    if print_message:
        print(message)
    messages = [
        {"role": "system", "content": "You answer questions about the 2022 Winter Olympics."},
        {"role": "user", "content": message},
    ]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0
    )
    response_message = response.choices[0].message.content
    return response_message

I thought it would be about asking the question to the embedding model, not a normal GPT model. Is there a way I can achieve the result I want (asking an embedding model, that doesn’t remember data, based on an embedding previously generated + sensitive data and questions about that)?

dignity_for_all · April 17, 2024, 2:51am

It seems you are using client.chat.completions.create

To use the embedding model, you need to call the embeddings endpoint as follows:

client.embeddings.create(
  model="text-embedding-ada-002",
  input=query,
  encoding_format="float"
)

https://platform.openai.com/docs/api-reference/embeddings/create

_j · April 17, 2024, 4:26am

To get embeddings vectors, you have to send the full text to OpenAI.

They don’t train models on the text you send, but they do retain it.

And that is the result of an embeddings model, a vector - a list of numbers.

This vector and the numbers inside it contain some meaning, but it is only useful for comparing to other vectors returned by the same embeddings AI model, to then find how similar the language inputs were in meaning and topic, using the AI’s natural language understanding backend.
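A rough sketch of what "comparing vectors" means, in plain Python. The three vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions), and cosine_similarity is just a helper name chosen here, not something from the OpenAI library:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional stand-ins for embedding vectors (assumption for illustration).
password_reset = [0.9, 0.1, 0.0]
forgot_login = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.1, 0.9]

# Two texts about a similar topic should score higher than two unrelated ones.
print(cosine_similarity(password_reset, forgot_login))
print(cosine_similarity(password_reset, pizza_recipe))
```

The score only says how close two inputs are in meaning. The vector cannot be "asked" anything and cannot be turned back into an answer.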

Example Use:

I have a database of 100 tech support questions and answers.

I send the text of each Q&A entity to an embeddings model, obtain a vector result, and store those results in a vector database. That will be my knowledge base.

Then, I have another new question. I can run embeddings on that new input (which also requires sending it to OpenAI). The resulting vector can be compared against all others in the database (using a dot product algorithm).

The knowledge base search thus can return the highest quality matches of text back to me – or into an AI model that is supposed to answer questions.
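The flow above could be sketched roughly like this. Everything here is an assumption for illustration: the vectors are made up (in practice they would come from the embeddings endpoint), and top_matches and build_prompt are hypothetical helper names. The point is that the matching text, not the vector, is what gets pasted into the prompt for the chat model, which appears to be the role of query_message in the code quoted earlier in the thread:

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_matches(query_vector, knowledge_base, k=2):
    """Return the k stored texts whose vectors score highest against the query."""
    ranked = sorted(
        knowledge_base,
        key=lambda item: dot(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]


def build_prompt(question, matches):
    """Paste the retrieved texts into the message sent to the chat model."""
    context = "\n\n".join(matches)
    return (
        "Use the support articles below to answer the question.\n\n"
        f"Articles:\n{context}\n\nQuestion: {question}"
    )


# Toy knowledge base: each entry keeps the original text next to its vector.
knowledge_base = [
    {"text": "Q: How do I reset my password? A: Use the 'Forgot password' link.",
     "vector": [0.9, 0.1, 0.0]},
    {"text": "Q: How do I change my email? A: Go to Account settings.",
     "vector": [0.6, 0.4, 0.1]},
    {"text": "Q: Which file formats can I upload? A: PDF and PNG.",
     "vector": [0.0, 0.2, 0.9]},
]

question = "I forgot my login, what now?"
query_vector = [0.8, 0.2, 0.1]  # made-up stand-in for the question's embedding

matches = top_matches(query_vector, knowledge_base, k=2)
print(build_prompt(question, matches))
```

The printed prompt is what would go into the "user" message of a chat completion call, so both the question and the retrieved text still leave your machine.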

wclayf · April 17, 2024, 4:46am

I’m pretty sure there’s no way to do “secret” work unless you just spend a few thousand to buy a top of the line NVIDIA GPU (or similar) and then just do it all locally. All the Cloud services are going to monitor your queries, and so will all the TLA government agencies.

dignity_for_all · April 17, 2024, 4:51am

When it comes to cloud services, whether data is retained or not makes a big difference.

johncain194 · April 17, 2024, 4:54am

Have you ever calculated how much it would cost to invest in top of the line NVIDIA GPUs (at least the minimum setup) and do all the jobs locally?

wclayf · April 17, 2024, 6:44am

I’m not experienced enough to know the answer to that, but I do know things are painfully slow if you don’t have a good GPU. I mean obviously it depends on the number of parameters in the model, the model architecture, etc., but the main place I’d go to learn all that is Huggingface.
