How can I use embeddings with GPT-3.5 Turbo?

Kindly refer to the above. The strategy for making gpt-3.5 respond from a knowledge base that we provide seems to involve passing a “context”. We get this context by first passing the user’s question to the Embedding API, which returns a vector; we then compare that vector against the pre-embedded chunks of our knowledge base and take back the closest matching text. Say the result is 3 lines of text, which represent the most relevant context for the question asked.

Now, we pass this context, along with the user’s question, to gpt-3.5 so that it generates a human-like response from within that context.
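In code, that flow might look roughly like the following minimal sketch (assuming the pre-1.0 openai Python library used in the later posts here, and a hypothetical knowledge_base list of pre-embedded text chunks):

import numpy as np
import openai

# knowledge_base: list of {"text": str, "embedding": list[float]} built ahead of time
# by running each chunk of your documents through the Embedding API.

def answer_with_context(question, knowledge_base):
    # 1. Embed the user's question (text in, vector out).
    q_emb = openai.Embedding.create(
        input=question, engine="text-embedding-ada-002"
    )["data"][0]["embedding"]

    # 2. Find the stored chunk whose embedding is closest (cosine similarity).
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    best = max(knowledge_base, key=lambda item: cosine(q_emb, item["embedding"]))

    # 3. Pass that chunk as context, plus the question, to gpt-3.5-turbo.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": "Answer using only this context:\n" + best["text"]},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]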

The problem is that if the user’s question is “Is it available in blue?” and we pass it to the Embedding API, the search may return no context or several unrelated contexts, because the pronoun carries no meaning on its own.

Hope I was able to explain myself clearly.


Yes, so as I understand you, you do not pass any vectors to the model. You only send text to the model, which is what I expected.

Please correct me if I am wrong.

🙂

Yes, correct. To clarify, I just learnt about this approach while browsing this thread today and realized that there might be a conceptual problem. I have not tried this out yet.

I had the same / similar question. I think this largely will depend on the specific implementation / desired outcome, but there is a question of when to add context from embeddings when there is a conversational aspect. For instance, with davinci (single shot), it’s more obvious: you simply provide the context with any question and get your answer. When we have a conversational paradigm, it becomes a little trickier. We can use the previously mentioned example:

User: “How much is this shoe?”
GPT: “The shoe is $5”
User: “Is it in blue?”

What would we send to the turbo model for this kind of conversation? I see a few options:

a) Perform an embeddings search for each user query separately and add it with context to the messages array, eg:

user: Please answer the following question with context: {Context 1}.
user: How much is this shoe?
assistant: The shoe is $5.
user: Please answer the following question with context: {Context 2}.
user: Is it in blue?

This approach is problematic because “is it in blue” is unlikely to return accurate semantic matches for the query alone.

b) Perform an embeddings search for all user queries combined (eg “How much is this shoe? Is it in blue?”) and include that context, eg:

user: Please answer the question using the following context: {Total Context}
user: How much is this shoe?
assistant: The shoe is $5.
user: Is it in blue?

c) Pre-process the user queries by prompting GPT Completion to summarize, eg:

Please summarize the following user queries and determine the ultimate question: "How much is this shoe? Is it in blue?"

And get a response such as: “How much is this shoe and is it in blue?” This can then be searched to determine the embeddings context, and we can use the rest of the approach in b) to construct the final completions prompt.
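A rough sketch of that pre-processing call (approach c), assuming the pre-1.0 openai Python library, might be:

import openai

def condense_queries(user_queries):
    """Ask gpt-3.5-turbo to fold several user queries into one standalone
    question that can be used for the embeddings search."""
    joined = " ".join(user_queries)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Please summarize the following user queries and "
                       'determine the ultimate question: "' + joined + '"',
        }],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]

# condense_queries(["How much is this shoe?", "Is it in blue?"])
# -> something like "How much is this shoe and is it in blue?"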

So far I’ve had some success with the “B” approach. However, more complications arise with this approach if the user decides to switch contexts and talk about something else, e.g. “What about this shirt?” In that case, the embedding query “How much is this shoe? Is it in blue? What about this shirt?” may have more difficulty finding relevant documents for context. Using distance heuristics may be appropriate to see if the embeddings result is “close” enough.
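One way to apply that distance heuristic is to keep only retrieved chunks whose cosine distance falls under a cut-off and to ask the user for clarification when nothing qualifies. A minimal sketch, where the 0.25 threshold is an arbitrary placeholder to tune on your own data:

from openai.embeddings_utils import distances_from_embeddings

MAX_DISTANCE = 0.25  # arbitrary placeholder; tune against your own data

def close_enough_chunks(query_embedding, chunk_embeddings, chunk_texts):
    """Keep only the chunks that are 'close' enough to the combined query."""
    distances = distances_from_embeddings(
        query_embedding, chunk_embeddings, distance_metric="cosine"
    )
    kept = [text for text, d in zip(chunk_texts, distances) if d <= MAX_DISTANCE]
    return kept  # an empty list means: ask the user to be more specific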


As per this documentation from Azure, they suggest passing the context as part of the system message:

<|im_start|>system
Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say "I don't know".

Context:
- Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series.
- Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
- At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft’s principles for responsible AI use
<|im_end|>
<|im_start|>user
What is Azure OpenAI Service?
<|im_end|>
<|im_start|>assistant

I have tried passing the context related to the latest question as the system message, and I have received a satisfactory answer.
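Expressed with the chat completions API roles (rather than the raw ChatML tokens above), that layout might look roughly like this sketch; the context string stands in for whatever your embeddings search returned:

import openai

question = "What is Azure OpenAI Service?"
context = "- Azure OpenAI Service provides REST API access to OpenAI's language models ..."  # from your embeddings search

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Assistant is an intelligent chatbot. Only answer questions "
                       "using the context below, and if you're not sure of an answer, "
                       'say "I don\'t know".\n\nContext:\n' + context,
        },
        {"role": "user", "content": question},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])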

Great info. Sharing some techniques I have tried and looking for some feedback from anyone on here.

I have been playing around with a version of B in your msg. Each user prompt gets appended to the previous one in order to query for vectors and add context.

My use case is a POC for a fixed assets management app for accountants. The main question everyone has is how we can strengthen the conversational aspect.

Situation:
user: how much did the MacBook Pros cost?
assistant: $12,000.00
user: how about the Mercedes Sprinter Vans?

Solution (?):

  1. The text to query my pinecone vectors combines all user messages in the convo, with the most recent coming first.

So using the prior example, I query for embeddings using:

“how about the Mercedes Sprinter Vans? how much did the MacBook Pros cost?”.

This at least includes the keyword “cost” with “Mercedes Sprinter Vans”, which will help me find some info regarding the cost of Mercedes Sprinter Vans (a small sketch of this combining step follows the list below). Point two is where I try to get the right context as much as possible.

  2. I have used a grouping methodology when creating embeddings.

For example, in our schema for a fixed asset (e.g. vehicles or computers), we have many cost related fields (purchase cost, service costs, capitalized costs, misc costs, etc.). So I programmatically create embeddings that say “Cost details for Mercedes Sprinter Van asset: Acquisition cost: $60,000, Service costs: $1,000.00, etc etc”.

This works half decently, but we have found that the key term in the second prompt (in this example, Mercedes Sprinter Van) must be very precise, and the longer the better. Say “sprinter van” instead of “Mercedes Sprinter Van” and you may not get the right context.

  3. I start every convo (on the backend) with a system message that tells ChatGPT to ask the user to be more explicit in their request if it doesn't have the right context.
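For reference, the combined-query step from point 1 can be as simple as this sketch (messages are dicts in the chat format):

def embedding_query_text(messages):
    """Combine all user messages, most recent first, into one string
    for the vector (e.g. Pinecone) query."""
    user_texts = [m["content"] for m in messages if m["role"] == "user"]
    return " ".join(reversed(user_texts))

# embedding_query_text([
#     {"role": "user", "content": "how much did the MacBook Pros cost?"},
#     {"role": "assistant", "content": "$12,000.00"},
#     {"role": "user", "content": "how about the Mercedes Sprinter Vans?"},
# ])
# -> "how about the Mercedes Sprinter Vans? how much did the MacBook Pros cost?"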

Ultimately this is not a 100% satisfactory approach, but the natural language capabilities of ChatGPT are so incredibly good that even when it doesn’t know the context, it still makes for a pretty nice UX.

Would love to hear thoughts from anyone who has made conversations + embeddings work nicely.

Not sure if this will help you, but have you considered creating a separate ChatGPT API call (let’s call it vectorContext) with a system message that has instructions to take the last message and create a query based on the context of the previous messages? You can even attempt to tell it that it is capable of noticing when the context has switched or been changed by the user.

You pass it the conversation, and it uses the conversation to create a query for the last user request, and attempts to identify when context has switched as well.

If you experiment with what I’m talking about (alter the prompt to get the results you want), I think you will find that it is capable of doing this, and probably capable of outputting the response in a way that you can parse. If it detects context has been changed by the user, you could prune the conversation you feed it to remove unneeded context. As a matter of fact, you could probably tell it to do this for you, if you think about it.

Wow, how did I not recognize the ability to do this. This was a great help, thank you. Still have to get used to the idea of being able to ask questions in plain English… lol.

Now, as each prompt comes in, I ask GPT if it has enough context to answer the question. If so, answer. If not, say “need context”. If the response is “need context”, then I can query Pinecone. Otherwise I just return the answer to the client.

And if it’s a follow-up question, such as “how about …”, I ask it to rewrite the question to make sense on its own in isolation – so it will turn “how about my mercedes benz sprinter” into “what is the cost of my mercedes benz sprinter”, therefore providing the right keywords to query Pinecone and get the correct context.
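A rough sketch of that two-step gate might look like the following; the query_pinecone helper and the exact prompt wording are hypothetical, and the pre-1.0 openai library is assumed:

import openai

def answer_or_retrieve(history, new_question):
    """First ask gpt-3.5-turbo whether it can answer from the conversation alone;
    if it replies "need context", rewrite the follow-up as a standalone question,
    retrieve context for it, and answer with that context."""
    gate = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=history + [
            {"role": "user", "content": new_question},
            {"role": "user", "content": "If you can answer the question above from "
                                        "this conversation alone, answer it. "
                                        "Otherwise reply with exactly: need context"},
        ],
        temperature=0,
    )["choices"][0]["message"]["content"].strip()

    if gate.lower() != "need context":
        return gate

    # Rewrite the follow-up so it makes sense in isolation, then retrieve.
    standalone = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=history + [
            {"role": "user", "content": 'Rewrite this follow-up as a question that '
                                        'makes sense on its own: "' + new_question + '"'},
        ],
        temperature=0,
    )["choices"][0]["message"]["content"].strip()

    context = query_pinecone(standalone)  # hypothetical retrieval helper
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": "Answer using this context:\n" + context},
            {"role": "user", "content": standalone},
        ],
        temperature=0,
    )["choices"][0]["message"]["content"].strip()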

Thanks again.


Glad to have helped. I should have mentioned GPT-4 will be better at a lot of that type of stuff, but 3.5 is plenty capable of a lot. It helps to think about how you could have a bunch of different “presets” for an API call (each designed for a different task) and you can pass the responses around wherever they’re needed.


Looking forward to it. There are instances where 3.5 does not respond as directly, e.g. it doesn’t say “need context.” when I explicitly tell it to, but it is still impressive.

Also wondering from anyone about designing/structuring your vector embeddings. I’ve done a ton of research but don’t see much about this.

For example, should I create text based on each field on every item in my database? E.g. “Cost of asset xyz: $100” … “Status of asset xyz: disposed”, and so on. This has the advantage of using less tokens each time, but provides less context.

Or should I just stringify the entire record? But then what if it’s too large an object?

Or create logical groupings like I mentioned before, e.g. group all cost related info into text embeddings.

Or all of the above?

I guess you have to work with users to understand what types of questions they will be asking, but wondering if there are any best practices to adhere to for this.
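For what it’s worth, the “logical grouping” option can be as simple as building one embeddable string per group of related fields; the field names in this sketch are made up for illustration:

def cost_chunk(asset):
    """Group all cost-related fields of an asset record into one string to embed.
    The field names here are hypothetical."""
    parts = [
        f"Acquisition cost: ${asset['purchase_cost']:,.2f}",
        f"Service costs: ${asset['service_costs']:,.2f}",
        f"Capitalized costs: ${asset['capitalized_costs']:,.2f}",
    ]
    return f"Cost details for {asset['name']} asset: " + ", ".join(parts)

# cost_chunk({"name": "Mercedes Sprinter Van", "purchase_cost": 60000,
#             "service_costs": 1000, "capitalized_costs": 0})
# -> "Cost details for Mercedes Sprinter Van asset: Acquisition cost: $60,000.00, ..."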

There won’t be an answer without a lot of assumptions on what you’re trying to build.

In most cases you would group any and all information that relates to whatever you are trying to embed (which I believe in your case is a product or asset).

So, the questions I have for you:

  • Is semantic searching better than keywords? What are the benefits?

  • How will you manage product updates? Does it make more sense to embed the products by description, and set the metadata as a pointer? This way your price and quantity can be regularly updated without extra work. Or do you not use another database? In which case, using a single un-nested object as metadata can lead to some very effective filtering abilities.

  • Is a vector database what you are looking for? Text-To-SQL is becoming more reliable, and ChatGPT plugins seems to be a great fit for a lot of cases. I mean, we’re talking about a whole separate database. Is it worth it?

I’m thinking for 3.5, put everything in user. So “user”: “Here is a big chunk of the closest embedding data”, followed on the next line by “user”: “How do I blah blah blah question?”

So try “user”/“user” back to back in the API call. Or maybe invert things and put your embedding data in “assistant”, so “assistant”/“user”. I haven’t tried these in detail for 3.5 since they are supposed to fix this and I can use GPT-4, where it is already fixed. And for GPT-4 I used “system” for embedding data. OR just use DaVinci.

Here I got 3.5 to work as “user”/“user” in the Playground:

UPDATE: OK, slight mod. Keep system empty, for both GPT-4 and GPT-3.5. Put the embedding in “user”, followed by the question in “assistant”. This keeps the AI from apologizing and saying it is not a human, blah blah blah. So “user”/“assistant” works for both 3.5 and 4.

GPT-3.5: (Playground screenshot)

GPT-4: (Playground screenshot)

Here is DaVinci: (Playground screenshot)

But agree that the “Chat” format of the bot is really strange when all you want to do is draw from embedding data. Hopefully we will get an updated completion endpoint, Einstein anyone?


100%

I was really expecting a well defined way to inject context momentarily after ChatML was released. It makes me wonder if they are planning on releasing the ChatGPT plugins for the completion models.

Einstein. Love the name. An iGPT4 model? OpenAI. Yes. Please.


OK, another update. I think my example above was a little too simple. So here is another variation where I take a passage out of a book and put it in “system”, and then have “user” ask a question, asking it to include where it got the information from. It gave good responses in both cases, citing where it got the info from (so I knew it was using the embedding).

This seems like a more canonical approach (maybe); wondering what others’ thoughts are. Here are the results using this approach in the Playground:

GPT-3.5 (on Descartes from the embedding): (Playground screenshot)

GPT-4 (on Descartes from the embedding): (Playground screenshot)

Both are giving good results with this approach (both 3.5 and 4).

Here is another shot at the same data:

GPT-3.5 (on Sartre from the embedding): (Playground screenshot)

GPT-4 (on Sartre from the embedding): (Playground screenshot)

It looks like GPT-4 is much more aware of the context surrounding where “it” is in the whole big picture, whereas 3.5 gets lost and seems to regurgitate.

Finally, and I think I am done here, one thing that is important to all “embedders” out there is for the AI to admit it doesn’t know. GPT-3.5 has a very hard time with this, whereas GPT-4 seems more willing to follow along with the system instructions. This is already known (3.5 is bad at following instructions in “system”). So this could be a consideration for you, and you may want to swap things to “user”/“assistant” for 3.5 until it gets patched.

GPT-3.5 (off-topic, but instructed not to): (Playground screenshot)

GPT-4 (stays on-topic and says “I don’t know” when challenged): (Playground screenshot)

What I found is that the best approach is to treat it like an adult and give it the necessary information to answer the question properly. So my messages array was

[
  { 'role': 'user', 'content': prompt },
  { 'role': 'user', 'content': question }
]

where prompt is

"Please answer the next question based on the context below, and if the question can't be answered based on the context, say \"I don't know\".\n\nThe question is about my web site, and the context is the sections of the web site whose embeddings have the closest cosine distance to the question's embedding. Context: #{context}:"

and question is the text of the question. In some cases this can give a briefer response than the Completion.create api call, but mostly the answers seem to be better.

My suggestion for ChatCompletions is to always model the prompt portion of the request as a “user” message. I haven’t detected any major differences between “user” and “assistant” messages, but the model is definitely biased away from following “system” messages, so I avoid them.

Distance matters so more recent instructions in the conversation history will be followed over older instructions. I’m assuming the same thing goes for context… You can put your context in any message sent to the model but more recent context “should” trump older context.

I personally only put context in the prompt message (message[0]) but that’s because I don’t want that context to age out as the conversation history ages out of the prompt. Would you want the prompt forgetting what day it is because you put that in a message that you just had to purge?
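One way to read that in code: pin the context as message[0] and age out the oldest conversation turns, not the context, when the prompt grows too long. A minimal sketch, with token counting reduced to a crude character budget:

def build_messages(context, history, new_user_message, budget_chars=8000):
    """Keep the context pinned at message[0]; drop the oldest turns
    when the total prompt grows past a rough size budget."""
    pinned = {"role": "user", "content": "Use this context when answering:\n" + context}
    turns = list(history) + [{"role": "user", "content": new_user_message}]

    # Drop the oldest turn until everything fits the (very rough) budget.
    while turns and sum(len(m["content"]) for m in [pinned] + turns) > budget_chars:
        turns.pop(0)

    return [pinned] + turns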

This is how I did it (modified from the embeddings tutorial)

import numpy as np
import openai
import pandas as pd
from openai.embeddings_utils import distances_from_embeddings

################################################################################
### Step 11 - start here to ask questions for content that has already been embedded
################################################################################

print("\n\n**** STEP 11 ****\n") 

df=pd.read_csv('processed/embeddings.csv', index_col=0)
df['embeddings'] = df['embeddings'].apply(eval).apply(np.array)

df.head()

################################################################################
### Step 12 
################################################################################

def create_context(
    question, df, max_len=1800, size="ada"
):
    """
    Create a context for a question by finding the most similar context from the dataframe
    """

    # Get the embeddings for the question
    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding']

    # Get the distances from the embeddings
    df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine')


    returns = []
    cur_len = 0

    # Sort by distance and add the text to the context until the context is too long
    for i, row in df.sort_values('distances', ascending=True).iterrows():
        
        # Add the length of the text to the current length
        cur_len += row['n_tokens'] + 4
        
        # If the context is too long, break
        if cur_len > max_len:
            break
        
        # Else add it to the text that is being returned
        returns.append(row["text"])

    # Return the context
    return "\n\n###\n\n".join(returns)

def answer_question(
    df,
    model="gpt-3.5-turbo-0301", 
    question="Am I allowed to publish model outputs to Twitter, without a human review?",
    max_len=1800,
    size="ada",
    debug=False,
    max_tokens=300, 
    stop_sequence=None
):
    """
    Answer a question based on the most similar context from the dataframe texts
    """
    context = create_context(
        question,
        df,
        max_len=max_len,
        size=size,
    )
    # If debug, print the raw model response
    if debug:
        print("Context:\n" + context)
        print("\n\n")

    
    try:
        # Create a completions using the question and context
        prompt=f"Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\nContext: {context}\n\n---\n\nQuestion: {question}\nAnswer:"

        response = openai.ChatCompletion.create( 
            messages=[
                {'role': 'system', 'content': 'You answer questions about bleep bloop.'},
                {'role': 'user', 'content': prompt},
                ],
            model=model,  # honor the model argument instead of hard-coding it
            temperature=0 
        )
        return response["choices"][0]["message"]["content"].strip()
    except Exception as e:
        print(e)
        return ""

################################################################################
### Step 13
################################################################################
print("\nWhat's the difference between bleep and bloop")
print(answer_question(df, question="What's the difference between bleep and bloop?"))



Did you find any solution to this problem? Using GPT-3.5 to summarise the question based on the previous question history seems to be possible. However, this doesn’t lead to a 100% accurate outcome if the context changes.

Bless you. I understood you clearly and saw how you struggled to be understood. You might have fixed this already, as it has been months, but for the benefit of someone else coming across this post and reading through like I just did: there is a process called “orchestration”, which is where you modify prompts. In the example you gave, you can ask the LLM to summarise its past conversation history, then use the entirety of its output to do a vector search to get better context, which you can then pass back into the LLM.