I recently went through the OpenAI tutorial on Web Q&A embeddings (Web Q&A - OpenAI API). Along the way I ran into several problems, most likely because the tutorial hasn't been updated since the release of version 1.0.0 of the Python library, but I managed to work through them. The only remaining hiccup is that the embedded data doesn't seem to be integrated properly. Below is the code I'm using to execute the call:
The current issue is that the code fails to recognize the latest embedding model, something that, according to the tutorial, should work out of the box. This is the unexpected difficulty I'm trying to resolve.
Thanks for the quick response. As a fresh account I could only attach one file to my post, so here is the output for the three example questions as posted in the tutorial:
OK, there seems to be a disconnect: the AI is not aware of your data. You need to use either the OpenAI retrieval system or your own vector database, and perform a retrieval step against it before building the prompt.
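To illustrate what that retrieval step looks like, here is a minimal sketch using a local in-memory "vector store". The texts and the 3-dimensional embeddings are made up for the example; in practice the vectors would come from an embedding model and the store would be a real database:

```python
import numpy as np

def retrieve_top_k(query_emb, doc_embs, texts, k=2):
    """Return the k texts whose embeddings are most similar to the query."""
    # Cosine similarity between the query and every stored document
    sims = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    )
    # Indices of the most similar documents, best first
    top = np.argsort(sims)[::-1][:k]
    return [texts[i] for i in top]

# Toy 3-dimensional "embeddings" standing in for real model output
texts = ["refund policy", "shipping times", "account deletion"]
doc_embs = np.array([[1.0, 0.1, 0.0],
                     [0.0, 1.0, 0.1],
                     [0.1, 0.0, 1.0]])
query_emb = np.array([0.9, 0.2, 0.0])  # a query closest to "refund policy"

# The retrieved context is what you prepend to the prompt
# before calling the chat model
context = "\n\n###\n\n".join(retrieve_top_k(query_emb, doc_embs, texts))
```

The key point is that the model only "knows" your data through the context string you retrieve and pass in with each question.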
Have you followed every step to the letter in the example on Web Q&A - OpenAI API?
It looks like you may have missed out a large section of it.
I have followed the whole tutorial, but as it hasn't been updated since the release of 1.0.0, I had to change a few lines. Notably, embedding_utils no longer works, so I changed the create_context function a bit:
import numpy as np
import openai

def create_context(
    question, df, max_len=1800, size="ada"
):
    # Get the embedding for the question
    q_embeddings = openai.embeddings.create(
        input=question, model='text-embedding-ada-002'
    ).data[0].embedding

    # Calculate the distances from the embeddings
    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Cosine distance = 1 - cosine similarity, so smaller means more similar
    df['distances'] = 1 - np.array(
        [cosine_similarity(q_embeddings, emb) for emb in df['embeddings'].values]
    )

    returns = []
    cur_len = 0

    # Sort by distance (most similar first) and add the text to the
    # context until the context is too long
    for i, row in df.sort_values('distances', ascending=True).iterrows():
        # Add the length of the text to the current length
        cur_len += row['n_tokens'] + 4
        # If the context is too long, break
        if cur_len > max_len:
            break
        # Else add it to the text that is being returned
        returns.append(row["text"])

    # Return the context
    return "\n\n###\n\n".join(returns)
I have run the create_context function on its own and the result is:
Be careful with the sort direction. Cosine similarity increases as two vectors become more alike, so if you sort by the similarity itself you want ascending=False, adding the highest-similarity context first. Your distances column, however, stores 1 - similarity, i.e. a distance, so sorting it ascending already puts the most relevant rows first; the two orderings are equivalent. With the sorting confirmed, also double-check that the embedding call actually embeds the user's question rather than a hardcoded test string, otherwise the retrieved context will never match the query.
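The relationship between the two sort directions is easy to check on a toy example (made-up 2-dimensional vectors, pure numpy):

```python
import numpy as np

# Toy query and document embeddings
q = np.array([1.0, 0.0])
docs = np.array([[0.9, 0.1],
                 [0.1, 0.9],
                 [0.5, 0.5]])

# Cosine similarity of each document against the query
sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
dists = 1 - sims  # cosine distance

# Most similar document first:
by_similarity = np.argsort(sims)[::-1]  # similarity: sort descending
by_distance = np.argsort(dists)         # distance: sort ascending

# Both orderings agree
assert (by_similarity == by_distance).all()
```

So whether ascending=True or ascending=False is right depends entirely on whether the column holds distances or similarities.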