Does the OpenAI Embeddings API work well in Node.js?

melodyxpot · July 13, 2023, 11:18pm

Hello everyone.
I am a Node.js developer and I love OpenAI.
But I am having some problems with OpenAI Embeddings.
I am building a PDF reader chatbot, but when I embed the PDF data with the OpenAI Embeddings API and store it in Pinecone, it is difficult to retrieve similar data.
It works, of course, but I can’t find the results I am looking for.
Is Node.js a good fit for this? I think I might get better results with a Python framework, but either way, I don’t think I am getting the best results right now.
Please share some tips if anyone knows a solution.
I want to collaborate to make a wonderful world.
Thanks.

wfhbrian · July 13, 2023, 11:28pm

It’s great to hear that you’re enjoying using OpenAI! Regarding your issue with Embeddings, it’s important to note that the quality of the results depends on various factors, including the data you’re embedding.

If you feel that you’re not getting the desired results, make sure that the PDF data you’re embedding is properly preprocessed and represented in a way that captures its important features.
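One common preprocessing step is to split the extracted PDF text into overlapping chunks before embedding, so that each vector covers one focused passage. Below is a minimal sketch of that idea, not a recommendation from this thread: the function name chunkText, the character-based sizing, and the default sizes are all assumptions chosen for illustration.

```javascript
// Split extracted PDF text into overlapping chunks before embedding.
// chunkText, chunkSize and overlap are hypothetical names; sizes are in
// characters and the defaults are illustrative, not tuned values.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  // Collapse the whitespace runs that PDF extraction tends to leave behind.
  const clean = text.replace(/\s+/g, " ").trim();
  const chunks = [];
  // Advance by (chunkSize - overlap) so neighbouring chunks share some text.
  for (let start = 0; start < clean.length; start += chunkSize - overlap) {
    chunks.push(clean.slice(start, start + chunkSize));
    // Stop once this chunk has reached the end of the text.
    if (start + chunkSize >= clean.length) break;
  }
  return chunks;
}
```

Each returned chunk would then be embedded and stored separately. Splitting on sentence or paragraph boundaries instead of raw character counts may give cleaner chunks, at the cost of a little more code.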

Another thing you can try is augmenting the “search text” you are using to retrieve similar content. One way you can do this is using the HyDE method.
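As a rough illustration of the HyDE idea: instead of embedding the user's question directly, you ask a language model to write a hypothetical passage that answers it, embed that passage, and search with the resulting vector. The sketch below is a simplified single-passage version. The helpers generate and embed are hypothetical functions you would supply yourself (for example, thin wrappers around a chat completion call and an embeddings call), and the prompt wording is only an assumption.

```javascript
// HyDE sketch: embed a hypothetical answer instead of the raw question.
// `generate` and `embed` are hypothetical async helpers passed in by the caller:
//   generate(prompt) -> Promise<string>    e.g. a chat completion call
//   embed(text)      -> Promise<number[]>  e.g. an embeddings call
async function hydeQueryEmbedding(question, { generate, embed }) {
  // Illustrative prompt; adjust the wording to match your documents.
  const prompt =
    "Write a short passage that answers the question below.\n" +
    `Question: ${question}\nPassage:`;
  const hypotheticalDoc = await generate(prompt);
  // Use this vector for the similarity search instead of the question's own.
  return embed(hypotheticalDoc);
}
```

The generated passage may be factually wrong; that is acceptable here, because it is only used to land closer to real answer passages in embedding space and is never shown to the user.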

Good luck with your ChatGPT,
Brian

melodyxpot · July 13, 2023, 11:32pm

Thanks a lot.
I think HyDE might be helpful.
I will take a look at it.

anon22939549 · July 13, 2023, 11:33pm

HyDE works very well for pulling up appropriate documents at the document level.

I can strongly recommend it to anyone struggling with surfacing the correct embeddings.

supershaneski · July 13, 2023, 11:35pm

I use OpenAI’s Node.js library and so far have had no problems using it for embeddings and for searching the embedded documents, but I don’t use Pinecone. Are the results very far off?

melodyxpot · July 13, 2023, 11:35pm

Can you point me to some posts or video tutorials on using HyDE?

melodyxpot · July 13, 2023, 11:36pm

No, not well.
At first, I used MongoDB, but the results were terrible, lol.
Then I got the idea to use Pinecone, and the results were better, but not by much.

anon22939549 · July 13, 2023, 11:39pm

This is a good resource:

Here is the original paper:

melodyxpot · July 13, 2023, 11:45pm

Thanks a lot.
Many thanks to the OpenAI community.

atawsaa · August 16, 2023, 6:13pm

Hello! Could you share how you embedded a PDF using OpenAI/Node and stored it in Pinecone? Thanks!

tabatabaie.mojtaba · February 21, 2024, 10:10pm

Hi, is it possible to share a sample of what Embeddings API search code might look like in Node.js? I would really appreciate it.

supershaneski · March 6, 2024, 11:55pm

This is an excerpt from the previous Q&A sample in the OpenAI cookbook, just showing the relevant portion. This is the query search part.

  // Rank the chunks by their cosine similarity to the search query (using the
  // dot product, since the embeddings are normalized) and return the result
  const rankedChunks = files
    // Map each file to an array of chunks with the file name and score
    .flatMap((file) =>
      file.chunks
        ? file.chunks.map((chunk) => {
            // Calculate the dot product between the chunk embedding and the search query embedding
            const dotProduct = chunk.embedding.reduce(
              (sum, val, i) => sum + val * searchQueryEmbedding[i],
              0
            );
            // Assign the dot product as the score for the chunk
            return { ...chunk, filename: file.name, score: dotProduct };
          })
        : []
    )
    // Sort the chunks by their scores in descending order
    .sort((a, b) => b.score - a.score)
    // Filter the chunks by their score above the threshold
    .filter((chunk) => chunk.score > COSINE_SIM_THRESHOLD)
    // Take the first maxResults chunks
    .slice(0, maxResults);

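files is an array of the stored files. Each file has a name and a chunks array, and each chunk carries its embedding along with its text data.

searchQueryEmbedding is the embedding of the query string, produced with the same embedding model as the stored chunks.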
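For completeness, here is a hedged sketch of how searchQueryEmbedding might be produced, plus a cosine similarity helper for embeddings that are not guaranteed to be unit length. It is not part of the cookbook excerpt. The client parameter is assumed to be an instance of the official openai Node.js package (v4 style, where client.embeddings.create exists), and the default model name is an assumption; use whichever model embedded your stored chunks.

```javascript
// Build `searchQueryEmbedding` for the ranking code above.
// `client` is assumed to be an OpenAI client from the `openai` package (v4 style).
// The default model name is an assumption; it must match the model that was
// used to embed the stored chunks, or the scores will be meaningless.
async function embedQuery(client, query, model = "text-embedding-3-small") {
  const response = await client.embeddings.create({ model, input: query });
  return response.data[0].embedding;
}

// Cosine similarity for vectors that may not be normalized.
// For unit-length vectors this equals the plain dot product used above.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A call such as embedQuery(client, "What does the contract say about refunds?") would give the vector to pass into the ranking code as searchQueryEmbedding.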
