Hello everyone.
I am a NodeJS developer and I love OpenAI.
But I have some problems with OpenAI embeddings.
I am building a PDF reader chatbot, but when I embed the PDF data using the OpenAI Embeddings API and store it in Pinecone, it’s difficult to find similar data.
It works, of course, but I can’t get the results I’m looking for.
Is NodeJS good for that? I think I might get better results with a Python framework, but either way, I don’t think I am getting the best results now.
Please give me some tips if anyone knows a solution.
I want to collaborate to make a wonderful world.
Thanks.
It’s great to hear that you’re enjoying using OpenAI! Regarding your issue with Embeddings, it’s important to note that the quality of the results depends on various factors, including the data you’re embedding.
If you feel that you’re not getting the desired results, make sure that the PDF data you’re embedding is properly preprocessed and represented in a way that captures its important features.
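As one illustration of preprocessing, here is a minimal sketch that cleans up extracted PDF text and splits it into overlapping chunks before embedding. The function name `chunkText` and the default `chunkSize` and `overlap` values are assumptions for the example, not part of any library, and the right sizes will depend on your documents.

```javascript
// Minimal sketch: split extracted PDF text into overlapping chunks before embedding.
// chunkText, chunkSize, and overlap are hypothetical names and values, not from any library.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  // Collapse runs of whitespace that PDF extraction often leaves behind
  const clean = text.replace(/\s+/g, " ").trim();
  const chunks = [];
  for (let start = 0; start < clean.length; start += chunkSize - overlap) {
    chunks.push(clean.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text
    if (start + chunkSize >= clean.length) break;
  }
  return chunks;
}
```

Each chunk would then be embedded and stored separately, so a query can match the specific passage rather than a whole document. The overlap is there so that a sentence cut at a chunk boundary still appears intact in one of the neighbouring chunks.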
Another thing you can try is augmenting the “search text” you are using to retrieve similar content. One way you can do this is using the HyDE method.
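Roughly, HyDE (Hypothetical Document Embeddings) asks a language model to write a made-up passage that answers the question, then embeds that passage and searches with it instead of the raw question. A minimal sketch of the flow is below. `generateText`, `embed`, and `searchIndex` are hypothetical functions you would supply yourself (wrappers around your LLM, your embedding call, and your vector store), and the prompt wording is only an assumption.

```javascript
// Minimal HyDE sketch. generateText, embed, and searchIndex are hypothetical
// async functions supplied by the caller; they are not part of any library.
async function hydeSearch(question, { generateText, embed, searchIndex }, topK = 5) {
  // 1. Ask the model to write a passage that would answer the question.
  const prompt =
    "Write a short passage that answers the question.\n" +
    `Question: ${question}\nPassage:`;
  const hypotheticalDoc = await generateText(prompt);

  // 2. Embed the hypothetical passage instead of the raw question.
  const queryVector = await embed(hypotheticalDoc);

  // 3. Search the vector index with that embedding.
  return searchIndex(queryVector, topK);
}
```

The generated passage may contain wrong facts, which is fine here: it is only used to land the query closer to real answer-like text in embedding space, and the chatbot should still answer from the retrieved chunks.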
Good luck with your ChatGPT,
Brian
Thanks a lot.
I think HyDE might be helpful.
I will take a look at it.
HyDE works very well for pulling up appropriate documents at the document level.
I can strongly recommend it to anyone struggling with surfacing the correct embeddings.
I use OpenAI’s NodeJS library and so far have had no problems using it for embeddings and for searching the embedded documents, but I don’t use Pinecone. Are the results very off?
Can you point me to some posts or video tutorials on using HyDE?
No, not well.
At first, I used MongoDB but the result was terrible, lol.
Then I got the idea to use Pinecone and the results were better, but not by much.
This is a good resource:
Here is the original paper:
Thanks a lot.
Many thanks to the OpenAI community.
Hello! Could you share how you embedded a PDF using OpenAI/Node and stored it in Pinecone? Thanks!
Hi, could you kindly share a sample of what embeddings API search code might look like in Node.js? I would really appreciate it.
This is an excerpt from the previous Q&A sample in the OpenAI cookbook, just showing the relevant portion. This is the query search part.
// Rank the chunks by their cosine similarity to the search query (using dot product since the embeddings are normalized) and return this
const rankedChunks = files
  // Map each file to an array of chunks with the file name and score
  .flatMap((file) =>
    file.chunks
      ? file.chunks.map((chunk) => {
          // Calculate the dot product between the chunk embedding and the search query embedding
          const dotProduct = chunk.embedding.reduce(
            (sum, val, i) => sum + val * searchQueryEmbedding[i],
            0
          );
          // Assign the dot product as the score for the chunk
          return { ...chunk, filename: file.name, score: dotProduct };
        })
      : []
  )
  // Sort the chunks by their scores in descending order
  .sort((a, b) => b.score - a.score)
  // Filter the chunks by their score above the threshold
  .filter((chunk) => chunk.score > COSINE_SIM_THRESHOLD)
  // Take the first maxResults chunks
  .slice(0, maxResults);
Here, files is an array of the stored files, split into chunks, each with its embedding and text data.
searchQueryEmbedding is the embedding of the query string.
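If you want to try the ranking step on its own, here is a self-contained sketch using toy data. It is a variation on the excerpt, not the cookbook code itself: it computes full cosine similarity (dividing by the vector norms) so it also works when embeddings are not normalized. The helper names `cosineSimilarity` and `rankChunks`, the default threshold, and the 2-dimensional sample vectors are all made up for illustration.

```javascript
// Self-contained sketch of the ranking step with toy data.
// cosineSimilarity and rankChunks are hypothetical helper names.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankChunks(files, searchQueryEmbedding, threshold = 0.5, maxResults = 3) {
  return files
    // Flatten every file into scored chunks that remember their file name
    .flatMap((file) =>
      (file.chunks ?? []).map((chunk) => ({
        ...chunk,
        filename: file.name,
        score: cosineSimilarity(chunk.embedding, searchQueryEmbedding),
      }))
    )
    // Drop weak matches, then keep the best maxResults
    .filter((chunk) => chunk.score > threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}

// Toy 2-dimensional embeddings for illustration only; real embeddings have many more dimensions.
const files = [
  {
    name: "a.pdf",
    chunks: [
      { text: "cats", embedding: [1, 0] },
      { text: "dogs", embedding: [0.6, 0.8] },
    ],
  },
  { name: "b.pdf", chunks: [{ text: "cars", embedding: [0, 2] }] },
];

const ranked = rankChunks(files, [1, 0]);
console.log(ranked.map((c) => `${c.filename}: ${c.text} (${c.score.toFixed(2)})`));
```

In a real app the query vector would come from the same embedding model used for the chunks, and the threshold is something to tune against your own data rather than a fixed value.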