Need help with vector embedding search using OpenAI embeddings and Elasticsearch cosine similarity

nawdeep.sharma · February 15, 2024, 3:29pm

Hello everyone,
Overview: I am building a search engine using OpenAI embeddings. I split the documents into chunks, convert each chunk to an embedding, and store the embeddings in Elasticsearch (7.10). When I send a prompt, I embed it the same way and run a script-based cosine similarity search.
Here is an example of the Elasticsearch query:

{
    "script_score": {
        "script": {
            "source": "doc['associated_questions_vector_field'].size() == 0 ? 0 : (1 + cosineSimilarity(params.prompt_vector, doc['associated_questions_vector_field']) + (1 + cosineSimilarity(params.prompt_vector, doc['vector_field']))) / 2",
            "params": {
                "prompt_vector": query_vector
            }
        }
    }
}

Problem: This seemed to work fine earlier, but now that the number of documents has grown I am getting the wrong documents back. Any idea why this is happening? Has anyone faced a similar issue?
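
For debugging, it can help to reproduce offline the score that the script above computes, so individual documents can be checked by hand. The following is a minimal Python sketch, assuming the vectors have already been fetched as plain lists of floats; the function names `cosine_similarity` and `script_score` are hypothetical and are not part of Elasticsearch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def script_score(prompt_vector, questions_vector, chunk_vector):
    """Mirror the Painless expression from the query (hypothetical helper)."""
    # Same as the ternary: a document with no questions vector scores 0.
    if not questions_vector:
        return 0.0
    # Each similarity is shifted by 1 so it cannot go negative,
    # then the sum of both shifted terms is divided by 2.
    shifted_questions = 1 + cosine_similarity(prompt_vector, questions_vector)
    shifted_chunk = 1 + cosine_similarity(prompt_vector, chunk_vector)
    return (shifted_questions + shifted_chunk) / 2
```

Read this way, the score ranges from 0 to 2, and any document without an associated-questions vector scores 0 no matter how close its chunk vector is to the prompt. That may be worth checking against the newly added documents.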

jorgeintegrait · February 15, 2024, 7:14pm

The new documents may simply contain content that lands in the same region of the embedding space as the test queries you were using.

Just in case, though, review the process you followed with the new documents to make sure you haven’t introduced an issue. Review the documents themselves after chunking to make sure the content is what you expect and there isn’t a lot of hidden text throwing the matching off.
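
One way to start that review is to flag chunks that are very short or contain many invisible characters. This is only a rough sketch, assuming the chunks are available as Python strings; the function names and both thresholds are hypothetical and would need tuning for real data.

```python
import unicodedata


def hidden_char_ratio(chunk):
    """Fraction of characters that are control (Cc) or format (Cf) characters."""
    if not chunk:
        return 0.0
    hidden = sum(
        1
        for ch in chunk
        # Ordinary line breaks and tabs are not counted as hidden text.
        if unicodedata.category(ch) in ("Cc", "Cf") and ch not in "\n\r\t"
    )
    return hidden / len(chunk)


def flag_suspicious_chunks(chunks, max_hidden_ratio=0.01, min_chars=20):
    """Return indexes of chunks that are too short or carry too much hidden text."""
    flagged = []
    for index, chunk in enumerate(chunks):
        too_short = len(chunk.strip()) < min_chars
        too_hidden = hidden_char_ratio(chunk) > max_hidden_ratio
        if too_short or too_hidden:
            flagged.append(index)
    return flagged
```

Flagged chunks are candidates for manual inspection, not proof of a problem.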

Finally, it is normal for a match that worked in a small, controlled subset to stop working once you populate the space with many more chunks. To keep improving, you can add an adapter that fine-tunes your embedding model on your ideal pairs of questions and answers. I recommend Chroma and DeepLearning.AI’s course on this (it’s just one hour): Short Courses | Learn Generative AI from DeepLearning.AI
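
To give a rough idea of what such an adapter can look like: one common form is a small linear transformation applied to the query embedding and trained on labeled pairs. The sketch below is an illustration under assumptions, not the method from the course. It assumes unit-length vectors, labels of 1.0 for a good pair and -1.0 for a bad one, and a dot-product score to keep the gradient simple; all names are hypothetical.

```python
def apply_adapter(weights, query):
    """Multiply the adapter matrix by the query embedding."""
    return [sum(w * q for w, q in zip(row, query)) for row in weights]


def adapted_score(weights, query, doc):
    """Dot product between the adapted query and the document embedding."""
    return sum(a * d for a, d in zip(apply_adapter(weights, query), doc))


def train_linear_adapter(pairs, dim, lr=0.1, epochs=200):
    """Fit a dim x dim matrix on (query, doc, label) triples with plain SGD."""
    # Start from the identity so the untrained adapter changes nothing.
    weights = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(epochs):
        for query, doc, label in pairs:
            error = adapted_score(weights, query, doc) - label
            # Gradient of the squared error with respect to weights[i][j]
            # is 2 * error * doc[i] * query[j].
            for i in range(dim):
                for j in range(dim):
                    weights[i][j] -= lr * 2.0 * error * doc[i] * query[j]
    return weights
```

The trained matrix would be applied to the prompt embedding before it is sent as `prompt_vector`, so the stored document vectors do not need to be re-indexed.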
