Hi All,
I have been working on an Azure + OpenAI integration, specifically on using an Azure AI Search index as a large-scale vector database.
My use case:
- Take PDF documents as input (doc size >= 25 pages)
- Divide each document into suitable chunks (using LangChain)
- Embed each chunk (using an OpenAI embedding model)
- Upload each chunk and its embedding to the Azure AI Search index
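For context on the chunking step: it is essentially a sliding character window with overlap, which is roughly what LangChain's character splitters do (minus separator awareness). A minimal pure-Python sketch, where the chunk size and overlap values are assumptions, not my production settings:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows.

    Approximates what a LangChain character splitter produces;
    chunk_size/overlap here are illustrative defaults.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

doc = "x" * 2500
pieces = chunk_text(doc, chunk_size=1000, overlap=200)
# Windows start at 0, 800, 1600, 2400 -> 4 chunks
print(len(pieces))  # → 4
```

Each chunk is then embedded and uploaded as one document in the index.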
The structure of the Azure Index is:
{
  id: "----", (string)
  title: "title-of-the-document", (string)
  chunk: "content-of-chunk", (string)
  chunk_embedding: "chunk-embedding" (Collection(Edm.Single))
}
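For anyone reproducing this, the corresponding Azure AI Search index definition looks roughly like the following. This is a sketch: the index name, algorithm/profile names, and the 1536 dimensions (for text-embedding-ada-002) are assumptions, and exact property names vary by API version:

```json
{
  "name": "doc-chunks-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "filterable": true },
    { "name": "title", "type": "Edm.String", "searchable": true, "filterable": true },
    { "name": "chunk", "type": "Edm.String", "searchable": true },
    {
      "name": "chunk_embedding",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "default-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "hnsw-algo", "kind": "hnsw" } ],
    "profiles": [ { "name": "default-profile", "algorithm": "hnsw-algo" } ]
  }
}
```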
I have now stored every chunk of every document, along with its embedding, in the index.
In the retrieval step:
- I filter the chunks by document title.
- To fetch the top-k relevant chunks for the user query, I have tried search approaches such as Vector Search and Hybrid Search, but neither returns a proper top-k set of chunks that I can use as context knowledge.
- I have also tried a NumPy cosine similarity search, but the results are also unsatisfactory.
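To make the NumPy baseline concrete, this is the cosine top-k retrieval I am aiming for, as a minimal sketch with toy 2-D vectors standing in for real OpenAI embeddings. Normalizing both sides first makes the dot product equal the cosine similarity, which rules out magnitude-related ranking bugs:

```python
import numpy as np

def top_k_chunks(query_emb, chunk_embs, k=3):
    """Rank chunk embeddings by cosine similarity to the query embedding."""
    q = np.asarray(query_emb, dtype=float)
    m = np.asarray(chunk_embs, dtype=float)
    q = q / np.linalg.norm(q)                          # unit-length query
    m = m / np.linalg.norm(m, axis=1, keepdims=True)   # unit-length chunks
    sims = m @ q                   # cosine similarity per chunk
    order = np.argsort(-sims)[:k]  # indices of the k most similar chunks
    return order, sims[order]

# Toy example: chunk 1 points in the same direction as the query.
chunks = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
idx, scores = top_k_chunks([3.0, 4.0], chunks, k=2)
print(idx)  # → [1 2]
```

If this ranking looks right on small examples but the retrieved chunks are still poor, the problem is more likely in the chunking granularity or the query formulation than in the similarity math.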
Also, the size of the documents is why I cannot embed and upload a document whole: it exceeds the embedding model's context limit.
Does anyone have a suggestion or approach by which I can:
- Use the content (chunks and their embeddings) and the user query (text and its embedding) to retrieve a proper top-k set of chunks that I can then use as context knowledge?
Thank you