Hi All,
I am working on a chat-with-PDF application (after uploading a PDF, the user asks questions and the app returns answers).
I initially started with a simple RAG approach (steps below):
- Extract the PDF text content, convert it into embeddings (using an OpenAI embedding model), and store them in the ChromaDB vector database
- When the user asks a question, the question is converted into an embedding
- Using the question embedding, I retrieve similar text chunks from ChromaDB
- Send the user question together with the retrieved chunks as context to the OpenAI model
- The OpenAI model generates the response
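For reference, the steps above can be sketched without LangChain or LlamaIndex in plain Python. This is a minimal sketch, assuming the `openai` (v1+), `chromadb`, and `pypdf` packages are installed and `OPENAI_API_KEY` is set; the model names, chunk sizes, and function names are my own illustrative choices, not anything official. The third-party imports are deferred inside the functions so the pure chunking helper runs on its own:

```python
EMBED_MODEL = "text-embedding-3-small"  # assumed choice; any OpenAI embedding model works
CHAT_MODEL = "gpt-4o-mini"              # assumed choice of chat model

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so passages are not cut mid-thought."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def build_index(pdf_path: str, collection_name: str = "pdf_chunks"):
    """Extract PDF text, embed each chunk, and store everything in ChromaDB."""
    from pypdf import PdfReader   # deferred third-party imports
    from openai import OpenAI
    import chromadb

    ai = OpenAI()
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    chunks = chunk_text(text)
    embeddings = [
        d.embedding
        for d in ai.embeddings.create(model=EMBED_MODEL, input=chunks).data
    ]
    col = chromadb.Client().create_collection(collection_name)
    col.add(
        ids=[str(i) for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
    )
    return col

def answer(question: str, col, top_k: int = 3) -> str:
    """Embed the question, retrieve similar chunks, and ask the chat model."""
    from openai import OpenAI

    ai = OpenAI()
    q_emb = ai.embeddings.create(model=EMBED_MODEL, input=[question]).data[0].embedding
    hits = col.query(query_embeddings=[q_emb], n_results=top_k)["documents"][0]
    context = "\n\n".join(hits)
    resp = ai.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```

Usage would be something like `col = build_index("report.pdf")` followed by `print(answer("What is the summary?", col))`. The overlap in `chunk_text` is a simple guard against splitting a sentence exactly at a chunk boundary.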
This works well so far, but I am unsure how to handle cases where I want to retrieve an image along with the text when the user asks a question.
While researching this, I came across Multimodal RAG techniques.
My actual questions are:
- What about the OpenAI Assistants API? It is still in beta. Can I use it for this, and will it work with images?
- Can I continue with the RAG technique, or is there a more feasible alternative to RAG?
- If I continue with RAG, can I implement it without a framework like LangChain or LlamaIndex? Is this feasible at a beginner level?
I would appreciate any help on this.