Which is the best approach to do chat with PDF application (RAG, Fine Tuning, Open AI Assistant)?

Hi All,

I am working on creating an application like chat with PDF (After uploading a question, when user asks question it should return the answer)

I initially started with simple RAG Approach (steps given below)

  1. Extracting PDF text content and converting it into embedding (using openAI embedding model) then store it in the chromaDB vector database
  2. When user asks question that will be converted into embeddings
  3. With the user question I will retrieve similar text chunks from the chromaDB
  4. Send context with user question and similar text chunk to the open AI model
  5. OpenAI model will generate the response

This is working good at initial level. But it is confusing when it comes situation like I want to retrieve image along with the text when user asks question

When I research I found a Multi Modal RAG techniques

My actual questions are

  1. What about openAI Assistant. It’s still in Beta Can I use that for this will it work on images
  2. can I continue with the RAG technique or is there any feasible method when compared with RAG.
  3. If I continue with RAG can I implement this without Framework like langchain or LlamaIndex. Is this possible as a beginner level?

I would appreciate any help on this

Thank you @marcolivierbouch
I am curious about What about Images? will openAI assistant return images along with the text when user asks question.

1 Like

It seems like complex. Can you give me an advice As a beginner whether I can Proceed with the RAG multimodal for image retrieval. or analyzing openAI Assistant will be worth?

1 Like

A post was merged into an existing topic: Best tool to build chatbot using Assistant API - OpenAssistantGPT