Hi everyone,
I’m working on building a custom AI-powered chat assistant that can answer user questions based on product documentation files that I maintain.
Here’s what I have:
- A large number of product-specific documents stored in a directory. Example:
```
/docs/
├── A.pdf
├── B.mdx
├── C.pdf
├── D.mdx
```
- Each file corresponds to a unique product (e.g., A, B, C, D).
- I want the assistant to answer queries like:
- “B is showing an error, what should I do?”
- “How do I configure product A?”
- “What’s the installation process for D?”
- The assistant should search and understand the content of these documents and provide relevant, accurate answers.
My tech stack:
- Next.js (v15+ App Router)
- OpenAI GPT-4 API
- Documents are in .pdf and .mdx formats
What I’m trying to achieve:
- Ingest the documentation (PDF or MDX) into a searchable format (embeddings, vector DB, etc.); a rough sketch of what I have in mind follows this list.
- Connect OpenAI GPT-4 with a retrieval mechanism so the model can reference the documents.
- Provide a frontend chat interface where users can ask natural language questions.
- Build everything within a Next.js App Router project (not pages directory).
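To make the ingestion point concrete, here's a rough sketch of what I'm currently picturing. It's not working code: it assumes pgvector with a `doc_chunks (product, content, embedding)` table and OpenAI's `text-embedding-3-small`, and `extractText` is just a placeholder for whatever PDF/MDX parsing I end up using (pdf-parse, remark, etc.):

```ts
// scripts/ingest.ts — rough sketch of the ingestion step, not production code
import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Placeholder: extract plain text from a .pdf or .mdx file
// (e.g. pdf-parse for PDFs, strip frontmatter/JSX for MDX).
async function extractText(filePath: string): Promise<string> {
  throw new Error("TODO: implement per file type");
}

// Naive fixed-size chunking with overlap — one of the things I'm unsure about.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

export async function ingestFile(filePath: string, product: string) {
  const text = await extractText(filePath);
  const chunks = chunkText(text);

  // Embed all chunks in one call; results come back in input order.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });

  // Store each chunk with its product name so queries like
  // "B is showing an error" can be filtered by product later.
  for (let i = 0; i < chunks.length; i++) {
    await pool.query(
      "INSERT INTO doc_chunks (product, content, embedding) VALUES ($1, $2, $3)",
      [product, chunks[i], JSON.stringify(data[i].embedding)]
    );
  }
}
```

In particular, I'm not sure whether naive fixed-size chunking like this is good enough for MDX files with headings and code samples, or whether I should chunk by section instead.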
Where I need help:
- What’s the best approach to process and chunk PDF + MDX content for embedding?
- What’s a good vector database that works well with local files and integrates nicely with Next.js (e.g., pgvector, Qdrant, etc.)?
- How should I structure my app in App Router? (e.g., server actions, API routes, Vercel AI SDK?) I've sketched the route handler and chat page I currently have in mind below this list.
- Any open-source templates or examples that come close to this use case?
- Best practices to optimize response quality and context relevance when using OpenAI + RAG setup?
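For reference, this is the kind of App Router route handler I'm imagining for the retrieval + answer step. Again, just a sketch under the same assumptions as above (pgvector table `doc_chunks`, the same embedding model at query time, plain `openai` and `pg` clients rather than the Vercel AI SDK):

```ts
// app/api/chat/route.ts — rough sketch of the retrieval + answer step
import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function POST(req: Request) {
  const { question } = await req.json();

  // Embed the user question with the same model used at ingestion time.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });
  const queryEmbedding = JSON.stringify(data[0].embedding);

  // Nearest-neighbour search using pgvector's cosine distance operator (<=>).
  const { rows } = await pool.query(
    `SELECT product, content
       FROM doc_chunks
      ORDER BY embedding <=> $1
      LIMIT 5`,
    [queryEmbedding]
  );

  const context = rows
    .map((r) => `[${r.product}] ${r.content}`)
    .join("\n---\n");

  // Ask GPT-4 to answer using only the retrieved excerpts.
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content:
          "Answer using only the provided documentation excerpts. " +
          "If the answer is not in them, say so.\n\n" + context,
      },
      { role: "user", content: question },
    ],
  });

  return Response.json({ answer: completion.choices[0].message.content });
}
```

I'd happily switch this to the Vercel AI SDK and streaming if that's the more idiomatic setup in App Router; that's part of what I'm asking.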
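And on the frontend side, the chat interface from my goals above could be as simple as a client component that POSTs to that route (minimal sketch, no streaming or error handling):

```tsx
// app/chat/page.tsx — minimal chat UI sketch (client component)
"use client";

import { useState } from "react";

type Message = { role: "user" | "assistant"; content: string };

export default function ChatPage() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState("");

  async function send() {
    const question = input.trim();
    if (!question) return;
    setMessages((m) => [...m, { role: "user", content: question }]);
    setInput("");

    // Call the route handler sketched above.
    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    });
    const { answer } = await res.json();
    setMessages((m) => [...m, { role: "assistant", content: answer }]);
  }

  return (
    <main>
      {messages.map((m, i) => (
        <p key={i}>
          <b>{m.role}:</b> {m.content}
        </p>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <button onClick={send}>Ask</button>
    </main>
  );
}
```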
If anyone has done something similar or can point me to solid tutorials/examples, I’d greatly appreciate it!
Thanks in advance