Hi everyone,
I’m building an AI-powered educational assistant that must answer questions grounded in very large PDFs (up to ~4000 pages per document). I’m currently using a RAG-based setup, but I’m facing serious production issues and would appreciate architectural guidance.
Current Problems:
- Very high latency
  - Responses take 25+ seconds.
  - Sometimes even longer with complex queries.
- Missing information in responses
  - The system retrieves only partial sections.
  - Important parts of the document are ignored.
  - Answers feel incomplete or fragmented.
Requirements:
- Fast response time (<3 seconds ideally)
- High-quality, well-structured answers
- Accurate grounding with page references
- Ability to handle 4000+ pages reliably
- Production-ready and scalable
Does anyone have a better AI pipeline, or know how to implement this properly?