“Chat over documents” is generally handled with retrieval-augmented generation: split the document into chunks, create an embedding for each chunk, and store them in a vector database. Then, on a user query, create an embedding of the query, run a similarity search against the vector database to retrieve the most relevant chunks, and put those chunks into the prompt. There are plenty of tutorials on YouTube and elsewhere on how to do this. LangChain is probably the easiest way to get a proof-of-concept running (though I wouldn’t recommend it for a large-scale production app).
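
For concreteness, here’s a minimal sketch of that flow in plain Python rather than LangChain, using the OpenAI client with an in-memory list standing in for a real vector database. The model names, file path, and naive chunking are just placeholders to show the shape of the pipeline, not a production setup:

```python
# Minimal RAG sketch: chunk -> embed -> retrieve by similarity -> prompt.
# Assumes the `openai` and `numpy` packages and an OPENAI_API_KEY in the env.
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real pipelines usually split on
    # sentence/paragraph boundaries with some overlap.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# "Index" the document: chunk it and embed the chunks.
document = open("my_document.txt").read()  # hypothetical file
chunks = chunk(document)
chunk_vectors = embed(chunks)

# At query time: embed the query and find the most similar chunks
# (cosine similarity over the in-memory "index").
query = "What does the document say about pricing?"
q = embed([query])[0]
scores = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:3]]

# Stuff the retrieved chunks into the prompt and ask the model.
prompt = (
    "Answer using only the context below.\n\nContext:\n"
    + "\n---\n".join(top_chunks)
    + f"\n\nQuestion: {query}"
)
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```

A real vector database (pgvector, Pinecone, Weaviate, etc.) replaces the in-memory array once you have more chunks than fit comfortably in memory, but the retrieval logic stays the same.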