A novel way to chat with one or many of your PDF documents: a multi-step agent approach

Hello everyone, I’ve developed a novel way to interact with your PDF documents!

Existing solutions can be frustrating, so we've built a deeper, context-aware conversational approach that goes beyond plain vector search. I believe superior results are possible with a more nuanced, multi-step agent approach… We are using gpt-3.5-turbo for this.

Give it a try (it's free) and let me know what you think:

You can also follow us on Twitter or join our Discord.

edit July 19: it now also works with more than one document at a time…


Well Eric, the tool was built using the OpenAI API, so I think it's very much relevant. You have other posts like mine which are active. Edited to add that it's using OpenAI…


I didn't do an extensive search, but at the very least all the topics listed under this post as Related Topics…

As a user of the OpenAI community, I personally find value in posts like this. I'm happy to see what tools people create, and to use them if they are good. I think topics like this should stay.


@clb do you have any technical details on how you did this as a multi-step agent?


Hey, thanks for asking. Yes, I can tell you a bit more.

  1. Pre-analysis task(s) (one or more OpenAI calls, depending on the nature of the document):
    defining the document type, pre-selecting future search strategies, and caching the results. Also embedding, of course.
  2. Per-question task(s): refining the strategy depending on the conversation (question/answer depth), vector search being only one of them…
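The two phases above could be sketched roughly as follows. This is a hypothetical reconstruction, not the author's actual code: the `llm` function is a stub standing in for gpt-3.5-turbo chat-completion calls, and the document types and strategy names are invented for illustration.

```python
# Hypothetical sketch of the two-phase pipeline: a one-time pre-analysis
# step that classifies the document and caches a strategy list, then a
# per-question step that picks a strategy using the conversation history.

def llm(prompt: str) -> str:
    # Stub standing in for an OpenAI chat-completion call.
    if "classify" in prompt:
        return "novel"
    return "answer"

def pre_analyze(document: str) -> dict:
    """Phase 1: run once per document, cache the result."""
    doc_type = llm(f"classify this document: {document[:500]}")
    strategies = {
        "novel": ["vector_search", "quote_extraction"],
        "legal": ["section_lookup", "vector_search"],
    }.get(doc_type, ["vector_search"])
    return {"type": doc_type, "strategies": strategies}

def answer_question(question: str, history: list, cache: dict) -> str:
    """Phase 2: pick a strategy per question, informed by the conversation."""
    # Follow-ups like "is he in love?" need the history to resolve
    # pronouns before any retrieval happens.
    if history:
        question = llm(f"rewrite with context {history}: {question}")
    strategy = cache["strategies"][0]  # naive: always take the first strategy
    return llm(f"[{strategy}] {question}")

cache = pre_analyze("In my younger and more vulnerable years...")
print(cache["type"])                           # prints "novel"
print(answer_question("Who is Jay?", [], cache))
```

The real per-question step would of course weigh strategies against each other rather than always taking the first one; the point is only the separation between one-time document analysis and per-turn strategy selection.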

What exactly does your tool do that the others do not? Ask questions about a PDF and get replies?


Asking questions and getting replies is indeed the easy part.

As so often, the devil is in the details, and in the type of replies you get. To give you a simple example: if you ask "give me quotes that relate to love" about a book, the strategy of searching a vector index, returning the chunks that match, and sending them to OpenAI works quite well. In many other cases, not so well. The chunks themselves may need to be selected in a more "intelligent" way.
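For readers unfamiliar with it, the baseline strategy mentioned above (embed chunks, retrieve nearest, send to the model) can be sketched in a few lines. This is a toy illustration: the bag-of-words "embeddings" stand in for real API embeddings, and the chunks are made up.

```python
# Minimal sketch of the vector-search baseline: embed document chunks,
# rank them by cosine similarity to the question, return the top match.
# Toy word-count vectors replace real embedding-model vectors.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Gatsby threw lavish parties at his mansion in West Egg.",
    "Gatsby is in love with Daisy Buchanan.",
    "Nick Carraway rents a small house next door.",
]
index = [(embed(c), c) for c in chunks]

def retrieve(question: str, k: int = 1) -> list:
    q = embed(question)
    ranked = sorted(index, key=lambda e: cosine(q, e[0]), reverse=True)
    return [c for _, c in ranked[:k]]

print(retrieve("Who does Gatsby love?"))
# prints ['Gatsby is in love with Daisy Buchanan.']
```

The retrieved chunks would then be pasted into the prompt of a chat-completion call. The failure modes discussed in this thread start exactly here: when the relevant evidence is spread across many chunks, top-k similarity alone picks the wrong context.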

But the best is to try it yourself. I can give you a small example. Upload the full text of "The Great Gatsby" and let's compare replies to the questions "Who is Jay?" (1) and "Is he in love?" (2):

  1. Jay Gatsby is a character in the document. He is described as a young man who invented a persona for himself and is faithful to that conception until the end. He is portrayed as someone who came from a poor background but aspires to be successful and wealthy.
  2. Yes, he is in love.

  1. There are multiple characters named Jay mentioned in the PDF file. Could you please provide more context or information about which Jay you are referring to?
  2. If you are referring to Jay Gatsby from “The Great Gatsby” mentioned in 74, then yes, he is in love with Daisy Buchanan.

  1. Jay, also known as Jay Gatsby, is a central character in the book “The Great Gatsby” by F. Scott Fitzgerald. He is a mysterious and wealthy man who throws lavish parties at his mansion in West Egg, Long Island. Jay Gatsby is known for his extravagant lifestyle and his pursuit of Daisy Buchanan, the woman he loves. Throughout the novel, Gatsby’s true identity and his past are gradually revealed, showcasing his complex and enigmatic nature.
  2. Yes, Jay Gatsby is deeply in love. He is infatuated with Daisy Buchanan, a married woman whom he met and fell in love with years before the events of the novel. Gatsby’s love for Daisy is one of the driving forces behind his extravagant lifestyle and his relentless pursuit of wealth and success. Despite the challenges and obstacles that stand in their way, Gatsby remains devoted and passionately in love with Daisy throughout the story. His love for her is a central theme in “The Great Gatsby.”

All tools give quite wildly different answers depending on their tuning and strategies, so they are not equivalent. Of course it's easy to cherry-pick instances where one tool performs well and another doesn't, but you should judge the differences between tools on wide benchmarks (we do that ourselves all the time).

So to answer your question: I believe that, to this day, we have better tuning and strategies (still not perfect, but we are improving them every day).


Do you have a paper or documentation for this method? Or is it more proprietary?

Btw, it looks like it gets pretty awesome results.


Can you please share more details about the tuning? Is it a hard-prompting strategy? Do you use prefix tuning or soft prompting? Have you tried other base models?



No internal docs link, sorry. I'm not against releasing some of the docs, methods, and code as open source in the long run, but it's too early for that. In the meantime, I will try to disclose as much as I can here in reply to your questions.


Soft prompting, but the harder part is context selection.


When I requested a multiple-choice exam, the answers came back in a disordered way. There is still a long way to go and room to improve.

I'm of the opinion that until we get large-token models that can take in the entire document, these document agents are a bit hit or miss. The miss part comes when a large 60k-token document references its ideas/concepts throughout its entirety, so there's no good way to minimise the text thrown to the AI. An example of this is a piece of legislation, where it covers a topic in great detail and hence the language used is quite repetitive. For documents that are structured with concepts localised in sections, though, they seem to work well.

Here's an example I struggled with using my own internally built solution

It's true that large context windows remove the need for vector search, but you need to take into account that the same problem we have today with large documents resurfaces when you need to chat with a large collection of documents.

Also, even a context window large enough to contain all the data may benefit from a multi-step (agent) approach. Just as for a human, a task needs to be divided into sub-tasks to succeed.


I suppose it depends on how attention is divided and how the large contexts are structured. Having a super-large input context does not necessarily equal a super-large output, and I think there will be significant trade-offs in terms of accuracy and inference relevance if the reference material for your query is distributed across the input.

I think larger-context models will not be GPT-4 level across the entire text; a billion-token model will be a very different beast compared to a smaller model with full attention over the whole thing. If you want that, the math says you need n² memory and compute. You can do tricks to expand the size, but it will always come at a cost unless you have the compute to cover it.
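The n² claim is easy to make concrete: full self-attention materialises an n×n score matrix per head, so memory for that matrix alone grows quadratically with context length. A back-of-the-envelope calculation (assuming fp16 scores and ignoring everything else a real model stores):

```python
# Rough memory cost of the full self-attention score matrix:
# one n x n matrix per head, quadratic in the number of tokens.

def attention_matrix_bytes(n_tokens: int, n_heads: int = 1,
                           bytes_per_score: int = 2) -> int:
    # fp16 (2-byte) scores assumed; real implementations also hold
    # activations, KV caches, etc., so this is a lower bound.
    return n_tokens ** 2 * n_heads * bytes_per_score

for n in (4_000, 60_000, 1_000_000):
    gib = attention_matrix_bytes(n) / 2**30
    print(f"{n:>9} tokens -> {gib:,.3f} GiB per head")
```

A 60k-token document already needs several GiB per head for the naive score matrix, and a million tokens is in the terabyte range, which is why long-context models rely on approximations (sparse, linear, or windowed attention) that trade away exactly the full-attention quality discussed above.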


Agreed. That kind of reinforces the argument for a multi-step approach: a basic strategy (which we are also experimenting with) is to use a large context window to structure and select content that can then be sent to better models…
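That two-stage idea can be sketched as follows. This is a hypothetical illustration of the strategy, not anyone's production code: both model functions are stubs (a real version would make two API calls, a cheap long-context one to filter and a stronger one to answer), and the filtering rule is a trivial keyword match standing in for the first model's judgment.

```python
# Hypothetical sketch of the two-stage strategy: a cheap large-context
# model filters the document down to relevant passages, then a stronger
# model answers using only the reduced context.

def cheap_large_context_model(document: list, question: str) -> list:
    # Stub: in practice, one long-context call asking the model to
    # return only the passages relevant to the question.
    words = question.lower().split()
    return [p for p in document if any(w in p.lower() for w in words)]

def strong_model(context: list, question: str) -> str:
    # Stub for the expensive, higher-quality model that sees only
    # the selected passages.
    return f"Answer based on {len(context)} selected passage(s)."

document = [
    "gatsby threw lavish parties every weekend.",
    "the weather in west egg was humid.",
    "gatsby loved daisy for five years.",
]
question = "did gatsby love daisy"
selected = cheap_large_context_model(document, question)
print(strong_model(selected, question))
# prints "Answer based on 2 selected passage(s)."
```

The design choice being argued for: the expensive model's quadratic attention budget is spent only on content the cheap model has already judged relevant, rather than on the whole document.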


Agreed. Unless there is a radical redesign of how ML works and neural nets are no longer the thing used, you still have the fundamental problem of very large matrices that need to be analysed.

I see the future (short to medium term) very much being chorus-based: collections of specialist models, each doing its own dedicated task at domain-expert level and communicating with a central controller, covering everything from translation of input language to some agreed standard and back again, to logic, math, physics, creative language, etc., etc…

If that kind of system can be built and trained to work harmoniously with a very large, multi-trillion-token context, I think you basically have AGI at that point, possibly even ASI.
