Novel way to chat with one or many of your PDF documents: a multi-step agent approach

clb · July 11, 2023, 7:02am

Hello everyone, I’ve developed a novel way to interact with your PDF documents!

Existing solutions can be frustrating, so we’ve built a deeper, context-aware conversational approach that goes beyond plain vector search. I believe superior solutions are possible with a more nuanced, multi-step agent approach. We are using gpt-3.5-turbo for that.

Give it a try, it’s free, and let me know what you think: docAnalyzer.ai

You can also follow us on Twitter or join our Discord.

Edit (July 19): it now also works with more than one document at a time.

clb · July 11, 2023, 7:39am

Well Eric, the tool was built using the OpenAI API, so I think it’s very much relevant. There are other active posts like mine. I’ve edited the post to add that it uses OpenAI.

clb · July 11, 2023, 8:02am

I didn’t do an extensive search, but there are at least all the ones listed under this post as Related Topics.

damc4 · July 11, 2023, 8:49am

As a user of the OpenAI community, I personally find value in posts like this. I’m happy to see what tools people create and to use them if they are good. I think topics like this should stay.

brettpthomas · July 11, 2023, 4:11pm

@clb do you have any technical details on how you did this as a multi-step agent?

clb · July 11, 2023, 6:50pm

Hey, thanks for asking. Yes, I can tell you a bit more.

- Pre-analysis task(s) (one or more OpenAI calls, depending on the nature of the document): defining the document type, pre-selecting future search strategies, and caching the results. Also embedding, of course.
- Per-question task(s): refining the strategy depending on the conversation (question/answer history up to depth x). Vector search is only one of those strategies.

tiwale · July 11, 2023, 10:10pm

What exactly does your tool do that the others don’t? You ask questions about a PDF and get replies.

clb · July 12, 2023, 5:44am

Asking questions and getting replies is indeed the easy part.

As is often the case, the devil is in the details, and in the type of replies you get. To give a simple example: if you ask “give me quotes that relate to love” about a book, the strategy of searching a vector index, returning the matching chunks, and sending them to OpenAI works quite well. In many other cases it doesn’t work so well. The chunks themselves may need to be selected in a more “intelligent” way.
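For comparison, here is a sketch of the baseline strategy described above (rank chunks by similarity to the question and keep the top k), next to one well-known way of selecting chunks more deliberately: Maximal Marginal Relevance (MMR), which penalizes chunks that repeat what has already been selected. MMR is a swapped-in, published technique used only for illustration; the post does not say docAnalyzer.ai uses it. `embed` is a toy bag-of-words stand-in for a real embedding model (assumption).

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(question: str, chunks: list[str], k: int) -> list[int]:
    """Baseline: indices of the k chunks most similar to the question."""
    q = embed(question)
    scores = [cosine(q, embed(c)) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]


def mmr(question: str, chunks: list[str], k: int, lam: float = 0.5) -> list[int]:
    """Maximal Marginal Relevance: each pick weighs relevance to the question (lam)
    against similarity to chunks already picked (1 - lam)."""
    q = embed(question)
    vecs = [embed(c) for c in chunks]
    rel = [cosine(q, v) for v in vecs]
    selected: list[int] = []
    candidates = list(range(len(chunks)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


chunks = [
    "gatsby is in love with daisy buchanan",
    "gatsby is deeply in love with daisy buchanan",  # near-duplicate of chunk 0
    "daisy is married to tom",
    "west egg is on long island",
]
question = "is gatsby in love with daisy"
print(top_k(question, chunks, 2))  # [0, 1]
print(mmr(question, chunks, 2))    # [0, 2]
```

Top-k spends both slots on near-duplicates, while MMR swaps the second one for a chunk that adds new information. The weight `lam` matters: at `lam=1.0` MMR reduces to plain top-k ranking.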

But the best way is to try it yourself. Here is a small example: upload the full text of “The Great Gatsby” and compare the replies to the question “Who is Jay?” (1), then “Is he in love?” (2).

pdf.ai replies:

Jay Gatsby is a character in the document. He is described as a young man who invented a persona for himself and is faithful to that conception until the end. He is portrayed as someone who came from a poor background but aspires to be successful and wealthy.
Yes, he is in love.

chatpdf.com:

There are multiple characters named Jay mentioned in the PDF file. Could you please provide more context or information about which Jay you are referring to?
If you are referring to Jay Gatsby from “The Great Gatsby” mentioned in 74, then yes, he is in love with Daisy Buchanan.

docAnalyzer.ai:

Jay, also known as Jay Gatsby, is a central character in the book “The Great Gatsby” by F. Scott Fitzgerald. He is a mysterious and wealthy man who throws lavish parties at his mansion in West Egg, Long Island. Jay Gatsby is known for his extravagant lifestyle and his pursuit of Daisy Buchanan, the woman he loves. Throughout the novel, Gatsby’s true identity and his past are gradually revealed, showcasing his complex and enigmatic nature.
Yes, Jay Gatsby is deeply in love. He is infatuated with Daisy Buchanan, a married woman whom he met and fell in love with years before the events of the novel. Gatsby’s love for Daisy is one of the driving forces behind his extravagant lifestyle and his relentless pursuit of wealth and success. Despite the challenges and obstacles that stand in their way, Gatsby remains devoted and passionately in love with Daisy throughout the story. His love for her is a central theme in “The Great Gatsby.”

All the tools give quite different answers depending on their tuning and strategies, so they are not equivalent. Of course, it’s easy to cherry-pick instances where one tool performs well and the others don’t, so you should judge the differences between tools on broad benchmarks (we run those ourselves all the time).

So, to answer your question, I believe that, as of today, we have better tuning and strategies (still not perfect, but we are improving them every day).
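A very small version of a cross-tool comparison could look like the harness below. It is a hypothetical sketch, not the benchmark mentioned above: each tool is modelled as a function from question to answer, and the crude “key facts mentioned” score is an assumption. A serious evaluation would use graded or reviewer judgments and would keep the conversation history for follow-up questions such as “Is he in love?”.

```python
from typing import Callable

# Hypothetical benchmark format: (question, key facts a good answer should mention).
Benchmark = list[tuple[str, list[str]]]


def keyword_recall(answer: str, key_facts: list[str]) -> float:
    """Fraction of expected key facts that appear (case-insensitively) in the answer."""
    text = answer.lower()
    hits = sum(1 for fact in key_facts if fact.lower() in text)
    return hits / len(key_facts) if key_facts else 1.0


def evaluate(tools: dict[str, Callable[[str], str]], bench: Benchmark) -> dict[str, float]:
    """Average keyword recall per tool over every benchmark question."""
    results = {}
    for name, ask in tools.items():
        scores = [keyword_recall(ask(q), facts) for q, facts in bench]
        results[name] = sum(scores) / len(scores)
    return results


bench = [("Who is Jay?", ["Gatsby", "West Egg"]), ("Is he in love?", ["Daisy"])]
tools = {  # fake tools standing in for real services
    "terse": lambda q: "Yes, he is in love.",
    "detailed": lambda q: "Jay Gatsby of West Egg loves Daisy Buchanan.",
}
print(evaluate(tools, bench))  # {'terse': 0.0, 'detailed': 1.0}
```

The value of such a harness comes from the size and variety of the question set; on two questions, as here, it is exactly the kind of cherry-picking the post warns about.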

codie · July 12, 2023, 6:12am

Do you have a paper or documentation for this method? Or is it more proprietary?

Btw, it looks like it gets pretty awesome results.

churqing · July 12, 2023, 9:52am

Can you please share more details about the tuning? Is it a hard-prompting strategy? Do you use prefix tuning or soft prompting? Have you tried other base models?

clb · July 12, 2023, 11:45am

Thanks.

Sorry, no link to internal docs. I’m not against releasing some of the docs, methods, and code as open source in the long run, but it’s too early for that. In the meantime, I’ll try to disclose as much as I can here in answer to your questions.

clb · July 12, 2023, 5:26pm

Soft prompting, but the harder part is context selection.
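Since context selection is the hard part, here is one simple, generic way to fit selected chunks into a limited prompt: take the highest-scoring chunks greedily until a token budget is used up, then put them back in document order. This is a sketch for illustration only; the relevance scores are assumed to come from an earlier step (for example similarity search), and word count is a crude stand-in for a real tokenizer.

```python
def pack_context(chunks: list[str], scores: list[float], budget_tokens: int) -> str:
    """Greedily keep the highest-scoring chunks that fit the budget, then
    re-emit them in original document order so the model sees coherent text."""

    def n_tokens(text: str) -> int:
        return len(text.split())  # crude proxy (assumption), not a real tokenizer

    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    chosen: list[int] = []
    used = 0
    for i in order:
        cost = n_tokens(chunks[i])
        # Skip (rather than stop at) chunks that don't fit, so smaller
        # lower-scoring chunks can still use the leftover budget.
        if used + cost <= budget_tokens:
            chosen.append(i)
            used += cost
    return "\n\n".join(chunks[i] for i in sorted(chosen))
```

With scores `[0.9, 0.1, 0.8, 0.5]` for chunks of 3, 2, 4 and 1 words and a budget of 5, it keeps chunks 0 and 3: chunk 2 scores higher than chunk 3 but no longer fits.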

herbalcas · July 14, 2023, 11:21pm

When I asked for a multiple-choice exam, the answers were presented in a jumbled order. There is still a long way to go and a lot to improve.

ddrechsler · July 18, 2023, 2:05am

I’m of the opinion that, until we get large-context models that can take in the entire document, these document agents are a bit hit or miss. The misses come when a large (say, 60k-token) document references its ideas and concepts throughout its entirety, so there’s no good way to minimise the text sent to the AI. An example is a piece of legislation, which covers a topic in great detail and therefore uses quite repetitive language. For documents structured so that concepts are localised in particular sections, though, they seem to work well.
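To put numbers on the 60k-token case, the sketch below lists the overlapping windows needed to cover a document. The window sizes in the example (4,096 and 16,384 tokens, roughly the gpt-3.5-turbo and gpt-3.5-turbo-16k limits of mid-2023) and the 256-token overlap are illustrative assumptions, and in practice part of each window is also needed for instructions and the answer. When a concept is spread across all of the windows, no small subset of them captures it, which is the failure mode described above.

```python
def windows(total_tokens: int, window: int, overlap: int) -> list[tuple[int, int]]:
    """(start, end) token offsets of overlapping windows covering [0, total_tokens)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    stride = window - overlap  # how far each window advances past the previous one
    spans = []
    start = 0
    while True:
        end = min(start + window, total_tokens)
        spans.append((start, end))
        if end >= total_tokens:
            return spans
        start += stride
```

With a 4,096-token window and 256 tokens of overlap, a 60,000-token document needs 16 windows (the last one is `(57600, 60000)`); with 16,384 tokens it still needs 4.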

Here’s an example I struggled with using my own internally built solution:

clb · July 19, 2023, 12:31pm

It’s true that large context windows remove the need for vector search, but keep in mind that the same problem we have today with large documents resurfaces when you need to chat with a large collection of documents.

Also, even a context window large enough to contain all the data may benefit from a multi-step (agent) approach. As with humans, a task is better accomplished when it is divided into sub-tasks.
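One generic way to divide a question into sub-tasks is a map-reduce pass: ask the question of each section separately, drop sections that report nothing relevant, then make one final call that combines the partial answers. This sketch only illustrates the idea and is not docAnalyzer.ai’s pipeline; `llm` stands for any prompt-to-text function (hypothetical), and the prompts and the `NONE` convention are assumptions.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any function from prompt to reply text


def map_reduce_answer(llm: LLM, question: str, sections: list[str]) -> str:
    """Answer a question over many sections by splitting it into sub-tasks."""
    # Map: one focused sub-task per section; irrelevant sections reply NONE.
    partials = []
    for i, section in enumerate(sections):
        reply = llm(
            f"Using only this excerpt, answer: {question}\n"
            f"If it is not covered, reply NONE.\n\nExcerpt {i + 1}:\n{section}"
        )
        if reply.strip().upper() != "NONE":
            partials.append(reply.strip())
    if not partials:
        return "The document does not appear to answer this question."
    # Reduce: a final sub-task merges the partial answers into one reply.
    notes = "\n".join(f"- {p}" for p in partials)
    return llm(f"Combine these notes into one answer to: {question}\n{notes}")
```

The map calls are independent, so they can run in parallel, and each works with a short, focused context; the price is one call per section plus the final combining call.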

Foxalabs · July 19, 2023, 12:41pm

I suppose it depends on how attention is divided and how the large contexts are structured. Having a super-large input context does not necessarily mean a super-large output, and I think there will be significant trade-offs in accuracy and inference relevance if the reference material for your query is distributed across the input.

I think larger-context models will not perform at GPT-4 level across the entire text; that is, a billion-token model will be a very different beast from a smaller model with full attention over the whole thing. If you want full attention, the math says you need n^2 memory and compute for n tokens. You can use tricks to expand the size, but it will always come at a cost unless you have the compute to cover it.
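For standard dense self-attention, the arithmetic behind the n^2 remark is short: with n tokens and head dimension d, the score matrix S has one entry per pair of tokens. Going from a 4,096-token to a 32,768-token window (8× longer) multiplies the size of that matrix by 64; if it is materialized in 16-bit floats, it grows from 32 MiB to 2 GiB per head, per layer. The window sizes are illustrative, not any specific model’s.

```latex
\begin{aligned}
Q, K &\in \mathbb{R}^{n \times d}, \qquad S = \frac{QK^{\top}}{\sqrt{d}} \in \mathbb{R}^{n \times n} \\
\text{entries of } S &= n^2, \qquad \text{cost of } QK^{\top} = O(n^2 d) \\
\frac{32768^2}{4096^2} &= 8^2 = 64 \\
4096^2 \times 2\ \text{bytes} &= 32\ \text{MiB}, \qquad 32768^2 \times 2\ \text{bytes} = 2\ \text{GiB}
\end{aligned}
```

Memory-efficient kernels such as FlashAttention avoid storing S in full, but the number of multiply-adds still grows as n^2, which is the kind of cost described above.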

clb · July 19, 2023, 12:45pm

Agreed. That kind of reinforces the argument for a multi-step approach: a basic strategy (which we are also experimenting with) is to use a large-context-window model to structure and select content that can then be sent to better models.
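The two-stage strategy described here could be wired up roughly as follows: a cheaper long-context model sees every section and names the relevant ones, and only those are passed to a stronger model. This is a hypothetical sketch, not the strategy’s actual implementation: `selector` and `answerer` are placeholder prompt-to-text functions, and the “comma-separated numbers” reply format is an assumption. The parsing simply ignores anything that is not a valid section number.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical: prompt in, reply text out


def two_stage_answer(selector: Model, answerer: Model, question: str,
                     sections: list[str]) -> str:
    """Stage 1: a long-context model sees all sections and names the relevant ones.
    Stage 2: a stronger model answers using only those sections."""
    catalog = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(sections))
    picked = selector(
        f"List the numbers of the sections needed to answer: {question}\n"
        f"Reply with comma-separated numbers only.\n\n{catalog}"
    )
    ids: list[int] = []
    for token in picked.split(","):
        token = token.strip()
        # Keep only valid, in-range, not-yet-seen section numbers.
        if token.isdigit() and int(token) < len(sections) and int(token) not in ids:
            ids.append(int(token))
    context = "\n\n".join(sections[i] for i in sorted(ids))  # restore document order
    return answerer(f"Context:\n{context}\n\nQuestion: {question}")
```

The stronger model’s prompt stays small regardless of document size; the quality of the answer then depends on how reliably stage 1 picks sections.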

Foxalabs · July 19, 2023, 12:50pm

Agreed. Unless there is a radical redesign of how ML works, and neural nets are no longer what’s used, you still have the fundamental problem of very large matrices that need to be processed.

I see the future (short to medium term) as very much chorus-based: collections of specialist models, each doing its own dedicated task at domain-expert level and communicating with a central controller. That covers everything from translating the input language into some agreed standard and back again, to logic, math, physics, creative language, etc.

If that kind of system can be built and trained to work harmoniously with a very large, multi-trillion-token context, I think you basically have AGI at that point, possibly even ASI.
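As a toy illustration of the controller-plus-specialists idea, here is a dispatcher that routes each sub-task to a registered specialist and falls back to a general model otherwise. Everything in it is hypothetical: specialists are plain functions, and routing by keyword is a placeholder for the learned routing such a system would actually need.

```python
from typing import Callable

Specialist = Callable[[str], str]  # hypothetical: task text in, result text out


class Controller:
    """Toy central controller that routes each sub-task to a registered specialist."""

    def __init__(self, fallback: Specialist) -> None:
        self.routes: list[tuple[tuple[str, ...], Specialist]] = []
        self.fallback = fallback

    def register(self, keywords: tuple[str, ...], specialist: Specialist) -> None:
        """Register a specialist; earlier registrations take priority."""
        self.routes.append((keywords, specialist))

    def handle(self, task: str) -> str:
        lowered = task.lower()
        for keywords, specialist in self.routes:
            if any(k in lowered for k in keywords):
                return specialist(task)
        return self.fallback(task)  # no specialist matched
```

For example, registering a `("sum",)` specialist that adds the numbers in a task and a `("translate",)` specialist would send “sum 2 and 3” to the first, “Translate hola” to the second, and “write a poem” to the fallback.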

