Research assistant

I am working on a project to help me with Q&A on research papers. The idea is to use Pinecone for storing vectorized papers. The tool will retrieve documents from a selected Google Drive folder, embed them and store the vectors in Pinecone, and then I will have an interface to ask questions and receive answers. Is anyone interested in a similar project?
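For anyone curious how that pipeline could be wired up, here is a minimal sketch assuming the OpenAI embeddings API and the Pinecone client; the index name, key placeholders, and the shape of the Drive export step are my own assumptions, not an existing implementation:

```python
# Rough sketch of the Drive -> embeddings -> Pinecone -> Q&A flow.
# Assumes `pip install openai pinecone-client`; all names are illustrative only.
import openai
import pinecone

openai.api_key = "YOUR_OPENAI_KEY"
pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")
index = pinecone.Index("research-papers")  # hypothetical index name

def embed(text: str) -> list[float]:
    # text-embedding-ada-002 returns a 1536-dimensional vector
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

def ingest(documents: dict[str, str]) -> None:
    # documents: {doc_id: text chunk}, e.g. produced by a Google Drive export step
    vectors = [(doc_id, embed(text), {"text": text}) for doc_id, text in documents.items()]
    index.upsert(vectors=vectors)

def answer(question: str, top_k: int = 3) -> str:
    # Retrieve the most similar chunks and let GPT-3 answer from them
    matches = index.query(vector=embed(question), top_k=top_k, include_metadata=True)["matches"]
    context = "\n\n".join(m["metadata"]["text"] for m in matches)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    completion = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=400)
    return completion["choices"][0]["text"].strip()
```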

3 Likes

I would be interested in such an idea since I am a med student. I am currently working on a summarizer for MP4 recordings of college sessions, so it's easier for students to capture the important aspects of their classes. My app combined with yours, and vice versa, could make a great team.

1 Like

Let's do it! I have Nelson looking at the project as well. He has built a Google Sheet that can stack up Q&A, and I was thinking that my project combined with his could add some kind of record-keeping of the Q&A.

Adding your idea to it would really make it a powerful tool for students.

Are you using Whisper to transcribe the MP4s and then generating a summary of the transcript?
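If it helps, a transcribe-then-summarize pipeline like that could look roughly like this, assuming the open-source openai-whisper package and a GPT-3 completion for the summary (file name, prompt, and the crude transcript truncation are placeholders):

```python
# Minimal transcribe-then-summarize sketch (assumes `pip install openai-whisper openai` and ffmpeg).
import whisper
import openai

openai.api_key = "YOUR_OPENAI_KEY"

# Whisper accepts audio/video files directly; ffmpeg handles the mp4 decoding.
model = whisper.load_model("base")
transcript = model.transcribe("lecture.mp4")["text"]  # placeholder file name

summary = openai.Completion.create(
    model="text-davinci-003",
    # Crude truncation just to stay under the prompt limit; chunking would be better.
    prompt=f"Summarize the key points of this lecture for a student:\n\n{transcript[:8000]}\n\nSummary:",
    max_tokens=400,
)["choices"][0]["text"].strip()
print(summary)
```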

Perhaps the three of us, Nelson, you, and I, could have a chat to see what we can do? Send me an email at constantin@ostende.nu and I will add Nelson to the conversation if that is OK.

Sent from my iPhone
Constantin

2 Likes

Check out explainpaper.com.

1 Like

This is awesome as a reading assistant! It could be a good feature to have, and it's a good complement to the tool I am working on. The idea of my tool is that you can upload many articles at once and draw answers from several of them.

Sent from my iPhone
Constantin

1 Like

Just curious, what do you mean by drawing?
Obtaining answers for the same question from different sources?
Or requesting the AI model to formulate a response by combining multiple data sources?

The goal is to have the model formulate a response by combining multiple data sources, for example to find a common theme across several sources.

Sent from my iPhone
Constantin

Interesting. I’m working on something which will be capable of doing that, but I’m far from there.
How long are your articles?

Well, I am counting on a median length of 4133 words. How far along are you? I have testing to do now. One problem I have is that the output token budget is too small: the input eats up the tokens, and I need a workaround because the answers come out too short and stop mid-explanation.
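One common workaround is to count the prompt tokens and explicitly reserve part of the context window for the completion, trimming the retrieved context to fit. A sketch with tiktoken, where the window size, reservation, and model name are assumptions:

```python
# Sketch: reserve output tokens by trimming the context to fit the model window.
import tiktoken
import openai

enc = tiktoken.encoding_for_model("text-davinci-003")
MODEL_WINDOW = 4097        # approximate text-davinci-003 context size
RESERVED_FOR_ANSWER = 700  # how much room we want the answer to have

def build_prompt(question: str, chunks: list[str]) -> str:
    # Leave room for the answer, the question, and ~50 tokens of template text.
    budget = MODEL_WINDOW - RESERVED_FOR_ANSWER - len(enc.encode(question)) - 50
    context, used = [], 0
    for chunk in chunks:  # chunks assumed pre-ranked by relevance
        n = len(enc.encode(chunk))
        if used + n > budget:
            break
        context.append(chunk)
        used += n
    return "Context:\n" + "\n\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"

def ask(question: str, chunks: list[str]) -> str:
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=build_prompt(question, chunks),
        max_tokens=RESERVED_FOR_ANSWER,  # the answer no longer gets squeezed out
    )
    return resp["choices"][0]["text"].strip()
```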

Sent from my iPhone
Constantin

A task similar to the one you are trying to complete is part of a bigger project I am working on, and I am probably about 3 months away.

I was curious about your use case to learn more about what people are trying to achieve with the help of GPT-3.

Do you use Google Colab for your project?

I have some ideas that I want to try out. One idea is to prepare the model with questions related to conducting a literature review and then concatenate the answers to each question into new, separate input data.

So the process could look something like this (a rough sketch of steps 2 to 4 follows the list):

  1. Upload PDFs to pinecone.io for vectorization.
  2. Work through the first question for each article, approving the generated answer or declining it to generate a new one. Then the second question, the third, and so on.
  3. Now there will be new input data related to each of the questions.
  4. Start asking questions and draw answers from the new data set that was created.
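A rough sketch of steps 2 to 4, where answer_from_article(question, article_id) is a placeholder for whatever Pinecone retrieval + GPT-3 call is used per article (nothing here reflects an existing codebase):

```python
# Sketch of steps 2-4: per-article answers with approve/decline, building a new Q&A dataset.

REVIEW_QUESTIONS = [
    "What is the main research question?",
    "What methodology is used?",
    "What are the key findings?",
]

def collect_reviewed_answers(article_ids, answer_from_article):
    dataset = []  # the new input data, one record per (question, article)
    for question in REVIEW_QUESTIONS:
        for article_id in article_ids:
            while True:
                answer = answer_from_article(question, article_id)
                verdict = input(f"[{article_id}] {question}\n{answer}\nApprove? (y/n) ")
                if verdict.lower().startswith("y"):
                    dataset.append({"article": article_id, "question": question, "answer": answer})
                    break  # approved; move on. Declining loops and regenerates.
    return dataset

# The returned dataset can then be embedded and queried like the original PDFs (step 4),
# so later questions draw on the curated answers instead of the raw articles.
```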

Do you want to collaborate on something?

Sent from my iPhone
Constantin

I'm not using Google Colab. What I'm working on is a larger commercial product, and it covers some of the use cases you've mentioned, more or less.
Right now it's not at a stage where it can be tested, but I plan to share it on this forum when appropriate and invite people to give it a try for testing purposes.

I'm interested in the use cases so I can build something useful; this is where I can collaborate.

We are doing something extremely similar for almost the exact same use case in our application BookMapp - Search Semantically Across Multiple PDFs.

Sign up here: Login - Knowledge Graph (bookmapp.com)

We go further than this to also create a multidimensional graph of relatedness. Please give this a try. Feel free to DM if more details are needed.

1 Like

Hi!

I built an app for this called bundleIQ.com and took it a few steps further with an AI-powered Chrome extension and editor that help you write papers while correlating research. I would love to work with you on hooking it into Google Drive; we haven't done that yet.

Check out this Loom video for example - Unlocking Climate Change Insights with bundleIQ - YouTube.

We built a project that accomplishes this goal. Please check it out and share any thoughts or feedback: https://www.humata.ai/

1 Like

We should work together, because I know exactly how to make it into a tool that every researcher (academic and business) will want! So far I have actually created something complementary to what you have built: I can upload several articles and analyze them together. If you are up for having a chat and working together, give me a sign! Cheers

@danrasmuson - That looks quite impressive on a first try.

If I wanted to use it as part of another project, is this a commercial product available for licensing? Or is the code available as open source?

Also, I see it is limited to 60 pages. Is that due to server costs? ChatGPT costs? Have you considered making this a subscription product to help cover those costs? Or open source, so users who want to submit longer documents could do so at their own cost?

Hi @rkaplan.

Thank you for the feedback! We plan to increase the 60-page limit soon; its main job is to keep our costs low while the product is free.

A subscription seems likely to me, but we haven’t decided yet.

We are open to releasing Humata.ai as an API so it can be integrated into another project. If you are open to it, could you describe your use case a bit?

Thanks.

My use case is to try out the concept of "GPT publishing", i.e. to write a monograph on a specific specialty in medicine and use it as a knowledge base on the topic for ChatGPT. The responses from ChatGPT would then be reliable, since the source of its information would be known.

So basically I would create a website that is a reliable chatbot on a specific topic. In my case I would focus on medicine and develop an authoring template that maximizes the utility of ChatGPT.

Secondarily, I am pondering the analysis of medical records, i.e. for a physician to review a patient's past medical records and quickly and easily retrieve information about past medical history. HIPAA privacy laws in the USA restrict this somewhat; having an API so most of the records can remain local on the user's computer would be helpful. The local software would also have to make sure any identifying information is not sent to the Humata or GPT servers. So it is a very useful idea, but a bit more complex than the first.
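On keeping identifiers local, the usual pattern is to de-identify on the user's machine before anything leaves it. The toy sketch below only scrubs a few obvious patterns with regular expressions and is nowhere near full HIPAA de-identification; the patterns and the send_to_api callback are purely illustrative:

```python
# Toy local de-identification sketch: scrub a few obvious identifiers before any API call.
# Real HIPAA de-identification requires far more than this (names, addresses, all 18 identifier types).
import re

PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),            # social security numbers
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),  # medical record numbers
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),     # simple date formats
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),        # email addresses
]

def redact(text: str) -> str:
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

def ask_about_record(record_text: str, question: str, send_to_api) -> str:
    # Only the redacted text ever leaves the local machine.
    return send_to_api(prompt=f"{redact(record_text)}\n\nQuestion: {question}")
```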

ChatGPT is quite capable of doing this currently when small portions of a medical record are used, i.e. up to the current 4000 token limit.

Separate question: I tried using the file in the Dropbox link with your software, and I get the error below any time I ask a question. Any idea why this is happening?

Hi @rkaplan. I think that is a great idea. The integration of generative AI into a publishing workflow seems inevitable.

In order for Humata to prove useful for your use case, we would need to expose a flexible API to your application. This isn't yet on our short-term roadmap.

Hope that helps!