I am new here. I am trying to use our repository of paper data to train the bot to answer more topic-specific questions. I saw that it is not possible to load PDFs directly and that the data has to be provided as JSON instead. However, before I start on this journey, which will certainly take weeks to optimize, I want to be sure that this is the right way to go.

I want to use the repository of peer-reviewed papers to fine-tune the IPA bot so that it gives more insightful answers about how proteins interact with other proteins mentioned in the user's question. Is that even possible?

It looks like embeddings will be a much better approach than fine-tuning for your use case.
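To make the idea concrete, here is a rough sketch of what an embeddings-based setup does: each passage from the papers is turned into a vector, the question is turned into a vector too, and the closest passages are pasted into the prompt as context. Everything in it is an assumption for illustration. `embed` is a toy word-count stand-in for a real embedding model, the passages are made-up placeholder text, and `top_passages` and `build_prompt` are hypothetical helper names, not part of any library.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_passages(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Assemble a prompt that puts the retrieved passages in front of the question."""
    context = "\n".join("- " + p for p in top_passages(question, passages, k))
    return (
        "Answer using only these excerpts:\n"
        + context
        + "\n\nQuestion: " + question
    )


# Made-up placeholder passages; in practice these would be chunks of the papers.
passages = [
    "PROT1 binds PROT2 in the placeholder assay.",
    "Samples were stored in placeholder buffer overnight.",
    "PROT3 shows no binding to PROT2 in the placeholder assay.",
]

question = "How does PROT1 interact with PROT2?"
print(build_prompt(question, passages, k=2))
```

With a real embedding model in place of `embed`, the rest of the flow stays the same: the papers only need to be split into passages and embedded once, and no model weights are changed, which is why this tends to fit "answer from my documents" better than fine-tuning.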

A lot of projects have also been launched lately that enable question answering based on provided documents.

