That said, I’m not sure embedding diagrams and tables as images is the best idea. For tables, it’s probably better to extract the rows and columns, and for diagrams and graphs to extract the underlying function and maybe some representative data (depending on what you intend to use it for). GPT-4V can help with that, but it’s not super reliable.
I’m probably dreaming about the future a bit, but let me explain what I meant by “learning” here.
So basically I have a couple of chemistry books, and I want to optimize certain reactions. Doing that would usually require a person to do a PhD or read the books (including the ones I have).
Or I could somehow figure out a way to use these LLMs to help me out. But RAG is not well suited here, for two reasons: 1) as you mentioned, it’s not the best, and over huge files probably even less useful; 2) RAG’s performance depends on the prompt, since it only fetches the chunks that relate to the prompt (whereas I want the model to have knowledge of the entire books while thinking of an answer).
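To make that second point concrete, here’s a toy sketch of why retrieval is prompt-bound. The bag-of-words similarity is just a stand-in for a real embedding model, and the book chunks are made up:

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": word counts (a real system would use a neural embedding).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical chunks from a chemistry book.
chunks = [
    "palladium catalysts accelerate cross coupling reactions",
    "solvent polarity affects reaction yield and selectivity",
    "the history of alchemy in medieval europe",
]

def retrieve(query, k=1):
    # Return the k chunks most similar to the query -- and ONLY those.
    scored = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
    return scored[:k]

print(retrieve("which catalyst improves coupling yield"))
```

The alchemy chunk is never surfaced unless the prompt happens to mention it, so anything that needs “whole-book” context simply doesn’t make it into the model’s window.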
I tried extracting numerical data and fitting a random forest regressor (works okay-ish) to optimise my reaction, but the data I can generate is far too little to train a deep network that could maybe learn the laws of chemistry from basic reaction data (that might need tons of data).
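For what it’s worth, the kind of setup I mean looks roughly like this; all the conditions and yields below are hypothetical, purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical reaction data: [temperature_C, time_h, catalyst_loading_mol_pct]
# mapped to observed yield_pct (made-up numbers).
X = np.array([
    [60, 2, 1.0],
    [80, 2, 1.0],
    [80, 4, 2.0],
    [100, 4, 2.0],
    [100, 6, 5.0],
    [120, 6, 5.0],
])
y = np.array([35, 52, 61, 70, 78, 74])

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Search a coarse grid of conditions for the highest predicted yield.
grid = np.array([[t, h, c] for t in (60, 80, 100, 120)
                           for h in (2, 4, 6)
                           for c in (1.0, 2.0, 5.0)])
preds = model.predict(grid)
best = grid[np.argmax(preds)]
print("best predicted conditions:", best)
```

The catch is that a random forest like this only interpolates between the handful of conditions I’ve actually run; it has no notion of the underlying chemistry, which is exactly why I’d want a model that has somehow ingested the books.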
I am assuming these models output human-like language, but their knowledge ingestion is probably quite inhuman and not so great.