I have AskYourPdf lately for some pdf documents that are related to math and that were partly wirtten with LaTeX. However the plugins do not seem to catch everything from the pdf. Does anyone know how I could overcome the problem? Obviously I cannot copy paste the pdf and put into the chat, so do you have any suggestions what I could do? Thanks
It’s an ongoing issue without a good solution currently.
I’m doing a Statistics PhD do as you might imagine all of the papers I read are extremely heavy with mathematics.
Of all the plugins I’ve tried, none of them have been able to parse the math environments.
Honestly, your best bet may be to find the paper on arxiv, hope the authors uploaded the LaTeX, convert it to markdown, put it into a vector database and roll your own interface through the API.
I am doing a masters in economics and I also need it for econometrics. And it also does not properly navigate in math environments. It spits out nonsense in some cases.
Yeah, I will maybe try it that way. If you find any god plugin or any other method how to get around the problem, please let me know here.
https://mathpix.com/ does a very good job at this, but it’s not free…
Attaching math as an image works surprisingly well. Just take a picture/screenshot of your equation, problem, or diagram, and let the vision function work its magic. Ilya was recorded saying that vision doubled the accuracy of their GPT model in math tests. I’ve had it solve complex, challenging integrals this way wthout sweat.
However, I would advise against attaching all pages from long and complex documents as images because the OCR funcionality is still unreliable and may miss some steps along the way, especially if page formatting is convoluted, and particularly for those fragments in the middle of the text (e.g.: not at the start or the end).
If you cannot afford to use vision, I would suggest sticking to LaTeX documents as stated earlier by @elmstedt because their structure is designed to be machine readable, and the process of creating a LaTeX document is similar to programming/coding, so this helps with automated AI understanding.
For the most part I agree with your answer.
but am curious as to why convert to markdown?
Is that something you picked up from others?
Something you have used?
Something that you have seen metrics on that note it gives better results?
Can you share a reference or link. I ask because the reference or link may provide more details.
I get the gist of the idea as it is true that starting with LaTeX will give better results for trying to create data of a math expression for use with semantic search or even just recreating the individual Greek letters and math operators, etc. but I would not agree that LaTeX was designed to be machine readable, I would consider that more of a given than a side effect or desired goal.
I cannot yet include links in my replies, but I could send you a DM
Side effect granted. Would not start an argument about that right now.
AI today and vision of the future conference between Jensen Huang (Nvidia) and Ilya Sutskever (OpenAI) the day after GPT4 was released (second half).
As an afterthought, you may also want to try and delve into OpenAI’s Clip research: CLIP: Connecting text and images
Markdown is much more dense than \LaTeX so it uses fewer tokens.
And, while I’m sure the model has seen lots of \LaTeX, it has always (anecdotally) seemed more fluent in Markdown.
Beyond that, a lot of \LaTeX is filled with custom commands that I am not confident in the model handling well even if given the full document.
The link needed more than just a like so that others would know to watch the video.
Just want to add that if you have a latex document, pandoc can be used to convert it to just about anything, including markdown. Only thing ive tried (more than once, unfortunately i always forget) is pdf → anything. It’ll make a pdf just fine, but coming from PDF is always annoying.
Check out Mathpix. This works very well for mathematical documents. Best I have found