There are many solutions for extracting text from PDFs. However, as far as I know, nothing is available that extracts technical expressions (math, proofs, typing rules, etc.) from a PDF in a form an AI can understand, so that one could chat with the PDF about those expressions.
Great question,
I've puzzled over this for a long time myself.
I would always advise you to try to obtain the source files for the document (usually a .tex file).
If that isn't possible, you can use an image-to-text AI to convert it to LaTeX, but this isn't always reliable. The only service I've tried that could do this reliably is Mathpix:
Mathpix is a paid service, but it has an API that accepts PDFs and converts them to LaTeX, math included.
I would be very interested if anyone knows of an open source equivalent that does the same.
I did a bit of searching, and found this project:
https://lukas-blecher.github.io/LaTeX-OCR/
I haven't tested it myself, but it looks promising.
Microsoft's Math Solver app can do formula OCR on Android and iOS, but I don't know if it can be used as part of a toolchain.
While searching for Microsoft's Math Solver I found this
but it does not expose the converted image for use with an API call. Can you connect the dots if I missed something?
Side note: Your private message response is still the closest I have seen to an answer for my other question. Since you noted it, you should get the credit in this post by adding it as a reply.
Took me a bit to find the code. For others seeking the code for LaTeX-OCR
Sure, the link is
Which is an OCR to Latex conversion system with a personal use option.
On the Microsoft one, I wondered if perhaps the AI/OCR system used in that app was one of the sub components offered by the Azure ML suite.
I know you'll be after open source, but ABBYY FineReader is great in our experience for a variety of tough OCR extraction problems and is easy to integrate into your workflows.
Took a look at the ABBYY FineReader site. While it can convert a PDF to text, I did not see that it could convert images containing technical expressions into a format that an AI can understand.
Example of a technical expression, in this specific case typing rules
Did I miss something on the site?
I'm sorry, but we have no experience extracting typing rules with ABBYY. I would suggest contacting them to confirm. It is very good at extracting from image PDFs.
Here's another link for the inf-project; it seems to be paid as well, though.
Yes, typing rules are used by an exclusive group.
Here is another example of a technical expression more familiar to many.
(ref)
While that is an image of a fraction, the representation is in MathML
<mfrac>
<mrow>
<mn>3</mn>
<mi>x</mi>
<mo>−<!-- − --></mo>
<mn>2</mn>
<mi>x</mi>
<mi>y</mi>
<mo>+</mo>
<mi>c</mi>
</mrow>
<mrow>
<mi>y</mi>
<mo>−<!-- − --></mo>
<mn>1</mn>
</mrow>
</mfrac>
which has enough metadata such that an AI should be able to understand that it is a fraction.
Passing the MathML fraction above to ChatGPT alone shows that ChatGPT May 24 Version understands it
Prompt
<mfrac>
<mrow>
<mn>3</mn>
<mi>x</mi>
<mo>−<!-- − --></mo>
<mn>2</mn>
<mi>x</mi>
<mi>y</mi>
<mo>+</mo>
<mi>c</mi>
</mrow>
<mrow>
<mi>y</mi>
<mo>−<!-- − --></mo>
<mn>1</mn>
</mrow>
</mfrac>
Completion
The expression you provided is a fraction. Let's break it down and simplify it step by step.
The numerator of the fraction is: 3x - 2xy + c
The denominator of the fraction is: y - 1
So the fraction can be written as:
(3x - 2xy + c) / (y - 1)
Since there is no specific instruction regarding what you would like to do with this expression, I assume you want to simplify it or perform some operations. Please let me know what specific operation or simplification you would like me to assist you with.
Eric's example in LaTeX:
\frac{3x-2xy+c}{y-1}
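The MathML and LaTeX forms of this fraction carry the same structure, so one can be derived from the other mechanically. A minimal sketch, assuming only the handful of presentation elements used in the example (mfrac, mrow, mi, mn, mo); the names MATHML and mathml_to_latex are made up for illustration and this is not a general MathML converter:

```python
import xml.etree.ElementTree as ET

# The fraction from the example above; &#x2212; is the Unicode minus sign.
MATHML = """
<mfrac>
  <mrow>
    <mn>3</mn><mi>x</mi><mo>&#x2212;</mo><mn>2</mn><mi>x</mi><mi>y</mi><mo>+</mo><mi>c</mi>
  </mrow>
  <mrow>
    <mi>y</mi><mo>&#x2212;</mo><mn>1</mn>
  </mrow>
</mfrac>
"""


def mathml_to_latex(node):
    """Convert a tiny subset of presentation MathML to a LaTeX string."""
    if node.tag == "mfrac":
        # An mfrac has exactly two children: numerator, then denominator.
        numerator, denominator = list(node)
        return ("\\frac{" + mathml_to_latex(numerator) + "}{"
                + mathml_to_latex(denominator) + "}")
    if node.tag == "mrow":
        return "".join(mathml_to_latex(child) for child in node)
    if node.tag in ("mi", "mn", "mo"):
        text = (node.text or "").strip()
        # Map the Unicode minus sign to the ASCII hyphen LaTeX expects.
        return "-" if text == "\u2212" else text
    raise ValueError(f"unsupported MathML element: {node.tag}")


latex = mathml_to_latex(ET.fromstring(MATHML))
print(latex)
```

Anything outside that subset (superscripts, roots, tables) raises an error instead of being guessed at, which seems the safer default when the output is going to be fed to an LLM.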
Found this fairly interesting bit of OpenCV code
I'm not convinced it will be able to cope with complex notation, but throwing it in here as a starting point for someone else.
It was interesting waking up to this.
I've been looking at this very issue over the past few months, and I've tried nearly all of the solutions listed here.
My conclusion is that we're just not there yet. It's pretty close in many ways, but there is no single solution that is a one-stop shop for reading technical PDFs.
Mathpix is the best for extracting information from equation and inline environments, but it can be extremely costly to process a large number of pages and it doesn't handle other environments (algorithm) particularly well.
Even then, you still need to review the work that comes out of Mathpix because it's only about 90% correct (depending on complexity of course).
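Part of that review can be automated with cheap structural checks before a human looks at the output. The sketch below is not tied to Mathpix or any other tool; latex_issues is a hypothetical helper that only flags unbalanced braces and mismatched \begin/\end pairs in a LaTeX string, so passing it says nothing about whether the math itself was recognized correctly:

```python
import re


def latex_issues(src):
    """Return a list of structural problems found in a LaTeX string."""
    issues = []

    # Pass 1: brace balance, skipping escaped characters such as \{ and \}.
    depth = 0
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                issues.append(f"unmatched '}}' at offset {i}")
                depth = 0
        i += 1
    if depth > 0:
        issues.append(f"{depth} unclosed '{{'")

    # Pass 2: every \end{name} must close the most recent open \begin{name}.
    stack = []
    for kind, name in re.findall(r"\\(begin|end)\{([^}]*)\}", src):
        if kind == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            issues.append(f"\\end{{{name}}} without matching \\begin")
    issues.extend(f"\\begin{{{name}}} never closed" for name in stack)
    return issues


print(latex_issues(r"\frac{3x-2xy+c}{y-1}"))
print(latex_issues(r"\frac{3x-2xy+c}{y-1"))
```

Pages that fail a check like this can be sent for manual review first, which helps when the page count is large.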
I suspect we will see a big push in the open-source space over the next year or two as more people want to feed technical papers into LLMs.
I emailed sciaccess a few months ago to see if they offered an academic discount, and never got a reply.
Didn't inspire confidence.
Good point, and one that should be seriously considered when trying to understand technical papers.
Also, for others who did not notice the jobs posting on arXiv, it seems they are expanding quite a bit.
Here is an example of a PDF with typing rules available on arXiv.
"GPC: A Pattern Calculus for Property Graphs" by Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, Domagoj Vrgoč
It is available in a few different formats.
The typing rules being demonstrated are in Figure 2.
Text source of typing rules
This is from file 2210.16580, which is downloaded by clicking Download source on the Format selector page.
\begin{figure*}\centering
\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}
%\custominference{}{\nodelit{x\,[:\ell]} \vdash x:\tnode}
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
%\custominference {}{\arclit{x\, [:\ell]}\ \vdash x:\tedge}
\custominference {}{\arclit{x}\ \vdash x:\tedge}
\hfil
\custominference {}{\arclit{x :\ell}\ \vdash x:\tedge}
\hfil
\custominference{ x \notin \var(\pat)}{x=\restrictor\ \pat \vdash x:\tpath}
%\custominference{\pat \vdash x:\tau \quad p \notin \var(\pat)}{p=\pat \vdash p:\tpath,\ x:\tau}
\bigskip
\custominference {\pat\vdash z:\tau}{\pat\quantlit{n..m} \vdash z:\tlist(\tau)}
\hfil
\custominference {\pat\vdash z:\tau}{\restrictor \pat \vdash z:\tau}
%\hfil
%\custominference{\pat \vdash z:\tau \quad z\neq x}{x=\pat \vdash z:\tau}
\hfil
\custominference{\pat \vdash z:\tau \quad z\neq x }{ x= \restrictor \pat \vdash z:\tau}
\bigskip
\custominference {\pat\vdash x:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat \vdash x.a=c:\tbool} \hfil
\custominference {\pat\vdash x:\tau\quad \pat\vdash y:\tau'\quad\tau,\tau'\in\set{\tnode,\tedge}}{\pat \vdash x.a=y.b:\tbool}
\bigskip
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\wedge\theta':\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\vee\theta':\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool}{\pat \vdash \neg\theta:\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash z:\tau}{\pat\condlit{\theta} \vdash z:\tau}
\bigskip
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tmaybe(\tau)}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\hfil
\custominference {\pat_1\vdash z:\tmaybe(\tau) \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\bigskip
\custominference {\pat_1\vdash z:\tau \quad z \notin \var(\pat_2) }{\pat_1 + \pat_2 \vdash
z: \tau?}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \notin \var(\pat_1) }{\pat_1 + \pat_2 \vdash z: \tau?}
\bigskip
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat_1 \, \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad z \not\in\var(\pat_2)}{\pat_1 \, \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \not\in\var(\pat_1)}{\pat_1 \, \pat_2 \vdash z:\tau}
\bigskip
\custominference {Q_1\vdash z:\tau \quad Q_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{Q_1 , Q_2 \vdash z:\tau}
\hfil
\custominference{Q_1\vdash z:\tau \quad z\not\in\var(Q_2) }{Q_1, Q_2 \vdash z:\tau}
\hfil
\custominference{Q_2\vdash z:\tau \quad z\not\in \var(Q_1) }{Q_1, Q_2 \vdash z:\tau}
\caption{Typing rules for the \gpml type system.}
\label{fig:type-system}
\end{figure*}
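With the source in hand, the premises and conclusion of each rule can be pulled out mechanically before anything is sent to an LLM. A minimal sketch, using a three-rule excerpt of the figure above; read_group and extract_rules are illustrative names, whole-line % comments are the only comments handled, and escaped braces (\{, \}) are assumed not to occur:

```python
import re

# Excerpt of the figure source: one commented-out rule and three live ones.
TEX = r"""
\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}
%\custominference{}{\nodelit{x\,[:\ell]} \vdash x:\tnode}
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
\custominference {\pat\vdash z:\tau}{\restrictor \pat \vdash z:\tau}
"""


def read_group(text, pos):
    """Return (content, index after '}') for the next {...} group at or after pos."""
    start = text.index("{", pos)
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


def extract_rules(tex):
    """Return a list of (premises, conclusion) pairs, one per \\custominference call."""
    # Drop commented-out lines and the \newcommand definition itself.
    lines = [ln for ln in tex.splitlines()
             if not ln.lstrip().startswith("%") and "\\newcommand" not in ln]
    body = "\n".join(lines)
    rules = []
    for match in re.finditer(r"\\custominference", body):
        premises, pos = read_group(body, match.end())
        conclusion, _ = read_group(body, pos)
        rules.append((premises.strip(), conclusion.strip()))
    return rules


rules = extract_rules(TEX)
for number, (premises, conclusion) in enumerate(rules, start=1):
    print(number, premises or "(no premises)", "=>", conclusion)
```

Numbering the rules this way gives a fixed reference to compare against whatever numbering an LLM uses in its explanation.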
Does ChatGPT understand the TeX?
Prompt
\begin{figure*}\centering
\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}
%\custominference{}{\nodelit{x,[:\ell]} \vdash x:\tnode}
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
%\custominference {}{\arclit{x, [:\ell]}\ \vdash x:\tedge}
\custominference {}{\arclit{x}\ \vdash x:\tedge}
\hfil
\custominference {}{\arclit{x :\ell}\ \vdash x:\tedge}
\hfil
\custominference{ x \notin \var(\pat)}{x=\restrictor\ \pat \vdash x:\tpath}
%\custominference{\pat \vdash x:\tau \quad p \notin \var(\pat)}{p=\pat \vdash p:\tpath,\ x:\tau}
\bigskip
\custominference {\pat\vdash z:\tau}{\pat\quantlit{n..m} \vdash z:\tlist(\tau)}
\hfil
\custominference {\pat\vdash z:\tau}{\restrictor \pat \vdash z:\tau}
%\hfil
%\custominference{\pat \vdash z:\tau \quad z\neq x}{x=\pat \vdash z:\tau}
\hfil
\custominference{\pat \vdash z:\tau \quad z\neq x }{ x= \restrictor \pat \vdash z:\tau}
\bigskip
\custominference {\pat\vdash x:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat \vdash x.a=c:\tbool} \hfil
\custominference {\pat\vdash x:\tau\quad \pat\vdash y:\tau'\quad\tau,\tau'\in\set{\tnode,\tedge}}{\pat \vdash x.a=y.b:\tbool}
\bigskip
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\wedge\theta':\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\vee\theta':\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool}{\pat \vdash \neg\theta:\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash z:\tau}{\pat\condlit{\theta} \vdash z:\tau}
\bigskip
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tmaybe(\tau)}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\hfil
\custominference {\pat_1\vdash z:\tmaybe(\tau) \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\bigskip
\custominference {\pat_1\vdash z:\tau \quad z \notin \var(\pat_2) }{\pat_1 + \pat_2 \vdash
z: \tau?}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \notin \var(\pat_1) }{\pat_1 + \pat_2 \vdash z: \tau?}
\bigskip
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat_1 , \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad z \not\in\var(\pat_2)}{\pat_1 , \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \not\in\var(\pat_1)}{\pat_1 , \pat_2 \vdash z:\tau}
\bigskip
\custominference {Q_1\vdash z:\tau \quad Q_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{Q_1 , Q_2 \vdash z:\tau}
\hfil
\custominference{Q_1\vdash z:\tau \quad z\not\in\var(Q_2) }{Q_1, Q_2 \vdash z:\tau}
\hfil
\custominference{Q_2\vdash z:\tau \quad z\not\in \var(Q_1) }{Q_1, Q_2 \vdash z:\tau}
\caption{Typing rules for the \gpml type system.}
\label{fig:type-system}
\end{figure*}
Completion
The figure you provided presents a set of typing rules for the \gpml type system. These rules define how different patterns and expressions are typed in the system. Let's go through each rule:
- The first two rules state that a node or an arc literal with or without a label is typed as a node or an edge, respectively.
- The third rule states that if a variable x is not bound in the pattern (\pat), then x can be restricted to a path and typed as a path (\tpath).
- The fourth rule states that if a pattern \pat is typed as z of type \tau, then the pattern \pat\quantlit{n..m} is typed as z of type List(\tau).
- The fifth and sixth rules deal with restricting patterns. If a pattern \pat is typed as z of type \tau, then the pattern \restrictor\ \pat is also typed as z of type \tau.
- The seventh rule states that if a pattern \pat is typed as x of type \tau, where \tau is either a node or an edge, then the expression x.a=c is typed as a Boolean (\tbool).
- The eighth rule states that if a pattern \pat is typed as x of type \tau and another pattern y is typed as y of type \tau', where both \tau and \tau' are either nodes or edges, then the expression x.a=y.b is typed as a Boolean (\tbool).
- The ninth, tenth, eleventh, and twelfth rules deal with Boolean operations. They state that if patterns or expressions \pat and \theta are typed as Boolean values, then the conjunction (\wedge), disjunction (\vee), negation (\neg), and conditional (\condlit{\theta}) expressions are also typed as Booleans.
- The thirteenth rule states that if patterns \pat_1 and \pat_2 are both typed as z of type \tau, then the pattern \pat_1 + \pat_2 is also typed as z of type \tau.
- The fourteenth, fifteenth, and sixteenth rules deal with patterns that include optional values. They specify how patterns with optional values are typed based on the presence or absence of the value.
- The seventeenth, eighteenth, and nineteenth rules deal with concatenating patterns and expressions. They specify the typing for the concatenation operator (,). If patterns or expressions \pat_1 and \pat_2 are typed as z of type \tau, then \pat_1 \, \pat_2 is also typed as z of type \tau. Additionally, if the variable z is not bound in either \pat_1 or \pat_2, then the typing remains the same.
- The twentieth, twenty-first, and twenty-second rules deal with concatenating queries (Q_1 and Q_2). They specify the typing for the concatenation operator (,). If queries Q_1 and Q_2 are both typed as z of type \tau, then Q_1 , Q_2 is also typed as z of type \tau. Additionally, if the variable z is not bound in either Q_1 or Q_2, then the typing remains the same.
These rules define the typing behavior of various patterns and expressions in the \gpml type system.
Personal note:
In the completion it notes "The first two rules state" but it really should be "The first four rules state". The first rule is for a node without a label, the second is for a node with a label, the third is for an edge without a label and the fourth is for an edge with a label. Thus the numbering for the remainder of the rules is off, but if you have a bit of understanding of how to read typing rules it is easy to match up each description with its typing rule.
PostScript version of second typing rule
This is from file 2210.16580v1.ps.gz, which is downloaded by clicking Download PostScript on the Format selector page.
727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p
Does ChatGPT understand PostScript?
Note: Only the PostScript for the second type rule is used as giving ChatGPT all of the PostScript for the typing rules would exceed the token limit.
Prompt
The following is a portion of PostScript, explain what you can.
727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p
Completion
The portion of code you provided appears to be written in the PostScript programming language. PostScript is a page description language commonly used for printing and graphics.
Let's analyze the code line by line:
727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6 b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p
- 727 190 654 5 v: This line consists of four numeric values (727, 190, 654, and 5) followed by the operator v. These values are likely coordinates or parameters related to positioning or transformations. Without more context, it's difficult to determine their exact purpose.
- 290 w: This line contains a numeric value (290) followed by the operator w. Again, without more context, it's unclear what this line represents.
- FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6 b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p: This is a series of instructions involving various operators and string literals. Here's a breakdown of the individual components:
  - FN(\271): This is likely a font-related instruction, specifying a particular font with a parameter \271.
  - q: This operator saves the current graphics state.
  - FH(G)27 b: This could be another font-related instruction specifying a different font (G) with a parameter 27. The b operator might be related to setting the font weight or style.
  - FF(:)21 b: This is likely a command to print or display a colon (:) using the current font. The 21 b might be related to font size or position.
  - FH(\022)6 b: This could be another font-related instruction, specifying a different font with a non-printable character \022. The 6 b might be related to font size or position.
  - FN(\272)25 b: This could be another font-related instruction specifying a different font with a parameter \272. The 25 b might be related to font size or position.
  - (`)18 b: This is likely a command to print or display an opening parenthesis ("(") using the current font. The 18 b might be related to font size or position.
  - FH(G)28 b: This could be another font-related instruction specifying a different font (G) with a parameter 28. The b operator might be related to setting the font weight or style.
  - FF(:)21 b: Similar to the previous occurrence, this is likely a command to print or display a colon (:) using the current font. The 21 b might be related to font size or position.
  - Fw(No)r(de)p: This appears to be a command to print or display the string "No" using a specific font (w). The subsequent characters r(de)p are not immediately clear without additional context.
Overall, without further context or knowledge of the surrounding PostScript code, it's challenging to provide a definitive interpretation or explanation of the code.
Personal note:
Having parsed PostScript by hand, I know that Fw(No)r(de) is actually the word Node in the second typing rule. While I did not expect ChatGPT to understand everything about the PostScript snippet, I was surprised that it did not recognize that as the word Node.
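The hand-parsing step can be approximated in a few lines. This is a rough sketch that assumes, as in the snippet above, that the parenthesized PostScript strings are the text fragments and that everything between them is font selection and positioning; string_literals and trailing_word are made-up names, and nested parentheses inside strings are not handled:

```python
import re

# The PostScript for the second typing rule, as quoted above.
PS = r"""727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p"""

# A PostScript string literal: '(' ... ')' with backslash escapes allowed inside.
STRING = re.compile(r"\(((?:\\.|[^()\\])*)\)")


def string_literals(ps):
    """Return the contents of every (...) string literal, in order."""
    return STRING.findall(ps)


def trailing_word(ps):
    """Join the run of purely alphabetic fragments at the end of the snippet."""
    word = []
    for fragment in reversed(string_literals(ps)):
        if not fragment.isalpha():
            break
        word.append(fragment)
    return "".join(reversed(word))


print(string_literals(PS))
print(trailing_word(PS))
```

The octal escapes such as \271 and \022 are left undecoded because what they display depends on the font encoding, which is exactly the information a plain-text prompt to an LLM does not carry.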
For those of us without experience with typing systems, can you tell us how well it understood the TeX code?
As with most responses by ChatGPT, it depends upon your level of knowledge. For a beginner it does very well, but an expert would probably find more wrong than correct. But as we know, sometimes just getting over the initial hump of learning, when there is nothing else to go on, and having some info as demonstrated in the example will get someone interested enough to keep going.
A much better example to answer your question would be to use the typing rules for a functional programming language like F# or Haskell where the typing rules can be demonstrated also with code and the types included in the function signature.
If I find such an example I will give it a try and, if it works, post the result as a reply.
Update on the search.
I spent about an hour looking after posting that, and the language with the best published typing rules was ML, but they were too complex for a beginner.
However, typing rules for Lambda Calculus are very easy to understand (ref), but the problem is that many do not understand Lambda Calculus. Typed Lambda Calculus might be better, but then the typing rules are a bit more complex.