In searching Microsoft’s Math Solver, I found this

but it does not expose the converted image for use with an API call. Can you connect the dots if I missed something?


Side note: Your private message response is still the closest I have seen to an answer for my other question. Since you noted it, you should get the credit in this post by adding it as a reply.

1 Like

Took me a bit to find the code. For others seeking the code for LaTeX-OCR

1 Like

Sure, the link is

Which is an OCR-to-LaTeX conversion system with a personal use option.

On the Microsoft one, I wondered if perhaps the AI/OCR system used in that app was one of the sub components offered by the Azure ML suite.

2 Likes

I know you’ll be after open source, but ABBYY FineReader is great in our experience for a variety of tough OCR extraction problems and is easy to integrate into your workflows.

https://pdf.abbyy.com/

Took a look at the ABBYY FineReader site. While it can convert a PDF to text with the reader, I did not see that it could convert images with technical expression(s) into a format that an AI can understand.

Example of a technical expression, in this specific case typing rules

Did I miss something on the site?

1 Like

I’m sorry, but we have no experience extracting typing rules with ABBYY. I would suggest contacting them to confirm. It is very good at extracting from image PDFs.

Here’s another link for the inf-project, seems like it’s also paid tho.

Yes typing rules are used by an exclusive group.

Here is another example of a technical expression more familiar to many.

image

(ref)

While that is an image of a fraction, the representation is in MathML

   <mfrac>
	<mrow>
	  <mn>3</mn>
	  <mi>x</mi>
	  <mo>−<!-- − --></mo>
	  <mn>2</mn>
	  <mi>x</mi>
	  <mi>y</mi>
	  <mo>+</mo>
	  <mi>c</mi>
	</mrow>
	<mrow>
	  <mi>y</mi>
	  <mo>−<!-- − --></mo>
	  <mn>1</mn>
	</mrow>
  </mfrac>

which has enough metadata such that an AI should be able to understand that it is a fraction.
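
As a rough illustration of how much structure the MathML carries, here is a minimal Python sketch that walks the same fragment with the standard-library `xml.etree.ElementTree` parser and rebuilds an infix string. The `to_infix` helper is a hypothetical name introduced for this example, and it only handles the four element types that appear in the fraction above.

```python
import xml.etree.ElementTree as ET

# Same fraction as above; &#x2212; is the Unicode minus sign used in the original.
MATHML = """
<mfrac>
  <mrow>
    <mn>3</mn><mi>x</mi><mo>&#x2212;</mo><mn>2</mn><mi>x</mi><mi>y</mi><mo>+</mo><mi>c</mi>
  </mrow>
  <mrow>
    <mi>y</mi><mo>&#x2212;</mo><mn>1</mn>
  </mrow>
</mfrac>
"""


def to_infix(node):
    """Convert a tiny subset of MathML (mfrac, mrow, mn, mi, mo) to infix text."""
    tag = node.tag
    if tag in ("mn", "mi"):
        # Numbers and identifiers are emitted as-is.
        return (node.text or "").strip()
    if tag == "mo":
        # Operators get spaces around them; the Unicode minus becomes ASCII '-'.
        return " " + (node.text or "").strip().replace("\u2212", "-") + " "
    if tag == "mrow":
        return "".join(to_infix(child) for child in node)
    if tag == "mfrac":
        # mfrac has exactly two children: numerator, then denominator.
        numerator, denominator = list(node)
        return f"({to_infix(numerator)}) / ({to_infix(denominator)})"
    raise ValueError(f"unsupported MathML element: {tag}")


expression = to_infix(ET.fromstring(MATHML.strip()))
print(expression)  # (3x - 2xy + c) / (y - 1)
```

The element names alone (`mfrac`, `mrow`, `mo`) are enough to recover the numerator, denominator and operators without any image processing.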


Passing the MathML fraction above to ChatGPT alone shows that ChatGPT May 24 Version understands it

Prompt


   <mfrac>
	<mrow>
	  <mn>3</mn>
	  <mi>x</mi>
	  <mo>−<!-- − --></mo>
	  <mn>2</mn>
	  <mi>x</mi>
	  <mi>y</mi>
	  <mo>+</mo>
	  <mi>c</mi>
	</mrow>
	<mrow>
	  <mi>y</mi>
	  <mo>−<!-- − --></mo>
	  <mn>1</mn>
	</mrow>
  </mfrac>

Completion

The expression you provided is a fraction. Let’s break it down and simplify it step by step.

The numerator of the fraction is: 3x - 2xy + c
The denominator of the fraction is: y - 1

So the fraction can be written as:

(3x - 2xy + c) / (y - 1)

Since there is no specific instruction regarding what you would like to do with this expression, I assume you want to simplify it or perform some operations. Please let me know what specific operation or simplification you would like me to assist you with.

2 Likes

Eric’s example in LaTeX:

\frac{3x-2xy+c}{y-1}

1 Like

Found this fairly interesting bit of OpenCV code

I’m not convinced it will be able to cope with complex notation, but throwing it in here as a starting point for someone else

It was interesting waking up to this.

I’ve been looking at this very issue over the past few months, and I’ve tried nearly all of the solutions listed here.

My conclusion is that we’re just not there yet. It’s pretty close in many ways, but there is no single solution that is a one-stop-shop for reading technical PDFs.

Mathpix is the best for extracting information from equation and inline environments, but it can be extremely costly to process a large number of pages and it doesn’t handle other environments (algorithm) particularly well.

Even then, you still need to review the work that comes out of Mathpix because it’s only about 90% correct (depending on complexity of course).

I suspect we will see a big push in the open-source space over the next year or two as more people want to feed technical papers into LLMs.

I emailed sciaccess a few months ago to see if they offered an academic discount, and never got a reply.

Didn’t inspire confidence.

Good point, and one that should be seriously considered when trying to understand technical papers.


Also, for others that did not notice the jobs posting on arXiv, it seems they are expanding quite a bit.

Here is an example of a PDF with typing rules available on arXiv.

“GPC: A Pattern Calculus for Property Graphs” by Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, Domagoj Vrgoč

It is available in a few different formats.

The typing rules being demonstrated are in Figure 2.

image

Text source of typing rules

This is from file 2210.16580 which is downloaded by clicking Download source on the Format selector page

\begin{figure*}\centering

\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}

%\custominference{}{\nodelit{x\,[:\ell]} \vdash x:\tnode} 
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
%\custominference {}{\arclit{x\, [:\ell]}\  \vdash x:\tedge} 
\custominference {}{\arclit{x}\  \vdash x:\tedge} 
\hfil
\custominference {}{\arclit{x :\ell}\  \vdash x:\tedge} 
\hfil
\custominference{ x \notin \var(\pat)}{x=\restrictor\ \pat \vdash x:\tpath}
%\custominference{\pat \vdash x:\tau \quad p \notin \var(\pat)}{p=\pat \vdash p:\tpath,\ x:\tau}


\bigskip

\custominference {\pat\vdash z:\tau}{\pat\quantlit{n..m} \vdash z:\tlist(\tau)} 
\hfil
\custominference {\pat\vdash z:\tau}{\restrictor \pat \vdash z:\tau}
%\hfil
%\custominference{\pat \vdash z:\tau \quad z\neq x}{x=\pat \vdash  z:\tau}
\hfil
\custominference{\pat \vdash z:\tau \quad z\neq x }{ x= \restrictor \pat \vdash  z:\tau}

\bigskip

\custominference {\pat\vdash x:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat \vdash x.a=c:\tbool} \hfil
\custominference {\pat\vdash x:\tau\quad \pat\vdash y:\tau'\quad\tau,\tau'\in\set{\tnode,\tedge}}{\pat \vdash x.a=y.b:\tbool} 

\bigskip

\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\wedge\theta':\tbool} 
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\vee\theta':\tbool} 
\hfil
\custominference {\pat\vdash \theta:\tbool}{\pat \vdash \neg\theta:\tbool} 
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash z:\tau}{\pat\condlit{\theta} \vdash z:\tau} 

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tmaybe(\tau)}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\hfil
\custominference {\pat_1\vdash z:\tmaybe(\tau) \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad  z \notin \var(\pat_2) }{\pat_1 + \pat_2 \vdash 
z: \tau?} 
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \notin \var(\pat_1) }{\pat_1 + \pat_2 \vdash z: \tau?}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat_1 \, \pat_2 \vdash z:\tau} 
\hfil
\custominference {\pat_1\vdash z:\tau \quad z \not\in\var(\pat_2)}{\pat_1 \, \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \not\in\var(\pat_1)}{\pat_1 \, \pat_2 \vdash z:\tau} 

\bigskip

\custominference {Q_1\vdash z:\tau \quad Q_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{Q_1 , Q_2 \vdash z:\tau}
\hfil
\custominference{Q_1\vdash z:\tau \quad z\not\in\var(Q_2) }{Q_1, Q_2 \vdash z:\tau}
\hfil
\custominference{Q_2\vdash z:\tau \quad z\not\in \var(Q_1) }{Q_1, Q_2 \vdash z:\tau}

\caption{Typing rules for the \gpml type system.}
\label{fig:type-system}
\end{figure*}

Does ChatGPT understand the TeX?

ChatGPT May 24 Version

Prompt

\begin{figure*}\centering

\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}

%\custominference{}{\nodelit{x\,[:\ell]} \vdash x:\tnode}
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
%\custominference {}{\arclit{x\, [:\ell]}\ \vdash x:\tedge}
\custominference {}{\arclit{x}\ \vdash x:\tedge}
\hfil
\custominference {}{\arclit{x :\ell}\ \vdash x:\tedge}
\hfil
\custominference{ x \notin \var(\pat)}{x=\restrictor\ \pat \vdash x:\tpath}
%\custominference{\pat \vdash x:\tau \quad p \notin \var(\pat)}{p=\pat \vdash p:\tpath,\ x:\tau}

\bigskip

\custominference {\pat\vdash z:\tau}{\pat\quantlit{n..m} \vdash z:\tlist(\tau)}
\hfil
\custominference {\pat\vdash z:\tau}{\restrictor \pat \vdash z:\tau}
%\hfil
%\custominference{\pat \vdash z:\tau \quad z\neq x}{x=\pat \vdash z:\tau}
\hfil
\custominference{\pat \vdash z:\tau \quad z\neq x }{ x= \restrictor \pat \vdash z:\tau}

\bigskip

\custominference {\pat\vdash x:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat \vdash x.a=c:\tbool} \hfil
\custominference {\pat\vdash x:\tau\quad \pat\vdash y:\tau'\quad\tau,\tau'\in\set{\tnode,\tedge}}{\pat \vdash x.a=y.b:\tbool}

\bigskip

\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\wedge\theta':\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\vee\theta':\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool}{\pat \vdash \neg\theta:\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash z:\tau}{\pat\condlit{\theta} \vdash z:\tau}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tmaybe(\tau)}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\hfil
\custominference {\pat_1\vdash z:\tmaybe(\tau) \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad z \notin \var(\pat_2) }{\pat_1 + \pat_2 \vdash
z: \tau?}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \notin \var(\pat_1) }{\pat_1 + \pat_2 \vdash z: \tau?}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat_1 \, \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad z \not\in\var(\pat_2)}{\pat_1 \, \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \not\in\var(\pat_1)}{\pat_1 \, \pat_2 \vdash z:\tau}

\bigskip

\custominference {Q_1\vdash z:\tau \quad Q_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{Q_1 , Q_2 \vdash z:\tau}
\hfil
\custominference{Q_1\vdash z:\tau \quad z\not\in\var(Q_2) }{Q_1, Q_2 \vdash z:\tau}
\hfil
\custominference{Q_2\vdash z:\tau \quad z\not\in \var(Q_1) }{Q_1, Q_2 \vdash z:\tau}

\caption{Typing rules for the \gpml type system.}
\label{fig:type-system}
\end{figure*}

Completion

The figure you provided presents a set of typing rules for the \gpml type system. These rules define how different patterns and expressions are typed in the system. Let’s go through each rule:

  1. The first two rules state that a node or an arc literal with or without a label is typed as a node or an edge, respectively.

  2. The third rule states that if a variable x is not bound in the pattern (\pat), then x can be restricted to a path and typed as a path (\tpath).

  3. The fourth rule states that if a pattern \pat is typed as z of type \tau, then the pattern \pat\quantlit{n..m} is typed as z of type List(\tau).

  4. The fifth and sixth rules deal with restricting patterns. If a pattern \pat is typed as z of type \tau, then the pattern \restrictor\ \pat is also typed as z of type \tau.

  5. The seventh rule states that if a pattern \pat is typed as x of type \tau, where \tau is either a node or an edge, then the expression x.a=c is typed as a Boolean (\tbool).

  6. The eighth rule states that if a pattern \pat is typed as x of type \tau and another pattern y is typed as y of type \tau', where both \tau and \tau' are either nodes or edges, then the expression x.a=y.b is typed as a Boolean (\tbool).

  7. The ninth, tenth, eleventh, and twelfth rules deal with Boolean operations. They state that if patterns or expressions \pat and \theta are typed as Boolean values, then the conjunction (\wedge), disjunction (\vee), negation (\neg), and conditional (\condlit{\theta}) expressions are also typed as Booleans.

  8. The thirteenth rule states that if patterns \pat_1 and \pat_2 are both typed as z of type \tau, then the pattern \pat_1 + \pat_2 is also typed as z of type \tau.

  9. The fourteenth, fifteenth, and sixteenth rules deal with patterns that include optional values. They specify how patterns with optional values are typed based on the presence or absence of the value.

  10. The seventeenth, eighteenth, and nineteenth rules deal with concatenating patterns and expressions. They specify the typing for the concatenation operator (,). If patterns or expressions \pat_1 and \pat_2 are typed as z of type \tau, then \pat_1 \, \pat_2 is also typed as z of type \tau. Additionally, if the variable z is not bound in either \pat_1 or \pat_2, then the typing remains the same.

  11. The twentieth, twenty-first, and twenty-second rules deal with concatenating queries (Q_1 and Q_2). They specify the typing for the concatenation operator (,). If queries Q_1 and Q_2 are both typed as z of type \tau, then Q_1 , Q_2 is also typed as z of type \tau. Additionally, if the variable z is not bound in either Q_1 or Q_2, then the typing remains the same.

These rules define the typing behavior of various patterns and expressions in the \gpml type system.


Personal note:

In the completion it notes "The first two rules state" but it really should be "The first four rules state". The first rule is for a node without a label, the second is for a node with a label, the third is for an edge without a label, and the fourth is for an edge with a label. Thus the numbering for the remainder of the rules is off, but if you have a bit of understanding of how to read typing rules, it is easy to match up each description with its typing rule.
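
One way to check a rule count like this without reading the figure by eye is to count the active (non-commented) `\custominference` lines in the TeX. The following is only a sketch on an excerpt holding the first block of rules from the figure; `active_rules` is a hypothetical helper, and it assumes each rule starts at the beginning of a line, as it does in this source.

```python
# Excerpt: the \newcommand line plus the first block of rules from the figure.
TEX_EXCERPT = r"""
\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}

%\custominference{}{\nodelit{x\,[:\ell]} \vdash x:\tnode}
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
%\custominference {}{\arclit{x\, [:\ell]}\  \vdash x:\tedge}
\custominference {}{\arclit{x}\  \vdash x:\tedge}
\hfil
\custominference {}{\arclit{x :\ell}\  \vdash x:\tedge}
\hfil
\custominference{ x \notin \var(\pat)}{x=\restrictor\ \pat \vdash x:\tpath}
"""


def active_rules(tex):
    """Return the lines that start an uncommented \\custominference rule."""
    rules = []
    for line in tex.splitlines():
        stripped = line.strip()
        if stripped.startswith("%"):
            continue  # commented-out rule, not rendered in the figure
        if stripped.startswith(r"\custominference"):
            rules.append(stripped)
    return rules


rules = active_rules(TEX_EXCERPT)
print(len(rules))  # 5: two node rules, two edge rules, one path rule
```

The `\newcommand` line is skipped because it does not start with `\custominference`, and the two commented-out rules are ignored, which leaves the four node and edge rules followed by the path rule.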

PostScript version of second typing rule

This is from file 2210.16580v1.ps.gz which is downloaded by clicking Download PostScript on the Format selector page

727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p

Does ChatGPT understand PostScript?

Note: Only the PostScript for the second type rule is used as giving ChatGPT all of the PostScript for the typing rules would exceed the token limit.

Prompt

The following is a portion of PostScript, explain what you can.

 727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p

Completion

The portion of code you provided appears to be written in the PostScript programming language. PostScript is a page description language commonly used for printing and graphics.

Let’s analyze the code line by line:

727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6 b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p
  1. 727 190 654 5 v: This line consists of four numeric values (727, 190, 654, and 5) followed by the operator v. These values are likely coordinates or parameters related to positioning or transformations. Without more context, it’s difficult to determine their exact purpose.

  2. 290 w: This line contains a numeric value (290) followed by the operator w. Again, without more context, it’s unclear what this line represents.

  3. FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6 b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p: This is a series of instructions involving various operators and string literals. Here’s a breakdown of the individual components:

    • FN(\271): This is likely a font-related instruction, specifying a particular font with a parameter \271.
    • q: This operator saves the current graphics state.
    • FH(G)27 b: This could be another font-related instruction specifying a different font (G) with a parameter 27. The b operator might be related to setting the font weight or style.
    • FF(:)21 b: This is likely a command to print or display a colon (:) using the current font. The 21 b might be related to font size or position.
    • FH(\022)6 b: This could be another font-related instruction, specifying a different font with a non-printable character \022. The 6 b might be related to font size or position.
    • FN(\272)25 b: This could be another font-related instruction specifying a different font with a parameter \272. The 25 b might be related to font size or position.
    • (`)18 b: This is likely a command to print or display an opening parenthesis (() using the current font. The 18 b might be related to font size or position.
    • FH(G)28 b: This could be another font-related instruction specifying a different font (G) with a parameter 28. The b operator might be related to setting the font weight or style.
    • FF(:)21 b: Similar to the previous occurrence, this is likely a command to print or display a colon (:) using the current font. The 21 b might be related to font size or position.
    • Fw(No)r(de)p: This appears to be a command to print or display the string “No” using a specific font (w). The subsequent characters r(de)p are not immediately clear without additional context.

Overall, without further context or knowledge of the surrounding PostScript code, it’s challenging to provide a definitive interpretation or explanation of the code.


Personal note.

Having parsed PostScript by hand, I know that Fw(No)r(de) is actually the word Node in the second typing rule. While I did not expect ChatGPT to understand everything about the PostScript snippet, I was surprised that it did not recognize that as the word Node.
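
To show how the word can be recovered mechanically, here is a small Python sketch that groups the parenthesized string literals by the font selector that precedes them. It assumes a font selector is the letter `F` plus one more letter directly before a `(`, which matches this fragment but is not a general PostScript rule; `text_runs` is a hypothetical helper name.

```python
import re

PS_FRAGMENT = r"""727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p"""

# Either a font selector (assumed: "F" + one letter, directly before "(")
# or a PostScript string literal in parentheses (backslash escapes allowed).
TOKEN = re.compile(r"(F[A-Za-z])(?=\()|\(((?:[^()\\]|\\.)*)\)")


def text_runs(ps):
    """Return (font, text) pairs, joining literals that share a font selector."""
    runs = []
    for match in TOKEN.finditer(ps):
        font, literal = match.group(1), match.group(2)
        if font is not None:
            runs.append([font, ""])   # a new font starts a new run
        elif runs:
            runs[-1][1] += literal    # same font: glue the pieces together
    return [(font, text) for font, text in runs]


for font, text in text_runs(PS_FRAGMENT):
    print(font, repr(text))
```

The last run comes out as `Fw` with the text `Node`: the operators between `(No)` and `(de)` only move the drawing position, so concatenating the literals under one font gives the word back.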

2 Likes

For those of us without experience with typing systems, can you tell us how well it understood the TeX code?

As with most responses by ChatGPT, it depends upon your level of knowledge. For a beginner the responses are very good, but an expert would probably find more wrong than correct. But as we know, sometimes just getting over the initial hump of learning when there is nothing else to go on, by having some information as demonstrated in the example, will get someone interested enough to keep going.

A much better example to answer your question would be to use the typing rules for a functional programming language like F# or Haskell where the typing rules can be demonstrated also with code and the types included in the function signature.

If I find such I will give it a try and if it works will post as a reply. :slightly_smiling_face:


Update on the search.

I spent about an hour after posting that looking, and the language with the best published typing rules was ML, but they were too complex for a beginner.

However typing rules for Lambda Calculus are very easy to understand (ref) but the problem is that many do not understand Lambda Calculus. Typed Lambda Calculus might be better but then the typing rules are a bit more complex.
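
For readers who have not seen them, the three textbook rules of the simply typed lambda calculus are sketched below in LaTeX. This is the standard presentation as I remember it, not something taken from the reference above; the rule names (Var, Abs, App) and the symbols Γ, σ and τ are the usual conventions.

```latex
\[
\frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau}\ (\mathrm{Var})
\qquad
\frac{\Gamma,\, x : \sigma \vdash e : \tau}
     {\Gamma \vdash \lambda x{:}\sigma.\, e : \sigma \to \tau}\ (\mathrm{Abs})
\qquad
\frac{\Gamma \vdash e_1 : \sigma \to \tau \quad \Gamma \vdash e_2 : \sigma}
     {\Gamma \vdash e_1\, e_2 : \tau}\ (\mathrm{App})
\]
```

Each rule reads the same way as the ones in the figure: premises above the line, conclusion below it.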

2 Likes

I agree with Eric’s assessment.

I do have some reservations about the TeX source, though: in the example used there’s a single \newcommand defined, but the code itself contains several different custom commands, such as \pat and \tbool.

These commands are most likely defined elsewhere in the document source, this issue might be solved with an extended context window.

But we’re out of luck if these commands are defined in non-standard LaTeX packages.
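
A quick way to see how many commands a snippet leaves undefined is to compare the commands it uses with the ones it defines. The Python sketch below does that with regular expressions; the `KNOWN` set is a deliberately small, illustrative list of commands assumed to be standard, and `undefined_commands` is a hypothetical helper.

```python
import re

SNIPPET = r"""
\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference {\pat\vdash \theta:\tbool}{\pat \vdash \neg\theta:\tbool}
"""

# Illustrative, incomplete list of commands assumed to be standard TeX/LaTeX.
KNOWN = {"newcommand", "displaystyle", "vdash", "hfil", "theta", "neg"}


def undefined_commands(tex, known=KNOWN):
    """List commands that are used but neither defined in `tex` nor in `known`."""
    used = set(re.findall(r"\\([A-Za-z]+)", tex))
    defined = set(re.findall(r"\\newcommand\s*\{?\\([A-Za-z]+)", tex))
    return sorted(used - defined - set(known))


print(undefined_commands(SNIPPET))
# ['inference', 'nodelit', 'pat', 'tbool', 'tnode']
```

Everything in that list would have to be explained to the model, or resolved from the rest of the document source, before the snippet is self-contained.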

1 Like

Yes, this is a valid point that must be addressed before passing a subset of the TeX or LaTeX to ChatGPT.

Follow up

To work with a larger body of TeX, I am switching to

The book

  • is open source on GitHub
  • is free
  • needs little to no prior knowledge to understand
  • has various representations of technical expressions in TeX that can be viewed in human-readable form in the PDF format.

For this truth table

image

the TeX is

https://github.com/rzach/forallx-yyc/blob/dd26ca3043c9288fc548836e136f2416ed70d93d/forallx-yyc-truthtables.tex#L21-L26

Giving that to ChatGPT as a new conversation

Prompt

\begin{tabular}{c|c}
\metav{A} & \enot\metav{A}\\
\hline
T & F\\
F & T 
\end{tabular}

Completion

The table you provided represents the truth values of a formula and its negation. Let’s break it down:

  • \metav{A}: This is a placeholder representing a statement or proposition. It can take on the truth values True (T) or False (F).

  • \enot\metav{A}: This is the negation of \metav{A}. It represents the opposite truth value of \metav{A}. If \metav{A} is True, then \enot\metav{A} is False, and vice versa.

The table shows the possible truth values for \metav{A} and \enot\metav{A}. When \metav{A} is True, \enot\metav{A} is False. When \metav{A} is False, \enot\metav{A} is True. This is consistent with the logical operation of negation.

So, the table reflects the basic principles of negation in classical logic.

Personal note:
Obviously ChatGPT can understand the TeX but some of the TeX needs to be pretty printed.


Here is a refined prompt in a new conversation that addresses the evident problems in the first completion.

Prompt

LaTeX commands and meaning
\enot - `not` symbol
\metav - meta variable

```LaTeX
\begin{tabular}{c|c}
\metav{A} & \enot\metav{A}\\
\hline
T & F\\
F & T 
\end{tabular}
```

* format for readability by a human
* convert LaTeX  commands to symbols
* do not explain LaTeX 
* do not explain Markdown or other formatting
* explain expression for a beginner

Completion

A | ¬A
------
T | F
F | T

Explanation for a beginner:
This table represents the truth values of a statement A and its negation (¬A). The variable A can take on two truth values: True (T) or False (F). The table shows all possible combinations of truth values for A and its negation ¬A. When A is True, ¬A is False, and vice versa.


Since TeX is actually code with macros, variables and state, such information needs to be passed to ChatGPT to correctly process and/or pretty print the TeX.
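
As a sketch of how that information could be passed along, the Python below assembles a prompt in the same shape as the refined one above: a glossary of macros, the fenced TeX, then the instructions. `build_prompt` and its parameters are hypothetical names for this example only, and nothing here calls an API.

```python
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence


def build_prompt(glossary, tex, instructions):
    """Assemble a prompt: macro glossary, fenced TeX, then bullet instructions."""
    lines = ["LaTeX commands and meaning"]
    lines += [f"\\{name} - {meaning}" for name, meaning in glossary.items()]
    lines += ["", FENCE + "LaTeX", tex.strip(), FENCE, ""]
    lines += [f"* {item}" for item in instructions]
    return "\n".join(lines)


prompt = build_prompt(
    glossary={"enot": "`not` symbol", "metav": "meta variable"},
    tex=r"""
\begin{tabular}{c|c}
\metav{A} & \enot\metav{A}\\
\hline
T & F\\
F & T
\end{tabular}
""",
    instructions=[
        "format for readability by a human",
        "convert LaTeX commands to symbols",
        "explain expression for a beginner",
    ],
)
print(prompt)
```

Keeping the glossary as data makes it easy to reuse the same macro meanings across many snippets from one document.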

While smaller examples of TeX can be passed in a single prompt, there will be TeX that is larger than the token limit and so cannot be passed in. Possible solutions would be to use existing code that parses TeX and maybe even runs part of the TeX to establish the state at the time, or to train an LLM so that it can infer reasonable values without seeing all of the TeX.
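
To make the "runs part of the TeX" idea concrete, here is a toy Python macro expander. It is nowhere near a real TeX parser: it only handles macros with zero arguments or one brace-delimited argument without nested braces, and the two macro bodies shown are assumptions made for illustration, not the book's actual definitions.

```python
import re


def expand_macros(tex, macros):
    """Expand simple macros; `macros` maps name -> (argument count, body)."""
    for name, (arity, body) in macros.items():
        if arity == 0:
            # Match \name only when it is not the prefix of a longer command.
            pattern = re.compile(r"\\" + name + r"(?![A-Za-z])")
            tex = pattern.sub(lambda m: body, tex)
        else:
            # One argument in braces, no nesting: \name{arg}
            pattern = re.compile(r"\\" + name + r"\{([^{}]*)\}")
            tex = pattern.sub(lambda m: body.replace("#1", m.group(1)), tex)
    return tex


# Hypothetical definitions, chosen only to demonstrate the mechanism.
MACROS = {
    "enot": (0, r"\neg "),
    "metav": (1, r"\mathcal{#1}"),
}

row = r"\metav{A} & \enot\metav{A}\\"
print(expand_macros(row, MACROS))  # \mathcal{A} & \neg \mathcal{A}\\
```

With the custom commands expanded to standard ones, the snippet no longer depends on definitions the model has not seen.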

@elmstedt have you tried the InftyReader that @N2U posted above? What can you tell us about it?

I believe it was @Foxabilo who brought InftyReader to our attention. Any thoughts on this?

As mentioned by Eric, optical character recognition of complex equations seems to be an issue many have struggled with.

1 Like