How to extract technical expressions from PDFs so that they can be understood by AI?

Here is an example of a PDF with typing rules available on Arxiv.

“GPC: A Pattern Calculus for Property Graphs” by Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, Domagoj Vrgoč

It is available in a few different formats.

The typing rules being demonstrated are in Figure 2.

image

Text source of typing rules (Click triangle to expand)

This is from file 2210.16580 which is downloaded by clicking Download source on the Format selector page

\begin{figure*}\centering

\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}

%\custominference{}{\nodelit{x\,[:\ell]} \vdash x:\tnode} 
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
%\custominference {}{\arclit{x\, [:\ell]}\  \vdash x:\tedge} 
\custominference {}{\arclit{x}\  \vdash x:\tedge} 
\hfil
\custominference {}{\arclit{x :\ell}\  \vdash x:\tedge} 
\hfil
\custominference{ x \notin \var(\pat)}{x=\restrictor\ \pat \vdash x:\tpath}
%\custominference{\pat \vdash x:\tau \quad p \notin \var(\pat)}{p=\pat \vdash p:\tpath,\ x:\tau}


\bigskip

\custominference {\pat\vdash z:\tau}{\pat\quantlit{n..m} \vdash z:\tlist(\tau)} 
\hfil
\custominference {\pat\vdash z:\tau}{\restrictor \pat \vdash z:\tau}
%\hfil
%\custominference{\pat \vdash z:\tau \quad z\neq x}{x=\pat \vdash  z:\tau}
\hfil
\custominference{\pat \vdash z:\tau \quad z\neq x }{ x= \restrictor \pat \vdash  z:\tau}

\bigskip

\custominference {\pat\vdash x:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat \vdash x.a=c:\tbool} \hfil
\custominference {\pat\vdash x:\tau\quad \pat\vdash y:\tau'\quad\tau,\tau'\in\set{\tnode,\tedge}}{\pat \vdash x.a=y.b:\tbool} 

\bigskip

\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\wedge\theta':\tbool} 
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta':\tbool}{\pat \vdash \theta\vee\theta':\tbool} 
\hfil
\custominference {\pat\vdash \theta:\tbool}{\pat \vdash \neg\theta:\tbool} 
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash z:\tau}{\pat\condlit{\theta} \vdash z:\tau} 

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tmaybe(\tau)}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\hfil
\custominference {\pat_1\vdash z:\tmaybe(\tau) \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad  z \notin \var(\pat_2) }{\pat_1 + \pat_2 \vdash 
z: \tau?} 
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \notin \var(\pat_1) }{\pat_1 + \pat_2 \vdash z: \tau?}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat_1 \, \pat_2 \vdash z:\tau} 
\hfil
\custominference {\pat_1\vdash z:\tau \quad z \not\in\var(\pat_2)}{\pat_1 \, \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \not\in\var(\pat_1)}{\pat_1 \, \pat_2 \vdash z:\tau} 

\bigskip

\custominference {Q_1\vdash z:\tau \quad Q_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{Q_1 , Q_2 \vdash z:\tau}
\hfil
\custominference{Q_1\vdash z:\tau \quad z\not\in\var(Q_2) }{Q_1, Q_2 \vdash z:\tau}
\hfil
\custominference{Q_2\vdash z:\tau \quad z\not\in \var(Q_1) }{Q_1, Q_2 \vdash z:\tau}

\caption{Typing rules for the \gpml type system.}
\label{fig:type-system}
\end{figure*}
Does ChatGPT understand the TeX? (Click triangle to expand)

ChatGPT May 24 Version

Prompt

\begin{figure*}\centering

\newcommand{\custominference}[2]{$\displaystyle\inference{#1}{#2}$}

%\custominference{}{\nodelit{x,[:\ell]} \vdash x:\tnode}
\custominference{}{\nodelit{x} \vdash x:\tnode}
\hfil
\custominference{}{\nodelit{x:\ell} \vdash x:\tnode}
\hfil
%\custominference {}{\arclit{x, [:\ell]}\ \vdash x:\tedge}
\custominference {}{\arclit{x}\ \vdash x:\tedge}
\hfil
\custominference {}{\arclit{x :\ell}\ \vdash x:\tedge}
\hfil
\custominference{ x \notin \var(\pat)}{x=\restrictor\ \pat \vdash x:\tpath}
%\custominference{\pat \vdash x:\tau \quad p \notin \var(\pat)}{p=\pat \vdash p:\tpath,\ x:\tau}

\bigskip

\custominference {\pat\vdash z:\tau}{\pat\quantlit{n…m} \vdash z:\tlist(\tau)}
\hfil
\custominference {\pat\vdash z:\tau}{\restrictor \pat \vdash z:\tau}
%\hfil
%\custominference{\pat \vdash z:\tau \quad z\neq x}{x=\pat \vdash z:\tau}
\hfil
\custominference{\pat \vdash z:\tau \quad z\neq x }{ x= \restrictor \pat \vdash z:\tau}

\bigskip

\custominference {\pat\vdash x:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat \vdash x.a=c:\tbool} \hfil
\custominference {\pat\vdash x:\tau\quad \pat\vdash y:\tau’\quad\tau,\tau’\in\set{\tnode,\tedge}}{\pat \vdash x.a=y.b:\tbool}

\bigskip

\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta’:\tbool}{\pat \vdash \theta\wedge\theta’:\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash\theta’:\tbool}{\pat \vdash \theta\vee\theta’:\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool}{\pat \vdash \neg\theta:\tbool}
\hfil
\custominference {\pat\vdash \theta:\tbool \quad \pat\vdash z:\tau}{\pat\condlit{\theta} \vdash z:\tau}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tmaybe(\tau)}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}
\hfil
\custominference {\pat_1\vdash z:\tmaybe(\tau) \quad \pat_2\vdash z:\tau}{\pat_1 + \pat_2 \vdash z:\tmaybe(\tau)}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad z \notin \var(\pat_2) }{\pat_1 + \pat_2 \vdash
z: \tau?}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \notin \var(\pat_1) }{\pat_1 + \pat_2 \vdash z: \tau?}

\bigskip

\custominference {\pat_1\vdash z:\tau \quad \pat_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{\pat_1 , \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_1\vdash z:\tau \quad z \not\in\var(\pat_2)}{\pat_1 , \pat_2 \vdash z:\tau}
\hfil
\custominference {\pat_2\vdash z:\tau \quad z \not\in\var(\pat_1)}{\pat_1 , \pat_2 \vdash z:\tau}

\bigskip

\custominference {Q_1\vdash z:\tau \quad Q_2\vdash z:\tau \quad \tau\in\set{\tnode,\tedge}}{Q_1 , Q_2 \vdash z:\tau}
\hfil
\custominference{Q_1\vdash z:\tau \quad z\not\in\var(Q_2) }{Q_1, Q_2 \vdash z:\tau}
\hfil
\custominference{Q_2\vdash z:\tau \quad z\not\in \var(Q_1) }{Q_1, Q_2 \vdash z:\tau}

\caption{Typing rules for the \gpml type system.}
\label{fig:type-system}
\end{figure*}

Completion

The figure you provided presents a set of typing rules for the \gpml type system. These rules define how different patterns and expressions are typed in the system. Let’s go through each rule:

  1. The first two rules state that a node or an arc literal with or without a label is typed as a node or an edge, respectively.

  2. The third rule states that if a variable x is not bound in the pattern (\pat), then x can be restricted to a path and typed as a path (\tpath).

  3. The fourth rule states that if a pattern \pat is typed as z of type \tau, then the pattern \pat\quantlit{n..m} is typed as z of type List(\tau).

  4. The fifth and sixth rules deal with restricting patterns. If a pattern \pat is typed as z of type \tau, then the pattern \restrictor\ \pat is also typed as z of type \tau.

  5. The seventh rule states that if a pattern \pat is typed as x of type \tau, where \tau is either a node or an edge, then the expression x.a=c is typed as a Boolean (\tbool).

  6. The eighth rule states that if a pattern \pat is typed as x of type \tau and another pattern y is typed as y of type \tau', where both \tau and \tau' are either nodes or edges, then the expression x.a=y.b is typed as a Boolean (\tbool).

  7. The ninth, tenth, eleventh, and twelfth rules deal with Boolean operations. They state that if patterns or expressions \pat and \theta are typed as Boolean values, then the conjunction (\wedge), disjunction (\vee), negation (\neg), and conditional (\condlit{\theta}) expressions are also typed as Booleans.

  8. The thirteenth rule states that if patterns \pat_1 and \pat_2 are both typed as z of type \tau, then the pattern \pat_1 + \pat_2 is also typed as z of type \tau.

  9. The fourteenth, fifteenth, and sixteenth rules deal with patterns that include optional values. They specify how patterns with optional values are typed based on the presence or absence of the value.

  10. The seventeenth, eighteenth, and nineteenth rules deal with concatenating patterns and expressions. They specify the typing for the concatenation operator (,). If patterns or expressions \pat_1 and \pat_2 are typed as z of type \tau, then \pat_1 \, \pat_2 is also typed as z of type \tau. Additionally, if the variable z is not bound in either \pat_1 or \pat_2, then the typing remains the same.

  11. The twentieth, twenty-first, and twenty-second rules deal with concatenating queries (Q_1 and Q_2). They specify the typing for the concatenation operator (,). If queries Q_1 and Q_2 are both typed as z of type \tau, then Q_1 , Q_2 is also typed as z of type \tau. Additionally, if the variable z is not bound in either Q_1 or Q_2, then the typing remains the same.

These rules define the typing behavior of various patterns and expressions in the \gpml type system.


Personal note:

In the completion it notes The first two rules state but it really should be The first four rules state. The first rule is for a Node without a type, the second is for a Node with a type, the third is for an edge without a type and the fourth is for an edge with a type. Thus the number for the remainder of the rules is off but if you have a bit of understanding of how to read typing rules it is easy to match up the description with the typing rule.

PostScript version of second typing rule (Click triangle to expand)

This is from file 2210.16580v1.ps.gz which is downloaded by clicking Download PostScript on the Format selector page

727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p
Does ChatGPT understand PostScript (Click triangle to expand)

Note: Only the PostScript for the second type rule is used as giving ChatGPT all of the PostScript for the typing rules would exceed the token limit.

Prompt

The following is a portion of PostScript, explain what you can.

 727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6
b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p

Completion

The portion of code you provided appears to be written in the PostScript programming language. PostScript is a page description language commonly used for printing and graphics.

Let’s analyze the code line by line:

727 190 654 5 v 290 w FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6 b FN(\272)25 b(`)18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p
  1. 727 190 654 5 v: This line consists of four numeric values (727, 190, 654, and 5) followed by the operator v. These values are likely coordinates or parameters related to positioning or transformations. Without more context, it’s difficult to determine their exact purpose.

  2. 290 w: This line contains a numeric value (290) followed by the operator w. Again, without more context, it’s unclear what this line represents.

  3. FN(\271)q FH(G)27 b FF(:)21 b FH(\022)6 b FN(\272)25 b()18 b FH(G)28 b FF(:)21 b Fw(No)r(de)p`: This is a series of instructions involving various operators and string literals. Here’s a breakdown of the individual components:

    • FN(\271): This is likely a font-related instruction, specifying a particular font with a parameter \271.
    • q: This operator saves the current graphics state.
    • FH(G)27 b: This could be another font-related instruction specifying a different font (G) with a parameter 27. The b operator might be related to setting the font weight or style.
    • FF(:)21 b: This is likely a command to print or display a colon (:) using the current font. The 21 b might be related to font size or position.
    • FH(\022)6 b: This could be another font-related instruction, specifying a different font with a non-printable character \022. The 6 b might be related to font size or position.
    • FN(\272)25 b: This could be another font-related instruction specifying a different font with a parameter \272. The 25 b might be related to font size or position.
    • ()18 b: This is likely a command to print or display an opening parenthesis (() using the current font. The 18 b` might be related to font size or position.
    • FH(G)28 b: This could be another font-related instruction specifying a different font (G) with a parameter 28. The b operator might be related to setting the font weight or style.
    • FF(:)21 b: Similar to the previous occurrence, this is likely a command to print or display a colon (:) using the current font. The 21 b might be related to font size or position.
    • Fw(No)r(de)p: This appears to be a command to print or display the string “No” using a specific font (w). The subsequent characters r(de)p are not immediately clear without additional context.

Overall, without further context or knowledge of the surrounding PostScript code, it’s challenging to provide a definitive interpretation or explanation of the code.


Personal note.

Having parsed PostScript by hand know that Fw(No)r(de) is actually the word Node in the second typing rule. While I did not expect ChatGPT to understand everything about the PostScript snippet, was surprised that it did not recognize that as the word Node.

2 Likes