Is there any open source software that can interpret/decipher type rules in a PDF?

As we know, there are many ChatGPT plugins for conversing with PDFs, but AFAIK they only extract the text and do not interpret/decipher images such as these typing rules, e.g.

From: “A Tutorial Implementation of a Dependently Typed Lambda Calculus” By Andres Löh, Conor McBride, W. Swierstra, Published 1 April 2010

DOI: 10.3233/FI-2010-304

Is there any open source software that can interpret/decipher type rules in a PDF?

Please only answer if you know your answer to be correct; no speculative answers.

If you have questions about this, send me a private message.


ChatOCR did not work (details below)

I tested the above image using ChatOCR; here are the details.

Prompt

Extract type rules from this image

https://global.discourse-cdn.com/openai1/original/3X/a/9/a9cfdda59ae2356a5fe10273dba17f42646a0bb3.png

Completion

Used ChatOCR

Request to ChatOCR

{
  "url": "https://global.discourse-cdn.com/openai1/original/3X/a/9/a9cfdda59ae2356a5fe10273dba17f42646a0bb3.png"
}

Response from ChatOCR

{
  "content": "OCR job is in progress.",
  "display_image": "![Image](https://global.discourse-cdn.com/openai1/original/3X/a/9/a9cfdda59ae2356a5fe10273dba17f42646a0bb3.png)",
  "has_more": true,
  "job_id": "31b8328cc9f647611c1158f21db94c9803a4f189b3e03c00b802d33c3d096791",
  "next_token": null
}

Used ChatOCR

Request to ChatOCR

{
  "job_id": "31b8328cc9f647611c1158f21db94c9803a4f189b3e03c00b802d33c3d096791",
  "has_more": true
}

Response from ChatOCR

{
  "System Message": "The following text was extracted from the file you uploaded. [The full file is available here](http://staf.ai/api/files?linesId=31b8328cc9f647611c1158f21db94c9803a4f189b3e03c00b802d33c3d096791.txt).\n\n How did we do? [We would love to hear your feedback](https://talktomerlin.com/s/chatocr).",
  "content": {
    "Extracted Text": "```markdown\n\n+e llv\ne::T\nU\nV\nX\nX\nAx\nV v[x\ne'llv\nelln\ne'\nU\nv\nell V\ne e'\nv\nAx\ne ll xx\nV\nFigure 1. Evaluation in A\n```"
  },
  "has_more": false
}

I’m sorry, but the text extracted from the image doesn’t seem to be accurate or meaningful. This could be due to the quality of the image, the font used, or the complexity of the layout. OCR (Optical Character Recognition) technology can sometimes struggle with these factors.

If you have a clearer image or if the information is available in a text format, I would be able to assist you better.
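The two requests in the log above follow a submit-then-poll pattern: the first call sends the image URL and gets back a `job_id` with `has_more: true`, and the follow-up call sends that `job_id` until `has_more` is false. A minimal sketch of that loop is below. The `call_plugin` callable, `run_ocr_job`, and `fake_plugin` are hypothetical names for illustration only; I do not know ChatOCR's real API beyond the JSON shown in the log, so the stand-in just replays the same shapes.

```python
import time
from typing import Any, Callable, Dict


def run_ocr_job(
    call_plugin: Callable[[Dict[str, Any]], Dict[str, Any]],
    image_url: str,
    max_polls: int = 10,
    delay_seconds: float = 0.0,
) -> Dict[str, Any]:
    """Submit an image URL, then poll by job_id until has_more is false."""
    response = call_plugin({"url": image_url})
    # Keep the job_id from the first response; the final response in the
    # log above does not repeat it.
    job_id = response.get("job_id")
    polls = 0
    while response.get("has_more"):
        if polls >= max_polls:
            raise TimeoutError("OCR job did not finish in time")
        time.sleep(delay_seconds)
        response = call_plugin({"job_id": job_id, "has_more": True})
        polls += 1
    return response


def fake_plugin(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in that mimics the response shapes in the log."""
    if "url" in request:
        return {
            "content": "OCR job is in progress.",
            "has_more": True,
            "job_id": "abc123",
        }
    return {"content": {"Extracted Text": "e::T"}, "has_more": False}


result = run_ocr_job(fake_plugin, "https://example.com/figure.png")
print(result["content"]["Extracted Text"])
```

The loop only shows the control flow; it says nothing about OCR quality, which is the real problem here.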

Contents of file https://d1mizjfdzgkqa.cloudfront.net/31b8328cc9f647611c1158f21db94c9803a4f189b3e03c00b802d33c3d096791.txt

+e llv
e::T
U
V
X
X
Ax
V v[x
e'llv
elln
e'
U
v
ell V
e e'
v
Ax
e ll xx
V
Figure 1. Evaluation in A

While it did extract some text, the output was not in any form that could be converted into typing rules: the two-dimensional layout of each rule (premises above the line, conclusion below) was lost.
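To make concrete what "usable as typing rules" would mean, here is a minimal sketch of a structured representation that keeps premises and conclusion separate and renders to LaTeX. The `InferenceRule` class and its field names are hypothetical, and the example rule is a hand transcription of how the first rule in Figure 1 appears to read (the OCR fragments `e llv` and `e::T` are consistent with it), so treat it as an assumption rather than verified output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InferenceRule:
    premises: List[str]  # LaTeX for each judgement above the line
    conclusion: str      # LaTeX for the judgement below the line

    def to_latex(self) -> str:
        """Render the rule as a LaTeX fraction, premises over conclusion."""
        top = " \\qquad ".join(self.premises)
        return "\\frac{" + top + "}{" + self.conclusion + "}"


# Assumed reading of the annotation rule (hand-transcribed, not OCR output).
annotation_rule = InferenceRule(
    premises=["e \\Downarrow v"],
    conclusion="e :: \\tau \\Downarrow v",
)
print(annotation_rule.to_latex())
```

Any tool that could produce something of this shape from the image, rather than a flat list of tokens, would answer the question.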


To foster a greater sense of fairness within the community, I created a separate topic where users can openly discuss extracting technical expressions from PDFs, for purposes such as inclusion in a ChatGPT plugin or use by an agent built with a framework such as LangChain. It appears that many users are grappling with this issue and are unsure where to begin or which direction to pursue. A designated space for this discussion should greatly assist those seeking guidance.

A post was merged into an existing topic: How to extract technical expressions from PDFs so that they can be understood by AI?