Can API cut images (such as mathematical figures) from the PDFs?

amitg.usd · December 2, 2024, 7:31pm

Basically I am loking to extract questions from the PDF which has Figure, question and 4 options. All of them can have mathematical figures. And PDF page can contain multiple columnar layers where these questions can be.

Return type has to be JSON for each question. Image can be encoded as base64.

Can OpenAI extract this info?

EricGT · December 2, 2024, 8:01pm

Welcome to the forum!

Can you give a more specific example with an image and which parts you would like selected.

The reason I ask is that if one has the PDF as created from tool like a LaTeX editor and not an image capture then it would be much more accurate to access the image information from the PDF file then to use a nondeterministic means using an image.

amitg.usd · December 2, 2024, 8:20pm

This should extract a JSON List which should have 2 items. Each should have a figure, question and 4 options.

amitg.usd · December 2, 2024, 8:21pm

PDFs are more likely scanned books and no LaTex editor

EricGT · December 2, 2024, 8:28pm

One option worth trying is to use a tool like

and see what it produces. Years ago I was looking around for such a tool and this was one was by far the most accurate and with the best conversion to a useable format.

In the meantime I will play around with your posted image to see if I can create what you seek. I will use ChatGPT but the prompt is more likely what you need out of the process, no promises.

icdev2dev · December 2, 2024, 8:33pm

This should be relatively straightforward to do.

You would do one page at a time ; regardless of whether the content overflows over the second page.

You would likely use

a trained model (i.e. detectron2 or similar) to extract the position of the image itself.
gpt-40-mini for text extaction with appropriate prompting.

EricGT · December 2, 2024, 8:53pm

I started with ChatGPT Windows app, using the 4o model and basically your need as a prompt. While the extracted text into JSON was great, getting just the image was not so good. The first time it did not recognize that there was no image with the question and took the entire second question as the image. Also the image of the first question had the text with it. Trying to get just the image from the first question was not working as well as desired. For each of these tries Python code was created to convert the image to text and extract the image part but if you want to use this process repeatedly with success using the API you might need to do some manual intervention, spend hours to days tweaking the prompt and Python code, or possibly train a model to learn what you seek.

Can OpenAI extract this info?

Short answer, yes but not to the quality and efficiency you might need.

Was not able to share the conversation as that is not allowed with images in the conversation.

shafique1 · December 3, 2024, 5:59am

OpenAI models can take in text but cannot pull out structured questions, figures, and options from multi-column PDFs. You will need to use a PDF parsing library (like PyPDF2 or PDFPlumber) to extract text and images, encode images in base64, and structure the data into JSON. OpenAI can help refine or organize extracted data.

Topic		Replies	Views
GPT-4 API for Educational Application API gpt-4 , chatgpt	2	1452	January 24, 2025
Make OpenAI Vision API Match GPT4 Vision API chatgpt	4	3814	December 6, 2023
Scanned pdf with API and ask questions API chatgpt , api	3	1333	October 15, 2024
How to Programmatically Extract Text from Images Using GPT-4 API gpt-4 , chatgpt , api , assistants-api	9	7087	October 14, 2024
Question about extracting images from files with GPT4o API gpt-4	0	2459	May 20, 2024

Can API cut images (such as mathematical figures) from the PDFs?

Related topics