Basically, I am looking to extract questions from a PDF in which each question has a figure, the question text, and 4 options, and any of these parts can contain mathematical figures. A PDF page can also be laid out in multiple columns, and the questions can appear in any of them.
The output has to be JSON for each question. Images can be encoded as base64.
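A minimal sketch of what one JSON record per question could look like, assuming the options are labeled A to D and the figures have already been extracted as PNG bytes. The field names (number, question, options, figure_base64) and the function question_record are hypothetical, not a required format. Because the question and every option may contain a figure, each part gets its own optional base64 field.

```python
import base64
import json


def _b64(png_bytes):
    """Encode raw image bytes as ASCII base64 text, or None when there is no figure."""
    return base64.b64encode(png_bytes).decode("ascii") if png_bytes else None


def question_record(number, text, figure_png, options):
    """Build one JSON-ready record for a single question.

    `options` must be 4 (text, png_bytes_or_None) pairs, labeled A-D in order,
    since the question and every option may carry its own figure.
    """
    if len(options) != 4:
        raise ValueError("expected exactly 4 options")
    return {
        "number": number,
        "question": {"text": text, "figure_base64": _b64(figure_png)},
        "options": [
            {"label": label, "text": opt_text, "figure_base64": _b64(opt_png)}
            for label, (opt_text, opt_png) in zip("ABCD", options)
        ],
    }


# Placeholder bytes stand in for real PNG data extracted from the PDF.
record = question_record(
    1,
    "Find the shaded area.",
    b"\x89PNG...",
    [("4", None), ("8", None), ("12", None), ("16", None)],
)
print(json.dumps(record, indent=2))
```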
Can you give a more specific example, with an image, showing which parts you would like selected?
The reason I ask is that if the PDF was created by a tool such as a LaTeX editor, rather than captured as an image, it would be much more accurate to read the image information directly from the PDF file than to use a nondeterministic approach based on an image.
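To illustrate reading image data straight from the PDF, here is a sketch that assumes the pypdf library (the successor to PyPDF2), whose page objects expose an images list with name and data on each item. The function name embedded_images is hypothetical. It takes an already-opened reader so it can be tried without a real file.

```python
import base64


def embedded_images(reader):
    """Collect raster images embedded in each page of an already-opened PDF.

    `reader` is expected to behave like pypdf's PdfReader: it exposes `.pages`,
    and each page exposes `.images`, whose items have `.name` and `.data` (bytes).
    Vector drawings (for example TikZ figures from LaTeX) are not embedded
    images, so they will not appear in the result.
    """
    found = []
    for page_number, page in enumerate(reader.pages, start=1):
        for image in page.images:
            found.append(
                {
                    "page": page_number,
                    "name": image.name,
                    "base64": base64.b64encode(image.data).decode("ascii"),
                }
            )
    return found
```

Usage would be along the lines of `from pypdf import PdfReader` followed by `embedded_images(PdfReader("questions.pdf"))`, where the file name is a placeholder. Figures drawn as vectors, which are common in LaTeX output, would likely need their page region cropped and rendered instead.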
and see what it produces. Years ago I was looking around for such a tool, and this one was by far the most accurate, with the best conversion to a usable format.
In the meantime, I will play around with your posted image to see if I can create what you seek. I will use ChatGPT, but the prompt is more likely what you need out of the process. No promises.
I started with the ChatGPT Windows app, using the GPT-4o model and essentially your requirements as the prompt. While the text extracted into JSON was great, getting just the image was not so good. The first time, it did not recognize that a question had no image, and it took the entire second question as the image. Also, the image for the first question included its text, and trying to get just the image from the first question did not work as well as desired. For each of these attempts, Python code was created to convert the image to text and extract the image part. However, if you want to use this process repeatedly and reliably through the API, you might need some manual intervention, hours to days of tweaking the prompt and Python code, or possibly a model trained to learn what you seek.
Can OpenAI extract this info?
Short answer: yes, but not with the quality and efficiency you might need.
I was not able to share the conversation, because sharing is not allowed for conversations that contain images.
OpenAI models can take in text, but they cannot pull structured questions, figures, and options out of multi-column PDFs on their own. You will need a PDF parsing library (such as PyPDF2 or pdfplumber) to extract the text and images, then encode the images in base64 and structure the data into JSON. OpenAI models can then help refine or organize the extracted data.
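For the multi-column part, one simple approach is to sort words by column before joining them, then split the text on question numbers and option labels. This sketch assumes word dictionaries shaped like pdfplumber's `Page.extract_words()` output (x0, x1, top, text), equal-width columns, questions numbered like "1." and options labeled (A) to (D). The names reading_order, parse_questions, and questions_from_words are hypothetical, and real pages will need tuning.

```python
import re

# Assumed layout: questions start with "1.", "2.", ...; options with "(A)".."(D)".
QUESTION_START = re.compile(r"(?:^|\s)(\d+)\.\s")
OPTION_LABEL = re.compile(r"\(([ABCD])\)\s*")


def reading_order(words, page_width, n_columns=2, line_tol=3.0):
    """Sort word boxes column by column, then line by line, then left to right.

    Columns are assumed to be equal-width; `line_tol` (in PDF points) coarsely
    groups words whose `top` values differ slightly into the same line.
    """
    col_width = page_width / n_columns

    def key(word):
        mid = (word["x0"] + word["x1"]) / 2
        col = min(int(mid // col_width), n_columns - 1)
        return (col, round(word["top"] / line_tol), word["x0"])

    return sorted(words, key=key)


def parse_questions(text):
    """Split flat text into numbered questions with their (A)-(D) options."""
    starts = list(QUESTION_START.finditer(text))
    questions = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        # re.split with a capture group gives [stem, "A", text_a, "B", text_b, ...]
        parts = OPTION_LABEL.split(text[match.end():end])
        options = {parts[j]: parts[j + 1].strip() for j in range(1, len(parts) - 1, 2)}
        questions.append(
            {"number": int(match.group(1)), "question": parts[0].strip(), "options": options}
        )
    return questions


def questions_from_words(words, page_width, n_columns=2):
    """Order the words, join them into text, then parse the questions."""
    ordered = reading_order(words, page_width, n_columns)
    return parse_questions(" ".join(word["text"] for word in ordered))
```

With pdfplumber, words could come from `page.extract_words()` and page_width from `page.width`; figure regions could be rendered with `page.crop(bbox).to_image(resolution=150)` and then base64-encoded. Note that the question-number pattern also matches a sentence ending in a digit and a period (for example "x = 3. Find y"), so math-heavy text may need a stricter rule.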