I’m trying to parse a 5–6-page PDF travel ticket using the OpenAI GPT-4o-mini model. My approach is to convert each page of the PDF into an image, stack those page images into a single combined image, and send that combined image along with a prompt to the model for parsing.
However, when the PDF has many pages, the combined image becomes too tall and exceeds the maximum size supported by the API, so the request fails with an “invalid image” error.
Has anyone faced a similar issue, or does anyone have suggestions on how to handle multi-page PDFs without exceeding the image size limit, and parse data accurately from a multi-page PDF while maintaining context?
It’s tough. It’s a massive shame that OpenAI doesn’t accept PDFs directly (competitors do). So the first step is to extract the text content from the PDF and pass it to the model alongside the page images, whose text semantics are destroyed by rasterization.
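As a sketch of that first step: the chat-completions vision format lets one user message carry several image parts, so you can send each page as its own image (plus its extracted text) instead of stitching pages into one oversized image. The helper below only builds the message; how you obtain `page_texts` (e.g. pypdf’s `extract_text()`) and `page_pngs` is up to you, and the “Page N” labelling is just an illustrative convention.

```python
import base64

def build_vision_message(prompt: str, page_texts: list[str], page_pngs: list[bytes]) -> dict:
    # One user message: the prompt, then per page the extracted text and the
    # rendered image. Separate image parts avoid the single-tall-image size limit.
    parts = [{"type": "text", "text": prompt}]
    for i, (text, png) in enumerate(zip(page_texts, page_pngs), start=1):
        parts.append({"type": "text", "text": f"Page {i} extracted text:\n{text}"})
        b64 = base64.b64encode(png).decode("ascii")
        parts.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"role": "user", "content": parts}
```

The returned dict can go straight into the `messages=[...]` list of a chat-completions call with `model="gpt-4o-mini"`.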
Then you need to determine whether the model can work on each page in parallel, or whether a model is even needed (OCR may work here; I’d recommend checking out OCR2.0).
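A minimal sketch of the parallel per-page step, assuming a `parse_page` callable (your model or OCR call, hypothetical here) that turns one page into a result:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_pages(pages, parse_page, max_workers=4):
    # Fan out over pages concurrently; executor.map preserves page order,
    # which matters for the later synthesis stage.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(parse_page, pages))
```

Threads are a reasonable default since the per-page work is I/O-bound API calls; `max_workers` should respect your rate limits.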
With the model working on each page in parallel, you can then perform a final synthesis stage where the model combines all the gathered information into whatever output you expect (I imagine you’re expecting something structured).
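The synthesis stage can be one final call that sees every page’s result at once. A sketch, assuming the per-page results come back as dicts; the wording of the instruction is illustrative, not a fixed recipe:

```python
import json

def build_synthesis_prompt(page_results: list[dict]) -> str:
    # Concatenate per-page findings into one prompt for a final model call
    # that merges them into a single structured record.
    sections = [
        f"Page {i} findings:\n{json.dumps(result, indent=2)}"
        for i, result in enumerate(page_results, start=1)
    ]
    return (
        "Combine the per-page findings below into one structured ticket record. "
        "Resolve duplicates and keep page context.\n\n" + "\n\n".join(sections)
    )
```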
With an appropriate “approval” system that categorizes and sorts these processed documents, you can go deeper and perform classification steps on the pages to determine whether they are even useful. In my case a lot of documents come with terms & conditions and similar nasty, noisy content, so I just eliminate those pages, which leads to cheaper costs and more accurate results.
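That page-filtering step can start as a cheap keyword heuristic before any model call; the marker list below is purely illustrative and should be tuned to your own documents:

```python
# Pages containing these phrases are treated as boilerplate (example markers only).
NOISE_MARKERS = ("terms & conditions", "terms and conditions", "fare rules")

def is_useful_page(page_text: str) -> bool:
    # Drop boilerplate pages (T&Cs etc.) so they never reach the model,
    # cutting cost and reducing noise in the final synthesis.
    lowered = page_text.lower()
    return not any(marker in lowered for marker in NOISE_MARKERS)
```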
I was looking for something similar, but following your suggestions and setting it up myself was way too much work. I just found the tool Parble and it is exactly what I needed. Might help you too!