Why does ChatGPT perform better than the API for OCR tasks?

Hello everyone,
So I have some PDFs containing tables that I want to extract, and those tables are hard to extract with libraries like pypdf. So I thought about using OCR: I tried ChatGPT (4o) and it performed well. Then I tried to do the same thing through the API:

import base64
from io import BytesIO

from openai import OpenAI
from pdf2image import convert_from_path

client = OpenAI()

def encode_image(pil_image):
    # Serialize the PIL image to an in-memory JPEG and base64-encode it
    buffered = BytesIO()
    pil_image.save(buffered, format="JPEG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

def gpt_extract_text_from_image(image):
    base64_image = encode_image(image)

    # Send the prompt and the base64-encoded image in one Responses API call
    response = client.responses.create(
        model="gpt-4o",
        input=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": (
                            "Extract the full table from this image with all data, including headers.\n"
                            "You are free to format the output as you see fit for clarity. Just make sure all data is readable and aligned correctly."
                        ),
                    },

                    {
                        "type": "input_image",
                        "image_url": f"data:image/jpeg;base64,{base64_image}",
                    },
                ],
            }
        ],
    )
    return response.output_text

pdf_path = "mypdf.pdf"
images = convert_from_path(pdf_path, dpi=700)  # one PIL image per page
first_image = images[0]  # pages are zero-indexed, so [0] is the first page

extracted_text = gpt_extract_text_from_image(first_image)

print(extracted_text)

Except that the output contained a lot of errors. I tried increasing the resolution (I used a DPI of 1200, which produces a very high-resolution image), but that didn't fix it. I don't understand the difference between ChatGPT and the API. Can you suggest any other alternatives?

Understand the limitations of vision:

The image will be downscaled so that its shortest side is at most 768 pixels.

That leaves a full page with a resolution along the lines of 768x990, i.e. under 90 DPI (768 pixels across an 8.5-inch page width), no matter how high a DPI you render at.
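
To make that concrete, here is a rough sketch of the documented resize rules (high-detail vision first fits the image within a 2048x2048 square, then scales it so the shortest side is 768 pixels; the exact rounding is my assumption):

def effective_size(width, height):
    # Fit within a 2048x2048 square, preserving aspect ratio
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Then scale so the shortest side is at most 768 px
    scale = min(1.0, 768 / min(width, height))
    return int(width * scale), int(height * scale)

# A US Letter page (8.5 x 11 in) rendered at 700 DPI is 5950 x 7700 px,
# but arrives at the model around 768 x 993 px, i.e. roughly 90 DPI:
print(effective_size(5950, 7700))  # (768, 993)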

You may have better results slicing pages. Send pieces with a bit of overlap that are around 1024x512, wider than tall, or even up to 1536x512, matching the resolution of the underlying tiling. Then also interleave "input_text" parts along the lines of "page 2, section 3" so the model understands where each piece comes from; a sketch of the idea follows below.
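
For example, here is a minimal sketch of that slicing approach, reusing encode_image from your script (the strip height, overlap amount, and label wording are illustrative choices, not fixed requirements):

def slice_page(page_image, strip_height=512, overlap=64, strip_width=1536):
    # Downscale the page so its width matches the target strip width
    scale = strip_width / page_image.width
    page = page_image.resize((strip_width, int(page_image.height * scale)))

    strips, top = [], 0
    while top < page.height:
        bottom = min(top + strip_height, page.height)
        strips.append(page.crop((0, top, page.width, bottom)))
        top += strip_height - overlap  # step forward, keeping some overlap
    return strips

# Build a single message that interleaves a text label with each strip
content = []
for i, strip in enumerate(slice_page(first_image)):
    content.append({"type": "input_text", "text": f"page 1, slice {i + 1}"})
    content.append({
        "type": "input_image",
        "image_url": f"data:image/jpeg;base64,{encode_image(strip)}",
    })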

Isolating tables so that each appears in a single image is difficult going in blind, and the AI has vision limitations when unraveling tables, keys, and legends.


Hello,
Thank you so much for your remark; I did not know the image gets resized automatically when its dimensions are too big.
For each document page, I took multiple overlapping pictures and raised the DPI, then did a second pass with the LLM to clean the extracted content (removing the overlap), and I got a table 100% identical to the one in the PDF.
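
In case it is useful, here is a minimal sketch of what that second cleanup pass can look like, assuming the per-slice extractions are collected in a list (the function name and prompt wording are illustrative, not my exact prompt):

def merge_slices(slice_texts):
    # Join the per-slice extractions with an explicit separator
    joined = "\n\n--- next slice ---\n\n".join(slice_texts)
    response = client.responses.create(
        model="gpt-4o",
        input=[{
            "role": "user",
            "content": [{
                "type": "input_text",
                "text": (
                    "The following table fragments were extracted from "
                    "overlapping slices of the same PDF page. Merge them into "
                    "one table, removing rows duplicated by the overlap:\n\n"
                    + joined
                ),
            }],
        }],
    )
    return response.output_text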
