GPT-4o-vision for extraction of complex tables

piam22 · March 8, 2025, 2:37pm

I am trying to extract complex tables from PDFs as Markdown using the GPT-4o vision preview model.
The model does a good job with simple tables but fails on complex tables with grouped columns and rows.

Could anyone suggest well-defined prompts for GPT-4o vision to extract complex tables?

Note: for now, I do not want to use any Hugging Face OCR models.

Here is the prompt I am using.

messages = [
    # System message: the extraction rules the model should follow.
    {"role": "system", "content": (
        """
        You are an OCR-like data extraction tool designed to extract structured tables from images and convert them into Markdown format. The tables may contain grouped column headers and merged cells.
        You are an expert in identifying the separation lines for grouped columns and grouped rows in the input image.

        ### **Instructions for Table Extraction and Formatting:**

        1. **Identify Table Components**
           - Extract all headers, sub-headers, and data rows from the table image.
           - Preserve the **logical hierarchy** of the table.
           - If a column header spans multiple columns, ensure it is **repeated for each column** while maintaining clarity.

        2. **Handling Merged and Grouped Headers**
           - If a header spans multiple columns, **repeat the header name across all relevant columns**.
           - For sub-groups under the same main header, ensure they appear directly under their respective sections.
           - If a **cell spans multiple rows**, repeat its value across rows to maintain table structure.

        3. **Markdown Table Formatting**
           - Use the `|` character for table column separation.
           - Use `---` for column headers to differentiate them from data rows.
           - **Ensure proper alignment** and avoid breaking the Markdown table format.

        4. **Handling Multi-line Content**
           - If a cell contains multiple lines, use `<br>` to separate them (e.g., `"100 (99.58) <br> (0.42)"`).
           - If a section of the table is missing values, fill the cell with `"null"` rather than omitting it.

        5. **Edge Cases and Special Considerations**
           - Do not invent or interpolate data.
           - If any data is unreadable, output `"null"` instead of leaving it blank.
           - If special symbols (`|`, `_`, `*`) appear, escape them properly to avoid breaking Markdown formatting.

        ### **Your Task:**

        Extract tables from the provided image following these rules and output the result in **Markdown table format** with correctly repeated grouped headers. After extracting the Markdown, repeat Steps 1 to 5 and compare the result with the input image to check that no column names or data are missing and that the output is valid **Markdown**, so that converting the Markdown back to PDF would reproduce the original table.
        """
    )},
    # User message: the short task text; the image part is appended below.
    {"role": "user", "content": [
        {"type": "text", "text": (
            "Extract only the table from the provided image and return it in Markdown format. "
            "Ensure the table maintains correct row and column alignments, including grouped column headers. "
            "Use the pipe (`|`) format and ensure proper spacing for readability. "
            "Do not include any additional text, explanations, or formatting beyond the extracted table."
        )}
    ]}
]

# `encoded` is the base64-encoded JPEG of the table image (defined elsewhere).
messages[1]["content"].append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}})
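
The prompt asks the model to re-check its own output, but a structural check in code may be more dependable for catching dropped or extra cells. The following is only a minimal sketch using the standard library, not part of the prompt above: `split_row` and `check_table` are hypothetical helper names, and it assumes the reply contains a single pipe-delimited table with pipes inside cells escaped as `\|`, as rule 5 requests. It does not handle a backslash that is itself escaped before a pipe.

```python
import re

# Split on pipes that are not escaped with a backslash.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def split_row(line: str) -> list[str]:
    """Split one Markdown table row into stripped cell strings."""
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]
    # Drop the trailing border pipe, but keep an escaped pipe that ends a cell.
    if text.endswith("|") and not text.endswith("\\|"):
        text = text[:-1]
    return [cell.strip() for cell in _UNESCAPED_PIPE.split(text)]


def check_table(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the shape is consistent."""
    rows = [split_row(line) for line in markdown.splitlines() if line.strip()]
    if len(rows) < 2:
        return ["table needs a header row and a separator row"]

    problems = []
    width = len(rows[0])  # the header row defines the expected column count

    # The second row should consist only of dash separators such as ---.
    if not all(re.fullmatch(r":?-+:?", cell) for cell in rows[1]):
        problems.append("second row is not a --- separator row")

    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            problems.append(f"row {number} has {len(row)} cells, expected {width}")
        if any(cell == "" for cell in row):
            problems.append(f"row {number} has an empty cell (the prompt asks for null)")
    return problems
```

If `check_table` returns any problems for a reply, one option is to resend the image with those problems listed in a follow-up user message, so the model corrects specific rows instead of re-extracting blindly.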
