Limitations of GPT-4V's high res tiling process?

When using GPT-4V in high res mode on tables that are significantly bigger than 512x512, I’m finding fairly frequent row and column confusion, where the correct value is extracted but placed in the wrong row/column. However, when I condense the image to something closer to 512x512 I get great performance, as long as the text hasn’t been rescaled to the point that it’s less legible. So I’m wondering if this is a limitation of the tiling process.
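
For reference, the condensing step is just a straightforward resize before base64-encoding; a minimal sketch with Pillow (the 512px target on the long side and the LANCZOS filter are my own choices, nothing prescribed by the API):

import base64
from io import BytesIO

from PIL import Image

def downscale_for_vision(path: str, max_side: int = 512) -> str:
    """Shrink an image so its longest side is at most max_side; return base64 PNG."""
    img = Image.open(path)
    scale = max_side / max(img.size)
    if scale < 1:  # only shrink, never upscale
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    buf = BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")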

Is it possible to get more details on the implementation of this tiling? My concern is that for tables in particular, long-range connections are important. If a value and its corresponding row/column header land in separate tiles, and the model sees those tiles independently, it’s easy to see how that could lead to row/column confusion (depending on the exact implementation). I think this is especially true when the low-res copy has been downscaled to the point that it’s not legible.
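
For a sense of scale, here’s my rough arithmetic for how many tiles a large image turns into, based purely on the published token-pricing description for high detail (fit within 2048x2048, scale the shortest side to 768px, then count 512px tiles), not on any knowledge of the internals:

import math

def high_detail_tiles(width: int, height: int) -> int:
    """Estimate the 512px tile count in high-detail mode, per the pricing docs."""
    # 1. Fit the image within a 2048 x 2048 square, preserving aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # 2. Scale so the shortest side is 768px.
    scale = 768 / min(width, height)
    width, height = width * scale, height * scale
    # 3. Count the 512 x 512 tiles needed to cover the result.
    return math.ceil(width / 512) * math.ceil(height / 512)

print(high_detail_tiles(1600, 1200))  # -> 4 tiles, so a header row can easily
                                      # land in a different tile than its values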

There’s an alternate way of sending images that are larger and untiled…

user_standard_image_message = [
    {
        "role": "user",
        "content": [
            user_text_str,             # plain prompt text
            {"image": base64_image1},  # no detail settings required
            {"image": base64_image2},
        ],
    }
]

It resolves more detail than 512x512: text that can’t be read at “detail: low” becomes legible. There’s no “url” download here, only base64 data.
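
For completeness, base64_image1 / base64_image2 above are just the raw image bytes base64-encoded, and user_text_str is an ordinary prompt string; the filenames below are placeholders:

import base64

def encode_image(path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_image1 = encode_image("table_page_1.png")  # placeholder filename
base64_image2 = encode_image("table_page_2.png")  # placeholder filename
user_text_str = "Extract this table as CSV, preserving row and column order."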

This also reveals something interesting about how vision works: the model’s inability to “see” more than a certain amount, perhaps due to attention layer limits. Send a whole page of text through that method and it starts to hallucinate contents after a paragraph or two of verbatim OCR.