I am trying to create a flow that uses frames extracted from a video to generate a description of that video, and I was wondering a couple of things:
What is the maximum number of pixels the gpt-4-vision-preview model can handle in a single call? I have one "keymap" image, 320 px wide and 51K px tall, in which all the frames are stitched together, but when I provided it to OpenAI it was rejected, so I assume I am over the 4096-token threshold for this model. For the record, I am using the POST /chat/completions call; a simplified sketch of my request is included further down.
Should I provide the frames as URLs or as base64-encoded strings? I was wondering whether providing them as base64 might minimize prompt-token consumption a bit, since the URLs of those pictures are pre-signed and therefore lengthy.
If I need to provide them in batches, how can I preserve the context from the previous batch? Or do I need to pass the description from the previous call(s) and tell ChatGPT that this is a new batch of pictures and that it should use the previous context to create the next part of the description?
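For reference, this is roughly what my current request looks like (a simplified sketch: the file names, the carried-over description, and the prompt wording are just placeholders):

```python
import base64
import os

import requests

api_key = os.environ["OPENAI_API_KEY"]

def encode_frame(path: str) -> str:
    """Read one extracted frame from disk and return it base64-encoded."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Placeholders: description produced by the previous batch, and the next frames.
previous_description = "Description built from the earlier frames..."
frame_paths = ["frame_001.jpg", "frame_002.jpg"]

content = [
    {
        "type": "text",
        "text": (
            "Here is the description of the video so far:\n"
            f"{previous_description}\n\n"
            "These images are the next batch of frames from the same video. "
            "Continue the description."
        ),
    },
]
for path in frame_paths:
    content.append(
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{encode_frame(path)}",
                "detail": "high",
            },
        }
    )

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "gpt-4-vision-preview",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 1000,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```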
Thanks a lot in advance to everyone who can find some time to answer some or all of my questions!
If you are able to successfully send that by resizing or re-encoding, you should be aware that the image will be scaled down so that its shortest side is no larger than 768 px (after first being fit within 2048 px on its longest side). That means you are basically sending something that will be interpreted at roughly 768x768 and billed as four detail tiles.
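As a rough illustration of how the resize-and-tile rules translate into token cost (my own sketch, using the documented 85-token base plus 170 tokens per 512-px tile for high detail):

```python
import math

def vision_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image token cost for gpt-4-vision-preview.

    Low detail is a flat 85 tokens. High detail fits the image inside
    2048x2048, scales the shortest side down to 768 px if needed, then
    charges 85 tokens plus 170 tokens per 512-px tile.
    """
    if detail == "low":
        return 85

    # Fit within a 2048 x 2048 square, preserving aspect ratio.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = int(width * scale), int(height * scale)

    # Scale so the shortest side is no more than 768 px.
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = int(width * scale), int(height * scale)

    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

print(vision_tokens(768, 768))    # 2x2 tiles -> 765 tokens
print(vision_tokens(320, 51200))  # the long strip ends up ~12x2048 -> 4 tiles, 765 tokens
```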
Here’s a snippet for constraining the size and cost by capping the maximum dimension at 1024 (whereas a long, skinny image like the one in the first post would normally only be resized down to 2048 on its longest side).
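A minimal sketch with Pillow (the file path, JPEG format, and quality setting are just example choices):

```python
import base64
import io

from PIL import Image

def shrink_and_encode(path: str, max_dim: int = 1024, quality: int = 85) -> str:
    """Downscale an image so its longest side is at most max_dim,
    re-encode it as JPEG, and return a data URL ready for the API."""
    img = Image.open(path).convert("RGB")
    scale = max_dim / max(img.size)
    if scale < 1:
        new_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
        img = img.resize(new_size, Image.LANCZOS)

    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    b64 = base64.b64encode(buf.getvalue()).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"

# Example: shrink the stitched frame strip before attaching it to the request.
data_url = shrink_and_encode("stitched_frames.jpg")
```

Fewer and smaller tiles mean fewer image tokens, at the cost of some detail in each frame.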