Hi – I’m trying to process as many images per day as I can with the gpt-4v preview model, so I’m batching as many images as possible into a single request.
Issue: the model only recognizes the first 4 images I submit (images 5 and beyond are ignored). For context, the request is ~5k tokens, and each image adds ~1k tokens.
Has anyone had this experience? I can’t find any documentation online.
There are limits to what the AI model can attend to.
It’s a chat model, pretrained to start shortening and truncating its output around 700 tokens of text.
Its vision component is not clearly documented, but larger images are reported to use more “tiles”, which is how you are billed.
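As a rough sketch of how the tile billing appears to work (based on OpenAI’s published pricing description; treat the exact resize rules here as an assumption): detail=low is a flat 85 tokens, while detail=high scales the image to fit 2048×2048, brings the shortest side down to 768, then bills 170 tokens per 512-pixel tile plus the 85-token base.

```python
import math

def vision_token_estimate(width: int, height: int, detail: str = "high") -> int:
    """Estimate an image's token cost under the published tiling scheme.

    Assumptions: low detail is a flat 85 tokens; high detail scales the
    image to fit within 2048x2048, then downscales so the shortest side
    is at most 768, and charges 170 tokens per 512px tile plus 85 base.
    """
    if detail == "low":
        return 85
    # Fit within a 2048 x 2048 square (downscale only)
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Bring the shortest side down to 768 (downscale only)
    scale = 768 / min(w, h)
    if scale < 1.0:
        w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

So a 1024×1024 image at high detail comes out to 4 tiles (765 tokens), which matches the roughly 1k tokens per image you’re seeing, while low detail would drop each to 85.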
For some tasks, the amount of attention it can pay, or the information it can take in, seems limited: text recognition will falter a third of the way through, though you can also prompt it to pick up again from the two-thirds point. Add more images and it gets confused.
An artificially low max_tokens is applied if you don’t specify one; set it to something more like 1500.
I would first try the detail=low setting, and resize images yourself so the longest dimension is 512 pixels. See if the resized images still give you the recognition you’re looking for, then pass a dozen with a request for very short answers (like “in a numbered list, tell me how many bananas appear in each image I’ve attached”).
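A minimal sketch of that approach (the helper names are hypothetical; the message shape follows the documented chat-completions vision format, but verify the details against the current API reference):

```python
import base64

def resize_to_512(width: int, height: int) -> tuple:
    """Target dimensions so the longest side is 512px (downscale only)."""
    longest = max(width, height)
    if longest <= 512:
        return (width, height)
    scale = 512 / longest
    return (round(width * scale), round(height * scale))

def build_batch_request(image_bytes_list: list, model: str = "gpt-4-vision-preview") -> dict:
    """Build one chat-completions payload carrying several low-detail images."""
    content = [{
        "type": "text",
        "text": ("In a numbered list, tell me how many bananas "
                 "appear in each image I've attached."),
    }]
    for raw in image_bytes_list:
        b64 = base64.b64encode(raw).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}", "detail": "low"},
        })
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 1500,  # raise the artificially low default
    }
```

You’d POST that dict to the chat completions endpoint with your usual client; the base64 data-URL form spares you from hosting the images anywhere.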
There is only one model offered with computer vision, and it is not the original GPT-4.
A better multimodal model for us would depend on OpenAI restoring quality of attention masking and attention layers (at computational expense), which nobody outside the company has seen.
Batching into a single call doesn’t save much money, but it does promise a quality reduction. I’d instead go to your rate-limits page and see what it would take to raise your tier, or press the “request increase” button there. gpt-4-vision-preview is still noted as “not for production”.