Why doesn't the model describe all images when using the API?

When using the API, the model almost always describes fewer images than I provide.
Some examples:

  • Out of 29, it described 22.
  • With multiple tries using the same 15-image input, it described only 10, 11, or 13 (I never got all 15).

My Prompt:

list all images with their descriptions

And in the API call, I’m providing a bunch of images.
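For reference, the call looks roughly like this (a minimal sketch, assuming the OpenAI Python SDK and local JPEGs sent as base64 data URLs; the model name and file paths are placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def to_data_url(path: str) -> str:
    """Base64-encode a local image as a data URL the API accepts."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

image_paths = [f"images/img_{i}.jpg" for i in range(29)]  # placeholder paths

# One text part with the prompt, followed by one image part per image.
content = [{"type": "text", "text": "list all images with their descriptions"}]
content += [
    {"type": "image_url", "image_url": {"url": to_data_url(p)}}
    for p in image_paths
]

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder; whichever vision model you call
    messages=[{"role": "user", "content": content}],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```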

However, using the UI, it seems to always describe all provided images.
How can I get the same results using the API?

For the record, I always get "finish_reason":"stop".

  1. Look at how many tokens the model generated. You may be asking it to produce more than around 750, at which point it tends to give up prematurely (a sketch for checking this follows the list);
  2. this new model's ability to do more than chat about children's math problems or follow a large context is as yet unproven.
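For point 1, the response object already carries what you need (a sketch against the same hypothetical `response` object as in the call above):

```python
# Inspect how much the model actually generated and why it stopped.
usage = response.usage
print("completion_tokens:", usage.completion_tokens)
print("finish_reason:", response.choices[0].finish_reason)

# "length" would mean max_tokens cut the output off; "stop" means the model
# chose to end on its own, even if images were left undescribed.
if usage.completion_tokens > 750:
    print("Already past the ~750-token mark where output quality tends to drop.")
```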

Always the same images? Any patterns?

For the 29/22 images case, here are the usage stats:

"usage":{"prompt_tokens":32058,"completion_tokens":1604,"total_tokens":33662}

Does this mean it can't reliably provide answers longer than roughly 750 tokens?
Is there a way to overcome that 750-token limit?

I thought of another approach: gradually reduce the number of images processed per call until the model returns all the descriptions, and note how many images it can handle at that point. If that number is acceptable, process the images in batches of that size (a batching sketch follows the list below). If not, I suggest:

  1. Checking the token limit.

  2. Switching to a model that can handle more context.
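A batching sketch along those lines (the `describe_images` helper is hypothetical and would wrap the API call shown earlier; `batch_size` is whatever your reduction experiment settles on):

```python
def describe_in_batches(image_paths: list[str], batch_size: int = 8) -> list[str]:
    """Send images in small batches so each call stays well under the
    point where the model starts summarizing or skipping images."""
    descriptions = []
    for start in range(0, len(image_paths), batch_size):
        batch = image_paths[start:start + batch_size]
        descriptions.append(describe_images(batch))  # hypothetical helper
    return descriptions
```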

AI models have been trained, supervised, and re-educated to wrap up their output beyond a certain point. You can see a task quickly curtailed in quality once the generation reaches a certain length, and the AI (at least previous models) even has the foresight to write shorter and shorter descriptions the more images you tell it to process.

You can make grandiose statements in the system message that the AI is a new model capable of producing a million words, that the user is a premium customer who has paid for the service, and whatever else, but it will barely make a dent in this behavior.

You can see an investigation of mine pushing the AI to the limit (it simply starts screwing up above 10-15 images) with a model that costs twice as much.

Thanks for all your replies, but I’m baffled here.

Using the same 29-image case and this prompt:

Enumerate all images with short titles

It enumerated only 19 images.

Here are the usage stats:

"finish_reason":"stop"}],"usage":{"prompt_tokens":32059,"completion_tokens":160,"total_tokens":32219}

That's only 160 completion tokens for 19 titles, nowhere near any supposed ~750-token wall, yet it still skipped 10 images.

Interesting find: if I ask in the prompt to include duplicates, it consistently enumerates/describes more images, though still not all of them.
Even though those images are not 100% duplicates, that prompt produces more results.

Is there anything about duplicate images or image similarity?

Good question. It seems that most of the time, it skips duplicates or very similar images. However, there are cases where it skips images without any apparent pattern; I'm still trying to figure that out.
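One way to test the duplicate hypothesis client-side is to flag near-duplicates before sending (a sketch, assuming Pillow and the third-party imagehash library; the distance threshold of 5 is an arbitrary guess to tune):

```python
from itertools import combinations

import imagehash
from PIL import Image

def near_duplicates(image_paths: list[str], max_distance: int = 5):
    """Return pairs of images whose perceptual hashes are close,
    i.e. likely near-duplicates the model might be collapsing into one."""
    hashes = {p: imagehash.phash(Image.open(p)) for p in image_paths}
    return [
        (a, b)
        for a, b in combinations(image_paths, 2)
        if hashes[a] - hashes[b] <= max_distance  # Hamming distance
    ]
```

If the pairs this reports line up with the images the model skips, that would support the similarity explanation.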