GPT-4o can only take 39 images?

Hey guys,

I am trying things out, following the instructions in the latest cookbook. I want to summarize a 3-minute video, so I am sending 170 frames to GPT-4o like this:

import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI();

// One data-URL image part per extracted frame; 'low' detail to keep token usage down.
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0,
  messages: [
    {
      role: 'system',
      content:
        'You are a video summariser. Provided a video you provide a description of what happens in the video. Respond in text, no markup.',
    },
    {
      role: 'user',
      content: [
        { type: 'text', text: 'These are the frames from the video.' },
        ...images.map((filename) => ({
          type: 'image_url' as const,
          image_url: {
            url: `data:image/jpeg;base64,${fs
              .readFileSync(`${folderPath}/${filename}`)
              .toString('base64')}`,
            detail: 'low' as const,
          },
        })),
      ],
    },
  ],
});

When sending only 39 images, it works as expected and uses about 3,000 tokens. However, when I add a 40th image, the token count jumps to over 30,000 (roughly 10x the 39-image request).

[ERROR] 18:24:18 Error: 429 Request too large for gpt-4o in organization org-REDACTED on tokens per min (TPM): Limit 30000, Requested 30643. The input or output tokens must be reduced in order to run successfully.

I am essentially copying the steps from the cookbook but can't reproduce its results; why would the 40th image add roughly 27,000 tokens?
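For what it's worth, my reading of the vision docs is that a detail: 'low' image costs a flat 85 tokens, so even with all 170 frames the request should stay fairly small. A quick back-of-the-envelope check (the 85 per image and the buffer for the text parts are my assumptions, not measured values):

// Rough token estimate, assuming each detail:'low' image costs a flat 85 tokens
// plus a small allowance for the system and user text.
const LOW_DETAIL_TOKENS_PER_IMAGE = 85; // my assumption from the pricing docs
const PROMPT_BUFFER = 100;              // rough allowance for the text parts

const estimate = (imageCount: number) =>
  imageCount * LOW_DETAIL_TOKENS_PER_IMAGE + PROMPT_BUFFER;

console.log(estimate(39)); // ~3,415, close to the ~3,000 I actually see
console.log(estimate(40)); // ~3,500, nowhere near the 30,643 the API reported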

Any tips are appreciated!

@fulop.dani96 Very similar to what I'm experiencing. Very frustrating :frowning:

I’ve been working on a similar system, as in breaking videos into multiple images.

I haven’t run into strange token issues like you are experiencing (January 2025).

However, the system seems to start ‘losing track’ of things if I try to upload too many images. I’ve been able to consistently upload 30 without any issues; at 40, it starts to lose them.
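If you do need more than ~30 frames, the workaround I’ve been sketching is to batch: describe about 30 frames at a time, then merge the partial descriptions into one summary. Something like this (untested sketch, shown with gpt-4o-mini since that’s what I was using; helper names like toImageParts and summarizeVideo are just mine):

import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI();
const BATCH_SIZE = 30; // the point up to which it still behaves reliably for me

// Build the image_url parts for one batch of frame files (same shape as in the post above).
const toImageParts = (folderPath: string, filenames: string[]) =>
  filenames.map((filename) => ({
    type: 'image_url' as const,
    image_url: {
      url: `data:image/jpeg;base64,${fs
        .readFileSync(`${folderPath}/${filename}`)
        .toString('base64')}`,
      detail: 'low' as const,
    },
  }));

async function summarizeVideo(folderPath: string, frames: string[]) {
  const partials: string[] = [];

  // Describe each batch of frames separately.
  for (let i = 0; i < frames.length; i += BATCH_SIZE) {
    const batch = frames.slice(i, i + BATCH_SIZE);
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      temperature: 0,
      messages: [
        {
          role: 'system',
          content: 'You describe what happens in one segment of a video, given its frames in order.',
        },
        {
          role: 'user',
          content: [
            { type: 'text', text: `Frames ${i + 1} to ${i + batch.length} of the video.` },
            ...toImageParts(folderPath, batch),
          ],
        },
      ],
    });
    partials.push(res.choices[0].message.content ?? '');
  }

  // Merge the per-segment descriptions into one overall summary.
  const merged = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    temperature: 0,
    messages: [
      { role: 'system', content: 'Combine these segment descriptions into one summary of the whole video.' },
      { role: 'user', content: partials.join('\n\n') },
    ],
  });

  return merged.choices[0].message.content;
}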

I simply uploaded a bunch of images (taken from a video I made in iMovie) that contained various letters of the alphabet and some numbers, in a random, mixed-up order. My instructions were to return which letters/numbers were present. It did this pretty easily.

I would then remove one or two items from the set and send the request again, to make sure the system wasn’t just ‘assuming’ that all of the letters would be present.

It was still easily able to detect the one or two missing letters/numbers. But, like I said earlier, only up to about 30 images per request. At 40, it would start to ‘hallucinate’ (or whatever the opposite of hallucinating is; forgetting, maybe?) some of the items.
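For reference, the ‘check’ on my side was nothing fancy, roughly this (assuming the model is told to reply with a comma-separated list; the names are just illustrative):

// Rough version of the check I was doing. 'expected' is whatever I actually put in the video.
const expected = new Set(['A', 'B', 'C', 'D', 'E', '1', '2', '3']);

function missingItems(modelReply: string): string[] {
  const seen = new Set(modelReply.split(',').map((s) => s.trim().toUpperCase()));
  return [...expected].filter((item) => !seen.has(item));
}

// If the model answers "A, B, C, E, 1, 2, 3", this prints ['D'], which is the kind of
// 'forgotten' item that only started showing up past ~30 images per request.
console.log(missingItems('A, B, C, E, 1, 2, 3'));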

So I guess that, as of January 2025, 4o-mini (which is what I was using) can handle about 30 images at once.

Hope that helps someone else, because until just now I had gathered that it could only handle a maximum of 10 images.

Then again, my ‘images’ were just letters and numbers on a black screen. Maybe more complicated images would cause more issues. I need more tests…

Edit: I should also mention that I was on ‘Tier 3’ while doing this.

Another edit:

My regular prompt says something like ‘these are image frames taken from a single video, please describe everything found in them’.

When sending 25 letters of the alphabet (one missing), the 4o-mini model seems to have trouble noticing the missing one on its own. If I write in the prompt “hey, there’s at least one missing letter, which one is it?”, it DOES notice and find the missing letter(s).

Regular 4o, however, will notice and find any missing letters without the additional prompt hand-holding.
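For anyone who wants to reproduce this, the only difference between the two runs was the hint in the user text, roughly (wording approximate):

// Base prompt: regular 4o finds the missing letter with just this.
const basePrompt =
  'These are image frames taken from a single video. List every letter and number you can see.';

// Hinted prompt: 4o-mini needs this extra nudge before it reports what is missing.
const hintedPrompt =
  basePrompt + ' At least one letter of the alphabet is missing. Which one(s)?';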