Images input order with gpt-4 vision/omni

yovell04 · May 20, 2024, 11:45am

Hi, I’m using GPT4 Vision and Omni to help analyze some movies. I’m trying to really analyze each scene in the movie and get some insights regarding it. one of the thing I try to do is using the movie’s transcript for a specific section of it, alongside some key frames to build a textual graph of who talks to who during the movie. I was wondering if GPT-4 cares about the input order of the frames I give it, because for my task, the order makes a high difference. I read online that people try to split the messages: “this is image no.1” → img → “this is image no.2” → img2 and so on… but it seems too wasteful for my case.

To me (and maybe I’m wrong) like text or audio maintains a linear order (or bi-direction encoding), so should the images do by default?
did anyone notice a difference between GPT4-V and Omni on this topic by any chance?

Topic		Replies	Views
Api image/text order with gpt-4v API gpt-4 , gpt-4-vision	2	988	March 22, 2024
GPT4-V: the order of multiple image inputs API gpt-4-vision	4	8528	October 26, 2024
Does the order of items in content array affect the response with gpt4-vision API gpt-4 , gpt-4-vision	2	601	January 15, 2024
How to best work with 100s of images API gpt-4	0	1335	January 17, 2024
Referring to multiple images in vision API API gpt-4	7	3761	October 26, 2024

Images input order with gpt-4 vision/omni

Related topics