I have extracted a set of 40 key images from a video. Can the 4o mini model accept 40 images, identify the top 5 images in accordance with the instructions, and provide me with the relevant images?
Is there another way to get the best images from the ones that have been provided?
For a number of reasons this would not be possible. You would fast exceed the token limit for gpt-4o-mini with that amount of pictures. Additionally, the model would struggle to analyze 40 pictures in a single API call.
If I was in your place, I would not provide more than 2 pictures for a given request; ask the model to return a description of the picture (or whatever it is you need as a basis to make a selection). Then combine all these outputs and run a final API call to make the selection of the top 5 pictures based on the relevant criteria.