How to best work with 100s of images

rsomani95 · January 17, 2024, 3:19pm

Hello.

My question has been touched upon in different contexts in the forum before. Since links are disallowed, I’ll provide titles of related posts at the bottom.

My main question is:
What is the best way to work with 1000s of images with the GPT4-V API?

As I understand it, there is no inbuilt way to have the model keep track of indices of images sent.
In the post titled " I give 5 images to gpt4-vision and need to identify 2 similar images?, the suggested response is to ask the model to output indices as JSON in the text prompt, like this:

You are a helpful assistant designed to output JSON.  
You will help extract the indices of items in an array based on the ordinal numbers mentioned in a text.

In my experience, this works for a few items but the moment you scale up, ANY LLM starts hallucinating. This feels inherently unreliable to me.

In another post titled “Referring to multiple images in vision API”, the suggested method is to add the name of the image inside the image itself, like so:

As hacky as that seems, this seems to be the most robust solution?

I’m curious if anyone has worked with sending GPT4-V 100s or 1000s of images in a single request and have had success with keeping track of images.

Thanks!

Topic		Replies	Views
I give 5 images to gpt4-vision and need to identify 2 similar images? API gpt-4-vision	11	5756	January 18, 2024
Referring to multiple images in vision API API gpt-4	7	4559	October 26, 2024
How to identify photos when batching for gpt 4 vision API	3	1625	March 18, 2024
Image recognition: looking for advice Prompting gpt-4 , chatgpt , image-reading , chat-with-images	0	823	March 1, 2024
Frame unique identification API gpt-4-vision	5	291	May 17, 2024

How to best work with 100s of images

Related topics