Referring to multiple images in vision API

dustin.wyatt · November 17, 2023, 5:57pm

I’m encountering an issue with the vision API regarding the handling of multiple images.

For example, when submitting two image URLs and requesting descriptions, I’m able to coax it into mostly returning a valid JSON list of descriptions. However, it’s unclear whether the descriptions are returned in the same order as the URLs provided. This ambiguity prevents me from confidently mapping returned_image_descriptions[0] to the first image URL, and returned_image_descriptions[1] to the second. Has anyone else experienced this, and is there a way to ensure the responses correspond deterministically to the order of the submitted image URLs?

I’ve tried making it return a JSON list of objects with a schema like:

{  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "url": {
      "type": "string"
      "format": "uri"
    },
    "description": {
      "type": "string"
    }
  },
  "required": [
    "url",
    "description"
  ]
}

But the url fields end up with made-up URLs rather than the actual image urls providing in the request.

Foxalabs · November 17, 2023, 6:21pm

Hi and welcome to the Developer Forum!

Best of at this state using a simple image per call, and personally I’d cache the image locally and upload it in base64 format.

Darkbelg · November 17, 2023, 7:02pm

Yeah you can’t map input with output. The model does not have any access to meta data.

However the solution i found was to just add a black bar to the bottom with the name of the image. This way it will return the name of the image with what ever description you want. But you have to tell the model about it obviously.
1251_63

dustin.wyatt · November 17, 2023, 7:33pm

Lol, that’s a great workaround. Thanks for sharing.

mindaugas · May 15, 2024, 10:59pm

As time passed by, is there any other solution to the problem?

Streambuild · August 5, 2024, 5:01pm

Hi is there no way to somehow pass the image names to the vision api?

amagic_g · August 21, 2024, 1:37pm

Not that I am aware of. but would be happy to hear if there is any.

ufku · October 26, 2024, 8:26pm

Sending a text message with an ID before each image seems to be working fine:

Main text message:

Analyze the attached images and select the best one for a finance site.

Return the results in JSON format using the following interface:
{
  images: {
    // The id of the analyzed image.
    "id": number;
    // Set this to true if it's the best image.
    "best_image": boolean;
    // One sentence feedback on why you chose the image.
    "feedback": string;
  }[];
}

Here is how i build the rest of the content in PHP:

foreach ($images as $id => $url) {
  $content[] = ['type' => 'text', 'text' => "ID for the next image: $id"];
  $content[] = ['type' => 'image_url', 'image_url' => ['url' => $url]];
}

Topic		Replies	Views
GPT4-V: the order of multiple image inputs API gpt-4-vision	4	9703	October 26, 2024
How to best work with 100s of images API gpt-4	0	1421	January 17, 2024
How to identify photos when batching for gpt 4 vision API	3	1506	March 18, 2024
I give 5 images to gpt4-vision and need to identify 2 similar images? API gpt-4-vision	11	5423	January 18, 2024
Image tagging issue in openai vision API gpt-4-vision	2	234	October 23, 2024

Referring to multiple images in vision API

Related topics