Referring to multiple images in vision API

I’m encountering an issue with the vision API regarding the handling of multiple images.

For example, when submitting two image URLs and requesting descriptions, I’m able to coax it into mostly returning a valid JSON list of descriptions. However, it’s unclear whether the descriptions are returned in the same order as the URLs provided. This ambiguity prevents me from confidently mapping returned_image_descriptions[0] to the first image URL, and returned_image_descriptions[1] to the second. Has anyone else experienced this, and is there a way to ensure the responses correspond deterministically to the order of the submitted image URLs?

I’ve tried making it return a JSON list of objects with a schema like:

{  "$schema": "",
  "type": "object",
  "properties": {
    "url": {
      "type": "string"
      "format": "uri"
    "description": {
      "type": "string"
  "required": [

But the url fields end up with made-up URLs rather than the actual image urls providing in the request.


Hi and welcome to the Developer Forum!

Best of at this state using a simple image per call, and personally I’d cache the image locally and upload it in base64 format.

Yeah you can’t map input with output. The model does not have any access to meta data.

However the solution i found was to just add a black bar to the bottom with the name of the image. This way it will return the name of the image with what ever description you want. But you have to tell the model about it obviously.

1 Like

Lol, that’s a great workaround. Thanks for sharing.

As time passed by, is there any other solution to the problem?