How do I do few-shot prompting with images in the GPT-4 Vision API message structure? Can someone provide code to do so?

It would be essentially the same as sending other few-shot examples: you give an input, and you demonstrate the way the AI responds, so that it can begin following a pattern.

The latest models, such as the 1106 version of gpt-4-turbo that vision is based on, are heavily trained on chat responses, so earlier messages in the conversation have far less impact on behavior.

After the system message (whose instructions still benefit from some demonstration to the AI), you then pass example messages as if they were chat turns that had already occurred.

One can also experiment with how the AI interprets the “name” field, where you can use a name like “example”.
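
As a rough sketch of that idea, the snippet below tags example turns with a message-level "name" field. The helper `as_example` and the sample message text are hypothetical, made up for illustration; how much the model actually pays attention to the name is something to test rather than assume.

```python
def as_example(message: dict, name: str = "example") -> dict:
    """Return a copy of a chat message tagged with a message-level 'name' field."""
    tagged = dict(message)  # shallow copy so the original message is left untouched
    tagged["name"] = name
    return tagged


# Hypothetical example turn: a text-only user part (image parts would go in the same list)
shot_user = {
    "role": "user",
    "content": [{"type": "text", "text": "Perform the programmed vision task on two images"}],
}
shot_assistant = {
    "role": "assistant",
    "content": '{"similarity": 75, "commonality": "dogs"}',
}

# Both example turns carry name="example"; real conversation turns would not
example_messages = [as_example(shot_user), as_example(shot_assistant)]
```

Comparing runs with and without the name is the simplest way to see whether it changes anything for your task.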

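# Few-shot example: one user turn with two images, then the reply we want the AI to imitate.
# base64_image1 and base64_image2 are base64-encoded image strings prepared beforehand.
# (A hyphen is not valid in a Python variable name, so use underscores.)
multi_shot_messages = [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Perform the programmed vision task on two images"
      },
      {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{base64_image1}", "detail": "low"}
      },
      {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{base64_image2}", "detail": "high"}
      }
    ]
  },
  {
    "role": "assistant",
    "content": '{"similarity": 75, "commonality": "dogs"}'
  }
]

If you read the response the AI writes in that example, you can probably also figure out what the task is…

Then, to build your final API call:

# system, history, and user_input are each a list of message dicts
messages = system + multi_shot_messages + history + user_input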
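
For a fuller picture, here is a minimal sketch of the whole flow: encoding images, assembling system + example turns + history + the new input, and sending the request. The helper names (`to_data_url`, `image_part`, `build_messages`, `ask`) are my own, and the model name and `max_tokens` value are assumptions; substitute whichever vision-capable model you have access to. It assumes the `openai` Python package version 1.0 or later and an `OPENAI_API_KEY` environment variable.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"


def image_part(image_bytes: bytes, mime: str = "image/jpeg", detail: str = "low") -> dict:
    """Build one image_url content part for a user message."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_url(image_bytes, mime), "detail": detail},
    }


def build_messages(system_text, shots, history, query_images, task_text):
    """Assemble system + few-shot turns + history + the new user input.

    shots is a list of (list_of_image_bytes, expected_reply) pairs.
    """
    messages = [{"role": "system", "content": system_text}]
    for images, reply in shots:
        content = [{"type": "text", "text": task_text}]
        content += [image_part(img) for img in images]
        messages.append({"role": "user", "content": content})
        messages.append({"role": "assistant", "content": reply})
    messages += history
    content = [{"type": "text", "text": task_text}]
    content += [image_part(img) for img in query_images]
    messages.append({"role": "user", "content": content})
    return messages


def ask(messages, model="gpt-4-vision-preview"):
    """Send the messages to the chat completions endpoint (model name is an assumption)."""
    from openai import OpenAI  # imported here so the helpers above work without the package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model, messages=messages, max_tokens=300
    )
    return response.choices[0].message.content
```

You would read each image file in binary mode (`open(path, "rb").read()`), pass the bytes into `build_messages`, and hand the result to `ask`. Keeping the task text identical between the example turns and the real input makes the pattern easier for the AI to follow.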
