How to do few-shot prompting interweaving text and images with Gpt-4-vision-preview as seen in "The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)"?

DataBindu · January 23, 2024, 5:44pm

I don’t understand how to interweave text and images (or just control their order in the prompt) when using the API, especially for few-shot prompting with images and text.

I see that multiple images can be uploaded, but there seems to be no option to control how the text in the prompt is ordered relative to the images, as seen in “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)”.

Thanks for the help!

supershaneski · January 23, 2024, 11:46pm

For example, looking at this as reference

Perhaps you can format the message like this:

messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "In the graph, which year has the highest average gas price for the month of June?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://.../national_gas_price_comparison_2016-2019.jpg",
          },
        },
        {"type": "text", "text": "This graph is a line plot for national gas price comparison from 2016 until 02/04/2019. The legend on top shows the line color of each year, red (2019), blue (2018), green (2017)  and orange (2016). Since the data is reported until Feb. 2019, only 3 years have datapoints for the month of June, 2018 (blue), 2017 (green) and 2016 (orange). Among them, blue line for 2018 is at the top for the month of June.  Hence, the year with the highest average gas price for the month of June is 2018. "},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://.../national_gas_price_comparison_2015-2018.jpg",
          },
        },
        {"type": "text", "text": "This graph is a line plot for national gas price comparison from 2015 until 12/10/2018. The legend on top shows the line color of each year, red (2018), orange (2017), green (2016)  and orange (2017). Since the data is reported until Dec. 2018, all 4 years have datapoints for the month of June. Among them, red line for 2018 is at the top for the month of June.  Hence, the year with the highest average gas price for the month of June is 2018. "},
        ...
      ],
    }
  ],

Previously, I thought that the order of the images and text within the same message content entry did not matter, but it seems to be relevant.
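The ordering idea above can be sketched as a small helper that builds the content list in sequence: the question first, then each example image followed by its answer, then the image to be answered. This is only a sketch under stated assumptions. `text_part`, `image_part`, and `build_few_shot_content` are hypothetical names made up for illustration, and the `example.com` URLs are placeholders; only the `text` and `image_url` part shapes are taken from the message format shown above.

```python
from typing import Dict, List, Tuple


def text_part(text: str) -> Dict:
    """Build a text content part (hypothetical helper)."""
    return {"type": "text", "text": text}


def image_part(url: str) -> Dict:
    """Build an image content part that points at a URL (hypothetical helper)."""
    return {"type": "image_url", "image_url": {"url": url}}


def build_few_shot_content(
    question: str,
    examples: List[Tuple[str, str]],
    query_image_url: str,
) -> List[Dict]:
    """Interleave parts as: question, (image, answer) per example, query image."""
    content = [text_part(question)]
    for example_url, answer in examples:
        content.append(image_part(example_url))
        content.append(text_part(answer))
    # The final image has no answer after it; the model is expected to supply one.
    content.append(image_part(query_image_url))
    return content


# Placeholder URLs and answers, for illustration only.
content = build_few_shot_content(
    "Which year has the highest average gas price for June?",
    [
        ("https://example.com/graph_a.jpg", "Worked answer for graph A."),
        ("https://example.com/graph_b.jpg", "Worked answer for graph B."),
    ],
    "https://example.com/graph_c.jpg",
)
messages = [{"role": "user", "content": content}]
```

Because the parts live in a plain list, their position in the list is what fixes the order the model sees.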

deeksha.s.nayak · March 26, 2024, 7:40am

import os
import requests
import base64

# Configuration

GPT4V_KEY = ""

headers = {
    "Content-Type": "application/json",
    "api-key": GPT4V_KEY,
}

# Function to encode image to base64

def encode_image_to_base64(image_path):
    with open(image_path, "rb") as img_file:
        encoded_image = base64.b64encode(img_file.read()).decode("utf-8")
    return encoded_image

# Payload for the request

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Where is dove light hydration lotion present on the shelf?"
                },
                {
                    "type": "image",
                    "image": {
                        "base64": encode_image_to_base64(r"D:\STORE\Retail\20240314_151622.jpg"),
                    }
                },
                {
                    "type": "text",
                    "text": "It is located on the second shelf in the middle."
                },
                {
                    "type": "image",
                    "image": {
                        "base64": encode_image_to_base64(r"D:\STORE\Retail\20240314_151409.jpg"),
                    }
                },
                {
                    "type": "text",
                    "text": "It is located on the second shelf at the right."
                },
                {
                    "type": "text",
                    "text": "Where is dove light hydration lotion present on the shelf?"
                },
                {
                    "type": "image",
                    "image": {
                        "base64": encode_image_to_base64(r"D:\STORE\Retail\20240314_151535.jpg"),
                    }
                },
            ]
        }
    ],
    "max_tokens": 300
}

GPT4V_ENDPOINT = ""

# Send request

try:
    response = requests.post(GPT4V_ENDPOINT, headers=headers, json=payload)
    response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
except requests.RequestException as e:
    raise SystemExit(f"Failed to make the request. Error: {e}")

# Extracting and printing GPT-4's response

response_data = response.json()
if 'choices' in response_data and response_data['choices']:
    gpt4_response = response_data['choices'][0]['message']['content']
    print(gpt4_response)
else:
    print("No GPT-4 response found in the API response.")

What’s wrong with this code? I tried to implement few-shot learning, but it doesn’t generate a response.
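
One possible cause, offered as an assumption rather than a diagnosis confirmed anywhere in this thread: the payload sends images as `"type": "image"` parts with a `"base64"` field, whereas the message format shown earlier in the thread uses `"type": "image_url"` parts. With that format, a local file is typically passed as a base64 data URL inside `image_url.url`. The sketch below shows only that conversion; `image_part_from_bytes` and `image_part_from_file` are hypothetical helper names, and the JPEG MIME type is assumed from the `.jpg` file names.

```python
import base64


def image_part_from_bytes(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an image_url content part using a data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_part_from_file(image_path: str, mime_type: str = "image/jpeg") -> dict:
    """Read a local image file and wrap it the same way (hypothetical helper)."""
    with open(image_path, "rb") as img_file:
        return image_part_from_bytes(img_file.read(), mime_type)
```

Each `{"type": "image", "image": {"base64": ...}}` entry in the payload would then become `image_part_from_file(r"D:\STORE\Retail\20240314_151622.jpg")` and so on, with the text parts and their order left unchanged.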

chirag.shah285 · April 7, 2024, 4:15am

I’m interested in resurrecting this thread to see whether few-shot prompting a vision model is any different from few-shot prompting a language model. How were the results? Did it actually work?

deeksha.s.nayak · April 10, 2024, 5:18am

The results are not fully accurate, but the responses are better than without few-shot examples. So I’d say it works.

siddhantsingh652 · July 26, 2024, 11:49am

Can you share the GitHub link where you tested this kind of few-shot prompting with images?

samuelsami0001 · September 4, 2024, 12:12pm

Hi Deeksha, can we get the end-to-end code or GitHub repo for few-shot prompting with images?
