How can I do few-shot prompting that interleaves text and images with gpt-4-vision-preview, as seen in "The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)"?

For example, using this as a reference:

Perhaps you can format the message like this:

messages=[
    {
      "role": "user",
      "content": [
        # The question comes first, then each example image followed by its worked answer.
        {"type": "text", "text": "In the graph, which year has the highest average gas price for the month of June?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://.../national_gas_price_comparison_2016-2019.jpg",
          },
        },
        {"type": "text", "text": "This graph is a line plot for national gas price comparison from 2016 until 02/04/2019. The legend on top shows the line color of each year, red (2019), blue (2018), green (2017) and orange (2016). Since the data is reported until Feb. 2019, only 3 years have datapoints for the month of June, 2018 (blue), 2017 (green) and 2016 (orange). Among them, blue line for 2018 is at the top for the month of June. Hence, the year with the highest average gas price for the month of June is 2018."},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://.../national_gas_price_comparison_2015-2018.jpg",
          },
        },
        {"type": "text", "text": "This graph is a line plot for national gas price comparison from 2015 until 12/10/2018. The legend on top shows the line color of each year, red (2018), orange (2017), green (2016) and orange (2017). Since the data is reported until Dec. 2018, all 4 years have datapoints for the month of June. Among them, red line for 2018 is at the top for the month of June. Hence, the year with the highest average gas price for the month of June is 2018."},
        ...
      ],
    }
],

Previously, I thought that the order of images and text within a single message's content list did not matter, but it seems to be relevant.
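
Since the order seems to matter, it may help to build the content list programmatically so that every example image is always followed directly by its own answer text. Below is a minimal sketch of that idea. The helper names (text_part, image_part, build_few_shot_content), the example.com URLs, and the shortened answer strings are all hypothetical placeholders, and ending the list with the image you actually want answered is my assumption about how the final query is placed.

```python
from typing import Dict, List, Tuple


def text_part(text: str) -> Dict:
    """Wrap a string as a text content part."""
    return {"type": "text", "text": text}


def image_part(url: str) -> Dict:
    """Wrap an image URL as an image_url content part."""
    return {"type": "image_url", "image_url": {"url": url}}


def build_few_shot_content(
    question: str,
    examples: List[Tuple[str, str]],
    query_image_url: str,
) -> List[Dict]:
    """Interleave parts as: question, (image, answer) pairs, then the query image."""
    content = [text_part(question)]
    for example_url, example_answer in examples:
        content.append(image_part(example_url))   # example image
        content.append(text_part(example_answer))  # its worked answer, right after it
    content.append(image_part(query_image_url))   # assumed: the image to be answered goes last
    return content


# Hypothetical placeholder data, not the real URLs or full answers from the post.
examples = [
    ("https://example.com/chart_a.jpg", "Worked answer for chart A ... the year is 2018."),
    ("https://example.com/chart_b.jpg", "Worked answer for chart B ... the year is 2018."),
]

messages = [
    {
        "role": "user",
        "content": build_few_shot_content(
            "In the graph, which year has the highest average gas price for the month of June?",
            examples,
            "https://example.com/chart_query.jpg",
        ),
    }
]

# Print the resulting order of part types to check the interleaving.
print([part["type"] for part in messages[0]["content"]])
```

The resulting messages list should be usable as the messages argument of a chat completion request for gpt-4-vision-preview, though I have not checked how much the exact ordering changes the answers.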
