I don’t understand how to interleave text and images (or just control their order in the prompt) when using the API, especially for few-shot prompting with images and text.
I see that multiple images can be uploaded, but there seems to be no option to control where the text goes relative to the images in the prompt, as is done in “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)”.
Thanks for the help!
For example, using this as a reference,
perhaps you can format the message like this:
messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "In the graph, which year has the highest average gas price for the month of June?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://.../national_gas_price_comparison_2016-2019.jpg",
                },
            },
            {"type": "text", "text": "This graph is a line plot for national gas price comparison from 2016 until 02/04/2019. The legend on top shows the line color of each year, red (2019), blue (2018), green (2017) and orange (2016). Since the data is reported until Feb. 2019, only 3 years have datapoints for the month of June, 2018 (blue), 2017 (green) and 2016 (orange). Among them, blue line for 2018 is at the top for the month of June. Hence, the year with the highest average gas price for the month of June is 2018. "},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://.../national_gas_price_comparison_2015-2018.jpg",
                },
            },
            {"type": "text", "text": "This graph is a line plot for national gas price comparison from 2015 until 12/10/2018. The legend on top shows the line color of each year, red (2018), orange (2017), green (2016) and orange (2017). Since the data is reported until Dec. 2018, all 4 years have datapoints for the month of June. Among them, red line for 2018 is at the top for the month of June. Hence, the year with the highest average gas price for the month of June is 2018. "},
            ...
        ],
    }
],
Previously, I thought that the order of the images and text within a single message’s content list did not matter, but it seems to be relevant.
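To make the ordering explicit, here is a minimal sketch of a helper that builds such an interleaved content list. The function name build_few_shot_content and the example.com URLs are hypothetical, made up for illustration; only the "text" and "image_url" part shapes come from the message format above.

```python
from typing import Dict, List, Tuple


def build_few_shot_content(
    question: str,
    examples: List[Tuple[str, str]],
    query_image_url: str,
) -> List[Dict]:
    """Interleave a question, (image, answer) examples, and a final query image.

    The parts are appended in the order the model should read them:
    question, then image/answer pairs, then the image to be answered.
    """
    content: List[Dict] = [{"type": "text", "text": question}]
    for image_url, answer in examples:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
        content.append({"type": "text", "text": answer})
    # The last image has no answer after it; the model is expected to supply one.
    content.append({"type": "image_url", "image_url": {"url": query_image_url}})
    return content


# Hypothetical placeholder URLs, not real images.
messages = [
    {
        "role": "user",
        "content": build_few_shot_content(
            "Which year has the highest average gas price for June?",
            [("https://example.com/shot1.jpg", "The blue line (2018) is highest.")],
            "https://example.com/query.jpg",
        ),
    }
]
```

Because the content list is an ordinary Python list, the order in which parts are appended is the order in which they are sent.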
import os
import requests
import base64
# Configuration
GPT4V_KEY = ""
headers = {
    "Content-Type": "application/json",
    "api-key": GPT4V_KEY,
}

# Function to encode image to base64
def encode_image_to_base64(image_path):
    with open(image_path, "rb") as img_file:
        encoded_image = base64.b64encode(img_file.read()).decode("utf-8")
    return encoded_image
# Payload for the request
payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Where is dove light hydration lotion present on the shelf?"
                },
                {
                    "type": "image",
                    "image": {
                        "base64": encode_image_to_base64(r"D:\STORE\Retail\20240314_151622.jpg"),
                    }
                },
                {
                    "type": "text",
                    "text": "It is located on the second shelf in the middle."
                },
                {
                    "type": "image",
                    "image": {
                        "base64": encode_image_to_base64(r"D:\STORE\Retail\20240314_151409.jpg"),
                    }
                },
                {
                    "type": "text",
                    "text": "It is located on the second shelf at the right."
                },
                {
                    "type": "text",
                    "text": "Where is dove light hydration lotion present on the shelf?"
                },
                {
                    "type": "image",
                    "image": {
                        "base64": encode_image_to_base64(r"D:\STORE\Retail\20240314_151535.jpg"),
                    }
                },
            ]
        }
    ],
    "max_tokens": 300
}
GPT4V_ENDPOINT = ""

# Send request
try:
    response = requests.post(GPT4V_ENDPOINT, headers=headers, json=payload)
    response.raise_for_status()  # Raises an HTTPError if the request returned an unsuccessful status code
except requests.RequestException as e:
    raise SystemExit(f"Failed to make the request. Error: {e}")

# Extract and print GPT-4's response
response_data = response.json()
if 'choices' in response_data and response_data['choices']:
    gpt4_response = response_data['choices'][0]['message']['content']
    print(gpt4_response)
else:
    print("No GPT-4 response found in the API response.")
What’s wrong with this code? I tried to implement few-shot learning, but it doesn’t generate a response.
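One possible cause, offered as a guess rather than a confirmed diagnosis: the image parts in the payload use "type": "image" with a "base64" key, whereas the Chat Completions vision format (as in the earlier reply in this thread) uses "type": "image_url", and a local image is normally passed as a base64 data URL in the "url" field. Below is a minimal sketch of building such a part. The helper names image_part and image_part_from_file are hypothetical, and the "image/jpeg" MIME type is an assumption based on the .jpg files above.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an image_url content part using a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_part_from_file(image_path: str, mime_type: str = "image/jpeg") -> dict:
    """Read a local image file and wrap it as an image_url content part."""
    with open(image_path, "rb") as img_file:
        return image_part(img_file.read(), mime_type)
```

Each {"type": "image", "image": {...}} entry in the payload would then become image_part_from_file(r"D:\STORE\Retail\20240314_151622.jpg") and so on, with the text parts left in the same order. Printing response.text inside the except branch should also show the exact error the endpoint returns.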
I’m interested in resurrecting this thread to see whether few-shot prompting a vision model is any different from few-shot prompting a language model. How were the results? Did it actually work?
The results are not fully accurate, but the responses are better than without few-shot examples. So I’d say it works.
Can you share the GitHub link where you tested this kind of few-shot prompting with images?
Hi Deeksha, can we get the end-to-end code or a GitHub repo for few-shot prompting with images?