How to do few-shot prompting interweaving text and images with Gpt-4-vision-preview as seen in "The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)"?

deeksha.s.nayak · March 26, 2024, 7:40am

import os
import requests
import base64

Configuration

GPT4V_KEY = “”

headers = {
“Content-Type”: “application/json”,
“api-key”: GPT4V_KEY,
}

Function to encode image to base64

def encode_image_to_base64(image_path):
with open(image_path, “rb”) as img_file:
encoded_image = base64.b64encode(img_file.read()).decode(“utf-8”)
return encoded_image

Payload for the request

payload = {
“model”: “gpt-4-vision-preview”,
“messages”: [
{
“role”: “user”,
“content”: [
{
“type”: “text”,
“text”: “Where is dove light hydration lotion present on the shelf?”
},
{
“type”: “image”,
“image”: {
“base64”: encode_image_to_base64(r"D:\STORE\Retail\20240314_151622.jpg"),
}
},
{
“type”: “text”,
“text”: “It is located on the second shelf in the middle.”
},
{
“type”: “image”,
“image”: {
“base64”: encode_image_to_base64(r"D:\STORE\Retail\20240314_151409.jpg"),
}
},
{
“type”: “text”,
“text”: “It is located on the second shelf at the right.”
},
{
“type”: “text”,
“text”: “Where is dove light hydration lotion present on the shelf?”
},
{
“type”: “image”,
“image”: {
“base64”: encode_image_to_base64(r"D:\STORE\Retail\20240314_151535.jpg"),
}
},
]
}
],
“max_tokens”: 300
}

GPT4V_ENDPOINT = “”

Send request

try:
response = requests.post(GPT4V_ENDPOINT, headers=headers, json=payload)
response.raise_for_status() # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
except requests.RequestException as e:
raise SystemExit(f"Failed to make the request. Error: {e}")

Extracting and printing GPT-4’s response

response_data = response.json()
if ‘choices’ in response_data and response_data[‘choices’]:
gpt4_response = response_data[‘choices’][0][‘message’][‘content’]
print(gpt4_response)
else:
print(“No GPT-4 response found in the API response.”)
Whats wrong in this code? I tried to implement few shot learning but it doesn’t generate a response

Topic		Replies	Views
Gpt-4 vision few shot prompting with images API	3	4395	May 29, 2024
How to do few shot prompting with images in GPT-4 vision api structure? Can someone provide a code to do so? API	6	6508	April 23, 2024
How to add correct examples for image-to-text task Prompting gpt-4-vision	5	2459	December 29, 2023
It is possible to have better performance by using few-shot prompting with image inputs and structured outputs? Prompting gpt-4	0	112	March 20, 2025
How we can Few shot in the gpt 4 vision model API gpt-4 , chatgpt , assistants-api	0	394	May 29, 2024