GPT-4 Vision few-shot prompting with images

deeksha.s.nayak · March 26, 2024, 1:46pm

Can someone provide complete code, including the required libraries, that demonstrates few-shot prompting with images? Here's my use case: I want to prompt the model with a pair of images (a reference image and a shelf image) along with a description of them. Then, when a user uploads a new reference/shelf image pair, the model should generate a similar description.

_j · March 26, 2024, 3:07pm

A single user message that includes a few images will look like:

```python
# base64_image1 and base64_image2 are base64-encoded image files (as strings)
user_message = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe these two images"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image1}", "detail": "low"},
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_image2}", "detail": "high"},
            },
        ],
    }
]
```
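To produce the base64_image1 and base64_image2 strings used above, one option is to read local files and base64-encode them. This is a minimal sketch, assuming the images are local files; the helper names encode_image, image_part and user_message_with_images are illustrative only, and the JPEG fallback for unknown extensions is an assumption.

```python
import base64
import mimetypes


def encode_image(path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def image_part(path: str, detail: str = "low") -> dict:
    """Build an image_url content part that embeds the file as a data URL."""
    # Guess the MIME type from the file extension; fall back to JPEG (assumption).
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/jpeg"
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encode_image(path)}", "detail": detail},
    }


def user_message_with_images(text: str, paths: list[str]) -> dict:
    """One user message: a text part followed by one image part per file."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": text}] + [image_part(p) for p in paths],
    }
```

For example, `[user_message_with_images("describe these two images", ["reference.jpg", "shelf.png"])]` (hypothetical file names) gives a list shaped like user_message above, except that both images use the default "low" detail.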

Past conversation turns containing the user's image input and the assistant's answer can be resent exactly as they were first sent if you want the AI to keep seeing the contents of those earlier images.

You can preserve chat history the same way you would in a normal chat (at higher expense, since images add input tokens), or you can start expiring old images and stop resending them after a limited number of turns.
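Expiring images can be done by rewriting the history before each request. A minimal sketch, assuming the history is a list of chat completions message dicts; the function name expire_old_images, the choice to count by user turns, and the placeholder text that replaces each dropped image are all illustrative assumptions.

```python
import copy


def expire_old_images(messages: list[dict], keep_last_n: int) -> list[dict]:
    """Return a copy of the history in which image parts survive only in the
    last `keep_last_n` user messages; older images become a short text note."""
    result = copy.deepcopy(messages)  # leave the caller's history untouched
    user_indices = [i for i, m in enumerate(result) if m["role"] == "user"]
    # Slicing with [:-0] would keep everything, so handle keep_last_n == 0 separately.
    if keep_last_n > 0:
        expire = set(user_indices[:-keep_last_n])
    else:
        expire = set(user_indices)
    for i in expire:
        content = result[i]["content"]
        if isinstance(content, list):  # plain-string content has no images
            result[i]["content"] = [
                part if part.get("type") != "image_url"
                else {"type": "text", "text": "[image removed from history]"}
                for part in content
            ]
    return result
```

Assistant answers are kept as they are, so the model still sees what it said about the expired images.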

The style of the AI's responses is unlikely to vary much if you have it perform similar tasks again. Resend older images only when their content is still important; otherwise, they are just a distraction.

anandnikhil12 · May 29, 2024, 5:36pm

But where are you labelling the images? Few-shot means giving the model example images together with the output we want for them.

_j · May 29, 2024, 5:43pm

Multi-shot (few-shot) prompting means teaching the AI a pattern within the context you send, from which it can learn the desired output.

Even the hidden “user” and “assistant” role markers that chat completions inserts are one such pattern. With a base (non-chat) model, you have to write that break into the text yourself to signal that a different respondent is speaking.
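For a base model, that break is just text. A minimal sketch, assuming "User:" and "Assistant:" as the speaker labels (any consistent labels would do) and a text-only prompt; the function name to_base_model_prompt is hypothetical.

```python
def to_base_model_prompt(turns: list[tuple[str, str]], next_speaker: str = "Assistant") -> str:
    """Flatten (speaker, text) turns into a plain-text transcript for a base
    (completion) model. The labels mark where one respondent stops and the
    next begins, which chat completions otherwise does for you."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    # End with the next speaker's label so the model continues as that speaker.
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)
```

A completion request would typically also pass a stop sequence such as "\nUser:" so the model does not go on to write the next user turn itself.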

Here is an example set of few-shot examples you might send to show the AI the desired responses to messages that contain example images:

user: safe? (picture of lion)
assistant: {"safety": "unsafe"}
user: safe? (picture of baby)
assistant: {"safety": "safe"}
user: safe? (picture of blowtorch)
assistant: {"safety": "unsafe"}
user: safe? (picture of school glue)
assistant: {"safety": "safe"}

Given this prior conversation (which in this case can use stock images), a new user input in the same format should evoke the same kind of response and decision-making.
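The lion/baby/blowtorch/glue pattern maps onto chat completions messages as alternating user and assistant turns; the labelling lives in the assistant turns. A minimal sketch, assuming each example image is already available as a URL or base64 data URL; the function name build_fewshot_messages and the "low" detail setting are illustrative choices.

```python
import json


def build_fewshot_messages(examples: list[tuple[str, str]], query_image_url: str,
                           question: str = "safe?") -> list[dict]:
    """Turn (image_url, label) examples into alternating user/assistant turns,
    then append the new image as the final, unanswered user turn."""

    def user_turn(image_url: str) -> dict:
        return {"role": "user", "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url, "detail": "low"}},
        ]}

    messages = []
    for image_url, label in examples:
        messages.append(user_turn(image_url))
        # The assistant turn is the labelled answer the model should imitate.
        messages.append({"role": "assistant", "content": json.dumps({"safety": label})})
    messages.append(user_turn(query_image_url))
    return messages
```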

Chat models are now overtrained and can barely learn from context.
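Applied to the original reference/shelf use case, each example becomes a user turn with both images followed by an assistant turn holding the desired description. This is a sketch under several assumptions: the prompt wording, the function names build_pair_messages and describe_pair, the model name "gpt-4o" (substitute any vision-capable chat model) and max_tokens=500 are all illustrative. It uses the openai Python package (v1 or later), which reads OPENAI_API_KEY from the environment.

```python
def build_pair_messages(examples: list[tuple[str, str, str]],
                        new_reference_url: str, new_shelf_url: str) -> list[dict]:
    """examples: (reference_image_url, shelf_image_url, description) triples.
    URLs may be https links or base64 data URLs."""

    def pair_turn(ref_url: str, shelf_url: str) -> dict:
        # One user turn holding the reference image followed by the shelf image.
        return {"role": "user", "content": [
            {"type": "text", "text": "Image 1 is the reference, image 2 is the shelf. Describe them."},
            {"type": "image_url", "image_url": {"url": ref_url}},
            {"type": "image_url", "image_url": {"url": shelf_url}},
        ]}

    messages = []
    for ref_url, shelf_url, description in examples:
        messages.append(pair_turn(ref_url, shelf_url))
        messages.append({"role": "assistant", "content": description})
    messages.append(pair_turn(new_reference_url, new_shelf_url))
    return messages


def describe_pair(messages: list[dict], model: str = "gpt-4o") -> str:
    """Send the few-shot conversation and return the model's description."""
    from openai import OpenAI  # imported here so building messages needs no SDK
    client = OpenAI()
    response = client.chat.completions.create(model=model, messages=messages, max_tokens=500)
    return response.choices[0].message.content
```

Given how weakly chat models may pick up a pattern from context alone, it can also help to state the expected description format explicitly in the text part of each turn.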
