How to Include Image-Text Pairs as Few-Shot Examples in Prompts?

mdhor · April 17, 2025, 7:32am

Hi everyone,

I’m working on a task where the model needs to generate text based on both an image and some input text (an image-text-to-text task).

I want to use few-shot prompting to guide the model, providing examples of the desired input/output behavior. However, I’m running into a conceptual block regarding how to format the examples themselves within the prompt.

Standard few-shot examples are typically text-to-text, which is straightforward to include:

Prompt:
Example 1 Input: [Text Input 1]
Example 1 Output: [Text Output 1]
Example 2 Input: [Text Input 2]
Example 2 Output: [Text Output 2]
Actual Input: [Actual Text Input]
Actual Output:

My challenge is: How do I represent the image part of an image-text pair within these examples in the prompt?

Prompt:
Example 1 Input Image: [??? How to include Image 1 ???]
Example 1 Input Text: [Text Input 1]
Example 1 Output Text: [Text Output 1]

Example 2 Input Image: [??? How to include Image 2 ???]
Example 2 Input Text: [Text Input 2]
Example 2 Output Text: [Text Output 2]

Actual Input Image: [Actual Image ???]
Actual Input Text: [Actual Text Input]
Actual Output Text:

Any insights, examples, or pointers to documentation would be greatly appreciated!

Thanks!

_j · April 17, 2025, 10:37am

mdhor · April 17, 2025, 1:14pm

Thanks for the reply.

Isn’t there typically an image encoder that creates an image embedding before it is parsed along side the text? If so, passing the base64 image as text to the VLM won’t yield the same results?

_j · April 17, 2025, 1:29pm

The images must be contained in a role message, and only the “user” message is allowed vision input.

Your are not passing the image as text if you use the “type” of the user message part correctly.

It is just a method of providing the data to the API, a data URL, just as an API request with an internet URL would retrieve the image file and place the data, encoded and vectorized, positionally, in to AI context of that message.

messages

system:
- type: text
- text: you’re a nice bot
user:
- type: text
- text: look at this picture
- type: image
- image: {picture data}
- type: text
- text: tell me if im 2 cute

where the user has sent three list items.

Topic		Replies	Views
Few shots with multiple images API api , lost-user	1	273	January 28, 2025
How can you use the API to merge two pictures? API	3	624	January 16, 2025
Gpt-4 vision few shot prompting with images API	3	3878	May 29, 2024
Use base64 encoded images or urls within prompts? API gpt-4	2	7535	August 7, 2024
How to add correct examples for image-to-text task Prompting gpt-4-vision	5	2322	December 29, 2023

How to Include Image-Text Pairs as Few-Shot Examples in Prompts?

Related topics