Few-Shot Prompting with Structured Outputs

jul_gabel · December 6, 2024, 10:03am

Hey everyone, I’m new to the OpenAI API, and wondering how to give few-shot examples when using structured outputs.
Approach 1: Serialize as a JSON String
Here, I convert the structured output to a JSON string and use that as the content for the few-shot examples:

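import os

import openai
from pydantic import BaseModel

# Read the key from the environment; OPENAI_API_KEY is also what the SDK checks by default
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Schema that the structured output must conform to
class Example(BaseModel):
    field_1: str
    field_2: str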
    
few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": """{
        \"field_1\": \"field one example text\",
        \"field_2\": \"field two example text\"
    }"""}
]

messages = [
    {"role": "system", "content": "You are a helpful assistant"}
] + few_shot_examples + [
    {"role": "user", "content": "actual user query"}
]

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,
    response_format=Example
)

Approach 2: Use a Python String Representation of the Object
Instead of a JSON string, I represent the structured output as a Python object in string form:

few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": "Example(field_1=\'field one example text\', "
                                     "field_2=\'field two example text\')"}
]

Should I stick with the JSON string or use the Python string representation, or is there a better way to provide the examples?
Thanks in advance!

arata · December 7, 2024, 1:23pm

Both of the examples you show need improvement.

jul_gabel:

few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": "Example(field_1=\'field one example text\', "
                                     "field_2=\'field two example text\')"}
]

The AI model produces JSON as its output.

When strict mode is used (which is the case when a BaseModel is passed as response_format), the AI cannot produce anything other than a JSON object with the keys and structure given in the schema. That JSON is still delivered as plain text, like any other response.

Therefore, an assistant example written as a Python object representation, which is contrary to what the AI can and will produce, is unproductive as a few-shot guide.
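
To illustrate, here is a small sketch using only the standard library. The two strings are the example contents from the question, and is_json_object is a hypothetical helper name. The Python-style representation is not JSON at all, so it can never match what a schema-constrained model emits, while the JSON form decodes cleanly:

```python
import json

# The two candidate assistant contents from the question
repr_style = "Example(field_1='field one example text', field_2='field two example text')"
json_style = '{"field_1": "field one example text", "field_2": "field two example text"}'

def is_json_object(text: str) -> bool:
    """Return True if text decodes to a JSON object (a dict in Python)."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False

print(is_json_object(repr_style))  # False
print(is_json_object(json_style))  # True
```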

jul_gabel:

few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": """{
        \"field_1\": \"field one example text\",
        \"field_2\": \"field two example text\"
    }"""}
]

In this triple-quoted multi-line string literal, the backslash escapes are unnecessary and serve no purpose: a lone quote character does not terminate a triple-quoted string. The AI just produces the plain text. The OpenAI library performs any JSON escaping needed on string contents, encoded as UTF-8.
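
A quick way to check this yourself, plus one possible alternative: build the example content with json.dumps from a plain dict so no hand-written quoting is involved. This is only a sketch; example_output is an illustrative name, not something from the question.

```python
import json

escaped = """{
    \"field_1\": \"field one example text\",
    \"field_2\": \"field two example text\"
}"""

unescaped = """{
    "field_1": "field one example text",
    "field_2": "field two example text"
}"""

# Python consumes the backslashes, so both literals are the same string.
same = escaped == unescaped

# Alternative: let json.dumps produce the assistant content from a dict.
example_output = {
    "field_1": "field one example text",
    "field_2": "field two example text",
}
few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": json.dumps(example_output)},
]
```

If you already have a Pydantic model instance, its JSON serialization would serve the same purpose as the json.dumps call here.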

You can still get the “content” instead of “parsed” from the response object, and see what the typical AI structured output is.
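
As a sketch of what that looks like, assuming response is the object returned by the client.beta.chat.completions.parse call from the question (raw_and_parsed is a hypothetical helper name):

```python
def raw_and_parsed(response):
    """Return (raw JSON text, parsed object) from a parsed chat completion."""
    message = response.choices[0].message
    # .content is the plain-text JSON the model produced;
    # .parsed is the same data validated into the response_format model.
    return message.content, message.parsed
```

Printing the first element of the returned pair shows the JSON text exactly as the model wrote it, which is the form an assistant example should imitate.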

You do not need to demonstrate the JSON format in your examples, as it is enforced; examples only need to convey the contents and an understanding of the task. However, newer models that support structured outputs do not follow in-context multi-shot examples well when it comes to altering their style or behavior. You can evaluate the quality with and without examples to see if they are worthwhile, or if more instruction is the better path.
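
One way to set up that comparison is to build both message lists from the same pieces. A minimal sketch, where build_messages is a hypothetical helper and the strings are the placeholders from the question:

```python
def build_messages(system_prompt, user_query, few_shot_examples=None):
    """Assemble chat messages, optionally including few-shot example turns."""
    messages = [{"role": "system", "content": system_prompt}]
    messages += list(few_shot_examples or [])
    messages.append({"role": "user", "content": user_query})
    return messages

few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant",
     "content": '{"field_1": "field one example text", "field_2": "field two example text"}'},
]

with_examples = build_messages(
    "You are a helpful assistant", "actual user query", few_shot_examples
)
without_examples = build_messages("You are a helpful assistant", "actual user query")
```

Each list can then be passed as messages to the same parse call, and the outputs compared on your own test queries.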
