Few-Shot Prompting with Structured Outputs

Hey everyone, I’m new to the OpenAI API and wondering how to provide few-shot examples when using structured outputs.
Approach 1: Serialize as a JSON String
Here, I convert the structured output to a JSON string and use that as the content for the few-shot examples:

import openai
from pydantic import BaseModel
client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment by default

class Example(BaseModel):
    field_1: str
    field_2: str

few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": """{
        \"field_1\": \"field one example text\",
        \"field_2\": \"field two example text\"
    }"""}
]

messages = [
    {"role": "system", "content": "You are a helpful assistant"}
] + few_shot_examples + [
    {"role": "user", "content": "actual user query"}
]

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,
    response_format=Example
)

Approach 2: Use a Python String Representation of the Object
Instead of a JSON string, I represent the structured output as a Python object in string form:

few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": "Example(field_1=\'field one example text\', "
                                     "field_2=\'field two example text\')"}
] 

Should I stick with the JSON string, switch to the Python string representation, or is there a better way to provide the examples?
Thanks in advance!


Both of the examples you show need improvement.

The AI model produces JSON as its output.

When strict mode is in effect (a BaseModel passed as response_format), the model cannot produce anything other than a JSON object with the keys and structure given by the schema. That JSON arrives as plain text, like any other response.

A “Python object” string as the example assistant response is therefore unproductive as an in-context guide: it contradicts what the model can and will produce.

In the triple-quoted multi-line string literal, the backslash escaping is unnecessary and serves no purpose: a lone double quote does not end a triple-quoted string, and the single quotes in Approach 2 do not break out of a double-quoted string either. The model just produces plain text, and the OpenAI library does any JSON escaping needed on string contents as UTF-8.
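
To illustrate the escaping point, here is a minimal sketch using only the standard json module. The field names come from the Example class in the question; the variable names (escaped, unescaped, example_output) are just illustrative. It checks that the escaped and unescaped literals are the same string, then builds the assistant message by serializing a dict rather than hand-writing JSON.

```python
import json

# Hand-written version from the question: the backslashes are redundant,
# because \" inside a triple-quoted string is just a plain double quote.
escaped = """{
    \"field_1\": \"field one example text\",
    \"field_2\": \"field two example text\"
}"""

unescaped = """{
    "field_1": "field one example text",
    "field_2": "field two example text"
}"""

# Both literals evaluate to the same Python string.
same_text = escaped == unescaped

# Serializing a dict avoids hand-writing JSON altogether.
example_output = {
    "field_1": "field one example text",
    "field_2": "field two example text",
}
few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {"role": "assistant", "content": json.dumps(example_output)},
]
```

If you already have an instance of the Pydantic model, its JSON serialization should work the same way as the json.dumps call here.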

You can still read “content” instead of “parsed” from the response message to see what the raw structured output typically looks like.
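
A rough sketch of reading both fields, assuming the completion object returned by the parse call in the question. The helper raw_and_parsed is a hypothetical name, and the stand-in object below only mimics the shape of a real response so the snippet runs without an API call.

```python
from types import SimpleNamespace


def raw_and_parsed(completion):
    """Return (raw JSON text, parsed object) from the first choice."""
    message = completion.choices[0].message
    return message.content, message.parsed


# Stand-in for a real response (assumption: same attribute layout as the
# object returned by client.beta.chat.completions.parse).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                content='{"field_1": "a", "field_2": "b"}',
                parsed={"field_1": "a", "field_2": "b"},
            )
        )
    ]
)

raw_text, parsed = raw_and_parsed(fake_completion)
```

With a real response, parsed would be an instance of the Example model rather than a dict, while content stays the plain JSON text.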

You do not need to teach the JSON format, as it is enforced; examples only need to convey the contents and an understanding of the task. However, the newer models that support structured outputs do not follow in-context multi-shot examples well when it comes to altering their style or behavior. Evaluate the quality with and without examples to see whether they are worthwhile, or whether more instruction is the better path.
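
One way to set up that comparison, as a sketch only: build the message list twice, once with the few-shot pair and once without, and send each to the same parse call. build_messages is a hypothetical helper, and how you score the two outputs is up to you.

```python
import json


def build_messages(user_query, few_shot_examples=None):
    """Assemble the chat messages, optionally including few-shot examples."""
    messages = [{"role": "system", "content": "You are a helpful assistant"}]
    messages += few_shot_examples or []
    messages.append({"role": "user", "content": user_query})
    return messages


few_shot_examples = [
    {"role": "user", "content": "example user query"},
    {
        "role": "assistant",
        "content": json.dumps(
            {
                "field_1": "field one example text",
                "field_2": "field two example text",
            }
        ),
    },
]

# Two variants of the same request, differing only in the examples.
with_examples = build_messages("actual user query", few_shot_examples)
without_examples = build_messages("actual user query")
```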
