Few-Shot Examples for Structured Output: Best Practices
I heavily use structured outputs for various use cases and it’s an amazing feature. However, for many of my use cases, I end up dumping the few-shot examples of structured outputs into the system message itself because there is no way to properly format them in a user-assistant format.
Currently, I have two approaches:
Approach 1 (Ideal)
system: default_instructions
user: user_inp1
assistant: structured_output_1 (pydantic/dict)
user: user_inp2
assistant: structured_output_2 (pydantic/dict)
...
user: current_input
Approach 2 (What I currently do)
system: default_instructions \n
Input: user_inp1 \n
Output: str(structured_output_1.dict()) \n
Input: user_inp2 \n
Output: str(structured_output_2.dict())
user: current_input
From what I understand, few-shot examples help to create an accurate mapping from the input to the output. My experience in practice also supports this.
For example, when working on text-to-SQL, I had few-shot examples where user messages were plain text and assistant messages were SQL queries. This served two purposes:
- Grounding: The assistant learned to reply strictly with valid SQL queries, without adding extra content or descriptions.
- Improved Accuracy: With correct few-shot examples, the generated SQL queries became more contextual and accurate.
Now with structured outputs, I can define a Pydantic class with a sql_query
field and get valid SQL queries (most of the time). Adding few-shot examples with Approach 2 definitely helps. However, I can’t help but think that implementing Approach 1 would help squeeze out that last bit of accuracy gains.
Main Question
Where I am confused is: what is the correct way to include few-shot examples for structured outputs? Should I stick to Approach 2 where I dump examples into the system message using str(structured_output.dict())
, or should I try to mimic Approach 1 by explicitly modeling user-assistant turns with proper structured responses?
Is there any practical evidence or best practice suggesting that one method consistently outperforms the other, especially for improving the accuracy and consistency of complex structured outputs?
If anyone has experimented with this or has recommendations based on experience, I’d appreciate your insights.
References
Not able to add links to similar blogs but did not find any answer anywhere else
TL;DR
Which works better in practice for few-shot with structured output: Approach 1 or Approach 2?Any other approach which is even better?