How to effectively prompt for Structured Output

Hi, I’d like to understand more about how Structured Output works, so I can prompt it better.

In my example, I want the LLM to generate either a shell command or a general chat response depending on the situation. In the JSON schema I set two fields: shell_command and chat_response.

Regarding system instructions:

  • Do you still need to use a prompt like “always output in JSON schema” when you’re already using Structured Output?
  • Also, when defining output for the LLM, is it better to phrase instructions in a strict, programming-like way (e.g., “set shell_command to some value”), or in a more natural, human way (e.g., “generate a shell command for the user”)? Which one works better in practice?

Regarding the JSON schema:
Example:

"shell_command": {
  "type": "string",
  "description": "The shell command to accomplish the task, if applicable."
}

  • Does the field name matter? For instance, if I used "shell_command" vs. "field_1", would it make a difference in how the model behaves?
  • How about the description field — it feels like it overlaps with the system instruction. How do they work together? Should I think of the description as repeating the instruction but specifically from the perspective of that field?

I’m building a lightweight AI shell helper where structured output plays a crucial role in providing fast and fluid responses:
:backhand_index_pointing_right: https://github.com/cjccjj/pls

Thanks a lot!
CJ

1 Like

Hi. That’s a lot, let me see if I can help a bit:

  • When you use Structured Outputs you don’t need to mention JSON in the prompt, but JSON mode does require it.
  • Field names matter, especially if you don’t provide descriptions. For example, if you ask for structured information about some text without explaining further (“classify this customer data based on this text”), the field names are all the model has to go on.
  • Descriptions on fields work as guidance, in tandem with your prompt instructions. Describing what each field is supposed to contain helps you get better results. Depending on the complexity of the request, you can use either or both.
  • Testing and experimenting is the key; there are no written answers for these.
  • To me, your schema description seems adequate for your usage.
2 Likes

System Instructions vs. JSON Schema

You DON’T need “always output in JSON schema” in your prompt when using Structured Outputs (response_format={"type": "json_schema", ...}). The API parameter enforces the schema automatically, so you can focus on writing clear, natural instructions.

Better approach: Use natural, human-like language in your system prompt, then let the JSON schema handle the structure.
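To make that concrete, here is a minimal sketch of what such a request payload could look like, assuming an OpenAI-style Chat Completions API; the model name, prompt text, and schema name are placeholders, not prescriptions:

```python
import json

# Hypothetical Structured Outputs request payload (OpenAI-style).
# Note "type": "json_schema" — the schema travels with the request,
# so the prompt itself never needs to say "respond in JSON".
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are an AI shell helper."},
        {"role": "user", "content": "list all files sorted by size"},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "shell_helper_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "shell_command": {"type": "string"},
                    "chat_response": {"type": "string"},
                },
                "required": ["shell_command", "chat_response"],
                "additionalProperties": False,
            },
        },
    },
}

print(json.dumps(payload["response_format"], indent=2))
```

The system message stays natural-language only; all the formatting pressure lives in the schema.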

System Instructions: Natural vs. Programming-like

Use natural language - it works much better:

python

# :white_check_mark: GOOD - Natural, human-like

system_prompt = """You are an AI shell helper. Analyze the user's request and determine the best response type.

If the user needs a shell command, provide it in the shell_command field.

If the user needs general help or conversation, provide it in the chat_response field.

Consider the context and user's intent carefully."""

# :cross_mark: AVOID - Too programming-like

system_prompt = """Set shell_command to a value when shell command is needed.

Set chat_response to a value when general chat is needed."""

JSON Schema Design

Field names DO matter - use descriptive, semantic names that help the model understand the purpose:

schema = {
    "type": "object",
    "properties": {
        "shell_command": {
            "type": "string",
            "description": "The shell command to accomplish the task, if applicable. Leave empty if no shell command is needed."
        },
        "chat_response": {
            "type": "string",
            "description": "A helpful response when no shell command is needed, or additional context about the shell command."
        }
    },
    "required": ["shell_command", "chat_response"]
}
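Once a response comes back, you can sanity-check it against those two fields with nothing but the standard library. A sketch (the sample response string here is made up):

```python
import json

# A made-up model response that follows the schema above.
raw = '{"shell_command": "ls -lhS", "chat_response": "Lists files sorted by size."}'

data = json.loads(raw)

# Confirm both required keys are present and are strings before using them.
for key in ("shell_command", "chat_response"):
    assert key in data, f"missing required field: {key}"
    assert isinstance(data[key], str), f"{key} must be a string"

print(data["shell_command"])
```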

Think of it this way:

  • System instructions = “What to do” (natural language)

  • Schema descriptions = “How to format it” (field-specific guidance)

The descriptions should complement, not repeat, your system instructions. They give the model field-specific context about what belongs in each field.

If Using JSON Mode (Legacy Approach)

You absolutely need JSON guidance in your system prompt:

python

system_prompt = """You are an AI shell helper. Analyze the user's request and determine the best response type.

CRITICAL: You must ALWAYS respond with valid JSON in this exact format:

{
  "shell_command": "the command or empty string",
  "chat_response": "helpful response or empty string"
}

Do not include any text outside the JSON object. Only output the JSON."""

# :cross_mark: This won’t work reliably with JSON mode

system_prompt = """You are an AI shell helper. Provide shell commands when needed, or helpful conversation when not."""
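With JSON mode you also want defensive parsing on the client side, since nothing guarantees the shape of the reply. A minimal sketch (the fence-stripping is an assumption about how models sometimes wrap JSON in markdown; field names match the example schema):

```python
import json

FENCE = "`" * 3  # a markdown code fence, built indirectly to avoid backtick clashes

def parse_reply(text: str) -> dict:
    """Parse a JSON-mode reply, tolerating markdown code fences and missing fields."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        cleaned = cleaned.split("\n", 1)[1]    # drop opening fence line (e.g. ```json)
        cleaned = cleaned.rsplit(FENCE, 1)[0]  # drop closing fence
    data = json.loads(cleaned)
    # Fall back to empty strings so callers never hit a KeyError.
    return {
        "shell_command": str(data.get("shell_command", "")),
        "chat_response": str(data.get("chat_response", "")),
    }

print(parse_reply('{"shell_command": "df -h"}'))
```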

If Using response_format with a JSON Schema (Modern Approach)

You can focus on natural language without heavy JSON instructions:
system_prompt = """You are an AI shell helper. Analyze the user's request and determine the best response type.

If the user needs a shell command, provide it in the shell_command field.

If the user needs general help or conversation, provide it in the chat_response field.

Consider the context and user's intent carefully."""

Use response_format={"type": "json_schema", ...} (Structured Outputs) instead of JSON mode because:

  • Output is guaranteed to match your schema

  • Cleaner system prompts

  • Better model understanding of each field’s purpose

  • Future-proof approach

I had the AI use some of my project examples to write this up; it explains things better than I do lol.

2 Likes

Thanks, that’s helpful.

Testing is a real headache for me; the AI adapts too easily, even when there are mistakes, so it’s hard to tell just from testing myself. I might need to set up some AI testing helper.

1 Like

Thanks for the detailed reply, very helpful. That pretty much matches what I figured about how it works under the API (I’m guessing there’s a prompt template in between). The tricky part is testing, since the AI adapts too well; it would be great to hear from someone with more experience.

I’m going with the modern approach. For system prompt vs. JSON schema description, I think the schema should mainly handle format. But for small yet important rules like “for shell_command, don’t use Markdown or quotes,” it feels like both guidance and format. In practice, both ways usually work, so maybe it doesn’t matter. Still, I’d love a clear theory on which one’s actually better.

2 Likes

I find that working this way gets you really thinking about how you are building. When you evaluate the behaviours of the AI’s responses through multiple tests, then make changes, you can see how it thinks over time and build out amazing systems, even with fallback layers for things that may fall through the cracks, all the way down to simple logic if you need to guide things more directly beyond the “AI deciders,” as I call them.

It would be interesting if there was a good benchmark for this.

You could simply build a test that runs one model after another, captures the outputs, and profiles each run for speed, then have an LLM that understands the shape you are trying to produce compare that shape with each model’s results and grade them.

AI can build that for you to test; it’s a simple evaluator. And if you want to get fancy, add a loopback to the evaluation process: feed it what was incorrect along with the current format and current instructions, let it rebuild the instructions with a fix to replace the current ones, and run again until results are 100% across all tests. You can build some really powerful AI tools / functions / ML using this concept with today’s models. It simplifies the process: you can monitor the changes, have it build you a change logger or output system, or sit and approve every change yourself; that is up to you :slight_smile:
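A toy version of that evaluate-and-grade loop, with a stubbed-out model call standing in for the real API; everything here (function names, the canned reply, the pass criteria) is hypothetical scaffolding:

```python
import json

def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns schema-shaped JSON."""
    return '{"shell_command": "uptime", "chat_response": ""}'

def grade(raw: str, required=("shell_command", "chat_response")) -> bool:
    """Grade one output: must be valid JSON with all required string fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return all(isinstance(data.get(k), str) for k in required)

def run_suite(prompts) -> float:
    """Run every prompt through the model and return the pass rate."""
    results = [grade(fake_model(p)) for p in prompts]
    return sum(results) / len(results)

score = run_suite(["show system uptime", "how busy is the box?"])
print(f"pass rate: {score:.0%}")
```

Swap fake_model for a real API call and the loopback idea above becomes: if the pass rate is below 100%, feed the failures back to the model to rewrite the instructions, then rerun the suite.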

Just my thoughts…