GPT-4o inconsistently respects field descriptions with Structured Outputs

It seems that GPT-4o follows instructions defined in a field's description very inconsistently (or, more accurately, rarely). In contrast, if I put the same instruction directly in the system prompt, it is respected much more reliably. (I use the OpenAI Python client with Pydantic models and the response_format option.)

For example, this works much worse:

system_prompt = "Extract information from the below medical report in JSON format:"

from pydantic import BaseModel, Field

class Summary(BaseModel):
    ...
    score: int = Field(description="The NIH Stroke Scale/Score (NIHSS). Add 5 to the reported score.")
    ...

And this works much better:

system_prompt = """
Extract information from the below medical report in JSON format using these properties:
...
- score: The NIH Stroke Scale/Score (NIHSS). Add 5 to the reported score.
...
"""

from pydantic import BaseModel

class Summary(BaseModel):
    ...
    score: int
    ...

(The "Add 5" instruction is there only to test whether the model follows the field description; it has no medical meaning.)
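
For reference, both variants are invoked the same way; here is a minimal sketch of the call, assuming the SDK's beta parse helper, with report_text standing in for the actual report:

from openai import OpenAI

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o",  # placeholder; use whichever snapshot you actually target
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": report_text},  # report_text is a placeholder
    ],
    response_format=Summary,  # the Pydantic model defines the JSON schema
)
summary = completion.choices[0].message.parsed  # a validated Summary instance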

Has anyone else had this experience?


Hi @medihack and welcome to the community!

Yes, this has been my experience as well. There have been other similar discussions here, and the consensus is that leaving the field descriptions empty and placing all definitions and instructions in the system prompt yields the best and most consistent results. The Pydantic model is then used purely to define the output schema.
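
One way to operationalize that (an illustrative sketch of my own, not something from this thread): keep the per-field instructions in a plain dict and assemble the system prompt from it, so the Pydantic model stays a bare schema while each instruction still lives in exactly one place:

FIELD_INSTRUCTIONS = {
    "score": "The NIH Stroke Scale/Score (NIHSS). Add 5 to the reported score.",
}

def build_system_prompt(instructions: dict[str, str]) -> str:
    # Build the property list that replaces the per-field Field descriptions.
    lines = ["Extract information from the below medical report in JSON format using these properties:"]
    lines += [f"- {name}: {text}" for name, text in instructions.items()]
    return "\n".join(lines)

system_prompt = build_system_prompt(FIELD_INSTRUCTIONS)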
