Increased False Negatives in Attribute Extraction Using Structured Outputs

After implementing structured outputs with a Pydantic model, I've observed a decline in data extraction accuracy. Previously, my prompt extracted attributes from documents reliably without structured outputs. With structured outputs, however, a few attribute extractions now fail, resulting in more false negatives during evaluation. Notably, after reducing the schema hierarchy, the number of false negatives decreased. I'd like to understand how structured outputs influence extraction, given that the prompt itself is unchanged, and how the prompt and the structured output schema (the Pydantic model) interact. Insights into this interaction would help me adjust the prompt accordingly.
Model: gpt-4o
Model version: 2024-08-06
API version: 2024-10-01-preview
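
For context, the extraction call is set up roughly like this (a simplified sketch; the attribute names and prompt text are placeholders rather than my real schema):

```python
import os
from typing import Optional

from openai import AzureOpenAI
from pydantic import BaseModel


# Placeholder schema - the real one has more attributes and (previously) more nesting
class DocumentAttributes(BaseModel):
    title: Optional[str]
    author: Optional[str]
    publication_date: Optional[str]


client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",
)

document_text = "..."  # the document being processed

completion = client.beta.chat.completions.parse(
    model="gpt-4o",  # deployment of gpt-4o 2024-08-06
    messages=[
        {
            "role": "system",
            "content": "Extract the attributes from the document. "
                       "Use null for any attribute that is not present.",
        },
        {"role": "user", "content": document_text},
    ],
    response_format=DocumentAttributes,  # structured outputs enforced via the Pydantic model
)

attributes = completion.choices[0].message.parsed
```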

Any assistance is greatly appreciated.

I would consider that a sign that your schema is a bit too complex for the task… Sharing the prompt and the schema would help a lot in pinpointing the issue.


The Pydantic model should be used simply to provide the response schema/types - you shouldn't put descriptions into the Pydantic model itself. It's mostly there to define a harness for what the response types should be, so that you can also validate them properly.

The descriptions, rules, explanations, etc. should go into the system prompt instead.
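
As a rough sketch (the attribute names here are made up), that means keeping the model down to bare types and putting the field-level guidance in the system prompt rather than in `Field(description=...)`:

```python
from typing import Optional

from pydantic import BaseModel


# Types only - no description metadata on the model itself
class InvoiceAttributes(BaseModel):
    invoice_number: Optional[str]
    total_amount: Optional[float]
    currency: Optional[str]


# The extraction rules live in the prompt, where the model reads them as instructions
system_prompt = """Extract the following attributes from the invoice text:
- invoice_number: the supplier's invoice identifier, exactly as written
- total_amount: the grand total as a plain number, without currency symbols
- currency: the ISO 4217 code (e.g. EUR, USD)
Return null for any attribute that is not explicitly stated in the document."""
```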

As Serge mentioned, try to keep your Pydantic model as flat as possible, without complex hierarchies or nesting.
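
For example (hypothetical fields), here is a nested contract schema next to a flattened version carrying the same information:

```python
from typing import Optional

from pydantic import BaseModel


# Nested: every extra level is another structure the model has to fill in consistently
class Address(BaseModel):
    street: Optional[str]
    city: Optional[str]


class Party(BaseModel):
    name: Optional[str]
    address: Optional[Address]


class ContractNested(BaseModel):
    buyer: Optional[Party]
    seller: Optional[Party]


# Flattened: the same information as a single level of explicitly named fields
class ContractFlat(BaseModel):
    buyer_name: Optional[str]
    buyer_city: Optional[str]
    seller_name: Optional[str]
    seller_city: Optional[str]
```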

Have you tried the latest November models to see if they behave the same?
