I’ve been using GPT-4o to generate structured outputs for tables (converting OCR data to markdown), which range from slightly to highly complex in structure. I’m working with a predefined JSON schema that the table data needs to conform to.
However, when I run the same set of tables multiple times, I notice that the outputs vary slightly across runs. The JSON structure remains intact, but the values assigned to the JSON keys differ from run to run.
The markdown tables generated from the OCR data (via Azure Document Intelligence) are consistent, and I’ve set temperature=0 and top_p=1 to minimize randomness.
Has anyone experienced something similar? Is this behavior expected? Could anyone suggest ways to prevent this randomness?
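For reference, a simplified sketch of the kind of call I’m describing (the schema, prompts, and table are placeholders, not my actual setup):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder schema; the real one describes the full table structure.
table_schema = {
    "type": "object",
    "properties": {
        "rows": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"cells": {"type": "array", "items": {"type": "string"}}},
                "required": ["cells"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["rows"],
    "additionalProperties": False,
}

markdown_table = "| Name | HP |\n|---|---|\n| Goblin | 7 |"  # OCR output from Document Intelligence

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    top_p=1,
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "table_extraction", "strict": True, "schema": table_schema},
    },
    messages=[
        {"role": "system", "content": "Convert the markdown table to the given JSON schema."},
        {"role": "user", "content": markdown_table},
    ],
)

table_json = response.choices[0].message.content
```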
Yes, I’ve experienced this; yes, it is expected:
The structured output itself is 100% spot-on; it’s the actual content that shows small variations.
Dealing with the content is tricky; it’s an inherent limitation. Here are some things you can do:
- Specialize your API calls (or Assistants) so that one focuses on the content, the other on the structured output.
- You can provide examples and guidance for each structured-output item to help with the format by including a ‘description’ field in the schema, for instance:
```json
{
  "name": "structure_monster_info_response",
  "description": "Structure the monster's info and output it as a JSON.",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "monster_unique_id": {
        "type": "string",
        "description": "Unique ID for the monster in our database, e.g., 'dndgpt_mon_0001'. This is a unique field."
      }
    },
    "required": ["monster_unique_id"],
    "additionalProperties": false
  }
}
```
- You can create a critique bot to review and correct the output afterwards (see the first sketch after this list).
- You can programmatically correct some things (I used this method to standardize to UTF-8, which was particularly problematic for some reason); see the second sketch after this list.
- Finally, you can fine-tune a model to specify the exact output.
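A minimal sketch of the critique-pass idea, assuming the standard Chat Completions client (the model and prompts here are illustrative):

```python
from openai import OpenAI

client = OpenAI()

def critique_pass(source_markdown: str, draft_json: str) -> str:
    """Second call that checks the draft JSON against the source table and returns a corrected version."""
    # You can also reuse the same json_schema response_format here to keep the structure locked down.
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You check extracted table JSON against the source markdown table. "
                           "Fix any values that do not match the source and return only the corrected JSON.",
            },
            {
                "role": "user",
                "content": f"Source table:\n{source_markdown}\n\nDraft JSON:\n{draft_json}",
            },
        ],
    )
    return response.choices[0].message.content
```

And a sketch of the programmatic clean-up step, assuming plain Python with `json` and `unicodedata` (the normalization choices are just examples):

```python
import json
import unicodedata

def normalize_values(obj):
    """Recursively NFC-normalize and trim string values in the parsed JSON."""
    if isinstance(obj, dict):
        return {key: normalize_values(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [normalize_values(value) for value in obj]
    if isinstance(obj, str):
        return unicodedata.normalize("NFC", obj).strip()
    return obj

# Example model output with inconsistent Unicode and stray whitespace.
raw_model_output = '{"monster_unique_id": "dndgpt_mon_0001 ", "name": "Gobelin Che\\u0301f"}'

cleaned = normalize_values(json.loads(raw_model_output))
print(json.dumps(cleaned, ensure_ascii=False))
```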
None of this will reduce errors to 0, but you’ll approach 0 with each step.