How to prevent errors in GPT-4o batch processing?

Hi, i use gpt 4o batch to process the data i have, and i expect json responses in the format that i mention in the prompt, and i give reponse format as json as well. But i came over a case where few prompts in a batch broke, where it started giving repeated answers till it hit the max tokens (1000 in my case), i encountered few of these where some was just “/n” till it got to 1000 tokens and some were gibberish repeated till token limit. What could be the reason for this error, and how can i avoid it. It works well in the playground, and when i ran the same one in batch again it worked. But i want to know how this is caused and how to avoid it alltogether.

1 Like

There is a particular phenomenon when using specifically response_format: json_object that acts just as you describe, and has been happening since day one: you get a response full of repetitive newlines or tabs.

This happens when the AI might try to respond with something other than a JSON - but can’t.

The json_object response format, or “JSON mode” isn’t a strict enforcement. Rather, it is some sort of training of the AI. OpenAI forces you to put the word “JSON” somewhere in the prompt to counter this bad behavior, but really, you need to over-specify the JSON output that the AI must produce, in no uncertain terms.

I would give a system prompt along these lines:

You are an automated AI data processor, and your output is sent not to a user, but directly to an API that parses and validates output JSON that you produce. You must produce only output in this JSON format specified below, otherwise errors will occur, and there is no alternative way for you to respond except as JSON to the API (of purpose xxx).

Example response format: (JSON Examples)
Your output is validated against this schema (JSON Schema)

When the AI instructions and output already is of high enough quality that you’d never have to rely on a response format parameter - then you can turn on json_object mode.

Alternately, you can use a strict structured output schema as response format. This has less tendency to “go nuts”, and when it does, it is within the strings of JSON itself.

1 Like