Inconsistent Structured Output Results

shahmoosaraza · March 25, 2025, 10:26am

Hi, I am having trouble getting consistent results with structured output.

Context:
Translating an Excel file into X languages.
I am using Chat Completions (Streaming Structured Output).
The source text is in English.

This is my user prompt:

const promptMessage = `Translate the following JSON array of texts to ${languages.join(
      ", "
    )}. 
Return the translations following this exact schema format:
{
  "translations": [
    {
      "id": "0", // IMPORTANT: Return this ID in your response
      ${languages
        .map((lang) => `"${lang.toLowerCase()}": "${lang} translation"`)
        .join(",\n      ")}
    }
  ]
}
IMPORTANT: Each translation must include the "id" field that was provided in the input. Do not include the "source" field in your response.

Here are the texts to translate:
${formattedContent}`;

This is how I create my translationSchema (using Zod):

function createLanguageTranslationSchema(languages) {
  const translationFields = {};
  languages.forEach((language) => {
    const langKey = language.toLowerCase();
    translationFields[langKey] = z.string().describe(`${language} translation`);
  });

  return z.object({
    translations: z.array(
      z.object({
        id: z.string().describe("Row identifier"),
        ...translationFields,
      })
    ),
  });
}
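Note that `language.toLowerCase()` produces keys such as "chinese - traditional (hk)", with spaces and parentheses. Whether that contributes to the problem is not established here. As a minimal sketch, assuming simpler keys are wanted, the helpers below (hypothetical names `toSafeKey` and `buildLanguageKeyMap`, not part of the code above) derive plain keys and keep a map back to the display names:

```javascript
// Hypothetical helper: turn a display name such as "Chinese - Traditional (HK)"
// into a plain key such as "chinese_traditional_hk".
function toSafeKey(language) {
  return language
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse each run of other characters into "_"
    .replace(/^_+|_+$/g, ""); // trim leading and trailing underscores
}

// Hypothetical helper: map each safe key back to its display name so the
// results can be written to the right spreadsheet column afterwards.
function buildLanguageKeyMap(languages) {
  const keyMap = new Map();
  for (const language of languages) {
    const key = toSafeKey(language);
    if (key === "") {
      // Happens for names with no ASCII letters or digits.
      throw new Error(`Cannot derive a key for "${language}"`);
    }
    if (keyMap.has(key)) {
      throw new Error(`Duplicate key "${key}" for "${language}"`);
    }
    keyMap.set(key, language);
  }
  return keyMap;
}
```

The same keys would then have to be used in both the prompt and the Zod schema so the two stay in sync.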

This is the response I expect:

{
  "translations": [
    {
      "id": "0",
      "italian": "Italian translation",
      "arabic": "Arabic translation",
      "japanese": "Japanese translation",
      "russian": "Russian translation",
      "chinese - traditional (hk)": "Chinese - Traditional (HK) translation"
    }
  ]
}

I have implemented a batching solution to stay within the gpt-4o-mini max output token limit (16k), so it sends X rows per batch.
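For illustration, batching of this kind might look like the sketch below. It is not the actual implementation: the names `estimateOutputTokens` and `batchRows`, the `{ id, text }` row shape, and the 4-characters-per-token estimate are all assumptions, and real token counts differ considerably between languages.

```javascript
// Assumption: roughly 4 characters per token. This is only a crude estimate;
// non-Latin scripts often need more tokens per character.
const CHARS_PER_TOKEN = 4;

// Each source string comes back once per target language, so the expected
// output size grows with the number of languages.
function estimateOutputTokens(text, languageCount) {
  return Math.ceil((text.length / CHARS_PER_TOKEN) * languageCount);
}

// Group rows ({ id, text }) into batches whose estimated output stays
// under maxOutputTokens. A single oversized row still gets its own batch.
function batchRows(rows, languageCount, maxOutputTokens) {
  const batches = [];
  let current = [];
  let currentTokens = 0;
  for (const row of rows) {
    const cost = estimateOutputTokens(row.text, languageCount);
    if (current.length > 0 && currentTokens + cost > maxOutputTokens) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(row);
    currentTokens += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Because the estimate scales with `languageCount`, a batch size that fits for one language can overflow the output limit for 10, so leaving generous headroom below the limit seems prudent.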

The problem:
I am having inconsistent results, for example:

The presence of special characters, escape characters, or HTML tags halts a batch, and processing moves on to the next batch. I have implemented an HTML cleanup function. If I translate into a single language it works; if I translate into 10 languages the problem appears.
Certain (seemingly random) strings that don't contain any special characters also halt the batch.
After a random number of rows, it returns one language's translation for all languages (Italian translations in the Arabic, Japanese, Russian, and Chinese fields).
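The symptoms above could be detected per batch before the results are written back, so that only the affected rows are retried. The following is a minimal sketch under assumptions: `validateBatch` is a hypothetical name, and it expects the parsed `translations` array plus the list of language keys used in the schema.

```javascript
// Hypothetical check for one parsed batch.
// inputIds: ids sent in the request; translations: parsed response rows;
// languageKeys: the per-language keys used in the schema.
function validateBatch(inputIds, translations, languageKeys) {
  const problems = [];

  // Rows that were sent but never came back (for example, a halted batch).
  const returnedIds = new Set(translations.map((t) => t.id));
  for (const id of inputIds) {
    if (!returnedIds.has(id)) problems.push({ id, issue: "missing" });
  }

  for (const t of translations) {
    const values = languageKeys.map((key) => t[key]);
    if (values.some((v) => typeof v !== "string" || v.trim() === "")) {
      problems.push({ id: t.id, issue: "empty" });
      continue;
    }
    // The same text in every language suggests one translation was repeated.
    // This can be a false positive for numbers or brand names.
    if (values.length > 1 && new Set(values).size === 1) {
      problems.push({ id: t.id, issue: "identical" });
    }
  }
  return problems;
}
```

Rows reported as "missing" or "identical" could then be re-sent in a smaller batch instead of skipping the whole batch.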

What would you guys suggest?

shahmoosaraza · March 25, 2025, 12:09pm

Interestingly enough, these issues are not happening in the base gpt-4o-mini model or the previous version of my fine-tuned model.
