Seeking Help: Ensuring Accurate and Complete JSON Output with OpenAI API for Translating Dialogue Arrays

I’m using the OpenAI API to translate an array of dialogue lines from English into another language, and I keep running into problems. The dialogue array is structured as follows:

```json
[
  "text1",
  "text2",
  "text3",
  ...
  "textN"
]
```

On every API call, the output must keep a one-to-one correspondence with the original array: no sentence may be omitted, and names must remain unchanged. However, I’ve run into the following problems:

  1. Sentence Omission: In some cases, the translated JSON has fewer entries than the original array, so some sentences are missing.
  2. Order Discrepancy: Occasionally, the translated sentences come back in a different order from the original array.
  3. JSON Format Issues: The output sometimes contains syntax errors, such as trailing commas, which make the JSON invalid.
  4. Scalability Issues: With a small source array (a few sentences), the translation works well. With many sentences (e.g., 60 or more), these errors become more frequent.
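Since the errors grow with array size, one common mitigation is to send the dialogue in smaller batches and check each batch before moving on. A minimal sketch, assuming a hypothetical `translate_batch` callable that wraps one API call and returns a list of translated strings (it is not part of the OpenAI SDK); the batch size of 20 is an arbitrary placeholder:

```python
from typing import Callable, List

# Hypothetical: one API call per batch, returning translated lines.
TranslateBatch = Callable[[List[str]], List[str]]


def translate_in_batches(
    lines: List[str],
    translate_batch: TranslateBatch,
    batch_size: int = 20,
    max_retries: int = 2,
) -> List[str]:
    """Translate `lines` in fixed-size batches, retrying any batch whose
    output length differs from its input length."""
    if batch_size < 1 or max_retries < 0:
        raise ValueError("batch_size must be >= 1 and max_retries >= 0")
    result: List[str] = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        for _ in range(max_retries + 1):
            translated = translate_batch(batch)
            # A length check catches omissions; it cannot detect reordering.
            if len(translated) == len(batch):
                break
        else:
            # Every attempt for this batch came back with the wrong length.
            raise RuntimeError(
                f"batch starting at index {start} still mismatched after "
                f"{max_retries + 1} attempts"
            )
        result.extend(translated)
    return result
```

Smaller batches give up some cross-batch context; one option is to include a few preceding lines in each request as untranslated context.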

Current Attempts

I’ve tried the following solutions, but the issues persist:

  1. Explicitly instructing the model, in the system prompt, that the output JSON object must have the same length as the original array and preserve its order.
  2. Using numbered keys to make the correspondence between each source sentence and its translation explicit.
  3. Providing example dialogues and expected translations to guide the model.
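Numbered keys only help if the reply is actually checked against them. A minimal sketch of a strict check, assuming the reply is a JSON object keyed "1" to "N" as in the system prompt's example; `parse_numbered_translation` is a hypothetical helper name:

```python
import json
from typing import Any, Dict, List, Tuple


def _reject_duplicate_keys(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    # json.loads silently keeps the last value for a repeated key; a repeated
    # number usually signals misalignment, so fail instead.
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in {keys}")
    return dict(pairs)


def parse_numbered_translation(raw: str, expected_count: int) -> List[str]:
    """Parse a reply shaped like {"1": "...", "2": "..."} and return the
    translations ordered by key. Raises ValueError on any mismatch."""
    # JSONDecodeError (raised for trailing commas etc.) subclasses ValueError.
    data = json.loads(raw, object_pairs_hook=_reject_duplicate_keys)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {str(i) for i in range(1, expected_count + 1)}
    missing = expected - set(data)
    extra = set(data) - expected
    if missing or extra:
        raise ValueError(
            f"missing keys {sorted(missing, key=int)}, extra keys {sorted(extra)}"
        )
    # Rebuild the list from the keys, so key order in the reply does not matter.
    values = [data[str(i)] for i in range(1, expected_count + 1)]
    if not all(isinstance(v, str) for v in values):
        raise ValueError("every translation must be a string")
    return values
```

This catches invalid syntax and missing, extra, or duplicate entries, and a caller can retry on `ValueError` instead of accepting partial output. It cannot tell whether a translation was placed under the wrong number.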

Here are my current system and user prompts:

System Prompt


You are a translator. Please translate the following dialogue from English to the specified target language. The dialogue is presented as a numbered list of sentences, and the translation should maintain the context, emotions, and tone of the entire conversation. Ensure that the output is a JSON object where each translated sentence corresponds directly to its original sentence number. Do not translate names; keep them in English.

Important:

  1. Ensure the translated JSON object maintains the exact order and position of each sentence as in the original list.
  2. Ensure the translated JSON object is valid JSON, without any trailing commas.
  3. The translated JSON object must have the same number of entries as the original list.
  4. Each key in the JSON object must match the corresponding sentence number from the original list.

Example dialogue and expected translation:

Original:

  1. “Hello, John. How are you?”
  2. “I’m good, thanks! And you?”
  3. “I’m fine too, thank you.”

Translation: { "1": "translated text", "2": "translated text", "3": "translated text" }

Please ensure no sentences are omitted or misaligned, and the JSON output is valid. Thank you.
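The rules in this prompt are only instructions, so the model can still break them. If the model in use supports Structured Outputs, the Chat Completions `response_format` parameter can take a JSON Schema with `"strict": true`, which constrains the reply to that schema. A sketch, assuming that feature is available for your model; the function name `numbered_translation_format` and the schema name are illustrative:

```python
from typing import Any, Dict


def numbered_translation_format(n: int) -> Dict[str, Any]:
    """Build a `response_format` value whose JSON Schema requires exactly
    the string properties "1" .. str(n)."""
    keys = [str(i) for i in range(1, n + 1)]
    schema = {
        "type": "object",
        "properties": {k: {"type": "string"} for k in keys},
        # Strict mode expects every property in "required" and
        # additionalProperties set to false.
        "required": keys,
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "numbered_translation",  # illustrative name
            "strict": True,
            "schema": schema,
        },
    }
```

Pass the result as `response_format=numbered_translation_format(len(lines))` to `client.chat.completions.create(...)` and parse `response.choices[0].message.content` with `json.loads`. Strict mode limits schema size (check the Structured Outputs documentation for current limits), and a reply cut off by the token limit can still be incomplete, so long arrays may still need batching. Plain JSON mode (`{"type": "json_object"}`) guarantees syntactically valid JSON but not the expected set of keys.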


User Prompt

  1. “text1”
  2. “text2”
  3. “text3”
  ...
  N. “textN”
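The numbered user prompt can be generated from the array rather than written by hand. A small sketch (the helper name `build_user_prompt` is hypothetical) that JSON-encodes each line, so a dialogue line containing quotes or a line break cannot be read as two entries:

```python
import json
from typing import List


def build_user_prompt(lines: List[str]) -> str:
    """Render dialogue lines as a numbered list, one entry per line.

    Each line is JSON-encoded so embedded quotes and newlines are escaped;
    ensure_ascii=False keeps non-ASCII text readable.
    """
    return "\n".join(
        f"{i}. {json.dumps(text, ensure_ascii=False)}"
        for i, text in enumerate(lines, start=1)
    )
```

For example, `build_user_prompt(["Hi, John.", "Bye."])` returns `'1. "Hi, John."\n2. "Bye."'`.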