Return the same number of responses as inputs

Hello all,
I am working on a text translation problem where I have a list of texts that are passed to the GPT model API, and I get the response in a structured format. The list of texts is in a specific format, as follows:
[
{"text": "text1 to translate"},
{"text": "text2 to translate"},
{"text": "text3 to translate"}
]
I have defined a JSON schema to ensure the output follows a structured format.
The primary requirement here is that when I pass an input with K texts to translate, it is mandatory that I get exactly K responses (translations) back so they can be used in downstream applications. However, in many instances, if two consecutive texts are part of a single sentence, the GPT model combines them and generates a single long translation, making the number of responses < K.
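In other words, the downstream applications depend on a guarantee like the following (a minimal Python sketch; it assumes the model's structured response has already been parsed into a list of translations):

def validate(inputs: list[dict], translations: list[str]) -> None:
    # The parsed model output must line up one-to-one with the inputs.
    if len(translations) != len(inputs):
        raise ValueError(
            f"Expected {len(inputs)} translations, got {len(translations)}"
        )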
I have tried various changes to my system prompt to force the model to output the same number of responses:

  • providing examples to fine-tune the response
  • adding specific rules stating not to merge or split the texts
  • adding a rule specifying to return exactly K responses
  • adding an index to each text and specifying that a response be returned for each index (see the sketch after this list)
  • adding a rule not to combine or split texts, so that each response corresponds to exactly one input entry

and many other approaches, but none of them works.
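For the indexing attempt, for example, the input was prepared roughly like this (a Python sketch; the variable and field names are just illustrative):

import json

texts = ["text1 to translate", "text2 to translate", "text3 to translate"]

# Attach an explicit index to each entry so the model can anchor
# each output to exactly one input.
indexed_input = [{"index": i, "text": t} for i, t in enumerate(texts, start=1)]
input_text = json.dumps(indexed_input, ensure_ascii=False, indent=2)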

One example of the prompt is as follows:
SYSTEM PROMPT:
You are an expert multilingual translator.
Your task is to translate the provided texts accurately and efficiently to Spanish.

IMPORTANT REQUIREMENTS:

  1. The number of translated texts in the output must exactly match the number of texts in the input JSON. If the input contains N texts, the output must contain exactly N translations.
  2. Do not omit, combine, or split any text entries.
  3. Do not merge multiple texts into one or split a single text into multiple entries. Each output entry must correspond to exactly one input entry.
  4. Ensure that the translated texts maintain the original order of the input texts.

Translate the texts while adhering to the rules mentioned above.

USER PROMPT:
Translate the texts to Spanish and return exactly 20 responses:\n{input_text}

Is there a way to ensure the model (GPT-4o-mini) returns exactly K responses consistently in every call?
I can only use GPT-4o-mini for this work.


I can perform this task by accounting for the model's poor attention and its tendency to lose its place when writing free-form output.

user input:

[
    {
        "index": 1,
        "input_text_to_translate": "Object-Oriented Programming (OOP) is a paradigm that relies on the concept of objects and classes. Objects represent real-world entities encapsulating data and behavior, while classes act as blueprints for creating these objects. Polymorphism, a core feature of OOP, allows objects to be treated as instances of their parent class rather than their actual class. This facilitates code reuse and flexibility, especially in frameworks employing interface-based design."
    },
    {
        "index": 2,
        "input_text_to_translate": "Design patterns provide reusable solutions to common software design challenges. The Singleton pattern, for example, ensures that a class has only one instance, typically used for managing shared resources like configuration settings or logging systems. Other patterns, like Factory Method, abstract object creation, promoting loose coupling between components. Mastering these patterns is essential for designing scalable and maintainable systems."
    },
    {
        "index": 3,
        "input_text_to_translate": "Graph algorithms are fundamental in computer science, with applications spanning network routing, social media analysis, and bioinformatics. Dijkstra's algorithm, a classic example, computes the shortest paths from a source node to all other nodes in a weighted graph. The algorithm employs a priority queue to efficiently manage exploration, guaranteeing an optimal solution under non-negative weights."
    },
    {
        "index": 4,
        "input_text_to_translate": "The concept of Big-O notation is crucial in analyzing algorithmic efficiency. By expressing the upper bound of an algorithm's growth rate, it enables developers to compare solutions irrespective of hardware or input size. For instance, an O(n log n) sorting algorithm like MergeSort is typically preferred over an O(n²) algorithm like Bubble Sort for large datasets, as its complexity scales more favorably."
    },
    {
        "index": 5,
        "input_text_to_translate": "Concurrency is increasingly relevant with the rise of multi-core processors. Threads, processes, and asynchronous programming models facilitate parallel execution, but they introduce complexities like race conditions and deadlocks. Techniques like mutexes, semaphores, and atomic operations are employed to ensure data consistency and program correctness in concurrent environments."
    },
    {
        "index": 6,
        "input_text_to_translate": "Machine learning leverages algorithms to extract patterns from data, enabling predictive modeling. Decision trees, support vector machines, and neural networks are prominent techniques, each suitable for specific problem domains. Overfitting, a common challenge, occurs when a model learns noise in the training data, reducing its ability to generalize. Regularization and cross-validation are strategies to mitigate this issue."
    },
    {
        "index": 7,
        "input_text_to_translate": "Dynamic programming (DP) optimizes problems by solving overlapping subproblems and storing intermediate results. Classic examples include the Fibonacci sequence and the Knapsack problem. Memoization, a top-down approach, contrasts with tabulation, a bottom-up approach, in implementing DP. Both methods reduce redundant computations, significantly improving performance for certain classes of problems."
    },
    {
        "index": 8,
        "input_text_to_translate": "The Actor Model of concurrency abstracts state management and message passing in distributed systems. In this paradigm, actors are autonomous entities that process messages asynchronously. This model simplifies reasoning about parallelism, making it foundational in modern frameworks like Akka and languages like Erlang. It enables robust fault-tolerance through supervised hierarchies and error recovery mechanisms."
    },
    {
        "index": 9,
        "input_text_to_translate": "Relational databases rely on structured query languages (SQL) for managing data. Normalization is a design principle aimed at reducing redundancy and improving data integrity. However, highly normalized schemas can degrade performance for read-heavy applications. Denormalization, therefore, is often employed in data warehousing and analytics to optimize query performance by sacrificing some normalization."
    },
    {
        "index": 10,
        "input_text_to_translate": "Functional programming emphasizes immutability and pure functions, eliminating side effects for predictable outcomes. Higher-order functions, such as map, filter, and reduce, enable expressive and concise code. Languages like Haskell and Scala excel in this paradigm, and the approach is increasingly adopted in hybrid languages like Python and JavaScript for specific applications."
    }
]

gpt-4o-mini strict structured output:

{"translations":[{"index_number":1,"input_to_translate_elided":"Object-Oriented Programming (OOP) is a paradigm...","output_translated_to_spanish":"La Programación Orientada a Objetos (OOP) es un paradigma que se basa en el concepto de objetos y clases. Los objetos representan entidades del mundo real que encapsulan datos y comportamientos, mientras que las clases actúan como planos para crear estos objetos. El polimorfismo, una característica central de OOP, permite que los objetos sean tratados como instancias de su clase padre en lugar de su clase real. Esto facilita la reutilización del código y la flexibilidad, especialmente en marcos que emplean un diseño basado en interfaces."},{"index_number":2,"input_to_translate_elided":"Design patterns provide reusable solutions...","output_translated_to_spanish":"Los patrones de diseño proporcionan soluciones reutilizables a desafíos comunes en el diseño de software. El patrón Singleton, por ejemplo, asegura que una clase tenga solo una instancia, típicamente utilizada para gestionar recursos compartidos como configuraciones o sistemas de registro. Otros patrones, como el Método de Fábrica, abstraen la creación de objetos, promoviendo un acoplamiento débil entre componentes. Dominar estos patrones es esencial para diseñar sistemas escalables y mantenibles."},{"index_number":3,"input_to_translate_elided":"Graph algorithms are fundamental in computer science...","output_translated_to_spanish":"Los algoritmos de grafos son fundamentales en la informática, con aplicaciones que abarcan el enrutamiento de redes, el análisis de redes sociales y la bioinformática. El algoritmo de Dijkstra, un ejemplo clásico, calcula los caminos más cortos desde un nodo fuente a todos los demás nodos en un grafo ponderado. El algoritmo emplea una cola de prioridad para gestionar la exploración de manera eficiente, garantizando una solución óptima bajo pesos no negativos."},{"index_number":4,"input_to_translate_elided":"The concept of Big-O notation is crucial...","output_translated_to_spanish":"El concepto de notación Big-O es crucial para analizar la eficiencia algorítmica. Al expresar el límite superior de la tasa de crecimiento de un algoritmo, permite a los desarrolladores comparar soluciones independientemente del hardware o del tamaño de la entrada. Por ejemplo, un algoritmo de ordenamiento O(n log n) como MergeSort es típicamente preferido sobre un algoritmo O(n²) como Bubble Sort para conjuntos de datos grandes, ya que su complejidad escala de manera más favorable."},{"index_number":5,"input_to_translate_elided":"Concurrency is increasingly relevant with the rise...","output_translated_to_spanish":"La concurrencia es cada vez más relevante con el auge de los procesadores multinúcleo. Los hilos, procesos y modelos de programación asíncrona facilitan la ejecución paralela, pero introducen complejidades como condiciones de carrera y bloqueos. Se emplean técnicas como mutexes, semáforos y operaciones atómicas para garantizar la consistencia de los datos y la corrección del programa en entornos concurrentes."},{"index_number":6,"input_to_translate_elided":"Machine learning leverages algorithms to extract patterns...","output_translated_to_spanish":"El aprendizaje automático aprovecha algoritmos para extraer patrones de los datos, permitiendo la modelación predictiva. Los árboles de decisión, las máquinas de soporte vectorial y las redes neuronales son técnicas prominentes, cada una adecuada para dominios de problemas específicos. 
El sobreajuste, un desafío común, ocurre cuando un modelo aprende el ruido en los datos de entrenamiento, reduciendo su capacidad de generalizar. La regularización y la validación cruzada son estrategias para mitigar este problema."},{"index_number":7,"input_to_translate_elided":"Dynamic programming (DP) optimizes problems...","output_translated_to_spanish":"La programación dinámica (DP) optimiza problemas al resolver subproblemas superpuestos y almacenar resultados intermedios. Ejemplos clásicos incluyen la secuencia de Fibonacci y el problema de la mochila. La memorización, un enfoque de arriba hacia abajo, contrasta con la tabulación, un enfoque de abajo hacia arriba, en la implementación de DP. Ambos métodos reducen cálculos redundantes, mejorando significativamente el rendimiento para ciertas clases de problemas."},{"index_number":8,"input_to_translate_elided":"The Actor Model of concurrency abstracts state management...","output_translated_to_spanish":"El Modelo de Actor de concurrencia abstrae la gestión del estado y el paso de mensajes en sistemas distribuidos. En este paradigma, los actores son entidades autónomas que procesan mensajes de manera asíncrona. Este modelo simplifica el razonamiento sobre el paralelismo, haciéndolo fundamental en marcos modernos como Akka y lenguajes como Erlang. Permite una robusta tolerancia a fallos a través de jerarquías supervisadas y mecanismos de recuperación de errores."},{"index_number":9,"input_to_translate_elided":"Relational databases rely on structured query languages...","output_translated_to_spanish":"Las bases de datos relacionales dependen de lenguajes de consulta estructurados (SQL) para gestionar datos. La normalización es un principio de diseño destinado a reducir la redundancia y mejorar la integridad de los datos. Sin embargo, los esquemas altamente normalizados pueden degradar el rendimiento para aplicaciones que leen mucho. Por lo tanto, la desnormalización se emplea a menudo en almacenamiento de datos y análisis para optimizar el rendimiento de las consultas sacrificando algo de normalización."},{"index_number":10,"input_to_translate_elided":"Functional programming emphasizes immutability and pure functions...","output_translated_to_spanish":"La programación funcional enfatiza la inmutabilidad y las funciones puras, eliminando efectos secundarios para resultados predecibles. Las funciones de orden superior, como map, filter y reduce, permiten un código expresivo y conciso. Lenguajes como Haskell y Scala sobresalen en este paradigma, y el enfoque se adopta cada vez más en lenguajes híbridos como Python y JavaScript para aplicaciones específicas."}]}

(sorry for the length, but there is no good forum presentation for JSON as a single line.)

How is it done? By assigning an index number to each input and output, and also by dedicating part of the output to reproducing the source, so that what is being produced next stays closely associated with the current state of the output.

JSON schema (wrap in {"type": "json_schema", "json_schema": response_obj} to use as an API parameter)

{
  "name": "translation_list",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "translations": {
        "type": "array",
        "description": "A list of translation objects.",
        "items": {
          "type": "object",
          "properties": {
            "index_number": {
              "type": "number",
              "description": "The index number of the input item to be translated; starts at 0."
            },
            "input_to_translate_elided": {
              "type": "string",
              "description": "A reminder of the text being produced, maximum of 20 words."
            },
            "output_translated_to_spanish": {
              "type": "string",
              "description": "A high quality translation of the input text."
            }
          },
          "required": [
            "index_number",
            "input_to_translate_elided",
            "output_translated_to_spanish"
          ],
          "additionalProperties": false
        }
      }
    },
    "required": [
      "translations"
    ],
    "additionalProperties": false
  }
}

Then a basic system prompt that speaks to the task and the quality desired: "You are an expert multilingual translator. Your automated task is to accept and translate the provided texts accurately and efficiently to natural Spanish language as spoken in Mexico." The schema is of high enough quality that the AI can perform with little further instruction.
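Putting it together, the API call might look like this (a minimal sketch using the openai Python SDK; the schema is the one shown above with its field descriptions elided for brevity, and the two-item input is just an illustration):

import json
from openai import OpenAI

client = OpenAI()

response_obj = {
    "name": "translation_list",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "translations": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "index_number": {"type": "number"},
                        "input_to_translate_elided": {"type": "string"},
                        "output_translated_to_spanish": {"type": "string"},
                    },
                    "required": [
                        "index_number",
                        "input_to_translate_elided",
                        "output_translated_to_spanish",
                    ],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["translations"],
        "additionalProperties": False,
    },
}

user_input = json.dumps(
    [
        {"index": 1, "input_text_to_translate": "First text."},
        {"index": 2, "input_text_to_translate": "Second text."},
    ],
    ensure_ascii=False,
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "You are an expert multilingual translator. Your automated task "
                "is to accept and translate the provided texts accurately and "
                "efficiently to natural Spanish language as spoken in Mexico."
            ),
        },
        {"role": "user", "content": user_input},
    ],
    response_format={"type": "json_schema", "json_schema": response_obj},
)

# result["translations"] is the list of index/elided-source/translation objects.
result = json.loads(completion.choices[0].message.content)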

You cannot ensure or demand that each entry is produced in an array. You can, however, create a schema of fixed length with exact keys that must all be filled out: "translation_of_text_1", "translation_of_text_2", etc.
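Such a fixed-length schema could be generated per request, something like this sketch (the key pattern follows the above; the function name is hypothetical):

def fixed_length_schema(k: int) -> dict:
    # One required string property per input text, so the model cannot
    # emit fewer (or more) translations than requested.
    props = {
        f"translation_of_text_{i}": {"type": "string"}
        for i in range(1, k + 1)
    }
    return {
        "name": "fixed_translations",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": props,
            "required": list(props),
            "additionalProperties": False,
        },
    }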

You will have higher success if you don’t press your luck on a cheap model. There will be little additional cost to perform translations one-at-a-time, one per API call.
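One-at-a-time could be as simple as this (a sketch that reuses the client from the earlier example and assumes items is the indexed input list):

translations = []
for item in items:  # items: the list of {"index", "input_text_to_translate"} dicts
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are an expert translator. Translate the user's text to natural Mexican Spanish."},
            {"role": "user", "content": item["input_text_to_translate"]},
        ],
    )
    translations.append(completion.choices[0].message.content)

# K inputs produce K outputs by construction.
assert len(translations) == len(items)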
