Translation quality inconsistent

I use the prompt below to correct some texts, translated from Romanian to English, across several files. The biggest problem is that if I run the code three times, it corrects the text differently each time; that is, sometimes it translates well, sometimes less well. Why is this happening?

        from openai import OpenAI  # assuming the official openai v1+ Python client

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        # context_ro and text_temp are defined elsewhere in the script.
        # Note: temperature is not specified, so the API default (1) applies.
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": """You are a professional translator and editor specializing in Romanian to English translations.
                Your task is to improve the given English translation based on the Romanian original text.

                IMPORTANT:
                - Return ONLY the corrected English text
                - Do NOT add any explanatory text, prefixes, or quotes
                - Do NOT add phrases like 'Corrected version:', 'Translation:', etc.
                - Do NOT wrap the text in quotation marks
                - Keep all placeholders (like __EM_0__, __EM_1__ etc.) exactly as they are
                - Return the text exactly as it should appear, preserving all placeholders"""},
                {"role": "user", "content": f"""Romanian original:
                {context_ro}

                Current English text:
                {text_temp}

                Provide only the corrected English text:"""}
            ]
        )
        corectat = response.choices[0].message.content.strip()  # "corectat" = "corrected" in Romanian

Great question! You might notice that even with the same prompt, GPT-4 (and the gpt-3.5-turbo model you selected) can produce different outputs on separate runs. This isn’t a bug, but a feature stemming from the core of how these large language models generate text. Unlike a simple calculator that always gives the same answer for the same input, GPT-based AI is designed to be creative and generate diverse, human-like text. Let’s explore why.

Imagine you’re asked to complete the sentence, “The cat sat on the…”. There are many valid completions: “mat,” “chair,” “fence,” and so on. A deterministic model would always choose the single most likely word, perhaps “mat.” However, a large language model like GPT-4 considers a vast range of possibilities, each with its own probability. Instead of always picking the most probable word, it uses a probabilistic approach, sampling from this distribution of possibilities. This is why you get different, yet often valid and interesting, completions each time you run the model. This probabilistic approach, based on sampling from a cumulative probability distribution, is key to the model’s ability to generate creative and nuanced text. Now, let’s dive deeper into the technical details of this sampling mechanism, and explore how settings that you can send to the API influence the text generated.

Why Cumulative Probability Sampling?

At the heart of these models lies the prediction of the next token in a sequence. At each step, the model outputs a probability distribution over its entire vocabulary. Instead of always picking the highest-probability token (greedy decoding), which can lead to repetitive and predictable text, the model samples from this distribution using cumulative probabilities, usually restricted to a nucleus of the most likely tokens (nucleus sampling). This method injects creativity and diversity into the generated text. Think of it like choosing a path through a branching tree of possibilities: greedy decoding always takes the most obvious path, while sampling explores less-trodden paths, leading to more surprising and interesting results.

This sampling approach is crucial for several reasons:

  • Avoiding Repetition: Greedy decoding often gets stuck in loops, repeating phrases or patterns. Sampling breaks free from these loops, generating more varied and natural-sounding text.

  • Generating Creative Text: For tasks like storytelling or poetry generation, sampling is essential for producing imaginative and engaging content. It allows the model to explore different stylistic choices and come up with novel combinations of words.

  • Reflecting Uncertainty: The model’s predictions aren’t always certain. Sampling acknowledges this uncertainty by sometimes choosing less probable but potentially more interesting tokens. This can lead to more nuanced and human-like text.

Technical Underpinnings of Sampling from Softmax

The model’s output is a softmax distribution – a probability distribution where each token in the vocabulary is assigned a probability between 0 and 1, and the sum of all probabilities equals 1. Cumulative probability sampling works by:

  1. Calculating Cumulative Probabilities: We sort the tokens in descending order of probability and calculate the cumulative probability for each token. The cumulative probability of a token is the sum of its probability and the probabilities of all tokens preceding it in the sorted list.

  2. Generating a Random Number: We generate a random number between 0 and 1.

  3. Selecting the Token: We select the token whose cumulative probability is the first to exceed the random number. This ensures that tokens with higher probabilities are more likely to be selected, but lower probability tokens still have a chance.
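
Putting those three steps together, here is a minimal pure-Python sketch; the five-token vocabulary is a toy stand-in for a real model’s vocabulary of tens of thousands of tokens:

    import random

    # Toy next-token distribution, already normalized like a softmax output.
    # A real model assigns a probability to every token in its vocabulary.
    probs = {"mat": 0.50, "chair": 0.20, "fence": 0.15, "roof": 0.10, "moon": 0.05}

    def sample_token(probs):
        # Step 1: sort by descending probability and accumulate
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        # Step 2: draw a random number in [0, 1)
        r = random.random()
        # Step 3: pick the first token whose cumulative probability exceeds r
        cumulative = 0.0
        for token, p in ranked:
            cumulative += p
            if r < cumulative:
                return token
        return ranked[-1][0]  # guard against floating-point rounding

    print([sample_token(probs) for _ in range(5)])  # output varies run to run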

Controlling Sampling with top_p (Nucleus Sampling) and Temperature

  • top_p (Nucleus Sampling): This parameter controls the size of the “nucleus” from which tokens are sampled. For example, top_p = 0.9 means that only the tokens comprising the top 90% of the probability mass are considered. This helps to filter out very low probability tokens, which are often nonsensical, while still allowing for some diversity.

  • Temperature: This parameter controls the “sharpness” of the probability distribution. A higher temperature (e.g., 1.0) makes the distribution flatter, increasing the probability of selecting less likely tokens and leading to more diverse, but potentially less coherent, text. A lower temperature (e.g., 0.2) makes the distribution sharper, concentrating probability mass on the most likely tokens, resulting in more predictable but often higher quality text.
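
Here is a sketch of how both knobs act on the model’s raw scores (logits); the logit values below are made up for illustration:

    import math
    import random

    def sample_top_p(logits, temperature=0.7, top_p=0.9):
        # Temperature scaling: dividing logits sharpens (T < 1) or flattens
        # (T > 1) the resulting softmax distribution.
        m = max(logits.values())
        exps = {t: math.exp((l - m) / temperature) for t, l in logits.items()}
        z = sum(exps.values())
        ranked = sorted(((t, e / z) for t, e in exps.items()),
                        key=lambda kv: kv[1], reverse=True)
        # Nucleus (top_p) filtering: keep the smallest set of top tokens whose
        # probabilities sum to at least top_p, then sample within that set.
        nucleus, total = [], 0.0
        for token, p in ranked:
            nucleus.append((token, p))
            total += p
            if total >= top_p:
                break
        r = random.random() * total  # sampling in [0, total) renormalizes
        cumulative = 0.0
        for token, p in nucleus:
            cumulative += p
            if r < cumulative:
                return token
        return nucleus[-1][0]

    logits = {"mat": 2.0, "chair": 1.1, "fence": 0.8, "moon": -1.0}  # made up
    print(sample_top_p(logits))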

Obtaining High-Quality Translations

Let’s consider the task of language translation. While sampling is important for generating fluent and natural-sounding translations, it can also introduce variability in output quality. To maximize translation quality:

  • Experiment with top_p and Temperature: Find the optimal balance between diversity and coherence for your specific translation task. Start with a moderate top_p value (e.g., 0.9) and a temperature around 0.7.

  • Try different AI models: Each differs in language fluency and in how well it can carry out a translation task for you.

You can get very similar results between runs by setting the API parameter temperature=0. The key is to find the right balance between exploration and exploitation, allowing the model to be creative while ensuring that the generated text remains coherent and faithful to the source text.
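
For the code in the question, that means passing temperature explicitly. A minimal sketch, reusing the client from the snippet above (the seed parameter exists on newer chat-completions models and makes repeated runs more reproducible, but it is best-effort rather than a hard guarantee):

    # Same kind of call as in the question, pinned down for reproducibility.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # greedy-like decoding: run-to-run variance drops sharply
        seed=42,        # best-effort determinism; compare system_fingerprint across runs
        messages=[
            {"role": "system", "content": "Translate Romanian to English faithfully."},
            {"role": "user", "content": "Pisica stă pe covor."},
        ],
    )
    print(response.choices[0].message.content)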

It is a big mistake to treat the text with “imagination and creativity”. When it comes to law, accounting, legislation, and contracts, a single wrongly written word can be misinterpreted, and that is where a lot of trouble starts.

I understand using creative writing in many fields: compositions, paraphrases, essays, free-form texts. But translations should be as faithful as possible to the original text; in no case should the wording be changed creatively.

1. Legal and judicial documents

Contracts: Contract clauses must be translated with extreme accuracy to avoid misunderstandings and possible legal disputes. Any change to the original meaning can have major implications for the rights and obligations of the contracting parties.

Legislation: Laws and regulations must be translated with absolute fidelity to the original text. Even a small change can lead to misinterpretations and incorrect application of the law.

Court orders: Translation of court orders requires careful attention to terminology and legal nuances to ensure that the decision is correctly understood in the new language.

2. Technical documents

User manuals: Instructions for using a piece of equipment or software must be clear and precise, without ambiguity. An inaccurate translation can lead to product misuse and potential damage.

Technical specifications: The technical specifications of a product must be translated with maximum accuracy to avoid misunderstandings between the manufacturer and the customer.

Scientific reports: Scientific reports contain very specific terms and concepts that must be translated precisely so as not to distort the research results.

3. Official documents

Birth/marriage/death certificates: These documents have a special legal value, and any change to the information contained in them can have legal consequences.

Passports and visas: The information in these documents must be accurately translated to avoid refusal at the border.

Tax documents: Income statements and other tax documents must be translated with great care to avoid errors in the calculation of taxes and penalties.

Why is literal translation important in these cases?

Accuracy: Legal, technical and official terms have very precise meanings and any change to them may lead to misinterpretations.

Clarity: Legal, technical and official texts must be as clear and concise as possible, without ambiguity.

Consistency: Within a document or set of documents, terminology must be uniform and consistent.

Legal validity: An inaccurate translation of a legal document can have serious legal consequences.

In conclusion, in these cases, translation is not just a simple change of language, but a complex operation that requires specialized knowledge in the respective field and special attention to detail. The main purpose of translation is to convey information in another language while preserving the original meaning and value of the text.

I’m working on exactly this dilemma right now for one of my projects, https://www.lawxer.ai

The problem is that you cannot translate this type of input directly into a different language. You need to separate the translatable data (for example, running text, some terms, etc.) from what should not be translated (norm names, person names, ID numbers, etc.) and from what should be partially translated with some additional instructions or even RAG retrieval (legal clauses and their formulations, construction norms and requirements)…
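
As a sketch of that separation step, the snippet below masks non-translatable spans with placeholders before translation and restores them afterwards; the regex patterns are hypothetical stand-ins for whatever your document types actually require:

    import re

    # Hypothetical patterns; real ones depend on jurisdiction and document type.
    PATTERNS = [
        r"\bLegea\s+nr\.\s*\d+/\d{4}\b",  # Romanian statute refs, e.g. "Legea nr. 50/1991"
        r"\b[A-Z]{2}\d{6,}\b",            # ID-like codes, e.g. "AB123456"
    ]

    def mask_untranslatable(text):
        """Replace non-translatable spans with placeholders; return text and mapping."""
        mapping = {}
        for pattern in PATTERNS:
            for match in re.findall(pattern, text):
                key = f"__EM_{len(mapping)}__"
                mapping[key] = match
                text = text.replace(match, key, 1)
        return text, mapping

    def unmask(text, mapping):
        """Restore the original spans after translation."""
        for key, original in mapping.items():
            text = text.replace(key, original)
        return text

    masked, mapping = mask_untranslatable("Conform Legea nr. 50/1991, dosar AB123456.")
    # Translate `masked`, then call unmask() on the model's output.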

Before jumping into translating these things, you need to figure out how they are translated by humans in the field, and build a separate app for translations with specific workflows and logic for each type of document.

We ended up going gradually: first finding what is safely translatable without any logic, then what is not translatable at all. We are still thinking about whether we should touch the things that require logic in translation and data retrieval… Personally, I will try to avoid this area.

It’s just that you expected too much from ChatGPT.

I mean, you pay some money per month, you learn Python, you learn how to use an API key, and in the end you realize that you did all this for nothing. It can’t do what you need well.

That is normal, especially after all the marketing buzz that has been created around OpenAI and GPT; I mean by the marketers, to attract more eyeballs.

The tools are great, but you need to play with them for quite a while to understand their limitations and work within them.

AI is not a magic box that can solve every problem. Just accept it.

@oanaaaa08

There are multiple reasons why you might not be getting consistent responses.

First of all, in the code you shared, the temperature parameter isn’t specified, so it defaults to 1. This results in more entropic completions, hence the inconsistency. You need to set the temperature to 0.

Next, you need to tell the model the kind of document that is being translated. This will help the model ensure that the translations are relevant.
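
For example (a sketch; doc_type is a hypothetical variable your pipeline would set per file):

    doc_type = "commercial contract"  # hypothetical: detected or set per file

    system_prompt = (
        "You are a professional translator specializing in Romanian to English.\n"
        f"The text below is a {doc_type}: translate it faithfully, preserve the "
        "established terminology of that field, and never paraphrase or embellish.\n"
        "Keep all placeholders (like __EM_0__) exactly as they are."
    )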

Finally, you need to experiment with different models to find the ones whose translations best suit your needs.


In order for an AI to improve its way of translating from other languages, it could compare a book with the same book translated into other languages. For example, after crawling all books in all languages on the Internet Archive, the AI code could find a book translated into several languages. In this case, it must compare them and “learn” how to translate better in the future.

In the context of a more complex translation task, such as comparing books translated into multiple languages, there are some additional steps and ideas that could be useful to improve functionality.

Additional functionality useful for the future in the context of books and PDF files

To extend the functionality so that the AI learns better from comparing translations between books in multiple languages, here are some useful ideas to consider:

1. Extracting text from PDFs:

  • PDF text extraction tools: Integrating a library like PyMuPDF, pdfplumber or pdfminer.six could help extract text from PDFs, which is essential for working with books and other translated materials.
  • Text preprocessing: After extraction, the text could be structured into chapters or paragraphs to facilitate comparisons between similar sections in different translations.
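
As a sketch of item 1, using pdfplumber (one of the libraries mentioned above) with blank lines as a crude paragraph boundary:

    import pdfplumber  # pip install pdfplumber

    def extract_paragraphs(pdf_path):
        """Extract page text and split it into rough paragraphs."""
        paragraphs = []
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                text = page.extract_text() or ""  # None for image-only pages
                paragraphs.extend(p.strip() for p in text.split("\n\n") if p.strip())
        return paragraphs

    # paragraphs = extract_paragraphs("carte_ro.pdf")  # hypothetical file name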

2. Alignment of texts between languages:

  • Alignment detection: The code could align text extracted from PDFs in different languages using reference points such as chapter headings or certain key paragraphs. This process can be semi-automated using text alignment algorithms such as fast_align (for bilingual texts) or algorithms based on semantic similarity.
  • Phrase by phrase comparison: In order to have as faithful translations as possible, it would be ideal if the text in the source language is divided into sentences or phrases that are directly compared with their equivalents in the translation.
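
As a sketch of the semantic-similarity route for item 2, using the sentence-transformers library (the checkpoint name is a real multilingual model; the greedy nearest-neighbour pairing is illustrative, not a full alignment algorithm):

    from sentence_transformers import SentenceTransformer, util

    # A multilingual checkpoint that embeds Romanian and English sentences
    # into the same vector space.
    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    def align_sentences(source_sents, target_sents):
        """Pair each source sentence with its most similar target sentence."""
        src = model.encode(source_sents, convert_to_tensor=True)
        tgt = model.encode(target_sents, convert_to_tensor=True)
        sims = util.cos_sim(src, tgt)  # similarity matrix, shape (len(src), len(tgt))
        return [
            (source_sents[i], target_sents[int(row.argmax())], float(row.max()))
            for i, row in enumerate(sims)
        ]

    print(align_sentences(["Pisica stă pe covor."], ["The cat sits on the mat."]))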

3. Analysis and improvement of translations:

  • Comparative translation analysis: The code could analyze translations between different versions of the same book in multiple languages, determining how each translator interpreted the text. This could be done by comparing the frequency and consistency of translation terms.
  • Identifying cultural expressions: Some specific cultural expressions could be detected and compared between the source and target languages. For example, to translate a proverb or an idiom, it is useful to identify the equivalent versions in each language.

4. Embedding machine learning:

  • Saving examples of translations: The code could save examples of successful translations in a database, allowing the AI to “learn” from them and make better suggestions in future translations.
  • Using multilingual models: Language models trained on multilingual texts, such as mBERT or XLM-RoBERTa, can help interpret texts from different languages, significantly improving the accuracy of machine translations.
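
As a sketch of the “saving examples” idea in item 4, a tiny translation memory on top of the standard-library sqlite3 module (the schema is illustrative):

    import sqlite3

    conn = sqlite3.connect("translation_memory.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tm (
               source_lang TEXT, target_lang TEXT,
               source_text TEXT, target_text TEXT,
               UNIQUE (source_lang, target_lang, source_text)
           )"""
    )

    def remember(src_lang, tgt_lang, source, target):
        """Store a vetted translation pair for later reuse."""
        conn.execute("INSERT OR IGNORE INTO tm VALUES (?, ?, ?, ?)",
                     (src_lang, tgt_lang, source, target))
        conn.commit()

    def lookup(src_lang, tgt_lang, source):
        """Return a stored translation, or None if we have not seen it."""
        row = conn.execute(
            "SELECT target_text FROM tm WHERE source_lang = ? "
            "AND target_lang = ? AND source_text = ?",
            (src_lang, tgt_lang, source)).fetchone()
        return row[0] if row else None

    remember("ro", "en", "Pisica stă pe covor.", "The cat sits on the mat.")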

5. Complete automation of the process:

  • Automatic language identification: The code can identify the language in which each paragraph in the book is written, allowing more accurate processing when the same book is found in multiple languages.
  • Automatic comparison of books between languages: The code could be extended to automatically check available translations of a book, extract parallel sections and suggest improvements, thus pursuing continuous “learning” from professionally translated sources.
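
As a sketch of the language-identification step in item 5, using the langdetect library (its guesses are probabilistic and can misfire on very short texts):

    from langdetect import detect  # pip install langdetect

    def group_by_language(paragraphs):
        """Bucket paragraphs by detected ISO 639-1 language code."""
        buckets = {}
        for p in paragraphs:
            buckets.setdefault(detect(p), []).append(p)
        return buckets

    print(detect("Pisica stă pe covor."))  # expected: 'ro'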

These ideas could turn the code into a powerful tool for analyzing and improving literary translations, allowing the AI to keep learning and deliver increasingly high-quality translations!
