Translation ignores proper names

TuanAnh · December 29, 2023, 9:21am

Hello everyone, I am using the model gpt-3.5-turbo-1106 to translate incoming messages (which may contain HTML tags) into English. The issue I’m facing is that some conversations are written entirely in English but include a signature containing a proper name from another country. For example, a Vietnamese proper name always causes the message to be detected as Vietnamese by my language detection component, but I expect it to be detected as English.

Some approaches I’ve tried:

Instructing GPT to ignore proper nouns and proper names. This doesn’t work; the instruction I used is: ‘Ignore detecting proper nouns and proper names’.
Instructing GPT to report the language that appears most frequently in the message. However, this is not feasible because some messages contain two languages (for example, both English and Spanish), and even if the Spanish part has fewer words, I need the message to be detected as Spanish, not English.
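
To make the conflict between these two requirements concrete, here is a minimal Python sketch. It is only an illustration, not the detection component used in this post: the per-segment language labels and word counts are hypothetical inputs, and the names `majority_language` and `prefer_non_english` are made up for the example.

```python
from collections import Counter

def majority_language(segments):
    """Return the language code with the most words.

    segments: list of (language_code, word_count) pairs (hypothetical input).
    """
    totals = Counter()
    for code, words in segments:
        totals[code] += words
    return totals.most_common(1)[0][0]

def prefer_non_english(segments, default="en"):
    """Return the largest non-default language, or the default if none exists."""
    totals = Counter()
    for code, words in segments:
        if code != default:
            totals[code] += words
    if not totals:
        return default
    return totals.most_common(1)[0][0]

# English message with a short Spanish passage.
mixed = [("en", 40), ("es", 6)]
# English message whose signature is a three-word Vietnamese name.
signed = [("en", 40), ("vi", 3)]

print(majority_language(mixed))    # en  (but Spanish is wanted here)
print(prefer_non_english(mixed))   # es  (matches the requirement)
print(prefer_non_english(signed))  # vi  (the proper-name problem reappears)
```

Word counts alone cannot separate the two cases: a rule that lets a minority language win also lets a foreign name in the signature win, so the detector would need some other signal to tell a name from a real passage.
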
Does anyone have any ideas for the above situation? This is the current prompt I am providing to GPT for translation.

[
  {
    "content": "You are the language model. Your task is to detect language of the given text and translate it to English (keep html format)\n\
        Specifically, you are required to:\n\
        1. Detected language: Identify the language (ISO 639-1) used in the given text. (Eg. Chinese, Spanish)\n\
        2. Detected language code: Identify the language code (ISO 639-1 codes) used in the given text (Eg. zh, es)\n\
        3. Translated text: Translate ALL given text to English and keep html format.\n\
        4. Reason detection: Give the reason for your Detected language result.\n\
        It's crucial to always provide the output in JSON format",
    "role": "system"
  },
  {
    "content": "Translate ALL following text to ${targetLanguage}: '${message}' and keep html format\
        Your output should be structured in JSON format and must include the following fields:\
        - Detected language (Eg. Chinese, Spanish)\
        - Detected language code (Eg. zh, es)\
        - Translated text (to English, keep html format)\
        - Reason detection\
        Remember to utilize all the provided data in generating your responses.",
    "role": "user"
  }
]
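
For readers who want to reproduce the setup, the following Python sketch shows one way the two messages above could be assembled and the JSON reply checked. It makes several assumptions: the helper names `build_messages` and `parse_reply` are hypothetical, the exact JSON key names are assumed to match the field labels in the prompt (the prompt itself does not pin them down), and the call to the model is left out entirely.

```python
import json

# Assumed key names, copied from the field labels in the user prompt.
REQUIRED_FIELDS = (
    "Detected language",
    "Detected language code",
    "Translated text",
    "Reason detection",
)

def build_messages(system_prompt, message, target_language="English"):
    """Fill the user prompt template and return the chat messages list."""
    user_prompt = (
        f"Translate ALL following text to {target_language}: "
        f"'{message}' and keep html format\n"
        "Your output should be structured in JSON format "
        "and must include the following fields:\n"
        + "".join(f"- {name}\n" for name in REQUIRED_FIELDS)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def parse_reply(reply_text):
    """Parse the model reply and fail loudly if an expected field is absent."""
    data = json.loads(reply_text)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    return data

messages = build_messages("(system prompt from above)", "<p>Hola, gracias</p>")
reply = parse_reply(
    '{"Detected language": "Spanish", "Detected language code": "es", '
    '"Translated text": "<p>Hello, thanks</p>", "Reason detection": "example"}'
)
print(messages[1]["role"], reply["Detected language code"])  # user es
```

Validating the reply this way at least makes it obvious when the model drops a field or returns text that is not valid JSON, which helps when experimenting with different detection instructions.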
