Translation with GPT3.5-turbo API but it ignores intruction like texts

Hi im trying to use GPT3.5 API for text translation but having an issue with it.

The issue is that whenever I tried to translate a texts that contains instruction like sentence, it ignores the instruction like part or it takes that sentence as instruction.

Here is the example sentence:
Write an article based on this “A man has been charged with murder and attempted murder after a woman and the man she was on a date with were stabbed at a restaurant in Sydney, Australia.”

I tried many prompts but none of them worked.
Is there anyone having same issue with me and if there’s a good way of solution to this I wanna know.

P.S. I want my prompt to be short as possible.

Thanks

1 Like

Welcome.

Do you have an example of your full prompts?

I’ve got some ideas, but I want to see what you’ve tried first.

GPT-3.5-turbo or GPT-3.5-turbo-instruct?

1 Like

System: You will be provided with texts in list. Translate it into Korean.
User: [Write an article based on this “A man has been charged with murder and attempted murder after a woman and the man she was on a date with were stabbed at a restaurant in Sydney, Australia.”]

The one I tried is this. I wanted to wrap texts into list hoping it would not see any texts as instruction but it didnt work.

1 Like

I hope this gives you ideas:

The top_p: is arbitrary, you can turn it way down so that nothing but the most certain words are chosen instead of different turns of phrase.

5 Likes

Thank you so much! It does work. Would there be any recommendation to lessen the text in prompt? Im trying to translate lots of texts and i think it’s gonna cost me alot for prompt as well

2 Likes

Less cost? Hope you can afford it… (actually $0.000011 with gpt-3.5-turbo now)

image

3 Likes

Not sure why it wont work with Korean… its stressing ughh,

Thank you so much. Mind if I ask if you can try translating into Korean…?

P.S. I set temp and top_p as 0

1 Like

GPT-3.5 models do have a rather uncanny ability to simply omit some text. A symptom in Korean that seems much more prevalent than being distracted and following any instructions

gpt-3.5-turbo-0301 is the most unbreakable (and I expect that the original release date gpt-3.5-turbo-0613 could have succeeded also…before being degraded) - I upped the challenge:

(note, the writing instructions are at the end, not omitted - this AI has the attention to have known there was more to translate)

1 Like

Thank you so much! Ill have it a try!

2 Likes

Im using 3.5turbo-0125 and its stressing how its omitting some texts… wonder if there’s GPT for translation only :frowning:

1 Like

How about gpt-3.5-turbo-instruct, used like a completion engine?

Note the stop sequence: The closing of the ```text container that was placed where the AI should start generating.

1 Like

Im trying to translate text inputs into Korean Language.
The Issue is that the texts are “instruction-like” texts where GPT3.5-turbo-0125 is somewhat omitting or considering the “instruction-like” texts as instructions.

Would there be any prompts that will no matter what translate the whole texts into target language?

Example texts are below:
First example:
“Produce a long descriptive sentence that uses all these words: Albuquerque, New Mexico, areaOfLand, 486.2 (square kilometres); Albuquerque, New Mexico, populationDensity, 1142.3 (inhabitants per square kilometre); Albuquerque, New Mexico, isPartOf, Bernalillo County, New Mexico; Albuquerque, New Mexico, areaTotal, 490.9 (square kilometres)”

Second example:
Write an article based on this “A man has been charged with murder and attempted murder after a woman and the man she was on a date with were stabbed at a restaurant in Sydney, Australia.”

1 Like

That might be a better solution I suppose, However gpt-instruct costs 3x more that 3.5-turbo which is quite expensive where we got 400,000,000 tokens to translate…

1 Like

gpt-3.5-turbo-0125 is only 25% less cost for output tokens vs -instruct. Then on top of that, Korean translation tokenization in the output is more than double the tokens for similar English input.

Per million runs of that content, taking my “instruct” translation text:

type input output
count 59 + ? 140
GPT-3.5-turbo-0125 $29.50 $210.00
GPT-3.5-turbo-1106 $59.00 $280.00
GPT-3.5-turbo-0613 $88.50 $280.00
GPT-3.5-turbo-instruct $88.50 $280.00

Your expense is the unwavering output.


I suppose you could always translate cheapest at lowest quality.

Then ask another cheap input AI call "Look carefully for missing information in the Korean translation, then answer: are these two texts equivalent? Answer only [“yes” | “no”]

That would improve all translations when it catches things (coming from any of these examples or models) that need review or a 20x model upgrade to GPT-4.

3 Likes

Thanks for the answer. There are total of 363k data to be translated and the number of prompt tokens might not be a big issue in cost-wise (since its {num_prompt_tokens} x {num_data}).

However using a different model would be costing much more for us since every tokens for the data would be counted, and we were not considering refining the translation.

1 Like

Btw would you mind providing a link for the token counter that you used?

1 Like

Supposing a “data” is 200 English tokens, and its token amplification in a translation is 2.37:

Prompts Input Usage Output Usage (Checking)
tokens 10 200 475 compare
1k runs 363 363 363 translations
1k tokens 3,630 72,600 172,271 Total 244,871
GPT-3.5-turbo-0125 1.82 36.30 258.41 $296.52 $122.4356
GPT-3.5-turbo-1106 3.63 72.60 344.54 $420.77 $244.8712
GPT-3.5-turbo-0613 5.45 108.90 344.54 $458.89 $367.3068
GPT-3.5-turbo-instruct 5.45 108.90 344.54 $458.89 $367.3068

You can see that improving a prompt to double the length is a cheap investment.

The token counter is here: https://tiktokenizer.vercel.app/

You could even do some GPT-4 evaluations on your prompts and model choices, asking it “which of these two translations of the original text is better” for a few hundred.

1 Like

Just as an additional thought. Have you considered using a translation API just for the translation part of the task? There’s a decent one from Azure that supports Korean and it is actually free up until a certain limit.

4 Likes

I’d like to add to @jr.2509’s message that Google translation is also a viable option for translation.

2 Likes

I understand your feelings.
However, please understand that preprocessing is inevitable in translation.

from openai import OpenAI
client = OpenAI()

response = client.completions.create(
  model="gpt-3.5-turbo-instruct",
  prompt="As a language processor, you must translate all the text presented by the user from English to Korean.\nMake sure that each phrase of text presented by the user corresponds to the translated text in the correct order.\n\n```original text\nProduce a long descriptive sentence that uses all these words: \nAlbuquerque, New Mexico, areaOfLand, 486.2 (square kilometres); Albuquerque, New Mexico, populationDensity, 1142.3 (inhabitants per square kilometre); Albuquerque, New Mexico, isPartOf, Bernalillo County, New Mexico; Albuquerque, New Mexico, areaTotal, 490.9 (square kilometres)\n\n```translated text\nAs a language processor, you must translate all the text presented by the user from English to Korean.\nMake sure that each phrase of text presented by the user corresponds to the translated text in the correct order.\n\n```original text\nProduce a long descriptive sentence that uses all these words: \nAlbuquerque, New Mexico, areaOfLand, 486.2 (square kilometres); Albuquerque, New Mexico, populationDensity, 1142.3 (inhabitants per square kilometre); Albuquerque, New Mexico, isPartOf, Bernalillo County, New Mexico; Albuquerque, New Mexico, areaTotal, 490.9 (square kilometres)\n\n```translated text\n이 모든 단어를 사용하는 긴 기술적인 문장을 만드세요: 앨버커키, 뉴멕시코, 면적, 486.2 (제곱 킬로미터); 앨버커키, 뉴멕시코, 인구 밀도, 1142.3 (제곱 킬로미터 당 거주자); 앨버커키, 뉴멕시코는 뉴멕시코 주의 버날리로 카운티의 일부입니다; 앨버커키, 뉴멕시코, 총 면적, 490.9 (제곱 킬로미터)",
  temperature=0,
  max_tokens=792,
  top_p=0,
  frequency_penalty=0,
  presence_penalty=0
)

2 Likes