Translation with GPT3.5-turbo API but it ignores intruction like texts

js2jang · February 23, 2024, 1:25am

Hi im trying to use GPT3.5 API for text translation but having an issue with it.

The issue is that whenever I tried to translate a texts that contains instruction like sentence, it ignores the instruction like part or it takes that sentence as instruction.

Here is the example sentence:
Write an article based on this “A man has been charged with murder and attempted murder after a woman and the man she was on a date with were stabbed at a restaurant in Sydney, Australia.”

I tried many prompts but none of them worked.
Is there anyone having same issue with me and if there’s a good way of solution to this I wanna know.

P.S. I want my prompt to be short as possible.

Thanks

PaulBellow · February 23, 2024, 1:33am

Welcome.

Do you have an example of your full prompts?

I’ve got some ideas, but I want to see what you’ve tried first.

GPT-3.5-turbo or GPT-3.5-turbo-instruct?

js2jang · February 23, 2024, 1:49am

System: You will be provided with texts in list. Translate it into Korean.
User: [Write an article based on this “A man has been charged with murder and attempted murder after a woman and the man she was on a date with were stabbed at a restaurant in Sydney, Australia.”]

The one I tried is this. I wanted to wrap texts into list hoping it would not see any texts as instruction but it didnt work.

_j · February 23, 2024, 1:59am

I hope this gives you ideas:

The top_p: is arbitrary, you can turn it way down so that nothing but the most certain words are chosen instead of different turns of phrase.

js2jang · February 23, 2024, 2:04am

Thank you so much! It does work. Would there be any recommendation to lessen the text in prompt? Im trying to translate lots of texts and i think it’s gonna cost me alot for prompt as well

_j · February 23, 2024, 2:20am

Less cost? Hope you can afford it… (actually $0.000011 with gpt-3.5-turbo now)

js2jang · February 23, 2024, 2:25am

Not sure why it wont work with Korean… its stressing ughh,

Thank you so much. Mind if I ask if you can try translating into Korean…?

P.S. I set temp and top_p as 0

_j · February 23, 2024, 2:51am

GPT-3.5 models do have a rather uncanny ability to simply omit some text. A symptom in Korean that seems much more prevalent than being distracted and following any instructions

gpt-3.5-turbo-0301 is the most unbreakable (and I expect that the original release date gpt-3.5-turbo-0613 could have succeeded also…before being degraded) - I upped the challenge:

(note, the writing instructions are at the end, not omitted - this AI has the attention to have known there was more to translate)

js2jang · February 23, 2024, 3:25am

Thank you so much! Ill have it a try!

js2jang · February 23, 2024, 3:45am

Im using 3.5turbo-0125 and its stressing how its omitting some texts… wonder if there’s GPT for translation only

_j · February 23, 2024, 4:08am

How about gpt-3.5-turbo-instruct, used like a completion engine?

Note the stop sequence: The closing of the ```text container that was placed where the AI should start generating.

js2jang · February 23, 2024, 4:11am

Im trying to translate text inputs into Korean Language.
The Issue is that the texts are “instruction-like” texts where GPT3.5-turbo-0125 is somewhat omitting or considering the “instruction-like” texts as instructions.

Would there be any prompts that will no matter what translate the whole texts into target language?

Example texts are below:
First example:
“Produce a long descriptive sentence that uses all these words: Albuquerque, New Mexico, areaOfLand, 486.2 (square kilometres); Albuquerque, New Mexico, populationDensity, 1142.3 (inhabitants per square kilometre); Albuquerque, New Mexico, isPartOf, Bernalillo County, New Mexico; Albuquerque, New Mexico, areaTotal, 490.9 (square kilometres)”

Second example:
Write an article based on this “A man has been charged with murder and attempted murder after a woman and the man she was on a date with were stabbed at a restaurant in Sydney, Australia.”

js2jang · February 23, 2024, 4:13am

That might be a better solution I suppose, However gpt-instruct costs 3x more that 3.5-turbo which is quite expensive where we got 400,000,000 tokens to translate…

_j · February 23, 2024, 4:42am

gpt-3.5-turbo-0125 is only 25% less cost for output tokens vs -instruct. Then on top of that, Korean translation tokenization in the output is more than double the tokens for similar English input.

Per million runs of that content, taking my “instruct” translation text:

type	input	output
count	59 + ?	140
GPT-3.5-turbo-0125	$29.50	$210.00
GPT-3.5-turbo-1106	$59.00	$280.00
GPT-3.5-turbo-0613	$88.50	$280.00
GPT-3.5-turbo-instruct	$88.50	$280.00

Your expense is the unwavering output.

I suppose you could always translate cheapest at lowest quality.

Then ask another cheap input AI call "Look carefully for missing information in the Korean translation, then answer: are these two texts equivalent? Answer only [“yes” | “no”]

That would improve all translations when it catches things (coming from any of these examples or models) that need review or a 20x model upgrade to GPT-4.

js2jang · February 23, 2024, 5:13am

Thanks for the answer. There are total of 363k data to be translated and the number of prompt tokens might not be a big issue in cost-wise (since its {num_prompt_tokens} x {num_data}).

However using a different model would be costing much more for us since every tokens for the data would be counted, and we were not considering refining the translation.

js2jang · February 23, 2024, 5:19am

Btw would you mind providing a link for the token counter that you used?

_j · February 23, 2024, 5:44am

Supposing a “data” is 200 English tokens, and its token amplification in a translation is 2.37:

	Prompts	Input Usage	Output Usage		(Checking)
tokens	10	200	475		compare
1k runs	363	363	363		translations
1k tokens	3,630	72,600	172,271	Total	244,871
—	—	—	—	—	—
GPT-3.5-turbo-0125	1.82	36.30	258.41	$296.52	$122.4356
GPT-3.5-turbo-1106	3.63	72.60	344.54	$420.77	$244.8712
GPT-3.5-turbo-0613	5.45	108.90	344.54	$458.89	$367.3068
GPT-3.5-turbo-instruct	5.45	108.90	344.54	$458.89	$367.3068

You can see that improving a prompt to double the length is a cheap investment.

The token counter is here: https://tiktokenizer.vercel.app/

You could even do some GPT-4 evaluations on your prompts and model choices, asking it “which of these two translations of the original text is better” for a few hundred.

jr.2509 · February 23, 2024, 6:46am

Just as an additional thought. Have you considered using a translation API just for the translation part of the task? There’s a decent one from Azure that supports Korean and it is actually free up until a certain limit.

Innovatix · February 23, 2024, 8:05am

I’d like to add to @jr.2509’s message that Google translation is also a viable option for translation.

dignity_for_all · February 23, 2024, 9:24am

I understand your feelings.
However, please understand that preprocessing is inevitable in translation.

from openai import OpenAI
client = OpenAI()

response = client.completions.create(
  model="gpt-3.5-turbo-instruct",
  prompt="As a language processor, you must translate all the text presented by the user from English to Korean.\nMake sure that each phrase of text presented by the user corresponds to the translated text in the correct order.\n\n```original text\nProduce a long descriptive sentence that uses all these words: \nAlbuquerque, New Mexico, areaOfLand, 486.2 (square kilometres); Albuquerque, New Mexico, populationDensity, 1142.3 (inhabitants per square kilometre); Albuquerque, New Mexico, isPartOf, Bernalillo County, New Mexico; Albuquerque, New Mexico, areaTotal, 490.9 (square kilometres)\n\n```translated text\nAs a language processor, you must translate all the text presented by the user from English to Korean.\nMake sure that each phrase of text presented by the user corresponds to the translated text in the correct order.\n\n```original text\nProduce a long descriptive sentence that uses all these words: \nAlbuquerque, New Mexico, areaOfLand, 486.2 (square kilometres); Albuquerque, New Mexico, populationDensity, 1142.3 (inhabitants per square kilometre); Albuquerque, New Mexico, isPartOf, Bernalillo County, New Mexico; Albuquerque, New Mexico, areaTotal, 490.9 (square kilometres)\n\n```translated text\n이 모든 단어를 사용하는 긴 기술적인 문장을 만드세요: 앨버커키, 뉴멕시코, 면적, 486.2 (제곱 킬로미터); 앨버커키, 뉴멕시코, 인구 밀도, 1142.3 (제곱 킬로미터 당 거주자); 앨버커키, 뉴멕시코는 뉴멕시코 주의 버날리로 카운티의 일부입니다; 앨버커키, 뉴멕시코, 총 면적, 490.9 (제곱 킬로미터)",
  temperature=0,
  max_tokens=792,
  top_p=0,
  frequency_penalty=0,
  presence_penalty=0
)

Topic		Replies	Views
Anyone doing successful translations with gpt 3.5? Prompting gpt-35-turbo	16	18048	September 17, 2024
GPT-4 Turbo is lazy and truncates output arbitrarily API	3	1184	March 4, 2024
I need your help with prompt Prompting prompt	6	3235	July 9, 2024
How good is ChatGPT3.5 / GPT4 translations?! API gpt-4 , api , translation , gpt35turbo	23	17311	August 22, 2024
Translating with GPT3 Community	1	2983	February 10, 2023

Translation with GPT3.5-turbo API but it ignores intruction like texts

Related topics