Can you make this prompt for a "simple" language task work?

I’m trying to do simple language detection with both GPT-3.5 and GPT-4, with no success so far. I tried step-by-step instructions as well as the following prompt.

==============================================
When presented with input that has both English and Spanish like this:
```
¿Cómo estás? ( Hello, I’m Alice. How are you?) ¿Cuál es tu nombre? (What is your name?)"
¡Hola David! En inglés, para presentarte, puedes decir: “My name is David”. ¿Podrías repetirlo después de mí, por favor?
¡Hola! ¿Cómo estás? En inglés, para saludar, puedes decir: “Hello”. ¿Podrías repetirlo después de mí, por favor?
```

Convert it into:
```
Spanish: ¿Cómo estás?
English: ( Hello, I’m Alice. How are you?)
Spanish: ¿Cuál es tu nombre?
English: (What is your name?)"
Spanish: ¡Hola David! En inglés, para presentarte, puedes decir:
English: “My name is David”.
Spanish: ¿Podrías repetirlo después de mí, por favor?
Spanish: ¡Hola! ¿Cómo estás? En inglés, para saludar, puedes decir:
English: “Hello”.
Spanish: ¿Podrías repetirlo después de mí, por favor?
```

Apply the above to the following text.
```
¡Muy bien! “Waking up” significa “levantarse”. ¿Y “eating breakfast”? ¿Comer el desayuno? Sí, exactamente. “Eating breakfast” significa “comer el desayuno”. ¿Y qué tal “working”? Trabajar. ¡Muy bien! “Working” significa “trabajar”. ¿Y sabes qué significa “studying”? Estudiar. Sí, exactamente. “Studying” significa “estudiar”. ¿Y qué tal “relaxing”? ¿Relajarse? ¡Muy bien! “Relaxing” significa “relajarse”. ¿Y por último, sabes lo qué significa “sleeping”? Dormir. Sí, correcto. “Sleeping” significa “dormir”. ¿Lo entendiste todo, Juan?"}]`,
```

==============================================
It specifically has a hard time with phrases like

¡Muy bien! "Waking up" significa "levantarse". 

Which, when you think of it, is quite complicated/advanced.

Any suggestions?

I did it like this:
I started with your example and entered:

"Take phrase by phrase, detect the language and in each line write the name of the language and the corresponding phrase. Ignore quotes and other special characters like they do not exist.

¡Muy bien! “Waking up” significa “levantarse”. ¿Y “eating breakfast”? ¿Comer el desayuno? Sí, exactamente. “Eating breakfast” significa “comer el desayuno”. ¿Y qué tal “working”? Trabajar. ¡Muy bien! “Working” significa “trabajar”. ¿Y sabes qué significa “studying”? Estudiar. Sí, exactamente. “Studying” significa “estudiar”. ¿Y qué tal “relaxing”? ¿Relajarse? ¡Muy bien! “Relaxing” significa “relajarse”. ¿Y por último, sabes lo qué significa “sleeping”? Dormir. Sí, correcto. “Sleeping” significa “dormir”. ¿Lo entendiste todo, Juan?“}]`”

This resulted in a list with mistakes:
Spanish: ¡Muy bien!
English: Waking up
Spanish: significa
English: levantarse.
Spanish: ¿Y
English: eating breakfast?
Spanish: ¿Comer el desayuno?
Spanish: Sí, exactamente.
English: Eating breakfast
Spanish: significa
English: comer el desayuno.
Spanish: ¿Y qué tal


Then, I asked GPT to correct it like this:
"There is a list of language names followed with the phrase written in that language.
Some languages are wrong. Correct the Language names to match the phrase:

Spanish: ¡Muy bien!
English: Waking up
Spanish: significa
English: levantarse.
Spanish: ¿Y
English: eating breakfast?
Spanish: ¿Comer el desayuno?
Spanish: Sí, exactamente.


…"

This resulted in the correct list:
“Spanish: ¡Muy bien!
English: Waking up
Spanish: significa
Spanish: levantarse.
Spanish: ¿Y
English: eating breakfast?
Spanish: ¿Comer el desayuno?
Spanish: Sí, exactamente.
English: Eating breakfast
Spanish: significa
Spanish: comer el desayuno.
Spanish: ¿Y qué tal
English: working?
Spanish: Trabajar.
Spanish: ¡Muy bien!
English: Working
Spanish: significa
Spanish: trabajar.
Spanish: ¿Y sabes qué significa
English: studying?
Spanish: Estudiar.
Spanish: Sí, exactamente.
English: Studying
Spanish: significa
Spanish: estudiar.
Spanish: ¿Y qué tal
English: relaxing?
Spanish: ¿Relajarse?
Spanish: ¡Muy bien!
English: Relaxing
Spanish: significa
Spanish: relajarse.
Spanish: ¿Y por último, sabes lo qué significa
English: sleeping?
Spanish: Dormir.
Spanish: Sí, correcto.
English: Sleeping
Spanish: significa
Spanish: dormir.
Spanish: ¿Lo entendiste todo, Juan?”
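
To see exactly which labels the correction pass changed, the two lists can be compared line by line. This is a minimal sketch, assuming both passes keep the same phrases in the same order; `parse_labels` and `changed_labels` are hypothetical helper names, not something used in the thread.

```python
def parse_labels(block):
    """Turn 'Language: phrase' lines into (language, phrase) pairs."""
    pairs = []
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first ': ' so colons inside the phrase survive.
        lang, _, phrase = line.partition(": ")
        pairs.append((lang.strip(), phrase.strip()))
    return pairs


def changed_labels(first_pass, second_pass):
    """Return (phrase, old_label, new_label) for every relabelled phrase."""
    changes = []
    pairs = zip(parse_labels(first_pass), parse_labels(second_pass))
    for (old, phrase_a), (new, phrase_b) in pairs:
        if phrase_a == phrase_b and old != new:
            changes.append((phrase_a, old, new))
    return changes


# First four lines of each list from the post above.
FIRST = """Spanish: ¡Muy bien!
English: Waking up
Spanish: significa
English: levantarse."""

SECOND = """Spanish: ¡Muy bien!
English: Waking up
Spanish: significa
Spanish: levantarse."""

print(changed_labels(FIRST, SECOND))
```

On these four lines the only change reported is the relabelling of "levantarse." from English to Spanish.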

Here’s a zero-shot prompt for GPT-4 that solves this task. As a bonus, I got it to include translations for everything :slight_smile: First, the prompt:

Text:
`
¡Muy bien! “Waking up” significa “levantarse”. ¿Y “eating breakfast”? ¿Comer el desayuno? Sí, exactamente. “Eating breakfast” significa “comer el desayuno”. ¿Y qué tal “working”? Trabajar. ¡Muy bien! “Working” significa “trabajar”. ¿Y sabes qué significa “studying”? Estudiar. Sí, exactamente. “Studying” significa “estudiar”. ¿Y qué tal “relaxing”? ¿Relajarse? ¡Muy bien! “Relaxing” significa “relajarse”. ¿Y por último, sabes lo qué significa “sleeping”? Dormir. Sí, correcto. “Sleeping” significa “dormir”. ¿Lo entendiste todo, Juan?"}]`,
`

Steps:
- list the individual words for the supplied text. Don't use code.
- identify the languages used in the text. Don't use code.
- list each phrase formatted like "<language>: <phrase>" and follow each entry with a translation to the other languages.

State each step and show your work for performing that step.

1: list the individual words for the supplied text. Don't use code.

The solution from GPT-4:

Possible answer:

The individual words for the supplied text are:

¡Muy
bien!
“Waking
up”
significa
“levantarse”.
¿Y
“eating
breakfast”?
¿Comer
el
desayuno?
Sí,
exactamente.
“Eating
breakfast”
significa
“comer
el
desayuno”.
¿Y
qué
tal
“working”?
Trabajar.
¡Muy
bien!
“Working”
significa
“trabajar”.
¿Y
sabes
qué
significa
“studying”?
Estudiar.
Sí,
exactamente.
“Studying”
significa
“estudiar”.
¿Y
qué
tal
“relaxing”?
¿Relajarse?
¡Muy
bien!
“Relaxing”
significa
“relajarse”.
¿Y
por
último,
sabes
lo
qué
significa
“sleeping”?
Dormir.
Sí,
correcto.
“Sleeping”
significa
“dormir”.
¿Lo
entendiste
todo,
Juan?

2: identify the languages used in the text. Don't use code.

Possible answer:

The languages used in the text are Spanish and English. The text is mostly in Spanish, except for the words in quotation marks, which are in English.

3: list each phrase formatted like "<language>: <phrase>" and follow each entry with a translation to the other languages.

Possible answer:

Spanish: ¡Muy bien!
English: Very good!

English: “Waking up”
Spanish: “Levantarse”

Spanish: significa
English: means

Spanish: ¿Y
English: And

English: “Eating breakfast”
Spanish: “Comer el desayuno”

Spanish: ¿Comer el desayuno?
English: Eat breakfast?

Spanish: Sí, exactamente.
English: Yes, exactly.

Spanish: ¿Y qué tal
English: And how about

English: “Working”
Spanish: “Trabajar”

Spanish: Trabajar
English: Work

Spanish: ¡Muy bien!
English: Very good!

English: “Studying”
Spanish: “Estudiar”

Spanish: Estudiar
English: Study

Spanish: Sí, exactamente.
English: Yes, exactly.

Spanish: ¿Y qué tal
English: And how about

English: “Relaxing”
Spanish: “Relajarse”

Spanish: ¿Relajarse?
English: Relax?

Spanish: ¡Muy bien!
English: Very good!

English: “Sleeping”
Spanish: “Dormir”

Spanish: Dormir
English: Sleep

Spanish: Sí, correcto.
English: Yes, correct.

Spanish: ¿Lo entendiste todo, Juan?
English: Did you understand everything, Juan?

I didn’t try this with GPT-3.5, but it might work. I’d probably trust davinci over 3.5 to solve it.
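
Since the same step list is reused on different input text, the prompt above can be assembled from a template. This is only a sketch of the prompt layout shown in this post; `build_prompt` and `STEPS` are hypothetical names, and no API call is made here.

```python
# The three steps, copied from the prompt above.
STEPS = [
    "list the individual words for the supplied text. Don't use code.",
    "identify the languages used in the text. Don't use code.",
    'list each phrase formatted like "<language>: <phrase>" and follow '
    "each entry with a translation to the other languages.",
]


def build_prompt(text, steps=STEPS):
    """Assemble the Text / Steps / first-step layout used in the post."""
    lines = ["Text:", "`", text, "`", "", "Steps:"]
    lines += [f"- {step}" for step in steps]
    lines += [
        "",
        "State each step and show your work for performing that step.",
        "",
        # Ending on the first step primes the model to start working on it.
        f"1: {steps[0]}",
    ]
    return "\n".join(lines)


print(build_prompt("¡Muy bien! “Waking up” significa “levantarse”."))
```

Ending the prompt with the restated first step mirrors what the original prompt does, so the model begins its answer at step 1.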

As a further test, I threw some XML containing multiple languages at it. Here’s the prompt:

Text:
`
<div type="edition" xml:lang="san-Latn">
 <ab>
  <lb n="1"/>svasti śrīmatāṁ sakala-bhuvana-saṁstūyamāna-mānavya-sagotrānāṁ
 <lb n="2"/>...
 </ab>
<!--text continues in Sanskrit, then switches to Telugu-->
 <ab xml:lang="tel">
  <lb n="29"/>puṭṭi-nirugu saveraṁ Iruvadinālgu vuṭla-ni<lb n="30" break="no"/>ṇḍṟāyam padu-gaṇḍu padeḻ dumu tamulaṁmula-tūmeṇḍu
 </ab>
<!--switches back to Sanskrit-->
 <ab>
  <lb n="30"/>Asyopari na
 <lb n="31"/>kenacid bā<space type="binding-hole"/>dhā karttavyā yaḥ karoti sa paṁca-mahā-pātaka-saṁyu<lb n="32" break="no"/>kto bhavati…
 </ab>
<!--text continues in Sanskrit-->
</div>
`
Steps:
- list the individual words for the supplied text. Don't use code.
- identify the languages used in the text. Don't use code.
- list each phrase formatted like "<language>: <phrase>" and follow each entry with a translation to the other languages.

State each step and show your work for performing that step.

1: list the individual words for the supplied text. Don't use code.

And the model’s response:

Possible answer:

One way to list the individual words for the supplied text is to separate them by spaces and punctuation marks, and indicate the line breaks with slashes. For example:

svasti / śrīmatāṁ / sakala-bhuvana-saṁstūyamāna-mānavya-sagotrānāṁ / ... /
puṭṭi-nirugu / saveraṁ / Iruvadinālgu / vuṭla-niṇḍṟāyam / padu-gaṇḍu / padeḻ / dumu / tamulaṁmula-tūmeṇḍu / 
Asyopari / na / kenacid / bā / dhā / karttavyā / yaḥ / karoti / sa / paṁca-mahā-pātaka-saṁyu / kto / bhavati / ... /

2: identify the languages used in the text. Don't use code.

Possible answer:

The languages used in the text are Sanskrit and Telugu. Sanskrit is written in the Devanagari script, while Telugu is written in the Telugu script. Both scripts are derived from the Brahmi script and share some common features, such as the use of diacritics to modify consonants and vowels, and the use of viramas to indicate the absence of a vowel. However, they also have distinct shapes and symbols that can help distinguish them. For example, Sanskrit has a horizontal line on top of most letters, while Telugu has a loop or a hook on the left side of most letters. Sanskrit also has more complex conjunct consonants and retroflex sounds, while Telugu has more rounded and dental sounds.

3: list each phrase formatted like "<language>: <phrase>" and follow each entry with a translation to the other languages.

Possible answer:

Sanskrit: svasti śrīmatāṁ sakala-bhuvana-saṁstūyamāna-mānavya-sagotrānāṁ
Telugu: స్వస్తి శ్రీమతాం సకల-భువన-సంస్తూయమాన-మానవ్య-సగోత్రానాం
English: auspiciousness to the glorious ones of the human lineage who are praised in all the worlds

Telugu: puṭṭi-nirugu saveraṁ Iruvadinālgu vuṭla-niṇḍṟāyam padu-gaṇḍu padeḻ dumu tamulaṁmula-tūmeṇḍu
Sanskrit: पुट्टि-निरुगु सवेरं इरुवदिनाल्गु वुट्ल-निण्ड्रायम् पदु-गण्डु पदेळ् दुमु तमुलंमुल-तूमेण्डु
English: born without any defects, on the twenty-fourth day of the month, in the year of the elephant, with ten fingers and toes, and a mole on the right side

Sanskrit: Asyopari na kenacid bādhā karttavyā yaḥ karoti sa paṁca-mahā-pātaka-saṁyukto bhavati
Telugu: అస్యోపరి న కేనచిద్ బాధా కర్త్తవ్యా యః కరోతి స పంచ-మహా-పాతక-సంయుక్తో భవతి
English: no one should cause any harm to this (inscription) above, whoever does so becomes associated with the five great sins

I found it interesting that, while it saw through the XML, it still decided to include an English translation even though it didn’t call English out as a language in the text. It could be getting that either from the prompt language or from the English in the XML structure.
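
For this particular input the markup already declares the languages through `xml:lang`, so the labels can also be read off without a model. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `SAMPLE` string is a shortened, hypothetical version of the XML above, and the inheritance handling is deliberately simple (only the root's language is used as the default, and `break="no"` word joins are not handled).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened stand-in for the inscription XML in the post.
SAMPLE = """<div type="edition" xml:lang="san-Latn">
 <ab><lb n="1"/>svasti śrīmatāṁ</ab>
 <ab xml:lang="tel"><lb n="29"/>puṭṭi-nirugu saveraṁ</ab>
 <ab><lb n="30"/>Asyopari na</ab>
</div>"""


def blocks_by_language(xml_text):
    """Return (language, text) for each <ab> block in document order."""
    root = ET.fromstring(xml_text)
    # Blocks without their own xml:lang inherit the root's value.
    default_lang = root.get(XML_LANG, "und")
    blocks = []
    for ab in root.iter("ab"):
        lang = ab.get(XML_LANG, default_lang)
        # itertext() includes text that follows child elements such as <lb/>.
        text = " ".join("".join(ab.itertext()).split())
        blocks.append((lang, text))
    return blocks


for lang, text in blocks_by_language(SAMPLE):
    print(f"{lang}: {text}")
```

The codes come straight from the markup (`san-Latn` for romanized Sanskrit, `tel` for Telugu), so this only works where the document is tagged; untagged mixed text like the Spanish/English example still needs the model.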

@MARK0 @stevenic thanks for trying, but this can’t be done at the word level, for the simple reason that there are words that exist in both languages. For that matter, you can’t even do it at the phrase level; I tried it. Consider:

In Spanish I say "Soy Alice" and in English I say "I am Alice"

“Soy Alice” works in both languages, and so you need the context, which is what GPT is good at, but just not good enough :slight_smile: .
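
To make the role of context concrete, here is a rough heuristic sketch that labels each quoted phrase by the nearest language cue before it ("In Spanish", "en inglés", and so on). The cue list and the function name `label_quotes` are assumptions for illustration only; this is not what anyone in the thread used, and it fails as soon as the surrounding text carries no explicit cue.

```python
import re

# Hypothetical cue phrases that announce the language of the next quotation.
CUES = {
    "English": re.compile(r"\b(in english|en inglés)\b", re.IGNORECASE),
    "Spanish": re.compile(r"\b(in spanish|en español)\b", re.IGNORECASE),
}
# Straight double quotes only, as in the example sentence.
QUOTED = re.compile(r'"([^"]*)"')


def label_quotes(sentence):
    """Label each quoted phrase using the last cue seen before it."""
    results = []
    prev_end = 0
    for match in QUOTED.finditer(sentence):
        # Only look at the text between the previous quote and this one.
        context = sentence[prev_end:match.start()]
        label, best_pos = "unknown", -1
        for lang, cue in CUES.items():
            for hit in cue.finditer(context):
                if hit.start() > best_pos:
                    best_pos, label = hit.start(), lang
        results.append((label, match.group(1)))
        prev_end = match.end()
    return results


sentence = 'In Spanish I say "Soy Alice" and in English I say "I am Alice"'
print(label_quotes(sentence))
```

The phrase itself is never inspected; the label comes entirely from the surrounding words, which is the point being made about "Soy Alice".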
