Davinci-text-003 worse than gpt-3.5-turbo in non-English language?

steluhh · March 12, 2023, 9:32am

Hello,

I am using the playground since a couple of days. I want to generate German short descriptions about German cities.

First attempt: “completion mode” with davinci-text-003.

A prompt with all settings set to default, would look like this:

"Schreibe eine Kurzbeschreibung über die Stadt “Hamburg”. (Write a short description about the city “Hamburg”.)

I was surprised that the answers vary a lot in regard to quality. There is an almost 30% chance that I get a result being unusable. It then responds clearly wrong information, such as “Hamburg is the biggest city in Germany”, which is actually Berlin and Hamburg 2nd largest. Given that Hamburg has a pop of 1.8M and Berlin 3.6M, this is a pretty big mistake.

For another smaller city that I know, it mixes up the cities’ amenities with those of a city nearby.

Asking it the same questions in English, I get better results. Hamburg is not the biggest city anymore, but it still tells me a church in “Kempten” is located in “Kaufbeuren”, which is wrong.

Second attempt: “chat-mode” with gpt-3.5-turbo.

None of such mistakes happen. The responses are totally usable and all information is correct.

I am wondering if gpt-3.5-turbo is another improved model compared to davinci-text-003.

Kind regards,

Stef

linus · March 12, 2023, 10:57am

Hi @steluhh,

When you evaluate the performance of models, there are several things to consider here: First, how good is the synthesis of the answers with respect to your question, i.e. how flexible is the response to questions, grammar, syntax, etc.? On the other hand, how much factual data is available.

Especially with your question the difference becomes evident. You can imagine it as if you ask a not so smart person a question, but he has a lexicon available to answer you and a smart person with a worse lexicon. If you ask a question about the short description, the smarter person will probably give you a worse answer, but if you ask both people to write you a poem or create something new, the smarter person would probably have the better answer.

So to answer your question, it depends on the circumstances and training data. For pure “lexicon queries” I would use Chat GPT in your place. If you want to have code written then rather Davinci.

I hope this clarifies things for you, otherwise please feel free to ask
Linus

Topic		Replies	Views
Which model to use? Text-davincii-003 or gpt-3.5-turbo? API	4	3345	December 17, 2023
Understanding GPT-3 and multi-language API	1	4572	December 25, 2022
Grammar and spelling errors in API results API chatgpt	4	1962	December 24, 2023
Davinci model much worse than gpt-3.5-turbo for data extraction? API	2	745	July 3, 2023
Stable language using API API api	3	492	December 17, 2023

Davinci-text-003 worse than gpt-3.5-turbo in non-English language?

Related topics