Fine-tuning to enhance foreign NLP?

Hi,

Since our product happened to be launched first in non-native English-speaking countries, I’m trying to use fine-tuning to localize our product/service better.
I tried some large chunks of literature, random Wikipedia articles, etc., as large “completions” with no prompts, but the resulting model produces random completions with no relation to the prompts I give it later.
It also lost its English abilities, so the fine-tuning replaced the existing abilities instead of adding to them and effectively downgraded the model.
Do you have any suggestions for improving the model’s language abilities without making it lose its mind?
Second, is there a way to reduce the impact of the fine-tuning file so that it merges with the existing abilities rather than building what seems to be a completely new engine?

Thanks

1 Like

It sounds like you’re trying to reproduce the training process of GPT-3 by giving it arbitrary sources. I would focus on task-specific fine-tuning, especially since you’re using a language that the model might not have seen much of during its initial training.

Aside from that, I have read several times that language models trained on multiple languages tend to perform better. But perhaps that is not always true. If it is, I expect models in the future to be trained on many languages.

1 Like

Yes, that’s exactly right. That’s why my initial thought was to feed it language classes, dictionaries, etc., rather than plain texts, and treat it as I would a person!
Still, I’m not sure how to structure “prompt” and “completion” pairs so that the material actually becomes part of the model’s knowledge, so I just fed it texts.
Even if that can theoretically work, I still have the other issue of maintaining the original abilities, without which the whole effort makes no sense.

1 Like

Can you perhaps share your training file? Or else explain more about what types of data you’ve included. I’ve used empty prompts to “teach” the model “insurance language” (all still English), but the training file also includes examples of completions I do expect, such as converting data to questions, which still worked well. In other words, the training file contains both:

{"prompt": "", "completion": " Your policy does not respond to damage by hail or windstorm."}

as well as examples such as:

{"prompt": "policy: Homeowners, known: theft, unknown: personal property, question:", "completion": " To your knowledge, did the thieves remove any personal belongings, such as money, jewelry or a television set, from your home?"}
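(For reference: both line types go into a single JSONL training file, which, assuming the openai Python package’s CLI of that era, can then be submitted with something like the line below; the file name and base model are placeholders.)

openai api fine_tunes.create -t insurance_finetune.jsonl -m curie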

In my own experimentation, I’ve been playing with the model’s ability to switch between English and Afrikaans, a deliberately chosen, much less familiar language, and I was pleasantly surprised at how well it does. But again, you need to give the model examples of how you expect it to respond.

4 Likes

If you want the original behaviour, you don’t have to use the fine-tuned model; just go back to plain vanilla GPT-3 - that’s the beauty! You can use multiple models.

1 Like

That’s the catch: the plain model doesn’t speak the language well and doesn’t cover my specific need, while the trained model loses the original language abilities. If you teach a new language to a knowledgeable person, the quality will be higher than that of a baby speaking its first words. Therefore I’m looking to maintain the original abilities in parallel.

1 Like

I just think the key is that you ought to feed it examples, in your training file, of all the types (in the broad sense) of outcomes you expect.

1 Like

Thanks, @carla, that makes perfect sense.
But I’m curious about your prompts and completions. Is your use case about asking questions, or are you mixing different kinds of combinations?
My experiment also used some prompts and completions, but most of it was completions-only general knowledge, unrelated to my use case, just to feed the model a large amount of text in the language.
The other question is whether I also need to feed it English, otherwise it will lose the language :confused:

That’s correct, it’s one model trained to “perform” different functions. I do this by using labels: if the completion is meant to be a question, or a particular type of question, you end your prompt with “empathetic question:”, “claim data extraction:”, “policy analysis:”, etc. In your case, you might want to specify “english question:” or “french answer:”, etc.

You don’t need to train it to understand either of the two languages; you’d be amazed, it probably already “knows” the language in question. Just be sure to feed the model examples of how you want it to respond, in both languages. Afrikaans is spoken by fewer than 20 million people, but I found GPT-3 already understands it well. With only a few examples it confuses Afrikaans with other languages like Dutch and Flemish, but with a good set of examples it consistently starts translating into Afrikaans correctly. I’m no expert at all, but if I were you, I’d feed the model about 200 question-answer pairs in the target language and about 10 in English, using different “labels” at the end of your prompts, so the model has a clear point of reference to distinguish when you expect responses in English and when in the target language.
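(To make the labels concrete, here is a sketch of two such training lines, one per language; the Afrikaans text is my own illustration, not from an actual training file. Each prompt ends in a label telling the model which language to answer in.)

{"prompt": "Question: Wat dek my polis? afrikaans answer:", "completion": " U polis dek skade deur hael en windstorm."}

{"prompt": "Question: What does my policy cover? english answer:", "completion": " Your policy covers damage by hail and windstorm."}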

If you train with prompts like “Question: Who came first, the chicken or the egg? Answer:”, then be sure to call the completion endpoint with your prompt in that same format: “Question: text text text Answer:” (and sorry if I’m saying something you already know; we all have different levels of knowledge and understanding).
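(As a concrete sketch of that point, using the legacy Completions API from the openai Python package of that era; the fine-tuned model name and sampling parameters are placeholders.)

import openai

openai.api_key = "sk-..."  # your API key

# Query the fine-tuned model with the *same* prompt format used in training.
response = openai.Completion.create(
    model="curie:ft-your-org-2022-01-01-00-00-00",  # placeholder fine-tune ID
    prompt="Question: Who came first, the chicken or the egg? Answer:",
    max_tokens=60,
    temperature=0.3,
    stop=["\n"],  # stop at the end of the answer line
)
print(response["choices"][0]["text"])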

1 Like

Thanks for your insights, I’ll try it

1 Like

Quick update: I added a few hundred English examples (out of 100k+ foreign-language examples), and it seemed to make a huge difference to its English abilities. That’s probably because they set the expectation that English text is acceptable.

I also added extra dialogues in the foreign languages, and that may have helped a bit on that front as well.
I’m thinking of exporting large chat threads and adding them to the dataset, in the hope that it will help the model reach the level I’m looking for.

1 Like

That’s awesome news :sunglasses:

I’m curious, which languages are you using?

The most challenging is Hebrew, since its rules are very different from those of Latin-based languages, there are no vowels, and GPT-3 wasn’t really trained on it. But I also need high quality in Romanian, Arabic and more, since one of my use cases is text summarization, which requires both a deep understanding of the text and high-quality text generation. Maybe that’s too aspirational :slight_smile: but I’m trying.

3 Likes

You got to love a challenge :blush:

If we can get our projects working “okay”, we still have fine-tuning Davinci to look forward to, and possibly GPT-4 too, meaning we might get our projects working “splendidly” when trained on larger models.

Good luck @NSY, sounds like a massive piece of work and very rewarding too

2 Likes

Indeed. I’m curious about actually putting Davinci through language courses. Something tells me that it will require much less data… I even started collecting language courses from friends who teach :smile:
The only question is how to later use them for fine-tuning.

2 Likes

Hi @carla

I find your solution surprising, in a positive way. Perhaps it’s the way to go in my use case.

You say the model can perform different functions and that you use labels.

Now, this other example you provided puzzled me. It seems like a multi-label prompt. As if you had replaced the metadata of the answers endpoint with your own clever form of metadata.

I get that the last label, “question:”, could take any of the other forms: “empathetic question:”, “claim data extraction:”, “policy analysis:”.

I am working with legal text. Some prompt-completions would be on constitutional rights, some on civil rights.

Based on the first quote above, I could proceed to fine-tune with prompts ending in labels like these (sketched more fully below):

constitutional rights question:
civil rights question:
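(A hedged sketch of what such training lines might look like; the legal text and completions here are invented placeholders, following the prompt/completion format from earlier in the thread.)

{"prompt": "article: Everyone has the right to due process of law. constitutional rights question:", "completion": " Does the contested statute afford the defendant due process?"}

{"prompt": "article: Tenants may not be evicted without written notice. civil rights question:", "completion": " Was the tenant given written notice before the eviction?"}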

About your second quote, which seems to operate on a deeper level of functioning: amazing, how did you come up with that? How many samples did you provide? Is there anywhere one could read a book or article on complex prompts?

I have so many questions. Is there a “Show your prompts” thread here on the forum? :upside_down_face:

BTW Happy new year to everyone.

2 Likes

Hi @dandrade.jose,

Happy, happy new year to you too, and my apologies for the late response.

The company I work for specializes in insurance claim automation. Before we gained access to OpenAI, we used StanfordNLP and traditional key-phrase or regular-expression-based NLU for language understanding, and SimpleNLG for language generation. We already had many NLP systems in place for claim analysis, policy analysis and conversational robotics, but around six months ago we started a journey of incorporating GPT tech into our existing systems, in order to improve and extend their functionality.

So, the moment I first started playing with ideas in the OpenAI playground, our company’s known problems were the first things I wanted to solve for. Those are, in very simple terms: knowing what caused the loss from a phrase like “the discoloration and sagging of the ceiling was noticed after the weekend’s rain event.”; understanding which perils are covered in a phrase like “we do not cover rain, snow, sand or dust, except rain, snow, sand or dust that enters through an opening caused by wind or hail”; or, as in the above-mentioned post, generating appropriate questions such as “Did the rain enter through an opening caused by wind or hail?”.

By looking at examples in the OpenAI documentation, as well as playground examples, I noticed the use of labels, even if it was just Q: and A: for questions and answers, and then started playing with multiple different labels within the same playground session to see if that would give me good results, and it really just developed from there. Surely if the model responds to Q: and A:, it will also respond to A:, B:, C:, etc.

Other than pointing you to the OpenAI documentation and the examples page, I cannot really help with any further reading on this topic. Personally, I’m somebody who benefits from trial and error more than studying theory, so I’ve not really read much beyond what’s available here.

3 Likes

You have done more than enough, a lovely reply, which is not late at all, and requires no apologies.

Now, this next piece is :1st_place_medal: gold:

Thank you, very very much.

2 Likes

I am currently working on improving Hebrew. Any progress there?