Fine-tune for a specific language spell check

Rariny · February 16, 2023, 12:32pm

Hi folks! I’m planning to fine-tune a model for spell checking. I really need this because there is no spell-check API available for my native language, which is not widely used.
I know this will be a big and difficult task, but I would appreciate some guidance if anyone can help.
Can this be done simply by pasting the user’s misspelled text as the prompt and the corrected text as the completion in the fine-tuning data? Or can you suggest a best practice for this use case?

I have tried something like this but I don’t know if it’s enough:

[
{"prompt": "Wht ar u doin\n\n###\n\n", "completion": " What are you doing?###"},
{"prompt": "I lov u\n\n###\n\n", "completion": " I love you.###"},
{"prompt": "Whr do u liv\n\n###\n\n", "completion": " Where do you live?###"},
{"prompt": "Idk\n\n###\n\n", "completion": " I don't know.###"},
...
]

ruby_coder · February 16, 2023, 12:38pm

The first thing that stands out is that your JSONL data will not work as expected, because you have not followed the OpenAI Fine-Tuning “preparing your dataset” formatting guidelines.

Hope this helps.

Rariny · February 16, 2023, 12:48pm

No, this is just simple JSON to show what prompt and completion text I added. In the real data, I added a text separator, as in the following image:

ruby_coder · February 16, 2023, 12:54pm

Great. That’s good to see

What did you set n_epochs to for your fine-tuning run?

ruby_coder · February 16, 2023, 1:00pm

Note @Rariny

The screenshot of your fine-tuning data is in JSON format, not JSONL, so I doubt it will be processed correctly. Your fine-tuning data MUST be in JSONL format.

JSON Data (Not JSONL)

FYI
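
To make the JSON vs. JSONL difference concrete: JSONL is one JSON object per line, with no enclosing array and no commas between records. Here is a minimal sketch of that conversion in Python, using only the standard library. The function name `json_array_to_jsonl` is made up for illustration, and this is not a replacement for the official data preparation tool, which also runs format checks.

```python
import json


def json_array_to_jsonl(json_text: str) -> str:
    """Convert a JSON array of objects into JSONL (one object per line).

    Hypothetical helper for illustration only.
    """
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps escapes newlines inside strings as \n, so every record
    # stays on a single line. ensure_ascii=False keeps non-ASCII letters
    # readable, which matters for non-English training text.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


# Raw string so the \n sequences reach the JSON parser unchanged.
sample = r"""[
  {"prompt": "Wht ar u doin\n\n###\n\n", "completion": " What are you doing?###"},
  {"prompt": "I lov u\n\n###\n\n", "completion": " I love you.###"}
]"""

jsonl = json_array_to_jsonl(sample)
print(jsonl, end="")
```

Each line of the output can be parsed on its own with `json.loads`, which is what distinguishes JSONL from a single JSON array.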

Rariny · February 16, 2023, 1:12pm

I did not set this! How can I set n_epochs when fine-tuning?

Rariny · February 16, 2023, 1:13pm

Yes, I know, but I use the fine_tunes.prepare_data tool to convert the JSON file to JSONL before fine-tuning the model.

ruby_coder · February 16, 2023, 1:14pm

See:

Rariny · February 16, 2023, 1:30pm

Thank you for this very useful and interesting tutorial. I will check everything.

nikola1jankovic · March 1, 2023, 4:41pm

This is an interesting topic, but I see there were no replies about the content of the fine-tuning data, only technical details about the file structure.

Did you go on with this? What are your findings? Would something like this work?

Rariny · March 2, 2023, 3:49am

Right, I was not really asking how to format a file or how to run a fine-tune. The main question is about the best approach, ideally a step-by-step guide, for managing the fine-tuning task.
