Fine-tuning for spell check in a specific language

Hi folks! I’m planning to fine-tune a model for spell checking. I really need this because there is no spell-check API available for my native language, which is not widely used.
I know this will be a big and difficult task, but I would appreciate some guidance if anyone can help.
Can this be done simply by putting the user’s misspelled text in the prompt and the corrected text in the completion of the fine-tuning data? Or can you suggest a best practice for this use case?

I have tried something like this but I don’t know if it’s enough:

[
{"prompt": "Wht ar u doin\n\n###\n\n", "completion": " What are you doing?###"},
{"prompt": "I lov u\n\n###\n\n", "completion": " I love you.###"},
{"prompt": "Whr do u liv\n\n###\n\n", "completion": " Where do you live?###"},
{"prompt": "Idk\n\n###\n\n", "completion": " I don't know.###"},
...
]
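To make the pattern in the sample above explicit, here is a minimal Python sketch that builds one record from a (misspelled, corrected) pair. The helper name `make_record` and the two constants are hypothetical, chosen for illustration; the separator and the trailing `###` are simply copied from the sample and are not a required format.

```python
import json

# Assumed conventions, copied from the sample records above.
PROMPT_SEPARATOR = "\n\n###\n\n"  # marks the end of the prompt
STOP_SEQUENCE = "###"             # marks the end of the completion


def make_record(misspelled: str, corrected: str) -> dict:
    """Build one prompt/completion record (hypothetical helper)."""
    return {
        "prompt": misspelled.strip() + PROMPT_SEPARATOR,
        # The completion starts with a single space, as in the sample.
        "completion": " " + corrected.strip() + STOP_SEQUENCE,
    }


pairs = [
    ("Wht ar u doin", "What are you doing?"),
    ("I lov u", "I love you."),
]
records = [make_record(bad, good) for bad, good in pairs]

# ensure_ascii=False keeps non-Latin characters readable in the output.
print(json.dumps(records[0], ensure_ascii=False))
```

Generating the records from a list of pairs keeps the separator and stop sequence identical in every example, which is easy to get wrong when the file is edited by hand.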
  

The first thing that stands out is that your JSONL data will not work as expected, because it does not follow the OpenAI fine-tuning “preparing your dataset” formatting guidelines.

Hope this helps.

No, this is just a simple JSON sample to show what prompt and completion text I added. In the actual data, I added a text separator, as in the following image.

Great. That’s good to see.

How many epochs (n_epochs) did you set your fine-tuning to run for?

Note @Rariny

The screenshot of your fine-tuning data is in JSON format, not JSONL, so I doubt it will be processed correctly. Your fine-tuning data MUST be in JSONL format.
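For illustration, the difference is that a JSON file holds one top-level array, while JSONL holds one complete JSON object per line with no surrounding brackets or trailing commas. A minimal Python sketch of the conversion follows; the function name `json_array_to_jsonl` is hypothetical, and it assumes every record is an object with `prompt` and `completion` keys.

```python
import json


def json_array_to_jsonl(json_text: str) -> str:
    """Convert a JSON array of records into JSONL text (hypothetical helper)."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    lines = []
    for index, record in enumerate(records):
        # Assumption: each record is an object with both keys present.
        if not isinstance(record, dict) or not {"prompt", "completion"} <= record.keys():
            raise ValueError(f"record {index} lacks 'prompt' or 'completion'")
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


sample = json.dumps([
    {"prompt": "I lov u\n\n###\n\n", "completion": " I love you.###"},
    {"prompt": "Idk\n\n###\n\n", "completion": " I don't know.###"},
])
print(json_array_to_jsonl(sample), end="")
```

Each printed line is then a standalone JSON object, which is what a JSONL reader expects.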

JSON Data (Not JSONL)

FYI


I did not set it! How can I set n_epochs when fine-tuning?

Yes, I know, but I use the fine_tunes.prepare_data tool to convert the JSON file to JSONL before fine-tuning the model.

See:

Thank you for this very useful and interesting tutorial. I will check everything.

This is an interesting topic, but I see there were no replies about the content of the fine-tuning data, only technical details about the file structure.

Did you go on with this? What are your findings? Would something like this work?

Right, I was not really asking how to format a file or how to run a fine-tune. The main question is about the best approach, ideally a step-by-step guide, for managing the fine-tuning task.