Hi folks! I’m planning to fine-tune a model for spell-checking (I really need this because there is no spell-check API available for my native language, which is rarely used in the world).
I know this will be a big and difficult task, but I need some guidance if anyone can help.
Can this be done just by putting the user’s misspelled text in as the prompt and the corrected text as the completion in the fine-tuning data? Or can you suggest a best practice for this use case?
I have tried something like this but I don’t know if it’s enough:
[
{"prompt": "Wht ar u doin\n\n###\n\n", "completion": " What are you doing?###"},
{"prompt": "I lov u\n\n###\n\n", "completion": " I love you.###"},
{"prompt": "Whr do u liv\n\n###\n\n", "completion": " Where do you live?###"},
{"prompt": "Idk\n\n###\n\n", "completion": " I don't know.###"},
...
]
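The fine-tuning data in your screenshot is in JSON format (a single array), not JSONL, so I doubt it will be processed correctly. Your fine-tuning data MUST be in JSONL format: one JSON object per line, with no enclosing brackets and no commas between lines.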
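In case it helps, here is a minimal sketch of how the pairs could be turned into JSONL with Python's standard `json` module. The function name `to_jsonl` and the `pairs` list are just placeholders I made up for illustration; the separator and stop strings are copied from the examples in the original post.

```python
import json

# Copied from the examples in the original post.
SEPARATOR = "\n\n###\n\n"  # appended to every prompt
STOP = "###"               # appended to every completion


def to_jsonl(pairs):
    """Turn (misspelled, corrected) pairs into JSONL text, one object per line."""
    lines = []
    for misspelled, corrected in pairs:
        record = {
            "prompt": misspelled + SEPARATOR,
            # Leading space before the completion, as in the original examples.
            "completion": " " + corrected + STOP,
        }
        # ensure_ascii=False keeps non-ASCII letters readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, taken from the post above.
pairs = [
    ("Wht ar u doin", "What are you doing?"),
    ("I lov u", "I love you."),
]

# Print the JSONL text; to save it, write this string to a UTF-8 file instead.
print(to_jsonl(pairs), end="")
```

Each printed line is a complete JSON object on its own, which is what distinguishes JSONL from the bracketed array in the original post.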
This is an interesting topic, but I see there were no replies about the content of the fine-tuning data, only technical details about the file structure.
Did you go on with this? What are your findings? Would something like this work?
Right, I was not really asking how to format the file or how to run a fine-tune. My main question is about the best approach, ideally a step-by-step guide, for setting up the fine-tuning task.