Hello, I’m using ChatGPT to generate website code. The websites I’m working with are in Japanese, so I want to use fine-tuning to create a model that remembers all of the following definitions:
- Kanji: They are Chinese characters used in the Japanese writing system. They only have a full-width form; there is no half-width form for kanji. Example kanji: 漢, 字, 日, 本, 語, etc.
- Hiragana: They are a Japanese syllabary. They only have a full-width form; there is no half-width form for hiragana. Example hiragana: あ, い, う, え, お, か, き, く, け, こ, etc.
- Katakana: They are another Japanese syllabary. They have both full-width and half-width forms. Example full-width katakana: ア, イ, ウ, エ, オ, カ, キ, ク, ケ, コ, etc. Example half-width katakana: ｱ, ｲ, ｳ, ｴ, ｵ, ｶ, ｷ, ｸ, ｹ, ｺ, etc.
- Romaji: They are Latin (English alphabet) letters. They have both full-width and half-width forms. Example full-width romaji: Ａ, Ｂ, Ｃ, Ｄ, ａ, ｂ, ｃ, ｄ, etc. Example half-width romaji: A, B, C, D, a, b, c, d, etc.
- Digits: Digits have both full-width and half-width forms. Example full-width digits: ０, １, ２, ３, ４, ５, ６, ７, ８, ９. Example half-width digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
- Special characters: They have both full-width and half-width forms. Example full-width special characters: ！, ＂, ＃, ＄, ％, ＆, ＇, （, ）, ～, ＝, ｜, etc. Example half-width special characters: !, ", #, $, %, &, ', (, ), ~, =, |, etc.
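As a point of reference, the categories above can be approximated with Unicode code-point ranges. The Python sketch below is only an illustration, not a complete definition: the function names `classify` and `normalize_width` are hypothetical, the kanji check covers only the basic CJK Unified Ideographs block, and marks such as the long-vowel sign ー are not handled.

```python
import unicodedata


def classify(ch: str) -> tuple[str, str]:
    """Return a rough (category, width) label for a single character.

    The ranges are an assumption chosen for illustration; they cover the
    common cases from the list above, not every Japanese character.
    """
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:          # basic CJK Unified Ideographs
        return ("kanji", "full-width")
    if 0x3041 <= cp <= 0x3096:          # ぁ .. ゖ
        return ("hiragana", "full-width")
    if 0x30A1 <= cp <= 0x30FA:          # ァ .. ヺ
        return ("katakana", "full-width")
    if 0xFF66 <= cp <= 0xFF9D:          # ｦ .. ﾝ
        return ("katakana", "half-width")
    if ch.isascii() and ch.isalpha():
        return ("romaji", "half-width")
    if 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:  # Ａ..Ｚ, ａ..ｚ
        return ("romaji", "full-width")
    if ch.isascii() and ch.isdigit():
        return ("digit", "half-width")
    if 0xFF10 <= cp <= 0xFF19:          # ０ .. ９
        return ("digit", "full-width")
    if 0x21 <= cp <= 0x7E:              # remaining printable ASCII
        return ("special", "half-width")
    if 0xFF01 <= cp <= 0xFF5E:          # remaining full-width ASCII variants
        return ("special", "full-width")
    return ("other", "unknown")


def normalize_width(text: str) -> str:
    """Fold width variants with NFKC: full-width Latin letters, digits and
    symbols become half-width, and half-width katakana become full-width."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for ch in "漢あアｱＡA１1！!":
        print(ch, classify(ch))
    print(normalize_width("ＡＢＣ１２３ｱｲｳ"))
```

If rules like these can be applied in code before or after the model runs, the model may not need to memorize the definitions at all; whether that fits depends on the task.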
I would really appreciate it if someone could suggest an efficient way to do that.