Fine Tuning: Replacing Some text in Prompts with Emojis instead of Text

josephUL · April 4, 2023, 1:13pm

I’ve been fine-tuning for the past week, through 5 iterations with an increasingly larger dataset each time. I’ve realized that continuing a fine-tune wasn’t really combining the existing fine-tuned dataset with the new datasets, so I am looking for other ways to improve performance and cost.

What I’ve decided to do is replace some of the text in the prompts with emojis that represent it (not in the completions, which remain text only).
Are there any issues with such a method?

When I check with the Tokenizer tool, the emoji is displayed as �� and counts as 3 tokens (funnily enough), and then this note appears: “Note: Your input contained one or more unicode characters that map to multiple tokens. The output visualization may display the bytes in each token in a non-standard way.”

Are there any issues with such a method? And am I getting no benefit, since the emoji is actually counted as 3 tokens?
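
A minimal sketch of why a single emoji can cost several tokens, assuming the tokenizer is a byte-level BPE that operates on UTF-8 bytes. It does not reproduce the Tokenizer tool's actual split; the emoji, the word, and the helper name `utf8_profile` are hypothetical and chosen only for illustration.

```python
# Sketch: one emoji is one character but several UTF-8 bytes.
# Assumption: the tokenizer works on UTF-8 bytes; the exact token split is not modeled here.

def utf8_profile(text: str) -> dict:
    """Return character count, UTF-8 byte count, and the bytes in hex."""
    data = text.encode("utf-8")
    return {"chars": len(text), "bytes": len(data), "hex": data.hex(" ")}

emoji = "\U0001F600"  # hypothetical example emoji (grinning face)
word = "happy"        # hypothetical text the emoji might replace

emoji_profile = utf8_profile(emoji)
word_profile = utf8_profile(word)

# A token holding only part of the emoji's bytes is not valid UTF-8 on its own,
# so a viewer has to show a replacement character for it.
partial = emoji.encode("utf-8")[:2].decode("utf-8", errors="replace")

print(emoji_profile)
print(word_profile)
print(repr(partial))
```

If the emoji's bytes are not merged into one vocabulary entry, it is split across several tokens, which would explain both the token count and the replacement characters in the visualization. Common English words are often a single token, so swapping a short word for an emoji may not reduce the prompt's token count at all.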

logankilpatrick · April 4, 2023, 11:24pm

Fine-tuning does not sound like the answer to this problem. Our fine-tuning models are really not that powerful; I would try this with a well-crafted prompt!
