I’ve been training my own fine-tuned AI classifier and have roughly 1.1k manually classified rows of data across 9 categories. The completions are numbered 0 to 8, depending on which category the prompt falls into.
The problem is that sometimes the completion I get back is "!". What does that mean? I’m not sure why the model would return that when none of my training completions look like it.
I’m really unsure how to debug this. Is there anything I should be looking for that I’m missing? I’ve read the documentation. I am using the curie engine, if it helps.
A screenshot is attached. I believe this is massively affecting my accuracy.
What are the API call parameters? (Temperature, max tokens, stop sequence, etc.)
What is the format of your training data? Do all prompts follow the same pattern?
Do all prompts end with the same set of characters? (<|endoftext|>, for example)
Do any of the prompts have extra spaces, in training or in production?
Do all completions start with a space in the training data?
Are all prompts encoded as UTF-8?
Checking those usually gets rid of this issue. Please let me know if it doesn’t. I’m interested as well.
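Most of the checks above can be automated. Here is a minimal sketch, assuming the training data is a JSONL file with "prompt" and "completion" keys, that every prompt is supposed to end with one fixed separator, and that the labels are the strings 0 to 8. The separator value and the helper names (check_row, check_file, format_prompt) are placeholders made up for illustration, not anything taken from the original post.

```python
import json

# Assumption: the suffix every prompt should end with. Replace with your own.
SEPARATOR = "\n\n###\n\n"
# Categories 0-8, as described in the question.
VALID_LABELS = {str(i) for i in range(9)}


def check_row(row, separator=SEPARATOR, labels=VALID_LABELS):
    """Return a list of problems found in one training example."""
    problems = []
    prompt = row.get("prompt", "")
    completion = row.get("completion", "")
    if not prompt.endswith(separator):
        problems.append("prompt does not end with the separator")
    else:
        body = prompt[: -len(separator)]
        if body != body.rstrip():
            problems.append("prompt has trailing whitespace before the separator")
    if not completion.startswith(" "):
        problems.append("completion does not start with a space")
    if completion.strip() not in labels:
        problems.append("completion is not one of the expected labels")
    return problems


def check_file(path):
    """Check each line of a JSONL file; return {line_number: problems}.

    Opening the file as UTF-8 raises UnicodeDecodeError on badly encoded data.
    """
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            problems = check_row(json.loads(line))
            if problems:
                report[number] = problems
    return report


def format_prompt(text, separator=SEPARATOR):
    """Format a production prompt the same way as the training prompts."""
    return text.strip() + separator
```

Running check_file on the training file and passing every production prompt through format_prompt should show quickly whether training and production prompts differ. If the labels are single tokens, requesting one token at temperature 0 is also worth trying, though whether that applies here depends on the answers to the questions above.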