Hi guys!
I wanted to fine-tune a davinci model on CoinMarketCap data added after 2021, so the model knows about crypto tokens listed on CoinMarketCap after that date and can answer questions about new tokens detected in documents.
I generated training examples in the following format:
{"prompt": "is SPOT a cryptocurrency token?",
"completion": "yes, it's a cryptocurrency token introduced by CoinMarketCap at 2022-12-25"},
{"prompt": "is Spot a cryptocurrency token?",
"completion": "yes, it's a cryptocurrency token introduced by CoinMarketCap at 2022-12-25"},
{"prompt": "what is the adjacent CoinMarketCap name for the new cryptocurrency token SPOT?",
"completion": "Spot is the CoinMarketCap name for this cryptocurrency token"},
{"prompt": "what is SPOT rank on CoinMarketCap?",
"completion": "SPOT rank in CoinMarketCap is 5577"}
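For context, this is roughly how I write these records to a JSONL file and upload them (a minimal sketch using the legacy openai Python SDK v0.x; the records list just repeats the first sample for brevity):

import json
import openai

# the prompt/completion dicts shown above (only one repeated here for brevity)
records = [
    {"prompt": "is SPOT a cryptocurrency token?",
     "completion": "yes, it's a cryptocurrency token introduced by CoinMarketCap at 2022-12-25"},
]

# fine-tuning expects JSONL: one JSON object per line
with open("training_data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# upload the file; the returned id is what goes into "training_file" below
upload_response = openai.File.create(file=open("training_data.jsonl", "rb"), purpose="fine-tune")
print(upload_response.id)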
Now the problem: I have fine-tuned the davinci model, but I get weird results. It seems as if the model wasn't fine-tuned on top of its existing knowledge but instead learned from scratch and wiped what it already knew.
The model's response to the word "hello" is " MMMMMMMMMMMMMMMMMM"!
I would really appreciate any help.
These are the parameters I set for training:
create_args = {
    "training_file": "file-34Ok5rxlHnpCockzJLLx0221",
    "model": "davinci",
    "n_epochs": 10,
    "batch_size": 3,
    "learning_rate_multiplier": 0.3
}
And this is how I start training with these args:
fine_tune_response = openai.FineTune.create(**create_args)
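After that I poll the job until it reaches a terminal state, roughly like this (a sketch against the same legacy FineTune endpoint):

import time

# poll the fine-tune job until it finishes
while True:
    job = openai.FineTune.retrieve(id=fine_tune_response.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

print(job.status, job.fine_tuned_model)  # fine_tuned_model is set once the job succeeds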
Using "hello" as a starting prompt might not be the best check here. If you try sending a normal question, or a question about your fine-tuning data, how does it respond?
new_prompt = """is XRP a crypto token?"""
response = """no, it's been untracked in CoinMarketCap since 2023-07-08 12:48:00+00:00:00:00+00:00:00:00:00+00:00:00:00:00:00:00+00:00:00:00:00:00+00:00:00:00:00:00:00+00:00:00:00:00:00:00:00:00"""
That looks like you're not using a proper stop token.
Without one, the model will keep generating random garbage after the point where the answer should have ended.
Your completions also don't end with a consistent stop token, like END or +++ or <|endoftext|> (which you should then provide as a stop sequence when running inference).
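For example, if every training completion ended with a marker like END, inference would look something like this (a sketch using the legacy Completion endpoint; the fine-tuned model name is a placeholder):

import openai

response = openai.Completion.create(
    model="davinci:ft-your-org-2023-01-01-00-00-00",  # placeholder: your fine-tuned model id
    prompt="is XRP a crypto token?",
    max_tokens=50,
    temperature=0,
    stop=[" END"],  # the same marker appended to every training completion
)
print(response.choices[0].text)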
Also, 10 epochs seems like way too many. I’d start with 2 or 3.
I’d also try very low learning rates.
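Concretely, something in this direction (the exact values are illustrative, not tuned):

create_args = {
    "training_file": "file-34Ok5rxlHnpCockzJLLx0221",
    "model": "davinci",
    "n_epochs": 2,                     # start low; only increase if it underfits
    "batch_size": 3,
    "learning_rate_multiplier": 0.05,  # much lower than 0.3
}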
Also, I’d put variants of the same question into the training data:
Is WEED a crypto token? → Yes, it was introduced on 2023-04-20<|endoftext|>
When was WEED introduced? → WEED was introduced on 2023-04-20<|endoftext|>
On 2023-04-20, what tokens were introduced? → On 2023-04-20, the tokens WEED, SCAM, and BEST were introduced.<|endoftext|>
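A small helper along these lines can emit the variants per token (the field names and the END marker are just assumptions to illustrate the idea):

import json

def make_examples(symbol, name, introduced, rank):
    """Build several question variants for one token; every completion ends with the same stop marker."""
    stop = " END"  # must match the stop sequence passed at inference time
    return [
        {"prompt": f"is {symbol} a cryptocurrency token?",
         "completion": f"yes, it's a cryptocurrency token introduced by CoinMarketCap at {introduced}{stop}"},
        {"prompt": f"when was {symbol} introduced?",
         "completion": f"{symbol} was introduced on {introduced}{stop}"},
        {"prompt": f"what is the CoinMarketCap name for the token {symbol}?",
         "completion": f"{name} is the CoinMarketCap name for this cryptocurrency token{stop}"},
        {"prompt": f"what is {symbol} rank on CoinMarketCap?",
         "completion": f"{symbol} rank on CoinMarketCap is {rank}{stop}"},
    ]

with open("training_data.jsonl", "w") as f:
    for example in make_examples("SPOT", "Spot", "2022-12-25", 5577):
        f.write(json.dumps(example) + "\n")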
Generating training data for LLMs is hard work, just like curating datasets for any data science project.
Nah, currently no one can fully mitigate this issue. There have been papers on multi-task learning to preserve the original abilities, but overall the performance will still drop.