Is there a way to find out how many examples from the training set were actually used during fine-tuning?
I received the warning below, so I suspect that not all examples from the training set will end up being used:
```json
{
  "object": "fine_tuning.job.event",
  "id": "ftevent-UHsZx19gfCcvSSMhTicAQLuE",
  "created_at": 1693557461,
  "level": "warn",
  "message": "File file-LsniTnGh4cc1gmWkgxXppjcm contains examples greater than the supported context size for model gpt-3.5-turbo-0613 (4096 tokens)",
  "data": null,
  "type": "message"
}
```
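In case it helps: here is a rough sketch of how I'd count the over-length examples locally with tiktoken (the file name is a placeholder, and the per-message token overhead is only an approximation of the chat format, not an exact formula):

```python
# Rough count of training examples exceeding the 4096-token context
# window. Assumes the file is in the chat-format JSONL used for
# fine-tuning ({"messages": [...]} per line).
import json

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo-0613")

def num_tokens(example: dict) -> int:
    total = 0
    for message in example["messages"]:
        total += 4  # approximate per-message formatting overhead
        total += len(enc.encode(message.get("content") or ""))
    return total + 2  # approximate reply-priming overhead

with open("training.jsonl") as f:  # placeholder file name
    examples = [json.loads(line) for line in f]

too_long = [i for i, ex in enumerate(examples) if num_tokens(ex) > 4096]
print(f"{len(too_long)} of {len(examples)} examples exceed 4096 tokens")
```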
Also, how should I interpret the step information in the fine-tuning status? I thought the maximum step count would be number_of_examples * number_of_epochs, but in the example below, 1589 is almost 10 times smaller than the number of examples in my training set:
```json
{
  "object": "fine_tuning.job.event",
  "id": "…",
  "created_at": 1693558359,
  "level": "info",
  "message": "Step 100/1589: training loss=1.37",
  "data": {
    "step": 100,
    "train_loss": 1.3682537078857422,
    "train_mean_token_accuracy": 0.6614681680997213
  },
  "type": "metrics"
}
```
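My current guess, which I'd appreciate confirmation on, is that a step corresponds to one batch rather than one example, i.e. total_steps ≈ number_of_examples * number_of_epochs / batch_size. A quick back-of-the-envelope check (all numbers below are made-up placeholders, not values from my actual job):

```python
# Hypothetical sanity check: if steps count batches, a batch size in
# the low tens would explain a step count ~10x below the example count.
n_examples = 16000  # placeholder: roughly 10x the 1589 steps reported
n_epochs = 3        # placeholder epoch count
batch_size = 32     # placeholder batch size chosen by the API

total_steps = n_examples * n_epochs // batch_size
print(total_steps)  # 1500 -- same order of magnitude as 1589
```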