Hello, I’m using the deprecated curie model for intent classification. It is fine-tuned on a dataset with the following format:
{"prompt": "<question> ->", "completion": " <intent_id>|"}
, where intent_id is an integer that maps to an intent. The dataset has about 35,000 examples like this.
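For reference, this is roughly how I generate each line of the training file (a minimal sketch; the question and intent ID are placeholders, not real data):

```python
import json

def make_record(question: str, intent_id: int) -> str:
    """Build one JSONL training record in the prompt/completion format above."""
    record = {
        "prompt": f"{question} ->",      # " ->" acts as the prompt separator
        "completion": f" {intent_id}|",  # leading space, "|" as the stop sequence
    }
    return json.dumps(record)

line = make_record("how do I reset my password", 12)
print(line)
# {"prompt": "how do I reset my password ->", "completion": " 12|"}
```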
This works flawlessly with curie at 8 epochs, and it fits within my budget. After the deprecation notice I trained the recommended replacement, “davinci-002”, only to find that its performance is awful: it fails to recognize almost any of the intents. Looking at the logprobs, the first token of the completion gets about a 4% probability, whereas curie gave 99% for the same question. Is this expected? Should I tweak the hyperparameters? If more epochs (say 16) are necessary, that would be too expensive for me. Is there anything else I can do, or is gpt-turbo my only option?
Edit: I’m using temperature 0 and defaults for the other parameters; both models are tested identically.
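In case it helps, this is how I’m comparing the models: I take the logprob of the first completion token from each response and convert it to a probability. The logprob values below are illustrative, not actual API output:

```python
import math

def logprob_to_prob(logprob: float) -> float:
    """Convert a natural-log probability (as returned in the API's
    logprobs field) into a plain probability."""
    return math.exp(logprob)

# Illustrative first-token logprobs for the same question:
curie_first_token_lp = -0.01    # near-certain prediction
davinci_first_token_lp = -3.2   # roughly a 4% probability

print(f"curie:       {logprob_to_prob(curie_first_token_lp):.2%}")
print(f"davinci-002: {logprob_to_prob(davinci_first_token_lp):.2%}")
```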