We use quite a lot of ada fine-tuned models for classification and are pretty happy with the performance. Now, due to the deprecation of ada in January, I retrained them for gpt-3.5-turbo.
Overall, on the very same training data, the performance of gpt-3.5-turbo is much worse: accuracy is around 0.7, where for ada it was 0.8+.
Has anyone faced this issue?
ada is a base completion model, so you basically had a blank slate.
gpt-3.5-turbo, however, is massively pretrained to chat before your first fine-tuning example arrives, and, as you likely know, it needs a different message format.
The replacement completion model that you can fine-tune with the same dataset and drop in is babbage-002; however, in its untrained form it has remarkably high perplexity, so you might need to adjust the epochs, or step up to davinci-002 for particular applications (both are new models released in September).
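To make the format difference concrete, here is a small sketch of the two training-row shapes: the prompt/completion JSONL that ada-style base models (and babbage-002 / davinci-002) take, versus the messages JSONL that gpt-3.5-turbo fine-tuning expects. The classification text and label here are hypothetical placeholders, not from the original poster's dataset.

```python
import json

# Base/completion-model fine-tuning row (ada, babbage-002, davinci-002):
# one JSON object per line with "prompt" and "completion" keys.
# The " ->" separator and leading space on the label are common
# conventions for completion fine-tunes, not requirements.
completion_row = {
    "prompt": "Customer asked about a refund ->",
    "completion": " billing",
}

# Chat-model fine-tuning row (gpt-3.5-turbo): a "messages" list of
# role/content dicts instead of a single prompt/completion pair.
chat_row = {
    "messages": [
        {"role": "system", "content": "Classify the ticket into one label."},
        {"role": "user", "content": "Customer asked about a refund"},
        {"role": "assistant", "content": "billing"},
    ]
}

def to_jsonl(rows):
    """Serialize rows to JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in rows)

print(to_jsonl([completion_row]))
print(to_jsonl([chat_row]))
```

The point is that the same labeled pairs can be carried over; only the wrapping changes between the two model families.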
Hey! Yes, the format is not the issue here; the new format is OK. It seems, though, that being a very “chat” model, gpt-3.5-turbo is not doing well with non-chat tasks.
As for babbage-002 and the new davinci-002, the uncertainty was: how long will they stay? If they go in a year or two, following ada, do we re-train everything again? So we thought gpt-3.5-turbo would definitely have a longer lifecycle, but it seems its chat nature is preventing us from using it.
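If a future migration does force a move from a completion dataset to a chat one, the conversion is mechanical. A minimal sketch, assuming the legacy rows use the prompt/completion shape with a trailing " ->" separator (the system prompt and field handling here are assumptions to adapt, not a fixed recipe):

```python
import json

def completion_to_chat(row, system="Classify the text into one label."):
    """Map a legacy {"prompt", "completion"} row to the chat
    {"messages": [...]} shape used for gpt-3.5-turbo fine-tunes."""
    return {
        "messages": [
            {"role": "system", "content": system},
            # rstrip(" ->") removes any trailing ' ', '-', '>' characters,
            # which undoes the common completion-style separator.
            {"role": "user", "content": row["prompt"].rstrip(" ->")},
            {"role": "assistant", "content": row["completion"].strip()},
        ]
    }

legacy = {"prompt": "Customer asked about a refund ->", "completion": " billing"}
print(json.dumps(completion_to_chat(legacy)))
```

Run over a JSONL file line by line, this turns an ada-era training file into one accepted by the chat fine-tuning endpoint.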
The blog announcement about the deprecation continues to be updated: the completions endpoint is no longer going away, but the current models are the last ones that will be made for completions.
Without a replacement coming, they could last a while, until there is a gpt-5 base model or something like that.
(and then yesterday they put a mystery model on completions, billing shows up as gpt-4…)