I have fine-tuned a curie model for a conditional generation task. Think of it like this: given a short text, create an appropriate headline. It's in production and everybody is very happy with the results. Now curie is being shut down and I have to retrain, but I get nowhere near the previous performance. Interestingly, there is no real difference in performance regardless of whether I train babbage or davinci.
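For context, the training data is in the plain prompt/completion JSONL format that base-model fine-tunes use. Here's a minimal sketch of how I build the file; the texts, the `###` separator, and the ` END` stop sequence are just illustrative conventions, not my actual data:

```python
import json

# Hypothetical rows -- in the real data each source text appears twice,
# once per headline option, which is how ~1500 texts become ~3000 examples.
examples = [
    {"prompt": "Some short source text ...\n\n###\n\n",
     "completion": " Headline option A END"},
    {"prompt": "Some short source text ...\n\n###\n\n",
     "completion": " Headline option B END"},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```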

Previously my training went like this: I had an initial dataset of ~1500 texts. For each of those I provided 2 different headline options, so in total there were ~3000 examples. Subsequently we identified texts for which the model didn't produce good results, and we continued training the fine-tuned model on these smaller datasets (100-500 examples), using only half the number of epochs and half the learning rate multiplier. So I can't even use the same training paradigm; see the sketch below.
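For reference, here is roughly what that two-stage paradigm would look like against the current fine-tuning jobs API, assuming it accepts a fine-tuned model as the base for continued training and exposes `learning_rate_multiplier` (which is part of why I'm unsure the paradigm carries over). All file IDs, model IDs, and hyperparameter values below are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Stage 1: initial fine-tune on the full ~3000-example file.
base_job = client.fine_tuning.jobs.create(
    model="babbage-002",                    # one of the curie replacements I'm testing
    training_file="file-INITIAL",           # placeholder file ID
    hyperparameters={"n_epochs": 4, "learning_rate_multiplier": 0.1},
)

# Stage 2 (what I did on curie): continue training the resulting model
# on a small corrective set, with half the epochs and half the LR multiplier.
followup_job = client.fine_tuning.jobs.create(
    model="ft:babbage-002:my-org::abc123",  # placeholder fine-tuned model ID
    training_file="file-CORRECTIVE",        # placeholder file ID
    hyperparameters={"n_epochs": 2, "learning_rate_multiplier": 0.05},
)
```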

Is anybody in a similar situation who has some advice on how to improve the performance?