General reduction in model accuracy after latest model fine-tunes (0613, 1106)


We’re using GPT-3.5-Turbo for translation tasks, and then pass the resulting English text to GPT-4 for downstream use.
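For context, our pipeline looks roughly like the sketch below. This is a minimal illustration using the raw Chat Completions HTTP endpoint, not our production code; the snapshot name and prompt wording are assumptions for the example. The point is that we pin an exact model snapshot rather than the moving `gpt-3.5-turbo` alias, so a silent upgrade (e.g. to 0613 or 1106) can’t change translation quality underneath us:

```python
# Sketch only: build a pinned-snapshot translation request for the
# OpenAI Chat Completions endpoint. Snapshot name and system prompt
# are illustrative assumptions, not our exact production setup.
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"
TRANSLATE_MODEL = "gpt-3.5-turbo-0301"  # pinned snapshot, not the moving alias


def build_translation_request(text, api_key, source_lang="Romanian"):
    """Return a ready-to-send urllib Request for the translation step."""
    payload = {
        "model": TRANSLATE_MODEL,
        "temperature": 0,  # keep translations as deterministic as possible
        "messages": [
            {
                "role": "system",
                "content": f"Translate the user's {source_lang} text to "
                           "English. Reply with the translation only.",
            },
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

The English output of this step is then sent to GPT-4 in a second request of the same shape.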

We’ve noticed across-the-board reductions in accuracy (leading to awful translations) with the latest GPT-3.5-Turbo models, which I attribute to the fine-tuning dataset not containing any Romanian data.

Is there any possibility, or are there future plans, to extend the lifespan of older models, given their better across-the-board performance on languages that aren’t yet common in ChatGPT usage? I’m assuming the model is now being fine-tuned via RLHF on Chat user data (A/B testing), which is troublesome for its training when few users write in Romanian.

For ChatGPT use, we don’t prompt in Romanian ourselves, since English gives better results. However, the performance gap between the two has widened, and not only because the upper bound of the English results has risen (it has, but only slightly since the original GPT-4, not Turbo): GPT-4’s completions in Romanian have actually gotten worse.

Any suggestions here? I wanted to fine-tune a model based on the old gpt-3.5-turbo-0301 snapshot, but I can’t.