Chinese Performance: Is GPT-3.5-turbo-0125 worse than GPT-3.5-turbo-1106?

Hello everyone,

I wanted to bring up an issue I’ve encountered while using GPT-3.5-turbo models, specifically comparing the performance of GPT-3.5-turbo-0125 and GPT-3.5-turbo-1106 in Chinese language tasks. Upon experimenting with both models using the same prompts, I’ve noticed a significant difference in their performance, particularly in keyword extraction for Chinese content.

In my tests, which involved around 100 text samples, GPT-3.5-turbo-0125 consistently exhibited inferior performance compared to GPT-3.5-turbo-1106 when it comes to extracting keywords from Chinese content. This has raised concerns regarding the reliability and suitability of GPT-3.5-turbo-0125 for tasks involving Chinese language processing.
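A comparison like this is easier to share and reproduce with a small scoring harness. The following is only a sketch, not the setup used in the tests above: it assumes each sample comes with a hand-labelled list of reference keywords, and the names `keyword_f1`, `compare_extractors`, and the extractor callables are hypothetical. Each extractor would wrap a call to one model with the same prompt, so the model call itself is left out here.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical type: a function that takes a text and returns its keywords.
Extractor = Callable[[str], Iterable[str]]


def keyword_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Exact-match F1 between predicted and reference keyword sets."""
    pred: Set[str] = {k.strip() for k in predicted if k.strip()}
    ref: Set[str] = {k.strip() for k in gold if k.strip()}
    overlap = len(pred & ref)
    if overlap == 0:
        # Covers empty predictions, empty references, and no shared keywords.
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def compare_extractors(
    samples: List[Tuple[str, List[str]]],
    extractors: Dict[str, Extractor],
) -> Dict[str, float]:
    """Return the mean F1 per extractor over (text, reference keywords) pairs."""
    totals = {name: 0.0 for name in extractors}
    for text, gold in samples:
        for name, extract in extractors.items():
            totals[name] += keyword_f1(extract(text), gold)
    n = len(samples)
    return {name: (total / n if n else 0.0) for name, total in totals.items()}
```

In use, one extractor per model (for example one for 1106 and one for 0125) would be passed in with the same sample list, and the two mean scores compared. Exact string matching is a deliberate simplification; near-synonyms or partial matches in Chinese would need a looser comparison.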

I’m curious to know if anyone else has encountered similar issues or if there are any insights or suggestions on how to address this discrepancy. Your input would be greatly appreciated.

Looking forward to hearing from you all.

I am noticing the same with information extraction from Dutch text. I am currently sticking to 1106. The 0125 model cuts off certain pieces of information and, in some cases, fails to extract some information at all (like city of residence). It's definitely not an improvement over 1106 for information extraction use cases. I hope OpenAI will not deprecate 1106 anytime soon.

Let me know if you figure out any other solutions!

In my use case, the older version 0613 performs even better. But that model will be deprecated in June, so I'm currently using 1106 too.

