Initial test results: Why is gpt-4-1106-preview way worse than the former gpt-3.5-turbo-0613?

kdtbdk · November 9, 2023, 10:36am

I am using gpt api to generate content (informative articles). I have tried to adjust all my prompts to work with the new gpt-4-1106-preview and the content it generates is consistently worse than the former gpt-3.5-turbo-0613 which I used to use before the devday. My question is: Is the performance/quality difference a result of the fact that openai states the new gpt-4-1106-preview model is “not yet suited for production traffic.”? Or is it due to something else?

I just think it is weird. I have been testing the difference on 15 different articles, and gpt 3.5 is way better across all articles.

Topic		Replies	Views
Huge quality drop in gpt-4-turbo Bugs	13	1215	May 30, 2024
Gpt 4 Quality Fluctuation API gpt-4	5	1092	May 1, 2024
Gpt-4-turbo slightly worse than gpt-4-turbo-preview when generating text based on images? Feedback gpt-4-turbo	1	779	April 19, 2024
Is gpt4 turbo preview now slower than gpt 4? API gpt-4 , gpt-4-turbo	3	8590	January 23, 2024
GPT 4 Turbo regression over GPT 4 API gpt-4 , gpt-4-turbo	2	1614	November 24, 2023

Initial test results: Why is gpt-4-1106-preview way worse than the former gpt-3.5-turbo-0613?

Related topics