Thoughts on GPT-3.5-Turbo vs. Claude 3 Haiku

Just some food for thought. Like many others, I have recently been switching to Claude 3 Haiku, and I'm finding it outstanding for its price point ($0.25/M input tokens and $1.25/M output tokens, versus GPT-3.5-Turbo's $0.50/M input and $1.50/M output).
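To make the pricing gap concrete, here's a quick back-of-the-envelope cost sketch in Python using the prices quoted above; the workload numbers (one million requests at 1,000 input / 250 output tokens each) are invented purely for illustration:

```python
# Back-of-the-envelope cost comparison using the per-million-token
# prices quoted above, as (input, output) in USD.
PRICES = {
    "claude-3-haiku": (0.25, 1.25),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def workload_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Total cost in USD for a batch of identical requests."""
    in_price, out_price = PRICES[model]
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Hypothetical workload: 1M requests, 1,000 input / 250 output tokens each.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 1_000_000, 1_000, 250):,.2f}")
# claude-3-haiku: $562.50
# gpt-3.5-turbo: $875.00
```

At that (hypothetical) volume, the same workload costs roughly $562.50 on Haiku versus $875.00 on GPT-3.5-Turbo; about a 36% saving on top of the quality gap.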

For context, Haiku is way above GPT-3.5-Turbo on many benchmarks, including the LMSYS Chatbot Arena leaderboard. Haiku sits at "GPT-3.75-Turbo" level with an estimated Elo of 1181; the latest GPT-3.5-Turbo has an Elo of 1104 (with earlier snapshots up to 1119), and GPT-4-Turbo sits at 1251-1261 depending on the version, so Haiku lands smack in the middle.

On top of that, Haiku has a 200k context window (against the meager 16k of GPT-3.5-Turbo), vision capabilities, and a knowledge cutoff of August 2023 (vs. September 2021 for GPT-3.5-Turbo).

All of this to say: while I care about "big models" and look forward to the next generation of "bigger models" (GPT-4.5, etc.), it seems OpenAI has made a major mistake (among many others…) in essentially neglecting the "GPT-3.5-Turbo" class of models. GPT-3.5-Turbo is old, overpriced, and under-performing.

One reason for this neglect may be that OpenAI is an "AGI lab", so it doesn't care much about improving smaller models, but even this reasoning seems wrong. An uber-intelligent agent will likely want to dispatch smaller, cheaper agents for (easy) tasks, so even on the path towards AGI there is inherent research value in figuring out how to maximize the intelligence and performance of small models; something Anthropic has clearly managed to do.

Am I missing something? Is there any reason to stick with GPT-3.5-Turbo?

This is not a very fair comparison; GPT-3.5 was released in 2022, while Claude 3 Haiku was released in 2024.

I personally do not use GPT-3.5 very much, but a big benefit is being able to fine-tune GPT-3.5, which you cannot do with Claude 3 Haiku.

I agree, and that’s exactly my point.

OpenAI has worked on improving GPT-4 (and then GPT-4-Turbo) but, at least so far, seems to have given up on substantially improving the smaller/faster/cheaper model family; an opportunity that Anthropic has seized. If anything, GPT-3.5-Turbo has gotten worse over time (according to the Arena, at least).

I personally do not use GPT-3.5 very much, but a big benefit is being able to fine-tune GPT-3.5, which you cannot do with Claude 3 Haiku.

Fair point. Maybe a fine-tuned GPT-3.5 would perform better than Haiku at the task; that'd be interesting to know. It's not completely obvious, since Haiku is smarter and can cheaply pack a lot of in-context examples.
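For what it's worth, here's a minimal sketch of that in-context-examples approach with Haiku, assuming the Anthropic Python SDK; the support-bot persona and the few-shot turns are hypothetical placeholders:

```python
# Minimal sketch: steering Claude 3 Haiku with packed in-context examples
# instead of fine-tuning. Assumes the Anthropic Python SDK; the example
# conversation turns below are hypothetical.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Worked examples prepended as prior conversation turns; with a 200k context
# window and Haiku's input price, you can afford many of these per request.
few_shot = [
    {"role": "user", "content": "My invoice shows a double charge."},
    {"role": "assistant", "content": "Sorry about that! I've flagged the duplicate charge for a refund; you should see it reversed within 3-5 business days."},
]

response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=300,
    system="You are a concise, friendly support bot.",
    messages=few_shot + [{"role": "user", "content": "I can't reset my password."}],
)
print(response.content[0].text)
```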

I believe the biggest benefit of GPT-3.5 is being able to fine-tune it and then use the fine-tuned model at a good price to respond exactly how you want, such as for support bots. If Anthropic added fine-tuning access, it would completely replace GPT-3.5 for the time being; their speeds are about the same.
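For reference, the GPT-3.5 fine-tuning flow is short. A rough sketch with the OpenAI Python SDK (v1), where support_bot.jsonl is a hypothetical file of chat-formatted training examples:

```python
# Rough sketch of the GPT-3.5 fine-tuning flow with the OpenAI Python SDK (v1).
# "support_bot.jsonl" is a hypothetical training file; each line is a JSON
# object of the form {"messages": [{"role": ..., "content": ...}, ...]}.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL training data.
training_file = client.files.create(
    file=open("support_bot.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off a fine-tuning job against the base gpt-3.5-turbo model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until it finishes
```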

gpt-3.5-turbo used to be better, something still somewhat visible to API developers who maintain access to gpt-3.5-turbo-0301 (although that snapshot, and especially -0613, which was never truly a frozen snapshot, have been impacted by continued changes).

ChatGPT ran on the full GPT-3 Davinci-based model for a few months before -turbo. Numerous sources speculate that -turbo is nearly an order of magnitude smaller in parameters than GPT-3 davinci, which makes competing with it easier if you've got the data to train on.

It's pretty clear from the quality of the latest model that the focus is on serving ChatGPT's free tier at minimum inference cost with this model series, and the API only gets a byproduct of OpenAI's new focus on its own chat product.