When should I use 3.5 and when 4?

dragonek · September 28, 2023, 7:24am

Hello, I am trying to make my app more cost efficient. Can you recommend techniques for telling when an answer from GPT-3.5 is good enough and when to use GPT-4?

_j · September 28, 2023, 9:16am

Well, one could always ask gpt-4 whether the 3.5 answer is good enough. That check alone would cost about 15x as much as the 3.5 answer, so it is only good for testing your own prompting quality, not for production.
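A minimal sketch of that offline check, assuming you wrap each model behind a plain function that takes a prompt and returns text. `cheap_model`, `judge_model`, and the PASS/FAIL judge prompt are hypothetical names made up for illustration, not part of any OpenAI API:

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns the model's text.
# In practice each one would wrap an API call to gpt-3.5 or gpt-4.
ModelFn = Callable[[str], str]

# Assumed judge prompt; the wording is illustrative only.
JUDGE_TEMPLATE = (
    "Question:\n{question}\n\n"
    "Candidate answer:\n{answer}\n\n"
    "Is the candidate answer correct and complete? Reply with only PASS or FAIL."
)


def judge_answers(
    questions: List[str], cheap_model: ModelFn, judge_model: ModelFn
) -> Dict[str, object]:
    """Answer each question with the cheap model, then grade it with the judge."""
    results = []
    for question in questions:
        answer = cheap_model(question)
        verdict = judge_model(
            JUDGE_TEMPLATE.format(question=question, answer=answer)
        )
        # Anything that does not start with PASS is counted as a failure.
        passed = verdict.strip().upper().startswith("PASS")
        results.append({"question": question, "answer": answer, "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "results": results}
```

Run it over a fixed set of test questions while you tune the 3.5 prompt, then drop the judge call once the pass rate is where you need it.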

Because of the cost difference, you can test whether 4x or even 8x as much prompting, together with the -16k model, gets anywhere near the performance of gpt-4.
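To put rough numbers on that trade-off, here is a small cost calculator. The per-1K-token prices are assumptions based on list prices around the time of this thread; check the current pricing page before relying on them. The function names are made up for this sketch:

```python
# Assumed USD prices per 1K tokens (illustrative; verify against current pricing).
PRICES = {
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
    "gpt-3.5-turbo-16k": {"input": 0.003, "output": 0.004},
    "gpt-4": {"input": 0.03, "output": 0.06},
}


def request_cost(model, prompt_tokens, completion_tokens, prices=PRICES):
    """Cost in USD of one request: input tokens plus output tokens."""
    p = prices[model]
    return prompt_tokens / 1000 * p["input"] + completion_tokens / 1000 * p["output"]


def breakeven_prompt_multiplier(
    cheap, expensive, prompt_tokens, completion_tokens, prices=PRICES
):
    """How many times longer the cheap model's prompt can get before the
    request costs as much as the expensive model's request, assuming the
    completion length stays the same."""
    budget = request_cost(expensive, prompt_tokens, completion_tokens, prices)
    cheap_output_cost = completion_tokens / 1000 * prices[cheap]["output"]
    max_prompt_tokens = (budget - cheap_output_cost) / prices[cheap]["input"] * 1000
    return max_prompt_tokens / prompt_tokens


if __name__ == "__main__":
    gpt4 = request_cost("gpt-4", 500, 200)
    long_16k = request_cost("gpt-3.5-turbo-16k", 8 * 500, 200)
    print(f"gpt-4, 500-token prompt:     ${gpt4:.4f}")
    print(f"-16k, 8x prompt (4000 tok):  ${long_16k:.4f}")
```

Under these assumed prices, an 8x longer prompt on the -16k model still costs less than the gpt-4 request with the original prompt, so the experiment is cheap to run.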

You can also run the same comparison between 3.5-0301 and 3.5-0613 and see which one now wins consistently and follows the instructions you program into it.

PaulBellow · September 28, 2023, 4:32pm

I would recommend using the Playground.

Start with GPT-4 and get your prompt to the point where it gives you what you want on every single request. Then try to replicate that success with GPT-3.5 if you can. It won’t always be possible with the exact same prompt, but you might be able to give GPT-3.5 more info in the prompt to help it get the results you want.
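Outside the Playground, the same workflow can be scripted as a small regression check. This is only a sketch: `model` is a hypothetical function wrapping your GPT-3.5 call, `check` is whatever test defines "what you want" for your app, and each prompt template is assumed to contain an `{input}` placeholder and no other braces:

```python
from typing import Callable, List, Optional

ModelFn = Callable[[str], str]   # hypothetical: prompt in, model text out
CheckFn = Callable[[str], bool]  # hypothetical: True if an output is acceptable


def pass_rate(
    model: ModelFn, prompt_template: str, inputs: List[str], check: CheckFn
) -> float:
    """Fraction of test inputs for which the model's output passes the check."""
    if not inputs:
        return 0.0
    passed = sum(check(model(prompt_template.format(input=x))) for x in inputs)
    return passed / len(inputs)


def first_prompt_meeting_target(
    model: ModelFn,
    candidate_templates: List[str],
    inputs: List[str],
    check: CheckFn,
    target: float = 1.0,
) -> Optional[str]:
    """Try templates in order (least to most detailed) and return the first
    one whose pass rate reaches the target, or None if none does."""
    for template in candidate_templates:
        if pass_rate(model, template, inputs, check) >= target:
            return template
    return None
```

List the GPT-4 prompt first and progressively more detailed variants after it. If the function returns None, that task probably needs GPT-4.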

Let us know how it goes…

hyper5ai · September 29, 2023, 10:54am

In my experience 3.5 is rarely good enough, even for simple tasks, to the point that I only used it briefly, was ever so grateful when I was granted access to GPT-4, and have used GPT-4 ever since. Then again, maybe I am biased, as I am working on an incredibly complex app (a multiplayer video game) and maybe you are working on a supremely simple app.

SomebodySysop · September 29, 2023, 11:09am

Ditto. My app focuses on regulatory text, but I had the same experience. Mine is a RAG implementation, and with the exact same question and the exact same context documents, gpt-3.5 consistently failed to give good answers. It often appears unable even to comprehend the documents right in front of it. I have multiple instances of this.

Now, I have brought this up in posts before, and was told that the temperature, token limits and prompt contribute to the success of the completion. This is true. But I think the reality is that gpt-3.5 is simply a lot dumber than gpt-4. I mean, a whole lot.
