When should I use 3.5 and when 4?

Hello, I am trying to make my app more cost efficient. Would you recommend some techniques for how to know when the answer from GPT 3.5 is good enough and when to use GPT 4?

1 Like

Well, one could always ask gpt-4 whether the 3.5 answer is good enough. That check alone would cost about 15x as much as the 3.5 answer, so it is only good for testing your own prompting quality, not for production.
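A minimal sketch of that judging step, assuming a hypothetical `ask_judge` callable that sends a prompt to the stronger model and returns its reply as text. The prompt wording and the PASS/FAIL convention are illustrative assumptions, not an established recipe.

```python
from typing import Callable

# Illustrative grading prompt; the wording is an assumption, tune it for your task.
JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question:\n{question}\n\n"
    "Answer:\n{answer}\n\n"
    "Reply with exactly one word: PASS if the answer is good enough, FAIL otherwise."
)


def is_good_enough(question: str, answer: str, ask_judge: Callable[[str], str]) -> bool:
    """Return True when the judge model's reply starts with PASS.

    `ask_judge` is a hypothetical wrapper around whichever client you use
    to call the stronger model; it is not part of any library.
    """
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    verdict = ask_judge(prompt).strip().upper()
    # Anything other than a clear PASS counts as a failure.
    return verdict.startswith("PASS")
```

Run this over a sample of logged 3.5 answers while you tune prompts, rather than on every live request.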

Because of the cost difference, you can afford to test whether 4x or even 8x as much prompting, together with the -16k model, gets anywhere near the performance of gpt-4.
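To make that trade-off concrete, here is a rough cost comparison. The per-1K-token prices and token counts below are illustrative assumptions for the arithmetic only; substitute the figures from the current price list and your own traffic.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of one request given per-1K-token input and output prices."""
    return ((prompt_tokens / 1000) * price_in_per_1k
            + (completion_tokens / 1000) * price_out_per_1k)


# Assumed prices in dollars per 1K tokens (placeholders, not authoritative).
CHEAP = {"price_in_per_1k": 0.003, "price_out_per_1k": 0.004}
EXPENSIVE = {"price_in_per_1k": 0.03, "price_out_per_1k": 0.06}

# Assumed request size: 500 prompt tokens, 300 completion tokens.
base_prompt, completion = 500, 300
expensive_cost = request_cost(base_prompt, completion, **EXPENSIVE)

for multiplier in (1, 4, 8):
    # Grow only the prompt on the cheaper model; the completion stays the same size.
    cheap_cost = request_cost(base_prompt * multiplier, completion, **CHEAP)
    share = cheap_cost / expensive_cost
    print(f"{multiplier}x prompt on the cheaper model: {share:.0%} of the expensive request")
```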

You can also run the same comparison between 3.5-0301 and 3.5-0613 and see which one now consistently wins and follows programming instructions.
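One way to check "consistently wins" is to tally head-to-head results over a set of prompts. This is only a sketch: `model_a`, `model_b`, and `pick_winner` are hypothetical callables you would supply yourself (for example, thin wrappers around each model snapshot plus a human or automated grader).

```python
from collections import Counter
from typing import Callable, Iterable


def tally_wins(prompts: Iterable[str],
               model_a: Callable[[str], str],
               model_b: Callable[[str], str],
               pick_winner: Callable[[str, str, str], str]) -> Counter:
    """Count how often each model wins across a set of prompts.

    `pick_winner(prompt, answer_a, answer_b)` must return "a", "b", or "tie".
    All three callables are hypothetical stand-ins supplied by the caller.
    """
    wins = Counter({"a": 0, "b": 0, "tie": 0})
    for prompt in prompts:
        result = pick_winner(prompt, model_a(prompt), model_b(prompt))
        if result not in wins:
            raise ValueError(f"unexpected verdict: {result!r}")
        wins[result] += 1
    return wins
```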

2 Likes

I would recommend using the Playground.

Start with GPT-4 and refine your prompt until it gives you what you want on every single request. Then try to replicate that success with GPT-3.5 if you can. It won’t always be possible with the exact same prompt, but you might be able to give GPT-3.5 more info in the prompt to help it get the results you want.
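Once you move from the Playground to code, the same workflow can be measured as a pass rate. The sketch below assumes hypothetical `model` callables (prompt in, text out) and a `meets_spec` check that you write for your own task; none of these names come from a library.

```python
from typing import Callable, Sequence


def pass_rate(model: Callable[[str], str],
              prompts: Sequence[str],
              meets_spec: Callable[[str, str], bool]) -> float:
    """Fraction of prompts whose output passes `meets_spec(prompt, output)`."""
    if not prompts:
        raise ValueError("need at least one prompt")
    passed = sum(1 for p in prompts if meets_spec(p, model(p)))
    return passed / len(prompts)


def with_extra_context(model: Callable[[str], str], extra: str) -> Callable[[str], str]:
    """Wrap a model so every prompt is prefixed with additional instructions."""
    return lambda prompt: model(f"{extra}\n\n{prompt}")
```

Get the stronger model to a pass rate of 1.0 first, then compare the cheaper model with and without `with_extra_context` on the same prompts.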

Let us know how it goes…

3 Likes

In my experience 3.5 is rarely good enough, even for simple tasks, to the point that I only briefly used it, was ever so grateful when I was granted access to GPT-4, and have used GPT-4 ever since. Then again, maybe I am biased, as I am working on an incredibly complex app (a multiplayer video game) and maybe you are working on a supremely simple app.

1 Like

Ditto. My app focuses on regulatory text, but I had the same experience. Mine is a RAG implementation, and with the exact same question and the exact same context documents, gpt-3.5 consistently failed to give good answers. It often appears unable to comprehend even the documents right in front of it. I have multiple instances of this.

Now, I have brought this up in posts before, and was told that the temperature, token limits, and prompt all contribute to the success of the completion. This is true. But I think the reality is that gpt-3.5 is simply a lot dumber than gpt-4. I mean, a whole lot.

1 Like