Davinci still seems like the gold standard, compared to turbo

For my current use case, Davinci seems more flexible, more reliable, and less likely to return the “As a large language model…” style responses.

gpt-3.5-turbo is amazingly cheap and the latency is better, but after some experimentation, I’m coming to the realization that I may have to ship my new project using text-davinci-003 primarily. Is anyone else coming to the same realization?

I hope OpenAI continues to release new models in the davinci mold, going forward.

4 Likes

It’s not as cheap as it first appears for long chatbot conversations: there is no server-side state management, so you have to resend the same chat messages back to the server with each sequential request (and you pay for those tokens every time).

So, roughly speaking (in units, not tokens), let’s assign each message a cost of 1 unit.

  1. First message, 1 + First Reply, 1 = 2
  2. First Message, 1 + First Reply, 1 + Second Message, 1 + Second Reply, 1 = 4
  3. First Message, 1 + First Reply, 1 + Second Message, 1 + Second Reply, 1 + Third Message, 1 + Third Reply, 1 = 6

Total for sending three messages, 12
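The arithmetic above can be sketched in a few lines of Python (unit costs as in the list, not real tokens):

```python
def cumulative_cost(turns: int) -> int:
    """Total units paid after `turns` round trips when every request must
    resend the full history (each turn adds one message and one reply)."""
    # Request i carries i messages plus i replies, i.e. 2*i units.
    return sum(2 * i for i in range(1, turns + 1))

print(cumulative_cost(3))  # 12 units for three round trips, not 6
```

In other words, the cost grows quadratically with conversation length, which eats into the headline price advantage for long chats.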

I don’t think OpenAI will keep it this way for long; once they solve their growing pains / scaling issues, I expect they will move to session management so it is no longer necessary to send the same messages back to the server.

This is also why, in my view, OpenAI dropped the cost by 90%.

HTH

:slight_smile:

1 Like

Plain old davinci (not the instruct or later text- versions) is still my favorite. But gpt-3.5-turbo is a lot more fun and gives more control, I feel. instruct is an awkward route.

I had the same realization! As soon as it was launched I began replacing Davinci with Turbo on our application only to realize the results were not as good for our use cases. Our app isn’t chat related, so it makes sense that Turbo might not fit that role well, but I was really drawn in by the cheaper pricing.

For example, one of our cases is rephrasing text. Davinci simply outputs the rephrased text, whereas Turbo will respond with something like “I can definitely rephrase that text for you. Here is the rephrased version: blah blah”. That bit of commentary messes up our desired result. Or if we ask Turbo to rephrase text in an irritated tone of voice it sometimes responds with its typical “As a large language model, we cannot…”

Even with specific instructions and prompt tweaks, I’ve been unable to reliably prevent this behavior. Guess that’s to be expected from a chat model though. Would love to see cheaper Davinci prices too now :slight_smile:
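For what it’s worth, here is the kind of mitigation I mean - pinning the format in the system message and defensively stripping any leading commentary. The prompt wording and the post-processing regex below are my own sketch, not anything from OpenAI’s docs, and as noted above they don’t reliably fix it:

```python
import re

def build_rephrase_messages(text: str) -> list[dict]:
    # Hypothetical prompt: demand bare output, no acknowledgements.
    return [
        {"role": "system", "content": (
            "You rephrase text. Respond with ONLY the rephrased text. "
            "No preamble, no quotes, no commentary.")},
        {"role": "user", "content": text},
    ]

def strip_preamble(reply: str) -> str:
    # Defensive post-processing: drop a leading "Here is ...:" style line.
    return re.sub(r"^(sure|certainly|here\b[^:\n]*):?\s*\n?", "",
                  reply, count=1, flags=re.IGNORECASE).strip()

print(strip_preamble("Here is the rephrased version:\nThe cat sat."))
```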

3 Likes

For my product, I have realised that gpt-3.5-turbo has been giving me better performance and accuracy, so we have recently switched over to it for our testing phase. It is something that will vary from product to product, for sure.

1 Like

Correct, I’m seeing the same in my limited testing. Davinci is much stronger for general reasoning over large texts, for example.

I have noticed a big difference between ChatGPT on the website and gpt-3.5-turbo. I ask them the same questions and get totally different responses. I have found that the responses on the website are much better than those from the gpt-3.5-turbo API. I wonder if they really are the same thing?

2 Likes

I am actually getting great results - when I get results… It seems that the API returns a Bad Request when the “role” : “system” has “content” like this:

“You are a writer tasked with creating engaging articles, blogs, and descriptions based on user prompts. Your goal is to use your creativity and writing skills to craft compelling content that captures the user’s attention and provides them with valuable information. Think about how you can use storytelling techniques, descriptive language, and a clear writing style to bring your articles to life and engage your readers. Whether you’re writing a product description or a blog post, your writing should inform, entertain, and leave a lasting impression on your audience. So, how will you use your writing skills to craft content that resonates with your readers and achieves your client’s goals?”

Is it too long? If yes, what is the limit? And if there is a limit, why is it not documented?

So, I realize that when an API is young there are some kinks to work out on the server side. However, a company like OpenAI, which is (I think) adequately funded, should have produced better documentation with examples and at least a few SDKs for major languages, including C#. Just my 2 cents so far.

The bottom line: I would not substitute gpt-3.5-turbo for text-davinci-003, at least not yet.

You haven’t said what your use case is - that would definitely help in understanding what is going on, and maybe give something back to the conversation.

That is not true if you write your code correctly, @Securigy

I just took your system role content and tested it as follows:

[Completion screenshot omitted]

As you can see, @Securigy - and sorry to be so direct and so blunt - your claim that gpt-3.5-turbo-0301 returns bad requests because of a “kink” in the system role is not correct.

I’ve been testing this for the last few hours and it works very well. :+1:

:slight_smile:

Well, assuming your statement is correct - and I do believe it works on your side - then I don’t know how to explain this: the same code is used in both cases. The short “system” content works fine, but the long one produces a Bad Request. The same user content is used in both, and it is very short (two words).
And BTW, I do not provide any assistant role…

So here it is again:

The C# code:

            string jsonContent = JsonConvert.SerializeObject(request, new JsonSerializerSettings() { NullValueHandling = NullValueHandling.Ignore });
            var stringContent = new StringContent(jsonContent, UnicodeEncoding.UTF8, "application/json");
            string url = String.Format("{0}/chat/completions", Api.BaseUrl);

            Log.VerboseFormat("Calling SendAsync with URL: {0}, and Engine: {1}", url, request.Model);

            using (HttpRequestMessage req = new HttpRequestMessage(HttpMethod.Post, url))
            {
                req.Content = stringContent;
                req.Headers.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", Api.Auth.ApiKey);
                req.Headers.Add("Accept", "application/json");
                req.Headers.Add("User-Agent", "GSS/OpenAI_GPT3");

                var response = await client.SendAsync(req, HttpCompletionOption.ResponseHeadersRead);

                if (response.IsSuccessStatusCode)

I compose a short “content” for the “system” role that looks like this and produces a nice result:

{"messages":[{"role":"system","content":"You are a professional writer tasked with creating engaging articles, blogs, and descriptions based on user prompts."},{"role":"user","content":"Traveling Canada"}],"model":"gpt-3.5-turbo","max_tokens":4000,"temperature":0.9,"top_p":1.0,"presence_penalty":0.5,"frequency_penalty":0.5,"n":1,"stream":true}

Then I compose a longer “content” for the “system” role that looks like this, but it produces a Bad Request (400) response:

{"messages":[{"role":"system","content":"You are a writer tasked with creating engaging articles, blogs, and descriptions based on user prompts. Your goal is to use your creativity and writing skills to craft compelling content that captures the user's attention and provides them with valuable information. Think about how you can use storytelling techniques, descriptive language, and a clear writing style to bring your articles to life and engage your readers. Whether you're writing a product description or a blog post, your writing should inform, entertain, and leave a lasting impression on your audience. So, how will you use your writing skills to craft content that resonates with your readers and achieves your client's goals?"},{"role":"user","content":"Traveling Canada"}],"model":"gpt-3.5-turbo","max_tokens":4000,"temperature":0.9,"top_p":1.0,"presence_penalty":0.5,"frequency_penalty":0.5,"n":1,"stream":true}

Both of them are the ‘jsonContent’ string from the code.
I compared the two and both look valid.

That brought me to the question of what I am doing wrong here, because ruby_coder says it works on his side (in a different language, I presume, and seemingly with different parameters…).

After testing, I’ve come to the conclusion that it’s apples to oranges. I hope OpenAI spells out the differences a little more, although it’s literally in the name.

The current cGPT (chat) model is great for surface-level conversation, and for continuing a conversation without prompt injection or topic derailment being an issue. Seriously, with the price reduction I’m just sending whatever context I want, right up to the max.

iGPT (instruct) is great for adding AI to the pipeline, such as prompt optimization or determining the actual purpose of the user’s message (is it a query? a statement?). These are just my observations and could be wrong.
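As a sketch of that pipeline idea - the prompt wording, label set, and helper names below are my own invention, not from any OpenAI doc - you could send the user’s message to an instruct model with a tiny classification prompt and route on the answer:

```python
LABELS = ("query", "statement", "command")

def build_intent_prompt(message: str) -> str:
    """Completion-style prompt for an instruct model (e.g. text-davinci-003)
    that classifies a user message before the main pipeline runs."""
    return (
        "Classify the user's message as exactly one of: "
        + ", ".join(LABELS) + ".\n\n"
        f"Message: {message}\n"
        "Label:"
    )

def parse_label(completion: str) -> str:
    # The model may echo whitespace, casing, or a period; normalize defensively.
    label = completion.strip().lower().rstrip(".")
    return label if label in LABELS else "statement"  # safe default

print(build_intent_prompt("What is the capital of Canada?"))
print(parse_label(" Query.\n"))  # query
```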

I’m really looking forward to further advancements; hope to see some great synergies.

1 Like

Just curious - what do you mean by “iGPT (instruct)”? Are those new APIs that I am not aware of, or existing ones in the Instruct category?

I apologize. I figured that it’d be easier to call it Instruct.

I’m referring to the already existing models, such as Davinci.

davinci is Instruct? I’ve never seen anything in the documentation that calls it Instruct…

Text-Davinci-003 (sometimes referred to as TD3) is in the “instruct family” that branched from the core Davinci…I kinda like iGPT and cGPT naming conventions, tho…

2 Likes

I’ve run text-davinci-003 for a couple of months now, and indeed noticed that the results are highly dependent on what you ask GPT to do; I never knew it belongs to the Instruct family, though… Thanks.

1 Like

Replying to myself…
I ran it again and cut the length of the “system” “content” by roughly 40%, so my jsonContent string looks like this:

{"messages":[{"role":"system","content":"You are a writer tasked with creating engaging articles, blogs, and descriptions based on user prompts. Your goal is to use your creativity and writing skills to craft compelling content that captures the user's attention and provides them with valuable information. Think about how you can use storytelling techniques, descriptive language, and a clear writing style to bring your articles to life and engage your readers."},{"role":"user","content":"Traveling Canada"}],"model":"gpt-3.5-turbo","max_tokens":4000,"temperature":0.9,"top_p":1.0,"presence_penalty":0.5,"frequency_penalty":0.5,"n":1,"stream":true}

And guess what: it worked.
So, no doubt there is some limit on the size of this “content” - at least with the parameters I run with. Whether it is an anomaly or an undocumented limit, I have no idea; I hope someone can shed light on it…
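One plausible explanation - my assumption, not something confirmed in this thread - is that the limit isn’t on the system content at all. gpt-3.5-turbo had a 4,096-token context window, and max_tokens is reserved for the completion in addition to the prompt tokens. With max_tokens set to 4000, only roughly 96 tokens remain for the messages, so a short system message squeaks through while the longer one pushes the request over the window and gets a 400. A rough back-of-the-envelope check (the 4-characters-per-token heuristic is only approximate):

```python
CONTEXT_WINDOW = 4096   # gpt-3.5-turbo's context limit at the time
MAX_TOKENS = 4000       # value used in the failing request

def rough_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per English token."""
    return len(text) // 4

def fits(system_content: str, user_content: str = "Traveling Canada") -> bool:
    # Prompt tokens plus reserved completion tokens must fit the window
    # (per-message framing overhead is ignored in this sketch).
    prompt = rough_tokens(system_content) + rough_tokens(user_content)
    return prompt + MAX_TOKENS <= CONTEXT_WINDOW

short_system = ("You are a professional writer tasked with creating engaging "
                "articles, blogs, and descriptions based on user prompts.")
long_system = short_system + " " + "x" * 600  # stand-in for the longer prompt

print(fits(short_system))  # True: ~30 prompt tokens + 4000 fit in 4096
print(fits(long_system))   # False: the extra text blows the budget
```

If that is the cause, lowering max_tokens (rather than trimming the system prompt) should make the long version work as well.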

1 Like

Hi! I am using content much bigger than this without any problem. The difference, from what I can see, is that I turned off streaming. Did you try that?

It does not feel like streaming is the culprit, but what do I know… I will definitely try it, although streaming was a nice-looking feature.