Exact model used by ChatGPT

I have been developing my own applications with the gpt-3.5-turbo model for a while now. I have noticed, however, that it is much slower than the default model on ChatGPT, which is claimed to be GPT-3.5. After a bit of investigation, I was led to believe that the default model is more specifically text-davinci-002, as that is what appears in the URL when creating a new chat.

After testing in the playground, I noticed that the speeds lined up - text-davinci-002 seemed to be about the same speed as whatever ChatGPT was using.
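If anyone wants to reproduce the comparison, here is a rough sketch of how I timed the two models. It assumes the pre-1.0 openai Python package; the prompt and max_tokens are just placeholders, and wall-clock timing like this obviously includes network latency as well as model speed.

```python
import time
import openai

openai.api_key = "sk-..."  # your API key

prompt = "Explain how transformers work in two sentences."

# Time the chat endpoint with gpt-3.5-turbo
start = time.time()
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=128,
)
print(f"gpt-3.5-turbo:    {time.time() - start:.2f}s")

# Time the completions endpoint with text-davinci-002
start = time.time()
openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=128,
)
print(f"text-davinci-002: {time.time() - start:.2f}s")
```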

However, text-davinci-002 on its own produces much worse output. It seems that ChatGPT is somehow combining the speed of davinci with the 'smartness' of gpt-3.5-turbo. I am not even sure exactly how they are formatting their requests into a chat.

My initial thought was that they are getting some sort of very high priority when it comes to completion speeds, but that doesn't explain the text-davinci-002 in the URL. Another thought was that they are somehow using a heavily fine-tuned version of text-davinci-002, or another lower-level model that offers high speeds.

I have done quite a bit of research into this, and just cannot seem to find anything. I am not even sure if anybody else has the same questions. Please let me know if I am missing some crucial info or if you have any ideas.

text-davinci-002 is used to create the title of your ChatGPT conversations; the model you're interacting with is gpt-3.5-turbo :laughing:
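On the "how do they format requests into a chat" question: with the chat completions endpoint you just pass the running conversation as a list of role-tagged messages, and a separate, cheaper completion call can generate the title. A minimal sketch below, again assuming the pre-1.0 openai Python package; the title prompt is purely illustrative, not what ChatGPT actually sends.

```python
import openai

openai.api_key = "sk-..."

# The running conversation is sent as role-tagged messages on every request
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What model am I talking to?"},
]

chat = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
reply = chat.choices[0].message["content"]

# A separate, cheaper call could generate the conversation title, e.g. with
# text-davinci-002 (this prompt is just an illustration)
title = openai.Completion.create(
    model="text-davinci-002",
    prompt=f"Write a short title for this conversation:\n{messages[1]['content']}\n\nTitle:",
    max_tokens=10,
).choices[0].text.strip()

print(title, "->", reply)
```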


I see, that makes a whole lot more sense. However, that still doesn't explain how it's so fast. OpenAI has seemed to be very transparent and fair about how they distribute usage between companies and applications.

Aside from the resources allocated to ChatGPT vs. the gpt-3.5-turbo API, they must be doing some pretty fancy conversation management behind the scenes, probably keeping a lot of the context in token form?
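One simple way that kind of context management could work (this is only a naive illustration, not OpenAI's actual method) is to count tokens with tiktoken and drop the oldest turns until the conversation fits under a budget:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def trim_history(messages, budget=3000):
    """Drop the oldest non-system messages until the rough token count fits the budget.
    Purely illustrative - not how ChatGPT actually manages its context window."""
    def count(msgs):
        # Rough count: message contents only, ignoring per-message formatting overhead
        return sum(len(enc.encode(m["content"])) for m in msgs)

    msgs = list(messages)
    while count(msgs) > budget and len(msgs) > 1:
        # Keep the system message at index 0, drop the oldest turn after it
        del msgs[1]
    return msgs
```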

Also, I wonder what parameter settings ChatGPT uses, especially with respect to the 'hidden' ones like beam search width, top_k, etc.

I don't think ChatGPT is doing beam search at all. I've tried to find some good resources that explain the different decoding strategies:

The temperature of ChatGPT is probably around 0.7, but my best guess is that these settings, as well as the choice of which parts of the conversation get included in the context window, are part of OpenAI's secret sauce.
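Worth noting that the public API only exposes a handful of sampling controls (temperature, top_p, presence/frequency penalties); there is no beam width or top_k knob. So whatever ChatGPT itself runs with, these are the only dials API users can turn. The values below are guesses, not ChatGPT's actual settings, and the sketch again assumes the pre-1.0 openai package.

```python
import openai

openai.api_key = "sk-..."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me three startup name ideas."}],
    temperature=0.7,      # guessed to be roughly what ChatGPT uses
    top_p=1.0,            # nucleus sampling cutoff
    presence_penalty=0.0,
    frequency_penalty=0.0,
)
print(response.choices[0].message["content"])
```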

When you're using the API, you are interacting with OpenAI's servers from your own device; when you're using ChatGPT, you're using a web server that's connected directly to OpenAI's servers. There's always the option of having OpenAI or Microsoft spin up a dedicated GPT instance if you really need your response time reduced, but get ready to pay on the order of $100k.

I actually didn’t know you could host a dedicated instance! Thanks for letting me know


I thought they use a different model for Plus subscribers; 3.5-Turbo on Plus is blazingly fast but much slower through the API. I believe this is a different model rather than a connection issue. Connection to the server would play a part for sure, but the difference in speed is too large for that to be the only factor.