How can ChatGPT do both text and code at the same time?

Hi there, I’m trying to build an app that can handle both text and code in the same way ChatGPT does. But text-davinci and code-davinci are two separate models. How does ChatGPT know the nature of the user query so it can fire up the relevant engine? And how can I detect the nature of the query in the same way?

2 Likes

The short answer is that you can’t.

Davinci is OK at doing some tasks related to code though.

Maybe you could offer the user a “mode” switch to allow them to pick the best engine?
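Something like this, as a minimal sketch (the mapping and fallback here are just illustrative; both model names were available in the API at the time):

```python
# Hypothetical "mode" switch: the user picks the engine explicitly,
# so no extra classification call is needed.
ENGINES = {
    "text": "text-davinci-003",
    "code": "code-davinci-002",
}

def engine_for(mode: str) -> str:
    # Fall back to the text model if the mode is unrecognised.
    return ENGINES.get(mode, "text-davinci-003")
```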

2 Likes

@raymonddavey but how is it that ChatGPT can do it? Any ideas?

ChatGPT uses a later version of davinci. There is no API for it yet.

It doesn’t switch models on the fly (as far as I know), so it gets all its answers from davinci. This includes the code ones.

We don’t have access to the same engine, but davinci-003 does a fairly good job anyway.

No, according to the documentation on OpenAI’s website, it uses the same text-davinci-003, but fine-tuned a bit. text-davinci-003 and code-davinci-002 are part of the GPT-3.5 collection. We already have access to the models powering ChatGPT. I’m just not sure how they’re synchronising the different models so seamlessly.

If anyone has ideas, I’d really appreciate any insight.

1 Like

Hi @rami10000

Could you please share this exact reference?

According to what I have read about ChatGPT, the architecture is more in line with what @raymonddavey mentioned: ChatGPT uses multiple models, including both a davinci model and a codex model.

It would be interesting to see the reference that backs up your “No” reply.

Thanks!

:slight_smile:

ChatGPT is based on a davinci model from the 3.5 series.

It’s a model we don’t have access to in the API.

It combines several models into one. Refer to the link above for a detailed breakdown of the models.

3.5 is a single model, so there is no need for ChatGPT to swap models on the fly.

1 Like

Excellent reference!

This about sums it up from the reference kindly provided by @raymonddavey:

GPT-3.5 series is a series of models that was trained on a blend of text and code from before Q4 2021. The following models are in the GPT-3.5 series:

  1. code-davinci-002 is a base model, so good for pure code-completion tasks
  2. text-davinci-002 is an InstructGPT model based on code-davinci-002
  3. text-davinci-003 is an improvement on text-davinci-002

:+1:

1 Like

“ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series [here]”

And here are the models included in GPT-3.5:

As you can see, it includes text-davinci-003 and code-davinci-002, both of which are accessible via the API.

So in theory, we should be able to build our own ChatGPT via these existing models if we can fine-tune them in the same way.
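For reference, fine-tuning through the API at that time meant supplying JSONL prompt/completion pairs to one of the base models (note that text-davinci-003 itself was not fine-tunable then, only the base models were). A made-up record, just to show the shape:

```python
import json

# Hypothetical training record in the JSONL prompt/completion format the
# fine-tuning endpoint expected at the time (one JSON object per line).
record = {
    "prompt": "User: How do I reverse a string in Python?\nAssistant:",
    "completion": " Use slicing: my_string[::-1]\n",
}

with open("training_data.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```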

Yes, but it is highly unlikely you can “fine-tune them in the same way”, especially since you have no idea how it was fine-tuned by OpenAI and that information is proprietary.

:slight_smile:

Hence my original question in this thread to see if anyone in the community has figured it out yet.

Understood; and I think @raymonddavey answered correctly:

In my view, you would need a huge amount of computing resources, a small fortune, and a team of top data scientists to build ChatGPT as you describe; so from an engineering perspective, you are probably “better off” waiting until the “coming soon” ChatGPT API is released if you want to build apps that mimic ChatGPT functionality.

I agree with you, @rami10000, that ChatGPT more than likely does not “switch” between models; surely they integrate the models somehow, but how they perform this integration is anyone’s guess.

If you find out, please post back and enlighten all of us.

Personally, I’m waiting for the ChatGPT API to be released.

Thanks

:slight_smile:

You can build smaller versions of ChatGPT on your own now, here from the father of AI at Tesla, no less! :grin:

3 Likes

@curt.kennedy BEST Link EVER !!

1 Like

Hey @curt.kennedy

I went through the video, and I think it’s a very long (impossible) stretch to say that this “baby steps, learn to crawl” GPT video is anywhere close to ChatGPT, which cost millions and millions of USD to develop.

Don’t get me wrong, it’s a nice tutorial; but it is very, very far from even being a “smaller version of ChatGPT”. It’s just a “first steps with GPT” tutorial, and I am quite sure @rami10000 will not be able to get the ChatGPT functionality he is looking for by starting GPT from scratch like that!

It would take many years and many millions of dollars to get close to where OpenAI is with ChatGPT, starting from scratch like that. Otherwise, there would be no need for OpenAI.

:slight_smile:

1 Like

To address your initial problem, a common way to achieve this is to:

  1. Use text-davinci-003 with a prompt designed to analyze the semantics of the user query
  2. Generate a completion with the appropriate model and parameters, based on the response (a minimal sketch follows the prompt example below)

Prompt example:

Is the query below about generating code? Answer with yes/no.

Query: ${query}

Answer:

If the answer is yes, generate the completion for the query with code-davinci-002, otherwise with text-davinci-003.
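Put together, a minimal sketch of that two-step routing, using the v0.x openai Python library and Completion endpoint as they existed at the time (key handling and the sampling parameters here are assumptions):

```python
import openai  # pip install openai (the v0.x library of that era)

openai.api_key = "sk-..."  # your API key

ROUTER_PROMPT = (
    "Is the query below about generating code? Answer with yes/no.\n\n"
    "Query: {query}\n\n"
    "Answer:"
)

def is_code_query(query: str) -> bool:
    # Step 1: a short, deterministic classification call.
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=ROUTER_PROMPT.format(query=query),
        max_tokens=3,
        temperature=0,
    )
    return resp["choices"][0]["text"].strip().lower().startswith("yes")

def complete(query: str) -> str:
    # Step 2: route the query to the model that fits it.
    model = "code-davinci-002" if is_code_query(query) else "text-davinci-003"
    resp = openai.Completion.create(model=model, prompt=query, max_tokens=512)
    return resp["choices"][0]["text"]
```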

Thanks Patrick, I had thought of this solution already. But it means duplicating tokens/cost with every query, so I wondered if there’s a different way ChatGPT is doing it.

Do you think ChatGPT makes a semantic assessment of every query and then fires it off to the relevant engine for completions?

But please remove the locks on the amount of code (and the number of characters) ChatGPT can respond with. It is impossible to debug, or to consider using it as an AI tool, with that limitation. It is impressive how it can assist in error checking and help write code, but when one asks for a response and it cuts off halfway, it then cannot follow the context. Sometimes it does wonderfully and sometimes it does not; it is a tool with very big flaws for helping us coders.

Gotcha! Indeed, it does mean duplicating tokens.

I’m not sure what ChatGPT does, although I remember Sam Altman saying in a recent interview that building it was fairly simple and that he was surprised no one had built something like ChatGPT since the release of the API and models. That being said, I have no other insights on this, and the consensus from the replies above seems to be that it’s quite complex and costly.

In my application, I use a much shorter prompt for the semantic analysis compared to the answer generation, which means the added cost is negligible.
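As a back-of-the-envelope check (the token counts are made up, and the $0.02/1K davinci rate is the published pricing at the time, so treat both as assumptions):

```python
# Rough overhead estimate for the extra classification call.
DAVINCI_PER_1K = 0.02  # USD per 1K tokens for text-davinci-003 (assumed)

routing_tokens = 40    # short yes/no prompt plus a one-token answer (assumed)
answer_tokens = 1500   # typical full completion for the user query (assumed)

routing_cost = routing_tokens / 1000 * DAVINCI_PER_1K   # ~$0.0008 per query
overhead = routing_tokens / answer_tokens
print(f"routing adds ~{overhead:.1%} to the cost of each query")  # ~2.7%
```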

But if you are trying to optimize for cost, I’d consider performing your semantic assessment with:

  • Cheaper models (e.g. ada/curie, though these require more prompt engineering for good results)
  • OpenAI competitors, or self-hosted open-source pre-trained models (plenty on Hugging Face; see the zero-shot sketch below)
  • An NLP API, such as Google Natural Language, Amazon Lex, or Wit.ai (free)
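For the self-hosted route, one option is zero-shot classification with a Hugging Face NLI model; the model choice and labels below are just one possibility, not a recommendation:

```python
from transformers import pipeline

# Zero-shot classification: no fine-tuning needed, labels are chosen at call time.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Write a Python function that reverses a linked list",
    candidate_labels=["code generation", "general conversation"],
)
print(result["labels"][0])  # highest-scoring label, e.g. "code generation"
```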

That’s great food for thought, thanks Patrick! I’ll check out the other models and see how well they can semantically identify a query.

1 Like