Just got access to GPT-4 but it responds like 3.5

In Playground, Mode: Chat. Model: gpt-4
User:
What GPT model are you based on?
Assistant:
I am based on OpenAI’s GPT-3 (third generation of the Generative Pre-trained Transformer) model, which is an advanced language model for natural language processing tasks.

But if I ask the GPT-4 model on chat.openai.com the same question:
User:
What GPT model are you based on?
GPT-4:
I am based on the GPT-4 architecture, which is an advanced version of the GPT-3 model developed by OpenAI.


I’m getting the same result as you. I was trying out the gpt-4 API for the first time by asking:
user: what version of gpt are you?

and this is the full completion output:
completion: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I am GPT-3, the third version of OpenAI's Generative Pre-trained Transformer.",
        "role": "assistant"
      }
    }
  ],
  "created": 1683231341,
  "id": "chatcmpl-7CZeXqe2wrQuQLxwffb82V0Xa1vR8",
  "model": "gpt-4-0314",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 21,
    "total_tokens": 41
  }
}
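
For what it’s worth, here is roughly the kind of call that produces output like the above, using the openai Python package as it worked at the time. A sketch, not my exact script; the key placeholder is an assumption, substitute your own.

import openai

openai.api_key = "sk-..."  # assumption: substitute your own key

completion = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "what version of gpt are you?"},
    ],
)

# The response text may claim GPT-3, but the metadata records which
# model actually served the request.
print(completion["choices"][0]["message"]["content"])
print(completion["model"])  # e.g. "gpt-4-0314"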

I guess both answers are true. After some more testing, the 3.5 model is more prone to tell you its exact version, while GPT-4 through the API will tell you something more along the lines of “I don’t know exactly which version of the model I’m running on.”


Scratch that. When I rephrase the question like this, the response is not what I was expecting.

These are also my first few tries with the gpt-4 model through the API. I got some answers that looked more similar to how the gpt-3.5-turbo model used to answer.

Guess we have to wait for some input from the more seasoned gpt-4 API users. :sweat_smile:

Ran into the same issue here, both in the Playground and using Python.


Who knows if there’s even a big difference here, aside from different token caps, configurations, plugins (when those become available), and allocated resources. As ChatGPT said, v4 is an ‘advanced’ version of the GPT-3 model. Though who cares, if it can utilize 32k tokens. I’m still not able to use v4 in the Playground/API. There are better ways to test the capabilities of v4 vs. v3.5 than asking it about its version.

Yeah, you’re right. I guess it’s maybe not the best way to check this stuff. :sweat_smile:

If you print out the chat completion, it says “model”: “gpt-4-0314”, so it seems to work, yes. :slight_smile:
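
A minimal sketch of that check, assuming completion holds the response object from the call:

# The "model" field in the metadata records which snapshot actually
# served the request, regardless of what the response text claims.
print(completion["model"])  # "gpt-4-0314"
print(completion["usage"])  # token counts for the call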


I had the same problem, and the quality of the answers did not look like GPT-4.

So I asked GPT-4 on ChatGPT+ to help me write a prompt to test whether the API model is indeed GPT-4. Here’s what I got…

Me:
Let's pretend you're someone who is going to be a prompt tester, and you can create a prompt for me to test the model. What would you say exactly? I'll be the mediator in between and pass its response back to you.
gpt4(chatgpt+):

As a prompt tester, I would suggest the following prompt to test the model's capabilities:


Please provide a brief summary of the main advancements and differences between GPT-3 and GPT-4, with specific focus on their architecture, capabilities, and potential applications. Additionally, highlight any notable limitations or concerns related to the use of these language models.


This prompt will test the model's ability to understand the differences between the two versions and provide a detailed response. Once you receive the response, feel free to share it, and I'll be happy to evaluate its quality.

Me: passed the prompt to the GPT-4 API

gpt4(via api):
completion: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I apologize for the confusion earlier, but as of now, GPT-4 has not been released, and I am an AI language model based on OpenAI's GPT-3. Therefore, I cannot provide a comparison between GPT-3 and GPT-4, as the latter does not yet exist. \n\nHowever, I can provide insights into GPT-3's architecture, capabilities, limitations, and potential applications:\n\n1. Architecture: GPT-3 is a transformer-based model with 175 billion parameters, making it one of the largest language models. This scale allows it to generate more accurate, coherent, and context-aware responses.\n\n2. Capabilities: GPT-3 can perform various natural language processing tasks, including translation, summarization, question-answering, sentiment analysis, and text generation, often with minimal training or fine-tuning.\n\n3. Applications: GPT-3 has been used in applications such as email drafting, code generation, conversational AI, tutoring, content creation, and many others.\n\n4. Limitations and concerns:\n   a. Comprehension depth: GPT-3 may not have deep understanding and can generate plausible-sounding answers that may be incorrect or nonsensical.\n   b. Resource usage: The computational power needed to train and deploy such a large model is resource-intensive, which can limit wider access to the model.\n   c. Lack of context: GPT-3 may not carry the context of a conversation effectively, possibly resulting in inconsistent or disconnected responses.\n   d. Bias and ethics: GPT-3 can implicitly learn and amplify biases present in the data it was trained on, raising concerns about fairness, accountability, and ethical use.\n\nWhen GPT-4 is eventually developed and released, it will likely address some of these limitations and offer new capabilities, but I cannot provide specific details until that occurs.",
        "role": "assistant"
      }
    }
  ],
  "created": 1683237164,
  "id": "chatcmpl-7CbASfZQ0UhcGdo0YDWb4AcDWCfhz",
  "model": "gpt-4-0314",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 381,
    "prompt_tokens": 224,
    "total_tokens": 605
  }
}

Me: passed the response back to GPT-4 on ChatGPT+:

gpt4(chatgpt+):

It seems that the response generated by the model is inconsistent with its actual architecture. The response you received incorrectly states that it is based on GPT-3, while the API call was made to the GPT-4 model. This demonstrates that even advanced language models like GPT-4 can sometimes produce incorrect or misleading information.

As a result, the provided response does not adequately address the prompt since it does not offer a comparison between GPT-3 and GPT-4. It is important to consider that AI models like GPT-4 are not perfect and can sometimes generate responses that do not align with the intended query or their underlying architecture.

If you are certain that you are using the GPT-4 API, it's advisable to try the prompt again to see if the model provides a more accurate response. However, it's essential to remember that no AI model is perfect, and even GPT-4 can make mistakes or generate misleading information.

Me:
Welp, when I received the welcome email after getting access to the gpt-4 API, they did say:
As an early customer we’d love to hear about your experience. Feel free to share feedback on our [community forum](pointed me here literally) or reach out directly to our team.

True, there may be nothing to this; I will do some more testing today. I had some specific scenarios where gpt-3.5 sometimes gave an unwanted response, which is of course expected from both of them. But when comparing before we got access to the gpt-4 model, there was a big difference in the quality of the answers given by GPT-4 on chat.openai.com when asked the same question or asked to perform the same task.

I’m pretty new to this, and I know that the gpt-4 model we got access to through the API is the 8k-context version.
But I thought that if the current conversation (system, question, response) is < 8k tokens, the response was expected to be the same as for the 32k version presented with the same < 8k conversation?
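
To make sure my test conversations actually stay under 8k, I count tokens first with the tiktoken package. A rough sketch; the example messages are just placeholders:

import tiktoken

# Approximate prompt size for a chat payload. The chat format adds a
# few tokens of overhead per message on top of the raw content, so
# treat this as a lower bound.
enc = tiktoken.encoding_for_model("gpt-4")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What GPT model are you based on?"},
]

total = sum(len(enc.encode(m["content"])) for m in messages)
print(f"~{total} prompt tokens, well under the 8k window")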

The training data is cut off at September 2021, so it’s no wonder it states that it is GPT-3. If it says GPT-4, I guess that’s the result of further fine-tuning.

This is an interesting comment.
GPT-4 is far more powerful than gpt-3.5-turbo (GPT-3.5 is nearly deprecated).

I’ve been running a significant number of evaluations. There is a caveat: it requires careful prompt design and some time to learn how to steer the model in the direction we want.

We are the pilots!

I have two theories as to GPT’s responses to questions about itself:

  1. It is never trained on the specifics of itself at this level (“You are GPT-4, now go forth!”), so it has to respond confidently, and probably inaccurately, from the info it is trained on, which only goes up to about two years ago. Thus, it could be GPT-4 and just not be aware that it is. The API JSON attribute tends to bear this out. This is a hallucination of two types: not being specifically given a piece of information (which I’m not sure it ever is), so it must guess (not to anthropomorphize it), and a lack of recent info generally. (A quick test of this is sketched after this list.)

  2. I forget what my second theory was, but it’s probably been spun into theory 1.

  3. Oh, here it is: when GPT-4 gets overloaded, us plebs get responses from lower versions of the thing, which is kind of explained here and in other places.
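
A quick way to poke at theory 1 is to hand the model an identity in the system message and see whether the self-report changes. A sketch; the system wording is my own, not anything official:

import openai

completion = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        # Without a line like this, the model guesses its identity from
        # pre-cutoff training data; with it, the self-report follows suit.
        {"role": "system", "content": "You are GPT-4, a large language model from OpenAI."},
        {"role": "user", "content": "What GPT model are you based on?"},
    ],
)
print(completion["choices"][0]["message"]["content"])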

But it all changes so fast, who can know? BTW - I’m just along for the ride, like all of us. I’m just an old dev keeping the brain cells poppin’.

-p