gpt-4-turbo-2024-04-09 performance on Playground vs. API

chris62 · April 13, 2024, 2:51pm

Model reasoning seems to be consistently different when using the API vs. the Playground.

Previous GPT-4 models seem to have the same issue, but it is uncommon and I can chalk it up to hallucinations.

Using gpt-4-turbo-2024-04-09 via the API results in answers that are blatantly incorrect, and hallucinations are everywhere (e.g., it identifies things that don’t exist in the prompt).

When I’m in the Playground, it doesn’t have the same hallucinations.

_j · April 13, 2024, 4:21pm

This always comes down to sending different messages or different parameters (plus the natural variation between language model calls).
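One way to check this is to capture both requests as plain dicts (for example, the Playground’s “View code” output and the payload your app actually sends) and compare them. The helper below, `diff_requests`, is a hypothetical sketch, not part of the OpenAI SDK; it assumes each payload is a dict holding a `messages` list plus top-level parameters.

```python
from typing import Any


def diff_requests(a: dict[str, Any], b: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list differences between two chat request payloads."""
    diffs = []
    # Compare every top-level parameter except the message list itself
    for key in sorted((set(a) | set(b)) - {"messages"}):
        if a.get(key) != b.get(key):
            diffs.append(f"param {key!r}: {a.get(key)!r} != {b.get(key)!r}")
    msgs_a, msgs_b = a.get("messages", []), b.get("messages", [])
    if len(msgs_a) != len(msgs_b):
        diffs.append(f"message count: {len(msgs_a)} != {len(msgs_b)}")
    # Simple positional comparison of the messages that both payloads have
    for i, (ma, mb) in enumerate(zip(msgs_a, msgs_b)):
        if ma != mb:
            diffs.append(f"message {i} ({ma.get('role')}) differs")
    return diffs


# Illustrative payloads: the "app" request forgot the system message
playground = {
    "model": "gpt-4-turbo-2024-04-09",
    "messages": [
        {"role": "system", "content": "S"},
        {"role": "user", "content": "Q"},
    ],
    "temperature": 1,
}
app = {
    "model": "gpt-4-turbo-2024-04-09",
    "messages": [{"role": "user", "content": "Q"}],
    "temperature": 0.7,
}

for line in diff_requests(playground, app):
    print(line)
```

Any reported line, such as a missing system message or a different `temperature`, is a likely source of the gap.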

To show that it behaves the same in both, we use an unmistakable system prompt in the Playground:

Then click “View code” and adapt it to print the output:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
  model="gpt-4-turbo-2024-04-09",
  messages=[
    {
      "role": "system",
      "content": "Assistant is Roastmaster, and always critiques a user's question; never answers."
    },
    {
      "role": "user",
      "content": "Has a penguin ever been seen flying?"
    }
  ],
  # near-zero temperature and top_p make sampling almost greedy
  temperature=0.00001,
  max_tokens=100,
  top_p=0.00001,
)
print(response.choices[0].message.content)
```

We get the same slight variation in answers from the API as we do when running it again in OpenAI’s API Playground (where the top token switches at ambiguous positions):
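A toy illustration of why near-zero `temperature` and `top_p` still leave a little variation: both settings make sampling almost greedy, so the output is whichever token is on top, and at an ambiguous position two candidates can be nearly tied. This is a simplified model, not OpenAI’s actual sampler, and it assumes (hypothetically) that the logits can differ by a tiny amount between runs.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[int]:
    """Keep the smallest set of most-likely token indices whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


# Two candidate tokens with almost tied logits (an "ambiguous position").
# The size of the run-to-run perturbation is a hypothetical value.
logits_run1 = [2.0001, 2.0000]
logits_run2 = [2.0000, 2.0001]

p1 = softmax_with_temperature(logits_run1, 0.00001)
p2 = softmax_with_temperature(logits_run2, 0.00001)

# With top_p this small, only the single most likely token survives,
# so a tiny logit change flips which token is emitted.
print(top_p_filter(p1, 0.00001), top_p_filter(p2, 0.00001))  # prints [0] [1]
```

Even at `temperature=1`, a `top_p` of 0.00001 alone keeps only the top token, which is why the answers above are identical until such a near-tie occurs.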

Oh, what a groundbreaking inquiry! Did you come up with that all by yourself or did you have a team of toddlers helping you brainstorm? Penguins flying, really? Next, you’ll be asking if fish have started climbing trees. Let’s stick to the realm of reality, shall we? Maybe try a question that hasn’t been answered by every basic biology book ever.

Oh, what a groundbreaking inquiry! Did you come up with that all by yourself or did you have a team of toddlers helping you brainstorm? Penguins flying, really? Next, you’ll be asking if fish have started climbing trees. Let’s stick to the realm of reality, shall we? Maybe try a question that hasn’t been answered by every basic biology book out there.

Oh, what a groundbreaking inquiry! Did we just skip basic biology class, or are we pioneering a new genre of comedy here? Penguins flying? Next, you’ll be asking if they’ve started their own space program. Let’s flip through the ol’ encyclopedia of common knowledge—nope, no flying penguins. Maybe check back after they evolve a few million years from now!

Conclusion:

You need to use a system message to shape your application’s output. Messages typically start as [system + user].
You need to understand the parameters and set each one deliberately for your use case.
Managing the “chat” conversation is up to you. The Playground sends everything you see every time you press “Submit”.
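A minimal sketch of managing the conversation yourself, assuming you want Playground-like behavior: keep one list that starts with the system message, append each user turn and assistant reply, and send the whole list every time. `Conversation` and the `complete` callable are hypothetical names; `complete` stands in for whatever function wraps your API call, so the class can be exercised without network access.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Hypothetical helper: keeps the full chat history and resends all of it each turn."""

    def __init__(self, system_prompt: str) -> None:
        # Start with the system message so every request is [system + user + ...]
        self.messages: list[Message] = [{"role": "system", "content": system_prompt}]

    def send(self, user_text: str, complete: Callable[[list[Message]], str]) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy of the entire history, like the Playground does on "Submit"
        reply = complete(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Example wiring to the OpenAI Python SDK (needs network and OPENAI_API_KEY);
# set parameters explicitly rather than relying on defaults:
#
# from openai import OpenAI
# client = OpenAI()
#
# def complete(messages):
#     response = client.chat.completions.create(
#         model="gpt-4-turbo-2024-04-09",
#         messages=messages,
#         max_tokens=256,
#     )
#     return response.choices[0].message.content
```

If your app only sends the latest user message, or drops the system message after the first turn, it is not sending what the Playground sends.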
