gpt-4-turbo-2024-04-09 performance on Playground vs API

Model reasoning seems to be consistently different when using the API vs. the playground.

Previous GPT-4 models seem to have the same issue, but it is uncommon and I can chalk it up to hallucinations.

Using gpt-4-turbo-2024-04-09 via the API results in answers that are blatantly incorrect, and hallucinations are everywhere (e.g., it identifies things that don’t exist in the prompt).

When I’m on the playground, it doesn’t have the same hallucinations.

This always comes down to sending different messages, with different parameters (and the natural variation between different language model calls).
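One way to check this is to put the request the playground shows under “view code” next to the request your own code sends, and list every field that differs. The helper below is only a sketch: `diff_requests` and the two example dictionaries are hypothetical and made up for illustration, not part of the OpenAI library.

```python
def diff_requests(playground_request, api_request):
    """Return {field: (playground_value, api_value)} for every field that differs."""
    missing = object()  # sentinel so that an explicit None still counts as "set"
    differences = {}
    for key in sorted(set(playground_request) | set(api_request)):
        a = playground_request.get(key, missing)
        b = api_request.get(key, missing)
        if a != b:
            differences[key] = (
                "<not set>" if a is missing else a,
                "<not set>" if b is missing else b,
            )
    return differences


# Hypothetical example: the API call forgot the system message and temperature.
playground = {
    "model": "gpt-4-turbo-2024-04-09",
    "temperature": 1,
    "messages": [
        {"role": "system", "content": "You are a careful analyst."},
        {"role": "user", "content": "Summarize the report."},
    ],
}
api = {
    "model": "gpt-4-turbo-2024-04-09",
    "messages": [{"role": "user", "content": "Summarize the report."}],
}

for field, (pg_value, api_value) in diff_requests(playground, api).items():
    print(f"{field}: playground={pg_value!r} api={api_value!r}")
```

Any field that shows up here (a missing system message, a different temperature, a shorter message list) is a candidate explanation before blaming the model.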

To see that it works fine, we use an unmistakable system prompt:

Then click “view code” in the playground and adapt the generated snippet so it prints the output:

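from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4-turbo-2024-04-09",
  messages=[
    # The system message sets the behavior; without it the output changes completely.
    {
      "role": "system",
      "content": "Assistant is Roastmaster, and always critiques a user's question; never answers."
    },
    {
      "role": "user",
      "content": "Has a penguin ever been seen flying?"
    }
  ],
  # Near-zero sampling settings keep the output as repeatable as possible.
  temperature=0.00001,
  max_tokens=100,
  top_p=0.00001,
)
# Print only the assistant's reply text.
print(response.choices[0].message.content)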

We then get the same slight variation in answers from the API as we do when re-running in OpenAI’s API playground (the top tokens can switch at ambiguous positions).

Oh, what a groundbreaking inquiry! Did you come up with that all by yourself or did you have a team of toddlers helping you brainstorm? Penguins flying, really? Next, you’ll be asking if fish have started climbing trees. Let’s stick to the realm of reality, shall we? Maybe try a question that hasn’t been answered by every basic biology book ever.

Oh, what a groundbreaking inquiry! Did you come up with that all by yourself or did you have a team of toddlers helping you brainstorm? Penguins flying, really? Next, you’ll be asking if fish have started climbing trees. Let’s stick to the realm of reality, shall we? Maybe try a question that hasn’t been answered by every basic biology book out there.

Oh, what a groundbreaking inquiry! Did we just skip basic biology class, or are we pioneering a new genre of comedy here? Penguins flying? Next, you’ll be asking if they’ve started their own space program. Let’s flip through the ol’ encyclopedia of common knowledge—nope, no flying penguins. Maybe check back after they evolve a few million years from now!

Conclusion:

  • You need to use a system message to shape your application’s output. Messages are typically [system + user] to start.
  • You need to understand the parameters, and put each to use in a suitable manner.
  • Conversation management of “chat” is up to you. The playground sends everything you see every time you press “submit”.
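The last point can be sketched in code. This is a minimal illustration, assuming the `openai` Python client used earlier in this post; `build_messages` and `chat_turn` are hypothetical helper names, and the history list is something your application has to keep itself.

```python
def build_messages(system_prompt, history, user_input):
    """Assemble the full message list that must be sent on every request."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # all earlier user/assistant turns
    messages.append({"role": "user", "content": user_input})
    return messages


def chat_turn(client, system_prompt, history, user_input,
              model="gpt-4-turbo-2024-04-09"):
    """Send one turn with the whole conversation, then record it in history."""
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(system_prompt, history, user_input),
        temperature=0.00001,
        top_p=0.00001,
        max_tokens=100,
    )
    reply = response.choices[0].message.content
    # The API keeps no memory between calls, so store both sides of the turn.
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": reply})
    return reply
```

Usage would look like `history = []`, then `chat_turn(OpenAI(), "Assistant is Roastmaster, ...", history, "Has a penguin ever been seen flying?")` for each new user message. If the history is dropped or the system message is left out, the model sees a different prompt than the playground sends, and the answers will differ accordingly.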