Why are gpt-4-preview models giving me subpar performance? Please advise

When I ask a question with the following parameters:

  1. A system prompt that includes instructions on how to answer (a persona)
  2. A user prompt = question + retrieved nodes (docs with the data needed to answer the question), roughly as sketched below
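
In other words, the request looks roughly like this (a simplified sketch; the persona text, question, and retrieved docs are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder persona and retrieved context, just to illustrate the structure
persona = "You are a meticulous financial analyst. Answer concisely in bullet points."
retrieved_nodes = "\n\n".join(["<doc 1 text>", "<doc 2 text>"])
question = "What was the revenue growth in Q3?"

response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": f"{question}\n\nContext:\n{retrieved_nodes}"},
    ],
)
print(response.choices[0].message.content)
```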

The AI is not listening to the persona in the system prompt with the following models:

  1. gpt-4-turbo-preview
  2. gpt-4-1106-preview
  3. gpt-4-0125-preview

I saw this trend across all of the preview models (the latest models with 128K context size).

But the AI honors my persona very well in the older models, with the exact same prompt combination:

  1. gpt-4-0613
  2. gpt-4
  3. gpt-3.5-turbo-16k-0613

Is it possible that preview models are not good at honoring the system prompt?

Welcome to the community!

What you’re seeing seems to be in line with what is commonly observed.

  1. The gpt-4 models (0314, 0613) are the actual GPT-4 models. They’re stronger in terms of reasoning and understanding, but more expensive.

    • strengths:
      • instruction following
    • weakness:
      • more prone to hallucinations
  2. The gpt-4-turbo models (1106, 0125) don’t seem to be actual GPT-4 models. “Turbo” means they’re faster and cheaper, but also a little less capable in some regards. They appear to have a wholly different architecture compared to gpt-4, and I wouldn’t consider them an upgrade; they’re something different.

    • strengths:
      • slightly less prone to hallucinations
      • strong adherence to markdown
      • very predictable response format
    • weaknesses:
      • more opinionated
      • worse instruction following

They both have their pros and cons.

Now the system prompt, well, it’s a curious thing.

In all the demos and docs you’ll typically find the system prompt tacked to the front of the conversation.

However, the bigger your document gets, the less relevant your system prompt will become, especially if you tacked it onto the beginning of the conversation. The absolute best way to ensure the model follows your instructions (in my experience) is to tack either a system message or a user message to the very bottom of the conversation telling the model what to do or how to behave.
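
For example, something along these lines (a minimal sketch, assuming the standard chat completions endpoint; the persona and reminder text are just illustrations):

```python
from openai import OpenAI

client = OpenAI()

persona = "You are a meticulous financial analyst. Answer concisely in bullet points."

messages = [
    {"role": "system", "content": persona},
    # ... long conversation and/or retrieved documents in between ...
    {"role": "user", "content": "What was the revenue growth in Q3?\n\nContext:\n<retrieved docs>"},
    # Reminder tacked onto the very bottom of the conversation
    {"role": "system", "content": f"Reminder: {persona}"},
]

response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=messages,
)
```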


Interesting. I am aware that you can string together multiple ‘user’ and ‘assistant’ messages and pass them, but can you also put in multiple ‘system’ messages? If so, that’s a game changer.

sure!

most of these prompt abstractions are just made up anyway and have no actual programmatic foundation, so there’s a lot of stuff you can do that nobody ever intended.

OpenAI is just putting validators on some stuff but as long as you can get past those you can do whatever.
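
For instance, the chat completions endpoint will happily accept something like this (a hypothetical sketch; how much weight the model gives the later system message is model-dependent):

```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are a pirate. Answer in pirate speak."},
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Arr, that be Paris, matey."},
    # A second system message mid-conversation; the API doesn't reject this
    {"role": "system", "content": "From now on, answer in exactly one word."},
    {"role": "user", "content": "What's the capital of Spain?"},
]

response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=messages,
)
```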


I tried it out, but observing it in Helicone, it looks like the order of my user and assistant messages is preserved while all system messages get pushed to the beginning (though still in their original order). So it seems we can’t “refresh” the system prompt later in the conversation while keeping the message history in front of it. 🙁

Maybe not what you are looking for, but it seems this is possible using runs in the Assistants API. The instructions parameter overrides the assistant’s instructions for that run (or response, if you prefer), while additional_instructions (not sure that wording is exactly right) appends to the pre-existing instructions.

This would imply that the instructions (akin to the system message) apply at the response level, though it’s unclear where exactly they’re placed in the order of messages. It would also require using the Assistants API, which might not suit your needs.
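
For reference, a rough sketch against the beta Assistants API (the IDs and instruction text are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# `instructions` replaces the assistant's default instructions for this run only
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    instructions="You are a meticulous financial analyst. Answer concisely in bullet points.",
)

# `additional_instructions` instead appends to the assistant's existing instructions
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    additional_instructions="Always cite which document a figure comes from.",
)
```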