What you're seeing is in line with what's commonly observed.
The gpt-4 models (0314, 0613) are the actual GPT-4 models. They're stronger in terms of reasoning and understanding, but more expensive.
strengths:
instruction following
weakness:
more prone to hallucinations
The gpt-4-turbo models (1106, 0125) don't seem to be actual GPT-4 models. Turbo means they're faster and cheaper, but also a little less capable in some regards. They appear to have a wholly different architecture compared to gpt-4, and I wouldn't consider them an upgrade. They're something different.
strengths:
slightly less prone to hallucinations
strong adherence to markdown
very predictable response format
weaknesses:
more opinionated
worse instruction following
They both have their pros and cons.
Now the system prompt, well, it's a curious thing.
In all the demos and docs you'll typically find the system prompt tacked onto the front of the conversation.
However, the bigger your conversation gets, the less relevant your system prompt becomes, especially if you tacked it onto the beginning. The absolute best way to ensure the model follows your instructions (in my experience) is to tack either a system message or a user message onto the very bottom of the conversation, telling the model what to do or how to behave.
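A minimal sketch of that idea, assuming the standard Chat Completions message format. The payload is plain data (no API key needed to illustrate the shape); in practice you'd pass `messages` to `client.chat.completions.create(...)`. The helper name and sample contents are my own, not from the thread.

```python
# Hypothetical sketch: repeat the instructions at the bottom of the
# conversation so they stay "close" to the model's next response.

def build_messages(history, instructions):
    """Prepend the system prompt and also repeat it as the final message."""
    return (
        [{"role": "system", "content": instructions}]
        + list(history)
        + [{"role": "system", "content": instructions}]  # refresh at the bottom
    )

history = [
    {"role": "user", "content": "Summarize this long document..."},
    {"role": "assistant", "content": "Here is a summary..."},
    {"role": "user", "content": "Now extract the key dates."},
]

messages = build_messages(history, "Answer only in bullet points.")
print([m["role"] for m in messages])
# -> ['system', 'user', 'assistant', 'user', 'system']
```

Whether the trailing message should use the system or user role is the judgment call discussed in the rest of the thread; the structure is the same either way.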
Interesting. I'm aware that you can string together multiple 'user' and 'assistant' messages and pass them, but can you also include multiple 'system' messages? If so, that's a game changer.
Most of these prompt abstractions are just made up anyway and have no actual programmatic foundation, so there's a lot of stuff you can do that nobody ever intended.
OpenAI just puts validators on some things, but as long as you can get past those you can do whatever you want.
I tried it out, but observing the requests in Helicone, it looks like it preserves the order of my user and assistant messages but pushes all system messages to the beginning (while still preserving their relative order). So it looks like we can't "refresh" the system prompt later in the conversation while still preceding it with the message history.
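To make the observation concrete, here is a sketch of the reordering described above. This mirrors my reading of the Helicone observation, not documented API behavior, and the function and sample messages are illustrative only:

```python
# Sketch of the reported behavior: system messages appear hoisted to the
# front (keeping their relative order), while user/assistant order is kept.

def observed_reorder(messages):
    systems = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    return systems + others

sent = [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "system", "content": "Answer in French from now on."},  # the "refresh"
    {"role": "user", "content": "How are you?"},
]

seen = observed_reorder(sent)
print([m["role"] for m in seen])
# -> ['system', 'system', 'user', 'assistant', 'user']
```

In other words, the mid-conversation "refresh" ends up adjacent to the original system prompt rather than where it was placed.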
Maybe not what you're looking for, but it seems this is possible using runs in the Assistants API. The instructions parameter can overwrite the instructions (per run, or per response, if you prefer), while additional_instructions (not sure that wording is 100% correct) adds to the pre-existing instructions.
This would imply that the instructions (akin to the system message) apply at the response level, though it's unclear where exactly they're placed in the order of messages, and it would require using the Assistants API, which might not suit your needs.
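For illustration, a hedged sketch of how the two parameters mentioned above would be passed when creating a run. The ids are placeholders, the parameter names are taken from the discussion above, and the exact precedence between them is not verified here:

```python
# Hypothetical sketch of per-run instruction overrides in the Assistants API.
# In practice these kwargs would be passed to
# client.beta.threads.runs.create(...).

run_kwargs = {
    "thread_id": "thread_placeholder",     # placeholder id
    "assistant_id": "asst_placeholder",    # placeholder id
    # Replaces the assistant's base instructions for this run only:
    "instructions": "You are a strict copy editor. Answer tersely.",
    # Appends to (rather than replaces) the existing instructions:
    "additional_instructions": "End every reply with a one-line summary.",
}

print(sorted(run_kwargs))
```

In practice you'd likely pick one of the two per run, depending on whether you want to overwrite the base instructions or extend them.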