I posted this as a reply to another post that explained how the API converts between system prompt and developer message, and vice versa.
I’m reposting it as its own topic because I drifted into feedback territory, and “API - Feedback” is a better place for it.
It is very good that you do such conversions behind the scenes, as it’s a great quality of life improvement.
However, the documentation still lacks explanation of what is germane to each family of models. Well, we know what is germane to non-reasoning models, of course.
I tried to think about this conceptually over the past few days.
You have:
- non-reasoning models
- reasoning models
- a chat-focused API
- an assistant-focused API
This is rapidly leading to 4 types of parameter arrangements and might become error-prone for the unwary.
Should reasoning models have an API of their own? Should you think of “chat,” “assist,” and “reason” as modes of operation that should fall under a single API?
Unrelated, but also important…
I am very glad to see progress on the conceptualization of prompt hierarchy and the prioritization of prompts at each level. Documenting this is very important, I think.
The temptation would be to allow developers to assign priority factors to levels, but I think it would just make things more unpredictable. Hence, I think you should control and define this, as you are right now.
In fact, the way the hierarchy of prompts works reminds me of the OSI model (OSI model - Wikipedia). I think that approach could be formalized further to establish degrees of model interaction.
Lastly, there is the text of prompts themselves. After using this for a few years now, it has become clear there is often a need for a markup hierarchy inside prompts themselves. This will become even more acute for reasoning model prompts.
An example of what I mean: If you structure your prompt using Markdown at the top level, then Markdown deeper inside the structure of the prompt causes problems.
I have arrived at a markup hierarchy that uses all of: XML, Markdown, Plain Text, JSON.
If the encapsulation of prompt text “makes sense,” I found that most models interpret the input as I do.
That being said, the argument will be that any truly intelligent model will be uncannily good at parsing prompt text regardless of its structure. Until then, though, it will make a huge difference.
As a follow-up thought, behind-the-scenes conversion between system prompt and developer message kind of encourages people to keep doing what they have always been doing.
Is that a quality-of-life feature you are ready to maintain forever?