I’m providing a list of “Potentially useful notes” in a system message. ChatGPT is not referencing these in its responses, even when the question, with the exact same text, is explicitly stated and a clear answer provided in the notes. If I add "Refer to your system message" to the user queries, it seems to work fine, but that’s an awful hack I shouldn’t have to add. Any suggestions?
My operating understanding of the attention mechanism is that it’s basically a super fancy autoregression.
It looks at signals in the token sequence that are conceptually similar to what’s currently at hand (i.e. at the end of the text), and pools that information to distill the next most likely token out of that.
So the best way to get the model to pay attention to the right thing is to ensure there is a high signal to noise ratio in your context.
Here are a few options:
- reducing clutter (reducing noise)
- using bribing, all caps, delimiters, or similar methods to maintain attention (increase source amplitude)
- forcing a chain of thought process that allows the model to discover the correct instruction (improve local signal) before processing.
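To make those three levers concrete, here is a minimal sketch of a prompt builder. The function name, delimiter style, and wording are my own assumptions for illustration, not any particular product's API:

```python
def build_prompt(notes: str, question: str) -> str:
    """Pack reference notes and a question into one high-signal prompt.

    Applies the three levers above: delimiters fence off the notes
    (increase source amplitude), nothing unrelated is included
    (reduce noise), and an explicit instruction asks the model to
    locate the relevant note before answering (improve local signal
    via a forced chain of thought).
    """
    return (
        "### REFERENCE NOTES (authoritative, consult these first) ###\n"
        f"{notes.strip()}\n"
        "### END NOTES ###\n\n"
        "First quote the single note most relevant to the question, "
        "then answer using only that note.\n\n"
        f"Question: {question.strip()}"
    )

prompt = build_prompt(
    "Office Wi-Fi password: see the IT portal.",
    "What is the Wi-Fi password?",
)
```

The exact delimiter text matters less than keeping it consistent and visually distinct from ordinary prose.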
I am assuming you are referring to ChatGPT, and that by system message you mean the custom instructions.
That’s how I’ve been doing it for some time now. My impression is that the custom instructions are passed in only once at the start of the conversation instead of being repeated at each conversational turn, but I could be wrong.
During longer conversations with ChatGPT it helps to pass in the custom instructions repeatedly via user message and/or add a reminder to actually use the guidance provided.
There is something else you can try and that’s adapting the user message to match the custom instructions in some way. If you can bring the model to recognize a pattern between both you can influence the behavior according to your needs.
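If you are driving the model programmatically, the re-injection tip above can be sketched as follows. This assumes the common Chat Completions message format (a list of role/content dicts); the helper name is my own:

```python
def with_instructions(custom_instructions: str, history: list[dict]) -> list[dict]:
    """Rebuild the message list for one conversational turn so the
    custom instructions are always present as the leading system
    message, instead of relying on a single system message sent only
    at the start of a long conversation.
    """
    # Drop any earlier copies so the instructions appear exactly once.
    trimmed = [m for m in history if m.get("role") != "system"]
    return [{"role": "system", "content": custom_instructions}, *trimmed]

history = [
    {"role": "user", "content": "Summarise my notes."},
    {"role": "assistant", "content": "Sure, which notes?"},
]
messages = with_instructions("Always answer from the provided notes.", history)
```

In the ChatGPT web UI there is no equivalent hook, which is why repeating the guidance in a user message is the practical workaround.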
Ownership & Attribution Notice
This is an original approach that I have developed. I am sharing it here for discussion, but I retain authorship rights over the full concept.
This idea is shared under a Creative Commons Attribution License (CC BY 4.0), meaning anyone who builds upon or implements it must credit me as the original creator.
For timestamped proof of my authorship, this concept was first publicly posted by me here on January 14, 2024.
If OpenAI or any other organization finds this concept valuable, I am open to further discussion.
I know solutions; I just don’t want to disclose my specific set-up. Sorry for the ambiguity:
I figured out a specific set-up within a GPT that forces ChatGPT (when chatting with a model in that GPT set-up) to always stick to a user-customisable workflow (without API use), inside a dynamic, masked, auto-orchestrated loop.
It has two phases: the initialisation and the workflow itself. The initialisation consistently enforces the customisable workflow and always uses the instructions field of the GPT. In that set-up, the instructions field explicitly instructs the model to use: /mnt/data/Instructions_script.py.
Embedded in that Instructions_script.py are the true instructions (which could be the equivalent of your custom instructions). In my set-up I took it one step further: once executed as instructed by the GPT, Instructions_script.py does not contain a single fixed instruction; instead, it instructs the model to use the other files. These include a config.csv with the instructions you want to inject, and another file type I do not disclose (but which is likely replaceable); let’s call it loop.file. That file is updated by scripts other than Instructions_script.py in order to inject, step by step and in a dynamic turn-based manner, the instructions contained in config.csv, all orchestrated by Instructions_script.py.
Using this logic, you could consistently enforce not only your custom instructions throughout a chat via such a set-up/GPT, but, if you wish, an entire customisable graphical workflow.
The specific way I implemented it has been shown to always work as expected and as instructed by that config for the entire chat. So I expect that, without my needing to disclose the full specifics, you can engineer a similar system within your own GPT to make ChatGPT do exactly as you wish.
In return: I’m new on this forum and don’t know how impactful my system could be for other people such as you. Could you please give me some feedback? I don’t understand why OpenAI hasn’t implemented such a system for all users already; I figure either I’m missing some essential trade-off they already knew about, or I need to further refine what I know in order to help them implement a similar system for all users.
Hi @jean.m
Welcome to the community!
Recreating a system you haven’t fully detailed, setting it up within our own GPTs to achieve exactly the behavior you’re describing, would take quite a lot of time, if we could achieve it at all. Also, providing feedback on something whose working process we haven’t fully understood is difficult, and perhaps no one will be able to.
If you have a public Custom GPT that uses this method, sharing a link here might be more helpful. That way, other members can test it and provide feedback based on their actual experience with your setup.
I think OpenAI’s o1 series follows instructions better and hallucinates less. Maybe in the future custom GPTs will be powered by o1 or o3 models, who knows?!
Thank you for your thoughtful response! I really appreciate your engagement. Let me directly address your points.
On Fully Understanding & Recreating the System
I understand that without a more structured explanation, it might be difficult to fully grasp how this system works. My approach follows specific principles on how structured GPTs can be set up, and I plan to refine my original message to better clarify this.
Your response has actually motivated me to do so, as I was not yet expecting to truly get such positive feedback this fast—thank you, haha!
That being said, I can rephrase a simple outline of a key aspect more concretely here:
- The system operates in a two-phase process:
Initialization Phase – The GPT is primed with structured predefined behaviors to reinforce system instruction adherence.
Workflow Execution Phase – User inputs trigger controlled processing, auto-initiating the GPT into the containerized workflow that the user can control and set up during conversation.
- (Or through updating its knowledge files, if the customization window is accessible to the user, such as when you are the GPT’s creator.)
On Public Access & Sharing a Custom GPT Link
I don’t have a publicly available version yet, but I have shared it with a few personal contacts for validation. Their feedback confirms that it is working as intended in terms of instruction-following and workflow execution.
Given the complexity of explaining this without an interactive demo, I may later release:
A public test version to make it easier for users to explore how it works.
A companion GPT to interactively explain key insights.
How This Companion GPT Would Work:
- In an earlier, less effectively constrained version, I observed that the GPT could provide deep emergent insights into its own inner mechanisms when prompted in specific ways within its knowledge set-up.
- However, in the current version (which works fully as intended for structured workflows), these insights are naturally restricted, due to the constraints needed for the workflow to adhere properly to the system instructions.
To address this, I plan to create a duplicate GPT within my system, specifically configured to:
- Restore the level of self-awareness I previously observed.
- Allow it to interactively explain how it operates, making it easier for users to understand its structure.
- Keep the primary GPT focused on execution while embedding insight-generation as a nested feature within a parallel configuration.
The ability to configure and scale GPT behavior allows for highly flexible customization and possibilities for new features. While this aspect is difficult to explain compactly, the dual-GPT approach ensures that both structured execution and insightful explanation are available in their respective contexts. I hope this makes the explaining part on my side less effortful and less ambiguous, since all in all the possibilities are quite complex when you delve into them deeply, at least as I perceive it. There are lots of different correct perspectives on the same aspects.
On OpenAI’s o1 Series & Future Model Selection
I agree that OpenAI’s o1 models follow instructions better and hallucinate less, aligning with my system’s goals.
- Currently, o1 models aren’t directly selectable for custom GPTs—only GPT-3.5 and GPT-4 are available.
- If OpenAI later enables direct o1 integration, it could further enhance structured adherence.
- Even without direct access, I believe my approach can embed o1-qualities through the expandable instruction-following mechanisms, making structured execution for proper reasoning possible even with non-reasoning models.
Final Thoughts
I appreciate your interest and your valuable input! I’d be happy to expand on specific aspects if needed. I know it’s quite an ambitious project :sweat_smile:, but as I’ve validated at least that the GPT is able to execute as outlined, I trust my intuitive insights and quasi-systematic reasoning in propelling it forwards.