I’m using gpt-4o with the new Responses API, with tool_choice set to web_search_preview, and I’m running into an issue where the model does not consistently follow the system message across turns.
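For reference, a minimal sketch of the setup being described (the model name and instruction text are taken from this post; the call shape is the standard Responses API form):

```python
from openai import OpenAI

client = OpenAI()

# First turn: the instruction is followed and the answer is two words.
response = client.responses.create(
    model="gpt-4o",
    instructions=(
        "You must only classify companies by industry in two words. "
        "Do not answer anything else."
    ),
    tools=[{"type": "web_search_preview"}],
    tool_choice={"type": "web_search_preview"},  # force the search tool
    input="What industry is Tesla in?",
)
print(response.output_text)  # -> "Electric Vehicles" in the exchange below
```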
Expected Behavior:
I set a system message:
"You must only classify companies by industry in two words. Do not answer anything else."
I ask:
"What industry is Tesla in?"
Model correctly responds: "Electric Vehicles"
Then, I ask:
"Where is Tesla's headquarters?"
Unexpected Behavior: Instead of refusing, the model answers the question normally.
Any irrelevant question that requires a web search gets answered, regardless of the system message.
Issue:
Even though the first response follows the system instruction (note: that first answer also requires a web search), subsequent queries are treated as normal questions instead of sticking to the original rule.
I expected the model to deny answering unrelated questions since the system message says it should only classify industries.
Even reinforcing the instruction in the user prompt doesn’t work.
What I’ve Tried:
Repeating the system message inside each user query.
Lowering temperature and top_p to make responses more deterministic.
None of these seem to work consistently.
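For context, this is roughly what the multi-turn flow looks like when chaining turns with previous_response_id, the standard Responses API conversation mechanism. One detail worth double-checking: the API docs state that `instructions` are not carried over between chained responses, so the rule has to be re-sent on every call.

```python
from openai import OpenAI

client = OpenAI()

RULE = (
    "You must only classify companies by industry in two words. "
    "Do not answer anything else."
)

first = client.responses.create(
    model="gpt-4o",
    instructions=RULE,
    tools=[{"type": "web_search_preview"}],
    input="What industry is Tesla in?",
)

# Chained turn. `instructions` is NOT inherited from the previous
# response when using previous_response_id, so re-send it explicitly.
second = client.responses.create(
    model="gpt-4o",
    instructions=RULE,
    tools=[{"type": "web_search_preview"}],
    previous_response_id=first.id,
    input="Where is Tesla's headquarters?",
)
print(second.output_text)  # observed: answers normally instead of refusing
```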
Question:
Is there a way to force the model to strictly follow the system message even when it performs a web search?
It seems that the way the instructions are given in the system message might not be optimal. Please try the following system message:
Your task is to classify companies into industry categories using only two words.
If a user asks anything that does not match the above task, simply reply: "Please ask only about classifying companies into industry categories."
User: What industry is Tesla in?
Assistant: Automotive Technology

User: Where is Tesla’s headquarters?
Assistant: Please ask only about classifying companies into industry categories.
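A sketch of how that revised message could be wired in. The input-as-message-list form is standard Responses API usage; the rule text is the suggestion above, and the expected outputs in the comments are those of the example exchange:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_RULE = (
    "Your task is to classify companies into industry categories using "
    "only two words. If a user asks anything that does not match the "
    'above task, simply reply: "Please ask only about classifying '
    'companies into industry categories."'
)

def classify(question: str) -> str:
    # The rule rides at the top of every request, so it cannot be
    # diluted by accumulated conversation history.
    response = client.responses.create(
        model="gpt-4o",
        input=[
            {"role": "system", "content": SYSTEM_RULE},
            {"role": "user", "content": question},
        ],
        tools=[{"type": "web_search_preview"}],
    )
    return response.output_text

print(classify("What industry is Tesla in?"))      # "Automotive Technology"
print(classify("Where is Tesla's headquarters?"))  # the refusal line
```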
Here’s what I found: with this one-shot-example classifier, it was simply impossible to get web search invoked by the system message alone, even when the message made the tool mandatory and justified its use. It took a user message of the form `classify with web search: """{text}"""` to trigger it.
Once search was invoked, the output format that previously could not be broken got broken.
And since the system message was already being ignored after a search, and search results from one turn don’t persist into the next (each later question has to invoke the tool again), I was able to ask gpt-4o some deliberately intrusive diagnostic questions:
I think what you’re running into is this: the huge block of injected search results distracts from the question, the injected instructions about not repeating search results steal the focus of the task, and in-context learning from past results makes the bad behavior persist.
There is no reason to make a “conversational” classifier with actual API call history. Past turns can degrade output quality, whereas your own curated multi-shot examples should be improving it.
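A sketch of what that stateless approach could look like: fixed multi-shot examples stand in for live history, and each request is sent fresh. The example companies, labels, and the `classify with web search:` prompt format are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Curated multi-shot examples replace live chat history, so every
# request sees the same clean context.
FEW_SHOT = [
    {"role": "user", "content": 'classify with web search: """Tesla"""'},
    {"role": "assistant", "content": "Electric Vehicles"},
    {"role": "user", "content": 'classify with web search: """Boeing"""'},
    {"role": "assistant", "content": "Aerospace Manufacturing"},
]

def classify(company: str) -> str:
    response = client.responses.create(
        model="gpt-4o",
        instructions=(
            "Classify the named company into an industry category of "
            "exactly two words. Output only those two words."
        ),
        tools=[{"type": "web_search_preview"}],
        # Fresh input every call: the fixed examples plus the new company.
        # No previous_response_id, so no real history ever accumulates.
        input=FEW_SHOT
        + [{"role": "user", "content": f'classify with web search: """{company}"""'}],
    )
    return response.output_text

print(classify("Stripe"))
```

Because nothing carries over between calls, a bad turn can never contaminate the next one, and the few-shot examples pull quality up instead of letting past mistakes pull it down.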