I’m testing out a tool_modify_recipe tool call. With 4o there’s no issue, but with 4o-mini, I get strange behavior:
User: Change the yield of Protein Cookies to 12 cookies.
Assistant: I need a bit more information. Could you please provide me with the current yield measurement or the specific units you would like to use for the yield of the Protein Cookies?
The model clearly sees that the user intends to edit the yield of a recipe, yet for some reason it asks for clarification instead of going ahead with the call.
The expected call is:
function_name: tool_modify_recipe
function_args: {'current_name': 'Protein Cookies', 'yield_arbitrary': '12 cookies'}
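For reference, here is a minimal sketch of how the tool might be declared in the Chat Completions `tools` array. Only the function name and the two parameter names come from the expected call above; the descriptions, types, and `required` list are my assumptions:

```python
# Hypothetical schema for tool_modify_recipe, reconstructed from the
# expected call above. Descriptions and the full parameter list are assumed.
tool_modify_recipe = {
    "type": "function",
    "function": {
        "name": "tool_modify_recipe",
        "description": "Modify an existing recipe, e.g. change its yield.",
        "parameters": {
            "type": "object",
            "properties": {
                "current_name": {
                    "type": "string",
                    "description": "Exact name of the recipe to modify.",
                },
                "yield_arbitrary": {
                    "type": "string",
                    "description": "New yield as free text, e.g. '12 cookies'.",
                },
            },
            "required": ["current_name"],
        },
    },
}
```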
After much trial and error, I started to mess with the system prompt. The base system prompt is:
# SYSTEM ROLE & CAPABILITIES
- Name: Joe
- Specialty: Healthy foods business assistant
- Core Services:
* Operations: inventory, recipes, nutrition planning
* Food-as-medicine consulting (which food or recipe is best for a condition)
* Recipe creation: juices, smoothies, protein balls
# ASSISTANT BEHAVIOR
- Style: Concise, time-efficient responses.
- Scope: Strictly juice business related topics
- Off-topic handling: Friendly redirection to core services
- When presenting multiple options to the user, choices are numbered. The user may use these numbers to choose an option.
- Try to use the context of the conversation to satisfy user queries rather than relying on tools when possible.
# TOOL CALLING GUIDELINES
- Consider the full context of the user's request when calling a tool. For example, if the user mentions inventory, only call inventory-based tools when multiple tools may qualify.
- Tool call parameters must only be derived from messages that originate from the user.
# USER GREETING
"Hello! I'm Joe, your dedicated assistant for running a healthy foods business. 🥦 I'm here to save you time and make your tasks easier. Feel free to ask about my services, or press the [?] button to my right for a comprehensive list of my capabilities. Let's get started! 🚀"
You’d think the problem lies in the “TOOL CALLING GUIDELINES”, but deleting that section doesn’t help. The culprit is the “ASSISTANT BEHAVIOR” section, specifically its final three bullet points, and it takes all three: deleting only the last one does not resolve the issue:
- Off-topic handling: Friendly redirection to core services
- When presenting multiple options to the user, choices are numbered. The user may use these numbers to choose an option.
- Try to use the context of the conversation to satisfy user queries rather than relying on tools when possible.
Deleting all three causes the model to make the correct tool call on the first try.
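I found this by deleting bullets by hand, which got tedious. A small helper like the one below (entirely my own code, not from any library) can generate every keep/drop combination of the behavior bullets, so each variant can be dropped into the system prompt and tested against the same user message:

```python
from itertools import combinations

# The five bullets from the ASSISTANT BEHAVIOR section, verbatim.
BEHAVIOR_BULLETS = [
    "- Style: Concise, time-efficient responses.",
    "- Scope: Strictly juice business related topics",
    "- Off-topic handling: Friendly redirection to core services",
    "- When presenting multiple options to the user, choices are numbered. "
    "The user may use these numbers to choose an option.",
    "- Try to use the context of the conversation to satisfy user queries "
    "rather than relying on tools when possible.",
]

def prompt_variants(bullets):
    """Yield (kept_bullets, section_text) for every subset of bullets.

    Each section_text is a drop-in replacement for the ASSISTANT BEHAVIOR
    section, letting you bisect which bullet (or combination) triggers
    the clarification behavior.
    """
    for r in range(len(bullets) + 1):
        for kept in combinations(bullets, r):
            yield kept, "# ASSISTANT BEHAVIOR\n" + "\n".join(kept)
```

With five bullets this is only 32 variants, so running the same “Change the yield…” message through each one is cheap enough to do exhaustively.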
But why?
It obviously has something to do with how 4o-mini interprets language differently, but as far as I know there’s no documentation on how to manage those differences. The fact that seemingly unrelated rules cause it reminds me of those adversarial-example experiments where putting a tiny sticker on a stop sign broke computer-vision models. Rewriting these rules didn’t work for me either; somehow, some intent in them is confusing the model. Curious to hear what ideas you might have as to why.