How does gpt-4-1-mini compare with gpt-4o-mini for tool calling?

I’m using gpt-4o-mini to select which function to call and to set the selected function’s parameters. How does gpt-4-1-mini compare to gpt-4o-mini for this purpose? I’ve noticed that gpt-4o-mini sometimes confabulates an argument that does not exist, or sets a value with the wrong type. It also often picks the wrong tool when two tools are too similar. I’ve implemented guardrails to catch and correct these errors, so I’d upgrade to gpt-4-1-mini only if it were a LOT more reliable, not if it’s only slightly more so.
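
For illustration, a guardrail of the kind described above could look roughly like the Python sketch below. It is only a sketch under assumptions: the `get_weather` tool, its parameters, and the `validate_tool_call` helper are hypothetical names, and the check covers only top-level argument names and primitive types, not nested schemas.

```python
import json

# Hypothetical tool definition in the JSON-schema style used for function calling.
GET_WEATHER = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["city"],
    },
}

# Map JSON-schema type names to Python types (a small subset, for illustration).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_tool_call(tool, raw_arguments):
    """Return a list of problems found in a model-produced argument string."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    schema = tool["parameters"]
    props = schema.get("properties", {})
    problems = []

    # Required arguments the model left out.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")

    for name, value in args.items():
        # Arguments the model made up.
        if name not in props:
            problems.append(f"unknown argument: {name}")
            continue
        expected = props[name].get("type")
        py_type = _JSON_TYPES.get(expected)
        if py_type is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, py_type):
            problems.append(f"wrong type for {name}: expected {expected}")

    return problems
```

Running the same check over both models’ tool calls on a fixed set of prompts would give a rough, comparable error count for each, which is one way to judge whether the difference is large or only slight.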