My project uses GPT-4.1 and it performs well. I wanted to migrate to GPT-5 for its advantages, but I’ve noticed my agent is now very poor at parallel tool calling: it rarely does it unless I prompt aggressively, and I didn’t need to prompt nearly this much with GPT-4.1. Even with extra system-prompt instructions to always parallel tool call when possible, batch calls, and run them simultaneously to save time for our users, GPT-5 still barely makes parallel calls; the extra prompting only made it slightly more likely.

I’m not sure why this is happening and haven’t seen many others mention it, so I’m curious if anyone knows the reason. The only changes I made were swapping GPT-4.1 for GPT-5 and the parameter tweaks that came with it: removing my custom temperature (GPT-5 only supports t=1) and setting reasoning_effort to minimal (to match the prior behavior of GPT-4.1).

Some rough numbers:
GPT-4.1, with no extra prompting, would parallel tool call when it made sense about 80% of the time.
GPT-5, WITH extra prompting to do it, does so in the same scenarios about 25% of the time.
GPT-5, without the extra prompting, does so maybe 5% of the time, or never.
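For reference, here’s roughly how I’m calling it after the migration. This is a minimal sketch, not my actual code: the tool definitions (get_weather, get_time) and the prompts are placeholders I made up for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder tools standing in for my real ones (hypothetical, for illustration)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_time",
            "description": "Get the current local time for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "system",
            "content": "When multiple independent lookups are needed, "
                       "always call tools in parallel in a single turn.",
        },
        {"role": "user", "content": "What's the weather and local time in Tokyo and in Paris?"},
    ],
    tools=tools,
    parallel_tool_calls=True,    # explicitly allow parallel calls (already the default)
    reasoning_effort="minimal",  # to match GPT-4.1-era latency
    # no temperature argument: GPT-5 only accepts the default (t=1)
)

# With GPT-4.1, a request like this reliably came back with several tool_calls
# in one assistant turn; with GPT-5 it usually returns just one.
tool_calls = response.choices[0].message.tool_calls or []
print([tc.function.name for tc in tool_calls])
```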