Given that GPT-4o has a different underlying architecture than the other GPT-4 models, it is fair to assume that prompts may need to be amended to achieve outputs similar to those of earlier models.
I do agree with your observation, however, which is consistent with what others have reported and also reflects my own experience so far: instruction following is occasionally a challenge under this model. See also this thread: GPT-4o vs. gpt-4-turbo-2024-04-09, gpt-4o loses - #10 by elmstedt
OpenAI itself has also made the point that the model may underperform GPT-4 Turbo in some cases and that it is still looking to gather feedback on when specifically that is the case:
> As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities.
>
> Through our testing and iteration with the model, we have observed several limitations that exist across all of the model’s modalities. We would love feedback to help identify tasks where GPT-4 Turbo still outperforms GPT-4o, so we can continue to improve the model.