GPT-4o is really bad at NER tasks

I am trying to extract medical symptom entities from users' questions, but I found that GPT-4o is unable to extract them. I was surprised by the result, so I tested other platforms (e.g., Gemini and others), and they all worked well. I then tried gpt-4.1-mini, and it gave the correct answer. So what's wrong with GPT-4o? For an LLM this should be a basic task, yet GPT-4o cannot do it. I feel that my money was wasted (although the cost is very low for my tasks).

You can see the screenshots of my task (comparing gpt-4o and gpt-4.1-mini). Should I switch to the more expensive gpt-4.1 model or use another LLM provider's API?

Below is the gpt-4.1-mini result: