are you talking about playground tests?
It’s possible that you’ve used gpt-4-turbo, and not gpt-4. there’s a significant difference. set your model to either gpt-4-1106-preview or gpt-4-0125-preview.
gpt-4-0613 is the worst of the gpt-4 models, a significant slump from 0314 (which isn’t available to most people anymore)
the turbo models are slightly dumber, but more stable in terms of hallucinatons and such.