I was unable to reproduce the OP's example in the Playground, but according to these reports the model frequently hallucinates when it is given addresses from the database.
That is a serious problem, since a RAG pipeline only works if the model reproduces the retrieved data faithfully.
Since GPT-4o has just been released, it might be wise to refrain from using it in production for the time being.
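
If it does have to run in the meantime, one cheap guard is to check that any address in the model's answer actually matches a record that was retrieved. Here is a minimal sketch in Python. The function names and the sample addresses are made up for illustration, and it assumes you have already extracted the address from the model's answer yourself:

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so that trivial
    formatting differences are not treated as mismatches."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def address_is_grounded(answer_address: str, retrieved_addresses: list[str]) -> bool:
    """Return True only if the address produced by the model matches one of
    the addresses that came back from the database (hypothetical helper)."""
    target = normalize(answer_address)
    return any(normalize(addr) == target for addr in retrieved_addresses)


# Made-up records standing in for what the retriever returned.
retrieved = ["12 Example Street, Springfield", "7 Sample Road, Shelbyville"]

print(address_is_grounded("12 example street,  Springfield", retrieved))
print(address_is_grounded("14 Example Street, Springfield", retrieved))
```

This only catches addresses that differ from every retrieved record; it will not tell you whether the model picked the right record out of several valid ones.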