Text input that hits the cache costs 50% less than the standard (uncached) text input rate. Audio input that hits the cache costs 80% less than the standard audio input rate.
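To make the discount arithmetic concrete, here is a minimal sketch that applies those two percentages to a single request's input tokens. It assumes the discount is a flat fraction off a per-million-token base price. The function name `input_cost` and the base prices used in the example ($5.00 and $100.00 per million tokens) are hypothetical placeholders, not actual Realtime API prices.

```python
# Minimal sketch of the cached-input discount arithmetic.
# Assumption: a cache hit is billed at (1 - discount) x the base input price.
# The base prices used below are hypothetical, not real Realtime API prices.

CACHE_DISCOUNT = {"text": 0.50, "audio": 0.80}  # fraction saved on cache hits


def input_cost(modality: str, uncached_tokens: int, cached_tokens: int,
               price_per_million: float) -> float:
    """Return the input cost in dollars for one request (hypothetical helper)."""
    discount = CACHE_DISCOUNT[modality]
    cached_price = price_per_million * (1 - discount)
    return (uncached_tokens * price_per_million
            + cached_tokens * cached_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: $5.00 per 1M text tokens, $100.00 per 1M audio tokens.
    # 1,000 uncached tokens plus 9,000 tokens served from the cache.
    print(input_cost("text", 1_000, 9_000, 5.00))
    print(input_cost("audio", 1_000, 9_000, 100.00))
```

With 90% of the input served from the cache, the text request costs about $0.0275 instead of $0.05, and the audio request costs about $0.28 instead of $1.00 under these placeholder prices.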
Here is the announcement regarding prompt caching on the Realtime API: