Communication pattern for two realtime voice agents

Why not pass the transcripts back and forth instead of voice?