Audio notes for OpenAI realtime on Apple platforms

I’m starting a new thread off of this question because that thread’s author was asking specifically for a Linux solution.

Ok, @tyler10 asks:

I’m using iOS and have run into this issue before. We use AVAudioEngine rather than AVCaptureSession, but also had to setVoiceProcessingEnabled to true on both the input and output nodes.

We still run into occasional issues though. Can you share what you are doing? Curious to compare notes.

It’s a chore, isn’t it! I have written the audio portion of my realtime SDK three times. First I used AVCaptureSession, then AVAudioEngine, and finally AudioToolbox.

I’m releasing all my work on an MIT license really soon. It will be a shared core that powers realtime in both lzell/AIProxySwift and jamesrochabrun/SwiftOpenAI (coincidentally, we have had no luck getting these two libs added to the OpenAI community list, so if anyone from OpenAI is reading along please help us out!)

There are some tricks that I can share with you now. The first is that on macOS you must init the voice processing audio unit at 44100 Hz. It’s the only sample rate that it accepts. So I use that rate on both macOS and iOS for consistency, and run my own AVAudioConverter on the output to convert to 24000 Hz.

One thing about converting sample rates is you can’t use the AVAudioConverter’s convenience methods. They will result in all sorts of pops in the audio. See this technical note: TN3136: AVAudioConverter - performing sample rate conversions | Apple Developer Documentation
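Here is a minimal sketch of the block-based conversion that the technical note describes, as opposed to the convenience method. The helper name `resample` is hypothetical and this is not lifted from my SDK. It assumes you create one AVAudioConverter up front and reuse it for every buffer so the resampler keeps its internal state between calls.

```swift
import AVFoundation

/// Hypothetical helper: pushes one input buffer through a long-lived converter
/// using the block-based API, which is the one that supports sample rate changes.
func resample(_ input: AVAudioPCMBuffer,
              with converter: AVAudioConverter) throws -> AVAudioPCMBuffer? {
    // Size the output buffer from the ratio of the two sample rates.
    let ratio = converter.outputFormat.sampleRate / converter.inputFormat.sampleRate
    let capacity = AVAudioFrameCount((Double(input.frameLength) * ratio).rounded(.up))
    guard let output = AVAudioPCMBuffer(pcmFormat: converter.outputFormat,
                                        frameCapacity: capacity) else {
        return nil
    }

    var handedOver = false
    var conversionError: NSError?
    let status = converter.convert(to: output, error: &conversionError) { _, inputStatus in
        if handedOver {
            // More audio will arrive later, so report "no data now" rather than
            // "end of stream". Ending the stream would finish the conversion.
            inputStatus.pointee = .noDataNow
            return nil
        }
        handedOver = true
        inputStatus.pointee = .haveData
        return input
    }

    if status == .error, let conversionError {
        throw conversionError
    }
    return output
}
```

You would build the converter once with `AVAudioConverter(from: micFormat, to: targetFormat)`, where `micFormat` is 44100 Hz and `targetFormat` is 24000 Hz, and then call `resample` for each captured buffer. The first few output buffers can be a little shorter than the ratio suggests while the resampler primes.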

Another trick is to init your AVAudioEngine for realtime playback before you init your AU with kAudioUnitSubType_VoiceProcessingIO. This is a really unfortunate ordering requirement. If you flip the order, the playback will be really quiet. See the sidenote on this Stack Overflow question.
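To make the ordering concrete, here is a rough sketch of the playback half. It assumes that "init" means attaching a player node and starting the engine, and that playback audio is 24 kHz mono Float32. Both are assumptions for illustration, and the names `makePlaybackFormat` and `startPlaybackEngine` are hypothetical.

```swift
import AVFoundation

struct PlaybackSetupError: Error {}

/// Hypothetical playback format: 24 kHz mono Float32 (an assumption).
func makePlaybackFormat() -> AVAudioFormat? {
    AVAudioFormat(commonFormat: .pcmFormatFloat32,
                  sampleRate: 24_000,
                  channels: 1,
                  interleaved: false)
}

/// Brings up the playback engine. Call this BEFORE creating the
/// kAudioUnitSubType_VoiceProcessingIO audio unit.
func startPlaybackEngine() throws -> (engine: AVAudioEngine, player: AVAudioPlayerNode) {
    guard let format = makePlaybackFormat() else { throw PlaybackSetupError() }
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    engine.attach(player)
    // The main mixer converts from the player's format to the hardware format.
    engine.connect(player, to: engine.mainMixerNode, format: format)
    engine.prepare()
    try engine.start()
    player.play()
    return (engine, player)
}

// Step 1: let playback = try startPlaybackEngine()
// Step 2: only after step 1 succeeds, create the voice processing audio unit.
```

Received audio can then be scheduled on the player with `scheduleBuffer(_:completionHandler:)`.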

Another trick is to disable the output scope of element zero of your AU. This will prevent a bunch of warning logs on iOS:

// Assumes `audioUnit` (AudioUnit) and `err` (OSStatus) are declared earlier.
var zero: UInt32 = 0
// Disabling output on element zero is not a mistake! If you leave it on, iOS spams the logs with:
// "from AU (address): auou/vpio/appl, render err: -1"
err = AudioUnitSetProperty(audioUnit,
                           kAudioOutputUnitProperty_EnableIO,
                           kAudioUnitScope_Output,
                           0,
                           &zero,
                           UInt32(MemoryLayout<UInt32>.size))

Element zero is described in the “Essential Characteristics of I/O Units” section of this great doc. (If you are scrolling the doc, skip the first image because it’s misleading. You are specifically looking for the I/O unit section.)

I also used this old Objective-C sample project to understand how to set up the audio unit with the right flags. See the setupIOUnit method in AudioController.mm. But instead of kAudioUnitSubType_RemoteIO, use kAudioUnitSubType_VoiceProcessingIO.
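In Swift, the creation step looks roughly like the sketch below. It is not the code from my SDK. The mono Float32 sample format and the names `makeVoiceProcessingFormat`, `makeVoiceProcessingUnit`, and `AudioUnitSetupError` are assumptions for illustration. Only the 44100 Hz rate and the VoiceProcessingIO subtype come from the notes above.

```swift
import AudioToolbox

enum AudioUnitSetupError: Error {
    case componentNotFound
    case osStatus(OSStatus)
}

/// 44100 Hz mono linear PCM. The Float32 sample format is an assumption.
func makeVoiceProcessingFormat() -> AudioStreamBasicDescription {
    AudioStreamBasicDescription(
        mSampleRate: 44_100,
        mFormatID: kAudioFormatLinearPCM,
        mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked,
        mBytesPerPacket: 4,
        mFramesPerPacket: 1,
        mBytesPerFrame: 4,
        mChannelsPerFrame: 1,
        mBitsPerChannel: 32,
        mReserved: 0)
}

/// Creates a VoiceProcessingIO unit with microphone input enabled on element 1.
func makeVoiceProcessingUnit() throws -> AudioUnit {
    var desc = AudioComponentDescription(
        componentType: kAudioUnitType_Output,
        componentSubType: kAudioUnitSubType_VoiceProcessingIO, // not RemoteIO
        componentManufacturer: kAudioUnitManufacturer_Apple,
        componentFlags: 0,
        componentFlagsMask: 0)
    guard let component = AudioComponentFindNext(nil, &desc) else {
        throw AudioUnitSetupError.componentNotFound
    }

    var unit: AudioUnit?
    var status = AudioComponentInstanceNew(component, &unit)
    guard status == noErr, let audioUnit = unit else {
        throw AudioUnitSetupError.osStatus(status)
    }

    // Element 1 is the input (microphone) side of an I/O unit.
    var one: UInt32 = 1
    status = AudioUnitSetProperty(audioUnit,
                                  kAudioOutputUnitProperty_EnableIO,
                                  kAudioUnitScope_Input,
                                  1,
                                  &one,
                                  UInt32(MemoryLayout<UInt32>.size))
    guard status == noErr else { throw AudioUnitSetupError.osStatus(status) }

    // The app reads mic samples from the output scope of element 1.
    var format = makeVoiceProcessingFormat()
    status = AudioUnitSetProperty(audioUnit,
                                  kAudioUnitProperty_StreamFormat,
                                  kAudioUnitScope_Output,
                                  1,
                                  &format,
                                  UInt32(MemoryLayout<AudioStreamBasicDescription>.size))
    guard status == noErr else { throw AudioUnitSetupError.osStatus(status) }

    return audioUnit
}
```

The caller still has to install an input callback, call AudioUnitInitialize, and then AudioOutputUnitStart before any audio flows.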

I’ll drop the lib here as soon as possible (it is truly hacked up right now, but the audio is buttery smooth). I hope this helps in the meantime.
