Audio notes for OpenAI realtime on Apple platforms

I’m starting a new thread off of this question, because that thread’s author was asking specifically for a Linux solution.

Ok, @tyler10 asks:

I’m using iOS and have run into this issue before. We use AVAudioEngine rather than AVCaptureSession, but also had to setVoiceProcessingEnabled to true on both the input and output nodes.

We still run into occasional issues though. Can you share what you are doing? Curious to compare notes.

It’s a chore, isn’t it! I have written the audio portion of my realtime SDK three times. First I used AVCaptureSession, then AVAudioEngine, and finally AudioToolbox.

I’m releasing all my work under an MIT license really soon. It will be a shared core that powers realtime in both lzell/AIProxySwift and jamesrochabrun/SwiftOpenAI (coincidentally, we have had no luck getting these two libs added to the OpenAI community list, so if anyone from OpenAI is reading along please help us out!)

There are some tricks that I can share with you now. The first is that on macOS you must init the voice processing audio unit at 44100 Hz. It’s the only sample rate it accepts. So I use that rate on both macOS and iOS for consistency, and run my own AVAudioConverter on the output to convert down to 24000 Hz.
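For reference, setting that rate looks roughly like the sketch below. This assumes an already-created VPIO audio unit; the Int16 mono format and the choice of element 1’s output scope (the mic-side data delivered to your app) are my assumptions for illustration, not necessarily what the SDK does:

```swift
import AudioToolbox

// Minimal sketch: force the voice-processing unit's mic-side format to
// 44100 Hz. Per the note above, this is the only rate the VPIO unit
// accepts on macOS. Int16 mono is an assumed format for illustration.
func setVoiceProcessingFormat(on audioUnit: AudioUnit) -> OSStatus {
    var asbd = AudioStreamBasicDescription(
        mSampleRate: 44100,
        mFormatID: kAudioFormatLinearPCM,
        mFormatFlags: kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
        mBytesPerPacket: 2,
        mFramesPerPacket: 1,
        mBytesPerFrame: 2,
        mChannelsPerFrame: 1,
        mBitsPerChannel: 16,
        mReserved: 0)
    // Element 1's output scope is where mic data flows out of the AU
    // toward your app, so that's the side we set the format on.
    return AudioUnitSetProperty(audioUnit,
                                kAudioUnitProperty_StreamFormat,
                                kAudioUnitScope_Output,
                                1,
                                &asbd,
                                UInt32(MemoryLayout<AudioStreamBasicDescription>.size))
}
```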

One thing about converting sample rates: you can’t use AVAudioConverter’s convenience methods, because they will introduce all sorts of pops in the audio. See this technical note: TN3136: AVAudioConverter - performing sample rate conversions | Apple Developer Documentation
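As a concrete example of what TN3136 recommends, here is a minimal sketch of a 44.1 kHz to 24 kHz conversion using the input-block API rather than the one-shot convenience method. The formats (Float32 mono in, Int16 mono out) are assumptions for illustration, not necessarily what the SDK uses:

```swift
import AVFoundation

// Sketch: resample 44100 Hz Float32 mono buffers down to 24000 Hz Int16
// mono, keeping one long-lived converter so its internal filter state
// carries across calls (this is what avoids the pops you get from the
// one-shot convert(to:from:) convenience method).
let inputFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                sampleRate: 44100, channels: 1, interleaved: false)!
let outputFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                 sampleRate: 24000, channels: 1, interleaved: false)!
let converter = AVAudioConverter(from: inputFormat, to: outputFormat)!

func resample(_ inBuffer: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    let capacity = AVAudioFrameCount(
        Double(inBuffer.frameLength) * outputFormat.sampleRate / inputFormat.sampleRate)
    guard let outBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat,
                                           frameCapacity: max(capacity, 1)) else {
        return nil
    }
    var consumed = false
    var error: NSError?
    // Hand the converter exactly one buffer per call. Signaling .noDataNow
    // (rather than .endOfStream) when we run out tells the converter more
    // data is coming later, so it preserves state between calls.
    let status = converter.convert(to: outBuffer, error: &error) { _, outStatus in
        if consumed {
            outStatus.pointee = .noDataNow
            return nil
        }
        consumed = true
        outStatus.pointee = .haveData
        return inBuffer
    }
    return status == .error ? nil : outBuffer
}
```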

Another trick is to init your AVAudioEngine for realtime playback before you init your AU with kAudioUnitSubType_VoiceProcessingIO. This is a really unfortunate ordering requirement. If you flip the order, playback will be really quiet. See the sidenote on this Stack Overflow question.
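To make the ordering concrete, here is a minimal sketch (the node graph is illustrative, not the SDK’s actual code): the engine is built and started first, and only then is the voice-processing unit looked up and instantiated.

```swift
import AVFoundation
import AudioToolbox

// Step 1: bring up AVAudioEngine for playback FIRST. Doing this after
// creating the VPIO unit results in very quiet playback.
let engine = AVAudioEngine()
let playerNode = AVAudioPlayerNode()
engine.attach(playerNode)
engine.connect(playerNode, to: engine.mainMixerNode, format: nil)
do {
    try engine.start()
} catch {
    print("engine start failed: \(error)")
}

// Step 2: only now look up and instantiate the voice-processing I/O unit.
var desc = AudioComponentDescription(
    componentType: kAudioUnitType_Output,
    componentSubType: kAudioUnitSubType_VoiceProcessingIO,
    componentManufacturer: kAudioUnitManufacturer_Apple,
    componentFlags: 0,
    componentFlagsMask: 0)
let component = AudioComponentFindNext(nil, &desc)!
var audioUnit: AudioUnit?
let status = AudioComponentInstanceNew(component, &audioUnit)
```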

Another trick is to disable the output scope of element zero of your AU. This prevents a bunch of warning logs on iOS:

var zero: UInt32 = 0
// Disabling IO here is not a mistake! If you leave it enabled, iOS spams
// the logs with: "from AU (address): auou/vpio/appl, render err: -1"
err = AudioUnitSetProperty(audioUnit,
                           kAudioOutputUnitProperty_EnableIO,
                           kAudioUnitScope_Output,
                           0, // element zero
                           &zero,
                           UInt32(MemoryLayout.size(ofValue: zero)))

Element zero is described in the “Essential Characteristics of I/O Units” section of this great doc (if you are scrolling the doc, skip the first image because it’s misleading; you are specifically looking for the I/O unit section).

I also used this old Objective-C sample project to understand how to set up the audio unit with the right flags. See the setupIOUnit method in AudioController.mm, but instead of kAudioUnitSubType_RemoteIO, use kAudioUnitSubType_VoiceProcessingIO.

I’ll drop the lib here as soon as possible (it is truly hacked up right now, but the audio is buttery smooth). I hope this helps in the meantime.


Hey @louzell ,

Any update on your progress? I’m using AVAudioEngine with setVoiceProcessingEnabled on the input and output nodes but still seeing issues with interruptions pretty consistently.

Was wondering if moving down a level yielded success in this area?


Yes, I’ve been using AudioToolbox successfully internally. I just pushed up everything I have, which has been working well for me on iOS, and works well on macOS without headphones.

Take a look at Add OpenAI realtime support by lzell · Pull Request #108 · lzell/AIProxySwift · GitHub. There’s a README addition that has a copy-pasteable snippet that should work out of the box for you. If you don’t want to use our service, you can connect straight to OpenAI by uncommenting the following:

        /* Uncomment for BYOK use cases */
        // let openAIService = AIProxy.openAIDirectService(
        //     unprotectedAPIKey: "your-openai-key"
        // )

If you just want to swipe the AudioUnit code, the files that you want are:

MicrophonePCMSampleVendor.swift
MicrophonePCMSampleVendorError.swift
AudioPCMPlayer.swift
AudioPCMPlayerError.swift

Take a look at the README addition for usage on those types. Have fun!


@louzell Thank you so much for sharing! It was super helpful to compare against my implementation, and I ultimately ended up using yours because it was just much more consistent and better quality 🙂

To anyone considering a Swift implementation: I can’t recommend taking a look at the linked repo above enough!


Really happy to hear that! Responses like yours make trudging through the audio frameworks worthwhile 😄