Calling TTS from a Swift app

I saw that OpenAI published an endpoint for text-to-speech, but I could only find a sample for Node.js and one for Python, both needing extra installs on my Mac. I didn't want that :wink:

So, I wrote a bit of code in Swift that produces a file with the spoken text.
No guarantees! But it works for me!

  • Please add your own error handling and remove the print statements!
  • Please give the file a proper name; right now it gets a generated one.
  • The file is saved as a temporary file; with the settings below it is an MP3.
  • Or use the stream to play the resulting speech in your app directly.
  • You can remove the organisation header if you are a one-person team.
import Foundation

class OpenAITTS {
    
    private enum constants {
        enum openAI {
            static let url = URL(string: "https://api.openai.com/v1/audio/speech")
            static let apiKey = "<your apiKey here>"
            static let organisation = "<your organisation ID here>"
        }
    }
    
    private var urlSession: URLSession = {
        let configuration = URLSessionConfiguration.default
        let session = URLSession(configuration: configuration)
        return session
    }()
    
    func speak(_ text: String) {
        guard let request = self.request(text) else {
            print("No request")
            return
        }
        self.send(request: request)
    }
    
    private func send(request: URLRequest) {
        
        let task = self.urlSession.downloadTask(with: request) { urlOrNil, responseOrNil, errorOrNil in
            if let error = errorOrNil {
                print(error)
                return
            }

            if let response = responseOrNil as? HTTPURLResponse {
                print(response.statusCode)
            }
            
            guard let fileURL = urlOrNil else { return }

            do {
                let documentsURL = try FileManager.default.url(for: .documentDirectory,
                                                               in: .userDomainMask,
                                                               appropriateFor: nil,
                                                               create: false)
                let savedURL = documentsURL.appendingPathComponent(fileURL.lastPathComponent)
                print(savedURL)
                try FileManager.default.moveItem(at: fileURL, to: savedURL)
            } catch {
                print("file error: \(error)")
            }
        }

        task.resume()
    }
    
    private func request(_ text: String) -> URLRequest? {
        guard let baseURL = Self.constants.openAI.url else {
            return nil
        }
        
        var request = URLRequest(url: baseURL)
        request.httpMethod = "POST"

        let parameters: [String: Any] = [
            "model": "tts-1",
            "voice": "nova",
            "response_format": "mp3",
            "speed": 0.98,  // hidden feature in OpenAI TTS! Range: 0.25 - 4.0, default 1.0
            "input": text
        ]

        request.addValue("Bearer \(Self.constants.openAI.apiKey)", forHTTPHeaderField: "Authorization")
        request.addValue(Self.constants.openAI.organisation, forHTTPHeaderField: "OpenAI-Organization") // Optional
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")

        if let jsonData = try? JSONSerialization.data(withJSONObject: parameters) {
            request.httpBody = jsonData
        }

        return request
    }
}
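To go with the bullet about giving the file a proper name: a minimal sketch that moves the temporary download to a directory of your choice under a name you pick. `saveSpeechFile` and its defaults are my own naming, not part of any API.

```swift
import Foundation

// Sketch: move the temporary file the download task hands you to a chosen
// name in a chosen directory. On iOS you would typically pass the Documents
// directory; the "speech.mp3" default is just an example.
func saveSpeechFile(from tmpURL: URL,
                    as name: String = "speech.mp3",
                    in directory: URL) throws -> URL {
    let destination = directory.appendingPathComponent(name)
    // Overwrite a previous file with the same name, if any.
    if FileManager.default.fileExists(atPath: destination.path) {
        try FileManager.default.removeItem(at: destination)
    }
    try FileManager.default.moveItem(at: tmpURL, to: destination)
    return destination
}
```

In the download completion handler you would call this with `fileURL` and the Documents directory instead of reusing the generated `lastPathComponent`.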

Oh, nice find on the speed. I'll have to add that to my system.


Hi Ben,

I found one more argument. I updated the code sample.

I do not see what was added, haha. Either that, or I knew it already and didn't recall it wasn't there to begin with :slight_smile:

"response_format": "mp3",

Ah, that is another good find. That one I had used before when playing with streaming the voice. I look forward to the voice-cloning system down the road, which will be fun.

Hi Ben,
Are there any other arguments/parameters?

Not that I have seen yet; I pretty much just know what they had in the documents. I've only been using OpenAI's voice for the last few months. Before that I used a lot of other systems during testing: ElevenLabs, gTTS, Edge TTS, Open.ai, and a few other less realistic voices. ElevenLabs is my favourite, but $$$ to run on a real-time bot. OpenAI is almost as good, and once they get the voice-clone tech going it will get better, IMO, since we can then source other voice options to match the projects.
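For reference, these are the request-body fields I know to be documented for the speech endpoint at the time of writing; treat the comments as a sketch and check the current API docs for the up-to-date lists.

```swift
import Foundation

// Documented request-body fields for /v1/audio/speech, as I understand them.
let parameters: [String: Any] = [
    "model": "tts-1",          // "tts-1-hd" is the higher-quality variant
    "input": "Text to speak",
    "voice": "nova",           // alloy, echo, fable, onyx, nova, shimmer
    "response_format": "mp3",  // opus, aac and flac are also documented
    "speed": 1.0               // range 0.25 ... 4.0, default 1.0
]
let body = try! JSONSerialization.data(withJSONObject: parameters)
```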

For a free voice, Edge TTS is the best. It's processed locally and is almost instant. While not amazing, it's pretty much Google Assistant/Siri/Alexa quality, but it lacks emotion on the level of the top systems in terms of range, etc.

Also, ElevenLabs is now testing sound generators, which are pretty sweet. I think OpenAI will come out with something like that down the road as well, to go with the Sora system. The AI bubble is just starting :slight_smile:

Is it possible to get the status of the TTS generation once you call it? I find that when I call the TTS, it often takes 5 - 10 seconds before I get a response, depending on how long the text is. Is there a way to determine the time it will take to generate the audio or a way to get real-time updates on the generation? Appreciate any help

You can stream it, depending on your setup. The cause of the delay is building the file and then playing it back, whereas a stream goes straight to an output.

OpenAI supports it, and I think ElevenLabs does too, but I have not played with it yet. Edge TTS is almost instant; while not a Ferrari, it's pretty good.

You could build an algorithm to predict the approximate time based on the number of words and some mapped timings.
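A sketch of that idea: a fixed overhead plus a per-word cost. Both constants below are invented placeholders, not measured values; time a few real generations and fit your own numbers.

```swift
import Foundation

// Estimate TTS generation time as fixedOverhead + words * secondsPerWord.
// The default constants are made up -- calibrate them against real requests.
func estimatedGenerationTime(for text: String,
                             secondsPerWord: Double = 0.04,
                             fixedOverhead: Double = 2.0) -> Double {
    let wordCount = text.split(whereSeparator: \.isWhitespace).count
    return fixedOverhead + Double(wordCount) * secondsPerWord
}
```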