Is there a way to encode timed pauses in the TTS audio file output?
I’d like to generate guided meditations with 2-3 minute pauses.
Hi @ben21
Welcome to the OpenAI dev forum.
To my knowledge, pauses that long are best handled in code rather than in the TTS request itself.
Simply generate two separate audio files, append a pause after the first one, and then concatenate the second one.
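A minimal sketch of that idea, using only Python's standard `wave` module. It assumes both speech clips were saved as well-formed 16-bit PCM WAV files with the same channel count and sample rate; the function name `join_with_pause` and the file paths are hypothetical, not part of any library.

```python
import wave


def join_with_pause(first_path, second_path, out_path, pause_seconds):
    """Write first clip + silence + second clip to out_path (all WAV)."""
    with wave.open(first_path, "rb") as first, wave.open(second_path, "rb") as second:
        params = first.getparams()
        # Both clips must share channel count, sample width and sample rate.
        if params[:3] != second.getparams()[:3]:
            raise ValueError("WAV files must share channels, sample width and rate")
        first_frames = first.readframes(first.getnframes())
        second_frames = second.readframes(second.getnframes())

    # One frame holds one sample per channel.
    # Assumption: 16-bit PCM, where zero bytes are silence.
    frame_size = params.nchannels * params.sampwidth
    silence = b"\x00" * (int(pause_seconds * params.framerate) * frame_size)

    with wave.open(out_path, "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        out.writeframes(first_frames + silence + second_frames)
```

For a two-minute gap you would call something like `join_with_pause("part1.wav", "part2.wav", "meditation.wav", 120)`. MP3 input needs a decoder, so for compressed files a library such as pydub is the easier route.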
Thanks for the welcome and the advice, sps! I have started with this workaround.
Hey Ben! Wondering if you found a workaround that works well enough. I have a similar issue with something I'm trying to build and haven't found a decent solution yet.
Hi there, I managed a basic hack that breaks the script up into segments and places pauses in between.
Here’s my code:
from openai import OpenAI
from pydub import AudioSegment
import os
# Initialize final_audio as a silent segment of zero duration
final_audio = AudioSegment.silent(duration=0)
# OpenAI API Setup
client = OpenAI(api_key="YOUR_KEY")
# Guided Meditation Script Segments
segments = [
"MEDITATION_TEXT_1", "MEDITATION_TEXT_2"
]
# Define a one-minute pause
one_minute_silence = AudioSegment.silent(duration=60000) # 60,000 milliseconds
# Generate and combine segments with pauses
for i, segment in enumerate(segments):
    # Generate the audio using OpenAI's text-to-speech
    response = client.audio.speech.create(
        model="tts-1",
        voice="onyx",
        input=segment,
    )
    # Save and load each segment
    temp_audio_file = f"segment_{i}.mp3"
    with open(temp_audio_file, "wb") as f:
        f.write(response.content)
    segment_audio = AudioSegment.from_file(temp_audio_file)
    final_audio += segment_audio
    # Add pause after each segment except the last one
    if i < len(segments) - 1:
        final_audio += one_minute_silence
    # Clean up the temporary file
    os.remove(temp_audio_file)
# Export the final audio file
final_audio.export("audio.mp3", format="mp3")
print("Audio with pauses created successfully!")
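The script above uses one fixed pause length between every segment. If the pauses need to vary (say 2 minutes in one place and 3 in another), one possible approach is to write the pause lengths into the script text and split on them before calling TTS. Here is a rough sketch; the `[pause 2m]` / `[pause 90s]` marker syntax and the `parse_script` helper are made up for illustration and are not something the TTS API understands.

```python
import re

# Hypothetical marker syntax: [pause 2m], [pause 90s], [pause 2.5m]
PAUSE_MARKER = re.compile(r"\[pause\s+(\d+(?:\.\d+)?)\s*(s|m)\]")


def parse_script(script):
    """Split a script into ("speak", text) and ("pause", milliseconds) steps."""
    steps = []
    position = 0
    for match in PAUSE_MARKER.finditer(script):
        # Text between the previous marker and this one is spoken.
        text = script[position:match.start()].strip()
        if text:
            steps.append(("speak", text))
        # Convert the marker value to milliseconds.
        seconds = float(match.group(1)) * (60 if match.group(2) == "m" else 1)
        steps.append(("pause", int(seconds * 1000)))
        position = match.end()
    # Anything after the last marker is spoken too.
    tail = script[position:].strip()
    if tail:
        steps.append(("speak", tail))
    return steps
```

Each `"speak"` step would then go through the TTS call as in the loop above, and each `"pause"` step would become `AudioSegment.silent(duration=milliseconds)` appended to `final_audio`.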