PCM16 to Opus Conversion Working But Silent Audio in Telnyx WebSocket Calls

Paul_Diamant · August 2, 2025, 6:30pm

Hi everyone,

I’m working on real-time audio conversion for Telnyx WebSocket calls and running into an issue where my PCM16 to Opus conversion appears successful but produces silent audio during calls.

What I’m trying to do:

Convert PCM16 base64 audio to Opus base64 for Telnyx real-time media streaming
Send converted Opus audio via WebSocket like this:

const convertedAudio = await this.converterClient.post('/convert', {
              audio_data: response.delta,
              input_format: 'pcm16',
            });

            const audioDelta = {
              event: 'media',
              media: {
                payload: convertedAudio.data.audio_data,
                track: 'outbound',
              },
            };

            connectedWs.send(JSON.stringify(audioDelta));

Telnyx controller (answering an incoming call):

  private async handleCallInitiated(payload: any) {
    this.logger.log('📞 Call initiated - auto-answering...');

    const callControlId = payload.call_control_id;
    const direction = payload.direction;

    // Only auto-answer incoming calls
    if (callControlId && direction === 'incoming') {
      try {
        await this.telnyxService.answerCall(callControlId, {
          stream_url: this.telnyxService.generateStreamingWebSocketUrl(),
          stream_track: 'both_tracks',
          stream_codec: 'default',
          stream_bidirectional_mode: 'rtp',
          stream_bidirectional_codec: 'OPUS',
          send_silence_when_idle: true,
          webhook_url: `${process.env.REMOTE_URL}/telnyx-call-webhook`,
          client_state: btoa(
            JSON.stringify({ autoAnswered: true, streaming: true }),
          ),
        } as Telnyx.CallsAnswerParams);

        this.logger.log('');
        this.logger.log(
          '✅ Auto-answered incoming call with streaming enabled',
        );
      } catch (error) {
        this.logger.error('Failed to auto-answer call:', error);
      }
    } else if (direction === 'outgoing') {
      this.logger.log('📤 Outgoing call initiated');
    }
  }

OpenAI session update data:

const sessionUpdate = {
      type: 'session.update',
      session: {
        turn_detection: {
          type: 'server_vad',
          threshold: 0.4, // Lower threshold for better speech detection (more sensitive)
          prefix_padding_ms: 300, // Increased to capture speech start better
          silence_duration_ms: 1500, // Much longer - allows for natural pauses, breathing, thinking
          create_response: false, // Turn off automatic responses for custom control
        },
        input_audio_format: 'pcm16', // 24kHz, 16-bit, mono (HD quality)
        output_audio_format: 'pcm16', // NOT g711_ulaw
        voice: 'alloy',
        instructions: 'Just say "Hello my friend, welcome" in Hebrew.',
        modalities: ['text', 'audio'],
        temperature: 0.7,
      },
    };

What’s working:

PCM16 to Opus conversion completes successfully using Python opuslib
Generated WAV files from original PCM16 play correctly
Conversion logs show reasonable compression ratios (e.g., 12000 bytes → 35-66 bytes)
Tested multiple sample rates: 8kHz, 16kHz, 24kHz, 48kHz

The problem:

Original PCM16 audio (when converted to WAV) plays perfectly
Opus converted audio is completely silent in Telnyx calls
Round-trip conversion (PCM16 → Opus → PCM16) also produces silent audio

My conversion setup:

Using 20ms frames (160 samples at 8kHz, 320 at 16kHz, etc.)
opuslib.APPLICATION_VOIP for telephony use
Trying various sample rates but Telnyx docs suggest 8kHz is preferred
Single channel (mono) audio
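
As a quick check of the frame arithmetic in the setup above, here is a minimal sketch that computes the samples and bytes in one 20 ms PCM16 frame. The helper name `frame_layout` is made up for illustration and is not part of the converter.

```python
FRAME_DURATION_MS = 20
BYTES_PER_SAMPLE = 2  # PCM16 stores each sample in 2 bytes


def frame_layout(sample_rate: int, channels: int = 1) -> tuple[int, int]:
    """Return (samples per channel, total bytes) for one 20 ms PCM16 frame."""
    samples = sample_rate * FRAME_DURATION_MS // 1000
    return samples, samples * channels * BYTES_PER_SAMPLE


for rate in (8000, 16000, 24000, 48000):
    samples, nbytes = frame_layout(rate)
    print(f"{rate} Hz: {samples} samples, {nbytes} bytes per 20 ms frame")
```

If a 12000-byte chunk were 24 kHz mono, it would span 12.5 such frames, so the last frame would need padding.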

Questions:

Are there specific Opus encoding parameters required for Telnyx compatibility?
Could this be a frame concatenation issue? (I’m joining multiple Opus frames)
Are there additional headers or formatting requirements for Telnyx WebSocket audio?
Has anyone successfully implemented PCM16 → Opus conversion for Telnyx calls?

Any insights would be greatly appreciated! The fact that the original audio works but converted audio doesn’t suggests something specific about the Opus encoding process.

Thanks!

This is my Python converter:

import base64
import logging
from typing import List, Optional
import numpy as np
from fastapi import FastAPI, HTTPException, status
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
import opuslib
import uvicorn

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Audio Converter Service",
    description="Convert audio between PCM16 and Opus formats with base64 encoding",
    version="2.0.0"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Audio configuration
DEFAULT_SAMPLE_RATE = 16000
DEFAULT_CHANNELS = 1
FRAME_DURATION_MS = 20  # 20ms frames
BITS_PER_SAMPLE = 16


class AudioConversionRequest(BaseModel):
    """Request model for audio conversion."""
    audio_data: str = Field(..., description="Base64 encoded audio data")
    input_format: str = Field(..., description="Input format: 'pcm16' or 'opus'")
    sample_rate: Optional[int] = Field(DEFAULT_SAMPLE_RATE, description="Sample rate in Hz")
    channels: Optional[int] = Field(DEFAULT_CHANNELS, description="Number of audio channels")


class AudioConversionResponse(BaseModel):
    """Response model for audio conversion."""
    audio_data: str = Field(..., description="Base64 encoded converted audio data")
    output_format: str = Field(..., description="Output format: 'pcm16' or 'opus'")
    sample_rate: int = Field(..., description="Sample rate in Hz")
    channels: int = Field(..., description="Number of audio channels")
    success: bool = Field(..., description="Conversion success status")
    message: Optional[str] = Field(None, description="Status message")


class AudioConverter:
    """Main audio converter class handling PCM16 ⟷ Opus conversions."""
    
    def __init__(self):
        """Initialize the audio converter."""
        self._encoders = {}
        self._decoders = {}
    
    def _get_encoder(self, sample_rate: int, channels: int) -> opuslib.Encoder:
        """Get or create an Opus encoder for the given parameters."""
        key = (sample_rate, channels)
        if key not in self._encoders:
            self._encoders[key] = opuslib.Encoder(
                sample_rate, 
                channels, 
                opuslib.APPLICATION_VOIP
            )
        return self._encoders[key]
    
    def _get_decoder(self, sample_rate: int, channels: int) -> opuslib.Decoder:
        """Get or create an Opus decoder for the given parameters."""
        key = (sample_rate, channels)
        if key not in self._decoders:
            self._decoders[key] = opuslib.Decoder(sample_rate, channels)
        return self._decoders[key]
    
    def _validate_audio_params(self, sample_rate: int, channels: int) -> None:
        """Validate audio parameters."""
        if sample_rate not in [8000, 12000, 16000, 24000, 48000]:
            raise ValueError(f"Unsupported sample rate: {sample_rate}. Supported: 8000, 12000, 16000, 24000, 48000")
        
        if channels not in [1, 2]:
            raise ValueError(f"Unsupported channel count: {channels}. Supported: 1 (mono), 2 (stereo)")
    
    def _calculate_frame_size(self, sample_rate: int) -> int:
        """Calculate frame size for the given sample rate."""
        return int(sample_rate * FRAME_DURATION_MS / 1000)
    
    def pcm16_to_opus(self, pcm_data: bytes, sample_rate: int, channels: int) -> bytes:
        """
        Convert PCM16 audio data to Opus format.
        
        Args:
            pcm_data: Raw PCM16 audio data (16-bit signed integers)
            sample_rate: Sample rate in Hz
            channels: Number of audio channels
            
        Returns:
            Opus encoded audio data
        """
        print(f"Converting PCM16 to Opus: {sample_rate} Hz, {channels} channels")

        try:
            # self._validate_audio_params(sample_rate, channels)
            
            # Decode incoming PCM16
            pcm_bytes = base64.b64decode(pcm_data)

            # Calculate frame sizes
            frame_size = self._calculate_frame_size(sample_rate)
            bytes_per_frame = frame_size * channels * 2               # 2 bytes per sample
            
            # Get encoder and frame size
            encoder = self._get_encoder(sample_rate, channels)

            opus_frames: List[bytes] = []

            # Process in fixed‐size chunks
            for offset in range(0, len(pcm_bytes), bytes_per_frame):
                chunk = pcm_bytes[offset:offset + bytes_per_frame]
                if len(chunk) < bytes_per_frame:
                    # pad with silence if last frame is short
                    chunk += b'\x00' * (bytes_per_frame - len(chunk))

                # Encode raw PCM16 → Opus
                opus_frame = encoder.encode(chunk, frame_size)
                opus_frames.append(opus_frame)
            
            # Concatenate all Opus frames into single bytes object
            opus_bytes = b''.join(opus_frames)
            
            logger.info(f"Converted PCM16 to Opus: {len(pcm_data)} bytes → {len(opus_bytes)} bytes")
            return opus_bytes
            
        except Exception as e:
            logger.error(f"PCM16 to Opus conversion failed: {e}")
            raise ValueError(f"PCM16 to Opus conversion failed: {str(e)}")
    
    def opus_to_pcm16(self, opus_data: bytes, sample_rate: int, channels: int) -> bytes:
        """
        Convert Opus audio data to PCM16 format.
        
        Args:
            opus_data: Opus encoded audio data
            sample_rate: Sample rate in Hz
            channels: Number of audio channels
            
        Returns:
            Raw PCM16 audio data (16-bit signed integers)
        """
        try:
            self._validate_audio_params(sample_rate, channels)
            
            # Validate Opus data
            if len(opus_data) < 1:
                raise ValueError("Empty Opus data")
            
            # Get decoder and frame size
            decoder = self._get_decoder(sample_rate, channels)
            frame_size = self._calculate_frame_size(sample_rate)
            
            # Decode Opus to float32 PCM
            pcm_data = decoder.decode(opus_data, frame_size)
            pcm_array = np.frombuffer(pcm_data, dtype=np.float32)
            
            # Convert float32 to int16
            pcm_int16 = (pcm_array * 32767.0).astype(np.int16)
            
            # Clip values to prevent overflow
            pcm_int16 = np.clip(pcm_int16, -32768, 32767)
            
            logger.info(f"Converted Opus to PCM16: {len(opus_data)} bytes → {len(pcm_int16.tobytes())} bytes")
            return pcm_int16.tobytes()
            
        except Exception as e:
            logger.error(f"Opus to PCM16 conversion failed: {e}")
            raise ValueError(f"Opus to PCM16 conversion failed: {str(e)}")


# Global converter instance
converter = AudioConverter()


@app.post("/convert", response_model=AudioConversionResponse)
async def convert_audio(request: AudioConversionRequest):
    """
    Convert audio between PCM16 and Opus formats.
    
    - **audio_data**: Base64 encoded audio data
    - **input_format**: Either 'pcm16' or 'opus'
    - **sample_rate**: Sample rate in Hz (8000, 12000, 16000, 24000, 48000)
    - **channels**: Number of channels (1 or 2)
    """
    try:
        # Decode base64 input
        try:
            audio_bytes = base64.b64decode(request.audio_data)
        except Exception as e:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=f"Invalid base64 audio data: {str(e)}"
            )
        
        # Validate input format
        input_format = request.input_format.lower()
        if input_format not in ['pcm16', 'opus']:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Input format must be 'pcm16' or 'opus'"
            )
        
        # Perform conversion
        if input_format == 'pcm16':
            # Convert PCM16 to Opus
            converted_bytes = converter.pcm16_to_opus(
                audio_bytes, 
                request.sample_rate, 
                request.channels
            )
            output_format = 'opus'
        else:
            # Convert Opus to PCM16
            converted_bytes = converter.opus_to_pcm16(
                audio_bytes, 
                request.sample_rate, 
                request.channels
            )
            output_format = 'pcm16'
        
        # Encode result to base64
        converted_base64 = base64.b64encode(converted_bytes).decode('utf-8')

        print(f"{converted_base64}")
        
        return AudioConversionResponse(
            audio_data=converted_base64,
            output_format=output_format,
            sample_rate=request.sample_rate,
            channels=request.channels,
            success=True,
            message=f"Successfully converted {input_format} to {output_format}"
        )
        
    except HTTPException:
        raise
    except ValueError as e:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=str(e)
        )
    except Exception as e:
        logger.error(f"Unexpected conversion error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Conversion failed: {str(e)}"
        )


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "service": "Audio Converter",
        "version": "2.0.0"
    }


@app.get("/")
async def root():
    """Root endpoint with service information."""
    return {
        "service": "Audio Converter Service",
        "version": "2.0.0",
        "description": "Convert audio between PCM16 and Opus formats with base64 encoding",
        "endpoints": {
            "convert": "/convert (POST)",
            "health": "/health (GET)",
            "docs": "/docs (GET)"
        },
        "supported_formats": ["pcm16", "opus"],
        "supported_sample_rates": [8000, 12000, 16000, 24000, 48000],
        "supported_channels": [1, 2],
        "default_sample_rate": DEFAULT_SAMPLE_RATE,
        "default_channels": DEFAULT_CHANNELS
    }


def main():
    """Entry point for the audio converter service."""
    logger.info("Starting Audio Converter Service...")
    uvicorn.run(
        app, 
        host="0.0.0.0", 
        port=8000, 
        log_level="info"
    )


if __name__ == "__main__":
    main()

Foxalabs · August 2, 2025, 7:20pm

A fair few things might be going on here. The one that sticks out is that Telnyx wants one RTP-payload-sized Opus packet per media message.

Every media frame you send over the WebSocket must be a single Opus RTP payload (no RTP header, up to 30s of audio).
If you concatenate multiple encoder.encode() returns and ship them as one payload, the first TOC byte is parsed and the rest is garbage, so the decoder plays silence.
You could try to run the encoder once per 20ms slice or encode a whole chunk in one call, but never glue packets together.

frame_size = int(sample_rate * 0.02)  # 20 ms
opus_packet = encoder.encode(pcm_chunk, frame_size)
b64 = base64.b64encode(opus_packet).decode()
ws.send(json.dumps({"event": "media", "media": {"payload": b64}}))

Also, don’t add extra keys to the outbound media object

Telnyx's bidirectional example only shows payload. Including track, chunk, etc. on outbound frames can make the frame fail validation, and you get silence with no explicit error.
See: developers.telnyx.com

// good
{ event: "media", media: { payload: "<base64-opus>" } }
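
A minimal sketch of that message shape in Python, assuming the packets are already encoded: one JSON message per Opus packet, with only the payload key. The function name `media_messages` and the stand-in packet bytes are made up for illustration; real packets would come from encoder.encode().

```python
import base64
import json
from typing import Iterable, Iterator


def media_messages(opus_packets: Iterable[bytes]) -> Iterator[str]:
    """Yield one JSON media message per Opus packet, never a joined blob."""
    for packet in opus_packets:
        payload = base64.b64encode(packet).decode("ascii")
        # Only the payload key, matching the "good" shape shown above.
        yield json.dumps({"event": "media", "media": {"payload": payload}})


# Stand-in bytes for illustration only, not real Opus data.
fake_packets = [b"\x68\x01\x02", b"\x68\x03\x04"]
messages = list(media_messages(fake_packets))
for message in messages:
    print(message)
```

Each string in `messages` would then go out in its own ws.send() call.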

Only 8 kHz and 16 kHz are accepted for RTP / Opus. Anything higher is down-sampled and often ends up silent.

You should pass int16 samples, not raw bytes, to opuslib

opuslib.Encoder.encode() expects a buffer of 16-bit signed ints. Passing a bytes object means it sees 8-bit chars, and the result decodes as silence.

pcm_i16 = np.frombuffer(pcm_bytes, dtype='<i2')     # little-endian int16
opus_packet = encoder.encode(pcm_i16.tobytes(), frame_size)

I think this should be a minimal working send loop example:

for pcm_chunk in incoming_pcm_frames:              # 20 ms each
    opus_packet = encoder.encode(pcm_chunk, 320)   # 16 kHz → 320 samples
    payload = base64.b64encode(opus_packet).decode()
    ws.send(json.dumps({"event": "media", "media": {"payload": payload}}))

Paul_Diamant · August 2, 2025, 7:22pm

Just to clarify, it's not silence; it sounds very muffled and distorted.

Foxalabs · August 2, 2025, 7:30pm

Gotcha, could be the int16 thing then, possibly. Audio is weird, and stuff that should not sound like anything at all often makes it through. The top 8 bits of a 16-bit sample only carry the information for the loud stuff, but if you stuff those into an 8-bit sample the result can still sound like something, just very odd.
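
To make the 16-bit versus 8-bit point concrete, here is a small sketch using only the standard library. It does not claim anything about how opuslib reads its input; it only shows what the same bytes look like under the two interpretations. The sample values are arbitrary.

```python
import struct

# Three little-endian PCM16 samples: quiet, medium, loud.
samples = (3, 300, 30000)
raw = struct.pack("<3h", *samples)

as_int16 = struct.unpack("<3h", raw)  # correct: 2 bytes per sample
as_uint8 = list(raw)                  # wrong: every byte read as its own sample
high_bytes = as_uint8[1::2]           # top 8 bits of each 16-bit sample

print(as_int16)    # (3, 300, 30000)
print(as_uint8)    # [3, 0, 44, 1, 48, 117]
print(high_bytes)  # [0, 1, 117]
```

The high bytes stay near zero for the quiet and medium samples and only become large for the loud one, which is why a byte-wise misreading can still sound vaguely like the original.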

Hopefully I’ve covered the issue in those points above, happy Saturday debugging!

Paul_Diamant · August 2, 2025, 9:46pm

Got it working
Full python code:

Now I'm able to send multiple base64 buffers from OpenAI's delta messages, and I get back an array of Opus packets, which I then stream back to Telnyx as media messages.

Now I gotta work on converting the Opus stream back to PCM16 for OpenAI.. hehe.

Example json:

{
  "opus_packets": [
    "aIAKP1go9VVDwTCi1BryN3ECIwEb9kpIazrOpIfVxDxmintyNGIWEtPJOjpTdPkE8Cjrkv2TIJTnc47T",
    "aIKIJVpnWMZZyytsJRCGeITOeBi1WVI3CrJ9xJxJJrigPtAi8PAPLLVFOfbDJ1q9pv+J8amWtxNRFFLaQapLdUlvLzXMk4uSuC7Uy7kL+A==",
    "aJPC19XCnmDygTMPrB/GRxUejsuc9zADP9th3z1KBRPr3qLB3uoWnuMh+uX9OmPEgpTulOcuMMZJyrOmZMil5HRWGWSYhrCh9Q==",
    "aLZSWPBLpEGTDcSdFii5KSuxc2Uq6fKQljYfR1Sg//fPCYqqSgChRskWgXCsVvkIbUB0D28xuH6VYGrw91UBnU7qcqBeQauwileBcIUC0sP3",
    "aLWwksgsPnsBtlLg9juBxhOqK+ePzDfBmOGmM9+189kE9GMYxtgxNJlElWnEvNa8MxM66abG0qUlM5/s3CjEmafgIAGI5mWZ",
    "aK/C85IxDV9bVbRsW3oOTWUG9ffF89qdCkQXkTkysBh+GRCd2oder1xjHZIfCyKET02W5FNWR7pdIQNkW950CRUjEy1a",
    "aKxrVMn1l7SxGDrUYuB0lkddbQe9hI8YW58Q1LLva0yXaSbTnz2ugl8bDW2Tjx4ESR43fOIKHV3bjssCcw4k21/uoqFmKM5PuhE=",
    "aKtaxTyf1953XqFx394DmRjpvUB50ivFeCSELZeCUdpqLZC7Og42PLiBYYjTtKzcs+/Xe+QTyglgKfxyoDCeV7MDhT+Fn3FbVQxyuUvfknSc",
    "aLD8vOuIGSVxySeZpz6VOK9z8iVii3YkvlLPRUE7WEzJdVxKdZXziVWakcIkehPSlDbRZS+hWLdxv2gdCs5Nkxk78uBwV+K+WtVdSYAwhcCO",
    "aLTugDyKURtoo55sXPb6mxNjoiFMKG5pG9s7V64PkRJnWsSZfX9dHvdx124+jtli+fNy6csyIlM7fj/J5+TGxe6A2sA/HVeGROauaubg",
    "aLNhQvnT4aSbnrDfNcGu8bAuLDkeNZ6TGAkVCOrk6t4dJncR+CEF5e23FMyCwPJuPbMrmyX2/4Qy8r8OsqQRk0kEWSQ="
  ],
  "packet_count": 10,
  "format": "base64_opus_packets"
}
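
One way to sanity-check packets like these is to look at the first byte of each one, the TOC byte that RFC 6716 defines. The sketch below splits that byte into its three fields. The helper name `parse_toc` is made up for illustration, and only the first four base64 characters of the first packet are decoded, since four characters always yield three bytes.

```python
import base64


def parse_toc(packet: bytes) -> dict:
    """Split the first byte of an Opus packet into its RFC 6716 TOC fields."""
    toc = packet[0]
    return {
        "config": toc >> 3,              # top 5 bits: mode, bandwidth, frame duration
        "stereo": bool(toc & 0x04),      # bit 2: False = mono, True = stereo
        "frame_count_code": toc & 0x03,  # low 2 bits: 0 means one frame per packet
    }


# Prefix of the first packet in the example JSON above.
first_packet = base64.b64decode("aIAK")
print(parse_toc(first_packet))
```

Every packet in the example begins with `aI` through `aL`, which all decode to the same first byte, 0x68. That gives config 13 with one mono frame per packet. RFC 6716 lists config 13 as hybrid mode, super-wideband, 20 ms, which matches the 20 ms framing used by the converter.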

    def pcm16_to_opus(self, pcm_data: bytes, sample_rate: int, channels: int) -> List[bytes]:
        """
        Convert PCM16 audio data to Opus format.
        
        Args:
            pcm_data: Raw PCM16 audio data (16-bit signed integers)
            sample_rate: Sample rate in Hz
            channels: Number of audio channels
            
        Returns:
            List of individual Opus packets (one per 20ms frame)
        """
        logger.info(f"Converting PCM16 to Opus: {sample_rate} Hz, {channels} channels")

        try:
            if not OPUS_AVAILABLE:
                raise RuntimeError("opuslib not available - cannot encode Opus")
            
            # Calculate frame sizes for 20ms
            frame_size = self._calculate_frame_size(sample_rate)  # samples per frame
            bytes_per_frame = frame_size * channels * 2           # 2 bytes per sample
            
            # Get encoder
            encoder = self._get_encoder(sample_rate, channels)

            opus_packets: List[bytes] = []

            # Process in 20ms chunks - each becomes ONE Opus packet
            for offset in range(0, len(pcm_data), bytes_per_frame):
                chunk = pcm_data[offset:offset + bytes_per_frame]
                if len(chunk) < bytes_per_frame:
                    # pad with silence if last frame is short
                    chunk += b'\x00' * (bytes_per_frame - len(chunk))

                # Convert bytes to int16 array (critical for opuslib)
                pcm_i16 = np.frombuffer(chunk, dtype='<i2')  # little-endian int16
                
                # Encode this 20ms frame as ONE Opus packet
                opus_packet = encoder.encode(pcm_i16.tobytes(), frame_size)
                opus_packets.append(opus_packet)
            
            logger.info(f"Converted PCM16 to Opus: {len(pcm_data)} bytes → {len(opus_packets)} packets")
            return opus_packets
            
        except Exception as e:
            logger.error(f"PCM16 to Opus conversion failed: {e}")
            raise ValueError(f"PCM16 to Opus conversion failed: {str(e)}")

Paul_Diamant · August 2, 2025, 10:08pm

After I got PCM16 → Opus packets working, my new challenge is the reverse: converting incoming media messages (inbound, the caller speaking) from Opus to PCM. However, Telnyx sends messages over the WebSocket constantly, even when I'm not talking, which can cause too many API calls to the converter. What would you recommend in that case? Sending them in batches of 50 to avoid overwhelming the API?

Foxalabs · August 2, 2025, 10:27pm

Yeah, I tend to batch them up into some convenient chunk and send those. So long as you're not adding too much latency, all good. If you have server-side voice detection, it's fine to send dead air to it; it'll just throw it away.
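
A minimal sketch of that batching idea, assuming each inbound payload carries 20 ms of audio. The class name `PacketBatcher` and the default batch size are made up for illustration. Note that a batch of 50 would add about a second of delay before the audio reaches the converter, so a smaller batch may be a better starting point.

```python
from typing import List, Optional


class PacketBatcher:
    """Collect inbound base64 Opus payloads and release them in batches."""

    def __init__(self, max_packets: int = 10) -> None:
        # 10 packets of 20 ms each add roughly 200 ms of buffering delay.
        self.max_packets = max_packets
        self._pending: List[str] = []

    def add(self, payload: str) -> Optional[List[str]]:
        """Queue one payload; return a full batch when ready, else None."""
        self._pending.append(payload)
        if len(self._pending) >= self.max_packets:
            return self.flush()
        return None

    def flush(self) -> List[str]:
        """Return whatever is queued (for example when the call ends) and reset."""
        batch, self._pending = self._pending, []
        return batch


batcher = PacketBatcher(max_packets=3)
results = [batcher.add(p) for p in ["p1", "p2", "p3", "p4"]]
print(results)
```

Each returned batch is a list of base64 strings, so it could be sent as the `audio_data` field of a single converter request.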

Paul_Diamant · August 3, 2025, 6:39pm

I got it working way better now! Both directions work: I can talk to OpenAI, and my audio reaches it through the Opus → PCM16 converter in my Python code:

Also, I'm not sure if it's good practice, but until I added this function, which cleans up the Opus buffer I receive from the Telnyx WebSocket, it would not give me accurate transcripts at all:

for example:
All right.
That’s great.
Thank you.
Please visit www.//airdreamers…///com for more information.

But after adding my script, which filters out silent/invalid Opus packets, I'm getting THE MOST accurate transcripts ever:
Hey, what’s up?
I’m good. My name is Paul. What’s your name?

  private isVoicePacket(payload: any) {
    // Remove quotes, commas, and newlines
    payload = payload.replace(/[",\n]/g, '');

    if (!payload) return false;

    // 1. Length check - voice packets are typically shorter
    if (payload.length > 100) {
      return false;
    }

    // 2. Entropy check - voice data should have reasonable variation
    const uniqueChars = new Set(payload).size;
    const entropy = uniqueChars / payload.length;

    // If entropy is too low (too repetitive), likely silence
    if (entropy < 0.3) {
      return false;
    }

    return true;
  }
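
The function above judges packets by their base64 text. A different and more conventional approach, not the one used in this thread, is to measure the energy of each frame after decoding to PCM16. Here is a minimal sketch; the function names and the threshold of 200 are hypothetical and would need tuning against real call audio.

```python
import math
import struct


def pcm16_rms(frame: bytes) -> float:
    """Root-mean-square level of a little-endian PCM16 frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_silent(frame: bytes, threshold: float = 200.0) -> bool:
    """Treat a frame as silence when its RMS falls below the (hypothetical) threshold."""
    return pcm16_rms(frame) < threshold


# One 20 ms frame at 8 kHz: pure silence, then a 440 Hz test tone.
silence = struct.pack("<160h", *([0] * 160))
tone = struct.pack(
    "<160h",
    *[int(8000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(160)],
)
print(is_silent(silence), is_silent(tone))
```

Frames flagged as silent could be dropped before they are forwarded, which serves the same goal as the packet filter above.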

import base64
import json
import logging
from typing import List, Optional, Union
import numpy as np
from fastapi import FastAPI, HTTPException, status
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
import uvicorn

# Try to import opuslib for Opus support
try:
    import opuslib
    OPUS_AVAILABLE = True
except ImportError:
    OPUS_AVAILABLE = False
    logging.warning("opuslib not available - Opus conversion will not work. Install with: pip install opuslib")

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Audio Converter Service",
    description="Convert audio between PCM16 and Opus formats with base64 encoding",
    version="2.1.0"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Audio configuration
DEFAULT_SAMPLE_RATE = 24000
DEFAULT_CHANNELS = 1
FRAME_DURATION_MS = 20  # 20ms frames
BITS_PER_SAMPLE = 16


class AudioConversionRequest(BaseModel):
    """Request model for audio conversion."""
    audio_data: Union[str, List[str]] = Field(..., description="Base64 encoded audio data (string for single chunk, list for multiple chunks)")
    input_format: str = Field(..., description="Input format: 'pcm16' or 'opus'")
    sample_rate: Optional[int] = Field(DEFAULT_SAMPLE_RATE, description="Sample rate in Hz")
    channels: Optional[int] = Field(DEFAULT_CHANNELS, description="Number of audio channels")


class AudioConversionResponse(BaseModel):
    """Response model for audio conversion."""
    audio_data: str = Field(..., description="Base64 encoded converted audio data (for PCM16) or JSON array of packets (for Opus)")
    output_format: str = Field(..., description="Output format: 'pcm16' or 'opus'")
    sample_rate: int = Field(..., description="Sample rate in Hz")
    channels: int = Field(..., description="Number of audio channels")
    success: bool = Field(..., description="Conversion success status")
    message: Optional[str] = Field(None, description="Status message")
    packet_count: Optional[int] = Field(None, description="Number of Opus packets (for Opus output only)")


class AudioConverter:
    """Main audio converter class handling PCM16 ⟷ Opus conversions."""
    
    def __init__(self):
        """Initialize the audio converter."""
        self._encoders = {}
        self._decoders = {}
    
    def _get_encoder(self, sample_rate: int, channels: int) -> opuslib.Encoder:
        """Get or create an Opus encoder for the given parameters."""
        key = (sample_rate, channels)
        if key not in self._encoders:
            self._encoders[key] = opuslib.Encoder(
                sample_rate, 
                channels, 
                opuslib.APPLICATION_VOIP
            )
        return self._encoders[key]
    
    def _get_decoder(self, sample_rate: int, channels: int) -> opuslib.Decoder:
        """Get or create an Opus decoder for the given parameters."""
        key = (sample_rate, channels)
        if key not in self._decoders:
            self._decoders[key] = opuslib.Decoder(sample_rate, channels)
        return self._decoders[key]
    
    def _validate_audio_params(self, sample_rate: int, channels: int) -> None:
        """Validate audio parameters."""
        if sample_rate not in [8000, 12000, 16000, 24000, 48000]:
            raise ValueError(f"Unsupported sample rate: {sample_rate}. Supported: 8000, 12000, 16000, 24000, 48000")
        
        if channels not in [1, 2]:
            raise ValueError(f"Unsupported channel count: {channels}. Supported: 1 (mono), 2 (stereo)")
    
    def _calculate_frame_size(self, sample_rate: int) -> int:
        """Calculate frame size for the given sample rate."""
        return int(sample_rate * FRAME_DURATION_MS / 1000)
    
    def pcm16_to_opus(self, pcm_data: bytes, sample_rate: int, channels: int) -> List[bytes]:
        """
        Convert PCM16 audio data to Opus format.
        
        Args:
            pcm_data: Raw PCM16 audio data (16-bit signed integers)
            sample_rate: Sample rate in Hz
            channels: Number of audio channels
            
        Returns:
            List of individual Opus packets (one per 20ms frame)
        """
        logger.info(f"Converting PCM16 to Opus: {sample_rate} Hz, {channels} channels")

        try:
            if not OPUS_AVAILABLE:
                raise RuntimeError("opuslib not available - cannot encode Opus")
            
            # Calculate frame sizes for 20ms
            frame_size = self._calculate_frame_size(sample_rate)  # samples per frame
            bytes_per_frame = frame_size * channels * 2           # 2 bytes per sample
            
            # Get encoder
            encoder = self._get_encoder(sample_rate, channels)

            opus_packets: List[bytes] = []

            # Process in 20ms chunks - each becomes ONE Opus packet
            for offset in range(0, len(pcm_data), bytes_per_frame):
                chunk = pcm_data[offset:offset + bytes_per_frame]
                if len(chunk) < bytes_per_frame:
                    # pad with silence if last frame is short
                    chunk += b'\x00' * (bytes_per_frame - len(chunk))

                # Convert bytes to int16 array (critical for opuslib)
                pcm_i16 = np.frombuffer(chunk, dtype='<i2')  # little-endian int16
                
                # Encode this 20ms frame as ONE Opus packet
                opus_packet = encoder.encode(pcm_i16.tobytes(), frame_size)
                opus_packets.append(opus_packet)
            
            logger.info(f"Converted PCM16 to Opus: {len(pcm_data)} bytes → {len(opus_packets)} packets")
            return opus_packets
            
        except Exception as e:
            logger.error(f"PCM16 to Opus conversion failed: {e}")
            raise ValueError(f"PCM16 to Opus conversion failed: {str(e)}")
    
    def _is_valid_opus_packet(self, opus_data: bytes) -> bool:
        """
        Validate if the data is a valid Opus packet.
        
        Args:
            opus_data: The Opus packet data to validate
            
        Returns:
            True if valid Opus packet, False otherwise
        """
        if len(opus_data) < 1:
            return False
        
        # Check for minimum packet size (Opus packets are usually at least 3-4 bytes)
        if len(opus_data) < 3:
            logger.debug(f"Packet too small: {len(opus_data)} bytes")
            return False
        
        # Check TOC (Table of Contents) byte - first byte of Opus packet
        toc_byte = opus_data[0]
        
        # Extract configuration from TOC byte (bits 3-7)
        config = (toc_byte >> 3) & 0x1F
        
        # Valid Opus configurations are 0-31
        if config > 31:
            logger.debug(f"Invalid Opus configuration: {config}")
            return False
        
        # Check for obvious invalid patterns
        # Opus packets shouldn't be all zeros or all 0xFF
        if all(b == 0 for b in opus_data) or all(b == 0xFF for b in opus_data):
            logger.debug("Packet contains invalid pattern (all zeros or all 0xFF)")
            return False
        
        return True
    
    def _generate_silence_pcm16(self, sample_rate: int, channels: int) -> bytes:
        """
        Generate silence PCM16 data for one frame (20ms).
        
        Args:
            sample_rate: Sample rate in Hz
            channels: Number of audio channels
            
        Returns:
            Silent PCM16 data
        """
        frame_size = self._calculate_frame_size(sample_rate)
        silence_samples = frame_size * channels
        # Generate 16-bit silence (all zeros)
        silence_array = np.zeros(silence_samples, dtype=np.int16)
        return silence_array.tobytes()

    def opus_to_pcm16(self, opus_data: bytes, sample_rate: int, channels: int) -> bytes:
        """
        Convert Opus audio data to PCM16 format.
        
        Args:
            opus_data: Opus encoded audio data
            sample_rate: Sample rate in Hz
            channels: Number of audio channels
            
        Returns:
            Raw PCM16 audio data (16-bit signed integers)
        """
        try:
            self._validate_audio_params(sample_rate, channels)
            
            # Validate Opus data
            if len(opus_data) < 1:
                raise ValueError("Empty Opus data")
            
            # Check if this is a valid Opus packet
            if not self._is_valid_opus_packet(opus_data):
                logger.info(f"Invalid Opus packet ({len(opus_data)} bytes), returning silence")
                return self._generate_silence_pcm16(sample_rate, channels)
            
            # Get decoder and frame size
            decoder = self._get_decoder(sample_rate, channels)
            frame_size = self._calculate_frame_size(sample_rate)
            
            try:
                # Decode Opus to PCM16 (opuslib returns int16 directly)
                pcm_data = decoder.decode(opus_data, frame_size)
                
                logger.info(f"Converted Opus to PCM16: {len(opus_data)} bytes → {len(pcm_data)} bytes")
                return pcm_data
                
            except Exception as decode_error:
                # If decode fails, log and return silence instead of erroring
                logger.warning(f"Opus decode failed ({len(opus_data)} bytes): {decode_error}, returning silence")
                return self._generate_silence_pcm16(sample_rate, channels)
            
        except Exception as e:
            logger.error(f"Opus to PCM16 conversion failed: {e}")
            # Chain the original exception so the traceback is preserved
            raise ValueError(f"Opus to PCM16 conversion failed: {e}") from e
    
    def opus_chunks_to_pcm16(self, opus_base64_chunks: List[str], sample_rate: int, channels: int) -> bytes:
        """
        Convert multiple base64 Opus chunks from Telnyx to PCM16 format.
        
        Args:
            opus_base64_chunks: List of base64 encoded Opus packets
            sample_rate: Sample rate in Hz
            channels: Number of audio channels
            
        Returns:
            Combined PCM16 audio data (16-bit signed integers)
        """
        try:
            self._validate_audio_params(sample_rate, channels)
            
            if not OPUS_AVAILABLE:
                raise RuntimeError("opuslib not available - cannot decode Opus")
            
            # Create one decoder for this batch and reuse it for every chunk:
            # Opus decoding is stateful, so chunks must share a decoder
            decoder = opuslib.Decoder(sample_rate, channels)
            frame_size = self._calculate_frame_size(sample_rate)
            
            pcm_chunks = []
            successful_decodes = 0
            
            # Process each base64 Opus chunk
            for i, base64_chunk in enumerate(opus_base64_chunks):
                try:
                    # Decode base64 to binary
                    opus_binary = base64.b64decode(base64_chunk)
                    
                    # Log packet info for debugging
                    if i < 5:  # Log first few packets
                        logger.debug(f"Opus packet {i}: {len(opus_binary)} bytes, first bytes: {opus_binary[:8].hex()}")
                    
                    # Try to decode without validation first
                    # Let the decoder handle the packet format
                    try:
                        pcm_data = decoder.decode(opus_binary, frame_size)
                        pcm_chunks.append(pcm_data)
                        successful_decodes += 1
                    except Exception as decode_error:
                        # If decode fails, try with a different frame size
                        # Telnyx might use different frame sizes
                        if "buffer too small" in str(decode_error):
                            # Try with a larger frame size
                            try:
                                pcm_data = decoder.decode(opus_binary, frame_size * 2)
                                pcm_chunks.append(pcm_data)
                                successful_decodes += 1
                            except Exception as retry_error:
                                # Avoid a bare except, and report the error from the retry itself
                                logger.warning(f"Failed to decode Opus chunk {i} even with larger frame: {retry_error}")
                                pcm_chunks.append(self._generate_silence_pcm16(sample_rate, channels))
                        else:
                            logger.warning(f"Failed to decode Opus chunk {i}: {decode_error}")
                            pcm_chunks.append(self._generate_silence_pcm16(sample_rate, channels))
                    
                except Exception as e:
                    logger.error(f"Error processing Opus chunk {i}: {e}")
                    pcm_chunks.append(self._generate_silence_pcm16(sample_rate, channels))
            
            # Combine all PCM chunks
            if pcm_chunks:
                combined_pcm = b''.join(pcm_chunks)
                logger.info(f"Converted {len(opus_base64_chunks)} Opus chunks to {len(combined_pcm)} bytes of PCM16 ({successful_decodes}/{len(opus_base64_chunks)} successful)")
                return combined_pcm
            else:
                return b''
                
        except Exception as e:
            logger.error(f"Opus chunks to PCM16 conversion failed: {e}")
            # Chain the original exception so the traceback is preserved
            raise ValueError(f"Opus chunks to PCM16 conversion failed: {e}") from e


# Global converter instance
converter = AudioConverter()


@app.post("/convert", response_model=AudioConversionResponse)
async def convert_audio(request: AudioConversionRequest):
    """
    Convert audio between PCM16 and Opus formats.
    
    - **audio_data**: Base64 encoded audio data (string for single chunk, list for multiple Opus chunks)
    - **input_format**: Either 'pcm16' or 'opus'
    - **sample_rate**: Sample rate in Hz (8000, 12000, 16000, 24000, 48000)
    - **channels**: Number of channels (1 or 2)
    """
    try:
        # Validate input format
        input_format = request.input_format.lower()
        if input_format not in ['pcm16', 'opus']:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Input format must be 'pcm16' or 'opus'"
            )
        
        # Handle different input types
        if input_format == 'pcm16':
            # PCM16 input should be a single base64 string
            if isinstance(request.audio_data, list):
                raise HTTPException(
                    status_code=status.HTTP_400_BAD_REQUEST,
                    detail="PCM16 input must be a single base64 string, not a list"
                )
            
            # Decode base64 input
            try:
                audio_bytes = base64.b64decode(request.audio_data)
            except Exception as e:
                raise HTTPException(
                    status_code=status.HTTP_400_BAD_REQUEST,
                    detail=f"Invalid base64 audio data: {str(e)}"
                )
            
            # Convert PCM16 to Opus (returns list of packets)
            opus_packets = converter.pcm16_to_opus(
                audio_bytes, 
                request.sample_rate, 
                request.channels
            )
            
            # Convert each packet to base64 for Telnyx format
            opus_b64_packets = [base64.b64encode(packet).decode('utf-8') for packet in opus_packets]
            
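            # Return as a JSON array string for easy consumption.
            # Note: despite the variable name, this is a JSON-encoded list of
            # base64 packets, not a single base64 string, so callers need to
            # JSON-decode audio_data before using the individual packets.
            converted_base64 = json.dumps(opus_b64_packets)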
            output_format = 'opus'
            packet_count = len(opus_packets)
            
            logger.info(f"Generated {packet_count} Opus packets for Telnyx")
            
        else:  # opus input
            # Handle both single string and list of chunks for Opus input
            if isinstance(request.audio_data, str):
                # Single Opus chunk
                try:
                    audio_bytes = base64.b64decode(request.audio_data)
                except Exception as e:
                    raise HTTPException(
                        status_code=status.HTTP_400_BAD_REQUEST,
                        detail=f"Invalid base64 audio data: {str(e)}"
                    )
                
                converted_bytes = converter.opus_to_pcm16(
                    audio_bytes, 
                    request.sample_rate, 
                    request.channels
                )
            else:
                # Multiple Opus chunks from Telnyx
                converted_bytes = converter.opus_chunks_to_pcm16(
                    request.audio_data,
                    request.sample_rate,
                    request.channels
                )
            
            # Encode result to base64
            converted_base64 = base64.b64encode(converted_bytes).decode('utf-8')
            output_format = 'pcm16'
            packet_count = None
            
            logger.info(f"Converted to PCM16: {len(converted_bytes)} bytes")
        
        return AudioConversionResponse(
            audio_data=converted_base64,
            output_format=output_format,
            sample_rate=request.sample_rate,
            channels=request.channels,
            success=True,
            message=f"Successfully converted {input_format} to {output_format}",
            packet_count=packet_count
        )
        
    except HTTPException:
        raise
    except ValueError as e:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=str(e)
        )
    except Exception as e:
        logger.error(f"Unexpected conversion error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Conversion failed: {str(e)}"
        )


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "service": "Audio Converter",
        "version": "2.1.0",
        "opus_available": OPUS_AVAILABLE
    }


@app.get("/")
async def root():
    """Root endpoint with service information."""
    return {
        "service": "Audio Converter Service",
        "version": "2.1.0",
        "description": "Convert audio between PCM16 and Opus formats with base64 encoding",
        "endpoints": {
            "convert": "/convert (POST)",
            "health": "/health (GET)",
            "docs": "/docs (GET)"
        },
        "supported_formats": ["pcm16", "opus"],
        "supported_sample_rates": [8000, 12000, 16000, 24000, 48000],
        "supported_channels": [1, 2],
        "default_sample_rate": DEFAULT_SAMPLE_RATE,
        "default_channels": DEFAULT_CHANNELS,
        "notes": {
            "opus_input": "Can accept either a single base64 string or a list of base64 chunks",
            "pcm16_input": "Must be a single base64 string",
            "telnyx_support": "Optimized for Telnyx WebSocket media streaming"
        }
    }


def main():
    """Entry point for the audio converter service."""
    logger.info("Starting Audio Converter Service...")
    uvicorn.run(
        app, 
        host="0.0.0.0", 
        port=8000, 
        log_level="info"
    )


if __name__ == "__main__":
    main()
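
One thing that may be worth checking against the silent-audio symptom: for pcm16 input, the /convert handler above returns audio_data as a JSON-encoded array of base64 Opus packets, while the client snippet at the top of the thread passes audio_data straight through as a single payload. Below is a minimal sketch of unpacking that array into one media message per packet. build_media_messages is a hypothetical helper, and the message shape is copied from the client snippet in this thread rather than from Telnyx documentation, so treat it as an assumption.

```python
import json
from typing import List


def build_media_messages(audio_data: str, track: str = "outbound") -> List[str]:
    """Turn the /convert response's audio_data into one media message per packet.

    Hypothetical helper: assumes audio_data is the JSON array string produced by
    json.dumps(opus_b64_packets) in the handler above.
    """
    packets = json.loads(audio_data)
    if not isinstance(packets, list):
        raise ValueError("expected a JSON array of base64 Opus packets")
    # One WebSocket message per Opus packet, in the original order
    return [
        json.dumps({"event": "media", "media": {"payload": packet, "track": track}})
        for packet in packets
    ]
```

Each returned string would then be sent as its own WebSocket message, in order. Whether this is the actual cause of the silence is not confirmed anywhere in the thread.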

Rafal_Skorski · September 5, 2025, 9:18am

Hi @Paul_Diamant @Foxalabs ,

Telnyx now supports transcoding for all codec types (including L16) in bidirectional streaming.

You can configure this using two attributes:

stream_codec – defines the codec used for sending audio from Telnyx to your application.
bidirectional_codec – defines the codec expected by Telnyx when receiving audio.

If you want to use PCM Linear16, please set both attributes to L16.

The codec used on the call itself is not affected; in your case, audio will be transcoded between L16 and OPUS.

Please let me know if you have any questions.

Paul_Diamant · September 5, 2025, 9:38am

Is this a new update? Does that mean I don’t have to take care of the conversions myself?

Rafal_Skorski · September 5, 2025, 10:21am

Yes.

Transcoding was available before for a limited number of combinations, basically from/to PCMA/PCMU.

Now, it is supported for all codecs. Additionally, support for PCM L16 codec was added.

Paul_Diamant · September 5, 2025, 11:02am

I set stream_bidirectional_codec to L16, and in the OpenAI session update config I set both audio formats to pcm16:

    input_audio_format: 'pcm16', // 24kHz, 16-bit, mono (HD quality)
    output_audio_format: 'pcm16', // NOT g711_ulaw

However, the sampling rates seem to differ: I’m hearing OpenAI’s voice in slow motion. What could I be doing wrong?
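
Slow-motion playback is what a sample-rate mismatch usually looks like: samples produced at a higher rate are played back at a lower one, so the same audio takes longer. A small sketch of the arithmetic follows, assuming mono 16-bit PCM (2 bytes per sample). Both function names are made up for illustration, and the rates used in the usage note are examples only.

```python
def pcm16_duration_seconds(num_bytes: int, sample_rate_hz: int, channels: int = 1) -> float:
    """Duration of a PCM16 buffer when played at the given sample rate."""
    if sample_rate_hz <= 0 or channels <= 0:
        raise ValueError("sample rate and channel count must be positive")
    bytes_per_second = 2 * channels * sample_rate_hz  # 2 bytes per 16-bit sample
    return num_bytes / bytes_per_second


def playback_slowdown(produced_rate_hz: int, played_rate_hz: int) -> float:
    """Stretch factor when audio made at one rate is played at another."""
    if produced_rate_hz <= 0 or played_rate_hz <= 0:
        raise ValueError("sample rates must be positive")
    return produced_rate_hz / played_rate_hz
```

For example, 48000 bytes of mono PCM16 is one second at 24 kHz, but the same bytes interpreted at 8 kHz last three seconds, which is a slowdown factor of 3.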

Rafal_Skorski · September 5, 2025, 12:25pm

Yes, that might be a question of sampling rate.

If OPUS is used in the call, the sampling rate is 16 kHz by default. This information is provided in the start frame:

{ 
 "event": "start",  
 "sequence_number": "1", 
 "start": {
   "user_id": "3E6F995F-85F7-4705-9741-53B116D28237", 
   "call_control_id": "v2:T02llQxIyaRkhfRKxgAP8nY511EhFLizdvdUKJiSw8d6A9BborherQ", 
   "call_session_id": "ff55a038-6f5d-11ef-9692-02420aeffb1f",
   "from": "+13122010094",
   "to": "+13122123456",
   "tags": ["TAG1", "TAG2"], 
   "client_state": "aGF2ZSBhIG5pY2UgZGF5ID1d",
   "media_format": { 
     "encoding": "OPUS",
     "sample_rate": 16000, 
     "channels": 1 
   } 
 },
 "stream_id": "32DE0DEA-53CB-4B21-89A4-9E1819C043BC"
}

However, it can also be changed using the bidirectional_sampling_rate parameter.

Please note that the downsampling process may affect audio quality, and we are currently working on improving it.

Therefore, we recommend generating audio requests from OpenAI at the same sample rate as specified in the start frame.
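
To act on the start frame, the receiving side can read media_format once and configure the rest of the pipeline from it. A minimal sketch, assuming the frame is the JSON text shown above; media_format_from_message is a hypothetical helper name, not part of any SDK.

```python
import json
from typing import Optional, Tuple


def media_format_from_message(raw: str) -> Optional[Tuple[str, int, int]]:
    """Return (encoding, sample_rate, channels) from a start frame, else None."""
    message = json.loads(raw)
    start = message.get("start")
    if not isinstance(start, dict) or "media_format" not in start:
        return None  # not a start frame (e.g. a media frame)
    fmt = start["media_format"]
    return fmt["encoding"], int(fmt["sample_rate"]), int(fmt["channels"])
```

The returned sample rate is the one to compare against whatever rate the audio source (here, OpenAI) is producing.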

Paul_Diamant · September 5, 2025, 2:06pm

Where does the “bidirectional_sampling_rate” parameter go? It doesn’t seem to exist on Telnyx.

Paul_Diamant · September 5, 2025, 2:08pm

{
  "start": {
    "call_control_id": "v3:FPoPb8qNBr2d27Jqcv–II2M0pxwN_pgiB7_82-IOA0oCKH-gFc0xA",
    "user_id": "0256bd9f-4e3b-4546-a493-b3e34d71f42c",
    "to": "+97223765720",
    "from": "+972507796677",
    "tags": [],
    "client_state": "eyJhdXRvQW5zd2VyZWQiOnRydWUsInN0cmVhbWluZyI6dHJ1ZX0=",
    "custom_parameters": {},
    "call_session_id": "a00976ac-8a61-11f0-8bea-02420a1f0a69",
    "media_format": {
      "channels": 1,
      "encoding": "L16",
      "sample_rate": 8000
    }
  }
}

Currently it shows 8000. I need to upsample it because OpenAI supports 24 kHz.

Paul_Diamant · September 6, 2025, 7:39pm

Any idea what to do? The sample rate coming from Telnyx is 8000, and OpenAI requires 24000.
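
If the two rates cannot be made to match through configuration, one workaround is to resample in the application: 8000 to 24000 on the way to OpenAI, and 24000 to 8000 on the way back. Below is a rough sketch using linear interpolation and only the standard library. It assumes mono, little-endian PCM16, and resample_pcm16_mono is an illustrative name. Linear interpolation is the simplest option rather than the best: it applies no anti-aliasing filter when downsampling, so a dedicated resampling library is likely to sound better.

```python
import sys
from array import array


def resample_pcm16_mono(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample mono little-endian PCM16 by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if len(pcm) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    if src_rate == dst_rate or not pcm:
        return pcm

    src = array("h")
    src.frombytes(pcm)
    if sys.byteorder == "big":
        src.byteswap()  # input is little-endian; convert to native order

    n_out = len(src) * dst_rate // src_rate
    out = array("h", bytes(2 * n_out))  # zero-filled output buffer
    last = len(src) - 1
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the source
        left = int(pos)
        if left >= last:
            out[i] = src[last]  # hold the final sample at the tail
        else:
            frac = pos - left
            out[i] = round(src[left] + (src[left + 1] - src[left]) * frac)

    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian for the wire
    return out.tobytes()
```

Calling resample_pcm16_mono(chunk, 8000, 24000) triples the number of samples, and the reverse call reduces it to a third. Resampling each chunk independently ignores continuity across chunk boundaries, which is acceptable for a quick test but may produce faint artifacts.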

z33dd · September 10, 2025, 8:09pm

Out of curiosity, how were you able to use the OpenAI Realtime API with Telnyx?

I’m trying to migrate a Twilio application, but I’m having tons of issues with Telnyx closing the WebSocket out of nowhere when using TeXML + Media Streams.

Btw, I think you can set the sample rate in Telnyx and OpenAI to 16 kHz

Paul_Diamant · September 10, 2025, 8:23pm

I couldn’t find anything in the docs about setting the sample rate. Do you have Discord? I can share my working code with you as well.

Rafal_Skorski · September 12, 2025, 7:15am

The parameter bidirectional_sampling_rate sets the sample rate. By default, it is 8 kHz.

It is not documented. We will add it to the documentation soon.

Paul_Diamant · September 12, 2025, 11:27am

I added “bidirectional_sampling_rate” to the answer call API request, but it still returns a sample rate of 8000.

await this.telnyxService.answerCall(callControlId, {
  stream_url: this.telnyxService.generateStreamingWebSocketUrl(),
  stream_track: 'both_tracks',
  stream_codec: 'default',
  stream_bidirectional_mode: 'rtp',
  stream_bidirectional_codec: 'L16',
  bidirectional_sampling_rate: '24000',
  send_silence_when_idle: true,
  record: 'record-from-answer',
  webhook_url: `${process.env.REMOTE_URL}/telnyx-call-webhook`,
  client_state: btoa(
    JSON.stringify({ autoAnswered: true, streaming: true }),
  ),
} as any);
