Proxying Media Streams via Kamailio’s RTP Proxy

Kamailio is a SIP server which does not play any role by itself in media transmission path. this behaviour leads to media packets having to attempt to stream peer to peer between caller and callee which in turn many a times causes them to get dropped in absence of NAT management

To ensure that media stream is proxied via an RTP proxy kamailio can use RTP proxy module combined with a RTP proxy.

This setup also provides other benefits such as controlling media media , security , Load balancing between many rtp proxies ,bridge signalling between multiple network interfaces etc.

RTP Proxy module

Used to proxy the media stream .

RTP proxies that can be used along with this module are:

RTP proxies can be used for bridging network interfaces , load distribution and balancing etc.It does not support transcoding.

Parameters :

rtpproxy_sock – binds a ip and port for rtp proxy

 modparam("rtpproxy", "rtpproxy_sock", "udp:localhost:12221")

rtpproxy_disable_tout – when rtp proxy is disabled then timeout till when it doesnt connect

rtpproxy_tout – timeout to wait for reply

rtpproxy_retr – num of retries after timeout

nortpproxy_str – sets the SDP attribute used by rtpproxy to mark the message’s SDP attachment with information that it have already been changed. Default value is


and others like


timeout_socket (string)

ice_candidate_priority_avp (string)

extra_id_pv (string)

db_url (string)

table_name (string)

rtp_inst_pvar (string)


set_rtp_proxy_set(setid) – Sets the Id of the rtpproxy set to be used for the next unforce_rtp_proxy(), rtpproxy_offer(), rtpproxy_answer() or rtpproxy_manage() command

rtpproxy_offer([flags [, ip_address]]) – to make the media pass through RTP the SDP is altered. Value of flag can be
1 – append first Via branch to Call-ID when sending command to rtpproxy.
2 – append second Via branch to Call-ID when sending command to rtpproxy. See flag ‘1’ for its meaning.
3 – behave like flag 1 is set for a request and like flag 2 is set for a reply
a – flags that UA from which message is received doesn’t support symmetric RTP. (automatically sets the ‘r’ flag)
b – append branch specific variable to Call-ID when sending command to rtpproxy
l – force “lookup”, that is, only rewrite SDP when corresponding session already exists in the RTP proxy
i, e – direction of the SIP message when rtpproxy is running in bridge mode. ‘i’ is internal network (LAN), ‘e’ is external network (WAN). Values ie , ei , ee and ii
x – shortcut for using the “ie” or “ei”-flags, to do automatic bridging between IPv4 on the “internal network” and IPv6 on the “external network”. Differentiated by IP type in the SDP, e.g. a IPv4 Address will always call “ie” to the RTPProxy (IPv4(i) to IPv6(e)) and an IPv6Address will always call “ei” to the RTPProxy (IPv6(e) to IPv4(i))
f – instructs rtpproxy to ignore marks inserted by another rtpproxy in transit to indicate that the session is already gone through another proxy. Allows creating a chain of proxies
r – IP address in SDP should be trusted. Without this flag, rtpproxy ignores address in the SDP and uses source address of the SIP message as media address which is passed to the RTP proxy
o – flags that IP from the origin description (o=) should be also changed.
c – flags to change the session-level SDP connection (c=) IP if media-description also includes connection information.
w – flags that for the UA from which message is received, support symmetric RTP must be forced.
zNN – perform re-packetization of RTP traffic coming from the UA which has sent the current message to increase or decrease payload size per each RTP packet forwarded if possible. The NN is the target payload size in ms, for the most codecs its value should be in 10ms increments, however for some codecs the increment could differ (e.g. 30ms for GSM or 20ms for G.723).
ip_address denotes the address of new SDP

such as : rtpproxy_offer(“FRWOC+PS”) is
rtpengine_offer(“force trust-address symmetric replace-origin replace-session-connection ICE=force RTP/SAVPF”);

route { 
if (is_method("INVITE")) 
    if (has_body("application/sdp")) 
        if (rtpproxy_offer()) t_on_reply("1"); 
    } else { 

if (is_method("ACK") && has_body("application/sdp")) rtpproxy_answer(); 
onreply_route[1] { 
   if (has_body("application/sdp")) rtpproxy_answer(); 
onreply_route[2] { 
   if (has_body("application/sdp")) rtpproxy_offer(); 

rtpproxy_answer([flags [, ip_address]])- rewrite SDP to proxy media , it can be used from REQUEST_ROUTE, ONREPLY_ROUTE, FAILURE_ROUTE, BRANCH_ROUTE.

rtpproxy_destroy([flags]) – tears down RTP proxy session for current call. Flags are ,
1 – append first Via branch to Call-ID
2 – append second Via branch to Call-ID
b – append branch specific variable to Call-ID
t – do not include To tag to “delete” command to rtpproxy thus causing full call to be deleted


rtpproxy_manage([flags [, ip_address]]) – Functionality is to use predfined logic for handling requests
If INVITE with SDP, then do rtpproxy_offer()
If INVITE with SDP, when the tm module is loaded, mark transaction with internal flag FL_SDP_BODY to know that the 1xx and 2xx are for rtpproxy_answer()
If ACK with SDP, then do rtpproxy_answer()
If BYE or CANCEL, or called within a FAILURE_ROUTE[], then call unforce_rtpproxy().
If reply to INVITE with code >= 300 do unforce_rtpproxy()
If reply with SDP to INVITE having code 1xx and 2xx, then do rtpproxy_answer() if the request had SDP or tm is not loaded, otherwise do rtpproxy_offer()
This function can be used from ANY_ROUTE.

rtpproxy_stream2uac(prompt_name, count) – stream prompt/announcement pre-encoded with the makeann command. The uac/uas suffix selects who will hear the announcement relatively to the current transaction – UAC or UAS. Also used for music on hold (MOH).
Params : prompt_name – path name of the prompt to stream
count – number of times the prompt should be repeated. When count is -1, the streaming will be in loop indefinitely until the appropriate rtpproxy_stop_stream2xxx is issued.
Example rtpproxy_stream2xxx usage

if (is_method("INVITE")) { 
if (is_audio_on_hold()) {
rtpproxy_stream2uas("/var/rtpproxy/prompts/music_on_hold", "-1");
} else {

rtpproxy_stream2uas(prompt_name, count)

rtpproxy_stop_stream2uac()- Stop streaming of announcement/prompt/MOH



Exported Pseudo Variables


RPC Commands


Ref :

WebRTC Media Streams and Quality metrics

Media Stream Tracks in WebRTC

The MediaStreamTrack interface typically represents a stream of data of audio or video and a MediaStream may contain zero or more MediaStreamTrack objects.

The objects RTCRtpSender and RTCRtpReceiver can be used by the application to get more fine grained control over the transmission and reception of MediaStreamTracks.

Media Flow in VoIP system
Media Flow in WebRTC Call

Video Streams

Video Capture insync with hardware’s capabilities

WebRTC compatible browsers are required to support Whie-balance , light level , autofocus from video source

Video Capture Resolution

Minimum WebRTC video attributes unless specified in SDP ( Session Description protocl ) is minimum 20 FPS and resolution 320 x 240 pixels. 

Also supports mid stream resilution changes such as in screen source fromdesktop sharinig .

SDP attributes for resolution, frame rate, and bitrate

SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

Sender must send limiting the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.

Dynamic FPS control based on actual hardware encoding

video source capture to adjust frame rate accroding to low bandwidth , poor light conditions and harware supported rate rather than force a higher FPS .

Stream Orientation

support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing with peer.

Audio Streams

Audio Level

Audio level for speech transmission to avoid users having to manually adjust the playback and to facilitate mixing in conferencing applications.

Normalization is considering frequencies above 300 Hz, regardless of the sampling rate used. Can be adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor.

GAIN calculation

  • If the endpoint has control over the entire audio-capture path like a regular phone
    the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
  • If the endpoint does not have control over the entire audio capture like software endpoint
    then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
  • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.

Acoustic Echo Cancellation (AEC)

Endpoints allow echo control mechanisms

SDP signaling and negotiation for media plane

Media plane adaptation is done at the SBC for network carried media, it should be done for all network hosted media services which face peer-to-peer media.

The high-level architecture elements of WebRTC media streams consists of

  • Encryption, RTP Multiplexing, Support for ICE
  • Audio – Interworking of differing WebRTC and codec sets
  • Video – Use of VP8, Support for H.264
  • Data – Support of MSRP ( RCS standard for messaging over DataChannel API)

Media Source

RTCVideoSource_4 (media-source)

timestamp	03/01/2022, 23:07:05
trackIdentifier	1bcab53d-1eca-41d1-a96a-00f1458c9b1b
kind	video
width	640
height	480
frames	7556
framesPerSecond	30

RTCAudioSource_3 (media-source)

timestamp	        03/01/2022, 23:06:26
trackIdentifier	        12cb979c-b40f-4de7-8b50-be6f4425e0b2
kind	                audio
audioLevel	        0.020599993896298106
totalAudioEnergy	1.8476431267450812
[Audio_Level_in_RMS]	0.02541394245734895
totalSamplesDuration	213.66999999995065
echoReturnLoss	        -0.11197675950825214
echoReturnLossEnhancement 8.111690521240234

Peer-to-Peer Media Stream

Direct connection to media servers and media gateways.

Use common codec set wherever possible to eliminate transcoding —Use regionalized transcoding where common codec not available Real-time video transcoding is expensive and performance impacting.

On-going standards/device/network work needs to be done to expand common codec set. WebRTC codec standards have not been finalized yet. WebRTC target is to support royalty free codecs within its standards.

AudioG.711, OpusG.711, AMR, AMR-WB (G.722.2)
Audio – ExtendedG.729a[b], G.726

Supporting common codecs between VoLTE devices and WebRTC endpoints requires one or more of the following:

  1. Support of WebRTC codecs on 3GPP/GSMA
  2. Support of 3GPP/GSMA codecs on WebRTC
  3. WebRTC browser support of codecs native to the device

RTP streams and RTCP stats Outbound Video

Chrome Browser on ubuntu OS

RTCOutboundRTPVideoStream_3305924664 (outbound-rtp)

timestamp	03/01/2022, 22:23:32
ssrc	3305924664
kind	video
trackId	RTCMediaStreamTrack_sender_4
transportId	RTCTransport_0_1
codecId	RTCCodec_1_Outbound_96
[codec]	VP8 (96)
mediaType	video
mediaSourceId	RTCVideoSource_4
packetsSent	171360
[packetsSent/s]	204.02266754223697
retransmittedPacketsSent	620
[retransmittedPacketsSent/s]	0
bytesSent	177210957
[bytesSent_in_bits/s]	1680050.6587655507
headerBytesSent	4218672
[headerBytesSent_in_bits/s]	39812.423281967494
retransmittedBytesSent	668008
[retransmittedBytesSent_in_bits/s]	0
framesEncoded	22003
[framesEncoded/s]	30.00333346209367
keyFramesEncoded	14
totalEncodeTime	418.017
[totalEncodeTime/framesEncoded_in_ms]	9.533333333333378
totalEncodedBytesTarget	0
[totalEncodedBytesTarget_in_bits/s]	0
framesSent	22003
[framesSent/s]	30.00333346209367
hugeFramesSent	1
totalPacketSendDelay	29963.73
[totalPacketSendDelay/packetsSent_in_ms]	31.62745098039772
qualityLimitationReason	none
qualityLimitationDurations	{bandwidth:0,cpu:174895,none:717684,other:0}
qualityLimitationResolutionChanges	0
encoderImplementation	libvpx
firCount	0
pliCount	2
nackCount	161
remoteId	RTCRemoteInboundRtpVideoStream_3305924664
frameWidth	640
frameHeight	480
framesPerSecond	30
qpSum	151000
[qpSum/framesEncoded]	9.3

RTCP statistics RTCRemoteInboundRtpVideoStream_3305924664 (remote-inbound-rtp)

timestamp	03/01/2022, 22:25:29
ssrc	984864038
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
jitter	0.026854166666666665
packetsLost	19
localId	RTCOutboundRTPAudioStream_984864038
roundTripTime	0.048
fractionLost	0
totalRoundTripTime	8.932
roundTripTimeMeasurements	201



After considerable time( 10 minutes in my case ) the quality of the media stream adjust to network conditions and variations ( peaks and dips) flat out.

after some time
after some time has passed
after some time


After some time

Bytes Send And Received

After some time has passes


after some time has passes

Outbound Audio from Ubuntu Chrome Browser

RTCOutboundRTPAudioStream_984864038 (outbound-rtp)

timestamp 03 / 01 / 2022, 22: 13: 26
ssrc 984864038
kind audio
trackId RTCMediaStreamTrack_sender_3
transportId RTCTransport_0_1
codecId RTCCodec_0_Outbound_111
    [codec] opus(111, minptime = 10; useinbandfec = 1)
mediaType audio
mediaSourceId RTCAudioSource_3
packetsSent 14292
    [packetsSent / s] 50.003051944088384
retransmittedPacketsSent 0
    [retransmittedPacketsSent / s] 0
bytesSent 1151754
    [bytesSent_in_bits / s] 32449.980589635597
headerBytesSent 400176
    [headerBytesSent_in_bits / s] 11200.683635475798
retransmittedBytesSent 0
    [retransmittedBytesSent_in_bits / s] 0
nackCount 0
remoteId RTCRemoteInboundRtpAudioStream_984864038

RTCP statistics RTCRemoteInboundRtpAudioStream_984864038 (remote-inbound-rtp)

timestamp	03/01/2022, 22:17:05
ssrc	984864038
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
jitter	0.002
packetsLost	3
localId	RTCOutboundRTPAudioStream_984864038
roundTripTime	0.023
fractionLost	0
totalRoundTripTime	4.344
roundTripTimeMeasurements 98	

Inbound Video from Android Webrtc Browser

RTCInboundRTPVideoStream_3384287918 (inbound-rtp)

timestamp 03 / 01 / 2022, 22: 55: 35
ssrc 3384287918
kind video
trackId RTCMediaStreamTrack_receiver_4
transportId RTCTransport_0_1
mediaType video
jitter 0.027
packetsLost 78
packetsReceived 79545
    [packetsReceived / s] 0
bytesReceived 77156700
    [bytesReceived_in_bits / s] 0
headerBytesReceived 1978716
    [headerBytesReceived_in_bits / s] 0
jitterBufferDelay 2284.024
    [jitterBufferDelay / jitterBufferEmittedCount_in_ms] 0
jitterBufferEmittedCount 13100
framesReceived 13101
    [framesReceived / s] 0[framesReceived - framesDecoded] 0
framesDecoded 13101
    [framesDecoded / s] 0
keyFramesDecoded 1
    [keyFramesDecoded / s] 0
framesDropped 0
totalDecodeTime 94.229
    [totalDecodeTime / framesDecoded_in_ms] 0
totalInterFrameDelay 442.0259999999831
    [totalInterFrameDelay / framesDecoded_in_ms] 0
totalSquaredInterFrameDelay 20.370232000000772
    [interFrameDelayStDev_in_ms] 0
decoderImplementation libvpx
firCount 0
pliCount 2
nackCount 51
codecId RTCCodec_1_Inbound_96
    [codec] VP8(96)
lastPacketReceivedTimestamp 1641276962171
    [lastPacketReceivedTimestamp] 03 / 01 / 2022, 22: 16: 02
frameWidth 480
frameHeight 640
framesPerSecond 4
qpSum 97949
    [qpSum / framesDecoded] 0
estimatedPlayoutTimestamp 3850268134980
    [estimatedPlayoutTimestamp] 03 / 01 / 2092, 22: 55: 42


Inbound Audio from Android Webrtc Browser

RTCInboundRTPAudioStream_579305270 (inbound-rtp)

timestamp 03 / 01 / 2022, 22: 50: 14
ssrc 579305270
kind audio
trackId RTCMediaStreamTrack_receiver_3
transportId RTCTransport_0_1
mediaType audio
jitter 0.003
packetsLost 208
packetsDiscarded 0
packetsReceived 124469
    [packetsReceived / s] 50.03320990953163
fecPacketsReceived 0
fecPacketsDiscarded 0
bytesReceived 4433321
    [bytesReceived_in_bits / s] 14209.431614306981
headerBytesReceived 3485132
    [headerBytesReceived_in_bits / s] 11207.439019735084
jitterBufferDelay 17887008
    [jitterBufferDelay / jitterBufferEmittedCount_in_ms] 113.79999999996896
jitterBufferEmittedCount 119485440
totalSamplesReceived 118645920
    [totalSamplesReceived / s] 48031.88151315036
concealedSamples 689415
    [concealedSamples / s] 0[concealedSamples / totalSamplesReceived] 0
silentConcealedSamples 338882
    [silentConcealedSamples / s] 0
concealmentEvents 230
insertedSamplesForDeceleration 33841
    [insertedSamplesForDeceleration / s] 0
removedSamplesForAcceleration 1562246
    [removedSamplesForAcceleration / s] 0
totalAudioEnergy 4.458078675648182
    [Audio_Level_in_RMS] 0
totalSamplesDuration 2472.2900000075438
codecId RTCCodec_0_Inbound_111
    [codec] opus(111, minptime = 10; useinbandfec = 1)
lastPacketReceivedTimestamp 1641279014658
    [lastPacketReceivedTimestamp] 03 / 01 / 2022, 22: 50: 14
audioLevel 0
remoteId RTCRemoteOutboundRTPAudioStream_579305270
estimatedPlayoutTimestamp 3850267813642

RTCP statistics RTCRemoteOutboundRTPAudioStream_579305270 (remote-outbound-rtp)

timestamp 03 / 01 / 2022, 22: 48: 47
ssrc 579305270
kind audio
transportId RTCTransport_0_1
codecId RTCCodec_0_Inbound_111
packetsSent 120306
bytesSent 4285534
localId RTCInboundRTPAudioStream_579305270
remoteTimestamp 1641278927459
    [remoteTimestamp] 03 / 01 / 2022, 22: 48: 47
reportsSent 480
roundTripTimeMeasurements 0
totalRoundTripTime 0

Comparision of Media stream QoS metrics between laptop browser and mobile browser

chrome browser on laptopmobile chorme browser
higher frame received ( 30)lower frame received (20)
lower jitter (0.002) and packet loss (3)higher jitter (0.003) and packet loss (208)

Bundled Streams

Same port used for all emdia stream. Fir exmaple port 9 is used for audio video as well as their RTCP feedbacks in snippet below.

a=group:BUNDLE 0 1
a=msid-semantic: WMS kAGMqdVh7lL70CVUVZQblgjPYsuhOAiGY3ii
m=audio 9 UDP/TLS/RTP/SAVPF 111 63 103 104 9 0 8 106 105 13 110 112 113 126 (33 more lines)
a=rtcp:9 IN IP4
a=msid:kAGMqdVh7lL70CVUVZQblgjPYsuhOAiGY3ii ed96e925-4425-467b-a099-8fb2e0c67b88
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 35 36 120 119 124 (100 more lines)
a=rtcp:9 IN IP4
a=msid:kAGMqdVh7lL70CVUVZQblgjPYsuhOAiGY3ii f45d3002-2866-44ce-a807-ac59a4f6708c

Same CSRC : To run multiple streams of media over a single RTP stream, a common SSRC is used.

a=ssrc:1683985800 cname:OYWXQ35YL2Hh+eUX
a=ssrc:1384669066 cname:OYWXQ35YL2Hh+eUX

Peer to Peer Data Transfer

Data Channel API of Webrtc allows bidirectional communication of arbitrary data between peers. It uses the same API as WebSockets and has very low latency.

  • (+) DataChannel is p2p and is also ened to end encrypted leader to higher privacy
  • (+) build in security due to p2p transfer
  • (+) high throughput than text transfer via a messaging server
  • (+) lower latency as p2p transfer takes shortest route

SCTP is the protocol that opens connectiosn for peer to peer data channel support in WebRTC. It can be configured for reliability and ordered delivery. It provides flow and congestion control to the data messages.

Data Channel Metrics

timestamp  03/01/2022, 23:13:13
label	   sctp
dataChannelIdentifier	1
state	    open
messagesSent	42
[messagesSent/s]	0
bytesSent   1962750
[bytesSent_in_bits/s]	0
messagesReceived	31
[messagesReceived/s]	0
bytesReceived	4712
[bytesReceived_in_bits/s]	0
After sharing 2 files of 1.5 Mb each


Webrtc Changes bitrate , resolution and framerate dynamically to accomodate the network conditions, policy constraints or user equipment capability. Higher the bitrate, higher the media quality.

Birate of Audio Codecs

Lossey formats
– iLBC (narrow band )13.33, 15.20 kbit/s
– iSAC ( wideband) 10–52 kbit/s
– GSM-EFR 12.2 kbit/s
– AAC 8–529 kbit/s (stereo)
– AMR-WB (G.722.2) 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85 kbit/s
– Opus – 6–510 kbit/s(-) higher bitrate consumes more bandwidth
(-) can cause congestion on network route

ITU-T formats
– G711 64kbps
– G.711.1 ( MDCT, A-law, μ-law) 64, 80, 96 kbit/s
– G.722 64 kbit/s (comprises 48, 56 or 64 kbit/s audio
and 16, 8 or 0 kbit/s auxiliary data)

Lossless formats (such as Dolby trueHD, MPEG-4 ALS)
consume much larger bitrates.

Bitrate of Video

QVGA 200-500 kbps
VGA 400 – 800 kbps
720p+ > 800 kbps
4K( 60fps) > 20 mbps

Packet Loss

Packet loss can cause choppy audio and distorted, blurry or frozen video.

Audio Packet loss

Video Packet loss


Jitter is the packet delay variation in an otherwise predictable normal rate of delay. This could indicate route changes, growing congestion etc.


Jitter fo Audio


Jitter for Video

Round Trip Time

High RTT is indicative of network congestion and causes delays.

Audio RTT

Video RTT

Cummulative Analysis of packet lost , RTT measurement and total RTT on an internetwork scenarios with peerfelxive and relay ICE candidates

Deeper Analysis of fraction lost , jitter and RTT

The chart shows how jitter follows RTT

References :

Read more on SDP and its attributes