Proxying Media Streams via Kamailio’s RTP Proxy

Kamailio is a SIP server which does not play any role by itself in media transmission path. this behaviour leads to media packets having to attempt to stream peer to peer between caller and callee which in turn many a times causes them to get dropped in absence of NAT management

To ensure that media stream is proxied via an RTP proxy kamailio can use RTP proxy module combined with a RTP proxy.

This setup also provides other benefits such as controlling media media , security , Load balancing between many rtp proxies ,bridge signalling between multiple network interfaces etc.

RTP Proxy module

Used to proxy the media stream .

RTP proxies that can be used along with this module are:

RTP proxies can be used for bridging network interfaces , load distribution and balancing etc.It does not support transcoding.

Parameters :

rtpproxy_sock – binds a ip and port for rtp proxy

 modparam("rtpproxy", "rtpproxy_sock", "udp:localhost:12221")

rtpproxy_disable_tout – when rtp proxy is disabled then timeout till when it doesnt connect

rtpproxy_tout – timeout to wait for reply

rtpproxy_retr – num of retries after timeout

nortpproxy_str – sets the SDP attribute used by rtpproxy to mark the message’s SDP attachment with information that it have already been changed. Default value is


and others like


timeout_socket (string)

ice_candidate_priority_avp (string)

extra_id_pv (string)

db_url (string)

table_name (string)

rtp_inst_pvar (string)


set_rtp_proxy_set(setid) – Sets the Id of the rtpproxy set to be used for the next unforce_rtp_proxy(), rtpproxy_offer(), rtpproxy_answer() or rtpproxy_manage() command

rtpproxy_offer([flags [, ip_address]]) – to make the media pass through RTP the SDP is altered. Value of flag can be
1 – append first Via branch to Call-ID when sending command to rtpproxy.
2 – append second Via branch to Call-ID when sending command to rtpproxy. See flag ‘1’ for its meaning.
3 – behave like flag 1 is set for a request and like flag 2 is set for a reply
a – flags that UA from which message is received doesn’t support symmetric RTP. (automatically sets the ‘r’ flag)
b – append branch specific variable to Call-ID when sending command to rtpproxy
l – force “lookup”, that is, only rewrite SDP when corresponding session already exists in the RTP proxy
i, e – direction of the SIP message when rtpproxy is running in bridge mode. ‘i’ is internal network (LAN), ‘e’ is external network (WAN). Values ie , ei , ee and ii
x – shortcut for using the “ie” or “ei”-flags, to do automatic bridging between IPv4 on the “internal network” and IPv6 on the “external network”. Differentiated by IP type in the SDP, e.g. a IPv4 Address will always call “ie” to the RTPProxy (IPv4(i) to IPv6(e)) and an IPv6Address will always call “ei” to the RTPProxy (IPv6(e) to IPv4(i))
f – instructs rtpproxy to ignore marks inserted by another rtpproxy in transit to indicate that the session is already gone through another proxy. Allows creating a chain of proxies
r – IP address in SDP should be trusted. Without this flag, rtpproxy ignores address in the SDP and uses source address of the SIP message as media address which is passed to the RTP proxy
o – flags that IP from the origin description (o=) should be also changed.
c – flags to change the session-level SDP connection (c=) IP if media-description also includes connection information.
w – flags that for the UA from which message is received, support symmetric RTP must be forced.
zNN – perform re-packetization of RTP traffic coming from the UA which has sent the current message to increase or decrease payload size per each RTP packet forwarded if possible. The NN is the target payload size in ms, for the most codecs its value should be in 10ms increments, however for some codecs the increment could differ (e.g. 30ms for GSM or 20ms for G.723).
ip_address denotes the address of new SDP

such as : rtpproxy_offer(“FRWOC+PS”) is
rtpengine_offer(“force trust-address symmetric replace-origin replace-session-connection ICE=force RTP/SAVPF”);

route { 
if (is_method("INVITE")) 
    if (has_body("application/sdp")) 
        if (rtpproxy_offer()) t_on_reply("1"); 
    } else { 

if (is_method("ACK") && has_body("application/sdp")) rtpproxy_answer(); 
onreply_route[1] { 
   if (has_body("application/sdp")) rtpproxy_answer(); 
onreply_route[2] { 
   if (has_body("application/sdp")) rtpproxy_offer(); 

rtpproxy_answer([flags [, ip_address]])- rewrite SDP to proxy media , it can be used from REQUEST_ROUTE, ONREPLY_ROUTE, FAILURE_ROUTE, BRANCH_ROUTE.

rtpproxy_destroy([flags]) – tears down RTP proxy session for current call. Flags are ,
1 – append first Via branch to Call-ID
2 – append second Via branch to Call-ID
b – append branch specific variable to Call-ID
t – do not include To tag to “delete” command to rtpproxy thus causing full call to be deleted


rtpproxy_manage([flags [, ip_address]]) – Functionality is to use predfined logic for handling requests
If INVITE with SDP, then do rtpproxy_offer()
If INVITE with SDP, when the tm module is loaded, mark transaction with internal flag FL_SDP_BODY to know that the 1xx and 2xx are for rtpproxy_answer()
If ACK with SDP, then do rtpproxy_answer()
If BYE or CANCEL, or called within a FAILURE_ROUTE[], then call unforce_rtpproxy().
If reply to INVITE with code >= 300 do unforce_rtpproxy()
If reply with SDP to INVITE having code 1xx and 2xx, then do rtpproxy_answer() if the request had SDP or tm is not loaded, otherwise do rtpproxy_offer()
This function can be used from ANY_ROUTE.

rtpproxy_stream2uac(prompt_name, count) – stream prompt/announcement pre-encoded with the makeann command. The uac/uas suffix selects who will hear the announcement relatively to the current transaction – UAC or UAS. Also used for music on hold (MOH).
Params : prompt_name – path name of the prompt to stream
count – number of times the prompt should be repeated. When count is -1, the streaming will be in loop indefinitely until the appropriate rtpproxy_stop_stream2xxx is issued.
Example rtpproxy_stream2xxx usage

if (is_method("INVITE")) { 
if (is_audio_on_hold()) {
rtpproxy_stream2uas("/var/rtpproxy/prompts/music_on_hold", "-1");
} else {

rtpproxy_stream2uas(prompt_name, count)

rtpproxy_stop_stream2uac()- Stop streaming of announcement/prompt/MOH



Exported Pseudo Variables


RPC Commands


Ref :

WebRTC Media Streams and Quality metrics

Media Stream Tracks in WebRTC

The MediaStreamTrack interface typically represents a stream of data of audio or video and a MediaStream may contain zero or more MediaStreamTrack objects.

The objects RTCRtpSender and RTCRtpReceiver can be used by the application to get more fine grained control over the transmission and reception of MediaStreamTracks.

Media Flow in VoIP system
Media Flow in WebRTC Call

Video Streams

Video Capture insync with hardware’s capabilities

WebRTC compatible browsers are required to support Whie-balance , light level , autofocus from video source

Video Capture Resolution

Minimum WebRTC video attributes unless specified in SDP ( Session Description protocl ) is minimum 20 FPS and resolution 320 x 240 pixels. 

Also supports mid stream resilution changes such as in screen source fromdesktop sharinig .

SDP attributes for resolution, frame rate, and bitrate

SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

Sender must send limiting the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.

Dynamic FPS control based on actual hardware encoding

video source capture to adjust frame rate accroding to low bandwidth , poor light conditions and harware supported rate rather than force a higher FPS .

Stream Orientation

support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing with peer.

Audio Streams

Audio Level

audio level for speech transmission to avoid users having to manually adjust the playback and to facilitate mixing in conferencing applications.

Normalization considering frequencies above 300 Hz, regardless of the sampling rate used.

Adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor.

GAIN calculation

  • If the endpoint has control over the entire audio-capture path like a regular phone
    the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
  • If the endpoint does not have control over the entire audio capture like software endpoint
    then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
  • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.

Acoustic Echo Cancellation (AEC)

Endpoints shoud allow echo control mechanisms

SDP signaling and negotiation for media plane

Media plane adaptation is done at the SBC for network carried media, it should be done for all network hosted media services which face peer-to-peer media.

The high-level architecture elements of WebRTC media streams consists of

  • Encryption, RTP Multiplexing, Support for ICE
  • Audio – Interworking of differing WebRTC and codec sets
  • Video – Use of VP8, Support for H.264
  • Data – Support of MSRP ( RCS standard for messaging over DataChannel API)

Media Source

RTCVideoSource_4 (media-source)

timestamp	03/01/2022, 23:07:05
trackIdentifier	1bcab53d-1eca-41d1-a96a-00f1458c9b1b
kind	video
width	640
height	480
frames	7556
framesPerSecond	30

RTCAudioSource_3 (media-source)

timestamp	        03/01/2022, 23:06:26
trackIdentifier	        12cb979c-b40f-4de7-8b50-be6f4425e0b2
kind	                audio
audioLevel	        0.020599993896298106
totalAudioEnergy	1.8476431267450812
[Audio_Level_in_RMS]	0.02541394245734895
totalSamplesDuration	213.66999999995065
echoReturnLoss	        -0.11197675950825214
echoReturnLossEnhancement 8.111690521240234

Peer-to-Peer Media Stream

Direct connection to media servers and media gateways.

Use common codec set wherever possible to eliminate transcoding —Use regionalized transcoding where common codec not available Real-time video transcoding is expensive and performance impacting.

On-going standards/device/network work needs to be done to expand common codec set. WebRTC codec standards have not been finalized yet. WebRTC target is to support royalty free codecs within its standards.

AudioG.711, OpusG.711, AMR, AMR-WB (G.722.2)
Audio – ExtendedG.729a[b], G.726

Supporting common codecs between VoLTE devices and WebRTC endpoints requires one or more of the following:

  1. Support of WebRTC codecs on 3GPP/GSMA
  2. Support of 3GPP/GSMA codecs on WebRTC
  3. WebRTC browser support of codecs native to the device

Outbound Video from Ubuntu chrome Webrtc browser

RTCOutboundRTPVideoStream_3305924664 (outbound-rtp)

timestamp	03/01/2022, 22:23:32
ssrc	3305924664
kind	video
trackId	RTCMediaStreamTrack_sender_4
transportId	RTCTransport_0_1
codecId	RTCCodec_1_Outbound_96
[codec]	VP8 (96)
mediaType	video
mediaSourceId	RTCVideoSource_4
packetsSent	171360
[packetsSent/s]	204.02266754223697
retransmittedPacketsSent	620
[retransmittedPacketsSent/s]	0
bytesSent	177210957
[bytesSent_in_bits/s]	1680050.6587655507
headerBytesSent	4218672
[headerBytesSent_in_bits/s]	39812.423281967494
retransmittedBytesSent	668008
[retransmittedBytesSent_in_bits/s]	0
framesEncoded	22003
[framesEncoded/s]	30.00333346209367
keyFramesEncoded	14
totalEncodeTime	418.017
[totalEncodeTime/framesEncoded_in_ms]	9.533333333333378
totalEncodedBytesTarget	0
[totalEncodedBytesTarget_in_bits/s]	0
framesSent	22003
[framesSent/s]	30.00333346209367
hugeFramesSent	1
totalPacketSendDelay	29963.73
[totalPacketSendDelay/packetsSent_in_ms]	31.62745098039772
qualityLimitationReason	none
qualityLimitationDurations	{bandwidth:0,cpu:174895,none:717684,other:0}
qualityLimitationResolutionChanges	0
encoderImplementation	libvpx
firCount	0
pliCount	2
nackCount	161
remoteId	RTCRemoteInboundRtpVideoStream_3305924664
frameWidth	640
frameHeight	480
framesPerSecond	30
qpSum	151000
[qpSum/framesEncoded]	9.3

RTCP statistics RTCRemoteInboundRtpVideoStream_3305924664 (remote-inbound-rtp)

timestamp	03/01/2022, 22:25:29
ssrc	984864038
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
jitter	0.026854166666666665
packetsLost	19
localId	RTCOutboundRTPAudioStream_984864038
roundTripTime	0.048
fractionLost	0
totalRoundTripTime	8.932
roundTripTimeMeasurements	201



After considerable time( 10 minutes in my case ) the quality of the media stream adjust to network conditions and variations ( peaks and dips) flat out.

after some time
after some time has passed
after some time


After some time


After some time has passes


after some time has passes

Outbound Audio from Ubuntu Chrome Browser

RTCOutboundRTPAudioStream_984864038 (outbound-rtp)

timestamp	03/01/2022, 22:13:26
ssrc	984864038
kind	audio
trackId	RTCMediaStreamTrack_sender_3
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
[codec]	opus (111, minptime=10;useinbandfec=1)
mediaType	audio
mediaSourceId	RTCAudioSource_3
packetsSent	14292
[packetsSent/s]	50.003051944088384
retransmittedPacketsSent	0
[retransmittedPacketsSent/s]	0
bytesSent	1151754
[bytesSent_in_bits/s]	32449.980589635597
headerBytesSent	400176
[headerBytesSent_in_bits/s]	11200.683635475798
retransmittedBytesSent	0
[retransmittedBytesSent_in_bits/s]	0
nackCount	0
remoteId	RTCRemoteInboundRtpAudioStream_984864038

RTCP statistics RTCRemoteInboundRtpAudioStream_984864038 (remote-inbound-rtp)

timestamp	03/01/2022, 22:17:05
ssrc	984864038
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
jitter	0.002
packetsLost	3
localId	RTCOutboundRTPAudioStream_984864038
roundTripTime	0.023
fractionLost	0
totalRoundTripTime	4.344
roundTripTimeMeasurements 98	

Inbound Video from Android Webrtc Browser

RTCInboundRTPVideoStream_3384287918 (inbound-rtp)

timestamp	03/01/2022, 22:55:35
ssrc	3384287918
kind	video
trackId	RTCMediaStreamTrack_receiver_4
transportId	RTCTransport_0_1
mediaType	video
jitter	0.027
packetsLost	78
packetsReceived	79545
[packetsReceived/s]	0
bytesReceived	77156700
[bytesReceived_in_bits/s]	0
headerBytesReceived	1978716
[headerBytesReceived_in_bits/s]	0
jitterBufferDelay	2284.024
[jitterBufferDelay/jitterBufferEmittedCount_in_ms]	0
jitterBufferEmittedCount	13100
framesReceived	13101
[framesReceived/s]	0
[framesReceived-framesDecoded]	0
framesDecoded	13101
[framesDecoded/s]	0
keyFramesDecoded	1
[keyFramesDecoded/s]	0
framesDropped	0
totalDecodeTime	94.229
[totalDecodeTime/framesDecoded_in_ms]	0
totalInterFrameDelay	442.0259999999831
[totalInterFrameDelay/framesDecoded_in_ms]	0
totalSquaredInterFrameDelay	20.370232000000772
[interFrameDelayStDev_in_ms]	0
decoderImplementation	libvpx
firCount	0
pliCount	2
nackCount	51
codecId	RTCCodec_1_Inbound_96
[codec]	VP8 (96)
lastPacketReceivedTimestamp	1641276962171
[lastPacketReceivedTimestamp]	03/01/2022, 22:16:02
frameWidth	480
frameHeight	640
framesPerSecond	4
qpSum	97949
[qpSum/framesDecoded]	0
estimatedPlayoutTimestamp	3850268134980
[estimatedPlayoutTimestamp] 03/01/2092, 22:55:42	


Inbound Audio from Android Webrtc Browser

RTCInboundRTPAudioStream_579305270 (inbound-rtp)

timestamp	03/01/2022, 22:50:14
ssrc	579305270
kind	audio
trackId	RTCMediaStreamTrack_receiver_3
transportId	RTCTransport_0_1
mediaType	audio
jitter	        0.003
packetsLost	208
packetsDiscarded	0
packetsReceived	124469
[packetsReceived/s]	50.03320990953163
fecPacketsReceived	0
fecPacketsDiscarded	0
bytesReceived	4433321
[bytesReceived_in_bits/s]	14209.431614306981
headerBytesReceived	3485132
[headerBytesReceived_in_bits/s]	11207.439019735084
jitterBufferDelay	17887008
[jitterBufferDelay/jitterBufferEmittedCount_in_ms]	113.79999999996896
jitterBufferEmittedCount	119485440
totalSamplesReceived	        118645920
[totalSamplesReceived/s]	48031.88151315036
concealedSamples	        689415
[concealedSamples/s]	0
[concealedSamples/totalSamplesReceived]	0
silentConcealedSamples	338882
[silentConcealedSamples/s]	0
concealmentEvents	230
insertedSamplesForDeceleration	33841
[insertedSamplesForDeceleration/s]	0
removedSamplesForAcceleration	1562246
[removedSamplesForAcceleration/s]	0
totalAudioEnergy	4.458078675648182
[Audio_Level_in_RMS]	0
totalSamplesDuration	2472.2900000075438
codecId	RTCCodec_0_Inbound_111
[codec]	opus (111, minptime=10;useinbandfec=1)
lastPacketReceivedTimestamp	1641279014658
[lastPacketReceivedTimestamp]	03/01/2022, 22:50:14
audioLevel	0
remoteId	RTCRemoteOutboundRTPAudioStream_579305270
estimatedPlayoutTimestamp	3850267813642

RTCP statistics RTCRemoteOutboundRTPAudioStream_579305270 (remote-outbound-rtp)

timestamp	03/01/2022, 22:48:47
ssrc	579305270
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Inbound_111
packetsSent	120306
bytesSent	4285534
localId	RTCInboundRTPAudioStream_579305270
remoteTimestamp	1641278927459
[remoteTimestamp]	03/01/2022, 22:48:47
reportsSent	480
roundTripTimeMeasurements	0
totalRoundTripTime	0

Peer to Peer Data Transfer

Data Channel API of Webrtc allows bidirectional communication of arbitrary data between peers. It uses the same API as WebSockets and has very low latency.

  • (+) DataChannel is p2p and is also ened to end encrypted leader to higher privacy
  • (+) build in security due to p2p transfer
  • (+) high throughput than text transfer via a messaging server
  • (+) lower latency as p2p transfer takes shortest route

SCTP is the protocol that opens connectiosn for peer to peer data channel support in WebRTC. It can be configured for reliability and ordered delivery. It provides flow and congestion control to the data messages.

Data Channel Metrics

timestamp	03/01/2022, 23:13:13
label	sctp
dataChannelIdentifier	1
state	open
messagesSent	42
[messagesSent/s]	0
bytesSent	1962750
[bytesSent_in_bits/s]	0
messagesReceived	31
[messagesReceived/s]	0
bytesReceived	4712
[bytesReceived_in_bits/s]	0
After sharing 2 files of 1.5 Mb each


Webrtc Changes bitrate , resolution and framerate dynamically to accomodate the network conditions, policy constraints or user equipment capability.

Higher gthe bitrate, higher the media quality.

Birate of Audio CodecsBitrate of Video
Lossey formats
– iLBC (narrow band )13.33, 15.20 kbit/s
– iSAC ( wideband) 10–52 kbit/s
– GSM-EFR 12.2 kbit/s
– AAC 8–529 kbit/s (stereo)
– AMR-WB (G.722.2) 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85 kbit/s
– Opus – 6–510 kbit/s(-) higher bitrate consumes more bandwidth
(-) can cause congestion on network route

ITU-T formats
– G711 64kbps
– G.711.1 ( MDCT, A-law, μ-law) 64, 80, 96 kbit/s
– G.722 64 kbit/s (comprises 48, 56 or 64 kbit/s audio and 16, 8 or 0 kbit/s auxiliary data)

Lossless formats (such as Dolby trueHD, MPEG-4 ALS) can consume much larger bitrates.
QVGA 200-500 kbps
VGA 400 – 800 kbps
720p+ > 800 kbps
4K( 60fps) > 20 mbps

Packet Loss

Packet loss can cause choppy audio and distorted, blurry or forzen video.




Jitter is the packet delay variation in an otherwise predictable normal rate of delay. This could indicate route changes, growing congestion etc.


Jitter fo Audio


Jitter for Video

Round Trip Time

RTT affects latency



References :

Read more on SDP and its attributes