WebRTC Media Streams and Quality Metrics

Media Stream Tracks in WebRTC

The MediaStreamTrack interface represents a single track of audio or video data. A MediaStream may contain zero or more MediaStreamTrack objects.

The RTCRtpSender and RTCRtpReceiver objects give the application finer-grained control over the transmission and reception of MediaStreamTracks.

Media Flow in VoIP system
Media Flow in WebRTC Call

Video Streams

Video capture in sync with hardware capabilities

WebRTC-compatible browsers are required to support white balance, light-level control, and autofocus from the video source.

Video Capture Resolution

Unless specified otherwise in SDP (Session Description Protocol), the minimum WebRTC video attributes are a frame rate of 20 FPS and a resolution of 320 x 240 pixels.

Mid-stream resolution changes are also supported, for example when the source is a screen captured for desktop sharing.

SDP attributes for resolution, frame rate, and bitrate

SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

The sender must limit the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.
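As a rough sketch of the sender-side rule above, the snippet below scales a captured resolution down so that it fits inside a maximum width and height, such as one a peer might advertise through a=imageattr. The function name and the example sizes are illustrative assumptions; parsing of the actual SDP attribute is left out.

```python
def limit_resolution(width, height, max_width, max_height):
    """Scale (width, height) down to fit inside the maximum, keeping aspect ratio."""
    # Never upscale: a factor of 1.0 means the capture already fits.
    scale = min(1.0, max_width / width, max_height / height)
    # Round down to even numbers (an assumption: many encoders prefer even sizes).
    new_width = int(width * scale) // 2 * 2
    new_height = int(height * scale) // 2 * 2
    return new_width, new_height


# A 720p capture limited to a 640 x 480 maximum, and a capture that already fits.
print(limit_resolution(1280, 720, 640, 480))
print(limit_resolution(320, 240, 640, 480))
```

Keeping the aspect ratio means the result can be smaller than the advertised maximum in one dimension, which still satisfies the limit.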

Dynamic FPS control based on actual hardware encoding

Video source capture should adjust the frame rate according to low bandwidth, poor light conditions, and the hardware-supported rate rather than force a higher FPS.

Stream Orientation

Endpoints should support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing them with the peer.

Audio Streams

Audio Level

The audio level for speech transmission should be normalized so that users do not have to adjust the playback manually, and to facilitate mixing in conferencing applications.

Normalization should consider only frequencies above 300 Hz, regardless of the sampling rate used.

The level should be adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor.

Gain calculation

  • If the endpoint has control over the entire audio-capture path, like a regular phone,
    the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
  • If the endpoint does not have control over the entire audio-capture path, like a software endpoint,
    then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
  • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.
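The gain rules above can be sketched in a few lines. This is only an illustration: it assumes the speech level has already been measured in dBm0 by some other component, and the function names are made up for the example.

```python
TARGET_DBM0 = -19.0   # target active-speech level from the text
TOLERANCE_DB = 6.0    # AGC window from the text: +/- 6 dB


def gain_to_target_db(measured_dbm0):
    """Gain in dB that would bring the measured level to the target."""
    return TARGET_DBM0 - measured_dbm0


def within_agc_window(measured_dbm0):
    """True if the measured level already sits inside the +/- 6 dB window."""
    return abs(measured_dbm0 - TARGET_DBM0) <= TOLERANCE_DB


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


measured = -31.0  # hypothetical quiet speaker
print(within_agc_window(measured))
print(gain_to_target_db(measured), db_to_linear(gain_to_target_db(measured)))
```

For music or desktop sharing this adjustment would be skipped and the gain left to the user, as the last bullet says.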

Acoustic Echo Cancellation (AEC)

Endpoints should allow echo control mechanisms.

SDP signaling and negotiation for media plane

Media plane adaptation is done at the SBC for network-carried media; it should be done for all network-hosted media services that face peer-to-peer media.

The high-level architecture elements of WebRTC media streams consist of:

  • Encryption, RTP Multiplexing, Support for ICE
  • Audio – Interworking of differing WebRTC and codec sets
  • Video – Use of VP8, Support for H.264
  • Data – Support of MSRP (RCS standard for messaging over the DataChannel API)

Media Source

RTCVideoSource_4 (media-source)

timestamp	03/01/2022, 23:07:05
trackIdentifier	1bcab53d-1eca-41d1-a96a-00f1458c9b1b
kind	video
width	640
height	480
frames	7556
framesPerSecond	30

RTCAudioSource_3 (media-source)

timestamp	        03/01/2022, 23:06:26
trackIdentifier	        12cb979c-b40f-4de7-8b50-be6f4425e0b2
kind	                audio
audioLevel	        0.020599993896298106
totalAudioEnergy	1.8476431267450812
[Audio_Level_in_RMS]	0.02541394245734895
totalSamplesDuration	213.66999999995065
echoReturnLoss	        -0.11197675950825214
echoReturnLossEnhancement 8.111690521240234
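The audioLevel field is an instantaneous value, while totalAudioEnergy and totalSamplesDuration are cumulative counters. An average RMS level over any interval can be derived from two samples of those counters, as sqrt(delta energy / delta duration). The sketch below applies that to the whole lifetime of the source above, assuming both counters started at zero; the bracketed [Audio_Level_in_RMS] value in the dump is presumably the same calculation over a much shorter recent interval, which is why it differs.

```python
import math


def average_audio_level(energy_1, duration_1, energy_2, duration_2):
    """RMS audio level (0..1) between two samples of the cumulative counters."""
    return math.sqrt((energy_2 - energy_1) / (duration_2 - duration_1))


# Whole-lifetime average for RTCAudioSource_3 (start counters assumed to be 0).
level = average_audio_level(0.0, 0.0, 1.8476431267450812, 213.66999999995065)
print(round(level, 3))
```

This comes out at roughly 0.093 on the 0 to 1 scale.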

Peer-to-Peer Media Stream

Direct connection to media servers and media gateways.

Use a common codec set wherever possible to eliminate transcoding. Use regionalized transcoding where a common codec is not available. Real-time video transcoding is expensive and impacts performance.

Ongoing standards, device, and network work needs to be done to expand the common codec set. WebRTC codec standards have not been finalized yet. The WebRTC target is to support royalty-free codecs within its standards.

Audio: G.711, Opus (WebRTC); G.711, AMR, AMR-WB (G.722.2) (VoLTE)
Audio (extended): G.729a[b], G.726

Supporting common codecs between VoLTE devices and WebRTC endpoints requires one or more of the following:

  1. Support of WebRTC codecs on 3GPP/GSMA
  2. Support of 3GPP/GSMA codecs on WebRTC
  3. WebRTC browser support of codecs native to the device

Outbound Video from Ubuntu Chrome WebRTC Browser

RTCOutboundRTPVideoStream_3305924664 (outbound-rtp)

timestamp	03/01/2022, 22:23:32
ssrc	3305924664
kind	video
trackId	RTCMediaStreamTrack_sender_4
transportId	RTCTransport_0_1
codecId	RTCCodec_1_Outbound_96
[codec]	VP8 (96)
mediaType	video
mediaSourceId	RTCVideoSource_4
packetsSent	171360
[packetsSent/s]	204.02266754223697
retransmittedPacketsSent	620
[retransmittedPacketsSent/s]	0
bytesSent	177210957
[bytesSent_in_bits/s]	1680050.6587655507
headerBytesSent	4218672
[headerBytesSent_in_bits/s]	39812.423281967494
retransmittedBytesSent	668008
[retransmittedBytesSent_in_bits/s]	0
framesEncoded	22003
[framesEncoded/s]	30.00333346209367
keyFramesEncoded	14
totalEncodeTime	418.017
[totalEncodeTime/framesEncoded_in_ms]	9.533333333333378
totalEncodedBytesTarget	0
[totalEncodedBytesTarget_in_bits/s]	0
framesSent	22003
[framesSent/s]	30.00333346209367
hugeFramesSent	1
totalPacketSendDelay	29963.73
[totalPacketSendDelay/packetsSent_in_ms]	31.62745098039772
qualityLimitationReason	none
qualityLimitationDurations	{bandwidth:0,cpu:174895,none:717684,other:0}
qualityLimitationResolutionChanges	0
encoderImplementation	libvpx
firCount	0
pliCount	2
nackCount	161
remoteId	RTCRemoteInboundRtpVideoStream_3305924664
frameWidth	640
frameHeight	480
framesPerSecond	30
qpSum	151000
[qpSum/framesEncoded]	9.3
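The bracketed rows in the dump are per-interval values computed by the stats page. Lifetime averages can also be derived from the cumulative counters. The sketch below computes the average encode time per frame and the share of time the encoder was limited by CPU, using the numbers from the outbound-rtp report above. The helper names are illustrative.

```python
stats = {
    "framesEncoded": 22003,
    "totalEncodeTime": 418.017,  # seconds, cumulative
    "qualityLimitationDurations": {
        "bandwidth": 0, "cpu": 174895, "none": 717684, "other": 0,
    },
}


def avg_encode_time_ms(s):
    """Average time spent encoding one frame, in milliseconds."""
    return s["totalEncodeTime"] / s["framesEncoded"] * 1000


def limitation_share(s, reason):
    """Fraction of the tracked time spent in the given limitation state."""
    durations = s["qualityLimitationDurations"]
    total = sum(durations.values())
    return durations[reason] / total if total else 0.0


print(round(avg_encode_time_ms(stats), 1))
print(round(limitation_share(stats, "cpu"), 3))
```

Because the share is a ratio, it does not matter which time unit the durations are reported in. Here the encoder spent just under a fifth of the call limited by CPU, even though qualityLimitationReason reads none at the moment of the snapshot.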

RTCP statistics RTCRemoteInboundRtpVideoStream_3305924664 (remote-inbound-rtp)

timestamp	03/01/2022, 22:25:29
ssrc	984864038
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
jitter	0.026854166666666665
packetsLost	19
localId	RTCOutboundRTPAudioStream_984864038
roundTripTime	0.048
fractionLost	0
totalRoundTripTime	8.932
roundTripTimeMeasurements	201



After a considerable time (10 minutes in my case), the quality of the media stream adjusts to network conditions and the variations (peaks and dips) flatten out.

[Graphs: stream metrics after some time has passed]

Outbound Audio from Ubuntu Chrome Browser

RTCOutboundRTPAudioStream_984864038 (outbound-rtp)

timestamp	03/01/2022, 22:13:26
ssrc	984864038
kind	audio
trackId	RTCMediaStreamTrack_sender_3
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
[codec]	opus (111, minptime=10;useinbandfec=1)
mediaType	audio
mediaSourceId	RTCAudioSource_3
packetsSent	14292
[packetsSent/s]	50.003051944088384
retransmittedPacketsSent	0
[retransmittedPacketsSent/s]	0
bytesSent	1151754
[bytesSent_in_bits/s]	32449.980589635597
headerBytesSent	400176
[headerBytesSent_in_bits/s]	11200.683635475798
retransmittedBytesSent	0
[retransmittedBytesSent_in_bits/s]	0
nackCount	0
remoteId	RTCRemoteInboundRtpAudioStream_984864038
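To see how much of the audio bandwidth goes to headers, the cumulative counters can be divided by the packet count. The sketch below uses the values from the outbound audio report above and the roughly 50 packets per second shown there; the function names are illustrative, and it assumes bytesSent counts payload only, separate from headerBytesSent.

```python
def per_packet_sizes(bytes_sent, header_bytes_sent, packets_sent):
    """Average payload and header size per packet, in bytes."""
    return bytes_sent / packets_sent, header_bytes_sent / packets_sent


def bitrate_bps(bytes_per_packet, packets_per_second):
    """Bitrate in bits per second for a given packet size and packet rate."""
    return bytes_per_packet * 8 * packets_per_second


payload, header = per_packet_sizes(1151754, 400176, 14292)
print(round(payload, 1), header)
print(round(bitrate_bps(payload, 50)), round(bitrate_bps(header, 50)))
```

At this packet rate the headers add roughly a third on top of the Opus payload bitrate, which is worth remembering when budgeting bandwidth for audio-only calls.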

RTCP statistics RTCRemoteInboundRtpAudioStream_984864038 (remote-inbound-rtp)

timestamp	03/01/2022, 22:17:05
ssrc	984864038
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Outbound_111
jitter	0.002
packetsLost	3
localId	RTCOutboundRTPAudioStream_984864038
roundTripTime	0.023
fractionLost	0
totalRoundTripTime	4.344
roundTripTimeMeasurements 98	
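roundTripTime is the most recent measurement, while totalRoundTripTime and roundTripTimeMeasurements allow an average over the call. A minimal sketch, with an illustrative function name, using the values from the report above:

```python
def average_rtt_ms(total_round_trip_time, measurements):
    """Average round trip time in milliseconds, or None if nothing was measured."""
    if measurements == 0:
        return None
    return total_round_trip_time / measurements * 1000


print(round(average_rtt_ms(4.344, 98), 1))
print(average_rtt_ms(0, 0))
```

The zero-measurement case matters in practice: a report may carry a count of 0, and dividing blindly would fail.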

Inbound Video from Android WebRTC Browser

RTCInboundRTPVideoStream_3384287918 (inbound-rtp)

timestamp	03/01/2022, 22:55:35
ssrc	3384287918
kind	video
trackId	RTCMediaStreamTrack_receiver_4
transportId	RTCTransport_0_1
mediaType	video
jitter	0.027
packetsLost	78
packetsReceived	79545
[packetsReceived/s]	0
bytesReceived	77156700
[bytesReceived_in_bits/s]	0
headerBytesReceived	1978716
[headerBytesReceived_in_bits/s]	0
jitterBufferDelay	2284.024
[jitterBufferDelay/jitterBufferEmittedCount_in_ms]	0
jitterBufferEmittedCount	13100
framesReceived	13101
[framesReceived/s]	0
[framesReceived-framesDecoded]	0
framesDecoded	13101
[framesDecoded/s]	0
keyFramesDecoded	1
[keyFramesDecoded/s]	0
framesDropped	0
totalDecodeTime	94.229
[totalDecodeTime/framesDecoded_in_ms]	0
totalInterFrameDelay	442.0259999999831
[totalInterFrameDelay/framesDecoded_in_ms]	0
totalSquaredInterFrameDelay	20.370232000000772
[interFrameDelayStDev_in_ms]	0
decoderImplementation	libvpx
firCount	0
pliCount	2
nackCount	51
codecId	RTCCodec_1_Inbound_96
[codec]	VP8 (96)
lastPacketReceivedTimestamp	1641276962171
[lastPacketReceivedTimestamp]	03/01/2022, 22:16:02
frameWidth	480
frameHeight	640
framesPerSecond	4
qpSum	97949
[qpSum/framesDecoded]	0
estimatedPlayoutTimestamp	3850268134980
[estimatedPlayoutTimestamp] 03/01/2092, 22:55:42	
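The estimatedPlayoutTimestamp value is far larger than the other epoch-millisecond timestamps in the dump, and its rendered date lands in 2092. A likely explanation, treated here as an assumption, is that the value counts milliseconds from the NTP epoch (1 January 1900) while the page renders it as if it counted from the Unix epoch (1 January 1970). The sketch below subtracts the 70-year offset between the two epochs before converting.

```python
from datetime import datetime, timezone

# Seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (Unix epoch), in ms.
NTP_TO_UNIX_OFFSET_MS = 2_208_988_800 * 1000


def ntp_ms_to_datetime(ntp_ms):
    """Convert milliseconds since the NTP epoch to a UTC datetime."""
    unix_ms = ntp_ms - NTP_TO_UNIX_OFFSET_MS
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


playout = ntp_ms_to_datetime(3850268134980)
print(playout.isoformat())
```

With the offset removed, the value falls in January 2022, in line with the other timestamps of the report once the local time zone of the capture is taken into account.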


Inbound Audio from Android WebRTC Browser

RTCInboundRTPAudioStream_579305270 (inbound-rtp)

timestamp	03/01/2022, 22:50:14
ssrc	579305270
kind	audio
trackId	RTCMediaStreamTrack_receiver_3
transportId	RTCTransport_0_1
mediaType	audio
jitter	        0.003
packetsLost	208
packetsDiscarded	0
packetsReceived	124469
[packetsReceived/s]	50.03320990953163
fecPacketsReceived	0
fecPacketsDiscarded	0
bytesReceived	4433321
[bytesReceived_in_bits/s]	14209.431614306981
headerBytesReceived	3485132
[headerBytesReceived_in_bits/s]	11207.439019735084
jitterBufferDelay	17887008
[jitterBufferDelay/jitterBufferEmittedCount_in_ms]	113.79999999996896
jitterBufferEmittedCount	119485440
totalSamplesReceived	        118645920
[totalSamplesReceived/s]	48031.88151315036
concealedSamples	        689415
[concealedSamples/s]	0
[concealedSamples/totalSamplesReceived]	0
silentConcealedSamples	338882
[silentConcealedSamples/s]	0
concealmentEvents	230
insertedSamplesForDeceleration	33841
[insertedSamplesForDeceleration/s]	0
removedSamplesForAcceleration	1562246
[removedSamplesForAcceleration/s]	0
totalAudioEnergy	4.458078675648182
[Audio_Level_in_RMS]	0
totalSamplesDuration	2472.2900000075438
codecId	RTCCodec_0_Inbound_111
[codec]	opus (111, minptime=10;useinbandfec=1)
lastPacketReceivedTimestamp	1641279014658
[lastPacketReceivedTimestamp]	03/01/2022, 22:50:14
audioLevel	0
remoteId	RTCRemoteOutboundRTPAudioStream_579305270
estimatedPlayoutTimestamp	3850267813642
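Two ratios from the inbound audio counters are handy as quality indicators: the fraction of played-out samples that had to be concealed, and the average time a sample spent in the jitter buffer. The sketch below computes lifetime values from the report above; the function names are illustrative.

```python
def concealment_ratio(concealed_samples, total_samples_received):
    """Fraction of received samples that were synthesized to hide loss or delay."""
    if total_samples_received == 0:
        return 0.0
    return concealed_samples / total_samples_received


def avg_jitter_buffer_delay_ms(jitter_buffer_delay, emitted_count):
    """Average jitter buffer delay per emitted sample, in milliseconds."""
    if emitted_count == 0:
        return 0.0
    return jitter_buffer_delay / emitted_count * 1000


print(round(concealment_ratio(689415, 118645920) * 100, 2), "%")
print(round(avg_jitter_buffer_delay_ms(17887008, 119485440), 1), "ms")
```

Over the whole call, a bit more than half a percent of the audio was concealed, and the lifetime jitter buffer delay is somewhat higher than the interval value shown in brackets in the dump.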

RTCP statistics RTCRemoteOutboundRTPAudioStream_579305270 (remote-outbound-rtp)

timestamp	03/01/2022, 22:48:47
ssrc	579305270
kind	audio
transportId	RTCTransport_0_1
codecId	RTCCodec_0_Inbound_111
packetsSent	120306
bytesSent	4285534
localId	RTCInboundRTPAudioStream_579305270
remoteTimestamp	1641278927459
[remoteTimestamp]	03/01/2022, 22:48:47
reportsSent	480
roundTripTimeMeasurements	0
totalRoundTripTime	0

Peer to Peer Data Transfer

The WebRTC Data Channel API allows bidirectional communication of arbitrary data between peers. Its API closely follows that of WebSockets, and it has very low latency.

  • (+) DataChannel is p2p and end-to-end encrypted, leading to higher privacy
  • (+) built-in security due to p2p transfer
  • (+) higher throughput than text transfer via a messaging server
  • (+) lower latency, as p2p transfer takes the shortest route

SCTP is the protocol that opens connections for peer-to-peer data channel support in WebRTC. It can be configured for reliability and ordered delivery. It provides flow and congestion control for the data messages.
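The reliability and ordering choices can be modelled as a small options object. The class below is a hypothetical sketch, not part of any WebRTC library: it mirrors the ordered, maxRetransmits and maxPacketLifeTime options of the browser API, where setting both limits at once is rejected, and a channel with neither limit is fully reliable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataChannelOptions:
    """Hypothetical model of data channel delivery options."""
    ordered: bool = True
    max_retransmits: Optional[int] = None
    max_packet_life_time_ms: Optional[int] = None

    def __post_init__(self):
        # Only one of the two partial-reliability limits may be set.
        if self.max_retransmits is not None and self.max_packet_life_time_ms is not None:
            raise ValueError("set at most one of max_retransmits / max_packet_life_time_ms")

    @property
    def reliable(self):
        """True when messages are retransmitted until delivered."""
        return self.max_retransmits is None and self.max_packet_life_time_ms is None


file_transfer = DataChannelOptions()                                  # reliable, ordered
game_state = DataChannelOptions(ordered=False, max_retransmits=0)     # lossy, unordered
print(file_transfer.reliable, game_state.reliable)
```

A file transfer would typically want the default reliable and ordered mode, while frequently refreshed state can trade reliability for lower latency.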

Data Channel Metrics

timestamp	03/01/2022, 23:13:13
label	sctp
dataChannelIdentifier	1
state	open
messagesSent	42
[messagesSent/s]	0
bytesSent	1962750
[bytesSent_in_bits/s]	0
messagesReceived	31
[messagesReceived/s]	0
bytesReceived	4712
[bytesReceived_in_bits/s]	0
After sharing 2 files of 1.5 Mb each


WebRTC changes the bitrate, resolution, and frame rate dynamically to accommodate network conditions, policy constraints, or user equipment capability.

The higher the bitrate, the higher the media quality.

  • (-) a higher bitrate consumes more bandwidth
  • (-) a higher bitrate can cause congestion on the network route

Bitrate of Audio Codecs

Lossy formats
– iLBC (narrowband): 13.33, 15.20 kbit/s
– iSAC (wideband): 10–52 kbit/s
– GSM-EFR: 12.2 kbit/s
– AAC: 8–529 kbit/s (stereo)
– AMR-WB (G.722.2): 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85 kbit/s
– Opus: 6–510 kbit/s

ITU-T formats
– G.711: 64 kbit/s
– G.711.1 (MDCT, A-law, μ-law): 64, 80, 96 kbit/s
– G.722: 64 kbit/s (comprises 48, 56 or 64 kbit/s audio and 16, 8 or 0 kbit/s auxiliary data)

Lossless formats (such as Dolby TrueHD, MPEG-4 ALS) can consume much larger bitrates.

Bitrate of Video

– QVGA: 200–500 kbps
– VGA: 400–800 kbps
– 720p+: > 800 kbps
– 4K (60 fps): > 20 Mbps

Packet Loss

Packet loss can cause choppy audio and distorted, blurry, or frozen video.
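A loss percentage can be derived from the packetsLost and packetsReceived counters of an inbound-rtp report. The sketch below is a minimal version with an illustrative function name, applied to the inbound video and audio reports shown earlier.

```python
def loss_percent(packets_lost, packets_received):
    """Lost packets as a percentage of all packets that were expected."""
    expected = packets_lost + packets_received
    if expected <= 0:
        return 0.0
    return 100.0 * packets_lost / expected


print(round(loss_percent(78, 79545), 3))     # inbound video
print(round(loss_percent(208, 124469), 3))   # inbound audio
```

Both streams stay well under one percent loss over the call. For a recent view rather than a lifetime view, the same formula can be applied to the difference between two snapshots of the counters.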




Jitter is the variation in packet delay relative to an otherwise predictable, steady delay. It can indicate route changes, growing congestion, etc.
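The jitter value reported over RTCP is a running estimate defined in RFC 3550: for each packet, the change in transit time compared with the previous packet is taken, and the estimate moves one sixteenth of the way towards its absolute value. The sketch below implements that estimator over two hypothetical lists of send and receive times in milliseconds.

```python
def interarrival_jitter(send_times, recv_times):
    """RFC 3550 style running jitter estimate, in the same unit as the inputs."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D: change in transit time between consecutive packets.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16.
        jitter += (abs(d) - jitter) / 16
    return jitter


send = [0, 20, 40, 60]            # packets sent every 20 ms
steady = [100, 120, 140, 160]     # constant 100 ms transit time
bumpy = [100, 120, 156, 160]      # third packet arrives 16 ms late
print(interarrival_jitter(send, steady))
print(interarrival_jitter(send, bumpy))
```

A constant delay, however large, produces zero jitter; only the variation between packets moves the estimate.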


Jitter for Audio


Jitter for Video

Round Trip Time

RTT affects latency


