- Media Stream Tracks in WebRTC
- Video Streams
- Video Capture in sync with hardware capabilities
- Capture Resolution
- SDP attributes for resolution, frame rate, and bitrate
- Dynamic FPS control based on actual hardware encoding
- Stream Orientation
- Audio Streams
- Audio Level
- Gain calculation
- Acoustic Echo Cancellation (AEC)
- SDP signaling and negotiation for media plane
- Media Source
- Peer-to-Peer Media Stream
- Frames
- Packets
- Bytes
- Headers
- Peer-to-Peer Data Transfer
- Bitrate
- Packet Loss
- Jitter
- Round Trip Time
Media Stream Tracks in WebRTC
The MediaStreamTrack interface represents a single media track within a stream, typically audio or video, and a MediaStream may contain zero or more MediaStreamTrack objects.
The RTCRtpSender and RTCRtpReceiver objects give the application more fine-grained control over the transmission and reception of MediaStreamTracks.


Video Streams
Video Capture in sync with hardware capabilities
WebRTC-compatible browsers are required to support white balance, light level, and autofocus adjustments from the video source.
Video Capture Resolution
Unless otherwise specified in SDP (Session Description Protocol), WebRTC video must support a minimum of 20 FPS at a resolution of 320 x 240 pixels.
WebRTC also supports mid-stream resolution changes, such as those from a desktop-sharing screen source.
SDP attributes for resolution, frame rate, and bitrate
SDP allows codec-independent indication of preferred video resolutions using a=imageattr, which indicates the maximum resolution that is acceptable.
The sender must limit the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.
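A minimal sketch of the receiver-limit rule, assuming only the simplest `[x=W,y=H]` form of the attribute (RFC 6236 also allows ranges, lists and sar/par/q parameters, none of which are handled here). It reads the `recv` set from the remote peer's `a=imageattr` line, i.e. what that peer is willing to receive, and computes how much the sender would have to downscale. The helper names are hypothetical and not part of any WebRTC API.

```typescript
// Hypothetical helpers: parse the simplest form of an a=imageattr line,
// e.g. "a=imageattr:96 send [x=1280,y=720] recv [x=640,y=480]".
// Ranges such as x=[320:16:640], multiple sets and sar/par/q are NOT handled.
interface Resolution {
  width: number;
  height: number;
}

// Returns the fixed [x=..,y=..] size listed after the "recv" keyword, or null.
function parseImageattrRecvMax(line: string): Resolution | null {
  const match = /recv\s+\[x=(\d+),y=(\d+)/.exec(line);
  if (match === null) {
    return null;
  }
  return { width: Number(match[1]), height: Number(match[2]) };
}

// Factor by which the capture must be shrunk so the encoded frame fits
// within the receiver's maximum; 1 means no scaling is needed.
function scaleDownFactor(capture: Resolution, max: Resolution): number {
  return Math.max(1, capture.width / max.width, capture.height / max.height);
}
```

In a browser, a factor like this could be applied through the `scaleResolutionDownBy` field of `RTCRtpEncodingParameters` (via `sender.getParameters()` and `sender.setParameters()`); the sketch does not assume the browser applies `a=imageattr` on its own.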
Dynamic FPS control based on actual hardware encoding
The video source capture should adjust the frame rate according to low bandwidth, poor light conditions, and the hardware-supported rate, rather than forcing a higher FPS.
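The adaptation described above can be pictured as taking the minimum of several independent limits. The sketch below is a hypothetical heuristic, not how any particular browser or camera driver decides; all field names, the per-frame bit budget and the low-light cap are assumptions chosen for illustration.

```typescript
// Hypothetical frame-rate heuristic for illustration only; it is NOT a
// WebRTC, browser, or camera-driver API.
interface FpsInputs {
  hardwareMaxFps: number;      // highest rate the camera/encoder supports
  availableBitrateBps: number; // current bandwidth estimate, bits per second
  bitsPerFrame: number;        // assumed average size of one encoded frame
  lowLight: boolean;           // true when longer exposure is needed
  lowLightMaxFps: number;      // assumed frame-rate cap in poor light
}

// Take the smallest of the independent limits, but never go below minFps.
function chooseFrameRate(inp: FpsInputs, minFps = 1): number {
  // Start from what the hardware can actually deliver.
  let fps = inp.hardwareMaxFps;
  // Do not produce more frames than the bandwidth can carry.
  fps = Math.min(fps, Math.floor(inp.availableBitrateBps / inp.bitsPerFrame));
  // Longer exposure in poor light limits how many frames can be captured.
  if (inp.lowLight) {
    fps = Math.min(fps, inp.lowLightMaxFps);
  }
  return Math.max(minFps, fps);
}
```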
Stream Orientation
Endpoints should support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing them with the peer.
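For reference, CVO is carried in a one-byte RTP header extension (`urn:3gpp:video-orientation`) laid out as `0 0 0 0 C F R1 R0`, as defined in 3GPP TS 26.114. The sketch below packs and unpacks that byte, mapping R1 R0 to multiples of 90 degrees; the direction in which the receiver must rotate is specified in TS 26.114 and is not modelled here, and the interface and function names are hypothetical.

```typescript
// Sketch of the 1-byte CVO payload (bits: 0 0 0 0 C F R1 R0).
interface Cvo {
  backCamera: boolean;             // C bit: 0 = front-facing, 1 = back-facing
  flipped: boolean;                // F bit: horizontal flip
  rotationDeg: 0 | 90 | 180 | 270; // R1 R0 as a multiple of 90 degrees
}

// Pack the fields into the low four bits of a byte.
function encodeCvo(cvo: Cvo): number {
  const r = cvo.rotationDeg / 90; // 0..3 occupies R1 R0
  return ((cvo.backCamera ? 1 : 0) << 3) | ((cvo.flipped ? 1 : 0) << 2) | r;
}

// Unpack a received CVO byte.
function decodeCvo(byte: number): Cvo {
  const rotations = [0, 90, 180, 270] as const;
  return {
    backCamera: (byte & 0b1000) !== 0,
    flipped: (byte & 0b0100) !== 0,
    rotationDeg: rotations[byte & 0b11],
  };
}
```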
Audio Streams
Audio Level
The audio level for speech transmission should be standardized to avoid users having to manually adjust the playback and to facilitate mixing in conferencing applications.
Normalization considers only frequencies above 300 Hz, regardless of the sampling rate used. The level should be adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor.
Gain calculation
- If the endpoint has control over the entire audio-capture path (like a regular phone), the gain should be adjusted so that an average speaker has a level of 2600 (-19 dBm0) for active speech.
- If the endpoint does not have control over the entire audio-capture path (like a software endpoint), then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
- For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.
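The AGC rule above can be expressed as a small amount of arithmetic: measure the RMS of the captured 16-bit PCM, compare it with the 2600 target in dB, and compute the linear gain that would move it onto the target. This is a simplified sketch under stated assumptions: it skips the 300 Hz weighting and any voice-activity detection, and all names are hypothetical.

```typescript
// Target level from the list above: RMS 2600 on a 16-bit linear scale
// (-19 dBm0), with a +/- 6 dB tolerance window for AGC.
const TARGET_RMS = 2600;
const TOLERANCE_DB = 6;

// Root-mean-square of 16-bit PCM samples (no frequency weighting applied).
function rms(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Deviation of a measured level from the target, in dB.
function deviationDb(measuredRms: number): number {
  return 20 * Math.log10(measuredRms / TARGET_RMS);
}

// Linear gain an AGC would apply to move the measured level onto the target.
function gainToTarget(measuredRms: number): number {
  return measuredRms > 0 ? TARGET_RMS / measuredRms : 1;
}

// True when the level already lies inside the +/- 6 dB window.
function withinAgcTolerance(measuredRms: number): boolean {
  return Math.abs(deviationDb(measuredRms)) <= TOLERANCE_DB;
}
```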
Acoustic Echo Cancellation (AEC)
Endpoints should provide echo control mechanisms such as acoustic echo cancellation.

SDP signaling and negotiation for media plane
Media plane adaptation for network-carried media is done at the SBC (Session Border Controller); it should be done for all network-hosted media services that face peer-to-peer media.
The high-level architecture elements of WebRTC media streams consist of:
- Encryption, RTP Multiplexing, Support for ICE
- Audio – Interworking of differing WebRTC and codec sets
- Video – Use of VP8, Support for H.264
- Data – Support of MSRP (RCS standard for messaging over the DataChannel API)
Media Source
RTCVideoSource_4 (media-source)
timestamp 03/01/2022, 23:07:05 trackIdentifier 1bcab53d-1eca-41d1-a96a-00f1458c9b1b kind video width 640 height 480 frames 7556 framesPerSecond 30

RTCAudioSource_3 (media-source)
timestamp 03/01/2022, 23:06:26 trackIdentifier 12cb979c-b40f-4de7-8b50-be6f4425e0b2 kind audio audioLevel 0.020599993896298106 totalAudioEnergy 1.8476431267450812 [Audio_Level_in_RMS] 0.02541394245734895 totalSamplesDuration 213.66999999995065 echoReturnLoss -0.11197675950825214 echoReturnLossEnhancement 8.111690521240234
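The webrtc-stats specification defines `totalAudioEnergy` so that the RMS audio level between two snapshots is `sqrt(ΔtotalAudioEnergy / ΔtotalSamplesDuration)`. The bracketed `[Audio_Level_in_RMS]` value above appears to be derived this way over the most recent polling interval; the lifetime value from this single snapshot would be sqrt(1.8476 / 213.67) ≈ 0.093. A minimal sketch, with a locally defined interface standing in for the browser's stats object:

```typescript
// Fields as named in the webrtc-stats media-source / inbound-rtp audio stats.
interface AudioEnergySample {
  totalAudioEnergy: number;     // sum of (normalized level^2 * sample duration)
  totalSamplesDuration: number; // seconds of audio covered by that sum
}

// RMS audio level (0..1) between two snapshots of the same stats object.
function averageAudioLevel(prev: AudioEnergySample, curr: AudioEnergySample): number {
  const dt = curr.totalSamplesDuration - prev.totalSamplesDuration;
  if (dt <= 0) return 0; // no new audio between the snapshots
  const dE = curr.totalAudioEnergy - prev.totalAudioEnergy;
  return Math.sqrt(Math.max(0, dE) / dt);
}
```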

Peer-to-Peer Media Stream
Direct connection to media servers and media gateways.
Use a common codec set wherever possible to eliminate transcoding, and use regionalized transcoding where a common codec is not available. Real-time video transcoding is expensive and degrades performance.
Ongoing standards, device, and network work is needed to expand the common codec set. The mandatory-to-implement WebRTC codecs have since been settled (Opus and G.711 for audio in RFC 7874; VP8 and H.264 for video in RFC 7742), and WebRTC aims to support royalty-free codecs within its standards.
Media | WebRTC | legacy |
Audio | G.711, Opus | G.711, AMR, AMR-WB (G.722.2) |
Audio – Extended | G.729a[b], G.726 | |
Video | VP8, H.264 | H.264/AVC |
Supporting common codecs between VoLTE devices and WebRTC endpoints requires one or more of the following:
- Support of WebRTC codecs on 3GPP/GSMA
- Support of 3GPP/GSMA codecs on WebRTC
- WebRTC browser support of codecs native to the device
RTP streams and RTCP stats: Outbound Video
Chrome browser on Ubuntu
RTCOutboundRTPVideoStream_3305924664 (outbound-rtp)
timestamp 03/01/2022, 22:23:32 ssrc 3305924664 kind video trackId RTCMediaStreamTrack_sender_4 transportId RTCTransport_0_1 codecId RTCCodec_1_Outbound_96 [codec] VP8 (96) mediaType video mediaSourceId RTCVideoSource_4 packetsSent 171360 [packetsSent/s] 204.02266754223697 retransmittedPacketsSent 620 [retransmittedPacketsSent/s] 0 bytesSent 177210957 [bytesSent_in_bits/s] 1680050.6587655507 headerBytesSent 4218672 [headerBytesSent_in_bits/s] 39812.423281967494 retransmittedBytesSent 668008 [retransmittedBytesSent_in_bits/s] 0 framesEncoded 22003 [framesEncoded/s] 30.00333346209367 keyFramesEncoded 14 totalEncodeTime 418.017 [totalEncodeTime/framesEncoded_in_ms] 9.533333333333378 totalEncodedBytesTarget 0 [totalEncodedBytesTarget_in_bits/s] 0 framesSent 22003 [framesSent/s] 30.00333346209367 hugeFramesSent 1 totalPacketSendDelay 29963.73 [totalPacketSendDelay/packetsSent_in_ms] 31.62745098039772 qualityLimitationReason none qualityLimitationDurations {bandwidth:0,cpu:174895,none:717684,other:0} qualityLimitationResolutionChanges 0 encoderImplementation libvpx firCount 0 pliCount 2 nackCount 161 remoteId RTCRemoteInboundRtpVideoStream_3305924664 frameWidth 640 frameHeight 480 framesPerSecond 30 qpSum 151000 [qpSum/framesEncoded] 9.3
RTCP statistics RTCRemoteInboundRtpVideoStream_3305924664 (remote-inbound-rtp)
timestamp 03/01/2022, 22:25:29 ssrc 984864038 kind audio transportId RTCTransport_0_1 codecId RTCCodec_0_Outbound_111 jitter 0.026854166666666665 packetsLost 19 localId RTCOutboundRTPAudioStream_984864038 roundTripTime 0.048 fractionLost 0 totalRoundTripTime 8.932 roundTripTimeMeasurements 201
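The bracketed values in the outbound video dump are rates over the last polling interval rather than lifetime averages: the lifetime figures would be 418.017 / 22003 ≈ 19 ms per frame and 151000 / 22003 ≈ 6.9 QP, versus the 9.53 ms and 9.3 shown. A sketch of that delta computation, assuming two snapshots of the same `outbound-rtp` object (the interface re-declares only the fields used, so it runs outside a browser; the function name is hypothetical):

```typescript
// Subset of outbound-rtp video fields used here.
interface OutboundVideoSnapshot {
  framesEncoded: number;
  totalEncodeTime: number; // seconds
  qpSum: number;
}

// Average encode time per frame (ms) and average QP over the interval between
// two snapshots; null when no frames were encoded in between.
function perFrameStats(
  prev: OutboundVideoSnapshot,
  curr: OutboundVideoSnapshot,
): { encodeMsPerFrame: number; avgQp: number } | null {
  const frames = curr.framesEncoded - prev.framesEncoded;
  if (frames <= 0) return null;
  return {
    encodeMsPerFrame: ((curr.totalEncodeTime - prev.totalEncodeTime) * 1000) / frames,
    avgQp: (curr.qpSum - prev.qpSum) / frames,
  };
}
```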
Frames



After a considerable time (10 minutes in my case), the quality of the media stream adjusts to the network conditions, and the variations (peaks and dips) flatten out.



Packets


Bytes Sent and Received


Headers


Outbound Audio from Ubuntu Chrome Browser
RTCOutboundRTPAudioStream_984864038 (outbound-rtp)
timestamp 03/01/2022, 22:13:26 ssrc 984864038 kind audio trackId RTCMediaStreamTrack_sender_3 transportId RTCTransport_0_1 codecId RTCCodec_0_Outbound_111 [codec] opus (111, minptime=10;useinbandfec=1) mediaType audio mediaSourceId RTCAudioSource_3 packetsSent 14292 [packetsSent/s] 50.003051944088384 retransmittedPacketsSent 0 [retransmittedPacketsSent/s] 0 bytesSent 1151754 [bytesSent_in_bits/s] 32449.980589635597 headerBytesSent 400176 [headerBytesSent_in_bits/s] 11200.683635475798 retransmittedBytesSent 0 [retransmittedBytesSent_in_bits/s] 0 nackCount 0 remoteId RTCRemoteInboundRtpAudioStream_984864038
RTCP statistics RTCRemoteInboundRtpAudioStream_984864038 (remote-inbound-rtp)
timestamp 03/01/2022, 22:17:05 ssrc 984864038 kind audio transportId RTCTransport_0_1 codecId RTCCodec_0_Outbound_111 jitter 0.002 packetsLost 3 localId RTCOutboundRTPAudioStream_984864038 roundTripTime 0.023 fractionLost 0 totalRoundTripTime 4.344 roundTripTimeMeasurements 98



Inbound Video from Android WebRTC Browser
RTCInboundRTPVideoStream_3384287918 (inbound-rtp)
timestamp 03/01/2022, 22:55:35 ssrc 3384287918 kind video trackId RTCMediaStreamTrack_receiver_4 transportId RTCTransport_0_1 mediaType video jitter 0.027 packetsLost 78 packetsReceived 79545 [packetsReceived/s] 0 bytesReceived 77156700 [bytesReceived_in_bits/s] 0 headerBytesReceived 1978716 [headerBytesReceived_in_bits/s] 0 jitterBufferDelay 2284.024 [jitterBufferDelay/jitterBufferEmittedCount_in_ms] 0 jitterBufferEmittedCount 13100 framesReceived 13101 [framesReceived/s] 0 [framesReceived-framesDecoded] 0 framesDecoded 13101 [framesDecoded/s] 0 keyFramesDecoded 1 [keyFramesDecoded/s] 0 framesDropped 0 totalDecodeTime 94.229 [totalDecodeTime/framesDecoded_in_ms] 0 totalInterFrameDelay 442.0259999999831 [totalInterFrameDelay/framesDecoded_in_ms] 0 totalSquaredInterFrameDelay 20.370232000000772 [interFrameDelayStDev_in_ms] 0 decoderImplementation libvpx firCount 0 pliCount 2 nackCount 51 codecId RTCCodec_1_Inbound_96 [codec] VP8 (96) lastPacketReceivedTimestamp 1641276962171 [lastPacketReceivedTimestamp] 03/01/2022, 22:16:02 frameWidth 480 frameHeight 640 framesPerSecond 4 qpSum 97949 [qpSum/framesDecoded] 0 estimatedPlayoutTimestamp 3850268134980 [estimatedPlayoutTimestamp] 03/01/2092, 22:55:42
RTCP statistics (remote-outbound-rtp): missing

Inbound Audio from Android WebRTC Browser
RTCInboundRTPAudioStream_579305270 (inbound-rtp)
timestamp 03/01/2022, 22:50:14 ssrc 579305270 kind audio trackId RTCMediaStreamTrack_receiver_3 transportId RTCTransport_0_1 mediaType audio jitter 0.003 packetsLost 208 packetsDiscarded 0 packetsReceived 124469 [packetsReceived/s] 50.03320990953163 fecPacketsReceived 0 fecPacketsDiscarded 0 bytesReceived 4433321 [bytesReceived_in_bits/s] 14209.431614306981 headerBytesReceived 3485132 [headerBytesReceived_in_bits/s] 11207.439019735084 jitterBufferDelay 17887008 [jitterBufferDelay/jitterBufferEmittedCount_in_ms] 113.79999999996896 jitterBufferEmittedCount 119485440 totalSamplesReceived 118645920 [totalSamplesReceived/s] 48031.88151315036 concealedSamples 689415 [concealedSamples/s] 0 [concealedSamples/totalSamplesReceived] 0 silentConcealedSamples 338882 [silentConcealedSamples/s] 0 concealmentEvents 230 insertedSamplesForDeceleration 33841 [insertedSamplesForDeceleration/s] 0 removedSamplesForAcceleration 1562246 [removedSamplesForAcceleration/s] 0 totalAudioEnergy 4.458078675648182 [Audio_Level_in_RMS] 0 totalSamplesDuration 2472.2900000075438 codecId RTCCodec_0_Inbound_111 [codec] opus (111, minptime=10;useinbandfec=1) lastPacketReceivedTimestamp 1641279014658 [lastPacketReceivedTimestamp] 03/01/2022, 22:50:14 audioLevel 0 remoteId RTCRemoteOutboundRTPAudioStream_579305270 estimatedPlayoutTimestamp 3850267813642 [estimatedPlayoutTimestamp]
RTCP statistics RTCRemoteOutboundRTPAudioStream_579305270 (remote-outbound-rtp)
timestamp 03/01/2022, 22:48:47 ssrc 579305270 kind audio transportId RTCTransport_0_1 codecId RTCCodec_0_Inbound_111 packetsSent 120306 bytesSent 4285534 localId RTCInboundRTPAudioStream_579305270 remoteTimestamp 1641278927459 [remoteTimestamp] 03/01/2022, 22:48:47 reportsSent 480 roundTripTimeMeasurements 0 totalRoundTripTime 0
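Two inbound health indicators can be read straight from these cumulative counters: `jitterBufferDelay / jitterBufferEmittedCount` gives the average time a sample (audio) or frame (video) waited in the jitter buffer, and `concealedSamples / totalSamplesReceived` gives the share of audio that had to be synthesised. A minimal sketch using a hypothetical local interface; applied to the Android dumps above, it yields roughly 150 ms (audio) and 174 ms (video) of lifetime jitter-buffer delay and about 0.58% concealed audio samples.

```typescript
// Subset of inbound-rtp fields; the audio-only ones are optional.
interface InboundSnapshot {
  jitterBufferDelay: number;        // seconds, summed over every emitted sample/frame
  jitterBufferEmittedCount: number; // samples (audio) or frames (video)
  concealedSamples?: number;        // audio only
  totalSamplesReceived?: number;    // audio only
}

// Average time a sample/frame spent in the jitter buffer, in milliseconds.
function avgJitterBufferDelayMs(s: InboundSnapshot): number {
  return s.jitterBufferEmittedCount > 0
    ? (s.jitterBufferDelay / s.jitterBufferEmittedCount) * 1000
    : 0;
}

// Fraction of received audio samples that had to be concealed; null for video.
function concealmentRatio(s: InboundSnapshot): number | null {
  if (s.concealedSamples === undefined || !s.totalSamplesReceived) return null;
  return s.concealedSamples / s.totalSamplesReceived;
}
```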

Comparison of media stream QoS metrics between laptop browser and mobile browser
Chrome browser on laptop | Chrome browser on mobile |
Higher frame rate received (30 fps) | Lower frame rate received (20 fps) |
Lower jitter (0.002 s) and packet loss (3 packets) | Higher jitter (0.003 s) and packet loss (208 packets) |
Bundled Streams
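The same port is used for all media streams. For example, in the snippet below port 9 is used for audio and video as well as their RTCP feedback (a=rtcp-mux). Port 9 is the placeholder that JSEP writes before ICE candidates are known; the address actually shared by the bundled streams is the one selected by ICE.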
a=group:BUNDLE 0 1
a=extmap-allow-mixed
a=msid-semantic: WMS kAGMqdVh7lL70CVUVZQblgjPYsuhOAiGY3ii
m=audio 9 UDP/TLS/RTP/SAVPF 111 63 103 104 9 0 8 106 105 13 110 112 113 126
(33 more lines)
a=rtcp:9 IN IP4 0.0.0.0
a=sendrecv
a=msid:kAGMqdVh7lL70CVUVZQblgjPYsuhOAiGY3ii ed96e925-4425-467b-a099-8fb2e0c67b88
a=rtcp-mux
...
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 35 36 120 119 124
(100 more lines)
a=rtcp:9 IN IP4 0.0.0.0
a=sendrecv
a=msid:kAGMqdVh7lL70CVUVZQblgjPYsuhOAiGY3ii f45d3002-2866-44ce-a807-ac59a4f6708c
a=rtcp-mux
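A quick way to confirm bundling in an SDP blob is to read the `a=group:BUNDLE` mids and the port of every `m=` line. The sketch below does only that; it is not a full SDP parser, and the function name is hypothetical.

```typescript
// One parsed media section header: "m=<media> <port> <proto> <fmt> ...".
interface MLine {
  media: string;
  port: number;
}

// Collect the BUNDLE group mids and the media/port of each m= line.
function parseBundle(sdp: string): { bundleMids: string[]; mLines: MLine[] } {
  const bundleMids: string[] = [];
  const mLines: MLine[] = [];
  const bundlePrefix = "a=group:BUNDLE ";
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith(bundlePrefix)) {
      bundleMids.push(...line.slice(bundlePrefix.length).split(/\s+/).filter(Boolean));
    } else if (line.startsWith("m=")) {
      const [media, port] = line.slice(2).split(/\s+/);
      mLines.push({ media, port: Number(port) });
    }
  }
  return { bundleMids, mLines };
}
```

If every m-line reports the same port and all mids appear in the BUNDLE group, the streams share one transport, as in the snippet above.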
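Same CNAME: When multiple media streams run over a single RTP session, each stream keeps its own SSRC, and a common CNAME associates the SSRCs that come from the same endpoint (for example, to synchronize audio and video).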
a=ssrc:1683985800 cname:OYWXQ35YL2Hh+eUX
a=ssrc:1384669066 cname:OYWXQ35YL2Hh+eUX
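Grouping `a=ssrc` lines by CNAME shows which SSRCs belong to the same endpoint. A minimal sketch; the helper is hypothetical and ignores `a=ssrc-group` and the `msid` attributes Chrome also emits.

```typescript
// Map each CNAME to the SSRCs announced with it in "a=ssrc:<id> cname:<name>" lines.
function ssrcsByCname(sdp: string): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  const re = /^a=ssrc:(\d+) cname:(\S+)/;
  for (const raw of sdp.split(/\r?\n/)) {
    const m = re.exec(raw.trim());
    if (m === null) continue; // skip non-cname ssrc lines and everything else
    const ssrc = Number(m[1]);
    const cname = m[2];
    const list = groups.get(cname) ?? [];
    list.push(ssrc);
    groups.set(cname, list);
  }
  return groups;
}
```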
Peer-to-Peer Data Transfer
The WebRTC DataChannel API allows bidirectional communication of arbitrary data between peers. Its API is modeled on WebSockets, and it has very low latency.
- (+) DataChannel is P2P and end-to-end encrypted, leading to higher privacy
- (+) built-in security due to P2P transfer
- (+) higher throughput than text transfer via a messaging server
- (+) lower latency, as P2P transfer takes the shortest route
SCTP (running over DTLS) is the protocol that opens connections for peer-to-peer data channel support in WebRTC. It can be configured for reliable and ordered delivery, and it provides flow and congestion control for data messages.
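The reliability and ordering options are exposed through the `RTCDataChannelInit` dictionary passed to `createDataChannel()` (`ordered`, `maxPacketLifeTime`, `maxRetransmits`). The sketch below re-declares that subset locally so it can run outside a browser and describes the SCTP delivery mode each combination requests; `describeDelivery` is a hypothetical helper. The API rejects setting both limits at once with a `TypeError`, which the sketch mirrors.

```typescript
// Local mirror of a subset of RTCDataChannelInit.
interface DataChannelOptions {
  ordered?: boolean;          // defaults to true
  maxPacketLifeTime?: number; // ms; partial reliability bounded by time
  maxRetransmits?: number;    // partial reliability bounded by retry count
}

// Describe the delivery mode that a set of options asks for.
function describeDelivery(opts: DataChannelOptions): string {
  if (opts.maxPacketLifeTime !== undefined && opts.maxRetransmits !== undefined) {
    // createDataChannel() throws a TypeError for this combination.
    throw new TypeError("maxPacketLifeTime and maxRetransmits are mutually exclusive");
  }
  const order = opts.ordered === false ? "unordered" : "ordered";
  if (opts.maxRetransmits !== undefined) {
    return `${order}, partially reliable (max ${opts.maxRetransmits} retransmits)`;
  }
  if (opts.maxPacketLifeTime !== undefined) {
    return `${order}, partially reliable (max lifetime ${opts.maxPacketLifeTime} ms)`;
  }
  return `${order}, reliable`;
}
```

In a browser, the same options object can be passed as the second argument of `RTCPeerConnection.createDataChannel()`.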
Data Channel Metrics
timestamp 03/01/2022, 23:13:13 label sctp protocol dataChannelIdentifier 1 state open messagesSent 42 [messagesSent/s] 0 bytesSent 1962750 [bytesSent_in_bits/s] 0 messagesReceived 31 [messagesReceived/s] 0 bytesReceived 4712 [bytesReceived_in_bits/s] 0

Bitrate
WebRTC changes bitrate, resolution, and frame rate dynamically to accommodate network conditions, policy constraints, or user equipment capability. For a given codec, the higher the bitrate, the higher the media quality.
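Bitrate is not reported directly; it has to be derived from two samples of a byte counter. A minimal sketch, assuming snapshots taken from `getStats()` at different times; note that `bytesSent`/`bytesReceived` count payload only, while RTP headers and padding are reported separately in `headerBytesSent`/`headerBytesReceived`. The interface and function names are hypothetical.

```typescript
// Two snapshots of the same outbound-rtp (or inbound-rtp) stats object.
interface ByteCounterSnapshot {
  timestamp: number; // milliseconds, as in RTCStats.timestamp
  bytes: number;     // bytesSent or bytesReceived
}

// Average bitrate in bits per second between two snapshots.
function bitrateBps(prev: ByteCounterSnapshot, curr: ByteCounterSnapshot): number {
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  if (seconds <= 0) return 0; // same or out-of-order snapshots
  return ((curr.bytes - prev.bytes) * 8) / seconds;
}
```

Polling once per second and applying this to `bytesSent` is presumably how values such as the `[bytesSent_in_bits/s]` of about 1.68 Mbit/s in the outbound video dump are obtained.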

Bitrate of Audio Codecs
Lossy formats
– iLBC (narrowband) 13.33, 15.20 kbit/s
– iSAC (wideband) 10–52 kbit/s
– GSM-EFR 12.2 kbit/s
– AAC 8–529 kbit/s (stereo)
– AMR-WB (G.722.2) 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85 kbit/s
– Opus 6–510 kbit/s
- (-) higher bitrate consumes more bandwidth
- (-) can cause congestion on the network route
ITU-T formats
– G.711 64 kbit/s
– G.711.1 (MDCT, A-law, μ-law) 64, 80, 96 kbit/s
– G.722 64 kbit/s (comprises 48, 56 or 64 kbit/s audio and 16, 8 or 0 kbit/s auxiliary data)
Lossless formats (such as Dolby TrueHD, MPEG-4 ALS) consume much larger bitrates.
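Codec bitrates understate what goes on the wire, because every packet also carries headers. A rough estimate, assuming a fixed per-packet overhead, is codec bitrate + packets per second × overhead bytes × 8. With the outbound Opus stream above, 400176 header bytes / 14292 packets = 28 bytes per packet at about 50 packets/s, i.e. 11200 bit/s, which matches the reported `[headerBytesSent_in_bits/s]`. The helper below is hypothetical.

```typescript
// Rough on-the-wire bitrate: codec payload rate plus per-packet header overhead.
// ptimeMs is the packetization interval (e.g. 20 ms -> 50 packets per second).
function wireBitrateBps(codecBps: number, ptimeMs: number, overheadBytesPerPacket: number): number {
  const packetsPerSecond = 1000 / ptimeMs;
  return codecBps + packetsPerSecond * overheadBytesPerPacket * 8;
}
```

For example, G.711 at 64 kbit/s with 20 ms packets and 40 bytes of IPv4 + UDP + RTP headers (20 + 8 + 12) comes to 80 kbit/s.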

Bitrate of Video
QVGA 200–500 kbps
VGA 400–800 kbps
720p+ > 800 kbps
4K (60 fps) > 20 Mbps
Packet Loss
Packet loss can cause choppy audio and distorted, blurry or frozen video.
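Loss is usually expressed as a fraction of expected packets, `packetsLost / (packetsReceived + packetsLost)`, computed either over lifetime counters or over the delta between two snapshots. A minimal sketch (hypothetical helper); for the Android inbound audio above it gives 208 / 124677 ≈ 0.17%.

```typescript
// Counters from an inbound-rtp stats object (or deltas of two snapshots).
interface LossCounters {
  packetsReceived: number;
  packetsLost: number; // can be negative when duplicates arrive (RFC 3550)
}

// Fraction of expected packets that were lost, clamped to [0, 1].
function lossFraction(c: LossCounters): number {
  const expected = c.packetsReceived + c.packetsLost;
  if (expected <= 0) return 0;
  return Math.min(1, Math.max(0, c.packetsLost / expected));
}
```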
Audio Packet loss


Video Packet loss


Jitter
Jitter is the variation in packet delay around an otherwise predictable, normal delay. Rising jitter can indicate route changes, growing congestion, etc.
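The `jitter` value in RTP stats follows the interarrival-jitter estimator of RFC 3550 (section 6.4.1): for consecutive packets, D is the change in transit time, and the running estimate is updated as J += (|D| - J) / 16. A minimal sketch using plain arrays of send and arrival times in any consistent unit (RTP itself uses RTP timestamp units; the stats API reports seconds); the function name is hypothetical.

```typescript
// RFC 3550 interarrival jitter over a sequence of packets.
function interarrivalJitter(sendTimes: number[], arrivalTimes: number[]): number {
  let jitter = 0;
  for (let i = 1; i < sendTimes.length && i < arrivalTimes.length; i++) {
    // D(i-1, i): change in one-way transit time between consecutive packets.
    const d = (arrivalTimes[i] - arrivalTimes[i - 1]) - (sendTimes[i] - sendTimes[i - 1]);
    // Exponentially smoothed mean deviation with gain 1/16.
    jitter += (Math.abs(d) - jitter) / 16;
  }
  return jitter;
}
```

The 1/16 gain keeps the estimate stable against single outliers while still tracking sustained changes, which is why a single delayed packet moves it only slightly.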
Audio

Video

Round Trip Time
A high RTT is indicative of network congestion and causes noticeable delays.
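The `roundTripTime` field is only the most recent RTCP-derived sample; the average so far is `totalRoundTripTime / roundTripTimeMeasurements`. A minimal sketch (hypothetical helper); for the two remote-inbound reports above it gives about 44 ms in both cases (8.932 / 201 and 4.344 / 98), versus the latest samples of 48 ms and 23 ms.

```typescript
// remote-inbound-rtp fields: totalRoundTripTime (seconds) and sample count.
interface RttCounters {
  totalRoundTripTime: number;
  roundTripTimeMeasurements: number;
}

// Mean RTT in seconds over all measurements so far; null if none were taken.
function averageRtt(c: RttCounters): number | null {
  return c.roundTripTimeMeasurements > 0
    ? c.totalRoundTripTime / c.roundTripTimeMeasurements
    : null;
}
```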
Audio RTT


Video RTT


Cumulative analysis of packets lost, RTT measurements, and total RTT in internetwork scenarios with peer-reflexive and relay ICE candidates

Deeper analysis of fraction lost, jitter, and RTT

The chart shows how jitter follows RTT