Fault Tolerance and Error Correction in WebRTC

Fluctuating Networks

WebRTC has built-in capabilities to detect network glitches and adapt itself to changing conditions. Some of the methods used are listed below.

Dynamic Bandwidth estimation

Bandwidth depends on network strength and is affected by the other users on the network. Under heterogeneous network conditions, bandwidth estimation is a critical step in improving call quality and end-user experience.


An unreliable or fluctuating network will deliver some packets on time and delay others, causing them to arrive in bursts. A jitter buffer is an effective method of jitter management: it ensures a steady delivery of packets even when the peers transmit at fluctuating rates.

A jitter buffer is a buffer that consumes packets as soon as they arrive and keeps them until the frame can be fully reconstructed. Once all packets of a frame have been collected in the buffer (in any order), it emits the frame for decoding so that the player can play it back to the user. Note that several RTP packets can have the same timestamp if they are part of the same video frame.

  • (+) dynamically manages unordered packets and reconstructs a frame after accumulating all its packets
  • (-) can introduce latency for packets that arrive early
  • (-) needs active resizing by means of feedback
    • for high speed and good networks the jitter buffer can be small
    • for congested and disruptive networks it is better to keep a longer buffer, which can also add some latency
  • (-) the buffer has limited capacity, so a packet can expire if it is not received within a duration “jitterBufferDelay”.
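
The frame reassembly described above can be sketched roughly as follows. This is a toy model, not how any browser implements its jitter buffer: the `first` and `marker` flags are hypothetical inputs (in a real stream the start of a frame is derived from the codec payload), and sequence number wraparound, resizing and packet expiry are left out.

```python
class JitterBuffer:
    """Toy jitter buffer that groups packets by RTP timestamp."""

    def __init__(self):
        # timestamp -> packets collected so far plus frame boundaries
        self.frames = {}

    def push(self, seq, timestamp, payload, first=False, marker=False):
        """Store one packet; return the whole frame once it is complete."""
        frame = self.frames.setdefault(
            timestamp, {"packets": {}, "first": None, "last": None}
        )
        frame["packets"][seq] = payload
        if first:
            frame["first"] = seq   # first packet of the frame (assumed flag)
        if marker:
            frame["last"] = seq    # last packet of the frame
        if self._complete(frame):
            del self.frames[timestamp]
            ordered = range(frame["first"], frame["last"] + 1)
            return b"".join(frame["packets"][s] for s in ordered)
        return None

    @staticmethod
    def _complete(frame):
        if frame["first"] is None or frame["last"] is None:
            return False
        needed = range(frame["first"], frame["last"] + 1)
        return all(s in frame["packets"] for s in needed)


jb = JitterBuffer()
jb.push(12, 9000, b"C", marker=True)      # arrives early, frame incomplete
jb.push(10, 9000, b"A", first=True)       # still missing packet 11
frame = jb.push(11, 9000, b"B")           # frame is now complete
```

Packets arrive here in the order 12, 10, 11, yet the frame is emitted with its payloads in sequence order only after the last gap is filled.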

SDP renegotiation


Demand for High Quality Video

Applications such as telehealth, advertising and broadcasting run on WebRTC media streams.

Tradeoff between Latency vs Quality

Reduced resolution, frame rate and bit rate are effective for congestion control; however, they are not suited to high definition video conferencing such as gaming, telehealth or broadcast of a concert, as they may hinder the user experience.

Layering for adaptive streaming

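Using I-frames, P-frames and B-frames efficiently in the codec, combined with predictive machine learning models, can make packet loss unnoticeable to the human eye. The marker (M) bit in the RTP header typically flags the last packet of a video frame; keyframes are identified from the codec payload rather than from the marker bit.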

  • (-) more complex compression algorithms

Better compression algorithms vs CPU compute

A better performing compression algorithm produces fewer bits to encode the same video quality as its predecessor.

  • (-) Higher performing compression engines almost always have higher energy consumption and carbon footprint
  • (+) resilient to network fluctuations

Full INTRA-frame Request (FIR)

Requests a key frame from the sender. It can be used when a new peer joins the conference, since a key frame is required to start decoding the video stream.

Picture Loss Indication (PLI)

When the partial frames given to the decoder are unprocessable, a PLI message is sent to the sender. When the sender receives the PLI message, it produces a new I-frame to help the receiver decode the stream.

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
FIR: requests a full key frame from the sender when a new member enters the session.
PLI: requests a full key frame from the sender when partial frames were given to the decoder but it was unable to decode them. Causes of a PLI request could be a decoder crash or heavy loss.
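
To check which feedback mechanisms a peer has actually negotiated, the `a=rtcp-fb` lines of the SDP can be collected per payload type. The helper below is a small illustrative sketch (the function name `feedback_by_payload` is made up for this example) and it only looks at `a=rtcp-fb` attributes, ignoring everything else in the SDP.

```python
def feedback_by_payload(sdp: str) -> dict:
    """Map each payload type to the RTCP feedback mechanisms it advertises."""
    prefix = "a=rtcp-fb:"
    feedback = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # "a=rtcp-fb:100 nack pli" -> payload "100", mechanism "nack pli"
            payload, _, mechanism = line[len(prefix):].partition(" ")
            feedback.setdefault(payload, []).append(mechanism)
    return feedback


SDP = """\
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
"""

fb = feedback_by_payload(SDP)
supports_pli = "nack pli" in fb["100"]
supports_fir = "ccm fir" in fb["100"]
```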

Redundant Encoding (RED) in Media Packets

Recovers packet loss under lossy networks by adding extra bits of information in following packets.

  • (+) good for unpredictable networks

LBRR ( low bit-rate redundancy) – tbd


Congestion is created when a network path has reached its maximum limits which could be due to

  • failures (switches, routers, cables, fibres ..)
  • over-subscription and operating at peak bandwidth.
  • broadcast storms
  • Inapt BGP routing and congestion detection
  • BGP is responsible for finding the shortest routable path for a packet

The direct consequences of congestion for any network transport can be

  • High Latency
  • Connection Timeouts
  • Low throughput
  • Packet loss
  • Queueing delay

With respect to WebRTC streams too, if a network is congested, buffers will overflow and packets will be dropped. Excessive packet dropping increases both transmission time and jitter. To overcome this, adaptive buffering is used, resizing the buffer as jitter increases or decreases.

Feedback Loop

A congestion notifier and detection algorithm can analyze the RTCP metrics for possible congestion in the network route and suggest options to overcome it. This is part of the Adaptive Bitrate and Bandwidth Estimation process.

Overcome congestion with lower bitrate

Rate limiting the data being sent is one way to overcome congestion, even though it can lead to bad call quality at the receiver’s end and is not typical for realtime communication systems.

Reduce frame quality and resolution

Full HD constraint
VGA constraints

Congestion control Algorithms : Google Congestion Control ( GCC)

Bandwidth estimation and congestion control are often paired as an operational unit. Primarily, packet loss and inter-packet arrival times drive the bandwidth estimation and enable GCC to flag congestion.

  • On the receiver side TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB (Receiver Estimated Maximum Bitrate) exchange the bandwidth estimates.
  • On the sender side TWCC (Transport Wide Congestion Control) can be used.
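
As a rough illustration of how packet loss can drive the estimate, here is a sketch of a loss-based rate update. The 2% and 10% thresholds and the 5% growth step follow the loss-based controller as described in the GCC Internet-Draft; treat them as illustrative constants rather than what any particular browser ships. The delay-based part of GCC is not shown.

```python
def loss_based_rate(current_bps: float, loss_fraction: float) -> float:
    """Return the next target bitrate given the reported loss fraction."""
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss.
        return current_bps * (1 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Almost no loss: probe for more bandwidth.
        return current_bps * 1.05
    # Moderate loss: hold the current rate.
    return current_bps


rate = 1_000_000.0
for loss in (0.0, 0.05, 0.20):
    rate = loss_based_rate(rate, loss)
```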

Other congestion control algorithms

  • QUIC Loss Detection and Congestion Control (RFC 9002)
  • Coupled Congestion Control for RTP Media (RFC 8699)
  • NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
  • Self-Clocked Rate Adaptation for Multimedia – RMCAT WG
  • SCReAM – Mobile optimised congestion control algorithm by Ericsson

Low Network Strength and High Packet Loss

Packet loss is the loss of packets in transmission which could be owing to

  • network resources and path
  • transmission medium congestion
  • application’s inability to absorb delayed packets.
  • Maximum Transmission Unit size : measure of how large a single packet can be.

Recovering Lost packets

High definition video streams require low or no packet loss and fast recovery from any loss that occurs. RTP intrinsically has no means of recovering lost packets. Instead, low bit rate redundancy can be added to the packets themselves to make up for any loss. Retransmission of lost packets can be built on top of RTP using the sequence numbers in the RTP header.

Acknowledgement to identify packet loss

A receiver can notify the sender of possible packet loss by sending acknowledgements.

  • Selective Acknowledgement (SACK) : notifies the sender of the multiple packets received, thereby indicating gaps
  • Negative Acknowledgements (NACK) : notifies the sender of packets lost
    • RTCP packet type 193 denotes the legacy NACK; in WebRTC the generic NACK is sent as RTCP transport-layer feedback (packet type 205, FMT 1).
  • (+) higher NACK count is suggestive of high packet loss
  • (-) the round trip needed to send the NACK and then wait for the retransmitted packet can cause significant delay
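
The gap detection behind a NACK can be sketched from RTP sequence numbers alone. The function below is a simplified illustration (its name and the half-range heuristic for telling new packets from late ones are assumptions of this sketch); it handles 16-bit wraparound but does not build the actual RTCP packet.

```python
def find_missing(arrivals):
    """Return RTP sequence numbers that should be NACKed, given arrival order."""
    missing = set()
    highest = None
    for seq in arrivals:
        if highest is None:
            highest = seq
            continue
        # Forward distance on the 16-bit sequence number circle.
        ahead = (seq - highest) % 65536
        if 0 < ahead < 32768:
            # Newer packet: everything skipped in between is presumed lost.
            for step in range(1, ahead):
                missing.add((highest + step) % 65536)
            highest = seq
        else:
            # Late or retransmitted packet: it is no longer missing.
            missing.discard(seq)
    return sorted(missing)


to_nack = find_missing([100, 101, 104, 102])
```

Here packet 102 arrives late and is cleared, so only 103 remains to be requested.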

Forward Error Correction (FEC)

The sender proactively sends redundant data so that lost packets don’t affect the stream on the receiver’s end.

  • (+) the receiver doesn’t have to request extra data, the sender sends it by itself at the RTP level
  • (+) less delay than NACK, which incurs a round trip time
  • (-) involves extra bandwidth.
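
The idea can be illustrated with the simplest form of FEC, a single XOR parity packet protecting a group of media packets. This is a sketch under simplifying assumptions (equal-length payloads, at most one loss per group) and is not the wire format of any specific WebRTC FEC scheme.

```python
def xor_parity(packets):
    """XOR equal-length payloads together into one parity payload."""
    parity = bytearray(len(packets[0]))
    for payload in packets:
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity):
    """Rebuild the single missing payload (marked as None) from the parity."""
    present = [p for p in received if p is not None]
    # XOR of the parity with all surviving payloads leaves the lost one.
    return xor_parity(present + [parity])


group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)                       # sent alongside the group
rebuilt = recover([b"abcd", None, b"ijkl"], parity)
```

The cost is one extra packet per group, which is the bandwidth overhead noted above; no round trip is needed to repair the loss.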

Long distance Calls and High Round Trip Time

Geographical distance can add significant delay to transmission time. Transmission time is an important metric in call quality analysis; however, calculating it as the difference between the receive timestamp and the send timestamp requires perfectly synchronized system clocks, which is unreliable.

transmission_time = timestamp_receive - timestamp_send

For this reason RTT (Round Trip Time) is a better measure, as it avoids clock synchronization errors. Assuming a roughly symmetric path:

transmission_time = rtt / 2
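
One way to obtain the RTT without synchronized clocks is from RTCP receiver reports: the sender subtracts the "last SR" timestamp (LSR) and the "delay since last SR" (DLSR) echoed by the receiver from the time the report arrives, all on the sender's own clock. The sketch below assumes the three values are already available in the 32-bit format RTCP uses, in units of 1/65536 second; the sample numbers are the worked example from RFC 3550.

```python
def rtt_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Round trip time from RTCP receiver report fields (1/65536 s units)."""
    # Mask to 32 bits so the subtraction survives counter wraparound.
    units = (arrival - lsr - dlsr) & 0xFFFFFFFF
    return units / 65536


# arrival = 46864.500 s, LSR = 46853.125 s, DLSR = 5.250 s
rtt = rtt_seconds(0xB7108000, 0xB7052000, 0x00054000)
transmission_time = rtt / 2   # assumes a symmetric path
```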

Using Receiver reports and Sender Reports from RTCP to adjust to network conditions

Sender and receiver reports (SR and RR) provide a summary of the connection and of the quality of the media streaming over it.

RTCP Senders report for WebRTC media stream
RTCP receivers report for WebRTC media stream

Low Latency Media Streaming

Latency accumulates across the whole pipeline: capturing user media, encoding, transmission, network delays, buffering, decoding and playback. Many factors are involved in latency management, such as queuing delays, media path, CPU utilization etc.

Optimize Compute resource

  • mobile agents have less compute power
  • Cameras with features such as auto focus or other adjustments will take more time to capture
  • network should be of suitable bandwidth and strength

Reduce information to be encoded and sent

  • Subject focus and blurring background
  • Filtering noise at source
  • Voice Activity Detection (VAD)
    • send extra data in FEC only if there is voice activity detected in the packet
  • Echo Cancellation

Measuring latency

Synchronizing clocks in distributed systems is a tough task, so it is mostly avoided by either using NTP or using other means of synchronization.

NTP Synchronization of Audio Video Sync

During the buffering of incoming packets (which can range from a few tens of milliseconds to a few hundred milliseconds) the streams are synchronized.

The time used for sync is both NTP and RTP based (the two clocks are not required to be in sync with each other).

  • NTP Timestamp : 64-bit unsigned value that indicates the time at which this RTCP SR packet was sent. Formatted as seconds since Jan 1, 1900, with 32 bits for whole seconds and 32 bits for the fraction
  • RTP Timestamp : RTP timestamp corresponds to the same instant as the NTP timestamp. Expressed in the units of the RTP media clock.
    • Majority of video formats use a 90kHz clock.
    • For the receiver to sync audio and video streams, these two streams must be from the same clock
Frame 300: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface 0 (outbound)
Packet type: Sender Report (200)
Length: 6 (28 bytes)
Sender SSRC: 0x39a659b4 (967203252)
Timestamp, MSW: 3855754463 (0xe5d224df)
Timestamp, LSW: 2364654374 (0x8cf1c326)
[MSW and LSW as NTP timestamp: Mar 8, 2022 18:54:23.550563999 UTC]
RTP timestamp: 1110449770
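
The MSW and LSW fields in the capture above can be converted to wall-clock time by hand. The sketch below uses only the Python standard library; the helper name `ntp_to_datetime` is made up for this example, and the result is truncated to microseconds.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count from Jan 1, 1900 (UTC).
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_datetime(msw: int, lsw: int) -> datetime:
    """Convert the two 32-bit halves of an NTP timestamp to a datetime."""
    # MSW is whole seconds; LSW is a fraction of a second in 1/2**32 units.
    microseconds = lsw * 1_000_000 // 2**32
    return NTP_EPOCH + timedelta(seconds=msw, microseconds=microseconds)


sent_at = ntp_to_datetime(3855754463, 2364654374)
print(sent_at.isoformat())
```

This matches the time Wireshark decodes for the same sender report, Mar 8, 2022 18:54:23 UTC.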

Demand for higher security on WebRTC’s CPaaS

WebRTC data channels use Stream Control Transmission Protocol (SCTP) over a DTLS connection as an alternative to plain TCP or UDP.

Features :

  • multihoming : one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths
  • Multistreaming : transmits several independent streams of chunks in parallel
  • SCTP offers retransmission similar to TCP and partial reliability like UDP.
  • Heartbeat to keep the connection alive, with exponential backoff if a packet hasn’t arrived.
  • Validation and acknowledgment mechanisms protect against flooding attacks

SCTP frames data as datagrams and not as a byte stream

  • (+) SCTP enables multiplexing in WebRTC
  • (+) It has flow control and congestion avoidance support

End to End Encryption

The end-to-end encryption model of WebRTC is a good defence against MITM (man-in-the-middle) attacks; however, it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and realtime communication platforms in this article WebRTC App and webpage Security.

Minimize Public-private mapping pairs via RTCP-mux

Traditionally 2 separate ports for RTP and RTCP were used in SIP / RTP based realtime communication systems. Thus demultiplexing of the traffic of these data streams is performed at the transport layer.

With rtcp-mux the NAT traversal is simplified as only a single port is used for media and control messages.

  • (+) easier to manage security by gathering ICE candidates for a single port only instead of 2
  • (+) increases the system’s capacity for media sessions using the same number of ports
  • (+) further simplified using BUNDLE as all media sessions and their control messages flow on the same port.
  • WebRTC has rtcp-mux capabilities, thus simplifying the ICE candidate pairing


AEC (Echo Cancellation) and AGC (Gain Control) in WebRTC

Echo is the sound of your own voice reverberating. If the amplitude of such a sound is high and intervals exceed 25 ms, it becomes disruptive to the conversation. Its types can be acoustic or hybrid. Echo cancellers need to eliminate the echo while still preserving call quality and not disrupting tones such as DTMF.

Acoustic Echo 

Usually the background or reflected noise, an undesired voiceband energy, transfers from the speaker to the microphone and into the communication network. It is mostly found in hands-free sets or speakerphones. In a multiparty call scenario, it could also occur due to unmatched volume levels, challenging network conditions on one party, background noise, double talk or even proximity between user and microphone.

Hybrid / Electronic Echo in PSTN phones

In a public telephone system, local loop wiring is done using two-wire connections carrying bidirectional voice signals. In PBX, a two-to-four wire conversion is done using a hybrid circuit which does not perform perfect impedance matches resulting in a Hybrid echo.

Hybrid / Electronic Echo in PSTN phones

Echo Cancellation

An efficient echo canceller should cancel out the entire echo tail while not leading to any packet loss. It needs to be adaptive to changing IP network bandwidth, and the algorithm should function equally well in conference scenarios where there may be more than one echo source. Benchmarking measures like MOS (Mean Opinion Score) are used to gauge the results. Often voice quality enhancement technologies are also integrated into AEC modules, such as :

  • Automatic Gain Control (AGC)
  • Noise Reduction
  • Comfort Noise Generator (CNG)
  • Non linear processor
  • Tone Disabler for SS7 and DTMF tones
Automatic Echo Cancellation

WebRTC Echo Cancellation

WebRTC now actively detects and removes echo especially the local system echo resonance.

Noise Suppression in WebRTC

Noise suppression automatically filters the audio to remove background noise.

Automatic Gain Control (AGC)

AGC works as a circuit. When the average audio level is low, the circuit raises it, and when the audio level is high, the circuit brings it down.

  • (+) AGC frees the user from manually tuning the audio level.
  • (-) During a pause too, AGC tries to bring the audio level to the standard setting, making background noises louder.
  • (-) subsequent audio processing makes gain control progressively worse.
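
The behaviour, including the drawback during pauses, can be seen in a very small sketch. This is a block-based toy, not WebRTC's AGC: the `target_rms` and `max_gain` parameters are assumptions chosen for illustration, and real implementations smooth the gain over time.

```python
def apply_agc(samples, target_rms=0.1, max_gain=10.0):
    """Scale one block of samples so its RMS level approaches target_rms."""
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    if rms == 0:
        return list(samples)          # pure silence: nothing to scale
    # Raise quiet blocks, lower loud ones, but cap the amplification.
    gain = min(target_rms / rms, max_gain)
    return [s * gain for s in samples]


loud_speech = apply_agc([0.5, -0.5, 0.5, -0.5])        # brought down
quiet_noise = apply_agc([0.01, -0.01, 0.01, -0.01])    # boosted about 10x
```

The second call shows why background noise gets louder during a pause: the block is quiet, so the gain rises until it hits the cap.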

Audio Compressor : Due to the drawbacks of AGC, audio compressors carry out the operation in a more sophisticated way by looking at the amplitude of the sound.

(-) not ideal for music, which has varying sound amplitude.

Audio Peak Limiter : Limiters simply keep the audio from exceeding a set maximum level.

(+) well suited for avoiding loud noise such as door slam from entering the processing pipeline.

Audio Expanders : increase the dynamic (loudness) range of audio that has been overly processed.

(+) suited for over-compressed audio transmission such as satellite relays

Audio Filters : attenuate audio frequencies either above or below certain points within the audio range.

AGC in WebRTC


aspectRatio: true
autoGainControl: true
brightness: true
channelCount: true
colorTemperature: true
contrast: true
deviceId: true
echoCancellation: true
exposureCompensation: true
exposureMode: true
exposureTime: true
facingMode: true
focusDistance: true
focusMode: true
frameRate: true
groupId: true
height: true
iso: true
latency: true
noiseSuppression: true
pan: true
pointsOfInterest: true
resizeMode: true
sampleRate: true
sampleSize: true
saturation: true
sharpness: true
tilt: true
torch: true
whiteBalanceMode: true
width: true
zoom: true

WebRTC getUserMedia with various values of autoGainControl
