- Fluctuating Networks
- Demand for High Quality Video
- Low Network Strength and High Packet Loss
- Low Latency Media Streaming
- Long-Distance Calls and High Round-Trip Time
- NTP Synchronization for Audio-Video Sync
- Demand for Higher Security on WebRTC CPaaS
WebRTC has built-in capabilities to detect network glitches and adapt to changing conditions. Some of the methods used are listed below.
Bandwidth depends on network strength and is affected by the other users on the network. Under heterogeneous network conditions, bandwidth estimation is a critical step in improving call quality and end-user experience.
An unreliable or fluctuating network delivers some packets on time and delays others, so packets arrive in bursts. A jitter buffer is an effective way to manage jitter: it ensures a steady delivery of packets even when the peers transmit at fluctuating rates.
A jitter buffer accepts packets as soon as they arrive and holds them until the frame can be fully reconstructed. Once all packets of a frame are in the buffer (in any order), it emits the frame for decoding, and the player can then play it back to the user. Note that several RTP packets can have the same timestamp if they are part of the same video frame.
- (+) dynamically manages unordered packets and reconstructs a frame after accumulating all its packets
- (-) can introduce latency for packets that arrive early
- (-) needs active resizing by means of feedback
- for high-speed, good networks the jitter buffer can be small
- for congested and disruptive networks it is better to keep a longer buffer, which can also add some latency
- (-) the buffer has limited capacity, so a packet can expire if it is not received within a duration “jitterBufferDelay”.
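The reassembly behaviour described above can be sketched as follows. This is a minimal illustration, not WebRTC's actual jitter buffer: the `RtpPacket` fields, the `first` flag (assumed to come from the codec's payload descriptor), the use of the marker bit as end-of-frame, and the `max_frames` limit are all assumptions made for the example. Sequence-number and timestamp wraparound are ignored for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RtpPacket:
    seq: int          # RTP sequence number
    timestamp: int    # RTP timestamp, shared by all packets of one frame
    first: bool       # assumed flag: first packet of the frame
    marker: bool      # assumed to mark the last packet of the frame
    payload: bytes


class FrameJitterBuffer:
    """Holds packets per RTP timestamp and emits a frame once it is complete."""

    def __init__(self, max_frames: int = 8):
        self.max_frames = max_frames          # hypothetical capacity limit
        self._frames = defaultdict(dict)      # timestamp -> {seq: packet}

    def push(self, pkt: RtpPacket) -> Optional[bytes]:
        """Store a packet; return the frame payload if it is now complete."""
        self._frames[pkt.timestamp][pkt.seq] = pkt
        frame = self._try_assemble(pkt.timestamp)
        if frame is None and len(self._frames) > self.max_frames:
            # Buffer is full: expire the oldest incomplete frame.
            del self._frames[min(self._frames)]
        return frame

    def _try_assemble(self, timestamp: int) -> Optional[bytes]:
        packets = self._frames[timestamp]
        seqs = sorted(packets)
        complete = (
            packets[seqs[0]].first
            and packets[seqs[-1]].marker
            and seqs[-1] - seqs[0] + 1 == len(seqs)   # no gaps
        )
        if not complete:
            return None
        del self._frames[timestamp]
        return b"".join(packets[s].payload for s in seqs)


buf = FrameJitterBuffer()
buf.push(RtpPacket(11, 9000, False, False, b"B"))
buf.push(RtpPacket(12, 9000, False, True, b"C"))
frame = buf.push(RtpPacket(10, 9000, True, False, b"A"))  # arrives last
```

Packets arrive out of order here, and the frame is emitted only when the first, middle, and last packets are all present.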
Applications include telehealth, advertising, and broadcasting on WebRTC media streams.
Reducing resolution, frame rate, or bit rate is effective for congestion control, but it is not suited to high-definition video use cases such as gaming, telehealth, or concert broadcasts, because it can degrade the user experience.
Using I-frames, P-frames, and B-frames efficiently in the codec, combined with predictive machine learning models, can make packet loss unnoticeable to the human eye. The meaning of the marker (M) bit in the RTP header is defined by the payload profile; for video it typically flags the last packet of a frame, which helps the receiver find frame boundaries.
- (-) more complex compression algorithms
A better performing compression algorithm produces fewer bits to encode the same video quality as its predecessor.
- (-) Higher-performing compression engines almost always have higher energy consumption and a larger carbon footprint
- (+) resilient to network fluctuations
A Full Intra Request (FIR) requests a key frame so that the stream can be decoded. It can be used when a new peer joins the conference and a key frame is required to start decoding its video stream.
When partial frames given to the decoder cannot be processed, a Picture Loss Indication (PLI) message is sent to the sender. When the sender receives the PLI message, it produces a new I-frame to help the receiver decode the stream.
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
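To see which feedback mechanisms a payload type has negotiated, the `a=rtcp-fb` attributes can be collected per payload type. The sketch below uses plain string handling on the SDP fragment above; the function name `rtcp_feedback_by_payload` is made up for this example, and a real application would more likely rely on an SDP parsing library.

```python
from collections import defaultdict
from typing import Dict, List

SDP_FRAGMENT = """\
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
"""


def rtcp_feedback_by_payload(sdp: str) -> Dict[str, List[str]]:
    """Map each payload type to the RTCP feedback mechanisms it advertises."""
    prefix = "a=rtcp-fb:"
    feedback = defaultdict(list)
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # "100 nack pli" -> payload type "100", mechanism "nack pli"
            payload_type, _, mechanism = line[len(prefix):].partition(" ")
            feedback[payload_type].append(mechanism)
    return dict(feedback)


mechanisms = rtcp_feedback_by_payload(SDP_FRAGMENT)
supports_pli = "nack pli" in mechanisms.get("100", [])
supports_fir = "ccm fir" in mechanisms.get("100", [])
```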
|FIR (Full Intra Request)|PLI (Picture Loss Indication)|
|---|---|
|Requests a full key frame from the sender when a new member enters the session.|Requests a full key frame from the sender when partial frames were given to the decoder but it was unable to decode them.|
||Causes of a PLI request could be a decoder crash or heavy loss.|
Recovers lost packets on lossy networks by adding redundant information to the packets that follow.
- (+) good for unpredictable networks
LBRR (low bit-rate redundancy) – tbd
Congestion occurs when a network path reaches its capacity limit, which could be due to
- failures (switches, routers, cables, fibres, ...)
- oversubscription and operating at peak bandwidth
- broadcast storms
- unsuitable BGP routing and congestion detection
- BGP is responsible for finding the shortest routable path for a packet
The direct consequences of congestion for any network transport can be
- High Latency
- Connection Timeouts
- Low throughput
- Packet loss
- Queueing delay
For WebRTC streams too, if a network is congested, buffers will overflow and packets will be dropped. Excessive packet dropping increases both transmission time and jitter. To overcome this, adaptive buffering is used: the buffer grows or shrinks as jitter increases or decreases.
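One way to quantify the jitter that drives adaptive buffering is the running interarrival jitter estimate from RFC 3550, which smooths the change in transit time between consecutive packets with a gain of 1/16. The sketch below implements that estimate. The `target_buffer_delay` helper, its multiplier, and its limits are hypothetical values chosen for illustration, not a WebRTC policy.

```python
from typing import Iterable, Tuple


def interarrival_jitter(samples: Iterable[Tuple[float, float]]) -> float:
    """RFC 3550 jitter estimate over (send_time, receive_time) pairs.

    Both times must use the same units; the clocks need not be synchronized
    because only differences in transit time are used.
    """
    jitter = 0.0
    previous_transit = None
    for send_time, receive_time in samples:
        transit = receive_time - send_time
        if previous_transit is not None:
            d = abs(transit - previous_transit)
            jitter += (d - jitter) / 16.0
        previous_transit = transit
    return jitter


def target_buffer_delay(jitter_ms: float, multiplier: float = 3.0,
                        floor_ms: float = 20.0, ceiling_ms: float = 500.0) -> float:
    """Hypothetical sizing rule: a few multiples of jitter, within fixed limits."""
    return max(floor_ms, min(ceiling_ms, multiplier * jitter_ms))


# Two packets with a steady 10 ms transit, then one delayed by an extra 10 ms.
jitter_ms = interarrival_jitter([(0, 10), (20, 30), (40, 60)])
buffer_ms = target_buffer_delay(jitter_ms)
```

A steady stream keeps the estimate near zero, so the buffer stays at its floor; a burst of delayed packets raises the estimate and the buffer target with it.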
A congestion notification and detection algorithm can analyze RTCP metrics for possible congestion on the network route and suggest options to overcome it. This is part of the adaptive bitrate and bandwidth estimation process.
Rate limiting the sender is one way to overcome congestion, even though it can lead to poor call quality at the receiver’s end and is atypical for real-time communication systems.
Bandwidth estimation and congestion control are often paired as one operational unit. Packet loss and inter-packet arrival times primarily drive the bandwidth estimation and enable GCC (Google Congestion Control) to flag congestion.
- On the receiver side, TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB (Receiver Estimated Maximum Bitrate) exchange the bandwidth estimates.
- On the sender side, TWCC (Transport-Wide Congestion Control) can be used.
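As an illustration of how packet loss can drive the estimate, here is a minimal sketch of the loss-based rule described in the GCC Internet-Draft: back off when loss exceeds 10%, probe upward by 5% when loss is below 2%, and hold otherwise. The function name is an assumption for this example, and a full GCC implementation also combines this with a delay-based estimate, which is omitted here.

```python
def loss_based_bitrate(current_bps: float, loss_fraction: float) -> float:
    """Adjust the send bitrate from the reported packet loss fraction (0.0-1.0)."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > 0.10:
        # Heavy loss: reduce in proportion to the loss.
        return current_bps * (1.0 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Negligible loss: probe for more bandwidth.
        return current_bps * 1.05
    # Moderate loss: keep the current rate.
    return current_bps


rate = 1_000_000.0
for reported_loss in (0.0, 0.05, 0.20):
    rate = loss_based_bitrate(rate, reported_loss)
```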
Other congestion control algorithms
- QUIC Loss Detection and Congestion Control RFC 9002
- Coupled Congestion Control for RTP Media RFC 8699
- NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
- Self-Clocked Rate Adaptation for Multimedia RMCAT WG
- SCReAM – Mobile optimised congestion control algorithm by Ericsson
Packet loss is the loss of packets in transmission which could be owing to
- network resources and path
- transmission medium congestion
- an application’s inability to absorb delayed packets
- Maximum Transmission Unit (MTU) size: a measure of how large a single packet can be.
A high-definition video stream requires low or no packet loss, and fast recovery if any occurs. RTP intrinsically has no means of recovering from packet loss. Instead, low bit-rate redundancy can be added to the packets themselves to make up for any loss. Retransmission of lost packets can be built on top of RTP using the sequence numbers in the RTP header.
A receiver can notify the sender of possible packet loss by sending acknowledgements.
- Selective Acknowledgement (SACK): notifies the sender of multiple received packets, thereby indicating gaps
- Negative Acknowledgement (NACK): notifies the sender of lost packets
- RTCP packet type 193 denotes the legacy NACK of RFC 2032; in WebRTC, the generic NACK is sent as an RTCP transport-layer feedback message (packet type 205, FMT 1, RFC 4585).
- (+) higher NACK count is suggestive of high packet loss
- (-) the round trip needed to send the NACK and wait for the retransmitted packet can cause significant delay
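Deciding which packets to NACK comes down to spotting gaps in the 16-bit RTP sequence numbers, including across the wrap from 65535 to 0. The sketch below shows one way to do that; the function name and the choice to treat anything more than half the sequence space behind as an old packet are assumptions made for this example.

```python
from typing import List


def missing_sequence_numbers(highest_seen: int, new_seq: int) -> List[int]:
    """Return the sequence numbers skipped between highest_seen and new_seq.

    Both arguments are 16-bit RTP sequence numbers. A duplicate or an older,
    reordered packet yields an empty list.
    """
    gap = (new_seq - highest_seen) & 0xFFFF
    if gap == 0 or gap >= 0x8000:
        return []
    return [(highest_seen + i) & 0xFFFF for i in range(1, gap)]


lost = missing_sequence_numbers(10, 13)            # [11, 12]
lost_across_wrap = missing_sequence_numbers(65534, 1)  # [65535, 0]
```

The receiver would list these numbers in a NACK and drop them from the pending set once the retransmissions arrive.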
The sender proactively sends redundant data so that lost packets don’t affect the stream on the receiver’s end.
- (+) the receiver doesn’t have to request extra data; the sender adds it by itself at the RTP level
- (+) less delay than NACK, which incurs a round trip time
- (-) involves extra bandwidth.
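The idea of proactive redundancy can be shown with the simplest possible scheme: one XOR parity packet per group of media packets, which lets the receiver rebuild any single lost packet in that group without a round trip. This is a toy sketch that assumes equal-length payloads and at most one loss per group. Deployed schemes such as ULPFEC and FlexFEC are more elaborate, and the function names here are made up for the example.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR all packets together, byte by byte, into one parity packet."""
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked None) from the parity packet."""
    if received.count(None) != 1:
        raise ValueError("XOR parity can recover exactly one lost packet")
    present = [p for p in received if p is not None]
    return xor_parity(present + [parity])


group = [b"abcd", b"efgh", b"ijkl"]
parity_packet = xor_parity(group)          # sent alongside the media packets

# The second packet is lost in transit; the receiver rebuilds it locally.
rebuilt = recover_missing([b"abcd", None, b"ijkl"], parity_packet)
```

The cost is visible in the example: one extra packet for every three media packets, which is the bandwidth overhead noted above.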
Geographical distance can add significant delay to transmission time. Transmission time is an important metric in call quality analysis. However, calculating it as the difference between the receive timestamp and the send timestamp requires the two systems’ clocks to be perfectly synchronized, which is unreliable.
transmission_time = timestamp_receive - timestamp_send
For this reason, RTT (Round Trip Time) is a better measure, as it avoids clock synchronization errors.
transmission_time = rtt / 2
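RTT can be computed on the sender from a Receiver Report without any clock sync, using the RFC 3550 formula RTT = A - LSR - DLSR, where A is the arrival time of the report, LSR is the "last SR" timestamp echoed by the receiver, and DLSR is the delay the receiver held it. All three are in units of 1/65536 second (the middle 32 bits of an NTP timestamp). The sketch below uses the example values from RFC 3550; the function names are assumptions.

```python
def rtt_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Round trip time from RTCP Receiver Report fields.

    All inputs are 32-bit values in units of 1/65536 second. The subtraction
    is done modulo 2**32 so that it survives timestamp wraparound.
    """
    rtt_units = (arrival - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


def one_way_estimate(rtt: float) -> float:
    """Approximate transmission time, assuming a symmetric path."""
    return rtt / 2.0


# Values from the worked example in RFC 3550, section 6.4.1.
rtt = rtt_seconds(arrival=0xB7108000, lsr=0xB7052000, dlsr=0x00054000)
transmission_time = one_way_estimate(rtt)
```

Halving the RTT assumes the forward and return paths take equally long, which is only an approximation on asymmetric routes.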
Sender and receiver reports (SR and RR) provide a summary of the connection and of the media quality streaming over it.
Latency accumulates across capturing user media, encoding, transmission, network delays, buffering, decoding, and playback. Many factors are involved in latency management, such as queuing delays, media path, and CPU utilization.
Optimize compute resources
- mobile agents have less computing power
- cameras with features such as autofocus or other adjustments take more time to capture
- the network should have suitable bandwidth and strength
Reduce information to be encoded and sent
- Subject focus and background blurring
- Filtering noise at source
- Voice Activity Detection (VAD)
- send extra data in FEC only if voice activity is detected in the packet
- Echo Cancellation
Synchronizing clocks in distributed systems is a tough task and is mostly avoided, either by using NTP or by using other means of synchronization.
During the buffering of incoming packets (which can range from a few tens of milliseconds to a few hundred milliseconds) the streams are synchronized.
RTP uses two time bases for sync, NTP time and RTP media time, which are not required to be in sync with each other.
- NTP Timestamp: 64-bit unsigned value that indicates the time at which this RTCP SR packet was sent. Formatted as seconds, with a fractional part, since Jan 1, 1900
- RTP Timestamp : RTP timestamp corresponds to the same instant as the NTP timestamp. Expressed in the units of the RTP media clock.
- Majority of video formats use a 90kHz clock.
- For the receiver to sync audio and video streams, the two streams must share the same reference clock
Frame 300: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface 0 (outbound)
Packet type: Sender Report (200)
Length: 6 (28 bytes)
Sender SSRC: 0x39a659b4 (967203252)
Timestamp, MSW: 3855754463 (0xe5d224df)
Timestamp, LSW: 2364654374 (0x8cf1c326)
[MSW and LSW as NTP timestamp: Mar 8, 2022 18:54:23.550563999 UTC]
RTP timestamp: 1110449770
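The Sender Report above pairs an NTP timestamp with an RTP timestamp for the same instant, and that pairing is what lets a receiver place any RTP timestamp on a wall-clock timeline. The sketch below decodes the MSW/LSW values from the capture and maps a later RTP timestamp to wall-clock time. The function names are assumptions, and the 90 kHz clock rate applies to video only.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_utc(msw: int, lsw: int) -> datetime:
    """Convert a 64-bit NTP timestamp (two 32-bit words) to a UTC datetime."""
    microseconds = round(lsw * 1_000_000 / 2**32)   # LSW is a binary fraction
    return NTP_EPOCH + timedelta(seconds=msw, microseconds=microseconds)


def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_wallclock: datetime,
                     clock_rate: int = 90_000) -> datetime:
    """Map an RTP timestamp to wall-clock time using the latest Sender Report."""
    # Signed 32-bit difference, so timestamps just before the SR also work.
    delta = ((rtp_ts - sr_rtp_ts + 2**31) % 2**32) - 2**31
    return sr_wallclock + timedelta(seconds=delta / clock_rate)


# Values taken from the Sender Report capture above.
sr_time = ntp_to_utc(3855754463, 2364654374)
sr_rtp = 1110449770

# A video packet stamped 90000 ticks later was captured one second later.
later = rtp_to_wallclock(sr_rtp + 90_000, sr_rtp, sr_time)
```

Here `sr_time` falls on Mar 8, 2022 at 18:54:23 UTC, in agreement with the Wireshark decode. Applying the same mapping to the audio and video streams of one sender gives both a common timeline for lip sync.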
WebRTC uses the Stream Control Transmission Protocol (SCTP) over a DTLS connection for its data channels, as an alternative to TCP and UDP.
- Multihoming: one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths
- Multistreaming: transmits several independent streams of chunks in parallel
- SCTP offers retransmission similar to TCP and partial reliability similar to UDP.
- Heartbeat to keep the connection alive, with exponential backoff if a packet hasn’t arrived.
- Validation and acknowledgment mechanisms protect against flooding attack
SCTP frames data as datagrams and not as a byte stream
- (+) SCTP enables multiplexing in WebRTC
- (+) It has flow control and congestion avoidance support
WebRTC’s end-to-end encryption model is a good defence against MITM (man-in-the-middle) attacks; however, it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and real-time communication platforms in this article: WebRTC App and webpage Security.
Traditionally, two separate ports were used for RTP and RTCP in SIP/RTP-based real-time communication systems. Thus, demultiplexing of these data streams is performed at the transport layer.
With rtcp-mux, NAT traversal is simplified, as only a single port is used for media and control messages.
- (+) easier to manage security by gathering ICE candidates for a single port instead of two
- (+) increases the system’s capacity for media sessions using the same number of ports
- (+) further simplified using BUNDLE, as all media sessions and their control messages flow on the same port.
- WebRTC has rtcp-mux capabilities, thus simplifying the ICE candidate pairing
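Once RTP and RTCP share a port, the receiver has to tell them apart by inspecting the packet itself. A common approach, following RFC 5761, is to look at the second byte: RTCP packet types occupy the range 192 to 223, and RTP payload types are chosen so that they never land in that range (even with the marker bit set). The sketch below is a minimal classifier under that assumption; the function name is made up for this example.

```python
def is_rtcp(packet: bytes) -> bool:
    """Classify a packet received on a muxed RTP/RTCP port."""
    if len(packet) < 2 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP/RTCP version 2 packet")
    # RTP: second byte is marker bit + 7-bit payload type.
    # RTCP: second byte is the 8-bit packet type (SR=200, RR=201, ...).
    return 192 <= packet[1] <= 223


sender_report = bytes([0x80, 200, 0x00, 0x06])        # RTCP SR header start
video_packet = bytes([0x80, 0x80 | 100, 0x12, 0x34])  # RTP, PT 100, marker set

kinds = [is_rtcp(sender_report), is_rtcp(video_packet)]
```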
- RTP: A Transport Protocol for Real-Time Applications RFC 3550