Low Latency Media streaming

Low latency is imperative for use cases that require mission-critical communication such as the emergency call for first responders, interactive collaboration and communication services, real-time remote object detection etc. Other use cases where low latency is essential are banking communication, financial trading communication, VR gaming etc. When low latency streaming is combined with high definition (HD) quality, the complication grows tenfold. Some instance where good video quality is as important as sensitivity to delay is telehealth for patient-doctor communication. 

Measuring Latency

A NTP Time server  measures the number of seconds that have elapsed since January 1, 1970. NTP time stamp can represent time values to an accuracy of ± 0.1 nanoseconds (ns). In RTP spec it is a 64-bit double word where top 32 bits represent seconds, and the bottom 32 bits represent fractions of a second.

For measuring latency in RTP howver NTP time server is not used , instead the audio capture clock forms the basis for NTP time calcuilation.

RTP/(RTP sample rate) = (NTP + offset) x scale

The latency is then calculated with accurate mappings between the NTP time and RTP media time stamps by sending RTCP packets for each media stream.

Latency can be induced at various points in the systems

  1. Transmitter Latency in the capture, encoding and/or packetization
  2.  Network latency including gateways, load balancing, buffering
  3.  Media path having TURN servers for Network address traversal, delay due to low bandwidth, transcoding delays in media servers.
  4.  Receiver delays in playback due to buffer delay, playout delay by decoder due to hardware constraints

The delay can be caused by one or many stages of the path in the media stream and would be a cumulative sum of all individual delays. For this reason, TCP is a bad candidate due to its latency incurred in packet reordering and fair congestion control mechanisms. While TCP continues to provide signalling transport, the media is streamed over RTP/UDP.

Latency reduction

Although modern media stacks such as WebRTC are designed to be adpatble for dynamic network condition, the issues of bandwidth unpredictbaility leads to packet loss and evenetually low Qos.

Effective techniques to reduce latency

  • Dynamic network analysis and bandwidth estimation for adaptive streaming ensure low latency stream reception at the remote end.
  • Silence suppression: This is an effective way to save bandwidth
    • (+) typical bandwidth reduction after using silence suppression ~50%
  • Noise filtering and Background blurring are also efficient ways to reduce the network traffic 
  • Forward error correction or redundant encoding, techniques help to recover from packet loss faster than retransmission requests via NACK 
  • Increased Compression can also optimize packetization and transmission of raw data that targets low latency
  • Predictive decoding and end point controlled congestion

Ineffective techniques that do not improve QoS even if reduce latency 

  • minimizing headers in every PDU: Some extra headers such as CSRC or timestamp can be removed to create RTPLite but significant disadvantages include having to functionalities using custom logic as
    • (-) removing timestamp would lead to issues in cross-media sync ( lip-sync) or jitter, packet loss sync
    • (-) Removing contributing source identifiers (SSRC ) could lead to issues in managing source identity in multicast or via media gateway.
  • Too many TURN, STUN servers and candidates collection 
  • lowering resolution or bitrate may achieve low latency but is far from the hi-definition experience that users expect.

Lip Sync ( Audio Video Synchronization)

Many real time communicationa nd streaming platforms have seprate audio and video processing pipelines. The output of these two can go out of sync due to differing latency or speed and may may appear out of sync at the playback on freceivers end. As teh skew increases the viewers perceive it as bad quality.

According to convention, at the input to the encoder, the audio should not lead the video by more than 15 ms and should not lag the video by more than 45 ms. Generally the lip sync tolerance is at +- 15 ms.

Synchronize clocks

The clock helps in various ways to make the streaming faster by calculating delays with precision

  • NTP timestamps help the endpoints measure their delays.
  • Sequence numbers detect losses as it increases by 1 for each packet transmitted. The timestamps increase by the time covered by a packet which is useful for correcting timing order, playout delay compensation etc.

To compute transmission time the sender and receiver clocks need to be synchronized with miliseconds of precision. But this is unliley in a heterogenious enviornment as hosts may have different clock speeds. Clock Synchronization can be acheived in various ways

  1. NTP synchronization : For multimedia conferences the NTP timestamp from RTCP SR is used to give a common time reference that can associate these independant timesatmps witha wall clock shared time. Thei allows media synchronization between sender in a single session. Additionally RFC 3550 specifies one media-timestamp in the RTP data header and a mapping between such timestamp and a globally synchronized clock( NTP), carried as RTCP timestamp mappings.

2. MultiCast synchronization: receivers synronize to sender’s RTP Header media timestamp

  • (-) good approach for multicast sessions

3. Round Trip Time measurement as a workaround to calculate clock cync. A round trip propagation delay can help the host sync its clock with peers.

roundtrip_time = send timestamp - reflected timestamp
tramission_time = roundtrip_time / 2
  • (-) this approach assumes equal time for sending and receiving packet which is not the case in cellular networks. thus not suited for networks which are time assymetrical.
  • (-) transmission time can also vary with jitter. ( some packets arrive in bursts and some be delayed)
  • (-) subjected to [acket loss

4. Adjust playout delay via Marker Bit : Marker bit indicates begiining of talk spurt. This lets the receiver adjust the playout elay to compensate fir the different click rates between sender and receiver and/or network delay jitter.

Marker Bit Header in RTP with GSM payload

Receivers can perform delay adaption using marker bit as long as the reordering time of market bit packer with respect to other packets is less than the playout delay. Else the receiver waits for the next talkspurt.

Sequence of RTP with GSM payload

Simillar examples frmo WebRTC RTP dumps

Congestion Control in Real Time Systems

Congestion is when we have reached the peak limit the network can supportor the peak bandwidth the network path can handle. There could be many reasons for congestion as limits by ISP, high usage at certain time, failure on some network resources causing other relay to be overloaded so on. Congestion can result in

  • droppping excess packets => high packet loss
  • increased buffering -> overloaded packets will queue and cause eratic delivery -> high jitter
  • progressively increasing round trip time
  • congestion control algorithsm send explicit notification which trigger other nodes to actiavte congestion control.

A real time communication system maybe efficient performing encoding/decoding but will be eventually limited by the network. Sending congestion dynmically helps the platform to adapt and ensure a satisfactory quality without lossing too many packet at network path. There has been extensive reserach on the subject of congestion control in both TCP and UDP transports. Simplistic methods use ACK’s for packet drop and OWD (one way delay) to derieve that some congestion may be occuring and go into avoidance mode by reducing the bitrate and/or sending window.

UDP/RTP streams have the support of well deisgned RTCP feedbacks to proactively deduce congestion situation before it happenes. Some WebRTC approaches work around the problem of congestion by providing simulcast , SVC(Temporal/frame rate, spatial/picture size , SNR/Quality/Fidelity ) , redundant encoding etc. Following attributes can help to infer congestion in a network

  • increasing RTT ( Round Trip Time)
  • increasing OWD( One Way Delay)
  • occurance of Packet Loss
  • Queing Delay Gradient = queue length in bits / Capacity of bottleneck link

Feedback loop

A feedback loop between video encoder and congestion controller can significantly help the host from experiencing bad QoS.

MTU determination

Maximum transmission Unit ( MTU) determine how large a packet can be to be send via a network. MTU differs on the path and path discovery is an effective way to determine the largest packet size that can be sent.

RTCP Feedback

To avoid having to expend on retransmission and faulty gaps in playback, the system needs to cummulatively detect congestion. The ways to detect congestion are looking at self’s sending buffer and observing receivers feedback. RTP supports mechanisms that allow a form of congestion control on longer time scales.

Resolving congestion

Some popular approaches to overcome congestion are limiting speed and sending less

  • Throttling video frame acquistion at the sender when the send buffer is full
  • change the audio/video encoding rate
  • Reduce video frame rate, or video image size at the transmitter
  • modifying the source encoder

The aim of these algorithms is usually a tradeoff between Throughout and Latency. Hence maximizaing throughout and penalizing delays is a formula use often to come up with more mordern congestion cntrol algorithms.

  • NADA ( Network Assisted Dynamic Adaption) which uses a loss vs delay algorithm using OWD,
  • SCREAM ( Sefl Clocked Rate Adaptation for multimedia)
  • GCC ( google congestion control) uses kalman filter on end to end OWD( one way delay) and compares against an adaptive threhsold to throttle the sending rate.

Transport Protocol Optimization

A low latency transport such as UDP is most appropriate for real time transmission of media packet, due to smaller packet and ack less opeartion. A TCP transport for such delay sensitive enviornments is not ideal. Some points that show TCP unsuited for RTP are :

  • Missing packet discovery and retrasmission will take atleast one round trip time using ack which either results in audible gap in playout or the retransmitted packet being discarded by altogether by decoder buffer.
  • TCP packets are heavier than UDP.
  • TCP cannot support multicast
  • TCP congestion control is inaplicable to real time media as it reduces the congestion window when packet loss is detected. This is unsuited to codecs which have a specific sampling like PCM is 64 kb/s + header overhead.

Fasten session establishment

Lower layer protocols are always required to have control over resources in switches, routers or other network relay points. However, RTP provides ways to ensure a fast real-time stream and its feedback including timestamps and control synchronization of different streams to the application layer.

  1. Mux the stream: RTP and RTCP share the same socket and connection, instead of using two separate connections.
  • (+) Since the same port is used fewer ICE candidates gathering is required in WebRTC.

2. Prioritize PoP ( points of presence) under quality control over open internet relays points.

3. Parallelize AAA ( authentication and authorization) with session establishing instead of serializing it.

TradeOff of Latency vs Quality

To achieve reliable transmission the media needs to be compressed as much ( made smaller) which may lead to loss of certain picture quality.

Lossy vs Lossless compression

Loss Less Compression will incur higher latency as compared to lossy compression.

Loss Less CompressionLossy Compression
(+) Better pictue quality(-) lower picture quality
(-) higher power consumption(+) lower power consumption at encoder and decoder
suited for file storagesuited for real time streaming

Intra frame vs Inter frame compressssion

Intra frameInter frame
Intra frame compression reduce bits to decribe a single frame ( lie JPEG)Reduce the bit to decode a series of frame by femoving duplicate information
I – frame : a complete picture without any loss
P – frame : partical picture with delat from previous frame
B – frame : a partical pictureusing modification from previous and future pictures.
suited for images

Container Formats for Streaming

Some time ago Flash + RTMP was the popular choice for streaming. Streaming involves segmenting a audio or audio inot smaller chunks which can be easily transmistted over networks. Container formats contain an encoded video and audio track in a single file which is then streamed using the streaming protocol. It is encoded in different resolutions and bitrates. Once receivevd it is again stored in a container format (MP4, FLV).

ABR formats are HTTP-based, media-streaming communications protocols. As the connection gets slower, the protocol adjusts the requested bitrate to the available bandwidth. Therefore, it can work on different network bandwidths, such as 3G or 4G.

TCP based streaming

  • (-) slow start due to three way handshake , TCP Slow Start and congestion avoidance phase
  • (+) SACK to overcoming reseding the whole chain for a lost packet

UDP based streaming

One of the first television broadcasting techinques, such as in IPTV over fibre with repeaters , was multicast broadcasting with MPEG Transport Stream content over UDP.  This is suited for internal closed networks but not as much for external networks with issues such as interference, shaping, traffic congestion channels, hardware errors, damaged cables, and software-level problems. In this case, not only low latency is required, but also retransmission of lost packets.

  • (-) needs FEC for ovevrcomming lost packets which causes overheads

Real-Time Messaging Protocol (RTMP)

Developed by Macromedia and acquired by Adobe in 2005. Orignally developed to support Flash streaming, RTMP enables segmented streaming. RTMP Codecs

  • Audio Codecs: AAC, AAC-LC, HE-AAC+V1 and V2, OPUS, MP3, SPEEX, VORBIS
  • Video Codecs: H.264, VP6, VP8

RTMP is widely used for ingest and usually has low latency ~5s. RTMP works on TCP usind default port 1935. RTMFP uses UDP by replaced chunked stream. It has many variations such as

  • (-) HTTP incompatible.
  • (-) may get blocked by some frewalls
  • (-) delay from 2-3 s , upto 30 s
  • (+) multicast supported
  • (+) Low buffering
  • (-) non support for vp9/HEVC/AV1

RTMP forms several virtual channels on which audio, video, metadata, etc. are transmitted.

RTSP (Real-Time Streaming Protocol)

RTSP is primarily used for streaming and controlling media such as audio and video on IP networks. It relies on extrenal codecs and security. It is

not typically used for low-latency streaming because of its design and the way it handles data transfer:
RTSP uses TCP (Transmission Control Protocol) as its transport protocol. This protocol is designed to provide reliability and error correction, but it also introduces additional overhead and latency compared to UDP.
RTSP cannot adjust the video and audio bitrate, resolution, and frame rate in real-time to minimize the impact of network congestion and to achieve the best possible quality under the current network conditions as modern streaming technologies do such as WebRTC.

  • (-) legacy requies system software
  • (+) often used by Survillance system or IOT systes with higher latency tolerance
  • (-) mostly compatible with android mobile clients

LL ( Low Latency) – HTTP Live Streaming (HLS)

HLS can adapt the bitrate of the video to the actual speed of the connection using container format such as mp4. Apple’s HLS protocol used the MPEG transport stream container format or .ts (MPEG-TS)

  • Audio Codecs: AAC-LC, HE-AAC+ v1 & v2, xHE-AAC, FLAC, Apple Lossless
  • Video Codecs: H.265, H.264

The sub 2 seond latency using fragmented 200ms chunks .

  • (+) relies on ABR to produce ultra high quality streams
  • (+) widely compatible even HTML5 video players
  • (+) secure
  • (-) higher latency ( sub 2 second) than WebRTC
  • (-) propertiary : HLS capable encoders are not accessible or affordable


MPEG DASH (MPEG DASH (Dynamic Adaptive Streaming over HTTP)) is primarily used for streaming on-demand video and audio content over HTTP. A HTTP based streaming protocol by MPEG, as an alternative to HLS. MPEG-DASH uses .mp4 containers as HLS streams are delivered in .ts format.  

MPEG DASH uses HTTP (Hypertext Transfer Protocol) as its transport protocol, which allows for better compatibility with firewalls and other network devices, but it also introduces additional overhead and latency, hence unsuitable for low latency streaming

  • (+) supports adaptive streaming (ABR)
  • (+) open source
  • (-) incompatible to apple devvices
  • (-) not provided out of box with browsers, requires sytem software for playback

WebRTC (Web Based Real Time Communication)

WebRTC is the most used p2p Web bsed streaming with sub second latency. Inbuild Latency control in WebRTC

Transport protocol

WebRTC uses UDP as its transport protocol, which allows for faster and more efficient data transfer compared to TCP (Transmission Control Protocol). UDP does not require the same level of error correction and retransmission as TCP, which results in lower latency.

P2P (Peer-to-Peer) Connections

WebRTC allows for P2P connections between clients, which reduces the number of hops that data must travel through and thus reduces latency. WebRTC also use Data channel API which can transmit data p2p swiftly such as state changes or messages.

Choice of Codecs

Most WebRTC providers use vp9 codec ( succesor to vp8) for video compression which is great at providing high quality with reduced data.

Network Adaptation

WebRTC adapts to networks in realtime by adjusting bitrate, resolution, framework with changes in networks conditions. This auto adjusting quality also helps WebRTC mitigate the losses under congestion buildup.

  • (+) open source and support for other open source standardized techs such as Vp8, Vp9 , Av1 video codecs , OPUS audio codec. Also supported H264 certains profiles and other telecom codecs
  • (+) secure E2E SRTP
  • (-) evolving still

SRT (Secure Reliable Transport)

Streaming protocol by SRT alliance. Originally developed as open source video streaming potocol by  Haivision and Wowza. This technology has found use primarily in streaming high-quality, low-latency video over the internet, including live events and enterprise video applications.

Packet recovery using  selective and immediate (NAK-based) re-transmission for low latency streaming , advanced codecs and video processing technologies and extended Interoperability have helped its cause.

  • (+) Sub second latency as WebRTC.
  • (+) codec agnostic
  • (+) secure
  • (+) compatible

 Each unit of SRT media or control data created by an application begins with the SRT packet header.

Scalable Video Encoding ( SVC) for low latency decoding

SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. 

RTCRtpCodecCapability dictionary provides information about codec capabilities such as codec MIME media type/subtype,codec clock rate expressed in Hertz, maximum number of channels (mono=1, stereo=2). With the advent of SVC there has been a rising interest in motion-compensation and intra prediction, transform and entropy coding, deblocking, unit packetization. The base layer of an SVC bit-stream is generally coded in compliance with H.264/MPEG4-AVC.

An MP4 file contains media sample data, sample timing information, sample size and location information, and sample packetization information, etc.

CMAF (Common Media Application Format)

Format to enable HTTP based streaming. It is compatible with DASH and HLS players by using a new uniform transport container file. Apple and Microsoft proposed CMAF to the Moving Pictures Expert Group (MPEG) in 2016 . CMAF features

  • (+) Simpler
  • (+) chunked encoder and chunked tranfer
  • (-) 3-5 s of latency
  • (+) increases CDN efficiency

CMAF Addressable MEdia Objects : CMAF header , CMAF segment , CMAF chunk , CMAF track file.

Logical compoennets of CMAF

CMAF Track: contains encoded media samples, including audio, video, and subtitles. Media samples are stored in a CMAF specified container. Tracks are made up of a CMAF Header and one or more CMAF Fragments.

CMAF Switching Set: contains alternative tracks that can be switched and spliced at CMAF Fragment boundaries to adaptively stream the same content at different bit rates and resolutions.

Aligned CMAF Switching Set: two or more CMAF Switching Sets encoded from the same source with alternative encodings; for example, different codecs, and time aligned to each other.

CMAF Selection Set: a group of switching sets of the same media type that may include alternative content (for example, different languages or camera angles) or alternative encodings (for example,different codecs).

CMAF Presentation: one or more presentation time synchronized selection sets.

Fanout Multicast / Anycast

Connection-oriented protocols find it difficult to scale out to large numbers as compared to connectionless protocols like IP/UDP. It is not advisable to implement hybrid, multipoint or cascading architectures for low latency networks as every proxy node will add its buffering delays. More on various RTP topologies is mentioned in the article below.

Ref :

Fault Tolerance and Error Correction in WebRTC

Fluctuating Networks

WebRTC has build in capabilities to detect network glitches and adapt itself to changing situations. Some of the methodologies used are listed below.

Dynamic Bandwidth estimation

Bandwidth are dependant on network strength and is affected by the other users on the network. Under hetrogenious network conditions Bandwidth estimation is a critical step to improve call quality and end user exeprince.


An unreliable network / fluctiating one will cause some packets to be delivered on time and some to be delayed more thn others, causing them to come in bursts. JitterBuffer is an effective methodology for Jitter management which ensures a steady delivery of apckets even when the peers transmit at flucting rates.

A jitter buffer is a buffer that consumes packets as soon as they arrive and keep them untill the frame can be fully reconstructed. At the point when all apckets have bee filled in buffer ( in any order ) it emiits it for decoding which the play can playback to user. Note that serveral RTP packet can have the same timestamp is they are part of the same video frame.

  • (+) dynamically manages unordered packets and reconstrcts a frame after accumulating all packets
  • (-) can introduce latency for packets that arrive early
  • (-) Need active resisizing by means of feedback
    • for hi speed and goog network jitterbuffer can ve small sized
    • for congested and disruptive networks it is better to keep a longer buffer which can also add some latency
  • (-) buffer has limited capacity so the packet can expire if not received within a duration “jitterBufferDealy”.

SDP renegotiation


Demand for High Quality Video

Applications telehealth, advertising or broadcasting on WebRTC media streams

Tradeoff between Latency vs Quality

Reduced resolution, framerate, bit rate are effective for congestion control however not suited to the case of High defintaion video conferecing such as gaming , telehealth of broadcast of concert as it may hinder with user experience.

Layering for adaptive streaming

using the I-frame , P-frame and B frame efficiently in the codec combines with predictive machine learning models make packet loss unnoticible to the human eye. Marker ( M bit) in the RTP packet structure marks keyframes.

  • (-) more complex compression algorithms

Better compression algorithms vs CPU compute

A better performing compression algorithm produces fewer bits to encode the same video quality as its predecessor.

  • (-) Higher performing compression engines most always has higher energy consumption and carbon footprint
  • (+) resilent to network fluctuations

Full INTRA-frame Request (FIR)

Requests a key frame to decode the frame. Can be used when a new peer joins the conference a key frane is required to start decoding its video strea,.

Picture Loss Indication (PLI)

Partial frames given to decoder are unprocessable, then PLI message is send to the sender. As the sender receives pli message it will produce new I-frames to help the reciver decore the frames.

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
request a full key frame from the sender , when new memeber enters the session.request a full key frame from the sender, when partial frames were given to the decoder, but it was unable to decode them
causes of making PLI request could be decoder crash or heavy loss

Redundant Encoding (RED) in Media Packets

Recovers packet loss under lossy networks by adding extra bits of information in following packets.

  • (+) good for unpredictable networks

LBRR ( low bit-rate redundancy) – tbd


Congestion is created when a network path has reached its maximum limits which could be due to

  • failures(switches, routers, cables, fibres ..)
  • over subscription and operating at peak bandwidth.
  • broadcast storms
  • Inapt BGP routing and congestion detection
  • BGP is responsisble for finiding the shortest routable path for a packet

The direct consequences of congestion for any network transport can be

  • High Latency
  • Connection Timeouts
  • Low throughput
  • Packet loss
  • Queueing delay

With respect to WebRTC streams too, if a network has congestion, the buffer will overflow and packets will be droppped. Due to excessive dropping of packets both transmission time and jitter increases.To overcome this adaptive buffereing is used as jitter increases or decreases.

Feedback Loop

A congestion notifier and detection algorithm can analyze the RTCP metrics for possible congestion in the network route and suggest options to overcome it. Part of Adaptive Bitrate and Bandwidth Estimation process.

Overcome congestion with lower bitrate

Rate limiting the sending information is one way to overcome congestion, even though it could lead to bad call quality at the reciver’s end and non typical for realtime communciation systems

Reduce frame quality and resolution

Full HD constraint
vga constraints

Congestion control Algorithms : Google Congestion Control ( GCC)

Bandwidth estimation and congestion control are ofetn paird in as a operational unit. Primarily packet loss and inter packet arrival times drives the bandwidth estimation and enable GCC to flagcongestion.

  • On the receiver side TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB(Receiver Estimated Maximum Bitrate ) exchange the bandwodth estimates.
  • On the sender side TWCC(Transport wide congestion control) can be used.

Other congestion control algorithms

  • QUIC Loss Detection and Congestion Control RFC 9002
  • Coupled Congestion Control for RTP Media rfc8699
  • NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
  • Self-Clocked Rate Adaptation for Multimedia RMCAT WG
  • SCReAM – Mobile optimised congestion control algorithm by Ericson

Low Network Strength and High Packet Loss

Packet loss is the loss of packets in transmission which could be owing to

  • network resources and path
  • transmission medium congestion
  • applications inability to absord delayed packets.
  • Maximum Transmission Unit size : measure of how large a single apcket can be.

Recovering Lost packets

High definition video stream requires low/no packet loss and fast recovery if any. RTP intrinsically has no means for recovering packet loss. Instead, low bit rate redundancy can be added to packets themselves to make up for any loss. Retransmission of lost packets can be a feature developed over RTP using sequence numbers head in RTP.

Acknowledgement to identify packet loss

A receiver can notifiy the sender of the possible concerns around packet loss by means of sendings acks.

  • Selective Acknowledgement (SACK) : notifies the sender of multiple packets and thereby indicating gaps
  • Negative Acknowledgements (NACK ) : notifies the sender of packets lost
    • RTCP Packet Type 193 denotes NACK.
  • (+) higher NACK count is suggestive of high packet loss
  • (-) round trip time for NACK to send and waiting for packet to be retransmitted and receive in response can cause significant delay

Forward Error Correction (FEC)

The sender proactively send redundant data such that lost packets dont affact the stream on receiver’s end.

  • (+) receiver doesnt have to request for exgtra data to be sent , the sender does it by itself at RTP level
  • (+) less delay than NACK which incurs round trip time
  • (-) involve extra bandwidth.

Long distance Calls and High Round Trip Time

Geographical distances can add significant delay in Transmission time.Transmission time is an important metric in the Call Quality analysis however calculating transmission time as sthe different of timestamp of sending and timestamp of receiving requires perfect sync of systems clock which is unreliable.

transmission_time = timestamp_send - timestamp_receive

For this reason RTT( Round Trip Time)is a better means to avoid clock synchoronization errors.

transmission_time = rtt /2 

Using Receiver reports and Sender Reports from RTCP to adjust to network conditions

Sender and receiver reports (SR and RR) provide a highlight of the connection and media quality streaming on this connection.

RTCP Senders report for WebRTC media stream
RTCP receivers report for WebRTC media stream

Low Latency Media Streaming

Latency is calculated from getting user media encoding transmission , network delays , buffering , decoding and playback. There are many factors involved in latency management such as queing delays , media path, CPU utilization etc.

Optimize Compute resource

  • mobile agents have lesser computative power
  • Camera with features such as auto focus or other adjustments will taker more time to cappture
  • network should be of suited bandwidth and strength

Reduce information to be encoded and sent

  • Subject focus and blurring backgroud
  • Filtering noise at source
  • Voice Activity Detection (VAD)
    • send extra data in FEC only is there is voice activity detected in packet
  • Echo Cancellation

Measuring latency

Since we know that synchorinizaing clocks in distributed systems is a tough task and mostly avoided by wither using NTP or using other means of synchronization

NTP Synchronization of Audio Video Sync

During the buffereng of incoming [ackets ( which canrage from few ten of miliseconds to few hundred milisecond ) the streams are synchronized.

Time used by RTP for sync is NTP and RTP based ( which are not required to be in sync).

  • NTP Timestamp : 64-bit unsigned value that indicates the time at which this RTCP SR packet was sent. Formatted as fractional seconds since Jan 1, 1900
  • RTP Timestamp : RTP timestamp corresponds to the same instant as the NTP timestamp. Expressed in the units of the RTP media clock.
    • Majority of video formats use a 90kHz clock.
    • For receiver to sync audio and video streams these two streasm must be from same clock
Frame 300: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface 0 (outbound)
Packet type: Sender Report (200)
Length: 6 (28 bytes)
Sender SSRC: 0x39a659b4 (967203252)
Timestamp, MSW: 3855754463 (0xe5d224df)
Timestamp, LSW: 2364654374 (0x8cf1c326)
[MSW and LSW as NTP timestamp: Mar 8, 2022 18:54:23.550563999 UTC]
RTP timestamp: 1110449770

Demand for higher security on WebRTC’s CPaaS

Webrtc uses Stream Control Transmission Protocol (SCTP) over DTLS connection as an alternative to TCP and UDP.

Features :

  • multihoming : one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths
  • Multistreaming transmit several independent streams of chunks in parallel
  • SCTP has similarities to TCP retransmission and partial reliability like UDP.
  • Heartbest to keep connection alive with exponential backoff if packet hasnt arrived.
  • Validation and acknowledgment mechanisms protect against flooding attack

SCTP frames data as datagrams and not as a byte stream

  • (+) SCTP enables WebRTC to be multiplexing
  • (+) It has flow control and congestion avoidance support
  • (+)  

End to End Encryption

End to end encryption model of WebRTC is a good defence to MIM ( man in middle ) attacks howver it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and Realtime communication platfroms in this article WebRTC App and webpage Security.

Minimize Public-private mapping pairs vai RTCP-mux

Traditionally 2 separte ports for RTP aand RTCP were used in SIP / RTP based realtime communications systems. Thus demultiplexisng of the traffic of these data streams is peformed at the transport later.

With rtcp-mux the NAT tarversal si simplified as onlya single port is used for media and control messages .

  • (+) easier to manage security by gathering ICE candidates for a single port only instead of 2
  • (+) increases the systesm capacity for media session using the same number of ports
  • (+) further simplified using BUNDLE as all media session and their control messages flow on the same port .
  • WebRTC has rtcp-mux capabilities thus simplifying the ICE candidate pairing

References :

AEC (Echo Cancellation) and AGC (Gain Control) in WebRTC

Echo is the sound of your own voice reverberating. If the amplitude of such a sound is high and intervals exceed 25 ms, it becomes disruptive to the conversation. Its types can be acoustic or hybrid. Echo cancellers need to eliminate the echo while still preserving call quality and not disrupting tones such as DTMF.

Acoustic Echo 

Usually the background or reflected noise which is an undesired voiceband energy transfers from the speaker to the microphone and into the communication network. Mostly found in a hands-free set or speakerphone. In a multiparty call scenario, it could also occur due to unmatched volume levels, challenging network conditions on one party, background noise, double talk or even proximity between user and microphone

Hybrid / Electronic Echo in PSTN phones

In a public telephone system, local loop wiring is done using two-wire connections carrying bidirectional voice signals. In PBX, a two-to-four wire conversion is done using a hybrid circuit which does not perform perfect impedance matches resulting in a Hybrid echo.

echo AEC
Hybrid / Electronic Echo in PSTN phones

Echo Cancellation

An efficient echo canceller should cancel out the entire echo tail while not leading to any packet loss. It needs to be adaptive to changing IP network bandwidth and algorithm should function equally well in conference scenarios  where there may be more than one echo sources. Benchmarking tools like MOS (Mean Opinion scores ) are used to gauge the  results. Often voice quality enhancement technologies are also integrated into AEC modules, such as :

  • automatic Gain control ( AGC) ,
  • Noise Reduction
  • Confort Noise Generator ( CNG)
  • Non linear processor
  • tone Disabler for SS& and DTMF tones
echo AEC 2
Automatic Echo Cancellation

WebRTC Echo Cancellation

WebRTC now actively detects and removes echo especially the local system echo resonance.

Noise Suppression in WebRTC

Noise suppression automatically filters the audio to remove background noise.

Automatic Gain Control (AGC)

AGC works as a circuit. When the average audio level is low , circuit raises it and if the audio level is high the circuit brings it down.

  • (+) AGC frees the user from manually tuning the audio level.
  • (-) During a pause too , agc tries to bring audio level to standard setting making background noises louder.
  • (-) subesquent audio processing make gain control progressively worse.

Audio Compressor : Due to the drawbacks with AGC , Audio Compressers carry the operation more sophistically by looking at amplitude of the sound.

(-) not ideal for music which had varrying sound amplitude.

Audio Peak Limiter : Limiters simply keep the audio from exceeding a set maximum level.

(+) well suited for avoiding loud noise such as door slam from entering the processing pipeline.

Audio Expanders :increase the dynamic (loudness) range of audio that has been overly processed.

(+) suited for over compressed audio transmissiono such as Satellite relays

Audio Filters :attenuate audio frequencies either above or below certain points within the audio range.

AGC in webRTC


aspectRatio: true
autoGainControl: true
brightness: true
channelCount: true
colorTemperature: true
contrast: true
deviceId: true
echoCancellation: true
exposureCompensation: true
exposureMode: true
exposureTime: true
facingMode: true
focusDistance: true
focusMode: true
frameRate: true
groupId: true
height: true
iso: true
latency: true
noiseSuppression: true
pan: true
pointsOfInterest: true
resizeMode: true
sampleRate: true
sampleSize: true
saturation: true
sharpness: true
tilt: true
torch: true
whiteBalanceMode: true
width: true
zoom: true

WebRTC Get User MEdia with various values of autoGainControl

References :

TeleMedicine and WebRTC

Anywhere anytime Telemedicine communication tool accessible on any device.  The solution provides a low eight signalling server which drops out as soon as call is connected thus ensuring absolutely private calls without relaying or involving any central server in any call related data or media . This ensure doctor patient details are not processed , stored or recorded by our servers.

The solution enables doctors / nurses / medical practitioners and patients  to do

  • High definition Audio/video calls 
  • End to end encrypted p2p chats 
  • Integration with HMS ( hospital management system ) to fetch history of the patients 
  • Screens sharing to show reports without transferring them as files 
  • Include more concerned people of doctors using Mesh based peer to peer conferencing feature.      

Confidentialty and Privacy

For privacy and security of certain health information only HIPAA (Health Insurance Portability and Accountability Act of 1996) compliant video-conferencing tools can only be used for Telemedicine in US.

Telemedicine scenario Callflow

Calllfow for Attended Call Transfer and 2 way conference in a Telemedicine scenario between Patient , hospital attendant , doctor and a nurse

References :

WebRTC Audio/Video Codecs

Codecs signifies the media stream’s compession and decompression. For peers to have suceesfull excchange of media, they need a common set of codecs to agree upon for the session. The list codecs are sent  between each other as part of offeer and answer or SDP in SIP.

As WebRTC provides containerless bare mediastreamgtrackobjects. Codecs for these tracks is not mandated by webRTC . Yet the codecs are specified by two seprate RFCs

  1. RFC 7878 WebRTC Audio Codec and Processing Requirements specifies least the Opus codec as well as G.711’s PCMA and PCMU formats.
  2. RFC 7742 WebRTC Video Processing and Codec Requirnments specifies support for  VP8 and H.264’s Constrained Baseline profile for video .

In WebRTC video is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to dicuss Audio/Video Codecs processing requirnments only.

WebRTC is free and opensource and its woring bodies promote royality free codecs too. The working groups RTCWEB and IETF make the sure of the fact that non-royality beraning codec are mandatory while other codecs can be optional in WebRTC non browsers .

WebRTC Browsers MUST implement the VP8 video codec as described in RFC6386 and H.264 Constrained Baseline described in RFC 7442.

WebRTC Video Codec and Processing Requirements
Media Flow in WebRTC Call

WebRTC Video Codecs

Most of the codesc below follow Lossy DCT(discrete cosine transform (DCT) based algorithm for encoding. Sample SDP from offer in Chrome browser v80 for Linux incliudes these profile :

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123

a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 profile-id=0
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100

a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:122 rtx/90000
a=fmtp:122 apt=102

a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:121 rtx/90000
a=fmtp:121 apt=127

a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125

a=rtpmap:108 H264/90000
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 red/90000
a=rtpmap:120 rtx/90000
a=fmtp:120 apt=124


Developed by on2 and then acquired and opensource by google.

libvpx encoder library.

  • Supported conatiner – 3GP, Ogg, WebM
  • (+) supported simulcast
  • (+) Now free of royality fees.
  • (+) No limit on frame rate or data rate

Maximum resolution of 16384×16384 pixels.

VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes.

Encode and decode pixels with an implied 1:1 (square) aspect ratio.


Video Processor 9 (VP9) is the successor to the older VP8 and comparable to HEVC as they both have simillar bit rates.

  • supported Containers are – MP4, Ogg, WebM
  • (+) Open and free of royalties and any other licensing requirements

H264/AVC constrained

AVC’s Constrained Baseline (CBP ) profile compliant with WebRTC.

  • propertiary, patented codec, mianted by MPEG / ITU

Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3 . Contrained baseline is a submet of the main profile , suited to low dealy , low complexity. suited to lower processing device like mobile videos

Multiview Video Coding – can have multiple views of the same scene ,such as stereoscopic video.

Other profiles , which are not supporedt are Baseline(BP), Extended(XP), Main(MP) , High(HiP) , Progressive High(ProHiP) , High 10(Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive

  • supported containers are 3GP, MP4, WebM

Parameter settings:

  • packetization-mode
  • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
  • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
  • Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )

AV1 (AOMedia Video 1)

open format designed by the Alliance for Open Media. It is royality free and especially designed for internet video HTML element and WebRTC.

  • higher data compression rates than VP9 and H.265/HEVC

offers 3 profiles in increasing support for color depths and chroma subsampling.
1. main,
2. high, and
3. professional

  • supports HDR
  • supports Varible Frame Rate
  • Supported container are ISOBMFF, MPEG-TS, MP4, WebM

Stats for Video based media stream track

timestamp 04/05/2020, 14:25:59
ssrc 3929649593
isRemote false
mediaType video
kind video
trackId RTCMediaStreamTrack_sender_2
transportId RTCTransport_0_1
codecId RTCCodec_1_Outbound_96
[codec] VP8 (payloadType: 96)
firCount 0
pliCount 9
nackCount 476
qpSum 912936
[qpSum/framesEncoded] 32.86666666666667
mediaSourceId RTCVideoSource_2
packetsSent 333664
[packetsSent/s] 29.021823604499957
retransmittedPacketsSent 0
bytesSent 342640589
[bytesSent/s] 3685.7715977714947
headerBytesSent 8157584
retransmittedBytesSent 0
framesEncoded 52837
[framesEncoded/s] 30.022576142586164
keyFramesEncoded 31
totalEncodeTime 438.752
[totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
totalEncodedBytesTarget 335009905
[totalEncodedBytesTarget/s] 3602.7091371103397
totalPacketSendDelay 20872.8
[totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
qualityLimitationReason bandwidth
qualityLimitationResolutionChanges 20
encoderImplementation libvpx
Graph for Video Track in chrome://webrtc-internals

Non WebRTC supported Video codecs

Need active realtime media transcoding


Already used for video conferencing on PSTN (Public Switched Telephone Networks), RTSP, and SIP (IP-based videoconferencing) systems.

  • suited for low bandwidth networks
  • (-) not comaptible with WebRTC
    • but many media gateways incldue realtime transcoding existed between H263 based SIP systems and vp8 based webrtc ones to enable video communication between them

H.265 / HEVC

proprietary format and is covered by a number of patents. Licensing is managed by MPEG LA .

  • Container – Mp4

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints

With the rise of Internet of Things many Endpoints especially IP cameras connected to Raspberry Pi like SOC( system on chiops )n wanted to stream directly to the browser within theor own provate network or even on public network using TURN / STUN.

The figure below shows how such a call flow is possible between an IP cemera ( such as Baby Cam ) and its parent monitoring it over a WebRTC suppported mobile phone browser . The process includes streaming teh content from IOT device on RTSP stream and using realtime trans-coding between H264 and VP8

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints

WebRTC Audio Codecs

source : unknown

WebRTC endpoints are should implement audio codecs: OPUS and PCMA / PCMU, along with Comforrt Noise and DTMF events.

Trace for audio codecs supported in chrome (Version 80.0.3987.149 (Official Build) (64-bit) on ubuntu)

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000


Opus is a lossy audio compression format developed by the Internet Engineering Task Force (IETF) targeting a broad range of interactive real-time applications over the Internet, from speech to music and supportes multiple compression algorithms

  • Constant and variable bitrate encoding – 6 kbit/s to 510 kbit/s
  • frame sizes – 2.5 ms to 60 ms
  • sampling rates – 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced).
  • container- Ogg, WebM, MPEG-TS, MP4

As an open format standardized through RFC 6716, a reference implementation is provided under the 3-clause BSD license. All known software patents which cover Opus are licensed under royalty-free terms.

  • (+ ) flexible, suited for speech ( by SILK) and music ( CELT)
  • (+) support for mono and stereo
  • (+) inbuild FEC( Forward Error Correction) thus resilient to packet loss
  • (+) compression adjustability\ for unpredictable networks
  • (-) Highly CPU intensive ( unsuitable for embedded devices like rpi)
  • (-) processing and memory intensive

For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, it is w3C recommends that Opus be offered before PCMA/PCMU.

AAC (Advanvced Audio Encoding)

part of the MPEG-4 (H.264) standard. Lossy compression but has number pf profiles suiting each usecase like high quality surround sound to low-fidelity audio for speech-only use.

  • supported containers – MP4, ADTS, 3GP

G.711 (PCMA and PCMU)

G.711 is an ITU standard (1972) for audio compression. It is primarily used in telephony.

ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding.
vital to interface with the standard telecom network and carriers. G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU

It is the required standard in many voice-based systems and technologies, for example in H.320 and H.323 specifications.

  • Fixed 64Kbpd bit rate
  • supports 3GP container formats


ITU standard (1988) Encoded using Adaptive Differential Pulse Code Modulation (ADPCM) which is suited for voice compression

  • 7 kHz Wideband audio codec operating
  • Bitrate 48, 56 and 64 kbit/s.
  • containers used 3GP, AMR-WB

G722 improved speech quality due to a wider speech bandwidth of up to 50-7000 Hz compared to G.711 of 300–3400 Hz.

Comfort noise (CN)

artificial background noise which is used to fill gaps in a transmission instead of using pure silence. It prevents – jarring or RTP Timeout.

Should be used for streams encoded with G.711 or any other supported codec that does not provide its own CN. Use of Discontinuous Transmission (DTX) / CN by senders is optional

Internet Low Bitrate Codec (iLBC)

A opensource narrowband speech codec for VoIP and streaming audio.

  • 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames.
  • Defined by IETF RFCs 3951 and 3952.

Internet Speech Audio Codec (iSAC)

iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. It is designed for voice transmissions which are encapsulated within an RTP stream.

  • 16 kHz or 32 kHz sampling frequency
  • adaptive and variable bit rate of 12 to 52 kbps.


patent-free audio compression format designed for speech and also a free software speech codec that is used in VoIP applications and podcasts. May be obsolete, with Opus as its official successor.

AMR-WB Adaptive Multi-rate Wideband is a patented wideband speech coding standard that provides improved speech quality. This is codec is generally available on mobile phones.

  • wider speech bandwidth of 50–7000 Hz.
  • data rate is between 6-12 kbit/s, and the

DTMF and ‘audio/telephone-event’ media type

endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.

DTMF events list

| 0 | DTMF digit "0"
| 1 | DTMF digit "1"
| 2 | DTMF digit "2"
| 3 | DTMF digit "3"
| 4 | DTMF digit "4"
| 5 | DTMF digit "5"
| 6 | DTMF digit "6"
| 7 | DTMF digit "7"
| 8 | DTMF digit "8"
| 9 | DTMF digit "9"
| 10 | DTMF digit "*"
| 11 | DTMF digit "#"
| 12 | DTMF digit "A"
| 13 | DTMF digit "B"
| 14 | DTMF digit "C"
| 15 | DTMF digit "D"

Stats for Audio Media track

Stats for Audio Media include

  • headerBytesSent
  • packetsSent
  • bytesSent
timestamp 04/05/2020, 14:25:59
ssrc 3005719707
isRemote fals
mediaType audio
kind audio
trackId RTCMediaStreamTrack_sender_1
transportId RTCTransport_0_1
codecId RTCCodec_0_Outbound_111
[codec] opus (payloadType: 111)
mediaSourceId RTCAudioSource_1
packetsSent 88277
[packetsSent/s] 50.03762690431027
retransmittedPacketsSent 0
bytesSent 1977974
[bytesSent/s] 150.11288071293083
headerBytesSent 2118648
retransmittedBytesSent 0
Graphs in chrome://webrtc-internals for Audio


m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE

Stats for Datachannel

Statistics RTCDataChannel_1
timestamp 04/05/2020, 14:25:59
label sctp
datachannelid 1
state open
messagesSent 1
[messagesSent/s] 0
bytesSent 228
[bytesSent/s] 0
messagesReceived 1
[messagesReceived/s] 0
bytesReceived 228
[bytesReceived/s] 0

Refrenecs :

Quick links : If you are new to WebRTC read : Introduction to WebRTC is at https://telecom.altanai.com/2013/08/02/what-is-webrtc/

Layers of WebRTC at https://telecom.altanai.com/2013/07/31/webrtc/




Media Capture and streams

const stream = await navigator.mediaDevices.getUserMedia({video: true});


used for screenshare access which could be full csreen a specific appliction or a browser tab

Other APIs in pipeline

  • preferCurrentTab – formerly known as getCurrentBrowsingContextMedia() API
    {video: true, audio: true, preferCurrentTab: true});


Enumerate Devices

Supported constraints

> navigator.mediaDevices.getSupportedConstraints();

aspectRatio: true
autoGainControl: true
brightness: true
channelCount: true
colorTemperature: true
contrast: true
deviceId: true
echoCancellation: true
exposureCompensation: true
exposureMode: true
exposureTime: true
facingMode: true
focusDistance: true
focusMode: true
frameRate: true
groupId: true
height: true
iso: true
latency: true
noiseSuppression: true
pan: true
pointsOfInterest: true
resizeMode: true
sampleRate: true
sampleSize: true
saturation: true
sharpness: true
tilt: true
torch: true
whiteBalanceMode: true
width: true
zoom: true

Peer-to-peer connections

RTCPeerConnection Interface

RTCPeerConnection creates , manages , monitors as well as tears down p2p communication channel

new RTCPeerConnection(servers, pcConstraint);

Deatiled Interface defination

interface RTCPeerConnection : EventTarget {

	RTCConfiguration getConfiguration();
	void setConfiguration(RTCConfiguration configuration);
	void close();

	Promise<RTCSessionDescriptionInit> createOffer(optional RTCOfferOptions options);
	Promise<RTCSessionDescriptionInit> createAnswer(optional RTCAnswerOptions options);
	Promise<void> setLocalDescription(optional RTCSessionDescriptionInit description);
	Promise<void> setRemoteDescription(optional RTCSessionDescriptionInit description);
	Promise<void> addIceCandidate(optional RTCIceCandidateInit candidate);

	readonly attribute RTCSessionDescription? localDescription;
	readonly attribute RTCSessionDescription? currentLocalDescription;
	readonly attribute RTCSessionDescription? pendingLocalDescription;  
	readonly attribute RTCSessionDescription? remoteDescription;
	readonly attribute RTCSessionDescription? currentRemoteDescription;
	readonly attribute RTCSessionDescription? pendingRemoteDescription;
	readonly attribute RTCSignalingState signalingState;
	readonly attribute RTCIceGatheringState iceGatheringState;
	readonly attribute RTCIceConnectionState iceConnectionState;
	readonly attribute RTCPeerConnectionState connectionState;
	readonly attribute boolean? canTrickleIceCandidates;

	void restartIce();
	static sequence<RTCIceServer> getDefaultIceServers();

	attribute EventHandler onnegotiationneeded;
	attribute EventHandler onicecandidate;
	attribute EventHandler onicecandidateerror;
	attribute EventHandler onsignalingstatechange;
	attribute EventHandler oniceconnectionstatechange;
	attribute EventHandler onicegatheringstatechange;
	attribute EventHandler onconnectionstatechange;

RTCConfiguration Dictionary

dictionary RTCConfiguration {
  sequence<RTCIceServer> iceServers;
  RTCIceTransportPolicy iceTransportPolicy;
  RTCBundlePolicy bundlePolicy;
  RTCRtcpMuxPolicy rtcpMuxPolicy;
  DOMString peerIdentity;
  sequence<RTCCertificate> certificates;
  [EnforceRange] octet iceCandidatePoolSize = 0;

RTCOAuthCredential Dictionary

Describes the OAuth auth credential information which is used by the STUN/TURN client (inside the ICE Agent) to authenticate against a STUN/TURN server

dictionary RTCOAuthCredential {
  required DOMString macKey;
  required DOMString accessToken;

RTCRtcpMuxPolicy Enum

what ICE candidates are gathered to support non-multiplexed RTCP.

  • negotiate – Gather ICE candidates for both RTP and RTCP candidates. If the remote-endpoint is capable of multiplexing RTCP, multiplex RTCP on the RTP candidates. If it is not, use both the RTP and RTCP candidates separately.
  • require – Gather ICE candidates only for RTP and multiplex RTCP on the RTP candidates. If the remote endpoint is not capable of rtcp-mux, session negotiation will fail.

If the value of configuration.rtcpMuxPolicy is set and its value differs from the connection’s rtcpMux policy, throw an InvalidModificationError. If the value is “negotiate” and the user agent does not implement non-muxed RTCP, throw a NotSupportedError.

An RTCPeerConnection object has a signaling state, a connection state, an ICE gathering state, and an ICE connection state.

RTCSignalingState Enum


An RTCPeerConnection object has an operations chain which ensures that only one asynchronous operation in the chain executes concurrently.

Also an RTCPeerConnection object MUST not be garbage collected as long as any event can cause an event handler to be triggered on the object. When the object’s internal slot is true ie closed, no such event handler can be triggered and it is therefore safe to garbage collect the object.


generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including

  • descriptions of the local MediaStreamTracks attached to this RTCPeerConnection,
  • codec/RTP/RTCP capabilities
  • ICE agent (usernameFragment, password , local candiadtes etc )
  • DTLS connection
const pc = new RTCPeerConnection();
.then(desc => pc.setLocalDescription(desc));

With more attributes

var pc = new RTCPeerConnection();
     mandatory: {
        OfferToReceiveAudio: true,
        OfferToReceiveVideo: true
    optional: [{
        VoiceActivityDetection: false
}).then(function(offer) {
	return pc.setLocalDescription(offer);
.then(function() {
// Send the offer to the remote through signaling server


generates an SDPanswer with the supported configuration for the session that is compatible with the parameters in the remote configuration

var pc = new RTCPeerConnection();
  OfferToReceiveAudio: true
  OfferToReceiveVideo: true
.then(function(answer) {
  return pc.setLocalDescription(answer);
.then(function() {
  // Send the answer to the remote through signaling server

Offer/Answer Options – VoiceActivityDetection

dictionary RTCOfferAnswerOptions {
  boolean voiceActivityDetection = true;

capable of detecting “silence”

dictionary RTCOfferOptions : RTCOfferAnswerOptions {
  boolean iceRestart = false;
dictionary RTCAnswerOptions : RTCOfferAnswerOptions {};

Codec preferences of an m= section’s associated transceiver is said to be the value of the RTCRtpTranceiver with the following filtering applied

  • If direction is “sendrecv”, exclude any codecs not included in the intersection of RTCRtpSender.getCapabilities(kind).codecs and RTCRtpReceiver.getCapabilities(kind).codecs.
  • If direction is “sendonly”, exclude any codecs not included in RTCRtpSender.getCapabilities(kind).codecs.
  • If direction is “recvonly”, exclude any codecs not included in RTCRtpReceiver.getCapabilities(kind).codecs.

Legacy Interface Extensions

partial interface RTCPeerConnection {
Promise<void> createOffer(
 RTCSessionDescriptionCallback successCallback,
 RTCPeerConnectionErrorCallback failureCallback,
 optional RTCOfferOptions options);
Promise<void> setLocalDescription(
optional RTCSessionDescriptionInit description,
VoidFunction successCallback,                                  
RTCPeerConnectionErrorCallback failureCallback);

Promise<void> createAnswer(
RTCSessionDescriptionCallback successCallback,
RTCPeerConnectionErrorCallback failureCallback);  

Promise<void> setRemoteDescription(
optional RTCSessionDescriptionInit description,
VoidFunction successCallback,                    
RTCPeerConnectionErrorCallback failureCallback);

Promise<void> addIceCandidate(
RTCIceCandidateInit candidate,
VoidFunction successCallback,
RTCPeerConnectionErrorCallback failureCallback);

Session Description Model

enum RTCSdpType {
interface RTCSessionDescription {
  readonly attribute RTCSdpType type;
  readonly attribute DOMString sdp;
  [Default] object toJSON();
dictionary RTCSessionDescriptionInit {
  RTCSdpType type;
  DOMString sdp = "";

Priority and QoS Model which can be : “very-low”, “low”, “medium”, “high”


Send and receive MediaStreamTracks over a peer-to-peer connection.
Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.


const pc = new RTCPeerConnection();
let stream = await navigator.mediaDevices.getUserMedia({
video: true});

0: MediaStreamTrack
contentHint: ""
enabled: true
id: "d5b10d39-cf26-4705-8831-6a0fa89546a0"
kind: "video"
label: "Integrated Webcam (0bda:58c2)"
muted: false
oncapturehandlechange: null
onended: null
onmute: null
onunmute: null
readyState: "live"

RTCRtpSenders objects manages encoding and transmission of MediaStreamTracks .

RTCRtpReceivers objects manage the reception and decoding of MediaStreamTracks. These are associated with one track.

  • RTCRtpReceiver.getContributingSources()
  • RTCRtpReceiver.getParameters()
  • RTCRtpReceiver.getStats()
  • RTCRtpReceiver.getSynchronizationSources()
    • returns audio level, rtptimestamp, source , timesatmp
  • RTCRtpReceiver.getCapabilities()

RTCRtpTransceivers interface describes a permanent pairing of an RTCRtpSender and an RTCRtpReceiver. Each transceiver is uniquely identified using its mid ( media id) property from the corresponding m-line.

They are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via the addTrack(), or explicitly when the application uses the addTransceiver(). They are also created when a remote description is applied that includes a new media description.

rtpTransceiver = RTCPeerConnection.addTransceiver(trackOrKind, init);

trackOrKind should be either audio or video othereise a TypeError is thrown. init is optional – can contain direction , sendEncodings , streams.

var tracks = stream.getTracks();
    pc.addTrack(track, stream)

For example if the stream’s track is


0: MediaStreamTrack
    contentHint: ""
    enabled: true
    id: "d5b10d39-cf26-4705-8831-6a0fa89546a0"
    kind: "video"
    label: "Integrated Webcam (0bda:58c2)"
    muted: false
    oncapturehandlechange: null
    onended: null
    onmute: null
    onunmute: null
    readyState: "live"
[[Prototype]]: MediaStreamTrack
length: 1

the the sender ( fetched from getTransceivers() API) is


0: RTCRtpTransceiver
	currentDirection: null
	direction: "sendrecv"
	headerExtensionsNegotiated: []
	headerExtensionsToOffer: (15) [{…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}]
	mid: null
	receiver: RTCRtpReceiver {track: MediaStreamTrack, transport: null, rtcpTransport: null, playoutDelayHint: null}
	sender: RTCRtpSender
	    dtmf: null
	    rtcpTransport: null
	    track: MediaStreamTrack {kind: 'video', id: 'd5b10d39-cf26-4705-8831-6a0fa89546a0', label: 'Integrated Webcam (0bda:58c2)', enabled: true, muted: false, …}
	    transport: null
	    [[Prototype]]: RTCRtpSender
	    stopped: false
	[[Prototype]]: RTCRtpTransceiver
length: 1

RTCPeerConnection Interface

partial interface RTCPeerConnection {
sequence<RTCRtpSender> getSenders();
sequence<RTCRtpReceiver> getReceivers();
sequence<RTCRtpTransceiver> getTransceivers();

RTCRtpSender addTrack(MediaStreamTrack track, MediaStream... streams);
void removeTrack(RTCRtpSender sender);
RTCRtpTransceiver addTransceiver((MediaStreamTrack or DOMString) trackOrKind, optional RTCRtpTransceiverInit init);
attribute EventHandler ontrack;


dictionary RTCRtpTransceiverInit {
  RTCRtpTransceiverDirection direction = "sendrecv";
  sequence<MediaStream> streams = [];
  sequence<RTCRtpEncodingParameters> sendEncodings = [];

RTCRtpTransceiverDirection can be either of : “sendrecv”, “sendonly”, “recvonly”, “inactive”, “stopped”

RTCRtpSender Interface

Allows an application to control how a given MediaStreamTrack is encoded and transmitted to a remote peer.

interface RTCRtpSender {
readonly attribute MediaStreamTrack? track;
readonly attribute RTCDtlsTransport? transport;
readonly attribute RTCDtlsTransport? rtcpTransport;
static RTCRtpCapabilities? getCapabilities(DOMString kind);
Promise<void> setParameters(RTCRtpSendParameters parameters);
RTCRtpSendParameters getParameters();
Promise<void> replaceTrack(MediaStreamTrack? withTrack);
void setStreams(MediaStream... streams);
Promise<RTCStatsReport> getStats();

RTCRtpParameters Dictionary

dictionary RTCRtpParameters {
  required sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
  required RTCRtcpParameters rtcp;
  required sequence<RTCRtpCodecParameters> codecs;

RTCRtpSendParameters Dictionary

dictionary RTCRtpSendParameters : RTCRtpParameters {
  required DOMString transactionId;
  required sequence<RTCRtpEncodingParameters> encodings;
  RTCDegradationPreference degradationPreference = "balanced";
  RTCPriorityType priority = "low";

RTCRtpReceiveParameters Dictionary

dictionary RTCRtpReceiveParameters : RTCRtpParameters {
  required sequence<RTCRtpDecodingParameters> encodings;

RTCRtpCodingParameters Dictionary

dictionary RTCRtpCodingParameters {
  DOMString rid;

RTCRtpDecodingParameters Dictionary

dictionary RTCRtpDecodingParameters : RTCRtpCodingParameters {};

RTCRtpEncodingParameters Dictionary

dictionary RTCRtpEncodingParameters : RTCRtpCodingParameters {
  octet codecPayloadType;
  RTCDtxStatus dtx;
  boolean active = true;
  unsigned long ptime;
  unsigned long maxBitrate;
  double maxFramerate;
  double scaleResolutionDownBy;

RTCDtxStatus Enum

disabled- Discontinuous transmission is disabled.
enabled- Discontinuous transmission is enabled if negotiated.

RTCDegradationPreference Enum

enum RTCDegradationPreference {

RTCRtcpParameters Dictionary

dictionary RTCRtcpParameters {
  DOMString cname;
  boolean reducedSize;

RTCRtpHeaderExtensionParameters Dictionary

dictionary RTCRtpHeaderExtensionParameters {
  required DOMString uri;
  required unsigned short id;
  boolean encrypted = false;

RTCRtpCodecParameters Dictionary

dictionary RTCRtpCodecParameters {
  required octet payloadType;
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;

payloadType – identify this codec.
mimeType – codec MIME media type/subtype. Valid media types and subtypes are listed in [IANA-RTP-2]
clockRate – expressed in Hertz
channels – number of channels (mono=1, stereo=2).
sdpFmtpLine – “format specific parameters” field from the “a=fmtp” line in the SDP corresponding to the codec

RTCRtpCapabilities Dictionary

dictionary RTCRtpCapabilities {
  required sequence<RTCRtpCodecCapability> codecs;
  required sequence<RTCRtpHeaderExtensionCapability> headerExtensions;

RTCRtpCodecCapability Dictionary

dictionary RTCRtpCodecCapability {
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;

RTCRtpHeaderExtensionCapability Dictionary

dictionary RTCRtpHeaderExtensionCapability {
  DOMString uri;

Example JS code to RTCRtpCapabilities

const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('audio');
const capabilities = RTCRtpSender.getCapabilities('audio');

Output :

codecs: Array(13)
0: {channels: 2, clockRate: 48000, mimeType: "audio/opus", sdpFmtpLine: "minptime=10;useinbandfec=1"}
1: {channels: 1, clockRate: 16000, mimeType: "audio/ISAC"}
2: {channels: 1, clockRate: 32000, mimeType: "audio/ISAC"}
3: {channels: 1, clockRate: 8000, mimeType: "audio/G722"}
4: {channels: 1, clockRate: 8000, mimeType: "audio/PCMU"}
5: {channels: 1, clockRate: 8000, mimeType: "audio/PCMA"}
6: {channels: 1, clockRate: 32000, mimeType: "audio/CN"}
7: {channels: 1, clockRate: 16000, mimeType: "audio/CN"}
8: {channels: 1, clockRate: 8000, mimeType: "audio/CN"}
9: {channels: 1, clockRate: 48000, mimeType: "audio/telephone-event"}
10: {channels: 1, clockRate: 32000, mimeType: "audio/telephone-event"}
11: {channels: 1, clockRate: 16000, mimeType: "audio/telephone-event"}
12: {channels: 1, clockRate: 8000, mimeType: "audio/telephone-event"}
length: 13

headerExtensions: Array(6)
0: {uri: "urn:ietf:params:rtp-hdrext:ssrc-audio-level"}
1: {uri: "http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time"}
2: {uri: "http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01"}
3: {uri: "urn:ietf:params:rtp-hdrext:sdes:mid"}
4: {uri: "urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id"}
5: {uri: "urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id"}
length: 6

RTCRtpReceiver Interface

allows an application to inspect the receipt of a MediaStreamTrack.

interface RTCRtpReceiver {
  readonly attribute MediaStreamTrack track;
  readonly attribute RTCDtlsTransport? transport;
  readonly attribute RTCDtlsTransport? rtcpTransport;
  static RTCRtpCapabilities? getCapabilities(DOMString kind);
  RTCRtpReceiveParameters getParameters();
  sequence<RTCRtpContributingSource> getContributingSources();
  sequence<RTCRtpSynchronizationSource> getSynchronizationSources();
  Promise<RTCStatsReport> getStats();

dictionary RTCRtpContributingSource

dictionary RTCRtpContributingSource {
  required DOMHighResTimeStamp timestamp;
  required unsigned long source;
  double audioLevel;
  required unsigned long rtpTimestamp;

dictionary RTCRtpSynchronizationSource

dictionary RTCRtpSynchronizationSource : RTCRtpContributingSource {
  boolean voiceActivityFlag;

voiceActivityFlag of type boolean – Only present for audio receivers. Whether the last RTP packet, delivered from this source, contains voice activity (true) or not (false).

RTCRtpTransceiver Interface

Each SDP media section describes one bidirectional SRTP (“Secure Real Time Protocol”) stream. RTCRtpTransceiver describes this permanent pairing of an RTCRtpSender and an RTCRtpReceiver, along with some shared state. It is uniquely identified using its mid property.

Thus it is combination of an RTCRtpSender and an RTCRtpReceiver that share a common mid. An associated transceiver( with mid) is one that’s represented in the last applied session description.

interface RTCRtpTransceiver {
    readonly attribute DOMString? mid;
    [SameObject] readonly attribute RTCRtpSender sender;
    [SameObject] readonly attribute RTCRtpReceiver receiver;
    attribute RTCRtpTransceiverDirection direction;
    readonly attribute RTCRtpTransceiverDirection? currentDirection;
    void stop();
    void setCodecPreferences(sequence<RTCRtpCodecCapability> codecs);

Method stop() – Irreversibly marks the transceiver as stopping, unless it is already stopped. This will immediately cause the transceiver’s sender to no longer send, and its receiver to no longer receive.
stopping transceiver will cause future calls to createOffer to generate a zero port in the media description for the corresponding transceiver and stopped transceiver will cause future calls to createOffer or createAnswer to generate a zero port in the media description for the corresponding transceiver


This method overrides the default codec preferences used by the user agent.

Example setting codec Preference for OPUS in audio

peer = new RTCPeerConnection();    
const transceiver = peer.addTransceiver('audio');
const audiocapabilities = RTCRtpSender.getCapabilities('audio');
let codec = [];
codec.push(audiocapabilities.codecs[0]); // opus 

Before setting codec preference for OPUS

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000

After setting codec preference for OPUS audio

m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4
a=rtcp:9 IN IP4

a=msid:hcgvWcGG7WhdzboWk79q39NiO8xkh4ArWhbM f15d77bb-7a6f-4f41-80cd-51a3c40de7b7
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1


Access to information about the Datagram Transport Layer Security (DTLS) transport over which RTP and RTCP packets are sent and received by RTCRtpSender and RTCRtpReceiver objects, as well other data such as SCTP packets sent and received by data channels.
Each RTCDtlsTransport object represents the DTLS transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

interface RTCDtlsTransport : EventTarget {
  [SameObject] readonly attribute RTCIceTransport iceTransport;
  readonly attribute RTCDtlsTransportState state;
  sequence<ArrayBuffer> getRemoteCertificates();
  attribute EventHandler onstatechange;
  attribute EventHandler onerror;

RTCDtlsTransportState Enum

  • “new”- DTLS has not started negotiating yet.
  • “connecting” – DTLS is in the process of negotiating a secure connection and verifying the remote fingerprint.
  • “connected”- DTLS has completed negotiation of a secure connection and verified the remote fingerprint.
  • “closed” – transport has been closed intentionally like close_notify alert, or calling close().
  • “failed” – transport has failed as the result of an error like failure to validate the remote fingerprint

RTCDtlsFingerprint dictionary

dictionary RTCDtlsFingerprint {
  DOMString algorithm;
  DOMString value;

Protocols multiplexed with RTP (e.g. data channel) share its component ID. This represents the component-id value 1 when encoded in candidate-attribute while ICE candadte for RTCP has component-id value 2 when encoded in candidate-attribute.


The track event uses the RTCTrackEvent interface.

interface RTCTrackEvent : Event {
  readonly attribute RTCRtpReceiver receiver;
  readonly attribute MediaStreamTrack track;
  [SameObject] readonly attribute FrozenArray<MediaStream> streams;
  readonly attribute RTCRtpTransceiver transceiver;

dictionary RTCTrackEventInit

dictionary RTCTrackEventInit : EventInit {
  required RTCRtpReceiver receiver;
  required MediaStreamTrack track;
  sequence<MediaStream> streams = [];
  required RTCRtpTransceiver transceiver;


This interface candidate Internet Connectivity Establishment (ICE) configuration used to setup RTCPeerconnection. To facilitate routing of media on given peer connection, both endpoints exchange several candidates and then one candidate out of the lot is chosen which will be then used to initiate the connection.

const pc = new RTCPeerConnection();
  • candidate – transport address for the candidate that can be used for connectivity checks.
  • component – candidate is an RTP or an RTCP candidate
  • foundation – unique identifier that is the same for any candidates of the same type , helps optimize ICE performance while prioritizing and correlating candidates that appear on multiple RTCIceTransport objects.
  • ip , port
  • priority
  • protocol – tcp/udp
  • relatedAddress , relatedPort
  • sdpMid – candidate’s media stream identification tag
  • sdpMLineIndex

usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).

RTCIceCredentialType Enum : supports OAuth 2.0 based authentication. The application, acting as the OAuth Client, is responsible for refreshing the credential information and updating the ICE Agent with fresh new credentials before the accessToken expires. The OAuth Client can use the RTCPeerConnection setConfiguration method to periodically refresh the TURN credentials.

enum RTCIceCredentialType {

RTCIceServer Dictionary : Describes the STUN and TURN servers that can be used by the ICE Agent to establish a connection with a peer.

dictionary RTCIceServer {    
   required (DOMString or sequence<DOMString>) urls; 
   DOMString username;      
   (DOMString or RTCOAuthCredential) credential; 
   RTCIceCredentialType credentialType = "password";  
 [{urls: 'stun:stun1.example.net'},
  {urls: ['turns:turn.example.org', 'turn:turn.example.net'],
    username: 'user',
    credential: 'myPassword',
    credentialType: 'password'},
  {urls: 'turns:turn2.example.net',
    username: 'xxx',
    credential: {
      macKey: 'xyz=',
      accessToken: 'xxx+yyy=='
    credentialType: 'oauth'}

RTCIceTransportPolicy Enum

ICE candidate policy [JSEP] to select candidates for the ICE connectivity checks

  • relay – use only media relay candidates such as candidates passing through a TURN server. It prevents the remote endpoint/unknown caller from learning the user’s IP addresses
  • all – ICE Agent can use any type of candidate when this value is specified.

RTCBundlePolicy Enum

  • balanced – Gather ICE candidates for each media type (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
  • max-compat – Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
  • max-bundle – Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport.

If the value of configuration.bundlePolicy is set and its value differs from the connection’s bundle policy, throw an InvalidModificationError.

Interfaces for Connectivity Establishment

describes ICE candidates

interface RTCIceCandidate {
  DOMString candidate;
  DOMString sdpMid;
  unsigned short sdpMLineIndex;
  DOMString foundation;
  RTCIceComponent component;
  unsigned long priority;
  DOMString address;
  RTCIceProtocol protocol;
  unsigned short port;
  RTCIceCandidateType type;
  RTCIceTcpCandidateType tcpType;
  DOMString relatedAddress;
  unsigned short relatedPort;
  DOMString usernameFragment;
  RTCIceCandidateInit toJSON();

RTCIceProtocol can be either tcp or udp

TCP candidate type which can be either of

  • active – An active TCP candidate is one for which the transport will attempt to open an outbound connection but will not receive incoming connection requests.
  • passive – A passive TCP candidate is one for which the transport will receive incoming connection attempts but not attempt a connection.
  • so – An so candidate is one for which the transport will attempt to open a connection simultaneously with its peer.

UDP candidate type

  • host – actual direct IP address of the remote peer
  • srflx – server reflexive ,  generated by a STUN/TURN server
  • prflx – peer reflexive ,IP address comes from a symmetric NAT between the two peers, usually as an additional candidate during trickle ICE
  • relay – generated using TURN

ICE Candidate UDP Host

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:27784895 1 udp 2122260223 51577 typ host generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:27784895 1 udp 2122260223 51382 typ host generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:27784895 1 udp 2122260223 53600 typ host generation 0 ufrag muSq network-id 1 network-cost 10

ICE Candidate TCP Host

Notice TCP host candidates for mid 0 , 1 and 3 for video , audio and data media types

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10

ICE Candidate UDP Srflx

Notice 3 candidates for 3 streams sdpMid 0,1 and 2

sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:2163208203 1 udp 1686052607 27177 typ srflx raddr rport 53600 generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:2163208203 1 udp 1686052607 27176 typ srflx raddr rport 51382 generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2163208203 1 udp 1686052607 27175 typ srflx raddr rport 51577 generation 0 ufrag muSq network-id 1 network-cost 10
Peer reflexive Candidate

ICE Candidate (host)

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2880323124 1 udp 2122260223 61622 typ host generation 0 ufrag jsPO network-id 1 network-cost 10

usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).



interface RTCPeerConnectionIceEvent : Event {
  readonly attribute RTCIceCandidate? candidate;
  readonly attribute DOMString? url;


interface RTCPeerConnectionIceErrorEvent : Event {
  readonly attribute DOMString hostCandidate;
  readonly attribute DOMString url;
  readonly attribute unsigned short errorCode;
  readonly attribute USVString errorText;

RTCIceTransport Interface

Access to information about the ICE transport over which packets are sent and received. Each RTCIceTransport object represents the ICE transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

interface RTCIceTransport : EventTarget {
    readonly attribute RTCIceRole role;
    readonly attribute RTCIceComponent component;
    readonly attribute RTCIceTransportState state;
    readonly attribute RTCIceGathererState gatheringState;
    sequence<RTCIceCandidate> getLocalCandidates();
    sequence<RTCIceCandidate> getRemoteCandidates();
    RTCIceCandidatePair? getSelectedCandidatePair();
    RTCIceParameters? getLocalParameters();
    RTCIceParameters? getRemoteParameters();
    attribute EventHandler onstatechange;
    attribute EventHandler ongatheringstatechange;
    attribute EventHandler onselectedcandidatepairchange;

RTCIceParameters Dictionary

dictionary RTCIceParameters {
  DOMString usernameFragment;
  DOMString password;

RTCIceCandidatePair Dictionary

dictionary RTCIceCandidatePair {
  RTCIceCandidate local;
  RTCIceCandidate remote;

RTCIceGatheringState Enum

  1. “closed”,
  2. “failed”,
  3. “disconnected”,
  4. “new”,
  5. “connecting”,
  6. “connected”

RTCIceGathererState Enum

  1. “new”,
  2. “gathering”,
  3. “complete”

RTCIceTransportState Enum

  1. new – ICE agent is gathering addresses or is waiting to be given remote candidates 
  2. checking –
  3. connected – Found a working candidate pair, but still performing connectivity checks to find a better one.
  4. completed – Found a working candidate pair and done performing connectivity checks.
  5. disconnected
  6. failed
  7. closed

RTCIceRole Enum

unknown : agent who role is not yet defined
controlling : controlling agent
controlled : controlled agent

RTCIceComponent Enum

rtp : ICE Transport is used for RTP (or RTCP multiplexing)
rtcp : ICE Transport is used for RTCP

Peer-to-peer Data API

Connection.createDataChannel('sendDataChannel', dataConstraint);

With SCTP, the protocol used by WebRTC data channels, reliable and ordered data delivery is on by default.

Sending large files

 Split data channel message in chunks 
var CHUNK_LEN = 64000; // 64 Kb
var img = photoContext.getImageData(0, 0, photoContextW, photoContextH),
len = img.data.byteLength,
n = len / CHUNK_LEN | 0;

for (var i = 0; i < n; i++) {
    var start = i * CHUNK_LEN, end = (i + 1) * CHUNK_LEN;
    dataChannel.send(img.data.subarray(start, end));
// last chunk
if (len % CHUNK_LEN) {
   dataChannel.send(img.data.subarray(n * CHUNK_LEN));


The browser maintains a set of statistics for monitored objects, in the form of stats objects. A group of related objects may be referenced by a selector( like MediaStreamTrack that is sent or received by the RTCPeerConnection).

Statistics API extends the RTCPeerConnection interface

partial interface RTCPeerConnection {
  Promise<RTCStatsReport> getStats(optional MediaStreamTrack? selector = null);
  attribute EventHandler onstatsended;


Method getStats() Gathers stats for the given selector and reports the result asynchronously.

  clockRate: 90000,
  id: "RTCCodec_0_Inbound_96",
  mimeType: "video/VP8",
  payloadType: 96,
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"
  clockRate: 90000,
  id: "RTCCodec_0_Inbound_97",
  mimeType: "video/rtx",
  payloadType: 97,
  sdpFmtpLine: "apt=96",
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"
  clockRate: 90000,
  id: "RTCCodec_0_Inbound_98",
  mimeType: "video/VP9",
  payloadType: 98,
  sdpFmtpLine: "profile-id=0",
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"
  clockRate: 90000,
  id: "RTCCodec_0_Inbound_99",
  mimeType: "video/rtx",
  payloadType: 99,
  sdpFmtpLine: "apt=98",
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"

Quality limitation in pc.getstats()

  bytesSent: 2646771,
  codecId: "RTCCodec_0_Outbound_96",
  encoderImplementation: "libvpx",
  firCount: 0,
  frameHeight: 540,
  framesEncoded: 471,
  framesPerSecond: 31,
  framesSent: 471,
  frameWidth: 960,
  headerBytesSent: 67900,
  hugeFramesSent: 0,
  id: "RTCOutboundRTPVideoStream_971979131",
  keyFramesEncoded: 4,
  kind: "video",
  mediaSourceId: "RTCVideoSource_48",
  mediaType: "video",
  nackCount: 0,
  packetsSent: 2657,
  pliCount: 0,
  qpSum: 6598,
  qualityLimitationDurations: {
    bandwidth: 15882,
    cpu: 0,
    none: 36,
    other: 0
  qualityLimitationReason: "bandwidth",
  qualityLimitationResolutionChanges: 0,
  remoteId: "RTCRemoteInboundRtpVideoStream_971979131",
  retransmittedBytesSent: 0,
  retransmittedPacketsSent: 0,
  ssrc: 971979131,
  timestamp: 1644263635999.547,
  totalEncodedBytesTarget: 0,
  totalEncodeTime: 2.866,
  totalPacketSendDelay: 47.68,
  trackId: "RTCMediaStreamTrack_sender_48",
  transportId: "RTCTransport_0_1",
  type: "outbound-rtp"

RTCStatsReport Object

map between strings that identify the inspected objects (id attribute in RTCStats instances), and their corresponding RTCStats-derived dictionaries.

interface RTCStatsReport {
  readonly maplike<DOMString, object>;

RTCStats Dictionary

stats object constructed by inspecting a specific monitored object.

dictionary RTCStats {
  required DOMHighResTimeStamp timestamp;
  required RTCStatsType type;
  required DOMString id;


Constructor (DOMString type, RTCStatsEventInit eventInitDict)]

interface RTCStatsEvent : Event {
  readonly attribute RTCStatsReport report;

dictionary RTCStatsEventInit

dictionary RTCStatsEventInit : EventInit {
  required RTCStatsReport report;


  • RTCSenderVideoTrackAttachmentStats
  • RTCSenderAudioTrackAttachmentStats

Stream stats

  • RTCRTPStreamStats : ssrc, kind, transportId, codecId + attr
  • RTCReceivedRTPStreamStats : packetsReceived, packetsLost, jitter, packetsDiscarded + attr
  • RTCSentRTPStreamStats : packetsSent, bytesSent + attr
  • RTCPeerConnectionStats, with attributes dataChannelsOpened, dataChannelsClosed
  • RTCDataChannelStats : label, protocol, dataChannelIdentifier, state, messagesSent, bytesSent, messagesReceived, bytesReceived + attr
  • RTCMediaStreamStats : streamIdentifer, trackIds
  • RTCVideoSenderStats : framesSent
  • RTCVideoReceiverStats : framesReceived, framesDecoded, framesDropped, partialFramesLost + attr
  • RTCCodecStats : payloadType, codecType, mimeType, clockRate, channels, sdpFmtpLine
  • RTCTransportStats : bytesSent, bytesReceived, rtcpTransportStatsId, selectedCandidatePairId, localCertificateId, remoteCertificateId
  • RTCCertificateStats : fingerprint, fingerprintAlgorithm, base64Certificate, issuerCertificateId + attr

OutboundInbound stats

  • RTCOutboundRTPStreamStats : trackId, senderId, remoteId, framesEncoded, nackCount + attr
  • RTCRemoteOutboundRTPStreamStats : localId, remoteTimestamp + attr
  • RTCInboundRTPStreamStats : trackId, receiverId, remoteId, framesDecoded, nackCount + attr
  • RTCRemoteInboundRTPStreamStats : localId, bytesReceived, roundTripTime + attr

Handler Stats

  • RTCMediaHandlerStats : trackIdentifier, ended + attr
  • RTCAudioHandlerStats : audioLevel + attr
  • RTCVideoHandlerStats : frameWidth, frameHeight, framesPerSecond + attr

RTCICE stats

  • RTCIceCandidatePairStats : transportId, localCandidateId, remoteCandidateId, state, priority, nominated, bytesSent, bytesReceived, totalRoundTripTime, currentRoundTripTime
  • RTCIceCandidateStats : address, port, protocol, candidateType, url

References :

JavaScript Session Establishment Protocol (JSEP) in WebRTC handshake

This article is aimed at explaining the intricacies and detailed offer answer flow in webrtc handshake and JSEP. You can read the following articles on WebRTC as a prereq before reading through this one. WebRTC has API s namely – Peerconnection , getUserMedia , Datachannel and getStats.

JSEP (JavaScript Session Establishment Protocol)

JSEP is used during signalling via w3c’s recommended RTCPeerConnectionAPI interface to set up a multimedia session. The multimedia session description specifies the critical components of setting up a session between local and remote such as transport ports, protocol, profiles. It also handles the interaction with the ICE state machine.

Offer/Answer Excahange Flow

prereq : Setup Client side for the caller
PeerConnectionFactory to generate PeerConnections
PeerConnection for every connection to remote peer
MediaStream audio and video from client device

  1. Side initiating the session creates a offer by CreateOffer() API
aPromise = myPeerConnection.createOffer([options]);

options is type of RTC Offer Options

  • iceRestart
  • offerToReceiveAudio ( legacy)
  • offerToReceiveVideo ( legacy)
  • voiceActivityDetection

2. The application then stores the offer in local config as setLocalDescriptionAPI()

 myPeerConnection.createOffer().then(function(offer) {
    return myPeerConnection.setLocalDescription(offer);

3. Offer is sent to remote side using its choice of signalling ( SIP , WS , HTTP, XMPP .. )

4. Remote party stores it use setRemoteDescription() API

.then(function () {
  return createMyStream();

4. Remote part generates an answer using createAnswer() API

aPromise = RTCPeerConnection.createAnswer([options]);

5. Remote party stores the answer in its local config using setLocalDescription() API

6. Answer is transferred to Initiator side using choice of signalling ( SIP , WS , HTTP, XMPP .. ) again

7. Initiating side stores it use setRemoteDescription() API

Interfaces of webrtc and tracks to stream addition

Perform webRTC handshake

Webrtc call setup and incoming call callflow between remote peer , peerconnection factory , peerconnection and application

Outgoing Call: setting up a call with remote by sending an offer. Wait for the remote’s answer to process it to create the session.

JSEP setup a call

Incoming Call : Receive remote’s offer and process to reply with an answer.

JSEP flow to receive a call

Signalling state Transitions on PeerConnection

As the caller initiates a new RTCPeerConnection() , the RTCSignalingState state is “stable” as remote and local descriptions are empty

As the caller initiates call and calls createOffer() , he now has offer SDP and procced to store offer locally with setLocalDescription(offer) the RTCSignalingState state is “have-local-offer” . After than caller send the offer to callee over signalling channel

Simillarily as the calle recives the offer, it starts with RTCSignalingState stable and then proceeds to store the Remote’s offer using setRemoteDescription(offer), its state is now “have-remote-offer”

The callee generates a provsional answer and for caller and stores it locally , state transitiosn to “have-local-pranswer“. The pranswer SDP is send to caller over signalling channel again .

Caller stores the callee’s pr answer SDP and state updates to “have-remote-pranswer”

img : https://w3c.github.io/webrtc-pc

Once there is no offer/answer exchange in progress the state again changes to ” stable “.

State schanges to “closed” if RTCpeerConnection is closed

Detailed Offer / Answer SDP

Local Offer created by side initiating the session / Caller

The first offfer called initial offer can have dummy date for contact line such as to prevent leaking a local Ip address

c=IN IP4
a=rtcp:9 IN IP4

“o=” line contains <username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>

o=- 4445251981417004127 2 IN IP4 127.0.0.

shows username – and 4445251981417004127 as session id. Same username “-” is specified in “s=” line

“t=” line shows <start time> <stop time>

t=0 0

Full session Block example

type: offer, sdp: v=0
o=- 4445251981417004127 2 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2

Media Section : An m= section is generated for each RtpTransceiver that has been added to the PeerConnection. For the initial offer since no ports are available yet , dummy port 9 can be sadded. However if it is bundle only then port value is set to 0. Later the port value will be set to the port value of default ICE candidate.

DTLS filed “UDP/TLS/RTP/SAVPF” is followed by the list of codecs in order of priority.

“c=” line in msection too must be filled with dummy values if IP as no candidates are available yet .




“a=ice-ufrag” , “a=ice-pwd” , “a=fingerprint” , “a=setup” , “a=tls-id”

Media Stream Identification attribute “a-mid:”

For each media format on the m= line, “a=rtpmap” for “rtx” with the clock rate of codec and “a=fmtp” to reference the payload type of the primary codec.  “a=rtcp-fb” specified RTCP feedback

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1

Audio Block exmaple

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3968544080 cname:da0nYe1oYR8AvVNp
a=ssrc:3968544080 msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=ssrc:3968544080 mslabel:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2
a=ssrc:3968544080 label:7525d75c-ffe7-4038-8b71-653d249e63bb

// remove video section for simplicity

Data Block is created if data channle has been created with m= section for data.

“a=sctp-port” line referencing the SCTP port number set to 5000

 “a=max-message-size”  set to 262144 here

Data Block example

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A

Subsequent Offers

When createOffer is called a second (or later) time, or is called after a local description has already been installed, the processig is different due to gathered ICE candidates . However the <session-version> is not changed .

Additionally m section is updated if RtpTransceiver is added or removed

Each “m=” and c=” line MUST be filled in with the port, relevant RTP profile, and address of the default candidate for the m= section

If the m= section is not bundled into another m= section, update the “a=rtcp” with port and address of RTCP camdidate and add “a=camdidate” with  “a=end-of-candidates” 

Local Answer created by side receiving the session/ Callee

When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer. 

Each offered m= section will have an associated RtpTransceiver

Remote Destination / Callee can reject the m section by setting port in m line to 0 . It can reject msection if neither of the offered media format are supported , RtpTransceiver is stoopped etc.

For the initial offer the dummy port value of 9 is set as no ICE candudate is avaible yet. Simillarly  “c=” line must contain the “dummy” value “IN IP4” too.

The <proto> field MUST be set to exactly match the <proto> field for the corresponding m= line in the offer.

type: answer, sdp: v=0
o=- 5730481682283561642 3 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT

Audio section

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT e817fe0f-1cc0-4901-9fd9-e810289cc85d
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3260997313 cname:FxLUKuXrLQe0r1rn

Video section removed for simplicity

Data stream

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43

Subsequent Answers

 Port value would normally be set to the port of the default ICE candidate for this m= section. For the exmaple above

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

will be changes with relevant port adress such as

type: offer, sdp: v=0
o=- 6407282338169184323 3 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS bSrCUCFybGovIy0FUhPTZAr9ToRmx8I09nEj
m=audio 55375 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 55375 typ host generation 0 network-id 1 network-c

Simillarly m video and data line will also get ports

m=video 53877 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 53877 typ host generation 0 network-id 1 network-cost 10
m=application 57991 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=candidate:2880323124 1 udp 2122260223 57991 typ host generation 0 network-id 1 network-cost 10

If the answer contains any “a=ice-options” attributes where “trickle” is listed as an attribute, update the PeerConnection canTrickle property to be true. 

Modifying Offer/answer SDP

SDP returned from createOffer or createAnswer MUST NOT be changed before passing it to setLocalDescription. After calling setLocalDescription with an offer or answer, the application MAY modify the SDP to reduce its capabilities before sending it to the far side.

Assume we have a MCU at location and want the video stream to relay via a Media Server.

SDP Parsing

SDP is used for session parsing and contians sequence of line with key value pairs. SDP is read, line-by-line, and converted to a data structure that contains the deserialized information.

JSEP SDP bears a lot of simillarity to SIP SDP explained here : SIP and SDP Messages Explained

Session-Level Parsing

  • Line “v=” , “o=”,”b=” and “a=” are processed . The “i=”, “u=”, “e=”, “p=”, “t=”, “r=”, “z=”, and “k=” lines are not used by this specification; they MUST be checked for syntax but their values are not used. Line “c=” is checked for syntax and ICE mismatch detection
  • “a= ” attribute could be : “a=group” , “s=”ice-lite” , “a=ice-pwd”, “a=ice-options” , “a=fingerprint”, “a=setup” , a=tls-id”, “a=identity” , “a=extmap”

Media Section Parsing

Line “m=” for media , proto , port , fmt in RTP

Attributes “a=” can be :

  • “a=rtpmap” or “a=fmtp” : map from an RTP payload type number to a media encoding name that identifies the payload format.
a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]
m=audio 49230 RTP/AVP 96 97 98
a=rtpmap:96 L8/8000
a=rtpmap:97 L16/8000
a=rtpmap:98 L16/11025/2
  • Packetization parameters as “a=ptime” , “a=maxptime” which define the length of each RTP packet.
  • Direction as  “a=sendrecv” , a=recvonly , a=sendonly , a=inactive
  • Muxing as “a=rtcp-mux” , “a=rtcp-mux-only”
  • RTCP attributes “a=rtcp” , “a=rtcp-rsize”
  • Line “c=” is checked.
  • Line “b=” for bandiwtdh , bwtype
  • Attribites for “a=” could be “a=ice-ufrag”, “a=”ice-pwd”, “a=ice-options” , “a=candidate”, “a=remote-candidate” , a=end-of-candidates” and “a=fingerprint”

Interactive Connectivity Establishment (ICE) for NAT traversal

Protocols using offer/answer are difficult to operate through Network Address Translators (NATs) since flow of media packets require IP addresses and ports of media sources and sinks within their messages. Also realtime media emphasises on reduced latency and decreased packet loss .

An extension to the offer/answer model, and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks.
Checks done by STUN and TURN, also allows for address selection for multi-homed and dual-stack hosts

ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.

ICE Gathering

Caller and callee performs checks to finalize the protocol and routing needed to establish a peer connection . Number of candudates are proposed till they mutually agree upon one . Peerconnection then uses that candiadte detaisl to initiate the connection .

While Applying a Local Description at the media engine level if m= section is new, WebRTC media stacks begins gathering candidates for it.

RTCPeerconnection specified canTrickleIceCandidates. ICE trickling is the process of continuing to send candidates after the initial offer or answer has already been sent to the other peer.

ICE TransportRole is responsible for Choosing a candidate pair.

ICE layer sets one peer as controlling and other as controlled agent. The controling agent makes the final decision as to which candidate pair to choose.

Final selected canduadte in SDP

a=group:BUNDLE 0 1 2
a=msid-semantic: WMS 9Cv3eIelHVuhxrGfxSvUsfokNu4eb4R9PYw2

m=audio 59937 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 x.x.x.x
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 x.x.x.x 59937 typ host generation 0 network-id 1 network-cost 10
a=candidate:3844981444 1 tcp 1518280447 x.x.x.x 9 typ host tcptype active generation 0 network-id 1 network-cost 10

An agent identifies all CANDIDATE whic is a transport address. Types:

  • HOST CANDIDATE – directly from a local interface which could be Wifi, Virtual Private Network (VPN) or Mobile IP (MIP)
    if an agent is multihomed ( private and public networks) , it obtains a candidate from each IP address and includes all candidates in its offer.
  • STUN or TURN to obtain additional candidates. Types
    • translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
    • addresses on TURN servers (RELAYED CANDIDATES)

Mapping Server Reflexive address

Steps for mappling Server Reflexive Address

  1. Agent sends the TURN Allocate request from IP address and port X:x,
  2. NAT will create a binding X1′:x1′, mapping this server reflexive candidate to the host candidate X:x ( BASE).
  3. Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate.
  4. Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent.

Allocate Request and response fom TURN – Informing the agent of this relayed candidate

Only STUN based Binding

agent sends a STUN Binding request to its STUN server which will get server reflexive candidate and send back Binding response.

STUN Binding request for connectivity checks on CANDIDATE PAIRS

The candidates are carried in attributes in the SDP offer . The remote peer also follows this process and gather and send lits own sorted list of candidates. Hence CANDIDATE PAIRS from both sides are formed.

PEER REFLEXIVE CANDIDATES – connectivity checks can produce aditional candidates espceialy around symmetric NAT

Since the same address is used for STUN. and media ( RTP/RTCP) Demultiplexing based on packet contents helps to identify which one is which.

Checks : ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first.

  • TRIGGERED CHECKS – accelerates the process of finding a valid candidate
  • ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.

Checks ensure maintaining frozen candidates and pairs with some foundation for media stream. Each candidate pair in the check list has a foundation and a state. States for candidates pairs

1.Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair onthe check list.

2. In-Progress: A check has been sent for this pair, but the transaction is in progress.

3. Succeeded: A check for this pair was already done and produced a successful result.

4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.

5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.

ICE gather state

icegatheringstatechange – gathering

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 58122 typ host generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (srflx)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:4081163164 1 udp 1686052607 37542 typ srflx raddr rport 58122 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:345893049 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2130406062 1 udp 41886207 27190 typ relay raddr rport 37542 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3052096874 1 udp 25108479 28049 typ relay raddr rport 37543 generation 0 ufrag vzpn network-id 1 network-cost 10

icegatheringstatechange – complete

Candidate Checking

iceconnectionstatechange : checking

setRemoteDescription L type: answer, sdp: v=0

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4

m=video 9 UDP/TLS/RTP/SAVPF 98 100 96 97 99 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4

addIceCandidate (host)
sdpMid: , sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 56060 typ host generation 0 ufrag ydvf network-id 1 network-cost 10

iceconnectionstatechange : connected

Candidate Nomination for Media Path

Selecting low-latency media paths can use various techniques such as actual round-trip time (RTT) measurement. Controlling agent gets to nominate which candidate pairs will get used for media amongst the ones that are valid. There are 2 ways : regular nomination and aggressive nomination.

ReadMore :

WebRTC Media Stack

WebRTC service’s

References :

  • [1] WebRTC 1.0: Real-time Communication Between Browsers – W3C Editor’s Draft 31 August 2019 http://w3c.github.io/webrtc-pc/
  • [2] RFC 5245 Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols

WebRTC CPaaS ( Communication Platform as a Service )

A CPasS ( communication platform as a service ) is a cloud-based communication platform like B2B cloud communications platform that provides real-time communication capabilities. This should be easily integrable with any given external environment or application of the customer, without him worrying about building backend infrastructure or interfaces. Traditionally, with IP protected protocols, licensed codecs maintaining a signalling protocol stack, and network interfaces building a communication platform was a costly affair. Cisco, Facetime, and Skype were the only OTT ( over the top) players taking away from the telco’s call revenue. However, with the advent of standardised, open-source protocol and codecs plenty of CPaaS providers have crowded the market making more supply than there is demand. A customer wanting to quickly integrate real-time communications on his platform has many options to choose from. This article provides an insight into how CPaaS solutions are architectured and programmed.

SIP based Communication Platform as a Service

SIP and WebRTC are many a times closely knit together as protocl, and media plane techologies to build a communication platform such as CPaaS , UCC, B2b call agent , call centre applicatioinsso on. This integration expected to continue to evolve and improve in order to meet the growing demands of users for high-quality, low-latency communication.

Sample CPass Architecture build on open source technologies

Over all Archietcture of Real Time Comunication ecosystem with Media management, CDR , processing pielines , real time analytics.

Data Streams for realtime analytics and telemetrics

There are several assessment technologies that can be used for measuring the quality of WebRTC (Web Real-Time Communications) calls, including:

  1. Mean Opinion Score (MOS): A standardized method for measuring the quality of voice and video calls, based on human perception.
  2. Packet loss and jitter: Measures the amount of packet loss and variation in packet arrival times, which can impact the quality of a call.
  3. Round-trip time (RTT): Measures the time it takes for a packet to travel from the sender to the receiver and back, which can affect the delay in a call.
  4. Bitrate: Denotes the amount of data that is transmitted during a call, which can impact the quality of the audio and video.
  5. Codecs chosen can impact the quality and bandwidth requirements of the call.
  6. Network conditions
  7. Quality of Service (QoS): Measures the quality of the network connection and the ability of the network to support real-time communications.
  8. WebRTC specific metrics: such as video resolution, frames per seconds, audio level, and so on.
  9. PESQ (Perceptual Evaluation of Speech Quality)  predict subjective opinion scores of a degraded audio such as warping , varioioable delays
  10. PSR( Peak signal to noise ration)

These technologies can be used in combination to provide a comprehensive assessment of the quality of a WebRTC call and to identify any issues that may be impacting the call quality.

I have written an article before on Steps for building and deploying WebRTC solution , which includes standalone, cloud hosted and TURN based NAT handler systems .

A typical CPaaS solution provides

  • Call server + Media Server that can be interacted with via UA
  • Comm clients like sipphones , webrtc client , SDK ( software development kits ) or libraries for desktop , embedded and/or mobile platforms .
  • APIs that can trigger automated calls and perform preprogrammed routing.
  • Rich documentation and samples to build various apps such as call centre solutions , interactive auto-attendant using IVR , DTMF , conference solutions etc .
  • Some CPaaS providers also add features like transcribing ,transcoding, recording , playback etc to provide edge over other CPaaS providers

Self hosted Datacentre vs Cloud server

Self hosted in DatacentreCloud server
Cost(-) Self-hosted datacenters can be more expensive to set up and maintain, as they require the purchase of hardware and ongoing maintenance costs.
(+) no monthly recurring fees to cloud vendors
(+) pay as you go
Scalability(-) maintenance of racks and servers
(-) requires planning for high availability and geographical deployment for redundancy
(+) no stress on resource management like cooling, rack space , wiring etc
(+) easy to setup
Reliability(-) limited to a single location and can be affected by local issues such as power outages.Cloud providers typically have multiple data centers and will automatically route traffic.
(-) outages in cloud infrastructures datacentre could lead to service disruption
Control and Security(+) more controlled for security or access(-) not in premise, security can be provisonoed by not in control

Cloud-based infrastructure 

Cloud Services as Amazon Web service, Google Cloud, Microsoft Azure, IBM Cloud, Digital Ocean is great resources to host the multiple parts of a CPaaS system such as gateways, media servers, SIP Application servers, other servers for microservices including accounting, profile management, rest services etc. Often virtualized machines ( VMs) mounted on a larger physical remote datacentre are an ideal choice for VoIP and cloud communication providers.

Self hosted / inpremises Servers / private cloud

Marinating datacentre provides flexibility to extend and or develop tightly controlled use cases. It is often a requirement for secure communication platforms pertaining to government or banking communications such as turret phones.

Some approaches are to set up the server with Openstack to manage SDN ( software-defined network). Other approaches also involve VMWare to virtualize servers and then using docker container-managed via Kubernetes to dynamically spawn instances of server as load scaled up or down.

Using existing SDK vs building your own RTC platform from scratch

I have come across so many small size startups trying to build CPaaS solutions from scratch but only realising it after weeks of trying to build an MVP that they are stuck with firewall, NAT, media quality or interoperability issues. Since there are so many solutions already out in the market it is best to instead use them as an underlying layer and build applications services using it such as call centre or CRM services making custom wrappers.

Tech insights and experiences

Companies who have been catering to telco and communication domain make robust solutions based on industry best practices which beats novice solution build in a fortnight anyday.

Keeping up with emerging trends

Market trends like new codecs , rich communication services , multi tenancy, contextual communication , NLP, other ML based enhancements are provided by CPaaS company and would potentially try to abstrct away the implementation details from their SDK users or clients.

Auto Scaling, High Availability

A firm specializing in CPaaS solution has already thought of clustering and autoscaling to meet peak traffic requirements and backup/replication on standby servers to activate incase of failure


Using a CPaaS saves on human resources, infrastructure, and time to market. It saves tremendously on underlying IT infrastructure and many a times provides flexible pricing models.

Call Rate charging and Accounting Services

Call Rates are very critical for billing and charging the users. Any updates from the customer or carriers or individuals need to propagate automatically and quickly to avoid discrepancies and negative margins.

CDR ( Call Detail Record ) processing pipeline

CDRs need to be processed sequentially and incrementally on a record-by-record basis or over sliding time windows.
CDR can also be used for a wide variety of analytics including correlations, aggregations, filtering, and sampling.

Updating rate sheet ( charges per call or per second )

The following setup is ideal to use the new input rate sheet values via web UI console or POST API and propagate it quickly to the main DB via a queuing system such as SQS. Serverless operations such as using AWS lambda can be used via a trigger-based system for any updates. This ensures that any new input rates are updated in realtime and maintain fallback values in separate storage as s3 bucket too

CDR (Call Detail Record ) managemenet, billing, dispute management

In current Voip scenarios a call may be passing thorugh various telco providers , ISP and cloud telephony serviIn current VoIP scenarios, a call may be passing through various telco providers, ISP and cloud telephony service providers where each system maintains its own call records and billing. This in my opinion is duplication and missing a single source of truth. A decentralized, reliable and consistent data store via blockchain coudl potentially maintain the call records making then immutable and non diputable. Some more details on the concept are in the article below.

Kamailio WebRTC SIP Server

The purpose of this article is to demo the process of using Kamailio + RTP Engine to enable SIP-based WebRTC call to a traditional SIP UA like Xlite. Kamailio Will thus provide not only call routing but also NATing, TLS and WebSocket support for webrtc endpoints. For this bridging of SRTP from WebRTC endpoint like JSSIP to RTP for SIP UA like Xlite, we will use the RTP engine.


Kamailio’s modules on WebSocket, TLS, NATHelper help it to support WebSocket based SIP which the default Kamailio configuration doesn’t. Snippets from Kamailio config to support webrtc endpoints is below. A full kamailio config is present here https://github.com/altanai/kamailioexamples/blob/master/webrtc_to_webrtc_ws/websocket_tls_webrtc_kamailio.cfg

loadmodule "tm.so"
loadmodule "sl.so"
loadmodule "rr.so"
loadmodule "pv.so"
loadmodule "maxfwd.so"
loadmodule "usrloc.so"
loadmodule "textops.so"
loadmodule "siputils.so"
loadmodule "xlog.so"
loadmodule "sanity.so"
loadmodule "ctl.so"
loadmodule "kex.so"
loadmodule "corex.so"
loadmodule "tls.so"
loadmodule "xhttp.so"
loadmodule "websocket.so"
loadmodule "nathelper.so" 

Configuration of the important modules

TLS module

To provide the SSL support which webrtc endpoints require Certificate Authority CA and provate certs signed by it

creating file paths

mkdir certs
mkdir certs/private
mkdir certs/newcerts
touch certs/index.txt
echo 01 >certs/serial
echo 01 >certs/crlnumber

list the files

/home/ubuntu/certs# ls
crlnumber  index.txt  newcerts  private  serial

Create CA private key

openssl genrsa -out certs/private/cakey.pem 2048
chmod 600 certs/private/cakey.pem

Create CA self signed certificate

openssl req -out certs/cacert.pem -x509 -new -key certs/private/cakey.pem

Create server / client certificate, a private key (by name privkey.pem)

openssl req -out kamailio1_cert_req.pem -new -nodes
openssl ca -in kamailio1_cert_req.pem -out kamailio1_cert.pem

Output should be like

Certificate is to be certified until Jun 25 11:02:41 2020 GMT (365 days)
Sign the certificate? [y/n]:y
1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated

and files genrated shoudl look like

/home/ubuntu# ls 
 certs  kamailio1_cert.pem kamailio1_cert_req.pem privkey.pem

Copy the newly created certs to their respective paths

mkdir /etc/pki/CA/
cp kamailio1_cert.pem /etc/pki/CA/
cp privkey.pem /etc/pki/CA/

Make list of ca certs by finidng all cacerts accross root firectory and appending them to a catlist pem

find / -name cacert.pem
cat /usr/share/doc/libssl-doc/demos/cms/cacert.pem >> /home/ubuntu/catlist.pem
cat /usr/share/doc/libssl-doc/demos/smime/cacert.pem >> /home/ubuntu/catlist.pem
cat /home/ubuntu/kamailio_source_code/misc/tls-ca/rootCA/cacert.pem >> /home/ubuntu/catlist.pem
cp /home/ubuntu/catlist.pem /etc/pki/CA/

update kamailio.cfg

#!ifdef WITH_TLS
modparam("tls", "tls_method", "SSLv23")
modparam("tls", "certificate", "/etc/pki/CA/kamailio1_cert.pem")
modparam("tls", "private_key", "/etc/pki/CA/privkey.pem")
modparam("tls", "ca_list", "/etc/pki/CA/calist.pem")

Websocket module

Websocket is considered a transport option just as TCP or UDP in kamailio config , hence just as one defines IP addr and ports for TCP, UDP protocol , we need to define the same for WS or WSS

#!substdef "!MY_WS_ADDR!tcp:MY_IP_ADDR:MY_WS_PORT!g"
#!substdef "!MY_WSS_ADDR!tls:MY_IP_ADDR:MY_WSS_PORT!g"
#!ifdef WITH_TLS

check if port in R-URI meant for ws or wss, did not receive websocket or secure websocket

if (($Rp == MY_WS_PORT || $Rp == MY_WSS_PORT) && !(proto == WS || proto == WSS)) {
    xlog("L_WARN", "SIP request received on $Rp\n");
    sl_send_reply("403", "Forbidden");

request_route for websocket , included checking is client is behind NAT using nat_uac_test methods from NAThelper. If it is then for REGISTER methods do fix_nated_register and for other add_contact_alias

if (nat_uac_test(64)) {
	# NAT traversal  WebSocket
	if (is_method("REGISTER")) {
	} else {
		if (!add_contact_alias()) {
		xlog("L_ERR", "Error aliasing contact <$ct>\n");
		sl_send_reply("400", "Bad Request");


RTP relay and NAT helps with RTP packets

For detailed steps goto https://telecom.altanai.com/2018/04/03/rtp-engine-on-kamailio-sip-server/


apt-get remove rtpproxy
sudo apt install debhelper iptables-dev libcurl4-openssl-dev libglib2.0-dev libxmlrpc-core-c3-dev libhiredis-dev markdown build-essential:native


git clone https://github.com/sipwise/rtpengine.git
cd rtpengine


rtpengine --interface= --listen-ng=25061 --listen-cli=25062 --foreground --log-stderr --listen-udp=2222 --listen-tcp=25060

Integrate with kamailio using

loadmodule "rtpengine.so"
modparam("rtpengine", "rtpengine_sock", "udp:localhost:7722")
modparam("nathelper", "received_avp", "$avp(s:rcv)")


Like most other WebRTC libraries , JSSIP is event driven and provides provide core WEBRTC API like getUserMedia and RTP PeerConnection providing STUN,ICE,DTLS, SRTP features. It also integrated with rtcninja to provide cross browser accessibility. The differentiators with JSSIP lies in the fact that it supports SIP stack over websockets

JSSIP WebRTC client for kamailio

For tghe JSSIP default cllient UI and library one can use CDN based https://cdnjs.cloudflare.com/ajax/libs/jssip/3.1.2/jssip.min.js or can take a pull from JSSIP repo and build urself using gulp https://github.com/versatica/JsSIP.

Instantiate JSSIP websocket interface with kamailio IP

var socket = new JsSIP.WebSocketInterface('wss://<kamailio_ip>:443');

Add configuration for registeration . Note if not using kamailio as proxy to SBC, it is recommended to add regiseteration features to provide user reachability for incoming calls and NAT pings

 var configuration = {
   sockets  : socket,
   uri      : 'sip:username@example.com',
   password : 'password'

create UA and start

var ua = new JsSIP.UA(configuration);

SIP over WEBSOCKET messages and kamailio processing

REGISTER sip JSSIP UA with user altanai, domina voiptelcom.com

REGISTER sip:voiptelco.com SIP/2.0
Via: SIP/2.0/WSS 830p2l39g8bg.invalid;branch=z9hG4bK242397
Max-Forwards: 69
From: ;tag=3jaad0q8l8
Call-ID: fvn2cd1b9gqh6kd7nqdpj5
Contact: ;+sip.ice;reg-id=1;+sip.instance="";expires=600
Expires: 600
Supported: path,gruu,outbound
User-Agent: JsSIP 3.1.2
Content-Length: 0

Processed by kamailio using REGISTRAR route block

route[REGISTRAR] {
	if (is_method("REGISTER")) {
		if (!save("location")) {
SIP/2.0 200 OK
Via: SIP/2.0/WSS 830p2l39g8bg.invalid;branch=z9hG4bK242397;rport=19035;received=x.x.x.x
To: ;tag=4dad943d40a0a309c33d64467664aa30.f6d3
From: ;tag=3jaad0q8l8
Call-ID: fvn2cd1b9gqh6kd7nqdpj5
Contact: ;expires=600;received="sip:x.x.x.x:19035;transport=ws";pub-gruu="sip:altanai@voiptelco.com;gr=urn:uuid:80fa65e7-1cd7-4e40-bbee-c07f7a1ae9a5";temp-gruu="sip:uloc-5d260578-4b58-c-eeac6bd6@voiptelco.com;gr";+sip.instance="";reg-id=1
Server: kamailio (5.2.3 (x86_64/linux))
Content-Length: 0

INVITE from user1 altanai to john, notice that “To” header doesnt have tag. This will be handy for recognizing whether it is first message of dialog offer and in-dialog message such as ACK , RE-INVITE , BYE etc

INVITE sip:john@voiptelco.com SIP/2.0
Via: SIP/2.0/WSS ipoct61ao12v.invalid;branch=z9hG4bK4220209
Max-Forwards: 69
To: sip:john@voiptelco.com
From: sip:altanai@voiptelco.com;tag=2q0lecmbsn
Call-ID: s8bnv5869fp68d1ju8c1
CSeq: 1799 INVITE
Content-Type: application/sdp
Session-Expires: 90
Supported: timer,gruu,ice,replaces,outbound
User-Agent: JsSIP 3.1.2
Content-Length: 1823

with SDP containing codecs and ICE details. Supporting audio over UDP/ TLS/RTL /SAVPF . Codecs beings

  • 111 OPUS
  • 103 ISAC/16000
  • 104 ISAC/32000
  • 9 G722
  • 0 PCMU / G.711u narrowband
  • 8 PCMA / G.711
  • 106 , 105 , 13 – CN / comfort noise
  • 110 , 112 , 113 , 126 – telephone-event / DTMF
o=- 4779000713447952953 2 IN IP4
t=0 0
a=group:BUNDLE 0
a=msid-semantic: WMS SFIXFrpsOUskJ5JQhp1mIARlDk6S3hVFTOBb
m=audio 55839 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:3802297132 1 udp 2122260223 55839 typ host generation 0 network-id 1 network-cost 10
a=candidate:2887880668 1 tcp 1518280447 9 typ host tcptype active generation 0 network-id 1 network-cost 10
a=fingerprint:sha-256 AB:50:70:E3:57:E3:0C:7B:61:3B:03:5B:0F:54:14:14:9C:49:50:16:07:DC:E7:09:3E:4D:B5:A0:2B:EC:84:A1
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:SFIXFrpsOUskJ5JQhp1mIARlDk6S3hVFTOBb e803836d-249a-4b81-b73f-17e0f08dde5a
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:2800821831 cname:kLWViBLDLVfCXY8x
a=ssrc:2800821831 msid:SFIXFrpsOUskJ5JQhp1mIARlDk6S3hVFTOBb e803836d-249a-4b81-b73f-17e0f08dde5a
a=ssrc:2800821831 mslabel:SFIXFrpsOUskJ5JQhp1mIARlDk6S3hVFTOBb
a=ssrc:2800821831 label:e803836d-249a-4b81-b73f-17e0f08dde5a

100 trying from callee, note the to and from headers remain same for request or responses. This is send automatically by kamailio for INVITE.

SIP/2.0 100 trying -- your call is important to us
Via: SIP/2.0/WSS ipoct61ao12v.invalid;branch=z9hG4bK4220209;rport=17502;received=x.x.x.x
To: sip:john@voiptelco.com
From: sip:altanai@voiptelco.com;tag=2q0lecmbsn
Call-ID: s8bnv5869fp68d1ju8c1
CSeq: 1799 INVITE
Server: kamailio (5.2.3 (x86_64/linux))
Content-Length: 0

180 ringing from Callee, note the addition of contact header

SIP/2.0 180 Ringing
Via: SIP/2.0/WSS ipoct61ao12v.invalid;rport=17502;received=x.x.x.x;branch=z9hG4bK4220209
To: sip:john@voiptelco.com;tag=pvm73e3t89
From: sip:altanai@voiptelco.com;tag=2q0lecmbsn
Call-ID: s8bnv5869fp68d1ju8c1
CSeq: 1799 INVITE
Contact: sip:john@voiptelco.com;alias=x.x.x.x~17510~6;gr=urn:uuid:2e560b36-3ea8-41fd-80e3-ede66babb8a7
Supported: timer,gruu,ice,replaces,outbound
Content-Length: 0

200 OK with SDP

SIP/2.0 200 OK
Via: SIP/2.0/WSS ipoct61ao12v.invalid;rport=17502;received=x.x.x.x;branch=z9hG4bK4220209
To: sip:altanai@voiptelco.com;tag=pvm73e3t89
From: sip:john@voiptelco.com ;tag=2q0lecmbsn
Call-ID: s8bnv5869fp68d1ju8c1
CSeq: 1799 INVITE
Contact: sip:john@voiptelco.com;alias=x.x.x.x~17510~6;gr=urn:uuid:2e560b36-3ea8-41fd-80e3-ede66babb8a7
Session-Expires: 90;refresher=uas
Supported: timer,gruu,ice,replaces,outbound
Content-Type: application/sdp
Content-Length: 1477
o=- 4562215268128860297 2 IN IP4
t=0 0
a=group:BUNDLE 0
a=msid-semantic: WMS oxU77z9z9RfNL4CayvM1cMJKI0r7u6ZdqLBd
m=audio 55380 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:3802297132 1 udp 2122260223 55380 typ host generation 0 network-id 1 network-cost 10
a=fingerprint:sha-256 8A:F9:BE:8D:8A:80:FF:8C:89:3D:3A:D2:A1:36:B2:EC:11:53:81:7E:F4:53:E7:40:1E:B9:1E:A2:0F:D4:EA:2E
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:oxU77z9z9RfNL4CayvM1cMJKI0r7u6ZdqLBd cb6da6b5-d5b8-460e-88bc-1458ebc718e6
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3489110087 cname:oZ2LjwJD385qPHHH

Kamailio handling replies using reply_route

onreply_route {
     if (nat_uac_test(64)) {

Sending ACK.

ACK sip:john@voiptelco.com;alias=x.x.x.x~17510~6;gr=urn:uuid:2e560b36-3ea8-41fd-80e3-ede66babb8a7 SIP/2.0

Via: SIP/2.0/WSS ipoct61ao12v.invalid;branch=z9hG4bK1876245
Max-Forwards: 69
To: sip:altanai@voiptelco.com ;tag=pvm73e3t89
From: sip:john@voiptelco.com;tag=2q0lecmbsn
Call-ID: s8bnv5869fp68d1ju8c1
CSeq: 1799 ACK
Supported: outbound
User-Agent: JsSIP 3.1.2
Content-Length: 0

since ACK is a Within dialog message and sequential request withing a dialog should take the path determined by record-routing, we first check if it has to tag. Having a to tag validates that it is a in-dialog request .

After this validate if is loose_route() and has no destination URI $du , then try to add rui alias using handle_ruri_alias( ), if that fails, reject the request.

If it is not loose_route() and method is ACK then check if the ACK matches a transaction t_check_trans() ie is stateful. If it is then relay otherwise reject.

route[WITHINDLG] {
	if (has_totag()) {
		if (loose_route()) {
			if ($du == "") {
				if (!handle_ruri_alias()) {
				xlog("L_ERR", "Bad alias <$ru>\n");
				sl_send_reply("400", "Bad Request");
		} else {
			if ( is_method("ACK") ) {
				if ( t_check_trans() ) {
				} else {
			sl_send_reply("404", "Not Found");

Also required to convert ICE packet fromWebRTC to non ICE for Xlite.

Resources :

Session Border Controller (SBC) for WebRTC

  • B2BUA
  • Features
    • Security
      • Topology hiding
    • Connectivity
      • Least Cost Routing based on MoS
      • Protocol translations
      • Automatic Rerouting
    • QoS
    • Regulatory
    • Media services
      • NAT
    • Statistics and billing information
  • Gateways vs SBC
  • Building a SBC

Unified communication services build around WebRTC should be vendor agnostic and multi-tenant and be supported by other Communication Service Providers (CSPs), SIP trunks, PBXs, Telecom Equipment Manufacturers (TEMs), and Communication Platform as a Service (CPaaS). This can happen if all endpoints adhere to SIP standards in most updated RFC. However since not all are on the boat , Session border controllers are a great way to mitigate the differences and provide seamless connectivity to signalling and media , which could be between WebRTC, SIP or PSTN, from TDM to IP .

Session Border Controllers ( SBC )  assist in controlling the signalling and usually also the media streams involved in calls and sessions. They are often part of a VOIP network on the border where there are 2 peer networks of service providers such as backbone network and access network of corporate communication system which is behind firewall.

A more complex example is that of a large corporation where different departments have security needs for each location and perhaps for each kind of data. In this case, filtering routers or other network elements are used to control the flow of data streams. It is the job of a session border controller to assist policy administrators in managing the flow of session data across these borders.

– wikipedia

SBC act like a SIP-aware firewall with proxy/B2BUA.

What is B2BUA?

A Back to back user agent ( B2BUA ) is a proxy-like server that splits a SIP transaction in two pieces:

  • on the side facing User Agent Client (UAC), it acts as server;
  • on the side facing User Agent Server (UAS) it acts as a client.

SBC mostly have public url address  for teleworkers and a internal IP for enterprise/ inner LAN . This enables users connected to enterprise LAN ( who do not have public address ) to make a call to user outside of their network. During this process SBC takes care of following while relaying packets .

  1. Security
  2. Connectivity
  3. Qos
  4. Regulatory
  5. Media Services
  6. Statistics and billing information

Explaining the functions of SBC in detail

1. Security

SBCs provide security features such as encryption, authentication, and firewall capabilities to protect the network from unauthorized access and attacks. SBCs are often used by corporations along with firewalls and intrusion prevention systems (IPS) to enable VoIP calls to and from a protected enterprise network. VoIP service providers use SBCs to allow the use of VoIP protocols from private networks with Internet connections using NAT, and also to implement strong security measures that are necessary to maintain a high quality of service. The security features includes :

  • Prevent malicious attacks on network such as DOS, DDos.
  • Intrusion detection
  • cryptographic authentication
  • Identity/URL based access control
  • Blacklisting bad endpoints
  • Malformed packet protection
  • Encryption of signaling (via TLS and IPSec) and media (SRTP)
  • Stateful signalling and Validation
  • Toll Fraud – detect who is intending to use the telecom services without paying up

Topology hiding

SBC hides and anonymize secure information like IP ports before forwarding message to outside world . This helps protect the internal node of Operators such as PSTN gateways or SIP proxies from revealing outside.

2. Connectivity

As SBC offers IP-to-IP network boundary, it recives SIP request from users like REGISTER , INVITE  and routes them towards destination, making their IP. During this process it performs various operations like

  • NAT traversal
  • IPv4 to IPv6 inter-working
  • VPN connectivity
  • SIP normalization via SIP message and header manipulation
  • Multi vendor protocol normalization

Further Routing features includes  :

Least Cost Routing based on MoS ( Mean Opinion Score ) : Choosing a path based on MoS is better than chooisng any random path . 

Protocol translations SBCs can bridge WebRTC calls with other communication protocols such as SIP, H.323, and PSTN to enable communication between different systems and networks.

In essence SBC achieve interoperability, overcoming some of the problems that firewalls and network address translators (NATs) present for VoIP calls.

Automatic Rerouting

Connectivity loss from UA for whole branch is detected by timeouts . But they can also be detected by audio trough SIP OPTIONS by SBC .  In such connectivity loss , SBC decides rerouting or sending back 504 to caller .

SBC 2 (1)

4. QoS

To introduce performance optimization and business rules in call management QoS is very important. This includes the following:

  • Traffic policing
  • Resource allocation
  • Rate limiting
  • Call Admission Control (CAC)
  • ToS/DSCP bit setting
  • Recording and Audit of messages , voice calls , files

System and event logging

SBCs can log call information and statistics, and provide real-time monitoring capabilities to troubleshoot and diagnose issues with WebRTC calls.

5. Regulatory

Govt policies ( such as ambulance , police ) and/ or enterprise policies may require some calls to be holding priority over others . This can also be configured under SBC as emergency calls and prioritization.

Some instances may require communication provider to comply with lawful bodies and provide session information or content , this is also called as Lawful interception (LI) . This enables security officials to collect specific information rather than examining all the traffic that passes through a particular router. This is also part of SBC.

6. Media services

Many of the new generation of SBCs also provide built-in digital signal processors (DSPs) to enable them to offer border-based media control and services such as- DTMF relay , Media transcoding , Tones and announcements etc.

WebRTC enabled SBC’s also provide conversion between DTLS-SRTP, to and from RTCP/RTP. Also transcoding for Opus into G7xx codecs and ability to relay VP8/VP9 and H.264 codecs.

Network Address Translation (NAT)

SBCs can handle Network Address Translation (NAT) to allow WebRTC clients behind a NAT to connect to other clients outside of the NAT.

7. Statistics and billing information

SBC have an interface with and OSS/BSS systems for billing process , as almost all traffic that pass through the edge of the network passes via SBC. For this reason it is also used to gather Statistics and usage-based information like bandwidth, memory and CPU.  PCAP traces of both signaling and media information of specific sessions .

New feature rich SBCs also have built-in digital signal processors (DSPs). Thus able to provide more control over session’s media/voice. They also add services like Relay and Interworking, Media Transcoding, Tones and Announcements, DTMF etc.

SBCs act as a security gateway and traffic manager for WebRTC sessions, ensuring that the communication is secure, of good quality, and can traverse different networks and protocols.

Session Border Controller (SBC)
Session Border Controller for WebRTC , SIP , PSTN , IP PBX and Skype for business .

Diagram Component Description

Gateways vs SBC

Gateways provide compression or decompression, control signaling, call routing, and packetizing.

PSTN Gateway : Converts analog to VOIP and vice versa . Only audio no support for rich multimedia .

VOIP Gateway : A VoIP Gateway acts like a translator converting digital telecom lines to VoIP . VOIP gateway often also include voice and fax. They also have interfaces to Soft switches and network management systems.

WebRTC Gateway : They help in providing NAT with ICE-lite and STUN connectivity for peers behind policies and Firewall .

SIP trunking : Enterprises save on significant operation cost by switching to IP /SIP trunking in place of TDM (Time Division Multiplexing). Read more on SIP trunk and VPN  here. 

SIP Server : A Telecom application server ( SIP Server ) is useful for building VAS ( Value Added Services ) and other fine grained policies on real time services . Read more on SIP Servers here . 

VOIP/SIP service Provider :   There are many Worldwide SIP Service providers such as Verizon in USA , BT in europe, Swisscom in Switzerland etc .

Building a SBC

The latest trends in Telecommunications industry demand an open standardized SBC to cater to growing and large array of SIP Trunking, Unified Multimedia Communications UC&C, VoLTE, VoWi-Fi, RCS and OTT services worldwide . Building an SBC requires that it meet the following prime requirements :

  • software centric
  • Cloud Deploybale
  • Rich multimedia (audio , video , files etc) processing
  • open interfaces
  • The end product should be flexible to be deployed as COTS ( Commercial Off the shelf) product or as a virtual network function in the NFV cloud.
  • Multi Configuration , should be supported such as Hosted or Cloud deployed .
  • Overcome inconsistencies in SIP from different Vendors
  • Security and Lawful Interception
  • Carrier Grade Scaling

Flow Diagram 


Thus we see how SBC became important part of comm systems developed over SIP and MGCP. SBC offer B2BUA ( Back to Back user agent) behavior to control both signalling and media traffic.

Setting up ubuntu ec2 t2 micro for webrtc and socketio

Setting up a ec2 instance on AWS for web real time communication platform over nodejs and socket.io using WebRTC.

Primarily a Web Call  , Chat and conference platform uses WebRTC for the media stream and socketio for the signalling . Additionally used technologies are nosql for session information storage , REST Apis for getting sessions details to third parties.

Below is a comprehensive setup if ec2 t2.micro free tier instance, installation with a webrtc project module and samples of customisation and usage .

Technologies used are listed below :


  1. ec2 instance t2.micro covered under free tier
  2. domain name
  3. SSL certificate

Core module for Web Calling feature

  1. WebRTC
  2. Node.js
  3. socket.io

UI components

  1. javascript
  2. css
  3. html5
  4. bootstrap
  5. jquerry

Supporting setup for session management

  1. Code version-ing  and maintenance
  2. git
  3. npm

Sample Project https://github.com/altanai/webrtcdevelopment

Amazon’s free tier ec2

Amazon EC2 : These are elastic compute general purpose storage servers that mean that they can resize the compute capacity in the cloud based on load . 750 hours per month of Linux, RHEL, or SLES t2.micro instance usage. Expires 12 months after sign-up.

Some other products are also covered under free tier which may come in handy for setting up the complete complatorm. Here is a quick summary

Amazon S3 : it is a storage server. Can be used to store media file like image s, music , videos , recorded video etc .

Amazon RDS : It a relational database server . If one is using mysql or postgress for storing session information or user profile data . It is good option .

Amazon SES : email service. Can be used to send invites and notifications to users over mail for scheduled sessions or missed calls .

Amazon CloudFront : It is a CDN ( content delivery network ) . If one wants their libraries to be widly available without any overheads . CDN is a good choice .

Alternatively any server from Google cloud , azure free tier or digital ocean or even heroku can be used for WebRTC code deployment . Note that webrtc capture now requires htps in domain name.

Server Setup

Set up environment by installing nvm  , npm  and git ( source version control)

1. NVM ( node version manager )


curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.31.1/install.sh | bash

or Wget:

wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.37.2/install.sh | bash

To check installation

command -v nvm

2. NPM( node package manager)

sudo apt-get install npm
Screenshot from 2016-05-16 12-41-42

2. Git

sudo apt-get install git
Screenshot from 2016-05-17 11-25-01

 SSL certificates

Since 2015 it has become mandatory to have only https origin request WebRTC’s getUserMedia API ie Voice, video, geolocation , screen sharing require https origins.
Note that this does not apply to case where its required to only serve peer’s media Stream or using Datachannels . Voice, video, geolocation , screen sharing now require https origins

For A POC purpose here is th way of generating a self signed certificate
Transport Layer Security and/or Secure Socket Layer( TLS/SSL) is a public/private key infrastructure.Following are the steps

1.create a private key

 openssl genrsa -out webrtc-key.pem 2048

2.Create a “Certificate Signing Request” (CSR) file

 openssl req -new -sha256 -key webrtc-key.pem -out webrtc-csr.pem

3.Now create a self-signed certificate with the CSR,

 openssl x509 -req -in webrtc-csr.pem -signkey webrtc-key.pem -out webrtc-cert.pem

However in production or actual implementation it is highly recommended to use a signed certificate by CA as For examples include

Web Server

create https certificate using self generate or purchased SSL certificates using fs , node-static and https modules . To know how to create self generated SSL certificates follow section above on SSL certificates.

var fs = require(‘fs’);
var _static = require(‘node-static’);
var https = require(‘https’);

var file = new _static.Server(&amp;amp;amp;amp;amp;amp;amp;quot;./&amp;amp;amp;amp;amp;amp;amp;quot;, {
cache: 3600,
gzip: true,
indexFile: &amp;amp;amp;amp;amp;amp;amp;quot;index.html&amp;amp;amp;amp;amp;amp;amp;quot;

var options = {
key: fs.readFileSync(‘ssl_certs/webrtc-key.pem’),
cert: fs.readFileSync(‘ssl_certs/webrtc-cert.pem’),
ca: fs.readFileSync(‘ssl_certs/webrtc-csr.pem’),
requestCert: true,
rejectUnauthorized: false

var app = https.createServer(options, function(request, response){
request.addListener(‘end’, function () {
file.serve(request, response);


Web servers work with the HTTP (and HTTPS) protocol which is TCP based. As a genral rule TCP establishes connection whereas UDP send data packets

Scoketio signalling server as npm

Socket.io determines which of the following real-time communication method is suited to the particular client and its network bandwidth .

  • WebSocket
  • Adobe Flash Socket
  • AJAX long polling
  • AJAX multipart streaming
  • Forever Iframe
  • JSONP Polling

The socket.io server needs a HTTP Server for initial handshake. The general steps for socketio signalling server

1.require socket.io and keep the reference

var io = require(‘socket.io’)

2.Create your http / https server outline in section on webserver

3.bind your http and https servers (.listen)

io.listen(app, {
    log: false,
    origins: ‘*:*’

4. Optionally set transport

io.set(‘transports’, [

5.setup io events

io.sockets.on(‘connection’, function (socket) {
//Do domething

Note that Socket.io or websockets require an http server for the initial handshake.

Install ssocketio npm module

npm install socket.io

Complete code for signalling server

const io = require("socket.io")
    .listen(app, {
        log: false,
        origins: "*:*"
io.set("transports", ["websocket"])

var channels = {};

io.sockets.on("connection", function (socket) {

    var initiatorChannel = "";

    if (!io.isConnected) {
        io.isConnected = true;

    socket.on("namespace", function (data) {
        onNewNamespace(data.channel, data.sender);

    socket.on("new-channel", function (data) {
        if (!channels[data.channel]) {
            initiatorChannel = data.channel;
        console.log("new channel", data.channel, "by", data.sender)
        channels[data.channel] = {
            channel: data.channel,
            users: [data.sender]


    socket.on("join-channel", function (data) {
        console.log("Join channel", data.channel, "by", data.sender)

    socket.on("presence", function (channel) {
        var isChannelPresent = !!channels[channel.channel];
        console.log("presence for channel ", isChannelPresent)
        socket.emit("presence", isChannelPresent)

    socket.on("disconnect", function (channel) {
        // handle disconnected event

    socket.on("admin_enquire", function (data) {
        switch (data.ask) {
            case "channels":
                socket.emit("response_to_admin_enquire", channels)
            case "channel_clients":
                socket.emit("response_to_admin_enquire", io.of("/" + data.channel).clients());
            default :
                socket.emit("response_to_admin_enquire", channels)

function onNewNamespace(channel, sender) {
    console.log("onNewNamespace", channel);
    io.of("/" + channel).on("connection", function(socket) {
        var username;
        if (io.isConnected) {
            io.isConnected = false;
            socket.emit("connect", true)
        socket.on("message", function (data) {
            if (data.sender == sender) {
                if(!username) username = data.data.sender;
                socket.broadcast.emit("message", data.data)
        socket.on("disconnect", function() {
            if(username) {
                socket.broadcast.emit("user-left", username)
                username = null;

WebRTC main HTML5  project

This is the front  end section of the whole exercise . It contains JavaScript , css and html5 to make a webrtc call

<body id="pagebody">
<div id="elementToShare" className="container-fluid">
    <!-- ................................ top panel ....................... -->
    <div className="row topPanelClass">
        <div id="topIconHolder">
            <ul id="topIconHolder_ul">
                <li hidden><span id="username" className="userName" hidden>a</span></li>
                <li hidden><span id="numbersofusers" className="numbers-of-users" hidden></span></li>
                <li><span id="HelpButton" className="btn btn-info glyphicon glyphicon-question-sign topPanelButton"
                          data-toggle="modal" data-target="#helpModal"> Help </span></li>
    <!-- .............alerts................. -->
    <div className="row" id="alertBox" hidden="true"></div>
    <!-- .......................... Row ................................ -->
    <div className="row thirdPanelClass">
        <div className="col-xs-12 videoBox merge" id="videoHold">
            <div className="row users-container merge" id="usersContainer">
                <div className="CardClass" id="card">

                    <!-- when no remote -->
                    <div id="local" className="row" hidden="">
                        <video name="localVideo" autoPlay="autoplay" muted="true"/>
                    <!-- when remote is connected -->
                    <div id="remote" className="row" style="display:inline" hidden>
                        <div className="col-sm-6 merge" className="leftVideoClass" id="leftVideo">
                            <video name="video1" hidden autoPlay="autoplay" muted="true"></video>
                        <div className="col-sm-6 merge" className="rightVideoClass" id="rightVideo">
                            <video name="video2" hidden autoPlay="autoplay"></video>
    <!--modal help -->
    <div className="modal fade" id="helpModal" role="dialog">
        <div className="modal-dialog modal-lg">
            <div className="modal-content">
                <div className="modal-header">
                    <button type="button" className="close" data-dismiss="modal">&times;</button>
                    <h4 className="modal-title">Help</h4>
                <div className="modal-body">
                    WebRTC Runs in only https due to getusermedia security contraints
                <div className="modal-footer">
                    <button type="button" className="btn btn-default" data-dismiss="modal">Close</button>

the document start script that invokes the JS script

$('document').ready(function () {

    sessionid = init(true);

    var local = {
        localVideo: "localVideo",
        videoClass: "",
        userDisplay: false,
        userMetaDisplay: false

    var remote = {
        remotearr: ["video1", "video2"],
        videoClass: "",
        userDisplay: false,
        userMetaDisplay: false

    webrtcdomobj = new WebRTCdom(
        local, remote

    var session = {
        sessionid: sessionid,
        socketAddr: "https://localhost:8084/"

    var webrtcdevobj = new WebRTCdev(session, null, null, null);


Screenshot from 2016-05-17 12-12-37.png

Common known issues:

1.Opening page https://<web server ip>:< web server port>/index.html says insecure

This is beacuse the self signed certificates produced by open source openSSL is not recognized by a trusted third party Certificate Agency.
A CA ( Certificate Authority ) issues digital certificate to certify the ownership of a public key for a domain.

To solve the access issue goto https://<web server ip>:< web server port> and given access permission such as outlined in snapshot below


2.Already have given permission to Web Server , page loads but yet no activity .

if you open developer console ( ctrl+shift+I on google chrome ) you will notice that there migh be access related errros in red .
If you are using different server for web server and signalling server or even if same server but different ports you need to explicity go to the signalling server url and port and give access permission for the same reason as mentione above.

3.no webcam capture on opening the page

This could happen due to many reasons

  •  page is not loaded on https
  • browser is not webrtc compatible
  • Media permission to webcam are blocked
  • the machine does have any media capture devices attached
  •  Driver issues in the client machine while accessing webcams and mics .

4.socketio + code: 0, message: “Transport unknown”

Due to the version  v1.0.x of socket.io while performing handshake . To auto correct this , downgrade to v0.9.x


Bot to clean roads and outdoors for a better and cleaner India. It lifts up small objects like plastic cups,wrappers,leaves etc.

ramudroid image.png

The droid also provides real-time camera stream and detects obstruction to re-route itself. It can communicates over GSM ,wifi and BLE . It can also be remote navigated via browsers or android.

Working :

stages of ramudroid

  1. Litter comes between rotating brushes
  2. Litter is picked by brushes and pushed upwards  
  3. Brushes push it towards the tray

Is is inspired by Swach Bharat Abhiyaan in India , its an effort to contribute to society and welfare and well-being through technology . Following are some diagrams for the current and the previous versions , along with major delta points .

RamuDroid v1.0

Remote Streaming and movement via motors switched manually. Communication over Ethernet.


Dashboard /console Screen

RamuDroid v1  console


Cleaning garbage on public roads and outdoors through robot . Remote navigation and control of control through web page and camera live streaming .

Ramudroid compoenet diagram v5

v3 web console

Screenshot from 2015-12-03 08:55:27.png


Clean roads , pick up litter ( wrappers, leaves , cups , plastics bits etc ) . communicated over BLE ,Wifi and 3G n/w . Auto buzzer when meet with an obstruction in way . Flash Lights . Enhanced Design .

Ramudroid 6.5 componet diagram


Web Dashboard:

Screenshot from 2016-03-19 04-28-53.png

Pin Diagram associated with activities

Pin Pin 0 Pin 1 Pin 2 Pin 3 Pin 4 Pin 5 Pin 6 Pin 7
Front 0 1 0 1 1 1 1 1
Back 1 0 1 0 1 1 1 1
Left 1 0 1 0 1 1 1 1
Right 1 0 1 0 1 1 1 1
Brushes ON 1 0 1 0 1 1 1 1
Brushes OFF 1 0 1 0 1 1 1 1
Lift ON 1 0 1 0 1 1 1 1
Life OFF 1 0 1 0 1 1 1 1


Github : https://github.com/altanai/m2mcommunication

Slideshare :

Twitter :https://twitter.com/search?q=%23ramudroid


AR/VR on WebRTC WebGL , Three.js and WebRTC

For the last couple of weeks , I have been working on the concept of rendering 3D graphics on WebRTC media stream using different JavaScript libraries as part of a Virtual Reality project .

What is Augmented Reality ?

Augmented reality (AR) is viewing a real-world environment with elements that are supplemented by computer-generated sensory inputs such as sound, video, graphics , location etc.

How is AR diff. from VR(Virtual Reality) ?

Virtual RealityAugmented Reality
replaces the real world with simulated one , user is isolated from real life , Examples – Oculus Rift & Kinectblending of virtual reality and real life , user interacts with real world through digital overlays , Examples – Google glass & Holo Lens

Methods for rendering augmented Reality

  • Computer Vision
  • Object Recognition
  • Eye Tracking
  • Face Detection and substitution
  • Emotion and gesture picker
  • Edge Detection

Web based Augmented Reality platform building has for a Web base Components for end-to-end AR solution such as WebRTC getusermedia , Web Speech API, css, svg, HTML5 canvas, sensor API. Hardware components that can include Graphics driver, media capture devices such as microphone and camera, sensors. 3D Components like Geometry and Math Utilities, 3D Model Loaders and models, Lights, Materials,Shaders, Particles, Animation.

WebRTC (Web based Real Time communications)

Browser’s media stream and data. Standardization , on a API level at the W3C and at the protocol level at the IETF. WebRTc enables browser to browser applications for voice calling, video chat and P2P file sharing without plugins.Enables web browsers with Real-Time Communications (RTC) capabilities.

Code snippet for WebRTC API

1.To begin with WebRTC we first need to validate that the browser has permission to access the webcam. Find out if the user’s browser can use the getUserMedia API.

function hasGetUserMedia() {
	return !!(navigator.webkitGetUserMedia);
  1. Get the stream from the user’s webcam.
var video = $('#webcam')[0];
if (navigator.webkitGetUserMedia) {
			{audio:true, video:true},
			function(stream) { video.src = window.webkitURL.createObjectURL(stream);  },
			function(e) {	alert('Webcam error!', e); }

Screenshot AppRTC


Augmented Reality in WebRTC Browser - Google Slides

End to End RTC Pipeline for AR


  • Web Graphics Library
  • JavaScript API for rendering interactive 2D and 3D computer graphics in browser
  • no plugins
  • uses GPU ( Graphics Processing Unit ) acceleration
  • can mix with other HTML elements
  • uses the HTML5 canvas element and is accessed using Document Object Model interfaces
  • cross platform , works on all major Desktop and mobile browsers

WebGL Development

To get started you should know about :

  • GLSL, the shading language used by OpenGL and WebGL
  • Matrix computation to set up transformations
  • Vertex buffers to hold data about vertex positions, normals, colors, and textures
  • matrix math to animate shapes

Cleary WebGL is bit tough given the amount of careful coding , mapping and shading it requires .

Proceeding to some JS libraries that can make 3D easy for us .






emotion & gesture-based arpeggiator and synthesizer


MIT license javascript 3D engine ie ( WebGL + more).

3D space with webcam input as texture

Display the video as a plane which can be viewed from various angles in a given background landscape. Credits for below code : https://stemkoski.github.io/Three.js/

1.Use code from slide 10 to get user’s webcam input through getUserMedia

  1. Make a Screen , camera and renderer as previously described

  2. Give orbital CONTROLS for viewing the media plane from all angles
	controls = new THREE.OrbitControls( camera, renderer.domElement );

Make the FLOOR with an image texture

[sourcecode language="html"]
	var floorTexture = new THREE.ImageUtils.loadTexture( 'imageURL.jpg' );
	floorTexture.wrapS = floorTexture.wrapT = THREE.RepeatWrapping;
	floorTexture.repeat.set( 10, 10 );
	var floorMaterial = new THREE.MeshBasicMaterial({map: floorTexture, side: THREE.DoubleSide});
	var floorGeometry = new THREE.PlaneGeometry(1000, 1000, 10, 10);
	var floor = new THREE.Mesh(floorGeometry, floorMaterial);
	floor.position.y = -0.5;
	floor.rotation.x = Math.PI / 2;

6. Add Fog

scene.fog = new THREE.FogExp2( 0x9999ff, 0.00025 );

7.Add video Image Context and Texture.

video = document.getElementById( 'monitor' );
videoImage = document.getElementById( 'videoImage' );
videoImageContext = videoImage.getContext( '2d' );
videoImageContext.fillStyle = '#000000';
videoImageContext.fillRect( 0, 0, videoImage.width, videoImage.height );
videoTexture = new THREE.Texture( videoImage );
videoTexture.minFilter = THREE.LinearFilter;
videoTexture.magFilter = THREE.LinearFilter;
var movieMaterial=new THREE.MeshBasicMaterial({map:videoTexture,overdraw:true,side:THREE.DoubleSide});
var movieGeometry = new THREE.PlaneGeometry( 100, 100, 1, 1 );
var movieScreen = new THREE.Mesh( movieGeometry, movieMaterial );
  1. Set camera position
  1. Define the render function
    videoImageContext.drawImage( video, 0, 0, videoImage.width, videoImage.height );
    renderer.render( scene, camera );
  1. Animation
   requestAnimationFrame( animate );
Augmented Reality in WebRTC Browser - Google Slides (4)


WASM ( Web Assembly) is portable binary-code format and a corresponding text format. It is used for facilitating interactions between c++ programs and their host environment such as Javascript code into the browsers.

Emscripten is a one of the compiler toolchain and is a way to compile C++ into WASM.

GPL support for AR

Compiler/CUDA/OpenGL/Vulcan/Graphics/Fortran/GPGPU Developer.

Web media APIs like MSE (Media Source Extensions) and EME (Encrypted Media Extensions)

AR Processing pipeline 

credits : Media Pipe Google AI

On deveice Machine Learning piipeline consists of a platform solution such as MediaPipe above along with WASM. The WASM SIMD (Single instruction, multiple data for parallel process ) ML inerface can be XNNPACK or any other mobile platform based neural network inference framework. This is followerd by rendering.

GPU Accelerated segmentation ( WebGL) outperforms CPU Segmentation using WASM SIMD by talking the latency down from ~8.7ms to ~4.3 ms. Novel WebGL interface can via optimized fragment shaders using MRT.

Credits Intel Video analytics pipeline p7

GStreamer Video Analytics

Ref :

WebRTC Live Stream Broadcast

WebRTC has the potential to drive the Live Streaming broadcasting area with its powerful no plugin , no installation , open standard  policy. However the only roadblock is the VP8 codec which differs from the traditional H264 codec that is used by almost all the media servers, media control units , etc .

This post is first in the series of building a WebRTC based broadcasting solution. Note that a p2p session differs from a broadcasting session as Peer-to-peer session applies to bidirectional media streaming where as broadcasting only applies unidirectional media flow.

Describing some Scalable Broadcasting and Live streaming approaches below:

1. WebRTC multi peers

Since WebRTC is p2p technology , it is convenient to build a  network of webrtc client viewers which can pass on the stream to 3 other peers in different session. In this fashion a fission chain like structure is created where a single stream originated to first peer is replicated to 3 others which is in turn replicated to 9 peers etc .

WebRTC Scalable Streaming Server -WebRTC multi peers
WebRTC Scalable Streaming Server -WebRTC multi peers

Advantages :

  1. Scalable without the investment of media servers
  2. No additional space required at service providers network .

Disadvantage :

  1. The entire set of end clients to a node get disconnected if a single node is broken .
  2. Since sessions are dynamically created , it is difficult to maintain a map with fallback option in case of service disruption from any single node .
  3. Client incur bandwidth load of 2 Mbps( stream incoming peer ) incoming and 6 Mbps ( for 3 connected peers ) outgoing data .

2. Torrent based WebRTC chain

To over come the shortcoming of previous approach of  tree based broadcasting , it is suggested to use a chained broadcasting mechanism .

WebRTC Scalable Streaming Server v1 (4)
WebRTC Scalable Streaming Server- Single chain connection

To improvise on this mechanism for incresing efficieny for slow bandwidth connections we can stop their outgoing stream converting them to only consumers . This way the connection is mapped and arranged in such a fashion that every alternate peer is connected to 2 peers  for stream replication. The slow bandwidth clients can be attached as independent endpoints . WebRTC Scalable Streaming Server v1 (3)

3. WebRTC Relay nodes for multiple peers

The aim here is to build a career grade WebRTC stream broadcasting platform , which is capable of using the WebRTC’s mediastream and peerconnection API , along with repeaters to make a scalable broadcasting / live streaming solution using socketio for behavior control and signalling. Algorithm :

At the Publisher’s end

  1. GetUserMedia
  2. Start Room “liveConf”
  3. Add outgoing stream to session “liveConf “ with peer “BR” in 1 way transport
    1 outgoing audio stream -> 1 MB in 1 RTP port
    1 outgoing video -> 1 MB 1 more RTP port
    Total Required 2 MB and 2 RTP ports
    At the Repeater layer (high upload and download bandwidth )
  4. Peer “BR” opens parallel room “liveConf_1” , “liveConf_2” with 4 other peers “Repeater1 “, “Repeater2” , so on
  5. Repetare1 getRemoteStream from “liveConf_1” and add as localStream to “liveConf_1_1”

Here the upload bandwidth is high and each repeater is capable of handling 6 outgoing streams . Therefore total 4 repeaters can handle upto 24 streams very easily

At the Viewer’s end

  1. Viewer Joins room ”liveConf_1_1”
  2. Play the incoming stream on WebRTC browser video element”
WebRTC Relay nodes for multiple peers
WebRTC Relay nodes for multiple peers
  • (+) low CPU cost per node
    • As 6 viewers can connect to 1 repeater for feed , total of 24 viewers will require only 4 repeaters.
  • Only 2 MB consumption at publisher’s end and 2MB at each viewer’s end.

4. WebRTC  recorder to Broadcasting Media Server VOD

This process is essentially NOT a live streaming solution but a Video On Demand type of implementation for a recorded webRTC stream.

Figure shows a WebRTC node which can record the webrtc files as webm . Audio and video can be together recorded on firefox. With chrome one needs to merge a separately recorded webm ( video) and wav ( audio ) file to make a single webm file containing both audio and video and them store in VOD server’s repo.

WebRTC Scalable Streaming Server  - WebRTC Chunk recorder to Broadcasting Media Server VOD
WebRTC Chunk recorder to Broadcasting Media Server VOD

Although inherently Media Server do not support webm format but few new age lightweight media servers such as Kurento are capable of this.

(+) Can solve the end goal of broadcasting from a webrtc browser to multiple webrtc browsers without incurring extra load on any client machine ( Obviously assuming that Media Server handles the distribution of video and load sharing automatically )
(- ) It is not live streaming
(-) For significantly longer recorded streams the delta in a delay of streaming increases considerably. Ideally, this delta should be no more than 5 minutes.

To see the GStreamer and FFmpeg scripts , media servers ( like Wowza, Janus , Live555, Red5 ) and players ( VLC, raw RTP, RTMP , RTSP, MPEG DASH) refer to these posts

TFX Widgets Development

TFX is a modular widget based WebRTC communication and collaboration solution. It is a customizable solution where developers can create and add their own widget over the underlying WebRTC communication mechanism . It can support extensive set of user activity such as video chat , message , play games , collaborate on code , draw something together etc . It can go as wide as your imagination . This post describes the process of creating widgets to host over existing TFX platform .


It is required to have TFX Chrome extension installed and running from Chrome App Store under above . To do this follow the steps described in TangoFX v0.1 User’s manual.

Test TFX Sessions  ?

TFX Sessions uses the browser’s media API’s , like getUserMedia and Peerconnection to establish p2p media connection . Before media can traverse between 2 end points the signalling server is required to establish the path using Offer- Answer Model . This can be tested by making unit test cases on these function calls .

TFX Sessions uses socketio based handshake between peers to ascertain that they are valid endpoints to enter in a communication session . This is determined by SDP ( Session Description Parameters ) . The same can be observed in chrome://webrtc-internals/ traces and graphs .

TangoFX v8 Developer's manual - Google Docs

How to make widgets using TFX API ?

Step 1:  To make widgets for TFX , just write your simple web program which should consist of one main html webpage and associated css and js files for it .

Step 2 : Find an interesting idea which is requires minimal js and css . Remember it is a widget and not a full fleshed web project , however js frameworks like requirejs , angularjs , emberjs etc , work as well.

Step 3: Make a compact folder with the name of widget and put the respective files in it. For example the html files or view files would go to src folder , javascript files would goto js folder , css files would goto css folder , pictures to picture folder , audio files to sound folder and so on .

Step 4 : Once the widget is performing well in standalone environment , we can add a sync file to communicate the peer behaviors across TFX network . For this we primarily use 2 methods .

  • SendMessage : To send the data that will be traversed over DataChannel API of TFX . The content is in json format and will be shared with the peers in the session .
  • OnMessage : To receive the message communicated by the TFX API over network

Step 5: Submit the application to us or test it yourself by adding the plugin description in in widgetmanifest.json file . Few added widgets are

"plugintype": "code",
"id" 	:"LiveCode",
"type" 	: "code",
"title" : "Code widget",
"icon" 	: "btn btn-style glyphicon glyphicon-tasks",
"url"	: "../plugins/AddCode/src/codewindow.html"

"plugintype": "draw",
"id" 	:"Draw",
"type" 	: "draw",
"title" : "Code widget",
"icon" 	: "btn btn-style glyphicon glyphicon-pencil",
"url"	: "../plugins/AddDraw/src/drawwindow.html"

"plugintype": "pingpong",
"id"   :"Pingpong",
"type" : "pingpong",
"title": "ping pong widget",
"icon" : "btn btn-style glyphicon glyphicon-record",
"url"  : "../plugins/AddPingpong/src/main.html"

Step 6 : For proper orientation of the application make sure that overflow is hidden and padding to left is atleast 60 px so that it doesnt overlap with panel
padding-left: 60px;
overflow: hidden;

Step 7 : Voila the widget is ready to go .

Simple Messaging Widget

For demonstration purpose I have summarised the exact steps followed to create the simple messaging widget which uses WebRTC ‘s Datachannel API in the back and TFX SendMessage & OnMessage API to achieve

Step 1 : Think of a general chat scenario as present in various messaging si

Step 2: Made a folder structure with separation for js , css and src. Add the respective files in folder. It would look like following figure:

TangoFX v8 Developer's manual - Google Docs (1)
Step 3 : The html main page is

&amp;lt;!-- jquerry --&amp;gt;
&amp;lt;script src="../../../../js/jquery/jquery.min.js"&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;link rel="stylesheet" type="text/css" href="../../../../css/jquery-ui.min.css"/&amp;gt;
&amp;lt;link rel="stylesheet" type="text/css" href="../css/style.css"&amp;gt;

&amp;lt;div id="messages" class="message_area"&amp;gt;
&amp;lt;textarea disabled class="messageHistory" rows="30" cols="120" id="MessageHistoryBox"&amp;gt;&amp;lt;/textarea&amp;gt;
&amp;lt;input type ="text" id="MessageBox" size="120"/&amp;gt;
&amp;lt;script src="../js/sync.js" type="text/javascript"&amp;gt;&amp;lt;/script&amp;gt;

Step 4 : The contents of css file are

body {padding: 0; margin: 0; overflow: hidden;}

	  padding-left: 60px;
	background-color: transparent;
	background: transparent;

Step 5 : The contents of js file

//send message when mouse is on mesage dicv ans enter is hit
    if(event.keyCode == 13){
    var msg=$('#MessageBox').val();
    //send to peer
    var data ={

function addMessageLog(msg){
    //add text to text area for message log for self
 $('#MessageHistoryBox').text( $('#MessageHistoryBox').text() + '\n'+ 'you : '+ msg);

// handles send message
function sendMessage(message) {
      var widgetdata={
  // postmessage

//to handle  incoming message
function onmessage(evt) {
    //add text to text area for message log from peer
  if(evt.data.msgcontent!=null ){
      $('#MessageHistoryBox').text( $('#MessageHistoryBox').text() +'\n'+ 'other : '+ evt.data.msgcontent );


Step 6: The end result is :

TangoFX v8 Developer's manual - Google Docs (2)

Developing a cross origin Widget ( XHR)

Let us demonstrate the process and important points to create a cross- origin widget :

step 1 : Develop a separate web project and run it on a https

step 2 : Add the widget frame in TFX . Following is the code I added to make an XHR request over GET

var xmlhttp;
xmlhttp=new XMLHttpRequest();
  if (xmlhttp.readyState==4 &amp;amp;&amp;amp; xmlhttp.status==200)


step 3 : Using self made https we have have to open the url separately in browser and give it explicit permission to open in advanced setting. Make sure the original file is visible to you at the widgets url .

TangoFX v8 Developer's manual - Google Docs (3)
step 4: Adding permission to manifest for access the cross origin requests

"permissions": [

Step 5 : Rest of the process are similar to develop a regular widget ie css and js .

Step 6: Resulting widget on TFX

TangoFX v8 Developer's manual - Google Docs (4)

Note 1 :
In absence of changes to manifest file the cross origin request is meet with a Access-Control-Allow-Origin error .

Note 2:
While using POST the TFX responds with Failed to load resource: the server responded with a status of 404 (Not Found)

Note 3:
Also if instead of https http is used the TFX still responds with Failed to load resource: the server responded with a status of 404 (Not Found)