Fault Tolerance and Error Correction in WebRTC


Fluctuating Networks

WebRTC has built-in capabilities to detect network glitches and adapt to changing conditions. Some of the methodologies used are listed below.

Dynamic Bandwidth estimation

Bandwidth depends on network strength and is affected by other users on the network. Under heterogeneous network conditions, bandwidth estimation is a critical step in improving call quality and the end-user experience.

JitterBuffer

An unreliable or fluctuating network will cause some packets to be delivered on time and others to be delayed more, making them arrive in bursts. A jitter buffer is an effective mechanism for jitter management, ensuring a steady delivery of packets even when the peers transmit at fluctuating rates.

A jitter buffer consumes packets as soon as they arrive and holds them until the frame can be fully reconstructed. Once all packets have been filled into the buffer (in any order), it emits the frame for decoding, which the player can then play back to the user. Note that several RTP packets can share the same timestamp if they are part of the same video frame.

  • (+) dynamically manages unordered packets and reconstructs a frame after accumulating all packets
  • (-) can introduce latency for packets that arrive early
  • (-) needs active resizing by means of feedback
    • for a high-speed, good network the jitter buffer can be small
    • for congested and disruptive networks it is better to keep a longer buffer, which can also add some latency
  • (-) the buffer has limited capacity, so a packet can expire if not received within a duration ("jitterBufferDelay"); a stats-based sketch of reading this value follows this list
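The buffer's current behaviour can be observed from WebRTC statistics. Below is a minimal sketch, assuming an already-established RTCPeerConnection named pc; depending on the browser version, the jitter buffer fields may appear on the inbound-rtp or the track stats entry.

const report = await pc.getStats();
report.forEach((stat) => {
  if (stat.type === 'inbound-rtp' && stat.kind === 'video') {
    // average time (seconds) a sample spends in the jitter buffer before being emitted
    const avgBufferDelay = stat.jitterBufferDelay / stat.jitterBufferEmittedCount;
    console.log('jitter:', stat.jitter, 'avg jitter buffer delay:', avgBufferDelay);
  }
});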

SDP renegotiation

-TBD

Demand for High Quality Video

Applications such as telehealth, advertising, or broadcasting over WebRTC media streams demand high-quality video.

Tradeoff between Latency and Quality

Reducing resolution, framerate, and bitrate is effective for congestion control; however, it is not suited to high-definition video conferencing use cases such as gaming, telehealth, or broadcasting a concert, as it may hinder the user experience.

Layering for adaptive streaming

Using I-frames, P-frames, and B-frames efficiently in the codec, combined with predictive machine-learning models, can make packet loss unnoticeable to the human eye. The marker (M) bit in the RTP packet structure marks keyframes.

  • (-) more complex compression algorithms

Better compression algorithms vs CPU compute

A better performing compression algorithm produces fewer bits to encode the same video quality as its predecessor.

  • (-) Higher-performing compression engines almost always have higher energy consumption and carbon footprint
  • (+) resilient to network fluctuations

Full INTRA-frame Request (FIR)

Requests a full keyframe from the sender. It can be used when a new peer joins the conference and requires a keyframe to start decoding the video stream.

Picture Loss Indication (PLI)

When partial frames given to the decoder are unprocessable, a PLI message is sent to the sender. Once the sender receives the PLI message, it produces new I-frames to help the receiver decode the frames.

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
FIR/PLI
FIR requests a full keyframe from the sender when a new member enters the session. PLI requests a full keyframe from the sender when partial frames were given to the decoder but it was unable to decode them.
Causes for making a PLI request could be a decoder crash or heavy loss.

Redundant Encoding (RED) in Media Packets

Recovers from packet loss on lossy networks by adding redundant bits of information to following packets.

  • (+) good for unpredictable networks

LBRR ( low bit-rate redundancy) – tbd

Congestion

Congestion is created when a network path has reached its maximum limits which could be due to

  • failures(switches, routers, cables, fibres ..)
  • over subscription and operating at peak bandwidth.
  • broadcast storms
  • Inapt BGP routing and congestion detection
  • BGP is responsible for finding the shortest routable path for a packet

The direct consequences of congestion for any network transport can be

  • High Latency
  • Connection Timeouts
  • Low throughput
  • Packet loss
  • Queueing delay

With respect to WebRTC streams too, if a network has congestion, the buffer will overflow and packets will be dropped. Due to excessive dropping of packets, both transmission time and jitter increase. To overcome this, adaptive buffering is used as jitter increases or decreases.

Feedback Loop

A congestion notifier and detection algorithm can analyze the RTCP metrics for possible congestion in the network route and suggest options to overcome it. This is part of the adaptive bitrate and bandwidth estimation process.

Overcome congestion with lower bitrate

Rate-limiting the information being sent is one way to overcome congestion, even though it can lead to poor call quality at the receiver's end and is atypical for real-time communication systems.
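One way to apply such a rate limit in the browser is to cap the encoding bitrate on the video sender via RTCRtpSender.setParameters(). A minimal sketch, assuming an existing RTCPeerConnection named pc and an illustrative 300 kbps cap:

const sender = pc.getSenders().find((s) => s.track && s.track.kind === 'video');
const params = sender.getParameters();
if (!params.encodings || !params.encodings.length) params.encodings = [{}];
params.encodings[0].maxBitrate = 300000; // illustrative cap, in bits per second
await sender.setParameters(params);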

Reduce frame quality and resolution

Full HD constraints vs VGA constraints (a sketch of switching between them follows).
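A minimal sketch of switching an existing camera track between Full HD and VGA, assuming videoTrack is the local video MediaStreamTrack (the resolutions are illustrative):

const fullHdConstraints = { width: { ideal: 1920 }, height: { ideal: 1080 } };
const vgaConstraints = { width: { ideal: 640 }, height: { ideal: 480 } };
// drop to VGA under congestion ...
await videoTrack.applyConstraints(vgaConstraints);
// ... and restore Full HD once bandwidth recovers
await videoTrack.applyConstraints(fullHdConstraints);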

Congestion control Algorithms : Google Congestion Control ( GCC)

Bandwidth estimation and congestion control are often paired as one operational unit. Primarily, packet loss and inter-packet arrival times drive the bandwidth estimation and enable GCC to flag congestion.

  • On the receiver side, TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB (Receiver Estimated Maximum Bitrate) exchange the bandwidth estimates.
  • On the sender side, TWCC (Transport-Wide Congestion Control) can be used.

Other congestion control algorithms

  • QUIC Loss Detection and Congestion Control RFC 9002
  • Coupled Congestion Control for RTP Media rfc8699
  • NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
  • Self-Clocked Rate Adaptation for Multimedia RMCAT WG
  • SCReAM – Mobile-optimised congestion control algorithm by Ericsson

Low Network Strength and High Packet Loss

Packet loss is the loss of packets in transmission which could be owing to

  • network resources and path
  • transmission medium congestion
  • the application's inability to absorb delayed packets
  • Maximum Transmission Unit (MTU) size: a measure of how large a single packet can be

Recovering Lost packets

A high-definition video stream requires low or no packet loss and fast recovery if loss occurs. RTP intrinsically has no means of recovering lost packets. Instead, low bit-rate redundancy can be added to the packets themselves to make up for any loss. Retransmission of lost packets can be built on top of RTP using the sequence numbers in the RTP header.

Acknowledgement to identify packet loss

A receiver can notify the sender of possible packet loss by sending acknowledgements.

  • Selective Acknowledgement (SACK): notifies the sender of multiple received packets, thereby indicating gaps
  • Negative Acknowledgement (NACK): notifies the sender of packets lost
    • RTCP packet type 193 denotes NACK.
  • (+) a higher NACK count is suggestive of high packet loss
  • (-) the round trip of sending a NACK, waiting for the packet to be retransmitted, and receiving it in response can cause significant delay

Forward Error Correction (FEC)

The sender proactively sends redundant data such that lost packets don't affect the stream on the receiver's end.

  • (+) the receiver doesn't have to request extra data to be sent; the sender does it by itself at the RTP level
  • (+) less delay than NACK, which incurs a round trip
  • (-) involves extra bandwidth

Long distance Calls and High Round Trip Time

Geographical distance can add significant delay to transmission time. Transmission time is an important metric in call-quality analysis; however, calculating it as the difference between the timestamp of sending and the timestamp of receiving requires perfect synchronization of system clocks, which is unreliable.

transmission_time = timestamp_receive - timestamp_send

For this reason, RTT (Round Trip Time) is a better means to avoid clock synchronization errors.

transmission_time = rtt /2 
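In the browser, the measured round-trip time is exposed through the ICE candidate-pair statistics. A minimal sketch, assuming pc is an established RTCPeerConnection:

const report = await pc.getStats();
report.forEach((stat) => {
  // the nominated, succeeded candidate pair carries the live RTT measurement
  if (stat.type === 'candidate-pair' && stat.nominated && stat.state === 'succeeded') {
    console.log('current RTT (s):', stat.currentRoundTripTime,
                'approx one-way transmission time (s):', stat.currentRoundTripTime / 2);
  }
});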

Using Receiver reports and Sender Reports from RTCP to adjust to network conditions

Sender and receiver reports (SR and RR) provide a summary of the connection and of the media quality streaming over it.

RTCP Senders report for WebRTC media stream
RTCP receivers report for WebRTC media stream

Low Latency Media Streaming

Latency accumulates across capturing user media, encoding, transmission, network delays, buffering, decoding, and playback. There are many factors involved in latency management, such as queueing delays, the media path, CPU utilization, etc.

Optimize Compute resource

  • mobile agents have less compute power
  • cameras with features such as autofocus or other adjustments will take more time to capture
  • the network should have suitable bandwidth and strength

Reduce information to be encoded and sent

  • Subject focus and blurring the background
  • Filtering noise at the source
  • Voice Activity Detection (VAD)
    • send extra data in FEC only if there is voice activity detected in the packet
  • Echo Cancellation (a capture-constraints sketch follows this list)
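Noise filtering, gain control, and echo cancellation can be requested at capture time through audio constraints. A minimal sketch; actual support for each constraint varies by browser and device:

const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    noiseSuppression: true,  // filter noise at the source
    echoCancellation: true,  // acoustic echo cancellation
    autoGainControl: true
  },
  video: true
});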

Measuring latency

Synchronizing clocks in distributed systems is a tough task; it is mostly handled either by using NTP or by other means of synchronization.

NTP Synchronization of Audio Video Sync

During the buffering of incoming packets (which can range from a few tens of milliseconds to a few hundred milliseconds), the streams are synchronized.

The time references used by RTP for synchronization are NTP-based and RTP-based (the two are not required to be in sync with each other).

  • NTP Timestamp : 64-bit unsigned value that indicates the time at which this RTCP SR packet was sent. Formatted as fractional seconds since Jan 1, 1900
  • RTP Timestamp : RTP timestamp corresponds to the same instant as the NTP timestamp. Expressed in the units of the RTP media clock.
    • Majority of video formats use a 90kHz clock.
    • For the receiver to sync audio and video streams, both streams must reference the same clock
Frame 300: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface 0 (outbound)
....
Packet type: Sender Report (200)
Length: 6 (28 bytes)
Sender SSRC: 0x39a659b4 (967203252)
Timestamp, MSW: 3855754463 (0xe5d224df)
Timestamp, LSW: 2364654374 (0x8cf1c326)
[MSW and LSW as NTP timestamp: Mar 8, 2022 18:54:23.550563999 UTC]
RTP timestamp: 1110449770

Demand for higher security on WebRTC’s CPaaS

WebRTC uses the Stream Control Transmission Protocol (SCTP) over a DTLS connection, for data channels, as an alternative to TCP and UDP.

Features :

  • multihoming : one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths
  • Multistreaming: transmits several independent streams of chunks in parallel
  • SCTP has TCP-like retransmission as well as partial reliability, similar to UDP (see the data channel sketch below)
  • Heartbeat to keep the connection alive, with exponential backoff if a packet hasn't arrived
  • Validation and acknowledgment mechanisms protect against flooding attack

SCTP frames data as datagrams and not as a byte stream

  • (+) SCTP enables WebRTC to multiplex several data channels over one association
  • (+) It has flow control and congestion avoidance support
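SCTP's partial reliability is exposed in the data channel API through the ordered and maxRetransmits (or maxPacketLifeTime) options. A minimal sketch with illustrative channel labels:

const pc = new RTCPeerConnection();
// unordered, lossy channel: at most 2 retransmissions, suitable for frequent telemetry
const lossyChannel = pc.createDataChannel('telemetry', {
  ordered: false,
  maxRetransmits: 2
});
// fully reliable, ordered channel (the default), suitable for control messages
const reliableChannel = pc.createDataChannel('control');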

End to End Encryption

The end-to-end encryption model of WebRTC is a good defence against MITM (man-in-the-middle) attacks; however, it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and real-time communication platforms in the article WebRTC App and Webpage Security.

Minimize public-private mapping pairs via RTCP-mux

Traditionally, two separate ports for RTP and RTCP were used in SIP/RTP-based real-time communication systems. Demultiplexing of these data streams is thus performed at the transport layer.

With rtcp-mux, NAT traversal is simplified as only a single port is used for media and control messages.

  • (+) easier to manage security by gathering ICE candidates for a single port only instead of two
  • (+) increases the system's capacity for media sessions using the same number of ports
  • (+) further simplified using BUNDLE, as all media sessions and their control messages flow on the same port
  • WebRTC has rtcp-mux capabilities, thus simplifying ICE candidate pairing (see the configuration sketch below)
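In the browser these policies are set in the RTCPeerConnection configuration. A minimal sketch requiring RTCP multiplexing and bundling all media onto one transport:

const pc = new RTCPeerConnection({
  bundlePolicy: 'max-bundle', // bundle all media sections onto one transport
  rtcpMuxPolicy: 'require'    // gather ICE candidates for RTP only; RTCP is multiplexed
});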


Performance of WebRTC sites and electron apps


This post is about making performance enhancements to a WebRTC app so that it can be used in areas that require sensitive data to be communicated, cannot afford downtime, need fast response and low RTT, and need to be secure enough to withstand hacks and attacks.

WebRTC Clients

  1. Single-page applications (HTML5 + Js + CSS) on browser engine on OS
  2. Electron app 
    • Facebook Messenger, Slack, and Twitch are some of the RTC-based applications which have Electron clients as well.
  3. Web-view on mobile 
    • (-) doesn't have advanced WebRTC API support, e.g. MediaRecorder
  4. Native Applications on mobile OS( Android, iOS)
  5. Hybrid Applications (React Native)
  6. Embedded Device ( set-top box, IP camera, robots on raspberry pi)
    1. raw codec libraries and GStreamer/FFmpeg scripts to create an RTSP stream

Codecs weight

opus (111, minptime=10;useinbandfec=1)
VP8 (96)
frameWidth 640
frameHeight 480
framesPerSecond 30

DNS lookup Time

Services such as Pingdom (https://tools.pingdom.com/) or WebPageTest can quickly calculate your website’s DNS lookup times.

Load/stress testing for caching and lookup times can be performed with tools such as LoadStorm or JMeter.

Alternatively, use a WebSocket-like setup in place of non-reusable TCP connections such as HTTP or polling to set up signalling.
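A minimal sketch of reusing a single WebSocket for signalling instead of repeated HTTP polling; the endpoint URL, room name, and handleSignal function are placeholders:

const ws = new WebSocket('wss://signaling.example.com');         // placeholder endpoint
ws.onopen = () => ws.send(JSON.stringify({ type: 'join', room: 'demo' }));
// offers, answers and ICE candidates all arrive on the same long-lived socket
ws.onmessage = (event) => handleSignal(JSON.parse(event.data));  // placeholder handler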

Bandwidth Estimation

Bandwidth can be estimated by:

RTCP receiver reports, which periodically summarize the packet loss rate, jitter, etc. from the receiver.

a=rtcp-mux

TWCC (Transport-Wide Congestion Control) calculates inter-packet delays to estimate the sender-side bandwidth.

a=rtcp-fb:96 transport-cc

REMB (Receiver Estimated Maximum Bitrate) provides receiver-side bandwidth estimation by measuring packet loss.

  • used to configure the bitrate in video encoding
  • used to avoid congestion or slow media transmission
a=rtcp-fb:96 goog-remb

Best practices for WebRTC web clients

As the communication agent becomes a single-HTML-page-driven client, a lot of authentication, heartbeat sync, web workers, and signalling event-driven flow management resides on the same page, along with the actual CPU consumption for audio-video resources and media stream processing. This in turn can make the webpage heavy and can often result in a crash due to the page becoming "unresponsive".

Here are some of my best to-dos for making sure the WebRTC communication client page runs efficiently.

Visual stability and CLS (Cumulative Layout Shift)

The CLS metric measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.

To provide a good user interaction experience, the DOM elements should show as little movement as possible so that the page appears stable. In the opposite case, on a flickering page (maybe due to a notification DOM element dynamically pushing the other layout elements), it is difficult to interact precisely with page elements such as buttons.

Minimize main thread work

The main thread is where the browser process runs all the JavaScript on your page, as well as performing layout, reflows, and garbage collection. Therefore, long JS tasks can block the thread and make the page unresponsive.

Deprecation of synchronous XMLHttpRequest on the main thread

Reduce JavaScript execution time

Unoptimized JS code takes longer to execute and increases network, parse-compile, and memory costs.

If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.

Some effective tips for speeding up JS execution include

  • minifying and compressing code
  • Removing the unused code and console.logs
  • Apply caching to save lookup time

Cookies – Security vs persistent state

Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.

While adding cookies, we must ensure that if SameSite=None is used, the cookie is also marked Secure.

Set-Cookie: widget_session=abc123; SameSite=None; Secure

With SameSite set to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser's URL bar.

Set-Cookie: promo_shown=1; SameSite=Strict

You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.

Web Client Performance monitoring

Key Performance Indicators (KPIs) are used to evaluate the performance of a website. It is critical that a WebRTC web page be lightweight in order to accommodate the signalling control-stack JavaScript libraries used for offer-answer handling and for communicating with the signaller over open sockets or a long-polling mechanism.

Lighthouse results

The Lighthouse tab in Chrome Developer Tools shows relevant areas of improvement on the webpage across Performance, Accessibility, Best Practices, Search Engine Optimization, and Progressive Web App.

It also shows individual categories and comments.

Time to render and Page load

Page attributes under Chrome Developer Tools depict the page load and rendering time for every element, including scripts and markup. Specifically, it has

  • Time to Title
  • Time to render
  • Time to interact

Networking attributes should be configured based on DNS mapping and the hosting provider. These can be evaluated from Chrome Developer Tools reports.

Task interaction time

Other page interaction criteria include the frames, their interactions, and the timings for the same.

In the screenshot attached, see the loading tasks, which depict the delay caused by DOM elements under transitions owing to user interaction. This should ideally be minimal to keep the page responsive.

Page's total memory

measureMemory()

performance.measureUserAgentSpecificMemory

The above functions (old and new, respectively) estimate the memory usage of the entire web page.

These calls can be used to correlate new JS code with its impact on memory and subsequently find whether there are any memory leaks. These memory metrics can also be used for A/B testing.
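A minimal sketch of sampling the page's memory with the newer API; it is only available in cross-origin-isolated contexts and may not exist in all browsers:

if (window.crossOriginIsolated && performance.measureUserAgentSpecificMemory) {
  const result = await performance.measureUserAgentSpecificMemory();
  console.log('estimated page memory (bytes):', result.bytes);
}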

Page weight and PRPL

Loading assets over a CDN, minifying scripts, and reducing the overall weight of the page are good ways to keep the page light and responsive and prevent Chrome tab crashes.

PRPL expands to Push/Preload, Render, Pre-cache, Lazy load:

  • Push (or preload) the most critical resources.
  • Render the initial route as soon as possible.
  • Pre-cache remaining assets.
  • Lazy load other routes and non-critical assets.

Preload is a declarative fetch request that tells the browser to request a resource as soon as possible. Hence it should be used for critical assets.

<link rel="preload" as="script" href="critical.js">

The non-critical components can then be loaded asynchronously.

Lazy loading should be used for large files like JS payloads which are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.
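A minimal sketch of lazy loading a chunk on demand with a dynamic import; the button element and module path are placeholders:

// load the heavy stats dashboard only when the user opens it
statsButton.addEventListener('click', async () => {
  const { renderStatsDashboard } = await import('./stats-dashboard.js'); // placeholder module
  renderStatsDashboard();
});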

Web Workers

Web Workers are a simple means for web content to run scripts in background threads. The Worker interface spawns real OS-level threads.

By acting as a proxy, service workers can fetch assets directly from the cache rather than the server on repeat visits. 
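A minimal sketch of offloading a long-running computation from the main thread to a dedicated worker; the worker file name is a placeholder:

// main.js
const worker = new Worker('heavy-task-worker.js');           // placeholder worker script
worker.postMessage({ samples: [12, 7, 33, 5] });
worker.onmessage = (event) => console.log('sum from worker:', event.data);

// heavy-task-worker.js
self.onmessage = (event) => {
  const sum = event.data.samples.reduce((a, b) => a + b, 0); // heavy work stays off the main thread
  self.postMessage(sum);
};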

Native Applications on mobile OS

Threads and Cores

Memory

Network

CPU profiling

Energy consumption



WebRTC APIs


MediaDevices

getUserMedia()

Media Capture and streams

const stream = await navigator.mediaDevices.getUserMedia({video: true});

getDisplayMedia()

Used for screen-share access, which could be the full screen, a specific application, or a browser tab.

Other APIs in pipeline

  • preferCurrentTab – formerly known as getCurrentBrowsingContextMedia() API
navigator.mediaDevices.getDisplayMedia(
    {video: true, audio: true, preferCurrentTab: true});

enumerateDevices()

Enumerate Devices
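A minimal sketch of listing the available capture devices; device labels are populated only after the user has granted media permission:

const devices = await navigator.mediaDevices.enumerateDevices();
devices.forEach((device) => {
  console.log(device.kind, device.label, device.deviceId);
});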

Supported constraints

> navigator.mediaDevices.getSupportedConstraints();

aspectRatio: true
autoGainControl: true
brightness: true
channelCount: true
colorTemperature: true
contrast: true
deviceId: true
echoCancellation: true
exposureCompensation: true
exposureMode: true
exposureTime: true
facingMode: true
focusDistance: true
focusMode: true
frameRate: true
groupId: true
height: true
iso: true
latency: true
noiseSuppression: true
pan: true
pointsOfInterest: true
resizeMode: true
sampleRate: true
sampleSize: true
saturation: true
sharpness: true
tilt: true
torch: true
whiteBalanceMode: true
width: true
zoom: true

Peer-to-peer connections

RTCPeerConnection Interface

RTCPeerConnection creates, manages, monitors, and tears down the p2p communication channel.

new RTCPeerConnection(servers, pcConstraint);

Detailed interface definition

interface RTCPeerConnection : EventTarget {

	RTCConfiguration getConfiguration();
	void setConfiguration(RTCConfiguration configuration);
	void close();

	Promise<RTCSessionDescriptionInit> createOffer(optional RTCOfferOptions options);
	Promise<RTCSessionDescriptionInit> createAnswer(optional RTCAnswerOptions options);
	Promise<void> setLocalDescription(optional RTCSessionDescriptionInit description);
	Promise<void> setRemoteDescription(optional RTCSessionDescriptionInit description);
	Promise<void> addIceCandidate(optional RTCIceCandidateInit candidate);

	readonly attribute RTCSessionDescription? localDescription;
	readonly attribute RTCSessionDescription? currentLocalDescription;
	readonly attribute RTCSessionDescription? pendingLocalDescription;  
	readonly attribute RTCSessionDescription? remoteDescription;
	readonly attribute RTCSessionDescription? currentRemoteDescription;
	readonly attribute RTCSessionDescription? pendingRemoteDescription;
	readonly attribute RTCSignalingState signalingState;
	readonly attribute RTCIceGatheringState iceGatheringState;
	readonly attribute RTCIceConnectionState iceConnectionState;
	readonly attribute RTCPeerConnectionState connectionState;
	readonly attribute boolean? canTrickleIceCandidates;

	void restartIce();
	static sequence<RTCIceServer> getDefaultIceServers();

	attribute EventHandler onnegotiationneeded;
	attribute EventHandler onicecandidate;
	attribute EventHandler onicecandidateerror;
	attribute EventHandler onsignalingstatechange;
	attribute EventHandler oniceconnectionstatechange;
	attribute EventHandler onicegatheringstatechange;
	attribute EventHandler onconnectionstatechange;
};

RTCConfiguration Dictionary

dictionary RTCConfiguration {
  sequence<RTCIceServer> iceServers;
  RTCIceTransportPolicy iceTransportPolicy;
  RTCBundlePolicy bundlePolicy;
  RTCRtcpMuxPolicy rtcpMuxPolicy;
  DOMString peerIdentity;
  sequence<RTCCertificate> certificates;
  [EnforceRange] octet iceCandidatePoolSize = 0;
};

RTCOAuthCredential Dictionary

Describes the OAuth credential information which is used by the STUN/TURN client (inside the ICE Agent) to authenticate against a STUN/TURN server.

dictionary RTCOAuthCredential {
  required DOMString macKey;
  required DOMString accessToken;
};

RTCRtcpMuxPolicy Enum

Specifies what ICE candidates are gathered to support non-multiplexed RTCP.

  • negotiate – Gather ICE candidates for both RTP and RTCP candidates. If the remote-endpoint is capable of multiplexing RTCP, multiplex RTCP on the RTP candidates. If it is not, use both the RTP and RTCP candidates separately.
  • require – Gather ICE candidates only for RTP and multiplex RTCP on the RTP candidates. If the remote endpoint is not capable of rtcp-mux, session negotiation will fail.

If the value of configuration.rtcpMuxPolicy is set and its value differs from the connection’s rtcpMux policy, throw an InvalidModificationError. If the value is “negotiate” and the user agent does not implement non-muxed RTCP, throw a NotSupportedError.

An RTCPeerConnection object has a signaling state, a connection state, an ICE gathering state, and an ICE connection state.

RTCSignalingState Enum

“stable”,
“have-local-offer”,
“have-remote-offer”,
“have-local-pranswer”,
“have-remote-pranswer”,
“closed”

An RTCPeerConnection object has an operations chain which ensures that only one asynchronous operation in the chain executes concurrently.

Also an RTCPeerConnection object MUST not be garbage collected as long as any event can cause an event handler to be triggered on the object. When the object’s internal slot is true ie closed, no such event handler can be triggered and it is therefore safe to garbage collect the object.

CreateOffer()

generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including

  • descriptions of the local MediaStreamTracks attached to this RTCPeerConnection,
  • codec/RTP/RTCP capabilities
  • ICE agent parameters (usernameFragment, password, local candidates, etc.)
  • DTLS connection
const pc = new RTCPeerConnection();
pc.createOffer()
.then(desc => pc.setLocalDescription(desc));

With more attributes

var pc = new RTCPeerConnection();
pc.createOffer({
    offerToReceiveAudio: true,
    offerToReceiveVideo: true,
    voiceActivityDetection: false
}).then(function(offer) {
	return pc.setLocalDescription(offer);
})
.then(function() {
// Send the offer to the remote through signaling server
})
.catch(handleError);

CreateAnswer()

generates an SDP answer with the supported configuration for the session that is compatible with the parameters in the remote configuration

var pc = new RTCPeerConnection();
pc.createAnswer({
  offerToReceiveAudio: true,
  offerToReceiveVideo: true
})
.then(function(answer) {
  return pc.setLocalDescription(answer);
})
.then(function() {
  // Send the answer to the remote through signaling server
})
.catch(handleError);

Offer/Answer Options – VoiceActivityDetection

dictionary RTCOfferAnswerOptions {
  boolean voiceActivityDetection = true;
};

capable of detecting “silence”

dictionary RTCOfferOptions : RTCOfferAnswerOptions {
  boolean iceRestart = false;
};
dictionary RTCAnswerOptions : RTCOfferAnswerOptions {};

The codec preferences of an m= section's associated transceiver are the codec preferences set on that RTCRtpTransceiver, with the following filtering applied:

  • If direction is “sendrecv”, exclude any codecs not included in the intersection of RTCRtpSender.getCapabilities(kind).codecs and RTCRtpReceiver.getCapabilities(kind).codecs.
  • If direction is “sendonly”, exclude any codecs not included in RTCRtpSender.getCapabilities(kind).codecs.
  • If direction is “recvonly”, exclude any codecs not included in RTCRtpReceiver.getCapabilities(kind).codecs.

Legacy Interface Extensions

partial interface RTCPeerConnection {
 
Promise<void> createOffer(
 RTCSessionDescriptionCallback successCallback,
 RTCPeerConnectionErrorCallback failureCallback,
 optional RTCOfferOptions options);
 
Promise<void> setLocalDescription(
optional RTCSessionDescriptionInit description,
VoidFunction successCallback,                                  
RTCPeerConnectionErrorCallback failureCallback);

Promise<void> createAnswer(
RTCSessionDescriptionCallback successCallback,
RTCPeerConnectionErrorCallback failureCallback);  

Promise<void> setRemoteDescription(
optional RTCSessionDescriptionInit description,
VoidFunction successCallback,                    
RTCPeerConnectionErrorCallback failureCallback);

Promise<void> addIceCandidate(
RTCIceCandidateInit candidate,
VoidFunction successCallback,
RTCPeerConnectionErrorCallback failureCallback);
};

Session Description Model

enum RTCSdpType {
  "offer",
  "pranswer",
  "answer",
  "rollback"
};
interface RTCSessionDescription {
  readonly attribute RTCSdpType type;
  readonly attribute DOMString sdp;
  [Default] object toJSON();
};
dictionary RTCSessionDescriptionInit {
  RTCSdpType type;
  DOMString sdp = "";
};

RTCPriorityType
Priority and QoS Model which can be : “very-low”, “low”, “medium”, “high”

RTP Media API

Send and receive MediaStreamTracks over a peer-to-peer connection.
Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.

getTracks()

const pc = new RTCPeerConnection();
let stream = await navigator.mediaDevices.getUserMedia({
video: true});
...
stream.getTracks()

[MediaStreamTrack]
0: MediaStreamTrack
contentHint: ""
enabled: true
id: "d5b10d39-cf26-4705-8831-6a0fa89546a0"
kind: "video"
label: "Integrated Webcam (0bda:58c2)"
muted: false
oncapturehandlechange: null
onended: null
onmute: null
onunmute: null
readyState: "live"

RTCRtpSender objects manage the encoding and transmission of MediaStreamTracks.

RTCRtpReceiver objects manage the reception and decoding of MediaStreamTracks. Each is associated with one track.

  • RTCRtpReceiver.getContributingSources()
  • RTCRtpReceiver.getParameters()
  • RTCRtpReceiver.getStats()
  • RTCRtpReceiver.getSynchronizationSources()
    • returns audio level, rtpTimestamp, source, timestamp
  • RTCRtpReceiver.getCapabilities()

The RTCRtpTransceiver interface describes a permanent pairing of an RTCRtpSender and an RTCRtpReceiver. Each transceiver is uniquely identified using its mid (media id) property from the corresponding m-line.

They are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via the addTrack(), or explicitly when the application uses the addTransceiver(). They are also created when a remote description is applied that includes a new media description.

rtpTransceiver = RTCPeerConnection.addTransceiver(trackOrKind, init);

trackOrKind should be either a MediaStreamTrack or the kind string 'audio' or 'video'; otherwise a TypeError is thrown. init is optional and can contain direction, sendEncodings, and streams.

var tracks = stream.getTracks();
tracks.forEach(function(track){ 
    pc.addTrack(track, stream)
 });

For example if the stream’s track is

stream.getTracks()

[MediaStreamTrack]
0: MediaStreamTrack
    contentHint: ""
    enabled: true
    id: "d5b10d39-cf26-4705-8831-6a0fa89546a0"
    kind: "video"
    label: "Integrated Webcam (0bda:58c2)"
    muted: false
    oncapturehandlechange: null
    onended: null
    onmute: null
    onunmute: null
    readyState: "live"
[[Prototype]]: MediaStreamTrack
length: 1

then the transceiver (fetched from the getTransceivers() API) is

pc.getTransceivers()

0: RTCRtpTransceiver
	currentDirection: null
	direction: "sendrecv"
	headerExtensionsNegotiated: []
	headerExtensionsToOffer: (15) [{…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…}]
	mid: null
	receiver: RTCRtpReceiver {track: MediaStreamTrack, transport: null, rtcpTransport: null, playoutDelayHint: null}
	sender: RTCRtpSender
	    dtmf: null
	    rtcpTransport: null
	    track: MediaStreamTrack {kind: 'video', id: 'd5b10d39-cf26-4705-8831-6a0fa89546a0', label: 'Integrated Webcam (0bda:58c2)', enabled: true, muted: false, …}
	    transport: null
	    [[Prototype]]: RTCRtpSender
	    stopped: false
	[[Prototype]]: RTCRtpTransceiver
length: 1

RTCPeerConnection Interface

partial interface RTCPeerConnection {
  
sequence<RTCRtpSender> getSenders();
sequence<RTCRtpReceiver> getReceivers();
sequence<RTCRtpTransceiver> getTransceivers();

RTCRtpSender addTrack(MediaStreamTrack track, MediaStream... streams);
void removeTrack(RTCRtpSender sender);
RTCRtpTransceiver addTransceiver((MediaStreamTrack or DOMString) trackOrKind, optional RTCRtpTransceiverInit init);
attribute EventHandler ontrack;
};

RTCRtpTransceiverInit

dictionary RTCRtpTransceiverInit {
  RTCRtpTransceiverDirection direction = "sendrecv";
  sequence<MediaStream> streams = [];
  sequence<RTCRtpEncodingParameters> sendEncodings = [];
};

RTCRtpTransceiverDirection can be either of : “sendrecv”, “sendonly”, “recvonly”, “inactive”, “stopped”

RTCRtpSender Interface

Allows an application to control how a given MediaStreamTrack is encoded and transmitted to a remote peer.

interface RTCRtpSender {
readonly attribute MediaStreamTrack? track;
readonly attribute RTCDtlsTransport? transport;
readonly attribute RTCDtlsTransport? rtcpTransport;
static RTCRtpCapabilities? getCapabilities(DOMString kind);
Promise<void> setParameters(RTCRtpSendParameters parameters);
RTCRtpSendParameters getParameters();
Promise<void> replaceTrack(MediaStreamTrack? withTrack);
void setStreams(MediaStream... streams);
Promise<RTCStatsReport> getStats();
};

RTCRtpParameters Dictionary

dictionary RTCRtpParameters {
  required sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
  required RTCRtcpParameters rtcp;
  required sequence<RTCRtpCodecParameters> codecs;
};

RTCRtpSendParameters Dictionary

dictionary RTCRtpSendParameters : RTCRtpParameters {
  required DOMString transactionId;
  required sequence<RTCRtpEncodingParameters> encodings;
  RTCDegradationPreference degradationPreference = "balanced";
  RTCPriorityType priority = "low";
};

RTCRtpReceiveParameters Dictionary

dictionary RTCRtpReceiveParameters : RTCRtpParameters {
  required sequence<RTCRtpDecodingParameters> encodings;
};

RTCRtpCodingParameters Dictionary

dictionary RTCRtpCodingParameters {
  DOMString rid;
};

RTCRtpDecodingParameters Dictionary

dictionary RTCRtpDecodingParameters : RTCRtpCodingParameters {};

RTCRtpEncodingParameters Dictionary

dictionary RTCRtpEncodingParameters : RTCRtpCodingParameters {
  octet codecPayloadType;
  RTCDtxStatus dtx;
  boolean active = true;
  unsigned long ptime;
  unsigned long maxBitrate;
  double maxFramerate;
  double scaleResolutionDownBy;
};

RTCDtxStatus Enum

disabled- Discontinuous transmission is disabled.
enabled- Discontinuous transmission is enabled if negotiated.

RTCDegradationPreference Enum

enum RTCDegradationPreference {
  "maintain-framerate",
  "maintain-resolution",
  "balanced"
};

RTCRtcpParameters Dictionary

dictionary RTCRtcpParameters {
  DOMString cname;
  boolean reducedSize;
};

RTCRtpHeaderExtensionParameters Dictionary

dictionary RTCRtpHeaderExtensionParameters {
  required DOMString uri;
  required unsigned short id;
  boolean encrypted = false;
};

RTCRtpCodecParameters Dictionary

dictionary RTCRtpCodecParameters {
  required octet payloadType;
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;
};

payloadType – identify this codec.
mimeType – codec MIME media type/subtype. Valid media types and subtypes are listed in [IANA-RTP-2]
clockRate – expressed in Hertz
channels – number of channels (mono=1, stereo=2).
sdpFmtpLine – “format specific parameters” field from the “a=fmtp” line in the SDP corresponding to the codec

RTCRtpCapabilities Dictionary

dictionary RTCRtpCapabilities {
  required sequence<RTCRtpCodecCapability> codecs;
  required sequence<RTCRtpHeaderExtensionCapability> headerExtensions;
};

RTCRtpCodecCapability Dictionary

dictionary RTCRtpCodecCapability {
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;
};

RTCRtpHeaderExtensionCapability Dictionary

dictionary RTCRtpHeaderExtensionCapability {
  DOMString uri;
};

Example JS code to get RTCRtpCapabilities

const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('audio');
const capabilities = RTCRtpSender.getCapabilities('audio');

Output :

codecs: Array(13)
0: {channels: 2, clockRate: 48000, mimeType: "audio/opus", sdpFmtpLine: "minptime=10;useinbandfec=1"}
1: {channels: 1, clockRate: 16000, mimeType: "audio/ISAC"}
2: {channels: 1, clockRate: 32000, mimeType: "audio/ISAC"}
3: {channels: 1, clockRate: 8000, mimeType: "audio/G722"}
4: {channels: 1, clockRate: 8000, mimeType: "audio/PCMU"}
5: {channels: 1, clockRate: 8000, mimeType: "audio/PCMA"}
6: {channels: 1, clockRate: 32000, mimeType: "audio/CN"}
7: {channels: 1, clockRate: 16000, mimeType: "audio/CN"}
8: {channels: 1, clockRate: 8000, mimeType: "audio/CN"}
9: {channels: 1, clockRate: 48000, mimeType: "audio/telephone-event"}
10: {channels: 1, clockRate: 32000, mimeType: "audio/telephone-event"}
11: {channels: 1, clockRate: 16000, mimeType: "audio/telephone-event"}
12: {channels: 1, clockRate: 8000, mimeType: "audio/telephone-event"}
length: 13

headerExtensions: Array(6)
0: {uri: "urn:ietf:params:rtp-hdrext:ssrc-audio-level"}
1: {uri: "http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time"}
2: {uri: "http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01"}
3: {uri: "urn:ietf:params:rtp-hdrext:sdes:mid"}
4: {uri: "urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id"}
5: {uri: "urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id"}
length: 6

RTCRtpReceiver Interface

allows an application to inspect the receipt of a MediaStreamTrack.

interface RTCRtpReceiver {
  readonly attribute MediaStreamTrack track;
  readonly attribute RTCDtlsTransport? transport;
  readonly attribute RTCDtlsTransport? rtcpTransport;
  static RTCRtpCapabilities? getCapabilities(DOMString kind);
  RTCRtpReceiveParameters getParameters();
  sequence<RTCRtpContributingSource> getContributingSources();
  sequence<RTCRtpSynchronizationSource> getSynchronizationSources();
  Promise<RTCStatsReport> getStats();
};

dictionary RTCRtpContributingSource

dictionary RTCRtpContributingSource {
  required DOMHighResTimeStamp timestamp;
  required unsigned long source;
  double audioLevel;
  required unsigned long rtpTimestamp;
};

dictionary RTCRtpSynchronizationSource

dictionary RTCRtpSynchronizationSource : RTCRtpContributingSource {
  boolean voiceActivityFlag;
};

voiceActivityFlag of type boolean – Only present for audio receivers. Whether the last RTP packet, delivered from this source, contains voice activity (true) or not (false).

RTCRtpTransceiver Interface

Each SDP media section describes one bidirectional SRTP (“Secure Real Time Protocol”) stream. RTCRtpTransceiver describes this permanent pairing of an RTCRtpSender and an RTCRtpReceiver, along with some shared state. It is uniquely identified using its mid property.

Thus it is a combination of an RTCRtpSender and an RTCRtpReceiver that share a common mid. An associated transceiver (with a mid) is one that is represented in the last applied session description.

interface RTCRtpTransceiver {
    readonly attribute DOMString? mid;
    [SameObject] readonly attribute RTCRtpSender sender;
    [SameObject] readonly attribute RTCRtpReceiver receiver;
    attribute RTCRtpTransceiverDirection direction;
    readonly attribute RTCRtpTransceiverDirection? currentDirection;
    void stop();
    void setCodecPreferences(sequence<RTCRtpCodecCapability> codecs);
};

Method stop() – Irreversibly marks the transceiver as stopping, unless it is already stopped. This immediately causes the transceiver's sender to stop sending and its receiver to stop receiving.
A stopping transceiver causes future calls to createOffer to generate a zero port in the media description for the corresponding transceiver; a stopped transceiver causes future calls to createOffer or createAnswer to generate a zero port in the media description for the corresponding transceiver.

setCodecPreferences()

This method overrides the default codec preferences used by the user agent.

Example setting codec Preference for OPUS in audio

peer = new RTCPeerConnection();    
const transceiver = peer.addTransceiver('audio');
const audiocapabilities = RTCRtpSender.getCapabilities('audio');
let codec = [];
codec.push(audiocapabilities.codecs[0]); // opus 
transceiver.setCodecPreferences(codec);

Before setting codec preference for OPUS

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0

a=recvonly
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000

After setting codec preference for OPUS audio

m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0

a=sendrecv
a=msid:hcgvWcGG7WhdzboWk79q39NiO8xkh4ArWhbM f15d77bb-7a6f-4f41-80cd-51a3c40de7b7
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1

RTCDtlsTransport

Access to information about the Datagram Transport Layer Security (DTLS) transport over which RTP and RTCP packets are sent and received by RTCRtpSender and RTCRtpReceiver objects, as well as other data such as SCTP packets sent and received by data channels.
Each RTCDtlsTransport object represents the DTLS transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

interface RTCDtlsTransport : EventTarget {
  [SameObject] readonly attribute RTCIceTransport iceTransport;
  readonly attribute RTCDtlsTransportState state;
  sequence<ArrayBuffer> getRemoteCertificates();
  attribute EventHandler onstatechange;
  attribute EventHandler onerror;
};

RTCDtlsTransportState Enum

  • “new”- DTLS has not started negotiating yet.
  • “connecting” – DTLS is in the process of negotiating a secure connection and verifying the remote fingerprint.
  • “connected”- DTLS has completed negotiation of a secure connection and verified the remote fingerprint.
  • “closed” – transport has been closed intentionally like close_notify alert, or calling close().
  • “failed” – transport has failed as the result of an error like failure to validate the remote fingerprint

RTCDtlsFingerprint dictionary

dictionary RTCDtlsFingerprint {
  DOMString algorithm;
  DOMString value;
};

Protocols multiplexed with RTP (e.g. the data channel) share its component ID. This corresponds to component-id value 1 when encoded in the candidate attribute, while the ICE candidate for RTCP has component-id value 2 when encoded in the candidate attribute.

RTCTrackEvent

The track event uses the RTCTrackEvent interface.

interface RTCTrackEvent : Event {
  readonly attribute RTCRtpReceiver receiver;
  readonly attribute MediaStreamTrack track;
  [SameObject] readonly attribute FrozenArray<MediaStream> streams;
  readonly attribute RTCRtpTransceiver transceiver;
};

dictionary RTCTrackEventInit

dictionary RTCTrackEventInit : EventInit {
  required RTCRtpReceiver receiver;
  required MediaStreamTrack track;
  sequence<MediaStream> streams = [];
  required RTCRtpTransceiver transceiver;
};

RTCIceCandidate

This interface describes a candidate in the Interactive Connectivity Establishment (ICE) configuration used to set up the RTCPeerConnection. To facilitate the routing of media on a given peer connection, both endpoints exchange several candidates, and then one candidate out of the lot is chosen, which is then used to initiate the connection.

const pc = new RTCPeerConnection();
pc.addIceCandidate({candidate:''});
  • candidate – transport address for the candidate that can be used for connectivity checks.
  • component – candidate is an RTP or an RTCP candidate
  • foundation – unique identifier that is the same for any candidates of the same type , helps optimize ICE performance while prioritizing and correlating candidates that appear on multiple RTCIceTransport objects.
  • ip , port
  • priority
  • protocol – tcp/udp
  • relatedAddress , relatedPort
  • sdpMid – candidate’s media stream identification tag
  • sdpMLineIndex

usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).

RTCIceCredentialType Enum : supports OAuth 2.0 based authentication. The application, acting as the OAuth Client, is responsible for refreshing the credential information and updating the ICE Agent with fresh new credentials before the accessToken expires. The OAuth Client can use the RTCPeerConnection setConfiguration method to periodically refresh the TURN credentials.

enum RTCIceCredentialType {
  "password",
  "oauth"
};

RTCIceServer Dictionary : Describes the STUN and TURN servers that can be used by the ICE Agent to establish a connection with a peer.

dictionary RTCIceServer {    
   required (DOMString or sequence<DOMString>) urls; 
   DOMString username;      
   (DOMString or RTCOAuthCredential) credential; 
   RTCIceCredentialType credentialType = "password";  
};   
 [{urls: 'stun:stun1.example.net'},
  {urls: ['turns:turn.example.org', 'turn:turn.example.net'],
    username: 'user',
    credential: 'myPassword',
    credentialType: 'password'},
  {urls: 'turns:turn2.example.net',
    username: 'xxx',
    credential: {
      macKey: 'xyz=',
      accessToken: 'xxx+yyy=='
    },
    credentialType: 'oauth'}
];

RTCIceTransportPolicy Enum

ICE candidate policy [JSEP] to select candidates for the ICE connectivity checks

  • relay – use only media relay candidates such as candidates passing through a TURN server. It prevents the remote endpoint/unknown caller from learning the user’s IP addresses
  • all – ICE Agent can use any type of candidate when this value is specified.

RTCBundlePolicy Enum

  • balanced – Gather ICE candidates for each media type (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
  • max-compat – Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
  • max-bundle – Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport.

If the value of configuration.bundlePolicy is set and its value differs from the connection’s bundle policy, throw an InvalidModificationError.

Interfaces for Connectivity Establishment

describes ICE candidates

interface RTCIceCandidate {
  DOMString candidate;
  DOMString sdpMid;
  unsigned short sdpMLineIndex;
  DOMString foundation;
  RTCIceComponent component;
  unsigned long priority;
  DOMString address;
  RTCIceProtocol protocol;
  unsigned short port;
  RTCIceCandidateType type;
  RTCIceTcpCandidateType tcpType;
  DOMString relatedAddress;
  unsigned short relatedPort;
  DOMString usernameFragment;
  RTCIceCandidateInit toJSON();
};

RTCIceProtocol can be either tcp or udp

TCP candidate type which can be either of

  • active – An active TCP candidate is one for which the transport will attempt to open an outbound connection but will not receive incoming connection requests.
  • passive – A passive TCP candidate is one for which the transport will receive incoming connection attempts but not attempt a connection.
  • so – An so candidate is one for which the transport will attempt to open a connection simultaneously with its peer.

UDP candidate type

  • host – the peer's actual direct IP address, taken from its local network interface
  • srflx – server reflexive, generated by a STUN/TURN server
  • prflx – peer reflexive, an IP address learned from a symmetric NAT between the two peers, usually as an additional candidate during trickle ICE
  • relay – generated using a TURN server

ICE Candidate UDP Host

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:27784895 1 udp 2122260223 192.168.1.114 51577 typ host generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:27784895 1 udp 2122260223 192.168.1.114 51382 typ host generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:27784895 1 udp 2122260223 192.168.1.114 53600 typ host generation 0 ufrag muSq network-id 1 network-cost 10

ICE Candidate TCP Host

Notice TCP host candidates for mid 0, 1, and 2 for the video, audio, and data media types.

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1327761999 1 tcp 1518280447 192.168.1.114 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:1327761999 1 tcp 1518280447 192.168.1.114 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:1327761999 1 tcp 1518280447 192.168.1.114 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10

ICE Candidate UDP Srflx

Notice 3 candidates for 3 streams sdpMid 0,1 and 2

sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:2163208203 1 udp 1686052607 117.201.90.218 27177 typ srflx raddr 192.168.1.114 rport 53600 generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:2163208203 1 udp 1686052607 117.201.90.218 27176 typ srflx raddr 192.168.1.114 rport 51382 generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2163208203 1 udp 1686052607 117.201.90.218 27175 typ srflx raddr 192.168.1.114 rport 51577 generation 0 ufrag muSq network-id 1 network-cost 10
Peer reflexive Candidate

ICE Candidate (host)

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2880323124 1 udp 2122260223 192.168.1.116 61622 typ host generation 0 ufrag jsPO network-id 1 network-cost 10

usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).

a=ice-ufrag:D+yg
a=ice-pwd:2ep6CXO0FP/JfS4ue/dfjeQM
a=ice-options:trickle

RTCPeerConnectionIceEvent

interface RTCPeerConnectionIceEvent : Event {
  readonly attribute RTCIceCandidate? candidate;
  readonly attribute DOMString? url;
};

RTCPeerConnectionIceErrorEvent

interface RTCPeerConnectionIceErrorEvent : Event {
  readonly attribute DOMString hostCandidate;
  readonly attribute DOMString url;
  readonly attribute unsigned short errorCode;
  readonly attribute USVString errorText;
};

RTCIceTransport Interface

Access to information about the ICE transport over which packets are sent and received. Each RTCIceTransport object represents the ICE transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

interface RTCIceTransport : EventTarget {
    readonly attribute RTCIceRole role;
    readonly attribute RTCIceComponent component;
    readonly attribute RTCIceTransportState state;
    readonly attribute RTCIceGathererState gatheringState;
    sequence<RTCIceCandidate> getLocalCandidates();
    sequence<RTCIceCandidate> getRemoteCandidates();
    RTCIceCandidatePair? getSelectedCandidatePair();
    RTCIceParameters? getLocalParameters();
    RTCIceParameters? getRemoteParameters();
    attribute EventHandler onstatechange;
    attribute EventHandler ongatheringstatechange;
    attribute EventHandler onselectedcandidatepairchange;
};

RTCIceParameters Dictionary

dictionary RTCIceParameters {
  DOMString usernameFragment;
  DOMString password;
};

RTCIceCandidatePair Dictionary

dictionary RTCIceCandidatePair {
  RTCIceCandidate local;
  RTCIceCandidate remote;
};

RTCPeerConnectionState Enum

  1. “closed”,
  2. “failed”,
  3. “disconnected”,
  4. “new”,
  5. “connecting”,
  6. “connected”

RTCIceGathererState Enum

  1. “new”,
  2. “gathering”,
  3. “complete”

RTCIceTransportState Enum

  1. new – ICE agent is gathering addresses or is waiting to be given remote candidates 
  2. checking – ICE agent has remote candidates and is performing connectivity checks but has not yet found a working pair.
  3. connected – Found a working candidate pair, but still performing connectivity checks to find a better one.
  4. completed – Found a working candidate pair and done performing connectivity checks.
  5. disconnected
  6. failed
  7. closed

RTCIceRole Enum

unknown : an agent whose role is not yet determined
controlling : controlling agent
controlled : controlled agent

RTCIceComponent Enum

rtp : ICE Transport is used for RTP (or RTCP multiplexing)
rtcp : ICE Transport is used for RTCP

Peer-to-peer Data API

Connection.createDataChannel('sendDataChannel', dataConstraint);

With SCTP, the protocol used by WebRTC data channels, reliable and ordered data delivery is on by default.

Sending large files

Split a data channel message into chunks:
var CHUNK_LEN = 64000; // 64 KB chunk size
var img = photoContext.getImageData(0, 0, photoContextW, photoContextH),
len = img.data.byteLength,
n = len / CHUNK_LEN | 0;

for (var i = 0; i < n; i++) {
    var start = i * CHUNK_LEN, end = (i + 1) * CHUNK_LEN;
    dataChannel.send(img.data.subarray(start, end));
}
// last chunk
if (len % CHUNK_LEN) {
   dataChannel.send(img.data.subarray(n * CHUNK_LEN));
}

Statistics

The browser maintains a set of statistics for monitored objects, in the form of stats objects. A group of related objects may be referenced by a selector (like a MediaStreamTrack that is sent or received by the RTCPeerConnection).

Statistics API extends the RTCPeerConnection interface

partial interface RTCPeerConnection {
  Promise<RTCStatsReport> getStats(optional MediaStreamTrack? selector = null);
  attribute EventHandler onstatsended;
};

getStats()

The getStats() method gathers stats for the given selector and reports the result asynchronously.
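A minimal sketch of dumping every entry of a stats report, assuming pc is an established RTCPeerConnection; the codec entries below are an example of what such a dump contains:

const report = await pc.getStats();
report.forEach((stat) => {
  // each entry is an RTCStats-derived dictionary keyed by its id
  console.log(stat.type, stat.id, stat);
});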

{
  clockRate: 90000,
  id: "RTCCodec_0_Inbound_96",
  mimeType: "video/VP8",
  payloadType: 96,
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"
}
{
  clockRate: 90000,
  id: "RTCCodec_0_Inbound_97",
  mimeType: "video/rtx",
  payloadType: 97,
  sdpFmtpLine: "apt=96",
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"
}
{
  clockRate: 90000,
  id: "RTCCodec_0_Inbound_98",
  mimeType: "video/VP9",
  payloadType: 98,
  sdpFmtpLine: "profile-id=0",
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"
}
{
  clockRate: 90000,
  id: "RTCCodec_0_Inbound_99",
  mimeType: "video/rtx",
  payloadType: 99,
  sdpFmtpLine: "apt=98",
  timestamp: 1644263636999.506,
  transportId: "RTCTransport_0_1",
  type: "codec"
}

Quality limitation in pc.getStats()

{
  bytesSent: 2646771,
  codecId: "RTCCodec_0_Outbound_96",
  encoderImplementation: "libvpx",
  firCount: 0,
  frameHeight: 540,
  framesEncoded: 471,
  framesPerSecond: 31,
  framesSent: 471,
  frameWidth: 960,
  headerBytesSent: 67900,
  hugeFramesSent: 0,
  id: "RTCOutboundRTPVideoStream_971979131",
  keyFramesEncoded: 4,
  kind: "video",
  mediaSourceId: "RTCVideoSource_48",
  mediaType: "video",
  nackCount: 0,
  packetsSent: 2657,
  pliCount: 0,
  qpSum: 6598,
  qualityLimitationDurations: {
    bandwidth: 15882,
    cpu: 0,
    none: 36,
    other: 0
  },
  qualityLimitationReason: "bandwidth",
  qualityLimitationResolutionChanges: 0,
  remoteId: "RTCRemoteInboundRtpVideoStream_971979131",
  retransmittedBytesSent: 0,
  retransmittedPacketsSent: 0,
  ssrc: 971979131,
  timestamp: 1644263635999.547,
  totalEncodedBytesTarget: 0,
  totalEncodeTime: 2.866,
  totalPacketSendDelay: 47.68,
  trackId: "RTCMediaStreamTrack_sender_48",
  transportId: "RTCTransport_0_1",
  type: "outbound-rtp"
}

RTCStatsReport Object

A map between strings that identify the inspected objects (the id attribute in RTCStats instances) and their corresponding RTCStats-derived dictionaries.

interface RTCStatsReport {
  readonly maplike<DOMString, object>;
};

RTCStats Dictionary

stats object constructed by inspecting a specific monitored object.

dictionary RTCStats {
  required DOMHighResTimeStamp timestamp;
  required RTCStatsType type;
  required DOMString id;
};

RTCStatsEvent

Constructor(DOMString type, RTCStatsEventInit eventInitDict)

interface RTCStatsEvent : Event {
  readonly attribute RTCStatsReport report;
};

dictionary RTCStatsEventInit

dictionary RTCStatsEventInit : EventInit {
  required RTCStatsReport report;
};

Stats

  • RTCSenderVideoTrackAttachmentStats
  • RTCSenderAudioTrackAttachmentStats

Stream stats

  • RTCRTPStreamStats : ssrc, kind, transportId, codecId + attr
  • RTCReceivedRTPStreamStats : packetsReceived, packetsLost, jitter, packetsDiscarded + attr
  • RTCSentRTPStreamStats : packetsSent, bytesSent + attr
  • RTCPeerConnectionStats, with attributes dataChannelsOpened, dataChannelsClosed
  • RTCDataChannelStats : label, protocol, dataChannelIdentifier, state, messagesSent, bytesSent, messagesReceived, bytesReceived + attr
  • RTCMediaStreamStats : streamIdentifier, trackIds
  • RTCVideoSenderStats : framesSent
  • RTCVideoReceiverStats : framesReceived, framesDecoded, framesDropped, partialFramesLost + attr
  • RTCCodecStats : payloadType, codecType, mimeType, clockRate, channels, sdpFmtpLine
  • RTCTransportStats : bytesSent, bytesReceived, rtcpTransportStatsId, selectedCandidatePairId, localCertificateId, remoteCertificateId
  • RTCCertificateStats : fingerprint, fingerprintAlgorithm, base64Certificate, issuerCertificateId + attr

Outbound/Inbound stats

  • RTCOutboundRTPStreamStats : trackId, senderId, remoteId, framesEncoded, nackCount + attr
  • RTCRemoteOutboundRTPStreamStats : localId, remoteTimestamp + attr
  • RTCInboundRTPStreamStats : trackId, receiverId, remoteId, framesDecoded, nackCount + attr
  • RTCRemoteInboundRTPStreamStats : localId, bytesReceived, roundTripTime + attr

Handler Stats

  • RTCMediaHandlerStats : trackIdentifier, ended + attr
  • RTCAudioHandlerStats : audioLevel + attr
  • RTCVideoHandlerStats : frameWidth, frameHeight, framesPerSecond + attr

RTCICE stats

  • RTCIceCandidatePairStats : transportId, localCandidateId, remoteCandidateId, state, priority, nominated, bytesSent, bytesReceived, totalRoundTripTime, currentRoundTripTime
  • RTCIceCandidateStats : address, port, protocol, candidateType, url


JavaScript Session Establishment Protocol (JSEP) in WebRTC handshake


This article explains the intricacies and detailed offer/answer flow in the WebRTC handshake and JSEP. You can read the preceding articles on WebRTC as a prerequisite before reading this one. WebRTC has four main APIs, namely PeerConnection, getUserMedia, DataChannel and getStats.

JSEP (JavaScript Session Establishment Protocol)

JSEP is used during signalling via the W3C-recommended RTCPeerConnection API to set up a multimedia session. The multimedia session description specifies the critical components of setting up a session between local and remote peers, such as transport ports, protocol and profiles. It also handles the interaction with the ICE state machine.

Offer/Answer Exchange Flow

Prerequisite: set up the client side for the caller
  • PeerConnectionFactory to generate PeerConnections
  • PeerConnection for every connection to a remote peer
  • MediaStream for audio and video from the client device

  1. The side initiating the session creates an offer using the createOffer() API
aPromise = myPeerConnection.createOffer([options]);

options is of type RTCOfferOptions

  • iceRestart
  • offerToReceiveAudio ( legacy)
  • offerToReceiveVideo ( legacy)
  • voiceActivityDetection

2. The application then stores the offer as the local description using the setLocalDescription() API

 myPeerConnection.createOffer().then(function(offer) {
    return myPeerConnection.setLocalDescription(offer);
})

3. The offer is sent to the remote side using the signalling channel of choice (SIP, WS, HTTP, XMPP, ...)

4. The remote party stores it using the setRemoteDescription() API

myPeerConnection.setRemoteDescription(sdp)
.then(function () {
  return createMyStream();
})

5. The remote party generates an answer using the createAnswer() API

aPromise = RTCPeerConnection.createAnswer([options]);

6. The remote party stores the answer as its local description using the setLocalDescription() API

7. The answer is transferred to the initiator side using the signalling channel of choice (SIP, WS, HTTP, XMPP, ...) again

8. The initiating side stores it using the setRemoteDescription() API (a combined sketch of steps 4 to 8 follows below)
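A minimal sketch of the steps above, assuming a hypothetical signallingChannel object (for example a WebSocket wrapper) that carries the serialized SDP between the peers; myPeerConnection is the local RTCPeerConnection:

signallingChannel.onmessage = async (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'offer') {
    // callee side
    await myPeerConnection.setRemoteDescription(message);               // step 4
    const answer = await myPeerConnection.createAnswer();               // step 5
    await myPeerConnection.setLocalDescription(answer);                 // step 6
    signallingChannel.send(JSON.stringify(myPeerConnection.localDescription)); // step 7
  } else if (message.type === 'answer') {
    // caller side
    await myPeerConnection.setRemoteDescription(message);               // step 8
  }
};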

Interfaces of webrtc and tracks to stream addition

Perform webRTC handshake

WebRTC call setup and incoming-call call flow between the remote peer, PeerConnection factory, PeerConnection and the application

Outgoing Call: setting up a call with the remote peer by sending an offer, then waiting for the remote's answer and processing it to create the session.

JSEP setup a call

Incoming Call : receive the remote's offer and process it to reply with an answer.

JSEP flow to receive a call

Signalling state Transitions on PeerConnection

As the caller initiates a new RTCPeerConnection(), the RTCSignalingState is “stable”, as the remote and local descriptions are empty.

As the caller initiates the call and calls createOffer(), it now has an offer SDP and proceeds to store the offer locally with setLocalDescription(offer); the RTCSignalingState is now “have-local-offer”. After that the caller sends the offer to the callee over the signalling channel.

Similarly, as the callee receives the offer, it starts with RTCSignalingState “stable” and then proceeds to store the remote’s offer using setRemoteDescription(offer); its state is now “have-remote-offer”.

The callee generates a provisional answer for the caller and stores it locally; its state transitions to “have-local-pranswer”. The pranswer SDP is sent to the caller over the signalling channel again.

The caller stores the callee’s pranswer SDP and its state updates to “have-remote-pranswer”.

img : https://w3c.github.io/webrtc-pc

Once there is no offer/answer exchange in progress, the state changes back to “stable”.

The state changes to “closed” if the RTCPeerConnection is closed.
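A minimal sketch for observing these transitions (myPeerConnection is assumed to be the RTCPeerConnection under negotiation):

myPeerConnection.addEventListener('signalingstatechange', () => {
  // e.g. stable -> have-local-offer -> stable on the caller,
  //      stable -> have-remote-offer -> stable on the callee
  console.log('signalingState:', myPeerConnection.signalingState);
});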

Detailed Offer / Answer SDP

Local Offer created by side initiating the session / Caller

The first offer, called the initial offer, can have dummy data for the connection (“c=”) line, such as 0.0.0.0, to prevent leaking the local IP address

c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0

“o=” line contains <username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>

o=- 4445251981417004127 2 IN IP4 127.0.0.1

This shows “-” as the username and 4445251981417004127 as the session id. The “s=” (session name) line is also set to “-”

“t=” line shows <start time> <stop time>

t=0 0

Full session Block example

type: offer, sdp: v=0
o=- 4445251981417004127 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2

Media Section : An m= section is generated for each RtpTransceiver that has been added to the PeerConnection. For the initial offer, since no ports are available yet, the dummy port 9 can be added. However, if the m= section is bundle-only, the port value is set to 0. Later the port value will be set to the port of the default ICE candidate.

The DTLS-SRTP proto field “UDP/TLS/RTP/SAVPF” is followed by the list of codecs in order of priority.

The “c=” line in each m= section must also be filled with the dummy address 0.0.0.0, as no candidates are available yet.

ICE

a=ice-options:trickle

Transport

“a=ice-ufrag” , “a=ice-pwd” , “a=fingerprint” , “a=setup” , “a=tls-id”

Media Stream Identification attribute “a=mid:”

For each media format on the m= line there is an “a=rtpmap” line; for “rtx” (retransmission) formats it carries the clock rate of the codec, and an “a=fmtp” line references the payload type of the primary codec (apt=). “a=rtcp-fb” specifies the RTCP feedback mechanisms.

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1

Audio Block example

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:JDMg
a=ice-pwd:6OARDQ8U/orhtXZbfN+ars37
a=ice-options:trickle
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A
a=setup:actpass
a=mid:0
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=sendrecv
a=msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3968544080 cname:da0nYe1oYR8AvVNp
a=ssrc:3968544080 msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=ssrc:3968544080 mslabel:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2
a=ssrc:3968544080 label:7525d75c-ffe7-4038-8b71-653d249e63bb

// video section removed for simplicity

A Data Block is created if a data channel has been created, with an m= section for data.

The “a=sctp-port” line references the SCTP port number, set to 5000 here.

“a=max-message-size” is set to 262144 here.

Data Block example

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=ice-ufrag:JDMg
a=ice-pwd:6OARDQ8U/orhtXZbfN+ars37
a=ice-options:trickle
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A
a=setup:actpass
a=mid:2
a=sctp-port:5000
a=max-message-size:262144

Subsequent Offers

When createOffer is called a second (or later) time, or is called after a local description has already been installed, the processing is different because ICE candidates have been gathered. However, the <session-version> is not changed.

Additionally, an m= section is updated if an RtpTransceiver is added or removed.

Each “m=” and “c=” line MUST be filled in with the port, relevant RTP profile, and address of the default candidate for the m= section.

If the m= section is not bundled into another m= section, update the “a=rtcp” attribute with the port and address of the RTCP candidate, and add “a=candidate” lines together with “a=end-of-candidates”.

Local Answer created by side receiving the session/ Callee

When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer. 

Each offered m= section will have an associated RtpTransceiver

The remote destination / callee can reject an m= section by setting the port in the m= line to 0. It can reject the m= section if none of the offered media formats are supported, the RtpTransceiver is stopped, etc.

For the initial answer the dummy port value of 9 is set, as no ICE candidate is available yet. Similarly, the “c=” line must contain the “dummy” value “IN IP4 0.0.0.0” too.

The <proto> field MUST be set to exactly match the <proto> field for the corresponding m= line in the offer.

type: answer, sdp: v=0
o=- 5730481682283561642 3 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT

Audio section

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:MgKS
a=ice-pwd:X3oTkKO/v7GVgd/CDC3e9B7c
a=ice-options:trickle
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43
a=setup:active
a=mid:0
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=sendrecv
a=msid:KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT e817fe0f-1cc0-4901-9fd9-e810289cc85d
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3260997313 cname:FxLUKuXrLQe0r1rn

Video section removed for simplicity

Data stream

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
b=AS:30
a=ice-ufrag:MgKS
a=ice-pwd:X3oTkKO/v7GVgd/CDC3e9B7c
a=ice-options:trickle
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43
a=setup:active
a=mid:2
a=sctp-port:5000
a=max-message-size:262144

Subsequent Answers

The port value would normally be set to the port of the default ICE candidate for this m= section. For the example above,

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

will be updated with the relevant port and address, such as

type: offer, sdp: v=0
o=- 6407282338169184323 3 IN IP4 54.190.54.190
s=-
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS bSrCUCFybGovIy0FUhPTZAr9ToRmx8I09nEj
m=audio 55375 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 54.190.54.190
a=rtcp:9 IN IP4 0.0.0.0
a=candidate:2880323124 1 udp 2122260223 54.190.54.190 55375 typ host generation 0 network-id 1 network-c

Similarly, the m= video and data lines will also get ports

m=video 53877 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4 54.190.54.190
a=rtcp:9 IN IP4 0.0.0.0
a=candidate:2880323124 1 udp 2122260223 54.190.54.190 53877 typ host generation 0 network-id 1 network-cost 10
..
m=application 57991 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 54.190.54.190
a=candidate:2880323124 1 udp 2122260223 54.190.54.190 57991 typ host generation 0 network-id 1 network-cost 10

If the answer contains any “a=ice-options” attributes where “trickle” is listed, update the PeerConnection canTrickleIceCandidates property to true.

Modifying Offer/answer SDP

SDP returned from createOffer or createAnswer MUST NOT be changed before passing it to setLocalDescription. After calling setLocalDescription with an offer or answer, the application MAY modify the SDP to reduce its capabilities before sending it to the far side.

For example, assume we have an MCU at a location and want the video stream to be relayed via a media server.

SDP Parsing

SDP is used for session description and contains a sequence of lines with key-value pairs. SDP is read line by line and converted to a data structure that contains the deserialized information.

JSEP SDP bears a lot of similarity to SIP SDP, explained here: SIP and SDP Messages Explained

Session-Level Parsing

  • Lines “v=”, “o=”, “b=” and “a=” are processed. The “i=”, “u=”, “e=”, “p=”, “t=”, “r=”, “z=”, and “k=” lines are not used by this specification; they MUST be checked for syntax but their values are not used. The “c=” line is checked for syntax and for ICE mismatch detection
  • “a=” attributes could be: “a=group”, “a=ice-lite”, “a=ice-pwd”, “a=ice-options”, “a=fingerprint”, “a=setup”, “a=tls-id”, “a=identity”, “a=extmap”

Media Section Parsing

Line “m=” gives the media type, port, proto and fmt list for RTP

Attributes “a=” can be :

  • “a=rtpmap” or “a=fmtp” : map from an RTP payload type number to a media encoding name that identifies the payload format (see the parsing sketch after this list).
a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]
m=audio 49230 RTP/AVP 96 97 98
a=rtpmap:96 L8/8000
a=rtpmap:97 L16/8000
a=rtpmap:98 L16/11025/2
  • Packetization parameters such as “a=ptime” and “a=maxptime”, which define the length of each RTP packet.
  • Direction as “a=sendrecv”, “a=recvonly”, “a=sendonly”, “a=inactive”
  • Muxing as “a=rtcp-mux” , “a=rtcp-mux-only”
  • RTCP attributes “a=rtcp” , “a=rtcp-rsize”
  • Line “c=” is checked.
  • Line “b=” for bandwidth, bwtype
  • Transport attributes for “a=” could be “a=ice-ufrag”, “a=ice-pwd”, “a=ice-options”, “a=candidate”, “a=remote-candidate”, “a=end-of-candidates” and “a=fingerprint”
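A minimal illustrative sketch (not a complete SDP parser) of extracting the “a=rtpmap” attributes described above from an SDP string:

function parseRtpMap(sdp) {
  const map = {};
  for (const line of sdp.split(/\r?\n/)) {
    // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]
    const m = line.match(/^a=rtpmap:(\d+) ([^/]+)\/(\d+)(?:\/(\d+))?/);
    if (m) {
      map[m[1]] = { codec: m[2], clockRate: Number(m[3]), channels: m[4] ? Number(m[4]) : 1 };
    }
  }
  return map;
}
// e.g. parseRtpMap(offer.sdp)['111'] -> { codec: 'opus', clockRate: 48000, channels: 2 }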

Interactive Connectivity Establishment (ICE) for NAT traversal

Protocols using offer/answer are difficult to operate through Network Address Translators (NATs), since the flow of media packets requires the IP addresses and ports of media sources and sinks within their messages. Real-time media also emphasises reduced latency and decreased packet loss.

ICE is an extension to the offer/answer model and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks.
Checks are done using STUN and TURN, which also allow address selection for multi-homed and dual-stack hosts.

ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.

ICE Gathering

The caller and callee perform checks to finalize the protocol and routing needed to establish a peer connection. A number of candidates are proposed until they mutually agree upon one. The PeerConnection then uses that candidate's details to initiate the connection.

While applying a local description at the media engine level, if an m= section is new, the WebRTC media stack begins gathering candidates for it.

RTCPeerConnection exposes canTrickleIceCandidates. ICE trickling is the process of continuing to send candidates after the initial offer or answer has already been sent to the other peer.
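A minimal trickle ICE sketch: candidates are forwarded to the remote peer as they are gathered, over an assumed signallingChannel, and added on the far side with addIceCandidate():

pc.onicecandidate = (event) => {
  if (event.candidate) {
    signallingChannel.send(JSON.stringify({ type: 'candidate', candidate: event.candidate }));
  } else {
    console.log('ICE gathering complete');   // a null candidate signals end of gathering
  }
};
// on the remote side:
// pc.addIceCandidate(message.candidate);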

The ICE transport role is responsible for choosing a candidate pair.

The ICE layer sets one peer as the controlling agent and the other as the controlled agent. The controlling agent makes the final decision as to which candidate pair to choose.

Final selected candidate in SDP

a=group:BUNDLE 0 1 2
a=msid-semantic: WMS 9Cv3eIelHVuhxrGfxSvUsfokNu4eb4R9PYw2

m=audio 59937 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 x.x.x.x
a=rtcp:9 IN IP4 0.0.0.0
a=candidate:2880323124 1 udp 2122260223 x.x.x.x 59937 typ host generation 0 network-id 1 network-cost 10
a=candidate:3844981444 1 tcp 1518280447 x.x.x.x 9 typ host tcptype active generation 0 network-id 1 network-cost 10

An agent identifies all CANDIDATES, each of which is a transport address. Types:

  • HOST CANDIDATE – obtained directly from a local interface, which could be Wifi, a Virtual Private Network (VPN) or Mobile IP (MIP).
    If an agent is multihomed (private and public networks), it obtains a candidate from each IP address and includes all candidates in its offer.
  • STUN or TURN to obtain additional candidates. Types
    • translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
    • addresses on TURN servers (RELAYED CANDIDATES)

Mapping Server Reflexive address

Steps for mapping a Server Reflexive Address

  1. Agent sends the TURN Allocate request from IP address and port X:x,
  2. NAT will create a binding X1′:x1′, mapping this server reflexive candidate to the host candidate X:x ( BASE).
  3. Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate.
  4. Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent.

Allocate request and response from TURN – informing the agent of this relayed candidate

Only STUN based Binding

The agent sends a STUN Binding request to its STUN server, which determines the server reflexive candidate and sends it back in the Binding response.

STUN Binding request for connectivity checks on CANDIDATE PAIRS

The candidates are carried in attributes in the SDP offer. The remote peer also follows this process and gathers and sends its own sorted list of candidates. Hence CANDIDATE PAIRS are formed from both sides.

PEER REFLEXIVE CANDIDATES – connectivity checks can produce additional candidates, especially around symmetric NATs

Since the same address is used for STUN and media (RTP/RTCP), demultiplexing based on packet contents helps to identify which is which.

Checks : ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first.

  • TRIGGERED CHECKS – accelerates the process of finding a valid candidate
  • ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.

Checks maintain frozen candidate pairs, grouped by foundation, per media stream. Each candidate pair in the check list has a foundation and a state. The states for candidate pairs are:

1. Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair on the check list.

2. In-Progress: A check has been sent for this pair, but the transaction is in progress.

3. Succeeded: A check for this pair was already done and produced a successful result.

4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.

5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.

ICE gather state

icegatheringstatechange – gathering

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 192.168.0.2 58122 typ host generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (srflx)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:4081163164 1 udp 1686052607 106.51.26.168 37542 typ srflx raddr 192.168.0.2 rport 58122 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:345893049 1 tcp 1518280447 192.168.0.2 9 typ host tcptype active generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2130406062 1 udp 41886207 74.125.39.44 27190 typ relay raddr 106.51.26.168 rport 37542 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3052096874 1 udp 25108479 172.217.163.158 28049 typ relay raddr 106.51.26.168 rport 37543 generation 0 ufrag vzpn network-id 1 network-cost 10

icegatheringstatechange – complete

Candidate Checking

iceconnectionstatechange : checking

setRemoteDescription : type: answer, sdp: v=0

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:ydvf
a=ice-pwd:mb4ousBoT6B0l//ljjD/9Z/M
a=ice-options:trickle


m=video 9 UDP/TLS/RTP/SAVPF 98 100 96 97 99 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:ydvf
a=ice-pwd:mb4ousBoT6B0l//ljjD/9Z/M
a=ice-options:trickle

addIceCandidate (host)
sdpMid: , sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 192.168.0.2 56060 typ host generation 0 ufrag ydvf network-id 1 network-cost 10

iceconnectionstatechange : connected

Candidate Nomination for Media Path

Selecting low-latency media paths can use various techniques such as actual round-trip time (RTT) measurement. The controlling agent gets to nominate which candidate pairs will be used for media amongst the ones that are valid. There are two ways: regular nomination and aggressive nomination.

ReadMore :

WebRTC Media Stack

WebRTC services

References :

  • [1] WebRTC 1.0: Real-time Communication Between Browsers – W3C Editor’s Draft 31 August 2019 http://w3c.github.io/webrtc-pc/
  • [2] RFC 5245 Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols

WebRTC compatible android client

This post describes the requirements for creating a SIP phone application on Android over the same codecs as WebRTC (PCMA, PCMU, VP8). In my project concerning the demonstration of WebRTC interoperability (presence, audio/video call, message) with a native Android client, I had to develop a lightweight Android SIP application, customized for the look and feel of the WebRTC web application. This also enables the added services of the WebRTC client, such as geolocation, visual voicemail, phonebook and call control options, to be set from the Android application as well.

Aim :

Android WebRTC-SIP client development, using the sipml5 stack implemented through web services and native Android programming.

Software Used:

⦁ Eclipse IDE
⦁ Java SE Development Kit 7.0
⦁ Android SDK

Tasks :

⦁ Authorization of a user, based on his/her credentials (Database local to the application).

⦁ Navigation Drawer on the home page which shows a menu giving the user various options like:
⦁ View Home Page
⦁ View Contact List
⦁ View/Edit My Profile
⦁ View My Location
⦁ Sign Out

⦁ Phonebook sync : Importing contact list of the Android Phone into the application. Editing user profile with values like  User Name ,  Password ,  Domain. 

⦁ Inclusion of a Web View in the application which currently opens the desired webpage(http://sipml5.org/call.htm).

⦁ Geolocation: Showing marker for the current location of user in Google Maps.Displaying the address of the user in a Toast Message.


⦁ Audio / Video call capability 


figure 1 : Login page , figure 2 : Call page , Figure 3 : Menu bar 

Future Roadmap:

⦁ Connecting the application to a database which sits on the cloud.
⦁ Based on the entries in the database the user will be able to:
⦁ Login to the application.
⦁ View or edit his/her details in the My Profile Section.
⦁ Understanding codes of sample applications for making SIP calls from Android OS like:
⦁ SipDroid
⦁ SipDemo
⦁ IMSDroid
⦁ Modifying the existing application to be able to make SIP calls like one of the apps listed above.

Modules :

Development Done:
  1. Development of an authorization page connecting the application to a local database from where values are inserted and retrieved.
  2. Development of navigation drawer where additional options for the application will be displayed making it a user friendly application.
Development Planned:

1. Connectivity to a cloud database.

2. App engine on cloud.

3. Importing contacts from the phone address book.

4. Offline storage of profile details and a few call logs.

Architecture:

WebRTC Android environment (architecture diagram)

……………………………………………………………………………………..

Difference between WebRTC and plugin based communication

A lot of service providers, i.e. telecom operators, had devised their own ways to provide Web-based communication even before WebRTC was born. With time, as WebRTC has become stronger, more secure and more resilient to failure, they have come around to migrating their existing systems from the previous closed-box native APIs to open-source WebRTC APIs.

The first figure (given below) depicts a communication platform built over plugins and proprietary APIs using HTTP REST based signaling.

Web Communication Service Architecture over HTTP/ REST API

As the migration took place, the proprietary API components were replaced by open-standard-based entities: plugins were replaced by WebRTC APIs, and HTTP REST based signalling was replaced by SIP (Session Initiation Protocol).

Web Communication Service Architecture over WebRTC SIP

Note that the telecom operator network did not have to undergo a transformation due to the integration of WebRTC elements.

WebRTC



WebRTC 1.0: Real-time Communication Between Browsers – W3C Candidate Recommendation 13 December 2019 https://www.w3.org/TR/webrtc/

webrtcdevelopment – Open Source WebRTC SDK and its implementation steps: https://github.com/altanai/webrtc

Read more in the layers of webrtc  and their functionalities here :  WebRTC layers

What is WebRTC ?

WebRTC (Web Real-Time Communication) is an API definition drafted by the World Wide Web Consortium (W3C) that supports browser-to-browser applications for voice calling, video chat, and P2P file sharing without the need for either internal or external plugins.

  • Enables browser to browser media streaming over secure RTP profile
  • Standardization, on an API level at the W3C and at the protocol level at the IETF.
  • Enables web browsers with Real-Time Communications (RTC) capabilities
  • written in c++ and javascript
  • BSD style license
  • free, open project available in all major browsers 

Media Stack in Browser

The following is the browser side stack for webrtc media .  

WebRTC media stack Solution Architecture
WebRTC Media Stack

Voice Engine

  • iSAC: wideband and super wideband audio codec for streaming audio
  • iLBC: narrowband speech codec for streaming audio
  • Opus: constant and variable bitrate encoding 
  • NetEQ: Net Equalizer
  • Dynamic jitter buffer + error concealment algorithm
  • Acoustic Echo Canceler (AEC) : remove acoustic echo
  • Noise Reduction (NR) : remove background noise

Video engine

  • VideoEngine is a framework video media chain for video, from camera to the network, and from network to the screen.
  • VP8 : Video codec from the WebM Project. Designed for low latency Real time Comm. 
  • Video Jitter Buffer: conceal the effects of jitter and packet loss on overall video quality.
  • Image enhancements : removes video noise 

Transport

  • Transport / Session Layer of WebRTC stack provide Session Management for WebRTC media streams .
  • It consists of network stack for Secure RTP, the Real Time Protocol.
  • STUN/ICE for NAT , Network Address Traversal across various types of networks.
  • Session Management which is an abstracted session layer for call setup.

Standardization by IETF and W3C

As of the 2019 update the W3C defines it as

a set of ECMAScript APIs in WebIDL to allow media to be sent to and received from another browser or device implementing the appropriate set of real-time protocols. The specification being developed in conjunction with a protocol specification developed by the IETF RTCWEB group and an API specification to get access to local media devices.

W3C contribution to WebRTC standardization

w3c

  • Media Stream Functions : API for connecting processing functions to media devices and network connections, including media manipulation functions.
  • Audio Stream Functions : An extension of the Media Stream Functions to process audio streams (e.g. automatic gain control, mute functions and echo cancellation).
  • Video Stream Functions : An extension of the Media Stream Functions to process video streams (e.g. bandwidth limiting, image manipulation or “video mute“).
  • Functional Component : API to query presence of WebRTC components in an implementation, instantiate them and connect them to media streams.
  • P2P Connection Functions : API functions to support establishing signalling protocol-agnostic peer-to-peer connections between Web browsers
  • API specification Availability

WebRTC 1.0: Real-time Communication Between Browsers –  Draft 3 June 2013 available

  • Implementation Library: WebRTC Native APIs

Media Capture and Streams – Draft 16 May 2013

  • Supported by Chrome , Firefox, Opera in desktop of all OS ( Linux, Windows , Mac )
  • Supported by Chrome , Firefox  in Mobile browsers ( android )

IETF contribution to to WebRTC standardization

ietf
  • Communication model
  • Security model
  • Firewall and NAT traversal
  • Media functions
  • Functionality such as media codecs, security algorithms, etc.,
  • Media formats
  • Transport of non media data between clients
  • Input to W3C for APIs development
  • Interworking with legacy VoIP equipment

Open and Free Codecs

Codecs signify the media stream's compression and decompression. For peers to have a successful exchange of media, they need a common set of codecs to agree upon for the session. The list of codecs is exchanged as part of the offer and answer, i.e. the SDP, as in SIP.

WebRTC uses bare MediaStreamTrack objects for each track being shared from one peer to another. The codecs associated with those tracks are not mandated by the WebRTC specification.

For video, as per RFC 7742 WebRTC Video Processing and Codec Requirements, the mandatory codecs to be supported by WebRTC clients are VP8 and H.264's Constrained Baseline profile.

For audio, as per RFC 7874 WebRTC Audio Codec and Processing Requirements, browsers must support the Opus codec as well as G.711's PCMA and PCMU formats.

Video Resolution handling

Unless the SDP specifically signals otherwise, the web browser receiving a WebRTC video stream must be able to handle video at at least 20 FPS at a minimum resolution of 320 pixels wide by 240 pixels tall.

In the best scenarios (available bandwidth and media devices), VP8 has no upper limit set on the resolution of the video stream, so the stream can go as far as a maximum resolution of 16384x16384 pixels.

Independent of Signalling

WebRTC does not specify any signalling / telecommunication protocol, and it is up to the adopter to perform the offer/answer exchange in any way deemed fit for the use case. For example, for a web-only application one may use plain WebSockets, whereas for an app compatible with telecom endpoints one should use SIP as the protocol.

NAT-traversal ( ICE, STUN, and TURN)

This post describes the ICE (Interactive Connectivity Establishment) framework, which is mandated by WebRTC standards. It is used to find network interfaces and ports in the Offer / Answer model to exchange network information with the participating communication clients. ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relay NAT (TURN). I have written in detail about TURN based WebRTC flow diagrams in the post below.

NAT and TURN Relay

Learn about hosting / integrating different TURN servers for WebRTC in the article on “TURN server for WebRTC – RFC5766-TURN-Server , Coturn , Xirsys “.

Why is WebRTC so important ?

(+) Significantly better video quality: WebRTC video quality is noticeably better than Flash.
(+) Up to 6x faster connection times: using JavaScript WebSockets, also an HTML5 standard, improves session connection times and accelerates delivery of other OpenTok events.
(+) Reduced audio/video latency: WebRTC offers significant improvements in latency, enabling more natural and effortless conversations.
(+) Freedom from plugins like Flash: with WebRTC and JavaScript WebSockets, you no longer need to rely on Flash for browser-based RTC.
(+) Native HTML5 elements: customize the look and feel and work with video like you would any other element on a web page with the new video tag in HTML5.

The major players behind the conception and advancement of WebRTC standards and libraries are IETF, W3C, Java community, GSMA. The idea is to develop a Lightweight browser-based call console, to make SIP calls from a Web page. This was successfully achieved using fundamental technologies – Javascript, html5, web-sockets and TCP /UDP, open-source sip server. It is good to note that there is no extra extension, plugin or gateway required, such as flash support. Also, it bears cross-platform support, including Mozilla, chrome so on.

Bottlenecks

Although WebRTC is a great technology and holds very good potential, it is not devoid of problems:

(-) Secure networks and Firewalls block RTP
(-) Security in VPN and topology hiding
(-) Cross-platform concerns and codec incompatibilities
(-) Late adopters like Microsoft and Apple

 Peer to peer Communication

WebRTC forms a p2p communication channel between all the peers. That means as the participant count grows, it converts to a mesh networking topology with an incoming and an outgoing stream towards each of its peers.

Two party call p2p

Peer to peer calling


Multiparty Call and mesh network

Mesh based arrangement .

Multiparty call
Mesh based WebRTC video conferencing

In the special case of broadcasting to a large number of viewers (without an outgoing media stream), it is recommended to set up a Multipoint Control Unit (MCU) which will relay the incoming stream to the large number of viewers without putting traffic load on the clients from where the stream actually originates. Important note:

  1. It should be noted that these diagrams do not depict the ICE and NAT traversal and have been simplified for better understanding. In real-world scenarios, almost all the time STUN and TURN servers are involved. 
  2. Also, WebRTC mandates the use of a secure origin (HTTPS) on the webpage which invokes getUserMedia to capture user media devices like audio, video and location.

Browser Adoption

As of March 2020, WebRTC is supported on the following clients' browsers

  • Desktop PC
    Microsoft Edge 12+[25]
    Google Chrome 28+
    Mozilla Firefox 22+[26]
    Safari 11+[27]
    Opera 18+[28]
    Vivaldi 1.9+
  • Android
    Google Chrome 28+ (enabled by default since 29)
    Mozilla Firefox 24+[29]
    Opera Mobile 12+
  • Chrome OS
  • Firefox OS
  • BlackBerry 10
  • iOS
    MobileSafari/WebKit (iOS 11+)
  • Tizen 3.0

Furthermore, read about the Steps for building and deploying WebRTC solution.

TURN based media Relay

WebRTC APIs are the Javascript functions to access and process the browser media stack.

getUserMedia

acquires the audio and video media (e.g., by accessing a device’s camera and microphone)

Properties

ondevicechange

Methods

enumerateDevices()
getDisplayMedia()
getSupportedConstraints()
getUserMedia()

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
.then(function(stream) {
  var video = document.querySelector('video');
  // Older browsers may not have srcObject
  if ("srcObject" in video) {
    video.srcObject = stream;
  } else {
    // Avoid using this in new browsers, as it is going away.
    video.src = window.URL.createObjectURL(stream);
  }
  video.onloadedmetadata = function(e) {
    video.play();
  };
})
.catch(function(err) {
  console.log(err.name + ": " + err.message);
});

DOMException Error on getusermedia

Rejections of the returned promise are made by passing a DOMException error object to the promise's failure handler. Possible errors are listed below, with a minimal handling sketch after the list:

AbortError : Although the user and operating system both granted access to the hardware device, a problem occurred which prevented the device from being used.

NotAllowedError : One or more of the requested source devices cannot be used at this time. This will happen if the browsing context is insecure (http instead of https), if the user has specified that the current browsing instance/session is not permitted access to the device, or if the user has denied all access to user media devices globally.

NotFoundError : No media tracks of the type specified were found that satisfy the given constraints.

NotReadableError : Although the user granted permission to use the matching devices, a hardware error occurred at the operating system, browser, or Web page level which prevented access to the device.

OverconstrainedError : No candidate devices met the requested criteria. The error's constraint property holds the name of a constraint which could not be met, and the message property contains a human-readable string explaining the problem. Example constraints:

var constraints = { video: { facingMode: (front? "user" : "environment") } };

SecurityError : User media support is disabled on the Document on which getUserMedia() was called.

TypeError : The list of constraints specified is empty, or has all constraints set to false.
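A minimal handling sketch that branches on the error names listed above (the constraints object is illustrative):

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then(function(stream) {
    // attach the stream to a media element here
  })
  .catch(function(err) {
    switch (err.name) {
      case 'NotAllowedError':
        console.warn('Permission denied or insecure browsing context');
        break;
      case 'NotFoundError':
        console.warn('No matching capture device found');
        break;
      case 'OverconstrainedError':
        console.warn('Constraint could not be met:', err.constraint);
        break;
      default:
        console.error(err.name + ': ' + err.message);
    }
  });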

Pan/Tilt/Zoom camera controls

RTCPeerConnection

enables audio and video communication between peers. It performs signal processing, codec handling, peer-to-peer communication, security, and bandwidth management.

Properties

canTrickleIceCandidates
connectionState
getDefaultIceServers()
iceConnectionState
iceGatheringState
onsignalingstatechange
onconnectionstatechange
ondatachannel

onicecandidate
oniceconnectionstatechange
onicegatheringstatechange
onidentityresult
onnegotiationneeded
onremovestream onaddstream
ontrack

peerIdentity currentLocalDescription
currentRemoteDescription
pendingLocalDescription
pendingRemoteDescription
localDescription remoteDescription
sctp
signalingState

Methods

addIceCandidate()
addStream()
addTrack()
close()
createAnswer()
createDataChannel()
createOffer()

getIdentityAssertion()
getReceivers()
getSenders()
getStats()
getStreamById()
getTransceivers()
removeStream() removeTrack()

restartIce()
setConfiguration()
setIdentityProvider()
setLocalDescription()
setRemoteDescription() generateCertificate()
getConfiguration()

 signalling state transitions diagram , source W3C

RTC Signalling states

  • stable : There is no offer/answer exchange in progress. This is also the initial state, in which case the local and remote descriptions are empty.
  • have-local-offer : Local description, of type “offer”, has been successfully applied.
  • have-remote-offer : Remote description, of type “offer”, has been successfully applied.
  • have-local-pranswer : Remote description of type “offer” has been successfully applied and a local description of type “pranswer” has been successfully applied.
  • have-remote-pranswer : Local description of type “offer” has been successfully applied and a remote description of type “pranswer” has been successfully applied.
  • closed : The RTCPeerConnection has been closed; its [[IsClosed]] slot is true.

RTCSDPType

  • offer : SDP offer.
  • pranswer : RTCSdpType of pranswer indicates that a description MUST be treated as an [SDP] answer, but not a final answer.
  • answer : treated as an [SDP] final answer, and the offer-answer exchange MUST be considered complete. A description used as an SDP answer may be applied as a response to an SDP offer or as an update to a previously sent SDP pranswer.
  • rollback : canceling the current SDP negotiation and moving the SDP [SDP] offer back to what it was in the previous stable state.

RTCPeerConfiguration

Defines a set of parameters to configure how the peer-to-peer communication is established via RTCPeerConnection; a configuration sketch follows below.

iceServers of type sequence : array of objects describing servers available to be used by ICE, such as STUN and TURN servers.

iceTransportPolicy of type RTCIceTransportPolicy : indicates which candidates the ICE Agent is allowed to use.

  • relay : ICE Agent uses only media relay candidates such as candidates passing through a TURN server.
  • all : The ICE Agent can use any type of candidate when this value is specified.

bundlePolicy of type RTCBundlePolicy.
media-bundling policy to use when gathering ICE candidates. Types :

  • balanced : Gather ICE candidates for each media type in use (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
  • max-compat : Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
  • max-bundle : Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track.

rtcpMuxPolicy of type RTCRtcpMuxPolicy.
rtcp-mux policy to use when gathering ICE candidates.

certificates of type sequence
A set of certificates that the RTCPeerConnection uses to authenticate.

iceCandidatePoolSize of type octet, defaulting to 0
Size of the prefetched ICE pool as defined in [JSEP]
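A minimal configuration sketch using the parameters above; the STUN/TURN URLs and credentials are placeholders, not values from this article:

const configuration = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },
    { urls: 'turn:turn.example.org:3478', username: 'user', credential: 'pass' }
  ],
  iceTransportPolicy: 'all',      // 'relay' would force TURN-only candidates
  bundlePolicy: 'max-bundle',
  rtcpMuxPolicy: 'require',
  iceCandidatePoolSize: 0
};
const pc = new RTCPeerConnection(configuration);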

RTCDataChannel

Allows bidirectional communication of arbitrary data between peers. It uses the same API as WebSockets and has very low latency. A minimal usage sketch follows the list below.

  • (+) DataChannel is p2p and is also end-to-end encrypted, leading to higher privacy
  • (+) built-in security due to p2p transfer
  • (+) higher throughput than text transfer via a messaging server
  • (+) lower latency as p2p transfer takes the shortest route
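A minimal usage sketch; pc and remotePc stand in for the two peers' RTCPeerConnection objects, and the label and payload are illustrative:

const channel = pc.createDataChannel('chat', { ordered: true });
channel.onopen = () => channel.send('hello over SCTP');
channel.onmessage = (event) => console.log('received:', event.data);

// the answering peer receives the channel via the datachannel event
remotePc.ondatachannel = (event) => {
  event.channel.onmessage = (e) => console.log('remote received:', e.data);
};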

getStats

allows the web application to retrieve a set of statistics about WebRTC sessions. These statistics data are being described in a separate W3C document.

Call Setup between WebRTC Endpoints

WebRTC CPaaS Solutions

Basics for building a WebRTC based communication solution :-

  • Websockets for signalling / Offer Answer
  • TURN server like xirsys(paid), CoTURN(opensource , self hosted)
  • Js library for WebRTC wrappers
  • Https served webpage
  • WebRTC enabled Browser

Approaches to develop webrtc unified communication system

1. Pluggable module or npm

Source code for the WebRTC project is shipped as a pluggable library or npm module.

2. collaboration as a Service ie CaaS

Clients redirect users to our WebRTC platform for communication.

3. Communication Platform

We provide all communication and related services as a standalone platform

Updates in W3C 13 Dec , 2019

Over the years since its adoption, many of the associated features were deprecated from WebRTC based platforms and environments, some of which are: OAuth as a credential method for ICE servers
Negotiated RTCRtcpMuxPolicy (previously marked at risk)
voiceActivityDetection
RTCCertificate.getSupportedAlgorithms()
RTCRtpEncodingParameters: ptime, maxFrameRate, codecPayloadType, dtx, degradationPreference
RTCRtpDecodingParameters: encodings
RTCDatachannel.priority

Some of the newly added features include:

restartIce() method added to RTCPeerConnection
Introduced the concept of “perfect negotiation”, with an example to solve signalling races.
Implicit rollback in setRemoteDescription to solve races.
Implicit offer/answer creation in setLocalDescription to solve races.


WebRTC Stack Architecture and Layers


WebRTC stands for Web Real-Time Communications and introduces a real-time media framework in the browser core alongside associated JavaScript APIs for controlling the media frame and HTML5 tags for displaying. If you are new to WebRTC, read “What is WebRTC?” From a technical point of view, WebRTC will hide all the complexity of real-time media behind a very simple JavaScript API. 

WebRTC Layers

Missing Signalling

WebRTC is a media framework which is independent of the signalling protocol, which means that we can plug in any form of signalling to support session establishment using the offer-answer handshake and SDP. Some of the popular options:

  • Polling
  • XHR ( XML over HTTP Request)
  • Websocket ( HTTP upgraded )
  • SSE ( Server Sent Events )
  • socket.io ( use set of protocols for best compatibility and fallback)
  • HTTP/2

Other form of less used signalling options

  • FTP
  • HTTP
  • long poll
  • XMPP
  • MQTT

One may also send the SDP for local and remote over any other means of communication, such as email, a REST API or any custom proprietary protocol.
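A minimal sketch of plugging in a plain WebSocket as the signalling transport; the URL and the JSON message format are assumptions, not part of any standard:

const ws = new WebSocket('wss://signaling.example.org');
const pc = new RTCPeerConnection();

pc.onnegotiationneeded = async () => {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  ws.send(JSON.stringify({ type: 'offer', sdp: pc.localDescription.sdp }));
};

ws.onmessage = async (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'answer') {
    await pc.setRemoteDescription({ type: 'answer', sdp: msg.sdp });
  }
};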

Security in WebRTC

SSL is the secure session layer which adds encryption capability to an otherwise readable packet.

  • DTLS (Datagram TLS) adds Security on UDP packets which is used by Media stream and Data Channel messages.
  • TLS (Transport Layer Security) adds security to TCP messages used in signalling, such as the SDP based offer answer handshake, which enables setup, modification or teardown of the session.

WebRTC Stack

WebRTC offers web application developers the ability to write rich, realtime multimedia applications (think video chat) on the web, without requiring plugins, downloads or installs. Its purpose is to help build a strong RTC platform that works across multiple web browsers, across multiple platforms.

WebRTCpublicdiagramforwebsite

Web API – An API to be used by third-party developers for developing web-based video chat-like applications.

WebRTC Native C++ API – An API layer that enables browser makers to easily implement the Web API proposal

Transport / Session – The session components are built by re-using components from libjingle, without using or requiring the XMPP/jingle protocol.

RTP Stack – A network stack for RTP, the Real-Time Protocol.

Session Management – An abstracted session layer, allowing for call setup and management layer. This leaves the protocol implementation decision to the application developer.

Voice Engine

VoiceEngine is a framework for the audio media chain, from sound card to the network.

NetEQ for Voice– A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

Acoustic Echo Canceler (AEC) – The Acoustic Echo Canceler is a software-based signal processing component that removes, in real-time, the acoustic echo resulting from the voice being played out coming into the active microphone.

Noise Reduction (NR) -The Noise Reduction component is a software-based signal processing component that removes certain types of background noise usually associated with VoIP. (Hiss, fan noise, etc…)

Video Engine

VideoEngine is a framework video media chain for video, from the camera to the network, and from network to the screen.

Video Jitter Buffer – Dynamic Jitter Buffer for video. Helps conceal the effects of jitter and packet loss on overall video quality.

Image enhancements -For example, removes video noise from the image capture by the webcam.

Transport

STUN/ICE – A component allowing calls to use the STUN and ICE mechanisms to establish connections across various types of networks.

Error resiliency and fault tolerance

  • REMB (receiver-side bandwidth estimation) is more common and transport-wide-cc (sender-side bandwidth estimation) is the more modern and future-looking approach
  • BWE (Bandwidth Estimation)
  • FEC (Forward Error Correction) and ULPFEC (Uneven Level Protection Forward Error Correction)
  • RED (Redundant coding)
  • FIR (Full Intra Request)
  • PLI (Picture Loss Indication) for video
  • PLC (Packet Loss Concealment) mostly for audio
  • NACK (Negative Acknowledgement)

API support from browser around WebRTC

  • PeerConnection
  • getUserMedia and getDisplayMedia
  • dataChannels
  • getStats
  • MediaRecorder
  • MediaStream / Media Tracks
  • MediaConstraints
  • WebAudio Integration
  • TURN support
  • Echo cancellation
  • srcObject in media element
  • Promise based getUserMedia and PeerConnection

JavaScript Session Establishment Protocol (JSEP)

Outgoing Call : Send Offer to remote peer

JSEP flow for Outgoing Call : Send Offer to remote peer and process incoming answer

Incoming Call : process received offer from remote peer

JSEP flow for Incoming Call : process received offer from remote peer

WebRTC supported Codecs

RTCRtpEncodingParameters

The RTCRtpEncodingParameters dictionary describes a single configuration of a codec for an RTCRtpSender; a usage sketch follows the field list below.

  • active : flag to set if the encoding is currently actively being used.
  • codecPayloadType : a single 8-bit byte (or octet) specifying the codec to use for sending the stream.
  • dtx : used for audio to indicate whether discontinuous transmission (a feature by which a phone is turned off or the microphone muted automatically in the absence of voice activity) is used.
  • maxBitrate : (unsigned long integer) maximum number of bits per second to allow for this encoding.
  • maxFramerate : (double-precision floating-point) maximum number of frames per second to allow for this encoding.
  • ptime: (unsigned long integer) preferred duration of a media packet in milliseconds used in audio encodings.
  • rid : (DOMString) if set, specifies an RTP stream ID (RID) to be sent using the RID header extension.
  • scaleResolutionDownBy :(double-precision floating-point) specifying a factor by which to scale down the video during encoding.
    • default value, 1.0 if video’s size will be the same as the original.
    • 2.0 scales the video frames down by a factor of 2 in each dimension, resulting in a video 1/4 the size of the original.
    • can’t use this to scale the video up
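A minimal sketch of applying these encoding parameters on an existing video sender; pc is an assumed RTCPeerConnection with a video track already added:

const sender = pc.getSenders().find(s => s.track && s.track.kind === 'video');
const params = sender.getParameters();
if (!params.encodings || params.encodings.length === 0) {
  params.encodings = [{}];          // some browsers return an empty encodings list
}
params.encodings[0].maxBitrate = 500 * 1000;        // cap at roughly 500 kbps
params.encodings[0].scaleResolutionDownBy = 2.0;    // send at 1/4 of the original pixel count
sender.setParameters(params).catch(console.error);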

Video Codecs

  • VP8  Video codec from the WebM Project. Well suited for RTC as it is designed for low latency. It was the codec of choice being royalty-free.
  • VP9
  • H264 : not royalty-free. But it is native in most mobile handsets due to its high adoption.
  • AV1

Audio Codecs

  • iSAC: A wideband and super wideband audio codec for VoIP and streaming audio.
  • iLBC: A narrowband speech codec for VoIP and streaming audio.
  • Opus : lossy audio codec for broad range of interactive real-time applications licensed under royalty-free BSD terms.
  • G.711
  • Speex
  • G.722
  • AMR-WB

Update 2020 – This article was written very early in 2013, while WebRTC was still being standardised and not yet widely adopted; WebRTC's inception was in 2012.

There are many more articles written after that to explain and emphasize the details and applications of WebRTC. The list is below:

For SIP IMS and WebRTC

STUN and TURN, which form a critical part of any WebRTC based communication system

Security of WebRTC based CaaS and CPaaS

WebRTC SDK

Developing WebRTC / CPaaS Solution

References

  • [1] Working Group RFC
    • draft-ietf-rtcweb-audio-02      2013-08-02
    • draft-ietf-rtcweb-data-channel-05      2013-07-15
    • draft-ietf-rtcweb-data-protocol-00     2013-07-15
    • draft-ietf-rtcweb-jsep-03      2013-02-27
    • draft-ietf-rtcweb-overview-07      2013-08-14
    • draft-ietf-rtcweb-rtp-usage-07     2013-07-15
    • draft-ietf-rtcweb-security-05      2013-07-15
    • draft-ietf-rtcweb-security-arch-07      2013-07-15
    • draft-ietf-rtcweb-transports-00      2013-08-19
    • draft-ietf-rtcweb-use-cases-and-reqs-11 2013-06-27
    • Plus over 20 discussion RFC drafts
  • TLS
  • HTTP/2 offer answer

 WEBRTC CALL BETWEEN BROWSER AND SIP PHONE

Call Between Web client and SIP client

  1. HTML5 and WebRTC enabled Web Client :

We are using an open source HTML5 SIP client entirely written in JavaScript to make it lightweight and to have easy integration with the SIP server. No extension, plugin or gateway is needed to initiate the call from the web client. The media stack relies on WebRTC. The client can be used to connect to any SIP or IMS network from an HTML5 and WebRTC enabled browser to make and receive audio/video calls and instant messages.

  2. Proxy Server / WS to UDP Translator :

For the proposed solution we are using a freeware lightweight SIP server which, besides acting as a normal SIP server and registrar, can also act as the translator engine to convert SIP over WS messages to SIP over UDP. Since one of the requirements is to terminate the call on a hard-phone like a turret, which supports only SIP over UDP, we need the translator in the overall picture to convert the SIP over WS request to SIP over UDP. Through this component, the use case of initiating the call from the web browser and terminating the call at the hard-phone is possible.

  3. Soft Phone / SIP Client :

We are using the Boghe IMS client to act as the softphone, which supports the audio codecs required to talk with the web client, namely the PCMU and PCMA audio codecs.

Working with the discussed components, we have successfully established the following use-case scenarios.

  1. Call Initiated from the Browser and Terminated on Browser :

(a)   Signalling Part – Initial Handshake is done and Call is established. (Captured from Wire-Shark)

(b)   Media Part – SDP is exchanged as captured by Wireshark and both clients can exchange voice.

  2. Call initiated from the Browser and Terminated on the Softphone and Vice-Versa :

(a)    Signalling Part – Initial Handshake is done and Call is established. (Captured from Wire-Shark)

(b)   Media Part – SDP is exchanged as captured by Wireshark and both clients can exchange voice, but with some dependency on the machine being used.

  3. Call initiated from the Softphone and Terminating on the SoftPhone.

(a)   Signalling Part : Initial Handshake is done and Call is established. (Captured from Wire-Shark)

(b)   Media Part : No hiccups; it works fine.


The structure for multi network traversal using ICE – STUN and TURN is described in the following diagram .

Call between Web client and SIP client

You can read more about NAT traversal using STUN and TURN here .

Detailed TURN server for WebRTC – RFC5766-TURN-Server , Coturn , Xirsys is here .