WebRTC SIP / IMS solution

We started working with WebRTC in the winter of 2012. At the time it looked like just another piece of tech jargon that might fade away when newer ones arrived. In many ways WebRTC’s buzz has died down since its massive adoption, but I nevertheless still see a lot of potential and development around it.

What really is WebRTC? I made an entry on it here.

Around Nov – Dec 2012, my team and I spent the time learning the nitty-gritty of HTML5 based media operations and the JavaScript SIP stack of SIPML. I remember that towards the end of the year, i.e. before Christmas, we were done with the explanation and education aspects of WebRTC, a technology that will revolutionise communication for ages to come, or at least so say the numerous blogs and documents I have read so far.


Use cases for WebRTC range across a wide variety; of them, the most revenue-generating ones are around video conferencing with realtime HD audio, video and data streams.

To bridge the flow between a WebRTC client and a PSTN endpoint via IMS, interworking between WebRTC media standards and codecs and those of the gateways in IMS is critical. For instance, WebRTC mandates secure RTP (SRTP), so the media engine / gateway should be able to support it and connect it with plain RTP from PSTN endpoints.

The flow

client BOB -> webrtc2sip Gateway -> SIP server -> client Alice

can be understood through the call flow of a simple SIP INVITE initiated from one HTML page towards another, which passes through the gateway into the IMS world: the SIP telecom application server, the database, the nodes of the IMS environment, etc.

For the purpose of a simple explanation, a simplified call flow can be depicted as:

webrtccallflow

A very high level architecture of solution deployment in IMS world could be

solution arch2

As the solution matures into a full-fledged project, the alpha version has been released with the following feature set. The WebRTC Platform Suite offers an easily deployable solution to enable communication.

Alpha Release WebRTC platform Suite

  • Single Sign On
  • Login with id and password to access all services
  • Audio / Video Call
    • Call Hold / Call Transfer
  • Messaging:
    • SIP Instant Messaging
    • Message to Facebook Messenger
    • Message delivered as Email
  • Chatroom
    • group chat between multiple users . Room is created for set of users .
  • Video Conferencing
    • video chat between multiple parties . Room is created for set of users .
  • File Transfer
    • Sharing of files from local to remote , in peer-to-peer and broadcasting fashion .
  • Third party Webservices
    • Widgets like calendar , weather , stocks , twitter are embedded.
  • Visual Voice Mail
    • Record and deliver voice message to recipients voice mail inbox which can be accessed/ played from web client .
  • Phonebook
    • cloud integration
    • add new entries
    • add photos to contacts identity
    • import contacts from google account
  • Click to Call :
    • Drop down list of contacts from the main call console
    • 2 step Click to call from Phonebook
  • Presence :
    • Publish online / offline status
    • Use Subscribe / notify requests of SIP
  • WebSocket to SIP Gateway
    • Conversion between the signalling coming from the WebRTC and SIP clients and the IMS core
    • Conversion of “voice/video” media between SRTP and RTP
    • Conversion of other media (data channel) towards MSRP, and transcoding.
    • Support of ICE procedure
    • Implementation of a STUN server
  • QoS Support

Beta Release WEBRTC PLATFORM SUITE

  • Logs
    • calls logs
    • Message logs
  • User Profile
    • user details like address , email and social networking accounts
    • Phonenumber for GSM integration through SMS
    • User’s Media storage like Pictures , profile picture , Audio , video
    • File sharing documents storage for future access in the same format
  • Real Time and Offline Analytics
    • service usage with graphical and tabular history trends
  • Session Management
    • Single Sign-on
    • Forgot password regeneration using secure question
    • Registration of new user account
    • Logout and clearance of session parameters
  • Security
    • No redirection to any page through url entry without valid session
    • No going back to home page after logout by back button on browser
    • No data vulnerability
    • Multiple login through different devices handled
  • OAuth
    • Login via IMAP / token through facebook and Google
  • Phonebook with Presence functionality inbuilt
  • Directory Service based on country / region
  • Geolocation: approximate location detection of the logged-in device and its visibility to others
webrtc solution
WebRTC client deployment view , accessible devices , network elements
WebRTC deployment overview and interaction with other network elements such as gateway, cloud storage, SIP server, IMS

Commercial release features specs for WebRTC over IMS

  • Integration with new age CSP deployments like VoLTE, ViLTE, VoWiFi
  • Multi vendor support
  • Interactive webrtc services
  • Media Services
    • Automated Natural language Speech recognition
    • Semantic processing via ML
    • Enhanced incall services replacing IVR ( touch -tone)
    • VQE (voice Quality Enhancements)
    • Encoding and Decoding – Multiple Codec Support
    • Transcoding
    • Silence Suppression
  • Security via TLS, encryption and AAA
  • Http, NFS caching
  • NAT using Xirsys TURN
  • Recording, playback and media file compression
  • active frame selection
  • DTMF (Dual Tone Multi Frequency)
    • SIP info messages (out-of-band)
    • SIP notify messages (out-of-band)
    • Inband DTMF not supported yet
  • Audio
    • mixing
    • announcements ( VXML, MSML )
    • filters
    • gain control ( AGC using webrtc stack)
    • noise suppression ( webrtc stack)
    • speakers notification
    • Narrowband, Wideband, and Super Wideband
    • dynamic sample rate
  • Video
    • continuous presence ( Face detection )
    • floor control
    • video lipsync (sync)
    • speaker tile selection
  • VQE (Voice Quality Enhancement )
    • Acoustic Echo Cancelation
    • noise reduction
    • noise line detection
    • noise gating
    • Packet Loss concealment
  • Call analytics
    • progress analysis
    • MOS , R-factor ( derived from latency , jitter , packet loss )
  • CDR (Call detail records ) and accounting
  • Lawful interception
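
The MOS and R-factor figures in the call analytics item can be approximated from the three network measurements. The sketch below is a simplified heuristic and not the full ITU-T G.107 E-model: the latency and loss penalties in `estimateRFactor` are assumed rule-of-thumb constants, and both function names are hypothetical. Only the final R-to-MOS mapping follows the standard E-model conversion formula.

```javascript
// Simplified call-quality estimate. The penalty constants below are
// assumptions (a common rule of thumb), not values from this article.
function estimateRFactor(latencyMs, jitterMs, lossPercent) {
  // Treat jitter as extra delay (weighted x2) plus an assumed 10 ms codec delay.
  const effectiveLatency = latencyMs + 2 * jitterMs + 10;
  let r = effectiveLatency < 160
    ? 93.2 - effectiveLatency / 40
    : 93.2 - (effectiveLatency - 120) / 10;
  r -= 2.5 * lossPercent; // assumed penalty per percent of packet loss
  return Math.max(0, Math.min(100, r));
}

// R-factor to MOS conversion as given by the E-model (ITU-T G.107).
function rFactorToMos(r) {
  if (r <= 0) return 1;
  if (r >= 100) return 4.5;
  return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r);
}

const rGood = estimateRFactor(40, 5, 0.5);
const rBad = estimateRFactor(300, 40, 5);
console.log("good network:", rGood.toFixed(1), rFactorToMos(rGood).toFixed(2));
console.log("bad network:", rBad.toFixed(1), rFactorToMos(rBad).toFixed(2));
```

A production media server would normally take these inputs from RTCP reports rather than from hand-fed numbers.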

Updating this article 2019

There was a long journey from traditional telecom architectures to NFV cloud based architectures (like OpenStack), supported over web, 4G, LTE or other upcoming networks. Many OTT providers prefer using the public cloud over an NFV data centre.

Multinode / multi-edge computing platforms like the Media Resource Function are expected to meet the need for quick delivery with additional features like hardware accelerated media, algorithms for optimised data flow (packetization, decongestion, security) etc. With the decomposed architecture they can better utilise the

  • CPU – contains a couple of cores optimised for sequential serial processing such as graphics or video processing
  • GPU – contains many smaller cores to accelerate the creation of images for computer display. Can include texture mapping, image rotation, translation, shading or more enhanced features like motion compensation, calculation of inverse DCT, etc. for accelerated video decoding.
  • DSP – processes data representing analog signals

Although IMS based solutions are more suited to telephony applications and CSPs (communication service providers, like telecom companies), similar or identical architectures are finding their way into newly developed cloud communication solutions supporting tens of millions of subscribers and hyperscale deployments. These could be around applications such as

  • HD (High Definition ) calls
  • UCC ( conf , draw-board, speech recognition , realtime streaming)
  • immersive experiences ( Augmented reality , virtual reality , face recognition , tracking )
  • contextual communication ( transcription etc)
  • video content delivery with deep media analytics

The demand these days is for a decentralised system with a pool of servers (media and signalling) that can scale independently to match peak traffic at any moment, with, of course, carrier-class performance. Not only do these flexible solutions reduce complexity, they also reduce OpEx.

Ref:

Unified Communicator and Collaborator for Enterprise

Modular communicator solution for enterprise communication and collaboration. It uses the sipml5 client side library to provide WebRTC based media stream capture and propagation from the client side without external plugins.

Github Repo – https://github.com/altanai/unifiedCommunicator

Unified Communications and Collaborations ( UC&C ) – https://telecom.altanai.com/2013/07/12/unified-communication/

WebRTC


webrtc draft

WebRTC 1.0: Real-time Communication Between Browsers – W3C Candidate Recommendation 13 December 2019 https://www.w3.org/TR/webrtc/

webrtcdevelopment: Open Source WebRTC SDK and its implementation steps https://github.com/altanai/webrtc

Read more about the layers of WebRTC and their functionalities here: WebRTC layers

What is WebRTC ?

WebRTC (Web Real-Time Communication) is an API definition drafted by the World Wide Web Consortium (W3C) that supports browser-to-browser applications for voice calling, video chat, and P2P file sharing without the need for either internal or external plugins.

  • Enables browser to browser media streaming over secure RTP profile
  • Standardization, on an API level at the W3C and at the protocol level at the IETF.
  • Enables web browsers with Real-Time Communications (RTC) capabilities
  • written in c++ and javascript
  • BSD style license
  • free, open project available in all major browsers 

Media Stack in Browser

The following is the browser side stack for webrtc media .  

WebRTC media stack Solution Architecture
WebRTC Media Stack

Voice Engine

  • iSAC: wideband and super wideband audio codec for streaming audio
  • iLBC: narrowband speech codec for streaming audio
  • Opus: constant and variable bitrate encoding 
  • NetEQ: Net Equalizer
  • Dynamic jitter buffer + error concealment algorithm
  • Acoustic Echo Canceler (AEC) : remove acoustic echo
  • Noise Reduction (NR) : remove background noise

Video engine

  • VideoEngine is a framework for the video media chain, from camera to the network, and from network to the screen.
  • VP8 : Video codec from the WebM Project. Designed for low latency Real time Comm. 
  • Video Jitter Buffer: conceal the effects of jitter and packet loss on overall video quality.
  • Image enhancements : removes video noise 

Transport

  • Transport / Session Layer of WebRTC stack provide Session Management for WebRTC media streams .
  • It consists of a network stack for Secure RTP (SRTP), the secure profile of the Real-time Transport Protocol.
  • STUN/ICE for NAT (Network Address Translation) traversal across various types of networks.
  • Session Management which is an abstracted session layer for call setup.

Standardization by IETF and W3C

As of the 2019 update the W3C defines it as

a set of ECMAScript APIs in WebIDL to allow media to be sent to and received from another browser or device implementing the appropriate set of real-time protocols. This specification is being developed in conjunction with a protocol specification developed by the IETF RTCWEB group and an API specification to get access to local media devices.

W3C contribution to WebRTC standardization


  • Media Stream Functions : API for connecting processing functions to media devices and network connections, including media manipulation functions.
  • Audio Stream Functions : An extension of the Media Stream Functions to process audio streams (e.g. automatic gain control, mute functions and echo cancellation).
  • Video Stream Functions : An extension of the Media Stream Functions to process video streams (e.g. bandwidth limiting, image manipulation or “video mute“).
  • Functional Component : API to query presence of WebRTC components in an implementation, instantiate them and connect them to media streams.
  • P2P Connection Functions : API functions to support establishing signalling protocol-agnostic peer-to-peer connections between Web browsers
  • API specification Availability

WebRTC 1.0: Real-time Communication Between Browsers –  Draft 3 June 2013 available

  • Implementation Library: WebRTC Native APIs

Media Capture and Streams – Draft 16 May 2013

  • Supported by Chrome , Firefox, Opera in desktop of all OS ( Linux, Windows , Mac )
  • Supported by Chrome , Firefox  in Mobile browsers ( android )

IETF contribution to WebRTC standardization

  • Communication model
  • Security model
  • Firewall and NAT traversal
  • Media functions
  • Functionality such as media codecs, security algorithms, etc.,
  • Media formats
  • Transport of non media data between clients
  • Input to W3C for APIs development
  • Interworking with legacy VoIP equipment

Open and Free Codecs

A codec defines the compression and decompression of a media stream. For peers to exchange media successfully, they need to agree upon a common set of codecs for the session. The lists of supported codecs are sent between the peers as part of the offer and answer, carried in SDP (as in SIP).

WebRTC uses bare MediaStreamTrack objects for each track being shared from one peer to another. The WebRTC API specification itself does not mandate the codecs used in those tracks; the codec requirements come from the IETF RFCs below.

For video, as per RFC 7742 (WebRTC Video Processing and Codec Requirements), the mandatory codecs to be supported by WebRTC browsers are VP8 and H.264’s Constrained Baseline profile.

For Audio as per RFC 7874 WebRTC Audio Codec and Processing Requirements, browser must support Opus codec as well as G.711‘s PCMA and PCMU formats.
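
To see which codecs a peer actually offers, the `a=rtpmap` lines of its SDP can be inspected. The following is a minimal sketch: `listCodecs`, `supportsMandatoryCodecs` and the sample SDP fragment are hypothetical names and data made up for illustration. It checks codec names only, does not inspect the H.264 profile, and assumes every codec is announced with an explicit `a=rtpmap` line (static payload types such as PCMU may omit it in real SDP).

```javascript
// Extract codec names per media section from the a=rtpmap lines of an SDP.
function listCodecs(sdp) {
  const codecs = { audio: new Set(), video: new Set() };
  let kind = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      kind = line.slice(2).split(" ")[0]; // "audio", "video", "application", ...
    } else if (line.startsWith("a=rtpmap:") && codecs[kind]) {
      // Format: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const name = line.split(" ")[1].split("/")[0];
      codecs[kind].add(name.toUpperCase());
    }
  }
  return codecs;
}

// Check the offered codecs against the mandatory sets named in the text above.
function supportsMandatoryCodecs(sdp) {
  const { audio, video } = listCodecs(sdp);
  return {
    audio: ["OPUS", "PCMA", "PCMU"].every((c) => audio.has(c)),
    video: ["VP8", "H264"].every((c) => video.has(c)),
  };
}

// Hypothetical SDP fragment, trimmed to the lines this sketch reads.
const sampleSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0 8",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:8 PCMA/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

console.log(supportsMandatoryCodecs(sampleSdp));
```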

Video Resolution handling

Unless the SDP specifically signals otherwise, the web browser receiving a WebRTC video stream must be able to handle video at a minimum of 20 FPS at a minimum resolution of 320 pixels wide by 240 pixels tall.

In the best scenarios (available bandwidth and media devices), VP8 has no upper limit set on the resolution of the video stream, hence the stream can go as far as the codec’s maximum resolution of 16384×16384 pixels.

Independent of Signalling

WebRTC does not specify any signalling / telecommunication protocol, and it is up to the adopter to perform the offer/answer exchange in any way deemed fit for the use case. For example, a web-only application may use plain WebSockets, whereas an app that must be compatible with telecom endpoints should use SIP as the protocol.
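
Because the signalling transport is left open, application code can be written against a thin wrapper rather than a specific protocol. The sketch below is one possible shape for such a wrapper, not a standard API: `SignalingChannel`, `createLoopbackPair` and the JSON envelope with `type` and `payload` fields are all assumptions made for illustration. The in-memory loopback pair stands in for a real transport such as a browser WebSocket, which exposes the same `send()` and `onmessage` members.

```javascript
// Wrap any transport that exposes send(string) and an onmessage callback.
class SignalingChannel {
  constructor(transport) {
    this.transport = transport;
    this.handlers = {};
    transport.onmessage = (event) => {
      const { type, payload } = JSON.parse(event.data);
      if (this.handlers[type]) this.handlers[type](payload);
    };
  }
  // Register a handler for one message type, e.g. "offer" or "answer".
  on(type, handler) {
    this.handlers[type] = handler;
  }
  send(type, payload) {
    this.transport.send(JSON.stringify({ type, payload }));
  }
}

// In-memory stand-in for a connected transport pair (demonstration only).
function createLoopbackPair() {
  const a = { onmessage: null };
  const b = { onmessage: null };
  a.send = (data) => b.onmessage && b.onmessage({ data });
  b.send = (data) => a.onmessage && a.onmessage({ data });
  return [a, b];
}

const [left, right] = createLoopbackPair();
const caller = new SignalingChannel(left);
const callee = new SignalingChannel(right);
callee.on("offer", (sdp) => callee.send("answer", "answer-for:" + sdp));
caller.on("answer", (sdp) => console.log("caller received", sdp));
caller.send("offer", "dummy-sdp");
```

Swapping the loopback pair for a WebSocket, or for a SIP over WebSocket stack, changes only the object passed to the constructor.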

NAT-traversal ( ICE, STUN, and TURN)

This post describes the ICE (Interactive Connectivity Establishment) framework, which is mandated by the WebRTC standards. It is used to find network interfaces and ports in the Offer / Answer model so that network information can be exchanged with participating communication clients. ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relays around NAT (TURN). I have written in detail about TURN based WebRTC flow diagrams in the post below.

NAT and TURN Relay

Learn about hosting / integrating different TURN servers for WebRTC in the article on “TURN server for WebRTC – RFC5766-TURN-Server , Coturn , Xirsys “.

Why is WebRTC so important ?

(+) Significantly better video quality: WebRTC video quality is noticeably better than Flash.
(+) Up to 6x faster connection times: Using JavaScript WebSockets, also an HTML5 standard, improves session connection times and accelerates delivery of other OpenTok events.
(+) Reduced audio/video latency: WebRTC offers significant improvements in latency, enabling more natural and effortless conversations.
(+) Freedom from plugins like Flash: With WebRTC and JavaScript WebSockets, you no longer need to rely on Flash for browser-based RTC.
(+) Native HTML5 elements: Customize the look and feel and work with video like you would any other element on a web page with the new video tag in HTML5.

The major players behind the conception and advancement of WebRTC standards and libraries are the IETF, W3C, the Java community and GSMA. The idea is to develop a lightweight browser-based call console to make SIP calls from a web page. This was successfully achieved using fundamental technologies: JavaScript, HTML5, WebSockets, TCP/UDP and an open-source SIP server. It is good to note that no extra extension, plugin or gateway, such as Flash support, is required. It also has cross-platform support, including Mozilla Firefox, Chrome and so on.

Bottlenecks

Although WebRTC is a great technology and holds very good potential it is not devoid of problems

(-) Secure networks and Firewalls block RTP
(-) Security in VPN and topology hiding
(-) Cross-platform concerns and incompatible codecs
(-) Late adopters like Microsoft and Apple

 Peer to peer Communication

WebRTC forms a p2p communication channel between all the peers. That means that as the participant count grows, it turns into a mesh networking topology, with an incoming and an outgoing stream towards each of its peers.
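
The cost of a full mesh grows quickly with the number of participants, which a few lines of arithmetic make visible. This is a small sketch with a hypothetical helper name, `meshLoad`, and it assumes every peer sends exactly one media stream to every other peer.

```javascript
// Connection and stream counts for a full-mesh call with n participants.
function meshLoad(participants) {
  if (!Number.isInteger(participants) || participants < 2) {
    throw new RangeError("a call needs at least two participants");
  }
  return {
    connectionsPerPeer: participants - 1,
    totalConnections: (participants * (participants - 1)) / 2,
    // One outgoing and one incoming media stream per connection.
    uplinkStreamsPerPeer: participants - 1,
    downlinkStreamsPerPeer: participants - 1,
  };
}

for (const n of [2, 4, 8]) {
  console.log(n, "participants:", meshLoad(n));
}
```

Each client's uplink has to carry one copy of its stream per remote peer, which is why a mesh is usually kept to small group calls.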

Two party call p2p

Peer to peer calling

two party call
p2p call

Multiparty Call and mesh network

Mesh based arrangement .

Multiparty call
Mesh based WebRTC video conferencing

In the special case of broadcasting to a large number of viewers (without outgoing media streams), it is recommended to set up a Multipoint Control Unit (MCU), which relays the incoming stream to a large number of users without putting traffic load on the client from which the stream actually originates.

Important notes:

  1. It should be noted that these diagrams do not depict ICE and NAT traversal and have been simplified for better understanding. In real-world scenarios, STUN and TURN servers are involved almost all the time.
  2. Also, WebRTC mandates the use of a secure origin (HTTPS) for the webpage that invokes getUserMedia to capture user media devices like audio, video and location.

Browser Adoption

As of March 2020 , webrtc is supported on following client’s browsers

  • Desktop PC
    Microsoft Edge 12+
    Google Chrome 28+
    Mozilla Firefox 22+
    Safari 11+
    Opera 18+
    Vivaldi 1.9+
  • Android
    Google Chrome 28+ (enabled by default since 29)
    Mozilla Firefox 24+
    Opera Mobile 12+
  • Chrome OS
  • Firefox OS
  • BlackBerry 10
  • iOS
    MobileSafari/WebKit (iOS 11+)
  • Tizen 3.0

Furthermore, read about the Steps for building and deploying WebRTC solution.

TURN based media Relay

WebRTC APIs are the Javascript functions to access and process the browser media stack.

getUserMedia

acquires the audio and video media (e.g., by accessing a device’s camera and microphone)

Properties

ondevicechange

Methods

enumerateDevices()
getDisplayMedia()
getSupportedConstraints()
getUserMedia()

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
.then(function(stream) {
  var video = document.querySelector('video');
  // Older browsers may not have srcObject
  if ("srcObject" in video) {
    video.srcObject = stream;
  } else {
    // Avoid using this in new browsers, as it is going away.
    video.src = window.URL.createObjectURL(stream);
  }
  video.onloadedmetadata = function(e) {
    video.play();
  };
})
.catch(function(err) {
  console.log(err.name + ": " + err.message);
});

DOMException errors on getUserMedia

Rejections of the returned promise are made by passing a DOMException error object to the promise’s failure handler. Possible errors are:

AbortError : Although the user and operating system both granted access to the hardware device, problem occurred which prevented the device from being used.

NotAllowedError : One or more of the requested source devices cannot be used at this time. This will happen if the browsing context is insecure (http instead of https), if the user has specified that the current browsing instance / session is not permitted access to the device, or if the user has denied all access to user media devices globally.

NotFoundError : No media tracks of the type specified were found that satisfy the given constraints.

NotReadableError : Although the user granted permission to use the matching devices, a hardware error occurred at the operating system, browser, or Web page level which prevented access to the device.

OverconstrainedError : No candidate devices met the requested criteria. The error has a constraint property whose string value is the name of a constraint which was not met, and a message property containing a human-readable string explaining the problem. Example constraints :

var constraints = { video: { facingMode: (front? "user" : "environment") } };

SecurityError : User media support is disabled on the Document on which getUserMedia() was called.

TypeError : The list of constraints specified is empty, or has all constraints set to false.
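
In application code these error names are usually mapped to something the user can act on. The sketch below shows one way to do that; `describeGetUserMediaError` and the hint texts are hypothetical and only paraphrase the descriptions above. It reads nothing but the `name` property, so it works with any error-like object.

```javascript
// Short, user-facing hints keyed by the DOMException name.
const GUM_ERROR_HINTS = {
  AbortError: "The device could not be used even though access was granted.",
  NotAllowedError: "Access was denied, or the page is not served over HTTPS.",
  NotFoundError: "No camera or microphone matching the request was found.",
  NotReadableError: "A hardware or OS level error prevented access to the device.",
  OverconstrainedError: "No device satisfies the requested constraints.",
  SecurityError: "User media support is disabled for this document.",
  TypeError: "The constraints were empty or all set to false.",
};

function describeGetUserMediaError(err) {
  const known = Object.prototype.hasOwnProperty.call(GUM_ERROR_HINTS, err.name);
  const hint = known
    ? GUM_ERROR_HINTS[err.name]
    : "Unexpected error while accessing media devices.";
  return err.name + ": " + hint;
}

// Usage in a browser (assumes a secure origin):
// navigator.mediaDevices.getUserMedia({ audio: true, video: true })
//   .catch((err) => console.warn(describeGetUserMediaError(err)));
console.log(describeGetUserMediaError({ name: "NotFoundError" }));
```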

Pan/Tilt/Zoom camera controls

RTCPeerConnection

enables audio and video communication between peers. It performs signal processing, codec handling, peer-to-peer communication, security, and bandwidth management.

Properties

canTrickleIceCandidates
connectionState
getDefaultIceServers()
iceConnectionState
iceGatheringState
onsignalingstatechange
onconnectionstatechange
ondatachannel

onicecandidate
oniceconnectionstatechange
onicegatheringstatechange
onidentityresult
onnegotiationneeded
onremovestream
onaddstream
ontrack

peerIdentity
currentLocalDescription
currentRemoteDescription
pendingLocalDescription
pendingRemoteDescription
localDescription
remoteDescription
sctp
signalingState

Methods

addIceCandidate()
addStream()
addTrack()
close()
createAnswer()
createDataChannel()
createOffer()

getIdentityAssertion()
getReceivers()
getSenders()
getStats()
getStreamById()
getTransceivers()
removeStream()
removeTrack()

restartIce()
setConfiguration()
setIdentityProvider()
setLocalDescription()
setRemoteDescription()
generateCertificate()
getConfiguration()

 signalling state transitions diagram , source W3C

RTC Signalling states

  • stable : There is no offer/answer exchange in progress. This is also the initial state, in which case the local and remote descriptions are empty.
  • have-local-offer : Local description, of type “offer”, has been successfully applied.
  • have-remote-offer : Remote description, of type “offer”, has been successfully applied.
  • have-local-pranswer : Remote description of type “offer” has been successfully applied and a local description of type “pranswer” has been successfully applied.
  • have-remote-pranswer : Local description of type “offer” has been successfully applied and a remote description of type “pranswer” has been successfully applied.
  • closed : The RTCPeerConnection has been closed; its [[IsClosed]] slot is true.
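
The states above form a small state machine driven by `setLocalDescription()` and `setRemoteDescription()`. The following is a simplified sketch of those transitions as a lookup table; `TRANSITIONS` and `nextSignalingState` are hypothetical names, and the table is a reading of the state diagram rather than a copy of any browser's implementation.

```javascript
// Keys are "<current state>|<local or remote>|<description type>".
const TRANSITIONS = {
  "stable|local|offer": "have-local-offer",
  "stable|remote|offer": "have-remote-offer",
  "have-local-offer|local|offer": "have-local-offer",
  "have-local-offer|local|rollback": "stable",
  "have-local-offer|remote|pranswer": "have-remote-pranswer",
  "have-local-offer|remote|answer": "stable",
  "have-remote-offer|remote|offer": "have-remote-offer",
  "have-remote-offer|remote|rollback": "stable",
  "have-remote-offer|local|pranswer": "have-local-pranswer",
  "have-remote-offer|local|answer": "stable",
  "have-local-pranswer|local|pranswer": "have-local-pranswer",
  "have-local-pranswer|local|answer": "stable",
  "have-remote-pranswer|remote|pranswer": "have-remote-pranswer",
  "have-remote-pranswer|remote|answer": "stable",
};

// Return the state reached after applying a description, or throw if the
// description type is not valid in the current state.
function nextSignalingState(state, source, type) {
  const next = TRANSITIONS[state + "|" + source + "|" + type];
  if (!next) {
    throw new Error("invalid: " + source + " " + type + " in state " + state);
  }
  return next;
}

// Caller side of a simple call: local offer, then remote answer.
let state = "stable";
state = nextSignalingState(state, "local", "offer");
state = nextSignalingState(state, "remote", "answer");
console.log("caller ends in:", state);
```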

RTCSDPType

  • offer : SDP offer.
  • pranswer : RTCSdpType of pranswer indicates that a description MUST be treated as an [SDP] answer, but not a final answer.
  • answer : treated as an [SDP] final answer, and the offer-answer exchange MUST be considered complete. A description used as an SDP answer may be applied as a response to an SDP offer or as an update to a previously sent SDP pranswer.
  • rollback : canceling the current SDP negotiation and moving the SDP [SDP] offer back to what it was in the previous stable state.

RTCConfiguration

Defines a set of parameters to configure how the peer-to-peer communication via RTCPeerConnection is established or re-established.

iceServers of type sequence<RTCIceServer> : array of objects describing servers available to be used by ICE, such as STUN and TURN servers.

iceTransportPolicy of type RTCIceTransportPolicy : indicates which candidates the ICE Agent is allowed to use.

  • relay : ICE Agent uses only media relay candidates such as candidates passing through a TURN server.
  • all : The ICE Agent can use any type of candidate when this value is specified.

bundlePolicy of type RTCBundlePolicy.
media-bundling policy to use when gathering ICE candidates. The bundle policy affects which media tracks are negotiated if the remote endpoint is not bundle-aware, and what ICE candidates are gathered. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport. Types :

  • balanced : Gather ICE candidates for each media type in use (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
  • max-compat : Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
  • max-bundle : Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track.

rtcpMuxPolicy of type RTCRtcpMuxPolicy.
rtcp-mux policy to use when gathering ICE candidates.

certificates of type sequence<RTCCertificate>
A set of certificates that the RTCPeerConnection uses to authenticate.

iceCandidatePoolSize of type octet, defaulting to 0
Size of the prefetched ICE pool as defined in [JSEP]
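
Put together, these members form the plain object that is passed to the RTCPeerConnection constructor. Below is a minimal sketch of building one; `buildRtcConfiguration`, the `example.org` server URLs and the credentials are placeholders invented for illustration, not real services.

```javascript
// Build an RTCConfiguration-shaped object. All URLs and credentials passed
// in or defaulted here are hypothetical placeholders.
function buildRtcConfiguration({ turnUrl, turnUser, turnPass, relayOnly = false } = {}) {
  const iceServers = [{ urls: "stun:stun.example.org:3478" }];
  if (turnUrl) {
    iceServers.push({ urls: turnUrl, username: turnUser, credential: turnPass });
  }
  return {
    iceServers,
    // "relay" forces all media through TURN, "all" allows any candidate type.
    iceTransportPolicy: relayOnly ? "relay" : "all",
    bundlePolicy: "balanced",
    rtcpMuxPolicy: "require",
    iceCandidatePoolSize: 0,
  };
}

const config = buildRtcConfiguration({
  turnUrl: "turn:turn.example.org:3478",
  turnUser: "demo-user",
  turnPass: "demo-pass",
  relayOnly: true,
});

// In a browser: const pc = new RTCPeerConnection(config);
console.log(JSON.stringify(config, null, 2));
```

Setting `relayOnly` is a convenient way to verify that a TURN server works, since no direct or server-reflexive candidates are used in that mode.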

RTCDataChannel

Allows bidirectional communication of arbitrary data between peers. It uses the same API as WebSockets and has very low latency.

  • (+) DataChannel is p2p and is also end-to-end encrypted, leading to higher privacy
  • (+) built-in security due to p2p transfer
  • (+) higher throughput than text transfer via a messaging server
  • (+) lower latency as p2p transfer takes the shortest route
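
For file transfer over a data channel, large payloads are normally split into smaller messages. The sketch below illustrates the idea against any object with a `send()` method; `sendInChunks` is a hypothetical helper, the 16 KiB default chunk size is an assumed conservative value, and the fake channel exists only so the example runs outside a browser. A real implementation would also watch the channel's `bufferedAmount` to avoid overfilling the send buffer.

```javascript
// Split a Uint8Array into chunks and pass each one to channel.send().
// Returns the number of messages sent.
function sendInChunks(channel, bytes, chunkSize = 16 * 1024) {
  let sent = 0;
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray() creates a view, so no data is copied here.
    channel.send(bytes.subarray(offset, offset + chunkSize));
    sent += 1;
  }
  return sent;
}

// Stand-in for an open RTCDataChannel (demonstration only).
const fakeChannel = {
  received: [],
  send(chunk) {
    this.received.push(chunk.length);
  },
};

const payload = new Uint8Array(40000);
const count = sendInChunks(fakeChannel, payload);
console.log(count, "messages with sizes", fakeChannel.received);
```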

getStats

allows the web application to retrieve a set of statistics about WebRTC sessions. These statistics data are being described in a separate W3C document.
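
As an illustration of what can be done with these statistics, the sketch below summarises packet loss and jitter for received streams. `summarizeInbound` is a hypothetical helper, and the sample report is hand-written test data. The function expects a Map-like report, as returned by `getStats()`, and reads the `packetsReceived`, `packetsLost` and `jitter` fields of `inbound-rtp` entries.

```javascript
// Summarise inbound RTP entries of an RTCStatsReport-like Map.
function summarizeInbound(report) {
  const summary = [];
  report.forEach((stat) => {
    if (stat.type !== "inbound-rtp") return;
    const total = stat.packetsReceived + stat.packetsLost;
    summary.push({
      kind: stat.kind,
      lossPercent: total > 0 ? (100 * stat.packetsLost) / total : 0,
      jitterMs: stat.jitter * 1000, // jitter is reported in seconds
    });
  });
  return summary;
}

// Hand-written sample data standing in for a real report.
const sampleReport = new Map([
  ["in-audio", { type: "inbound-rtp", kind: "audio", packetsReceived: 990, packetsLost: 10, jitter: 0.005 }],
  ["out-audio", { type: "outbound-rtp", kind: "audio", packetsSent: 1000 }],
]);

// In a browser: pc.getStats().then((report) => console.log(summarizeInbound(report)));
console.log(summarizeInbound(sampleReport));
```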

Call Setup between WebRTC Endpoints

WebRTC CPaaS Solutions

Basics for building a WebRTC based communication solution :-

  • Websockets for signalling / Offer Answer
  • TURN server like xirsys(paid), CoTURN(opensource , self hosted)
  • Js library for WebRTC wrappers
  • Https served webpage
  • WebRTC enabled Browser

Approaches to develop webrtc unified communication system

1. Pluggable module or npm

Source code for the WebRTC project is shipped as a pluggable library or npm module.

2. Collaboration as a Service, i.e. CaaS

Clients redirect users to our WebRTC platform for communication.

3. Communication Platform

We provide all communication and related services as a standalone platform.

Updates in W3C 13 Dec , 2019

Over the years since its adoption, many of the associated features were deprecated in WebRTC based platforms and environments, some of which are:
OAuth as a credential method for ICE servers
Negotiated RTCRtcpMuxPolicy (previously marked at risk)
voiceActivityDetection
RTCCertificate.getSupportedAlgorithms()
RTCRtpEncodingParameters: ptime, maxFrameRate, codecPayloadType, dtx, degradationPreference
RTCRtpDecodingParameters: encodings
RTCDatachannel.priority

Some of the newly added features include:

restartIce() method added to RTCPeerConnection
Introduced the concept of “perfect negotiation”, with an example to solve signalling races.
Implicit rollback in setRemoteDescription to solve races.
Implicit offer/answer creation in setLocalDescription to solve races.
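
The core of the “perfect negotiation” idea is a rule for what a peer does when an offer arrives while it is busy with its own: one side is designated polite and yields, the other is impolite and ignores the incoming offer. The sketch below isolates that decision as a pure function. `resolveIncomingDescription` and its return labels are hypothetical names, while the collision test itself mirrors the pattern shown in the specification's example.

```javascript
// Decide how to treat a description received from the remote peer.
// polite:         whether this peer yields on collisions
// makingOffer:    true while this peer is in the middle of creating an offer
// signalingState: current pc.signalingState
// type:           type of the incoming description ("offer", "answer", ...)
function resolveIncomingDescription({ polite, makingOffer, signalingState, type }) {
  const offerCollision =
    type === "offer" && (makingOffer || signalingState !== "stable");
  if (!offerCollision) return "apply";
  // The polite peer rolls back its own offer (implicitly, via
  // setRemoteDescription) and the impolite peer drops the incoming one.
  return polite ? "rollback-and-apply" : "ignore";
}

console.log(
  resolveIncomingDescription({
    polite: false,
    makingOffer: true,
    signalingState: "have-local-offer",
    type: "offer",
  })
);
```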

References :

WebRTC Stack Architecture and Layers


WebRTC stands for Web Real-Time Communications and introduces a real-time media framework in the browser core alongside associated JavaScript APIs for controlling the media frame and HTML5 tags for displaying. If you are new to WebRTC, read “What is WebRTC?” From a technical point of view, WebRTC will hide all the complexity of real-time media behind a very simple JavaScript API. 

WebRTC Layers

Missing Signalling

WebRTC is a media framework which is independent of the signalling protocol, which means that we can plug in any form of signalling to support session establishment using the offer-answer handshake and SDP. Some of the popular options:

  • Polling
  • XHR ( XMLHttpRequest )
  • Websocket ( HTTP upgraded )
  • SSE ( Server Sent Events )
  • socket.io ( use set of protocols for best compatibility and fallback)
  • HTTP/2

Other form of less used signalling options

  • FTP
  • HTTP
  • long poll
  • XMPP
  • MQTT

One may also send the local and remote SDP over any other communication mechanism, such as email, a REST API or any custom proprietary protocol.

Security in WebRTC

SSL (Secure Sockets Layer), and its successor TLS, add encryption capability to an otherwise readable packet.

  • DTLS (Datagram TLS) adds security to UDP packets, which are used by media streams and Data Channel messages.
  • TLS (Transport Layer Security) adds security to the TCP messages used in signalling, such as the SDP based offer-answer handshake which enables setup, modification or teardown of the session.

WebRTC Stack

WebRTC offers web application developers the ability to write rich, realtime multimedia applications (think video chat) on the web, without requiring plugins, downloads or installs. Its purpose is to help build a strong RTC platform that works across multiple web browsers, across multiple platforms.


Web API – An API to be used by third-party developers for developing web-based video chat-like applications.

WebRTC Native C++ API – An API layer that enables browser makers to easily implement the Web API proposal

Transport / Session – The session components are built by re-using components from libjingle, without using or requiring the XMPP/jingle protocol.

RTP Stack – A network stack for RTP, the Real-time Transport Protocol.

Session Management – An abstracted session layer, allowing for call setup and management layer. This leaves the protocol implementation decision to the application developer.

Voice Engine

VoiceEngine is a framework for the audio media chain, from sound card to the network.

NetEQ for Voice– A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

Acoustic Echo Canceler (AEC) – The Acoustic Echo Canceler is a software-based signal processing component that removes, in real-time, the acoustic echo resulting from the voice being played out coming into the active microphone.

Noise Reduction (NR) -The Noise Reduction component is a software-based signal processing component that removes certain types of background noise usually associated with VoIP. (Hiss, fan noise, etc…)

Video Engine

VideoEngine is a framework for the video media chain, from the camera to the network, and from network to the screen.

Video Jitter Buffer – Dynamic Jitter Buffer for video. Helps conceal the effects of jitter and packet loss on overall video quality.

Image enhancements -For example, removes video noise from the image capture by the webcam.

Transport

STUN/ICE – A component allowing calls to use the STUN and ICE mechanisms to establish connections across various types of networks.

Error resiliency and fault tolerance

  • REMB (receiver-side bandwidth estimation) is more common and transport-wide-cc (sender-side bandwidth estimation) is the more modern and future looking approach
  • BWE (Bandwidth Estimation )
  • FEC (Forward Error Correction) and ULPFEC (Uneven Level Protection Forward Error Correction)
  • RED (Redundant coding)
  • FIR (Full Intra Request)
  • PLI (Picture Loss Indication) for video
  • PLC (Packet Loss Concealment) mostly for audio
  • NACK (Negative Acknowledgement)

API support from browser around WebRTC

  • PeerConnection
  • getUserMedia and getDisplayMedia
  • dataChannels
  • getStats
  • MediaRecorder
  • MediaStream / Media Tracks
  • MediaConstraints
  • WebAudio Integration
  • TURN support
  • Echo cancellation
  • srcObject in media element
  • Promise based getUserMedia and PeerConnection

JavaScript Session Establishment Protocol (JSEP)

Outgoing Call : Send Offer to remote peer

JSEP flow for Outgoing Call : Send Offer to remote peer and process incoming answer

Incoming Call : process received offer from remote peer

JSEP flow for Incoming Call : process received offer from remote peer
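
The two flows can be sketched with the promise based RTCPeerConnection API as follows. This is an illustrative outline and not a complete call implementation: `startCall`, `handleAnswer`, `handleOffer` and the `sendToPeer` callback are hypothetical names, and how the description reaches the other side is left to whatever signalling channel the application uses. ICE candidate exchange is omitted.

```javascript
// Outgoing call: create an offer, apply it locally, then hand it to signalling.
async function startCall(pc, sendToPeer) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ type: offer.type, sdp: offer.sdp });
}

// Outgoing call, second half: apply the answer received from the remote peer.
async function handleAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}

// Incoming call: apply the received offer, create an answer, send it back.
async function handleOffer(pc, offer, sendToPeer) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendToPeer({ type: answer.type, sdp: answer.sdp });
}

// Usage in a browser (signaling is an application-defined channel):
// const pc = new RTCPeerConnection();
// await startCall(pc, (desc) => signaling.send(JSON.stringify(desc)));
```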

WebRTC supported Codecs

RTCRtpEncodingParameters

RTCRtpEncodingParameters dictionary describes a single configuration of a codec for an RTCRtpSender.

  • active : flag to set if encoding is currently actively being used.
  • codecPayloadType : single 8-bit byte (or octet) specifying the codec to use for sending the stream.
  • dtx : used for audio to indicate whether discontinuous transmission is used (a feature by which a phone is turned off or the microphone muted automatically in the absence of voice activity)
  • maxBitrate : (unsigned long integer) maximum number of bits per second to allow for this encoding.
  • maxFramerate : (double-precision floating-point) maximum number of frames per second to allow for this encoding.
  • ptime: (unsigned long integer) preferred duration of a media packet in milliseconds used in audio encodings.
  • rid : (DOMString) if set, specifies an RTP stream ID (RID) to be sent using the RID header extension.
  • scaleResolutionDownBy :(double-precision floating-point) specifying a factor by which to scale down the video during encoding.
    • the default value is 1.0, meaning the video’s size will be the same as the original.
    • 2.0 scales the video frames down by a factor of 2 in each dimension, resulting in a video 1/4 the size of the original.
    • can’t use this to scale the video up
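
The effect of `scaleResolutionDownBy` is easy to work out by hand, as the sketch below shows for a three layer configuration. `scaledResolution` and the `encodings` array, including its `rid` labels and bitrates, are hypothetical illustration values. Rounding the result down is an assumption, since the exact rounding is up to the browser's encoder.

```javascript
// Resolution produced by a given scaleResolutionDownBy factor.
function scaledResolution(width, height, scaleResolutionDownBy = 1.0) {
  if (scaleResolutionDownBy < 1.0) {
    // Scaling up is not allowed, so values below 1.0 are rejected.
    throw new RangeError("scaleResolutionDownBy must be >= 1.0");
  }
  return {
    width: Math.floor(width / scaleResolutionDownBy),
    height: Math.floor(height / scaleResolutionDownBy),
  };
}

// Hypothetical three-layer encoding list for an RTCRtpSender.
const encodings = [
  { rid: "high", active: true, maxBitrate: 1500000, scaleResolutionDownBy: 1.0 },
  { rid: "mid", active: true, maxBitrate: 500000, scaleResolutionDownBy: 2.0 },
  { rid: "low", active: true, maxBitrate: 150000, scaleResolutionDownBy: 4.0 },
];

for (const encoding of encodings) {
  const size = scaledResolution(1280, 720, encoding.scaleResolutionDownBy);
  console.log(encoding.rid, size.width + "x" + size.height);
}
```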

Video Codecs

  • VP8  Video codec from the WebM Project. Well suited for RTC as it is designed for low latency. It was the codec of choice being royalty-free.
  • VP9
  • H264 : not royalty-free. But it is native in most mobile handsets due to its high adoption.
  • AV1

Audio Codecs

  • iSAC: A wideband and super wideband audio codec for VoIP and streaming audio.
  • iLBC: A narrowband speech codec for VoIP and streaming audio.
  • Opus : lossy audio codec for broad range of interactive real-time applications licensed under royalty-free BSD terms.
  • G.711
  • Speex
  • G.722
  • AMR-WB

Update 2020 – This article was written very early in 2013, while WebRTC was still being standardised and was not yet widely adopted, since WebRTC’s inception was only in 2012.

There are many more articles written after that to explain and emphasize the detailing and application of WebRTC. List of these is below :

For SIP IMS and WebRTC

STUN and TURN, which form a critical part of any WebRTC based communication system

Security of WebRTC based CaaS and CPaaS

WebRTC SDK

Developing WebRTC / CPaaS Solution

References

  • [1] Working Group RFC
    • draft-ietf-rtcweb-audio-02      2013-08-02
    • draft-ietf-rtcweb-data-channel-05      2013-07-15
    • draft-ietf-rtcweb-data-protocol-00     2013-07-15
    • draft-ietf-rtcweb-jsep-03      2013-02-27
    • draft-ietf-rtcweb-overview-07      2013-08-14
    • draft-ietf-rtcweb-rtp-usage-07     2013-07-15
    • draft-ietf-rtcweb-security-05      2013-07-15
    • draft-ietf-rtcweb-security-arch-07      2013-07-15
    • draft-ietf-rtcweb-transports-00      2013-08-19
    • draft-ietf-rtcweb-use-cases-and-reqs-11 2013-06-27
    • Plus over 20 discussion RFC drafts
  • TLS
  • HTTP/2 offer answer