WebRTC Stack Architecture and Layers

WebRTC Layers
- Missing Signalling
- Security in WebRTC
WebRTC Stack
Error resiliency and fault tolerance
API support from browser around WebRTC
JavaScript Session Establishment Protocol (JSEP)
- Outgoing Call
- Incoming Call
WebRTC supported Codecs
- RTCRtpEncodingParameters
- Video Codecs
- Audio Codecs

WebRTC stands for Web Real-Time Communications. It introduces a real-time media framework in the browser core, alongside JavaScript APIs for controlling the media framework and HTML5 tags for displaying the media. If you are new to WebRTC, read “What is WebRTC?” From a technical point of view, WebRTC hides the complexity of real-time media behind a simple JavaScript API.

WebRTC

WebRTC Layers

Missing Signalling

WebRTC is a media framework that is independent of the signalling protocol, which means any form of signalling can be plugged in to support session establishment using the offer/answer handshake and SDP. Some of the popular options:

Polling
XHR (XMLHttpRequest)
WebSocket (upgraded from an HTTP connection)
SSE (Server-Sent Events)
socket.io (uses a set of transports for best compatibility and fallback)
HTTP/2

Other, less commonly used signalling options:

FTP
HTTP
long poll
XMPP
MQTT

One may also exchange the local and remote SDP over any other communication mechanism, such as email, a REST API or a custom proprietary protocol.
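Because the transport is left open, the only thing both peers must agree on is the shape of the message that carries the SDP. A minimal sketch of such an envelope is shown below. The field names `type`, `sdp`, `from` and `to`, and both function names, are assumptions made for this illustration; WebRTC itself does not define them.

```python
import json

ALLOWED_TYPES = ("offer", "answer")


def encode_signal(kind, sdp, sender, receiver):
    """Wrap an SDP blob in a JSON envelope that any transport can carry."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported message type: {kind}")
    return json.dumps({"type": kind, "sdp": sdp, "from": sender, "to": receiver})


def decode_signal(raw):
    """Parse an envelope and check the fields needed for offer/answer."""
    msg = json.loads(raw)
    missing = {"type", "sdp", "from", "to"} - msg.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if msg["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unsupported message type: {msg['type']}")
    return msg


# The same string could travel over WebSocket, XHR, MQTT or even email.
wire = encode_signal("offer", "v=0\r\n", "alice", "bob")
received = decode_signal(wire)
```

The string returned by `encode_signal` is opaque to the transport, which is why the list of signalling options above can be so varied.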

Security in WebRTC

SSL (Secure Sockets Layer), the predecessor of TLS, adds encryption to otherwise readable packets.

DTLS (Datagram TLS) adds security to UDP traffic: it encrypts data channel messages and carries the key exchange for the SRTP-protected media streams.
TLS (Transport Layer Security) adds security to the TCP messages used in signalling, such as the SDP-based offer/answer handshake that enables setup, modification or teardown of the session.

WebRTC Security

WebRTC Stack

WebRTC offers web application developers the ability to write rich, real-time multimedia applications (think video chat) on the web, without requiring plugins, downloads or installs. Its purpose is to help build a strong RTC platform that works across multiple web browsers and multiple platforms.

Web API – An API to be used by third-party developers for developing web-based video chat-like applications.

WebRTC Native C++ API – An API layer that enables browser makers to easily implement the Web API proposal

Transport / Session – The session components are built by re-using components from libjingle, without using or requiring the XMPP/jingle protocol.

RTP Stack – A network stack for RTP, the Real-time Transport Protocol.

Session Management – An abstracted session layer, allowing for call setup and management. This leaves the protocol implementation decision to the application developer.
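To make the RTP stack entry above more concrete, here is a small sketch that decodes the 12-byte fixed RTP header as laid out in RFC 3550. It is an illustration only and not how the WebRTC native stack is written; the function name `parse_rtp_header` and the sample packet bytes are made up for this example.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC).
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # top 2 bits, always 2 for RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # maps to a codec via SDP
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built sample header: version 2, marker set, payload type 111.
sample = bytes([0x80, 0xEF, 0x03, 0xE8, 0x00, 0x02, 0x71, 0x00,
                0x12, 0x34, 0xAB, 0xCD])
header = parse_rtp_header(sample)
```

The sequence number and timestamp fields are what the jitter buffers described below rely on to reorder packets and to detect loss.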

Voice Engine

VoiceEngine is a framework for the audio media chain, from sound card to the network.

NetEQ for Voice – A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

Acoustic Echo Canceler (AEC) – The Acoustic Echo Canceler is a software-based signal processing component that removes, in real-time, the acoustic echo resulting from the voice being played out coming into the active microphone.

Noise Reduction (NR) – The Noise Reduction component is a software-based signal processing component that removes certain types of background noise usually associated with VoIP. (Hiss, fan noise, etc…)

Video Engine

VideoEngine is a framework for the video media chain, from the camera to the network and from the network to the screen.

Video Jitter Buffer – Dynamic jitter buffer for video. Helps conceal the effects of jitter and packet loss on overall video quality.

Image enhancements – For example, removing video noise from the image captured by the webcam.

Transport

STUN/ICE – A component allowing calls to use the STUN and ICE mechanisms to establish connections across various types of networks.
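One detail of ICE that is easy to show in code is how candidates are ranked. The sketch below follows the priority formula and the recommended type preferences from RFC 8445; the function and constant names are chosen for this example, and real implementations may pick different local preference values.

```python
# Recommended type preferences from RFC 8445, section 5.1.2.2.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    type_pref = TYPE_PREFERENCE[cand_type]
    return (type_pref << 24) + (local_preference << 8) + (256 - component_id)


# Direct (host) candidates outrank STUN-discovered ones, which outrank TURN relays.
ranking = sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True)
```

With these defaults a host candidate ranks above a server-reflexive (STUN) candidate, which in turn ranks above a relayed (TURN) candidate, so the relay is only used when nothing more direct works.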

Error resiliency and fault tolerance

BWE (Bandwidth Estimation): REMB (Receiver Estimated Maximum Bitrate, receiver-side estimation) has been the more common mechanism, while transport-wide-cc (sender-side estimation) is the more modern, forward-looking approach
FEC (Forward Error Correction) and ULPFEC (Uneven Level Protection Forward Error Correction)
RED (Redundant coding)
FIR (Full Intra Request)
PLI (Picture Loss Indication) for video
PLC (Packet Loss Concealment) mostly for audio
NACK (Negative Acknowledgement)
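Several of the mechanisms listed above are negotiated per payload type through `a=rtcp-fb` lines in the SDP. The following sketch collects them so one can see which feedback types a session description offers. The sample SDP fragment is hand-written for this example in the style browsers commonly emit, and the function name is an assumption.

```python
SAMPLE_SDP = """\
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
"""


def negotiated_feedback(sdp: str) -> dict:
    """Map each payload type to the set of RTCP feedback types it lists."""
    prefix = "a=rtcp-fb:"
    feedback = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # "96 nack pli" -> payload type "96", feedback type "nack pli"
        payload_type, _, fb_type = line[len(prefix):].partition(" ")
        feedback.setdefault(payload_type, set()).add(fb_type.strip())
    return feedback


video_feedback = negotiated_feedback(SAMPLE_SDP)["96"]
```

In this fragment `goog-remb` and `transport-cc` correspond to the two bandwidth estimation styles, `ccm fir` to FIR, `nack pli` to PLI and plain `nack` to retransmission requests.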

API support from browser around WebRTC

PeerConnection
getUserMedia and getDisplayMedia
dataChannels
getStats
MediaRecorder
MediaStream / Media Tracks
MediaConstraints
WebAudio Integration
TURN support
Echo cancellation
srcObject in media element
Promise based getUserMedia and PeerConnection

JavaScript Session Establishment Protocol (JSEP)

Outgoing Call : Send Offer to remote peer

Incoming Call : process received offer from remote peer

JavaScript Session Establishment Protocol (JSEP) in WebRTC handshake
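The outgoing and incoming call flows above can be modelled as a small state machine over the signalling states that JSEP defines. This is a simplified sketch: it ignores provisional answers, rollback and the closed state, and the class and method names are made up for the illustration rather than taken from any browser API.

```python
# (current state, whose description, description type) -> next state
TRANSITIONS = {
    ("stable", "local", "offer"): "have-local-offer",
    ("stable", "remote", "offer"): "have-remote-offer",
    ("have-local-offer", "remote", "answer"): "stable",
    ("have-remote-offer", "local", "answer"): "stable",
}


class SignalingState:
    """Tracks the offer/answer state of one peer."""

    def __init__(self):
        self.state = "stable"

    def apply(self, side, kind):
        """Apply a local or remote description of the given type."""
        key = (self.state, side, kind)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot set {side} {kind} in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state


# Outgoing call: create and set the local offer, then receive the answer.
caller = SignalingState()
caller.apply("local", "offer")
caller.apply("remote", "answer")

# Incoming call: receive the offer, then create and set the local answer.
callee = SignalingState()
callee.apply("remote", "offer")
callee.apply("local", "answer")
```

Both peers end in the `stable` state once the handshake completes, and an out-of-order step, such as setting an answer before any offer, is rejected.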

WebRTC supported Codecs

RTCRtpEncodingParameters

RTCRtpEncodingParameters dictionary describes a single configuration of a codec for an RTCRtpSender.

active : flag to set if encoding is currently actively being used.
codecPayloadType : single 8-bit byte (or octet) specifying the codec to use for sending the stream.
dtx : used for audio to indicate whether discontinuous transmission should be used (a feature by which a phone is turned off or the microphone muted automatically in the absence of voice activity)
maxBitrate : (unsigned long integer) maximum number of bits per second to allow for this encoding.
maxFramerate : (double-precision floating-point) maximum number of frames per second to allow for this encoding.
ptime: (unsigned long integer) preferred duration of a media packet in milliseconds used in audio encodings.
rid : (DOMString) if set, specifies an RTP stream ID (RID) to be sent using the RID header extension.
scaleResolutionDownBy : (double-precision floating-point) factor by which to scale down the video during encoding.
- the default value, 1.0, keeps the video at its original size.
- 2.0 scales the video frames down by a factor of 2 in each dimension, resulting in a video 1/4 the size of the original.
- can’t be used to scale the video up
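The arithmetic behind scaleResolutionDownBy is easy to check by hand. The sketch below models it outside the browser: the helper name, the 1280x720 source size and the three `rid` values are assumptions for this example, and a real encoder may round dimensions differently.

```python
def scaled_resolution(width, height, scale_down_by=1.0):
    """Return the (width, height) targeted for one encoding."""
    if scale_down_by < 1.0:
        raise ValueError("scaleResolutionDownBy cannot scale the video up")
    # Each dimension is divided by the factor and truncated to whole pixels.
    return int(width / scale_down_by), int(height / scale_down_by)


# Hypothetical simulcast-style set of encodings for one 1280x720 track.
encodings = [
    {"rid": "high", "scaleResolutionDownBy": 1.0},
    {"rid": "mid", "scaleResolutionDownBy": 2.0},
    {"rid": "low", "scaleResolutionDownBy": 4.0},
]

layers = {
    e["rid"]: scaled_resolution(1280, 720, e["scaleResolutionDownBy"])
    for e in encodings
}
```

A factor of 2.0 turns 1280x720 into 640x360, which has one quarter of the pixels, matching the description above.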

Video Codecs

VP8 : video codec from the WebM Project. Well suited for RTC as it is designed for low latency. Being royalty-free, it was the codec of choice.
VP9
H264 : not royalty-free, but it is native in most mobile handsets due to its high adoption.
AV1

Audio Codecs

iSAC: A wideband and super wideband audio codec for VoIP and streaming audio.
iLBC: A narrowband speech codec for VoIP and streaming audio.
Opus : a lossy audio codec for a broad range of interactive real-time applications, licensed under royalty-free BSD terms.
G.711
Speex
G.722
AMR-WB
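Which of these codecs a given session actually offers can be read from the `a=rtpmap` lines of its SDP. The sketch below parses them into a small table. The SDP fragment is hand-written for this example, the dynamic payload type numbers 96, 98 and 111 are illustrative, and the function name is an assumption.

```python
SAMPLE_SDP = """\
m=audio 9 UDP/TLS/RTP/SAVPF 111 0 9
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
a=rtpmap:9 G722/8000
m=video 9 UDP/TLS/RTP/SAVPF 96 98
a=rtpmap:96 VP8/90000
a=rtpmap:98 VP9/90000
"""


def parse_rtpmap(sdp: str) -> dict:
    """Map payload type -> codec name, clock rate and optional channel count."""
    prefix = "a=rtpmap:"
    codecs = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # "111 opus/48000/2" -> "111" and "opus/48000/2"
        payload_type, _, encoding = line[len(prefix):].partition(" ")
        name, clock_rate, *channels = encoding.split("/")
        codecs[int(payload_type)] = {
            "name": name,
            "clock_rate": int(clock_rate),
            # The channel count is only present for some audio codecs.
            "channels": int(channels[0]) if channels else None,
        }
    return codecs


codecs = parse_rtpmap(SAMPLE_SDP)
```

The payload type keys of the resulting table are the same numbers that appear in the RTP header of each media packet.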

WebRTC Audio/Video Codecs

Update 2020 – This article was written in early 2013, when WebRTC was still being standardised and not yet widely adopted, its inception dating only to 2012.

Many more articles have been written since then to explain WebRTC and its applications in detail. They are listed below:

For SIP IMS and WebRTC

STUN and TURN, which form a critical part of any WebRTC-based communication system

Security of WebRTC based CaaS and CPaaS

WebRTC SDK

APPRTC , Talky.io , TokBox

Developing WebRTC / CPaaS Solution

WebRTC business benefits

WebRTC business benefits to OTT and telecom carriers

TLS

TLS ( Transport Layer Security)

HTTP/2 offer answer

HTTP/2 – offer/answer signaling for WebRTC call

IP Multimedia Subsystem (IMS)

Why IMS ?
What benefits does IMS bring ?
Features of IMS Network
IMS Layers
1. Transport / Media Endpoint Layer
  - Backhaul network
  - Border Gateways
2. Session & Control Layer
  - HSS (Home Subscriber Server)
  - CSCF (Call Session Control Function)
  - MGCF (Media Gateway Control Function)
3. Application Services Layer
  - TAS (Telephony Application Server)
  - IM-SSF ( IP Multimedia Services Switching Function)
  - OSA-GW (Open Service Access Gateway)
IMS-architecture
- IMS standalone architecture
- Interoperable IMS core for heterogeneous access networks

IMS is an architectural framework for IP-based, multimedia-rich communications. It was standardized by a group called 3GPP, formed in 1999. It started as an enabler for 3rd generation mobile networks in the European market and later spread to wireline networks too. IMS became the key to Fixed Mobile Convergence (FMC).

Based on IETF protocols (such as SIP, RTP, RTSP, COPS and Diameter), IMS is now crucial for controlling communication in an IP-based Next Generation Network (NGN).

Communication service providers and telecom operators are migrating from circuit-switched networks to IMS technology as bandwidth (5G) and user expectations increase.

Why IMS ?

Early TDM networks were not robust enough to support emerging technologies and data networking. There was a need to migrate from voice-only networks to triple-play networks (voice, video and data). Other factors included:

rapid service development
service availability in both home and roaming networks
wireline and wireless convergence

For these reasons TDM became outdated and IMS gained support.

What benefits does IMS bring ?

It offers countless applications around rich multimedia services on wireless, packet-switched and even traditional circuit-switched networks.

Easier to Create and Deploy New Applications and Services

(+) Enhanced applications are easier to develop due to open APIs and common network services.
(+) Third-party developers can offer their own applications and use common network services, sharing profits with minimal risk.
(+) New services involving concurrent sessions of multimedia (voice, video, and data) during the same call are now possible.
(+) Reduced time-to-market for new services is possible because service providers are not tied to the timescales and functions of their primary network equipment providers (NEPs).

Capture New Subscribers, Retain Current Subscribers

(+) Better voice quality for business applications, such as conferencing, is possible.
(+) Wireless applications (like SMS, and so on) can be offered to wireline or broadband subscribers.
(+) Service providers can more easily offer bundled services.

Lower Operating and Capital Costs

(+) Cost-effective implementation of services across multiple transports, such as Push-To-Talk (PTT), presence and Location-Based Services (LBS), Fixed-Mobile Convergence (FMC), mobile video services, and so on.
(+) Common provisioning, management and billing systems are supported for all networks.
(+) Significantly lower transport costs result when moving from time-switched to packet-switched channels.
(+) Service providers can take advantage of competitive offerings from multiple NEPs for most network elements.
(+) Reduced expenses for delivering licensed content to subscribers of different types of devices, encodings, or networks.

The strongest argument for adoption of IMS is that it follows established standards and open interfaces from 3GPP and ETSI. This makes it suited for interoperability, policy control across networks, streamlined OSS/BSS, value-added services, etc.

Features of IMS network

Abstraction from Underlying Network: IMS leads towards an open, standardized network and interface, irrespective of the underlying network.

Fixed/Mobile Convergence: Interoperability with the Circuit Switched (CS) Mobile Application Part (MAP).

Roaming: Location awareness between the home and visited networks.

Application Layer Call Control: The IMS application layer can complete call flows in proxy or B2BUA mode, which lets the operator introduce business logic into call sessions.

IMS is supplemented by SIP (IETF), Diameter (IETF) and H.248 (ITU-T). The release cycle of IMS is as follows:

2002-03-14 Rel-5: IMS was introduced with SIP; QoS voice over MGW
2004-12-16 Rel-6: Services such as emergency calls, voice call continuity, and IP-CAN (IP Connectivity Access Network)
2005-09-28 Rel-7: Single Radio Voice Call Continuity, multimedia telephony, eCall, ICS
2008-12-11 Rel-8: IMS centralized services, supplementary services, interworking between IMS and circuit-switched networks, charging, QoS
2009-12-10 Rel-9: IMS emergency numbers on GPRS, EPS (Evolved Packet System), custom alert tones, multimedia broadcast/multicast
2011-03-23 Rel-10: Home NodeB, M2M, roaming, and inter-UE transfer
2012-09-12 Rel-11: tbd
2014-09-17 Rel-12: tbd
2015-12-11 Rel-13: tbd

IMS Layers

IMS is broadly divided into three horizontal layers, given below:

Transport / Media Endpoint Layer

Unifies transports and media from analog, digital, or broadband formats to Real-time Transport Protocol (RTP) and SIP protocols. This is accomplished by media gateways and signaling gateways.

It also includes media servers with media processing elements to allow for announcements, in-band signaling, and conferencing. These media servers are shared across all applications (voicemail, interactive response systems, push-to-talk, and so on), maximizing statistical use of the equipment and creating a common base of media services without “hard-coding” these services into the applications.

Session & Control Layer

This layer arranges logical connections between various other network elements. It provides registration of end-points, routing of SIP messages, and overall coordination of media and signaling resources.

The IMS core, which is part of this layer, primarily contains two important elements: the Call Session Control Function (CSCF) and the Home Subscriber Server (HSS) database. These are explained below.

HSS (Home Subscriber Server)

It is a database of user profiles and location information. It is responsible for name/address resolution as well as authorization and authentication.

CSCF (Call Session Control Function)

Handles most routing, session and security related operations for SIP messages. It is divided into three parts:

Proxy CSCF: The P-CSCF is the first point of contact for any SIP UA. It proxies UE requests into the IMS subsystem.
Serving CSCF: The S-CSCF is a powerful part of the IMS core, as it decides how UE requests are forwarded to the application servers.
Interrogating CSCF: The I-CSCF initiates the assignment of a user to an S-CSCF (by querying the HSS) during registration.
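The division of work between the three CSCF roles during registration can be sketched as a toy model. Everything here is hypothetical: the in-memory `HSS` class stands in for the real subscriber database, the names `S-CSCF-1` and `sip:alice@example.com` are invented, and authentication challenges and the actual SIP and Diameter messages are left out.

```python
class HSS:
    """Hypothetical in-memory stand-in for the Home Subscriber Server."""

    def __init__(self, subscribers):
        # Maps a public user identity to the S-CSCF able to serve it.
        self.subscribers = subscribers

    def assign_scscf(self, user):
        if user not in self.subscribers:
            raise KeyError(f"unknown subscriber: {user}")
        return self.subscribers[user]


def register(user, hss):
    """Return the chain of nodes a registration request passes through."""
    path = ["P-CSCF"]              # first point of contact for the UE
    path.append("I-CSCF")          # interrogates the HSS for this user
    scscf = hss.assign_scscf(user)
    path.append(scscf)             # serving node chosen for the session
    return path


hss = HSS({"sip:alice@example.com": "S-CSCF-1"})
route = register("sip:alice@example.com", hss)
```

Once the user is bound to an S-CSCF, later requests from that user are handled by that node, which is where the forwarding decisions towards application servers are made.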

Application Services Layer

The Application Services Layer contains multiple Application Servers (AS), such as:

Telephony Application Server (TAS) – for defining custom call flow logic
IP Multimedia Services Switching Function (IM-SSF)
Open Service Access Gateway (OSA-GW), and so on.

IMS Architecture

The IMS standalone architecture is suited for an all IP network

Interoperable IMS core for heterogeneous access networks

References

IMS service switching function and reverse service switching function: read here.

IMSSF and RIMSSF

IP Multimedia Subsystem – Detailed: next part of the IMS series, describing IMS components and call flow thoroughly

IP Multimedia Subsystem (IMS) – detailed part2

Internet Telephony Convergence- JAINSLEE Platform

IMS in EPC ( Evolved Packet Core )

Update on IMS :

IMS has been mandated as the control architecture for Voice over LTE (VoLTE) networks. IMS is also being widely adopted to manage traffic for Voice over WiFi (VoWiFi) systems.
