WebRTC Stack Architecture and Layers

WebRTC stands for Web Real-Time Communications and introduces a real-time media framework in the browser core alongside associated JavaScript APIs for controlling the media frame and HTML5 tags for displaying. If you are new to WebRTC, read “What is WebRTC?” From a technical point of view, WebRTC will hide all the complexity of real-time media behind a very simple JavaScript API. 

WebRTC Layers

Missing Signalling

Webrtc is a media framework which is independant of signalling protocol which means that we can plug any form of signalling to support session establishment using offer-answer handshake and SDP. Some of the popular options

  • Polling
  • XHR ( XML over HTTP Request)
  • Websocket ( HTTP upgraded )
  • SSE ( Server Sent Events )
  • socket.io ( use set of protocols for best compatibility and fallback)
  • HTTP/2

Other form of less used signalling options

  • FTP
  • HTTP
  • long poll
  • XMPP
  • MQTT

One may also send the SDP for local and remote over any other means of communication mechanism such as email, REST API or any custom propriatory protocol.

Security in WebRTC

SSL is the secure session layer which adds encryption capability to an otherwise readable packet.

  • DTLS (Datagram TLS) adds Security on UDP packets which is used by Media stream and Data Channel messages.
  • TLS ( Tansport Layer Security) adds security to TCP messahes used in signalling such as SDP based offer answer handshake which enables setup, modification or breakdown of the session.

WebRTC Stack

WebRTC offers web application developers the ability to write rich, realtime multimedia applications (think video chat) on the web, without requiring plugins, downloads or installs. It’s purpose is to help build a strong RTC platform that works across multiple web browsers, across multiple platforms.


Web API – An API to be used by third-party developers for developing web-based video chat-like applications.

WebRTC Native C++ API – An API layer that enables browser makers to easily implement the Web API proposal

Transport / Session – The session components are built by re-using components from libjingle, without using or requiring the XMPP/jingle protocol.

RTP Stack – A network stack for RTP, the Real-Time Protocol.

Session Management – An abstracted session layer, allowing for call setup and management layer. This leaves the protocol implementation decision to the application developer.

Voice Engine

VoiceEngine is a framework for the audio media chain, from sound card to the network.

NetEQ for Voice– A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

Acoustic Echo Canceler (AEC) – The Acoustic Echo Canceler is a software-based signal processing component that removes, in real-time, the acoustic echo resulting from the voice being played out coming into the active microphone.

Noise Reduction (NR) -The Noise Reduction component is a software-based signal processing component that removes certain types of background noise usually associated with VoIP. (Hiss, fan noise, etc…)

Video Engine

VideoEngine is a framework video media chain for video, from the camera to the network, and from network to the screen.

Video Jitter Buffer – Dynamic Jitter Buffer for video. Helps conceal the effects of jitter and packet loss on overall video quality.

Image enhancements -For example, removes video noise from the image capture by the webcam.


STUN/ICE – A component allowing calls to use the STUN and ICE mechanisms to establish connections across various types of networks.

Error resilinency and fault tolerance

  • REMB (receiver-side bandwidth estimation) is more common and transport-wide-cc (sender-side bandwidth estimation) is the more modern and future looking approach
  • BWE (Bandwidth Estimation )
  • FEC (Forward Error Correction) and ULPFEC (Uneven Level Protection Forward Error Correction)
    RED (Redundant coding)
    FIR (Full Intra Request)
    PLI (Picture Loss Indication) for video
  • PLC (Packet Loss Concealment) mostly for audio
  • NACK (Negative Acknowledgement)

API support from browser around WebRTC

  • PeerConnection
  • getUserMedia and getDisplayMedia
  • dataChannels
  • getStats
  • MediaRecorder
  • MediaStream / Media Tracks
  • MediaConstraints
  • WebAudio Integration
  • TURN support
  • Echo cancellation
  • srcObject in media element
  • Promise based getUserMedia and PeerConnection

JavaScript Session Establishment Protocol (JSEP)

Outgoing Call : Send Offer to remote peer

JSEP flow for Outgoing Call : Send Offer to remote peer and process incoming answer

Incoming Call : process received offer from remote peer

JSEP flow for Incoming Call : process received offer from remote peer

WebRTC supported Codecs


RTCRtpEncodingParameters dictionary describes a single configuration of a codec for an RTCRtpSender.

  • active : flag to set if encoding is currently actively being used.
    codecPayloadType : single 8-bit byte (or octet) specifying the codec to use for sending the stream.
  • dtx : used for audio to indicate if discontinuous transmission (a feature by which a phone is turned off or the microphone muted automatically in the absence of voice activity)
  • maxBitrate : (unsigned long integer) maximum number of bits per second to allow for this encoding.
  • maxFramerate : (double-precision floating-point) maximum number of frames per second to allow for this encoding.
  • ptime: (unsigned long integer) preferred duration of a media packet in milliseconds used in audio encodings.
  • rid : (DOMString) if set, specifies an RTP stream ID (RID) to be sent using the RID header extension.
  • scaleResolutionDownBy :(double-precision floating-point) specifying a factor by which to scale down the video during encoding.
    • default value, 1.0 if video’s size will be the same as the original.
    • 2.0 scales the video frames down by a factor of 2 in each dimension, resulting in a video 1/4 the size of the original.
    • can’t use this to scale the video up

Video Codecs

  • VP8  Video codec from the WebM Project. Well suited for RTC as it is designed for low latency. It was the codec of choice being royalty-free.
  • VP9
  • H264 : not royalty-free. But it is native in most mobile handsets due to its high adoption.
  • AV1

Audio Codecs

  • iSAC: A wideband and super wideband audio codec for VoIP and streaming audio.
  • iLBC: A narrowband speech codec for VoIP and streaming audio.
  • Opus : lossy audio codec for broad range of interactive real-time applications licensed under royalty-free BSD terms.
  • G.711
  • Speex
  • G.722
  • AMR-WB

update 2020 – This article was written very early in 2013 while WebRTC was being standardised and not as widely adopted since the inception of WebRTC began in 2012.

There are many more articles written after that to explain and emphasize the detailing and application of WebRTC. List of these is below :

For SIP IMS and WebRTC

STUN and TURN which form a crtical part of any webrtc based communication system

Security of WebRTC based CaaS and CPaaS


Developing WebRTC / CPaaS Solution


  • [1] Working Group RFC
    • draft-ietf-rtcweb-audio-02      2013-08-02
    • draft-ietf-rtcweb-data-channel-05      2013-07-15
    • draft-ietf-rtcweb-data-protocol-00     2013-07-15
    • draft-ietf-rtcweb-jsep-03      2013-02-27
    • draft-ietf-rtcweb-overview-07      2013-08-14
    • draft-ietf-rtcweb-rtp-usage-07     2013-07-15
    • draft-ietf-rtcweb-security-05      2013-07-15
    • draft-ietf-rtcweb-security-arch-07      2013-07-15
    • draft-ietf-rtcweb-transports-00      2013-08-19
    • draft-ietf-rtcweb-use-cases-and-reqs-11 2013-06-27
    • Plus over 20 discussion RFC drafts
  • TLS
  • HTTP/2 offer answer