Performance of WebRTC sites and electron apps

This post is about making performance enhancements to a WebRTC app so that they can be used in the area which requires sensitive data to be communicated, cannot afford downtime, fast response and low RTT, need to be secure enough to withstand and hacks and attacks.

Best practices for WebRTC single page applications

As a check the communication clients become single page driven , a lot of authetication , heartbeat sync , web workers , signalling event driven flow management resides on the same page along with the actuall CPU consumption for the audio video resources and media streams . This in turns can make the webpage heavy and many a times could result in crash due to being ” unresponsive”.

Here are some my best to-dos for making sure the webrtc communication client page runs efficiently

Visual stability and CLS ( Cummulative Layout Shift)

CLS metrics measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.

To have a good user interactionn experiences, the DOM elements should display as less movement as possible so that page appears stable . In the opposite case for a flickering page ( maybe due to notification DOM dynamically pushing the other layout elements ) it is difficult to precisely interact with the page elements such as buttons .

Minimize main thread work

The main thread is where a browser processes runs all the JavaScript in your page, as well as to perform layout, reflows, and garbage collection. therefore long js processes can block the thread and make the page unresponsive .

Depriciation of XMLHTTP request on main thread

Reduce Javacsipt execution Time

Unoptimized JS code takes longer to execute and impacts network , parse-compileand memory cost.

If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.

Some effective tips to spedding up JS execution include

  • minifying and compressing code
  • Removing the unused code and console.logs
  • Apply caching to save lookup time



Security Concerns

In one of my previous posts I have mentioned about Security threats to WebRTC Solution . It includes mainly 4 ways in which WebRTC Solution Providers and Users are vulnerable , which includes

  1. Identity Management ,
  2. Browser Security ,
  3. Authentication and
  4. Media encryption.

WebRTC Security

WebRTC Security
Identity Management ,
Browser Security ,
Authentication and
Media encryption.
Browser Threat Model
Best practices for the Webrtc comm agents
ICE TURN challenges

Cookies – Security vs persistsnt state

Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.

While adding cookies we must ensure that if SameSite =None , the cookies must be secure

Set-Cookie: widget_session=abc123; SameSite=None; Secure

SameSite to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser’s URL bar. 

Set-Cookie: promo_shown=1; SameSite=Strict

You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.

Performance monitoring

Key Performance Indicators (KPIs) are used to evaluate the performance of a website . It is crticial that a webrtc web page must be light weight to acocmodate the signalling control stack javscript libs to be used for offer answer handling and communicating with the signaller on open sockets or long polling mechnism .

Lighthouse results

Lighthouse tab in chrome developer tools shows relavnat areas of imporevemnt on the webpage from performmace , Accesibility , Best Practices , Search Engine optimization and progressive Web App

Also shsows individual categories and comments

Time to render and Page load

Page attributes under Chrome developers control depicts the page load and redering time for every element includeing scripts and markup. Specifically it has

  • Time to Title
  • Time to render
  • Time to inetract

Networking attributes to be cofigured based on DNS mapping and host provider. These Can be evalutaed based on chrome developer tool reports

Task interaction time

Other page interaction crtiteria includes the frames their inetraction and timings for the same.

In the screenhosta ttcjed see the loading tasks which basically depcits the delay by dom elements under transitions owing to user interaction . This ideally should be minimum to keep the page responsive.

Page’s total memeory



The above functions ( old and new ) estimates the memory usage of the entire web page

these calls can be used to correlate new JS code with the impact on memery and subsewuntly find if there are any memeory leaks . Can also use these memery metrics to do A/B testing .

DNS lookup Time

Services such as Pingdom ( or WebPageTest can quickly calculate your website’s DNS lookup times

Load / sgtress tresting can be perfomed over tools such as LoadStorm , JMeter

Page weight and PRPL

Loading assests over CDN , minfying sripts and reducing over all weight of the page are good ways to keep the pagae light and active and prevent any chrome tab crashes .

PRPL expands to Push/preload , Render , PreCache , Lazy load

  • Render the initial route as soon as possible.
  • Pre-cache remaining assets.
  • Lazy load other routes and non-critical assets.

Preload is a declarative fetch request that tells the browser to request a resource as soon as possible. Hence should be used for crticial assests .

<link rel="preload" as="script" href="critical.js">

The non critical compoenents could then be loaded on async .

Lazy load must be used for large files like js paylaods which are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.

Web Workers

Web Workers are a simple means for web content to run scripts in background threads.The Worker interface spawns real OS-level threads

By acting as a proxy, service workers can fetch assets directly from the cache rather than the server on repeat visits. 

Conversion Rates over analytics Tools

Google Analytics is a good way of deducing Converdsation and bounce rates

Codecs weight

Vide codecs Vp8 vs VP9

SFU vs MCU or p2p media flow

References :

WebRTC Audio/Video Codecs

Codecs signifies the media stream’s compession and decompression. For peers to have suceesfull excchange of media, they need a common set of codecs to agree upon for the session . The list codecs are sent  between each other as part of offeer and answer or SDP in SIP.

As WebRTC provides containerless bare mediastreamgtrackobjects. Codecs for these tracks is not mandated by webRTC . Yet the codecs are specified by two seprate RFCs

RFC 7878 WebRTC Audio Codec and Processing Requirements specifies least the Opus codec as well as G.711’s PCMA and PCMU formats.

RFC 7742 WebRTC Video Processing and Codec Requirnments specifies support for  VP8 and H.264’s Constrained Baseline profile for video .

In WebRTC video is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to dicuss Audio/Video Codecs processing requirnments only.

Quick links : If you are new to WebRTC read : Introduction to WebRTC is at and Layers of WebRTC at

WebRTC Media Stack

Media Stream Trcaks in WebRTC

The MediaStreamTrack interface typically represents a stream of data of audio or video and a MediaStream may contain zero or more MediaStreamTrack objects.

The objects RTCRtpSender and RTCRtpReceiver can be used by the application to get more fine grained control over the transmission and reception of MediaStreamTracks.

Media Flow in VoIP system
Media Flow in WebRTC Call


Video Capture insync with hardware’s capabilities

WebRTC compatible browsers are required to support Whie-balance , light level , autofocus from video source

Video Capture Resolution

Minimum WebRTC video attributes unless specified in SDP ( Session Description protocl ) is minimum 20 FPS and resolution 320 x 240 pixels. 

Also supports mid stream resilution changes such as in screen source fromdesktop sharinig .

SDP attributes for resolution, frame rate, and bitrate

SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

Sender must send limiting the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.

Dynamic FPS control based on actual hardware encoding :

video source capture to adjust frame rate accroding to low bandwidth , poor light conditions and harware supported rate rather than force a higher FPS .

Stream Orientation

support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing with peer


WebRTC is free and opensource and its woring bodies promote royality free codecs too. The working groups RTCWEB and IETF make the sure of the fact that non-royality beraning codec are mandatory while other codecs can be optional in WebRTC non browsers .

WebRTC Browsers MUST implement the VP8 video codec as described in
RFC6386] and H.264 Constrained Baseline as described in [H264].

RFC 7442 WebRTC Video Codec and Processing Requirements

most of the codesc below follow Lossy DCT(discrete cosine transform (DCT) based algorithm for encoding.

Sample SDP from offer in Chrome browser v80 for Linux incliudes these profile :

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123

a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 profile-id=0
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100

a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:122 rtx/90000
a=fmtp:122 apt=102

a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:121 rtx/90000
a=fmtp:121 apt=127

a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125

a=rtpmap:108 H264/90000
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 red/90000
a=rtpmap:120 rtx/90000
a=fmtp:120 apt=124


Developed by on2 and then acquired and opensource by google . Now free of royality fees.

Supported conatiner – 3GP, Ogg, WebM

No limit on frame rate or data rate and provides maximum resolution of 16384×16384 pixels.

libvpx encoder library.

VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes.
encode and decode pixels with an implied 1:1 (square) aspect ratio.

supported simulcast


Video Processor 9 (VP9) is the successor to the older VP8 and comparable to HEVC as they both have simillar bit rates .

Open and free of royalties and any other licensing requirements
Its supported Containers are – MP4, Ogg, WebM

H264/AVC constrained

AVC’s Constrained Baseline (CBP ) profile compliant with WebRTC

Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3 . Contrained baseline is a submet of the main profile , suited to low dealy , low complexity. suited to lower processing device like mobile videos

Multiview Video Coding – can have multiple views of the same scene ,such as stereoscopic video.

Other profiles , which are not supporedt are Baseline (BP) , Extended (XP), Main (MP) , High (HiP) , Progressive High (ProHiP) , High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive

Its supported containers are 3GP, MP4, WebM

Parameter settings:

  • packetization-mode
  • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
  • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
  • Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )

It is a propertiary , patented codec , mianted by MPEG / ITU

AV1 (AOMedia Video 1)

open format designed by the Alliance for Open Media
royality free
especially designed for internet video HTML element and WebRTC
higher data compression rates than VP9 and H.265/HEVC

offers 3 profiles in increasing support for color depths and chroma subsampling.
high, and

supports HDR
supports Varible Frame Rate

Supported container are ISOBMFF, MPEG-TS, MP4, WebM

Stats for Video based media stream track

timestamp 04/05/2020, 14:25:59
ssrc 3929649593
isRemote false
mediaType video
kind video
trackId RTCMediaStreamTrack_sender_2
transportId RTCTransport_0_1
codecId RTCCodec_1_Outbound_96
[codec] VP8 (payloadType: 96)
firCount 0
pliCount 9
nackCount 476
qpSum 912936
[qpSum/framesEncoded] 32.86666666666667
mediaSourceId RTCVideoSource_2
packetsSent 333664
[packetsSent/s] 29.021823604499957
retransmittedPacketsSent 0
bytesSent 342640589
[bytesSent/s] 3685.7715977714947
headerBytesSent 8157584
retransmittedBytesSent 0
framesEncoded 52837
[framesEncoded/s] 30.022576142586164
keyFramesEncoded 31
totalEncodeTime 438.752
[totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
totalEncodedBytesTarget 335009905
[totalEncodedBytesTarget/s] 3602.7091371103397
totalPacketSendDelay 20872.8
[totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
qualityLimitationReason bandwidth
qualityLimitationResolutionChanges 20
encoderImplementation libvpx
Graph for Video Track in chrome://webrtc-internals

Other RTP parameters

RTX(regtranmission ) – packet loss recovery technique for real-time applications with relaxed delay bounds.

Non WebRTC supported Video codecs

Need active realtime media transcoding


Already used for video conferencing on PSTN (Public Switched Telephone Networks), RTSP, and SIP (IP-based videoconferencing) systems.
suited for low bandwidth networks
Although it is not comaptible with WebRTC but many media gateways incldue realtime transcoding existed between H263 based SIP systems and vp8 based webrtc ones to enable video communication between them

H.265 / HEVC

proprietary format and is covered by a number of patents. Licensing is managed by MPEG LA .

Container – Mp4

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints

With the rise of Internet of Things many Endpoints especially IP cameras connected to Raspberry Pi like SOC( system on chiops )n wanted to stream directly to the browser within theor own provate network or even on public network using TURN / STUN.

The figure below shows how such a call flow is possible between an IP cemera ( such as Baby Cam ) and its parent monitoring it over a WebRTC suppported mobile phone browser . The process includes streaming teh content from IOT device on RTSP stream and using realtime trans-coding between H264 and VP8

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints


Audio Level

audio level for speech transmission to avoid users having to manually adjust the playback and to facilitate mixing in conferencing applications.

normalization considering frequencies above 300 Hz, regardless of the sampling rate used.

adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor.

GAIN calculation

  • If the endpoint has control over the entire audio-capture path like a regular phone
    the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
  • If the endpoint does not have control over the entire audio capture like software endpoint
    then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
  • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.

Acoustic Echo Cancellation (AEC)

Endpoints shoudl allow echo control mechsnisms


WebRTC endpoints are should implement audio codecs: OPUS and PCMA / PCMU, along with Comforrt Noise and DTMF events.

Trace for audio codecs supported in chrome (Version 80.0.3987.149 (Official Build) (64-bit) on ubuntu)

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000


stabdardised by IETF

container- Ogg, WebM, MPEG-TS, MP4

supportes multiple comptression algorithms

For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, it is w3C recommenda that Opus be offered before PCMA/PCMU.

AAC (Advanvced Audio Encoding)

part of the MPEG-4 (H.264) standard
supported congainers – MP4, ADTS, 3GP

Lossy compression but has number pf profiles suiting each usecase like high quality surround sound to low-fidelity audio for speech-only use.

G.711 (PCMA and PCMU)

ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding.
vital to interface with the standard teelcom network and carriers

Fixed 64Kbpd bit rate

supports 3GP container formats

G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU


ncoded using Adaptive Differential Pulse Code Modulation (ADPCM) which is suited for voice compression
conatiners used 3GP, AMR-WB

Comfort noise (CN)

artificial background noise which is used to fill gaps in a transmission instead of using pure silence

avoids jarring or RTP Timeout

for streams encoded with G.711 or any other supported codec that does not provide its own CN.
Use of Discontinuous Transmission (DTX) / CN by senders is optional

Internet Low Bitrate Codec (iLBC)

opensource narrow band codec
designed specifically for streaming voice audio

Internet Speech Audio Codec (iSAC)

designed for voice transmissions which are encapsulated within an RTP stream.

DTMF and ‘audio/telephone-event’ media type

endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.

DTMF events list
| 0 | DTMF digit “0”
| 1 | DTMF digit “1”
| 2 | DTMF digit “2”
| 3 | DTMF digit “3”
| 4 | DTMF digit “4”
| 5 | DTMF digit “5”
| 6 | DTMF digit “6”
| 7 | DTMF digit “7”
| 8 | DTMF digit “8”
| 9 | DTMF digit “9”
| 10 | DTMF digit “*”
| 11 | DTMF digit “#”
| 12 | DTMF digit “A”
| 13 | DTMF digit “B”
| 14 | DTMF digit “C”
| 15 | DTMF digit “D”

Stats for Audio Media track

timestamp 04/05/2020, 14:25:59
ssrc 3005719707
isRemote fals
mediaType audio
kind audio
trackId RTCMediaStreamTrack_sender_1
transportId RTCTransport_0_1
codecId RTCCodec_0_Outbound_111
[codec] opus (payloadType: 111)
mediaSourceId RTCAudioSource_1
packetsSent 88277
[packetsSent/s] 50.03762690431027
retransmittedPacketsSent 0
bytesSent 1977974
[bytesSent/s] 150.11288071293083
headerBytesSent 2118648
retransmittedBytesSent 0
Graphs in chrome://webrtc-internals for Audio


m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE

Stats for Datachannel

Statistics RTCDataChannel_1
timestamp 04/05/2020, 14:25:59
label sctp
datachannelid 1
state open
messagesSent 1
[messagesSent/s] 0
bytesSent 228
[bytesSent/s] 0
messagesReceived 1
[messagesReceived/s] 0
bytesReceived 228
[bytesReceived/s] 0

Refrenecs :

JavaScript Session Establishment Protocol (JSEP) in WebRTC handshake

This article is aimed at explaning the intricacies and detailed offer answer flow in webrtc handshake and JSEP . You can read the following artciles on WebRTC as prereq before reading through this one

WebRTC API – Peerconnection , getUserMedia , Datachannel , DataStaats

JSEP (JavaScript Session Establishment Protocol)

JSEP (JavaScript Session Establishment Protocol) is used during signalling via w3c RTCPeerConnectionAPI interface to set up a multimedia session. The multimedia session description specfies the crtical components of setting up a session between local and remote such as transport ports , protcol , profiles . It also handles the intercation with ICE state machine

Offer/Answer Excahange Flow

prereq : Setup Client side for the caller
PeerConnectionFactory to generate PeerConnections
PeerConnection for every connection to remote peer
MediaStream audio and video from client device

  1. Side initiating the session creates a offer by CreateOffer() API
aPromise = myPeerConnection.createOffer([options]);

options is type of RTC Offer Options

  • iceRestart
  • offerToReceiveAudio ( legacy)
  • offerToReceiveVideo ( legacy)
  • voiceActivityDetection

2. The application then stores the offer in local config as setLocalDescriptionAPI()

 myPeerConnection.createOffer().then(function(offer) {
    return myPeerConnection.setLocalDescription(offer);

3. Offer is sent to remote side using its choice of signalling ( SIP , WS , HTTP, XMPP .. )

4. Remote party stores it use setRemoteDescription() API

.then(function () {
  return createMyStream();

4. Remote part generates an answer using createAnswer() API

aPromise = RTCPeerConnection.createAnswer([options]);

5. Remote party stores the answer in its local config using setLocalDescription() API

6. Answer is transferred to Initiator side using choice of signalling ( SIP , WS , HTTP, XMPP .. ) again

7. Initiating side stores it use setRemoteDescription() API

Interfaces of webrtc and tracks to stream addition

Process to perform webrtc handshake

Webrtc call setup and incoming call callflow between remote peer , peerconnection actory , peerconnection and application

setup a call
receive a call

Signalling state Transitions on PeerConnection

As the caller initiates a new RTCPeerConnection() , the RTCSignalingState state is “stable” as remote and local descriptions are empty

As the caller initiates call and calls createOffer() , he now has offer SDP and procced to store offer locally with setLocalDescription(offer) the RTCSignalingState state is “have-local-offer” . After than caller send the offer to callee over signalling channel

Simillarily as the calle recives the offer , it starts with RTCSignalingState stable and then proceeds to store the Remote’s offer using setRemoteDescription(offer) , its state is now “have-remote-offer”

The callee generates a provsional answer and for caller and stores it locally , state transitiosn to “have-local-pranswer” . the pranswer SDP is send to caller over signalling channel again .

Caller stores the callee’s pr answer SDP and state updates to “have-remote-pranswer”

Once there is no offer/answer exchange in progress the state again changes to ” stable “.

State schanges to “closed” if RTCpeerConnection is closed

img :

Detailed Offer / Answer SDP

Local Offer created by side initiating the session / Caller

The first offfer called initial offer can have dummy date for contact line such as to prevent leaking a local Ip address

c=IN IP4
a=rtcp:9 IN IP4

“o=” line contains <username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>

o=- 4445251981417004127 2 IN IP4 127.0.0.

shows username – and 4445251981417004127 as session id. Same username “-” is specified in “s=” line

“t=” line shows <start time> <stop time>

t=0 0

Full session Block example

type: offer, sdp: v=0
o=- 4445251981417004127 2 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2

Media Section : An m= section is generated for each RtpTransceiver that has been added to the PeerConnection. For the initial offer since no ports are available yet , dummy port 9 can be sadded. However if it is bundle only then port value is set to 0. Later the port value will be set to the port value of default ICE candidate.

DTLS filed “UDP/TLS/RTP/SAVPF” is followed by the list of codecs in order of priority.

“c=” line in msection too must be filled with dummy values if IP as no candidates are available yet .




“a=ice-ufrag” , “a=ice-pwd” , “a=fingerprint” , “a=setup” , “a=tls-id”

Media Stream Identification attribute “a-mid:”

For each media format on the m= line, “a=rtpmap” for “rtx” with the clock rate of codec and “a=fmtp” to reference the payload type of the primary codec.  “a=rtcp-fb” specified RTCP feedback

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1

Audio Block exmaple

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3968544080 cname:da0nYe1oYR8AvVNp
a=ssrc:3968544080 msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=ssrc:3968544080 mslabel:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2
a=ssrc:3968544080 label:7525d75c-ffe7-4038-8b71-653d249e63bb

// remove video section for simplicity

Data Block is created if data channle has been created with m= section for data.

“a=sctp-port” line referencing the SCTP port number set to 5000

 “a=max-message-size”  set to 262144 here

Data Block example

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A

Subsequent Offers

When createOffer is called a second (or later) time, or is called after a local description has already been installed, the processig is different due to gathered ICE candidates . However the <session-version> is not changed .

Additionally m section is updated if RtpTransceiver is added or removed

Each “m=” and c=” line MUST be filled in with the port, relevant RTP profile, and address of the default candidate for the m= section

If the m= section is not bundled into another m= section, update the “a=rtcp” with port and address of RTCP camdidate and add “a=camdidate” with  “a=end-of-candidates” 

Local Answer created by side receiving the session/ Callee

When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer. 

 Each offered m= section will have an associated RtpTransceiver

Remote Destination / Callee can reject the m section by setting port in m line to 0 . It can reject msection if neither of the offered media format are supported , RtpTransceiver is stoopped etc.

For the initial offer the dummy port value of 9 is set as no ICE candudate is avaible yet . Simillarly  “c=” line must contain the “dummy” value “IN IP4” too.

The <proto> field MUST be set to exactly match the <proto> field for the corresponding m= line in the offer.

type: answer, sdp: v=0
o=- 5730481682283561642 3 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT

Audio section

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT e817fe0f-1cc0-4901-9fd9-e810289cc85d
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3260997313 cname:FxLUKuXrLQe0r1rn

Video section removed for simplicity

Data stream

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43

Subsequent Answers

 Port value would normally be set to the port of the default ICE candidate for this m= section. For the exmaple above

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

will be changes with relevant port adress such as

type: offer, sdp: v=0
o=- 6407282338169184323 3 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS bSrCUCFybGovIy0FUhPTZAr9ToRmx8I09nEj
m=audio 55375 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 55375 typ host generation 0 network-id 1 network-c

Simillarly m video and data line will also get ports

m=video 53877 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 53877 typ host generation 0 network-id 1 network-cost 10
m=application 57991 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=candidate:2880323124 1 udp 2122260223 57991 typ host generation 0 network-id 1 network-cost 10

If the answer contains any “a=ice-options” attributes where “trickle” is listed as an attribute, update the PeerConnection canTrickle property to be true. 

Modifying Offer/answer SDP

SDP returned from createOffer or createAnswer MUST NOT be changed before passing it to setLocalDescription.
After calling setLocalDescription with an offer or answer, the application MAY modify the SDP to reduce its capabilities before sending it to the far side

Assume we have a MCU at location and want the video stream to relay via a Media Server.

SDP Parsing

SDP is used for session parsing and contians sequence of line with key value pairs. SDP is read, line-by-line, and converted to a data structure that contains the deserialized information.

JSEP SDP bears a lot of simillarity to SIP SDP explained here : SIP and SDP Messages Explained

Session-Level Parsing

Line “v=” , “o=”,”b=” and “a=” are processed . The “i=”, “u=”, “e=”, “p=”, “t=”, “r=”, “z=”, and “k=” lines are not used by this specification; they MUST be checked for syntax but their values are not used. Line “c=” is checked for syntax and ICE mismatch detection

“a= ” attribute could be : “a=group” , “s=”ice-lite” , “a=ice-pwd”, “a=ice-options” , “a=fingerprint”, “a=setup” , a=tls-id”, “a=identity” , “a=extmap”

Media Section Parsing

Line “m=” for media , proto , port , fmt in RTP

Attributes “a=” can be

“a=rtpmap” or “a=fmtp”

map from an RTP payload type number to a media encoding name that identifies the payload format.

a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]
m=audio 49230 RTP/AVP 96 97 98
a=rtpmap:96 L8/8000
a=rtpmap:97 L16/8000
a=rtpmap:98 L16/11025/2

“a=ptime” , “a=maxptime”

dierction as  “a=sendrecv” , a=recvonly , a=sendonly , a=inactive

Muxing as “a=rtcp-mux” ,


RTCP attributes “a=rtcp” , “a=rtcp-rsize”

Line “c=” is checked .

Line “b=” for bandiwtdh , bwtype

Attribites for “a=” could be “a=ice-ufrag”, “a=”ice-pwd”, “a=ice-options” , “a=candidate”, “a=remote-candidate” , a=end-of-candidates” and “a=fingerprint”

Semantics Verification

Interactive Connectivity Establishment (ICE) for NAT traversal

Protocols using offer/answer are difficult to operate through Network Address Translators (NATs) since flow of media packets require IP addresses and ports of media sources and sinks within their messages. Also realtime media emphasises on reduced latency and decreased packet loss .

An extension to the offer/answer model, and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks.
Checks done by STUN and TURN
also allows for address selection for multi-homed and dual-stack hosts

ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.

ICE Gathering

Caller and callee performs checks to finalize the protocol and routing needed to establish a peer connection . Number of candudates are proposed till they mutually agree upon one . Peerconnection then uses that candiadte detaisl to initiate the connection .

While Applying a Local Description at the media engine level if m= section is new, WebRTC media stacks begins gathering candidates for it.

RTCPeerconnection specified canTrickleIceCandidates . ICE trickling is the process of continuing to send candidates after the initial offer or answer has already been sent to the other peer.

ICE TransportRole is responsible for Choosing a candidate pair

ICE layer sets one peer as controlling and other as controlled agent . The controling agent makes the final decision as to which candidate pair to choose.

Final selected canduadte in SDP

a=group:BUNDLE 0 1 2
a=msid-semantic: WMS 9Cv3eIelHVuhxrGfxSvUsfokNu4eb4R9PYw2

m=audio 59937 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 x.x.x.x
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 x.x.x.x 59937 typ host generation 0 network-id 1 network-cost 10
a=candidate:3844981444 1 tcp 1518280447 x.x.x.x 9 typ host tcptype active generation 0 network-id 1 network-cost 10

An agent identifies all CANDIDATE whic is a transport address. Types:

  • HOST CANDIDATE – directly from a local interface which could be Wifi, Virtual Private Network (VPN) or Mobile IP (MIP)
    if an agent is multihomed ( private and public networks) , it obtains a candidate from each IP address and includes all candidates in its offer.
  • STUN or TURN to obtain additional candidates. Types
    • translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
    • addresses on TURN servers (RELAYED CANDIDATES)

Mapping Server Reflexive address

Agent sends the TURN Allocate request from IP address and port X:x,
NAT will create a binding X1′:x1′, mapping this server reflexive candidate to the host candidate X:x ( BASE).
Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate.
Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent.

Allocate Request and response fom TURN – Informing the agent of this relayed candidate

Only STUN based Binding

agent sends a STUN Binding request to its STUN server which will get server reflexive candidate and send back Binding response.

STUN Binding request for connectivity checks on CANDIDATE PAIRS

The candidates are carried in attributes in the SDP offer . The remote peer also follows this process and gather and send lits own sorted list of candidates. Hence CANDIDATE PAIRS from both sides are formed.

PEER REFLEXIVE CANDIDATES – connectivity checks can produce aditional candidates espceialy around symmetric NAT

Since the same address is used for STUN. and media ( RTP/RTCP) Demultiplexing based on packet contents helps to identify which one is which.

Checks : ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first.

TRIGGERED CHECKS – accelerates the process of finding a valid candidate
ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.

Checks ensure maintaining frozen candidates and pairs with some foundation for media stream

Each candidate pair in the check list has a foundation and a state. States for candidates pairs

1.Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair onthe check list.

2. In-Progress: A check has been sent for this pair, but the transaction is in progress.

3. Succeeded: A check for this pair was already done and produced a successful result.

4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.

5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.

Example of ICE gather state

icegatheringstatechange – gathering

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 58122 typ host generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (srflx)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:4081163164 1 udp 1686052607 37542 typ srflx raddr rport 58122 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:345893049 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2130406062 1 udp 41886207 27190 typ relay raddr rport 37542 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3052096874 1 udp 25108479 28049 typ relay raddr rport 37543 generation 0 ufrag vzpn network-id 1 network-cost 10

icegatheringstatechange – complete

Examaple Candidate Checking

iceconnectionstatechange : checking

setRemoteDescription L type: answer, sdp: v=0

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4

m=video 9 UDP/TLS/RTP/SAVPF 98 100 96 97 99 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4

addIceCandidate (host)
sdpMid: , sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 56060 typ host generation 0 ufrag ydvf network-id 1 network-cost 10

iceconnectionstatechange : connected

Candidate Nomination for Media Path

selecting low-latency media paths can use various techniques such as actual round-trip time (RTT) measurement
controlling agent gets to nominate which candidate pairs will get used for media amongst the ones that are valid. Ways
regular nomination and aggressive nomination


To read More on WebRTC Communication as a platform

WebRTC Media Stack

WebRTC service’s

References : WebRTC 1.0: Real-time Communication Between Browsers – W3C Editor’s Draft 31 August 2019
RFC 5245 Inter

WebRTC CPaaS ( Communication Platform as a Service )

A CPasS ( communication platform as a service ) is cloud based communication platform alo B2B cloud communications platform that provides real time communication capabilities. This should be easily integrable with any given external environment or application of the customer, without him worrying about building backend infrastructure or interfaces .

Traditionally , with IP protected protocols , licensed codecs maintaining a signalling protocol stack , network interfaces building communication platform was a costly affair. Cisco , Facetime , Skype were the only OTT ( over the top) players taking away from telco’s call revenue .

However with the advent of standardize , open source protocol and codecs plenty of CPaaS providers have crowded the market making more supply than there is demand. A customer wanting to quickly integrate real time communications on his platform has many options to choose from. This article provides an insight to how CPaaS solution are architectured and programmed

Sample CPass Architecture build on open source technologies

I have written an article before on Steps for building and deploying WebRTC solution , which includes standalone , cloud hosted and TURN based NAT handler systems .

A typical CPaaS solution provides

  • Call server + Media Server that can be interacted with via UA
  • Comm clients like sipphones , webrtc client , SDK ( software development kits ) or libraries for desktop , embedded and/or mobile platforms .
  • APIs that can trigger automated calls and perform preprogrammed routing.
  • Rich documentation and samples to build various apps such as call centre solutions , interactive auto-attendant using IVR , DTMF , conference solutions etc .
  • Some CPaaS providers also add features like transcribing ,transcoding , recording , playback etc to provide edge over other CPaaS providers

Datacentre vs Cloud server


Advantages of using a CPaaS vs building your own RTC platform

Tech insights and experiences

companies who have been catering to telco and communication domain make robust solutions based on industry best practices which beats novice solution build in a fortnight anyday

keeping up with emerging trends

Market trends like new codecs , rich communication services , multi tenancy, contextual communication , NLP , other ML based enhancements are provided by CPaaS company

Auto Scaling , High Availability

A firm specializing in CPaaS solution has already thought of clustering and autoscaling to meet peak traffic requirements and backup/replication on standby servers to activate incase of failure


using a Cpaas saves on human resources, infrastructure, and time to market. It saves tremendously on underlying IT infrastructure and many a times provides flexible pricing models

In a nutshell I have come across so many small size startups trying to build CPaaS solution from scratch but only realising it after weeks of trying to build a MVP that they are stuck with firewall, NAT, media quality or interoperability issues . Since there are so many solution already out in the market it is best to instead use them as underlying layer and build applications services using it such as callcentre or CRM services etc .

Session Border controller for WebRTC

Unified communication services build around WebRTC should be vendor agnostic and multi-tenant and be supported by other Communication Service Providers (CSPs), SIP trunks, PBXs, Telecom Equipment Manufacturers (TEMs), and Communication Platform as a Service (CPaaS). This can happen if all endpoints adhere to SIP standards in most updated RFC. However since not all are on the boat , Session border controllers are a great way to mitigate the differences and provide seamless connectivity to signalling and media , which could be between WebRTC, SIP or PSTN, from TDM to IP .

Session Border Controllers ( SBC )  assist in controlling the signalling and usually also the media streams involved in calls and sessions.

They are often part of a VOIP network on the border where there are 2 peer networks of service providers such as backbone network and access network of corporate communication system which is behind firewall.

A more complex example is that of a large corporation where different departments have security needs for each location and perhaps for each kind of data. In this case, filtering routers or other network elements are used to control the flow of data streams. It is the job of a session border controller to assist policy administrators in managing the flow of session data across these borders. – wikipedia

SBC act like a SIP-aware firewall with proxy/B2BUA.

What is B2BUA?

A Back to back user agent ( B2BUA ) is a proxy-like server that splits a SIP transaction in two pieces:

  • on the side facing User Agent Client (UAC), it acts as server;
  • on the side facing User Agent Server (UAS) it acts as a client.

B2BUAs keep state information about active dialog. Read more here .

Remote Access

SBC mostly have public url address  for teleworkers and a internal IP for enterprise/ inner LAN . This enables users connected to enterprise LAN ( who do not have public address ) to make a call to user outside of their network. During this process SBC takes care of following while relaying packets .

  1. Security
  2. Connectivity
  3. Qos
  4. Regulatory
  5. Media Services
  6. Statistics and billing information

Topology hiding

SBC hides and anonymize secure information like IP ports before forwarding message to outside world . This helps protect the internal node of Operators such as PSTN gateways or SIP proxies from revealing outside.

Explaining the functions of SBC in detail

1. Security

SBCs are often used by corporations along with firewalls and intrusion prevention systems (IPS) to enable VoIP calls to and from a protected enterprise network. VoIP service providers use SBCs to allow the use of VoIP protocols from private networks with Internet connections using NAT, and also to implement strong security measures that are necessary to maintain a high quality of service. The security features includes :

  • Prevent malicious attacks on network such as DOS, DDos.
  • Intrusion detection
  • cryptographic authentication
  • Identity/URL based access control
  • Blacklisting bad endpoints
  • Malformed packet protection
  • Encryption of signaling (via TLS and IPSec) and media (SRTP)
  • Stateful signalling and Validation
  • Toll Fraud – detect who is intending to use the telecom services without paying up

2. Connectivity

As SBC offers IP-to-IP network boundary, it recives SIP request from users like REGISTER , INVITE  and routes them towards destination, making their IP. During this process it performs various operations like

  • NAT traversal
  • IPv4 to IPv6 inter-working
  • VPN connectivity
  • SIP normalization via SIP message and header manipulation
  • Multi vendor protocol normalization

Further Routing features includes  :
Least Cost Routing based on MoS ( Mean Opinion Score ) : Choosing a path based on MoS is better than chooisng any random path . 

Protocol translations between SIP, SIP-I, H.323.

In essence SBC achieve interoperability, overcoming some of the problems that firewalls and network address translators (NATs) present for VoIP calls.

Automatic Rerouting

connectivity loss from UA for whole branch is detected by timeouts . But they can also be detected by audio trough SIP OPTIONS by SBC .  In such connectivity loss , SBC decides rerouting or sending back 504 to caller .

SBC 2 (1)

4. QoS
To introduce performance optimization and business rules in call management QoS is very important . This includes the following :

  • Traffic policing
  • Resource allocation
  • Rate limiting
  • Call Admission Control (CAC)
  • ToS/DSCP bit setting
  • Recording and Audit of messages , voice calls , files
  • System and event logging

5. Regulatory

Govt policies ( such as ambulance , police ) and/ or enterprise policies may require some calls to be holding priority over others . This can also be configured under SBC as emergency calls and prioritization.
Some instances may require communication provider to comply with lawful bodies and provide session information or content , this is also called as Lawful interception (LI) . This enables security officials to collect specific information rather than examining all the traffic that passes through a particular router. This is also part of SBC.
6. Media services

Many of the new generation of SBCs also provide built-in digital signal processors (DSPs) to enable them to offer border-based media control and services such as- DTMF relay , Media transcoding , Tones and announcements etc.

WebRTC enabled SBC’s also provide conversion between DTLS-SRTP, to and from RTCP/RTP. Also transcoding for Opus into G7xx codecs
and ability to relay VP8/VP9 and H.264 codecs.

7. Statistics and billing information

SBC have an interface with and OSS/BSS systems for billing process , as almost all traffic that pass through the edge of the network passes via SBC. For this reason it is also used to gather Statistics and usage-based information like bandwidth, memory and CPU.  PCAP traces of both signaling and media information of specific sessions .

New feature rich SBCs also have built-in digital signal processors (DSPs). Thus able to provide more control over session’s media/voice . They also add services like Relay and Interworking, Media Transcoding, Tones and Announcements, DTMF etc.

Session Border Controller (SBC)
Session Border Controller for WebRTC , SIP , PSTN , IP PBX and Skype for business .

Diagram Component Description

Gateways provide compression or decompression, control signaling, call routing, and packetizing.

PSTN Gateway : Converts analog to VOIP and vice versa . Only audio no support for rich multimedia .

VOIP Gateway : A VoIP Gateway acts like a translator converting digital telecom lines to VoIP . VOIP gateway often also include voice and fax. They also have interfaces to Soft switches and network management systems.

WebRTC Gateway : They help in providing NAT with ICE-lite and STUN connectivity for peers behind policies and Firewall .

SIP trunking : Enterprises save on significant operation cost by switching to IP /SIP trunking in place of TDM (Time Division Multiplexing). Read more on SIP trunk and VPN  here. 

SIP Server : A Telecom application server ( SIP Server ) is useful for building VAS ( Value Added Services ) and other fine grained policies on real time services . Read more on SIP Servers here . 

VOIP/SIP service Provider :   There are many Worldwide SIP Service providers such as Verizon in USA , BT in europe, Swisscom in Switzerland etc .

Building a SBC

The latest trends in Telecommunications industry demand an open standardized SBC to cater to growing and large array of SIP Trunking, Unified Multimedia Communications UC&C, VoLTE, VoWi-Fi, RCS and OTT services worldwide . Building an SBC requires that it meet the following prime requirements :

  • software centric
  • Cloud Deploybale
  • Rich multimedia (audio , video , files etc) processing
  • open interfaces
  • The end product should be flexible to be deployed as COTS ( Commercial Off the shelf) product or as a virtual network function in the NFV cloud.
  • Multi Configuration , should be supported such as Hosted or Cloud deployed .
  • Overcome inconsistencies in SIP from different Vendors
  • Security and Lawful Interception
  • Carrier Grade Scaling

Flow Diagram 


Thus we see how SBC became important part of comm systems developed over SIP and MGCP. SBC offer B2BUA ( Back to Back user agent) behavior to control both signalling and media traffic.

Setting up ubuntu ec2 t2 micro for webrtc and socketio

Setting up a ec2 instance on AWS for web real time communication platform over nodejs and using WebRTC.

Primarily a Web Call  , Chat and conference platform uses WebRTC for the media stream and socketio for the signalling . Additionally used technologies are nosql for session information storage , REST Apis for getting sessions details to third parties.

Below is a comprehensive setup if ec2 t2.micro free tier instance, installation with a webrtc project module and samples of customisation and usage .

Technologies used are listed below :


  1. ec2 instance t2.micro covered under free tier
  2. domain name
  3. SSL certificate

Core module for Web Calling feature

  1. WebRTC
  2. Node.js

UI components

  1. javascript
  2. css
  3. html5
  4. bootstrap
  5. jquerry

Supporting setup for session management

  1. Code version-ing  and maintenance
  2. git
  3. npm

Sample Project

Amazon’s free tier ec2

Amazon EC2 : These are elastic compute general purpose storage servers that mean that they can resize the compute capacity in the cloud based on load . 750 hours per month of Linux, RHEL, or SLES t2.micro instance usage. Expires 12 months after sign-up.

Some other products are also covered under free tier which may come in handy for setting up the complete complatorm. Here is a quick summary

Amazon S3 : it is a storage server. Can be used to store media file like image s, music , videos , recorded video etc .

Amazon RDS : It a relational database server . If one is using mysql or postgress for storing session information or user profile data . It is good option .

Amazon SES : email service. Can be used to send invites and notifications to users over mail for scheduled sessions or missed calls .

Amazon CloudFront : It is a CDN ( content delivery network ) . If one wants their libraries to be widly available without any overheads . CDN is a good choice .

Alternatively any server from Google cloud , azure free tier or digital ocean or even heroku can be used for WebRTC code deployment . Note that webrtc capture now requires htps in domain name.

Server Setup

Set up environment by installing nvm  , npm  and git ( source version control)

1. NVM ( node version manager )


curl -o- | bash

or Wget:

wget -qO- | bash

To check installation

command -v nvm

2. NPM( node package manager)

sudo apt-get install npm
Screenshot from 2016-05-16 12-41-42

2. Git

sudo apt-get install git
Screenshot from 2016-05-17 11-25-01

 SSL certificates

Since 2015 it has become mandatory to have only https origin request WebRTC’s getUserMedia API ie Voice, video, geolocation , screen sharing require https origins.
Note that this does not apply to case where its required to only serve peer’s media Stream or using Datachannels . Voice, video, geolocation , screen sharing now require https origins

For A POC purpose here is th way of generating a self signed certificate
Transport Layer Security and/or Secure Socket Layer( TLS/SSL) is a public/private key infrastructure.Following are the steps

1.create a private key

 openssl genrsa -out webrtc-key.pem 2048

2.Create a “Certificate Signing Request” (CSR) file

 openssl req -new -sha256 -key webrtc-key.pem -out webrtc-csr.pem

3.Now create a self-signed certificate with the CSR,

 openssl x509 -req -in webrtc-csr.pem -signkey webrtc-key.pem -out webrtc-cert.pem

However in production or actual implementation it is highly recommended to use a signed certificate by CA as For examples include

Web Server

create https certificate using self generate or purchased SSL certificates using fs , node-static and https modules . To know how to create self generated SSL certificates follow section above on SSL certificates.

var fs = require(‘fs’);
var _static = require(‘node-static’);
var https = require(‘https’);

var file = new _static.Server(&amp;amp;amp;amp;amp;amp;amp;quot;./&amp;amp;amp;amp;amp;amp;amp;quot;, {
cache: 3600,
gzip: true,
indexFile: &amp;amp;amp;amp;amp;amp;amp;quot;index.html&amp;amp;amp;amp;amp;amp;amp;quot;

var options = {
key: fs.readFileSync(‘ssl_certs/webrtc-key.pem’),
cert: fs.readFileSync(‘ssl_certs/webrtc-cert.pem’),
ca: fs.readFileSync(‘ssl_certs/webrtc-csr.pem’),
requestCert: true,
rejectUnauthorized: false

var app = https.createServer(options, function(request, response){
request.addListener(‘end’, function () {
file.serve(request, response);


Web servers work with the HTTP (and HTTPS) protocol which is TCP based. As a genral rule TCP establishes connection whereas UDP send data packets

Scoketio signalling server as npm determines which of the following real-time communication method is suited to the particular client and its network bandwidth .

  • WebSocket
  • Adobe Flash Socket
  • AJAX long polling
  • AJAX multipart streaming
  • Forever Iframe
  • JSONP Polling

The server needs a HTTP Server for initial handshake. The general steps for socketio signalling server

1.require and keep the reference

var io = require(‘’)

2.Create your http / https server outline in section on webserver

3.bind your http and https servers (.listen)

io.listen(app, {
    log: false,
    origins: ‘*:*’

4. Optionally set transport

io.set(‘transports’, [

5.setup io events

io.sockets.on(‘connection’, function (socket) {
//Do domething

Note that or websockets require an http server for the initial handshake.

Install ssocketio npm module

npm install

Complete code for signalling server

const io = require("")
    .listen(app, {
        log: false,
        origins: "*:*"
io.set("transports", ["websocket"])

var channels = {};

io.sockets.on("connection", function (socket) {

    var initiatorChannel = "";

    if (!io.isConnected) {
        io.isConnected = true;

    socket.on("namespace", function (data) {
        onNewNamespace(, data.sender);

    socket.on("new-channel", function (data) {
        if (!channels[]) {
            initiatorChannel =;
        console.log("new channel",, "by", data.sender)
        channels[] = {
            users: [data.sender]


    socket.on("join-channel", function (data) {
        console.log("Join channel",, "by", data.sender)

    socket.on("presence", function (channel) {
        var isChannelPresent = !!channels[];
        console.log("presence for channel ", isChannelPresent)
        socket.emit("presence", isChannelPresent)

    socket.on("disconnect", function (channel) {
        // handle disconnected event

    socket.on("admin_enquire", function (data) {
        switch (data.ask) {
            case "channels":
                socket.emit("response_to_admin_enquire", channels)
            case "channel_clients":
                socket.emit("response_to_admin_enquire", io.of("/" +;
            default :
                socket.emit("response_to_admin_enquire", channels)

function onNewNamespace(channel, sender) {
    console.log("onNewNamespace", channel);
    io.of("/" + channel).on("connection", function(socket) {
        var username;
        if (io.isConnected) {
            io.isConnected = false;
            socket.emit("connect", true)
        socket.on("message", function (data) {
            if (data.sender == sender) {
                if(!username) username =;
        socket.on("disconnect", function() {
            if(username) {
                socket.broadcast.emit("user-left", username)
                username = null;

WebRTC main HTML5  project

This is the front  end section of the whole exercise . It contains JavaScript , css and html5 to make a webrtc call

<body id="pagebody">
<div id="elementToShare" className="container-fluid">
    <!-- ................................ top panel ....................... -->
    <div className="row topPanelClass">
        <div id="topIconHolder">
            <ul id="topIconHolder_ul">
                <li hidden><span id="username" className="userName" hidden>a</span></li>
                <li hidden><span id="numbersofusers" className="numbers-of-users" hidden></span></li>
                <li><span id="HelpButton" className="btn btn-info glyphicon glyphicon-question-sign topPanelButton"
                          data-toggle="modal" data-target="#helpModal"> Help </span></li>
    <!-- .............alerts................. -->
    <div className="row" id="alertBox" hidden="true"></div>
    <!-- .......................... Row ................................ -->
    <div className="row thirdPanelClass">
        <div className="col-xs-12 videoBox merge" id="videoHold">
            <div className="row users-container merge" id="usersContainer">
                <div className="CardClass" id="card">

                    <!-- when no remote -->
                    <div id="local" className="row" hidden="">
                        <video name="localVideo" autoPlay="autoplay" muted="true"/>
                    <!-- when remote is connected -->
                    <div id="remote" className="row" style="display:inline" hidden>
                        <div className="col-sm-6 merge" className="leftVideoClass" id="leftVideo">
                            <video name="video1" hidden autoPlay="autoplay" muted="true"></video>
                        <div className="col-sm-6 merge" className="rightVideoClass" id="rightVideo">
                            <video name="video2" hidden autoPlay="autoplay"></video>
    <!--modal help -->
    <div className="modal fade" id="helpModal" role="dialog">
        <div className="modal-dialog modal-lg">
            <div className="modal-content">
                <div className="modal-header">
                    <button type="button" className="close" data-dismiss="modal">&times;</button>
                    <h4 className="modal-title">Help</h4>
                <div className="modal-body">
                    WebRTC Runs in only https due to getusermedia security contraints
                <div className="modal-footer">
                    <button type="button" className="btn btn-default" data-dismiss="modal">Close</button>

the document start script that invokes the JS script

$('document').ready(function () {

    sessionid = init(true);

    var local = {
        localVideo: "localVideo",
        videoClass: "",
        userDisplay: false,
        userMetaDisplay: false

    var remote = {
        remotearr: ["video1", "video2"],
        videoClass: "",
        userDisplay: false,
        userMetaDisplay: false

    webrtcdomobj = new WebRTCdom(
        local, remote

    var session = {
        sessionid: sessionid,
        socketAddr: "https://localhost:8084/"

    var webrtcdevobj = new WebRTCdev(session, null, null, null);


Screenshot from 2016-05-17 12-12-37.png

Common known issues:

1.Opening page https://<web server ip>:< web server port>/index.html says insecure

This is beacuse the self signed certificates produced by open source openSSL is not recognized by a trusted third party Certificate Agency.
A CA ( Certificate Authority ) issues digital certificate to certify the ownership of a public key for a domain.

To solve the access issue goto https://<web server ip>:< web server port> and given access permission such as outlined in snapshot below


2.Already have given permission to Web Server , page loads but yet no activity .

if you open developer console ( ctrl+shift+I on google chrome ) you will notice that there migh be access related errros in red .
If you are using different server for web server and signalling server or even if same server but different ports you need to explicity go to the signalling server url and port and give access permission for the same reason as mentione above. webcam capture on opening the page

This could happen due to many reasons

  •  page is not loaded on https
  • browser is not webrtc compatible
  • Media permission to webcam are blocked
  • the machine does have any media capture devices attached
  •  Driver issues in the client machine while accessing webcams and mics .

4.socketio + code: 0, message: “Transport unknown”

Due to the version  v1.0.x of while performing handshake . To auto correct this , downgrade to v0.9.x

WebGL , Three.js and WebRTC

For the last couple of weeks , I have been working on the concept of rendering 3D graphics on WebRTC media stream using different JavaScript libraries as part of a Virtual Reality project .

What is Augmented Reality ?

Augmented reality (AR) is viewing a real-world environment with elements that are supplemented by computer-generated sensory inputs such as sound, video, graphics , location etc.

How is it diff. from Virtual Reality ?

Virtual Reality – replaces the real world with simulated one , user is isolated from real life , Examples – Oculus Rift & Kinect

Augmented Reality – blending of virtual reality and real life , user interacts with real world through digital overlays , Examples – Google glass & Holo Lens

Methods for rendering augmented Reality

  • Computer Vision
  • Object Recognition
  • Eye Tracking
  • Face Detection and substitution
  • Emotion and gesture picker
  • Edge Detection

web based Augmented Reality solution

Components for a Web base  end to end AR solution

Web :

WebRTC getusermedia
Web Speech API
HTML5 canvas
sensor API

H/w :

Graphics driver
microphone and camera

3D :

Geometry and Math Utilities
3D Model Loaders and models
Lights, Materials,Shaders, Particles,


  • Web based Real Time communications
  • Definition for browser’s media stream and data
  • Awaiting more standardization , on a API level at the W3C and at the protocol level at the IETF.
  • Enable browser to browser applications for voice calling, video chat and P2P file sharing without plugins.
  • Enables web browsers with Real-Time Communications (RTC) capabilities
  • MIT : Free, open project

Code snippet for WebRTC API

1.To begin with WebRTC we first need to validate that the browser has permission to access the webcam.

&amp;amp;lt;video id=&amp;amp;quot;webcam&amp;amp;quot; autoplay width=&amp;amp;quot;640&amp;amp;quot; height=&amp;amp;quot;480&amp;amp;quot;&amp;amp;gt;&amp;amp;lt;/video&amp;amp;gt;
  1. Find out if the user’s browser can use the getUserMedia API.
function hasGetUserMedia() {
	return !!(navigator.webkitGetUserMedia);
  1. Get the stream from the user’s webcam.
var video = $('#webcam')[0];
if (navigator.webkitGetUserMedia) {
			{audio:true, video:true},
			function(stream) { video.src = window.webkitURL.createObjectURL(stream);  },
			function(e) {	alert('Webcam error!', e); }

Screenshot AppRTC

Augmented Reality in WebRTC Browser - Google Slides



  • Web Graphics Library
  • JavaScript API for rendering interactive 2D and 3D computer graphics in browser
  • no plugins
  • uses GPU ( Graphics Processing Unit ) acceleration
  • can mix with other HTML elements
  • uses the HTML5 canvas element and is accessed using Document Object Model interfaces
  • cross platform , works on all major Desktop and mobile browsers

WebGL Development

To get started you should know about :

  • GLSL, the shading language used by OpenGL and WebGL
  • Matrix computation to set up transformations
  • Vertex buffers to hold data about vertex positions, normals, colors, and textures
  • matrix math to animate shapes

Cleary WebGL is bit tough given the amount of careful coding , mapping and shading it requires .

Proceeding to some JS libraries that can make 3D easy for us .








  • MIT license
  • javascript 3D engine ie ( WebGL + more)
  • started a year ago
  • under active development
  • no dependencies or installation

3D space with webcam input as texture

Display the video as a plane which can be viewed from various angles in a given background landscape. Credits for code :

1.Use code from slide 10 to get user’s webcam input through getUserMedia

  1. Make a Screen , camera and renderer as previously described

  2. Give orbital CONTROLS for viewing the media plane from all angles

	controls = new THREE.OrbitControls( camera, renderer.domElement );
  1. Add point LIGHT to scene

  2. Make the FLOOR with an image texture

	var floorTexture = new THREE.ImageUtils.loadTexture( 'imageURL.jpg' );
	floorTexture.wrapS = floorTexture.wrapT = THREE.RepeatWrapping;
	floorTexture.repeat.set( 10, 10 );
	var floorMaterial = new THREE.MeshBasicMaterial({map: floorTexture, side: THREE.DoubleSide});
	var floorGeometry = new THREE.PlaneGeometry(1000, 1000, 10, 10);
	var floor = new THREE.Mesh(floorGeometry, floorMaterial);
	floor.position.y = -0.5;
	floor.rotation.x = Math.PI / 2;

6. Add Fog

scene.fog = new THREE.FogExp2( 0x9999ff, 0.00025 );

7.Add video Image Context and Texture.

video = document.getElementById( 'monitor' ); 
videoImage = document.getElementById( 'videoImage' ); 
videoImageContext = videoImage.getContext( '2d' ); 
videoImageContext.fillStyle = '#000000'; 
videoImageContext.fillRect( 0, 0, videoImage.width, videoImage.height ); 
videoTexture = new THREE.Texture( videoImage ); 
videoTexture.minFilter = THREE.LinearFilter; 
videoTexture.magFilter = THREE.LinearFilter; 
var movieMaterial=new THREE.MeshBasicMaterial({map:videoTexture,overdraw:true,side:THREE.DoubleSide}); 
var movieGeometry = new THREE.PlaneGeometry( 100, 100, 1, 1 ); 
var movieScreen = new THREE.Mesh( movieGeometry, movieMaterial ); 
  1. Set camera position
  1. Define the render function
    videoImageContext.drawImage( video, 0, 0, videoImage.width, videoImage.height );
    renderer.render( scene, camera );
  1. Animation
   requestAnimationFrame( animate );


Augmented Reality in WebRTC Browser - Google Slides (4)




1. Spinning Colored Cube

Step 1 : Get three.js from :

Step 2 : Make a empty HTML5 page and import the script + basic styling of page

		<title>Spinning colored Cube</title>
			body { margin: 0; }
			canvas { width: 100%; height: 100% }
		<script src="js/three.min.js"></script>
		<script>// Our Javascript will go here. </script>

Step 3 : Scene

var scene = new THREE.Scene();	

Step 4 : Camera
Camera types in three.js are CubeCamera , OrthographicCamera, PerspectiveCamera. We are using Perspective camera here . Attributes are field of view , aspect ratio , near and far clipping plane.

var camera = new THREE.PerspectiveCamera( 75, window.innerWidth / window.innerHeight, 0.1, 1000 );

Step 5: Renderer
Renderer uses a <canvas> element to display the scene to us.

var renderer = new THREE.WebGLRenderer();
renderer.setSize( window.innerWidth, window.innerHeight );
document.body.appendChild( renderer.domElement );

Step 6: . BoxGeometry object contains all the points (vertices) and fill (faces) of the cube.

var geometry = new THREE.BoxGeometry( 1, 1, 1 );

Step 7: Material
threejs has materials like – LineBasicMaterial , MeshBasicMaterial , MeshPhongMaterial , MeshLambertMaterial
These have their properties like -id, name, color , opacity , transparent etc. Use MeshBasicMaterial and color attribute of 0x00ff00, which is green.

	var material = new THREE.MeshBasicMaterial( { color: 0x00ff00 } );

Step 8: Mesh
A mesh is an object that takes a geometry, and applies a material to it, which we then can insert to our scene, and move freely around.

var cube = new THREE.Mesh( geometry, material );

Step 9: By default, when we call scene.add(), the thing we add will be added to the coordinates (0,0,0). This would cause both the camera and the cube to be inside each other. To avoid this, we simply move the camera out a bit.

scene.add( cube );
	camera.position.z = 5;

Step 10: Create a loop to render something on the screen

function render() {
	requestAnimationFrame( render );
	renderer.render( scene, camera );

This will create a loop that causes the renderer to draw the scene 60 times per second.
Step 11 : Animating the cube
This will be run every frame (60 times per second), and give the cube a nice rotation animation

cube.rotation.x += 0.1;
cube.rotation.y += 0.1;

Augmented Reality in WebRTC Browser - Google Slides (1)

2. Shaded Material on Sphere

Stepp 1 : create a empty page and import three.min.js and jquery

		<title>Shaded Material on Sphere </title>
			body { margin: 0; }
			canvas { width: 100%; height: 100% }
		<script src="js/jquery.min.js"></script>
<script src="js/three.min.js"></script>
	<script>// Our Javascript will go here.</script>
	<div id="container"></div>

Step 2 : Repeat the same steps at in previous example

var scene = new THREE.Scene();
var camera =  new THREE.PerspectiveCamera(45, 600/600 , 0.1, 10000);
var renderer = new THREE.WebGLRenderer();
renderer.setSize(600 , 600 );	
camera.position.z = 300;	// the camera starts at 0,0,0 so pull it back

3. Create the sphere’s material as MeshLambertMaterial
MeshLambertMaterial is non-shiny (Lambertian) surfaces, evaluated per vertex. Set the color to red .

var sphereMaterial =  new THREE.MeshLambertMaterial(  { color: 0xCC0000  });

4. create a new mesh with sphere geometry ( radius, segments, rings) and add to scene

var sphere = new THREE.Mesh(  new THREE.SphereGeometry(  50, 16, 16 ),  sphereMaterial);

5. Light
Create light , set its position and add it to scene as well . Light can be point light , spot light , directional light .

var pointLight = new THREE.PointLight(0xFFFFFF);
pointLight.position.x = 10;
pointLight.position.y = 50;
pointLight.position.z = 130;

6. Render the whole thing

renderer.render(scene, camera);

Augmented Reality in WebRTC Browser - Google Slides (2)

3. Complex objects like Torusknot

Step 1 : Same as before make scene , camera and renderer

scene = new THREE.Scene();
camera = new THREE.PerspectiveCamera(125, window.innerWidth / window.innerHeight, 1, 500);
camera.position.set(0, 0, 100);
camera.lookAt(new THREE.Vector3(0, 0, 0));
var renderer = new THREE.WebGLRenderer(); 
renderer.setSize( window.innerWidth, window.innerHeight );
document.body.appendChild( renderer.domElement );

Step 2 : Add the lighting

var light = new THREE.PointLight(0xffffff);
light.position.set(0, 250, 0);
var ambientLight = new THREE.AmbientLight(0x111111);

Step 3 : Add Torusknotgeometry with radius, tube, radialSegments, tubularSegments, arc

var geometry = new THREE.TorusKnotGeometry( 8, 2, 100, 16, 4, 3 ); 
var material = new THREE.MeshLambertMaterial( { color: 0x2022ff } );
var torusKnot = new THREE.Mesh( geometry, material ); 
torusKnot.position.set(3, 3, 3);
scene.add( torusKnot );
camera.position.z =25;

Step 4 : Do the animation and render on screen

var render = function () { 
    requestAnimationFrame( render ); 
    torusKnot.rotation.x += 0.01;    
    torusKnot.rotation.y += 0.01; 
    renderer.render(scene, camera);

Augmented Reality in WebRTC Browser - Google Slides (3)

TFX Widgets Development

TFX is a modular widget based WebRTC communication and collaboration solution. It is a customizable solution where developers can create and add their own widget over the underlying WebRTC communication mechanism . It can support extensive set of user activity such as video chat , message , play games , collaborate on code , draw something together etc . It can go as wide as your imagination . This post describes the process of creating widgets to host over existing TFX platform .


It is required to have TFX Chrome extension installed and running from Chrome App Store under above . To do this follow the steps described in TangoFX v0.1 User’s manual.

Test TFX Sessions  ?

TFX Sessions uses the browser’s media API’s , like getUserMedia and Peerconnection to establish p2p media connection . Before media can traverse between 2 end points the signalling server is required to establish the path using Offer- Answer Model . This can be tested by making unit test cases on these function calls .

TFX Sessions uses socketio based handshake between peers to ascertain that they are valid endpoints to enter in a communication session . This is determined by SDP ( Session Description Parameters ) . The same can be observed in chrome://webrtc-internals/ traces and graphs .

TangoFX v8 Developer's manual - Google Docs

How to make widgets using TFX API ?

Step 1:  To make widgets for TFX , just write your simple web program which should consist of one main html webpage and associated css and js files for it .

Step 2 : Find an interesting idea which is requires minimal js and css . Remember it is a widget and not a full fleshed web project , however js frameworks like requirejs , angularjs , emberjs etc , work as well.

Step 3: Make a compact folder with the name of widget and put the respective files in it. For example the html files or view files would go to src folder , javascript files would goto js folder , css files would goto css folder , pictures to picture folder , audio files to sound folder and so on .

Step 4 : Once the widget is performing well in standalone environment , we can add a sync file to communicate the peer behaviors across TFX network . For this we primarily use 2 methods .

  • SendMessage : To send the data that will be traversed over DataChannel API of TFX . The content is in json format and will be shared with the peers in the session .
  • OnMessage : To receive the message communicated by the TFX API over network

Step 5: Submit the application to us or test it yourself by adding the plugin description in in widgetmanifest.json file . Few added widgets are

"plugintype": "code",
"id" 	:"LiveCode",
"type" 	: "code",
"title" : "Code widget",
"icon" 	: "btn btn-style glyphicon glyphicon-tasks",
"url"	: "../plugins/AddCode/src/codewindow.html"

"plugintype": "draw",
"id" 	:"Draw",
"type" 	: "draw",
"title" : "Code widget",
"icon" 	: "btn btn-style glyphicon glyphicon-pencil",
"url"	: "../plugins/AddDraw/src/drawwindow.html"

"plugintype": "pingpong",
"id"   :"Pingpong",
"type" : "pingpong",
"title": "ping pong widget",
"icon" : "btn btn-style glyphicon glyphicon-record",
"url"  : "../plugins/AddPingpong/src/main.html"

Step 6 : For proper orientation of the application make sure that overflow is hidden and padding to left is atleast 60 px so that it doesnt overlap with panel
padding-left: 60px;
overflow: hidden;

Step 7 : Voila the widget is ready to go .

Simple Messaging Widget

For demonstration purpose I have summarised the exact steps followed to create the simple messaging widget which uses WebRTC ‘s Datachannel API in the back and TFX SendMessage & OnMessage API to achieve

Step 1 : Think of a general chat scenario as present in various messaging si

Step 2: Made a folder structure with separation for js , css and src. Add the respective files in folder. It would look like following figure:

TangoFX v8 Developer's manual - Google Docs (1)
Step 3 : The html main page is

&amp;lt;!-- jquerry --&amp;gt;
&amp;lt;script src="../../../../js/jquery/jquery.min.js"&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;link rel="stylesheet" type="text/css" href="../../../../css/jquery-ui.min.css"/&amp;gt;
&amp;lt;link rel="stylesheet" type="text/css" href="../css/style.css"&amp;gt;

&amp;lt;div id="messages" class="message_area"&amp;gt;
&amp;lt;textarea disabled class="messageHistory" rows="30" cols="120" id="MessageHistoryBox"&amp;gt;&amp;lt;/textarea&amp;gt;
&amp;lt;input type ="text" id="MessageBox" size="120"/&amp;gt;
&amp;lt;script src="../js/sync.js" type="text/javascript"&amp;gt;&amp;lt;/script&amp;gt;

Step 4 : The contents of css file are

body {padding: 0; margin: 0; overflow: hidden;}

	  padding-left: 60px;
	background-color: transparent;
	background: transparent;

Step 5 : The contents of js file

//send message when mouse is on mesage dicv ans enter is hit
    if(event.keyCode == 13){
    var msg=$('#MessageBox').val();
    //send to peer
    var data ={

function addMessageLog(msg){
    //add text to text area for message log for self
 $('#MessageHistoryBox').text( $('#MessageHistoryBox').text() + '\n'+ 'you : '+ msg);

// handles send message
function sendMessage(message) {
      var widgetdata={
  // postmessage

//to handle  incoming message
function onmessage(evt) {
    //add text to text area for message log from peer
  if(!=null ){
      $('#MessageHistoryBox').text( $('#MessageHistoryBox').text() +'\n'+ 'other : '+ );


Step 6: The end result is :

TangoFX v8 Developer's manual - Google Docs (2)

Developing a cross origin Widget ( XHR)

Let us demonstrate the process and important points to create a cross- origin widget :

step 1 : Develop a separate web project and run it on a https

step 2 : Add the widget frame in TFX . Following is the code I added to make an XHR request over GET

var xmlhttp;
xmlhttp=new XMLHttpRequest();
  if (xmlhttp.readyState==4 &amp;amp;&amp;amp; xmlhttp.status==200)

step 3 : Using self made https we have have to open the url separately in browser and give it explicit permission to open in advanced setting. Make sure the original file is visible to you at the widgets url .

TangoFX v8 Developer's manual - Google Docs (3)
step 4: Adding permission to manifest for access the cross origin requests

"permissions": [

Step 5 : Rest of the process are similar to develop a regular widget ie css and js .

Step 6: Resulting widget on TFX

TangoFX v8 Developer's manual - Google Docs (4)

Note 1 :
In absence of changes to manifest file the cross origin request is meet with a Access-Control-Allow-Origin error .

Note 2:
While using POST the TFX responds with Failed to load resource: the server responded with a status of 404 (Not Found)

Note 3:
Also if instead of https http is used the TFX still responds with Failed to load resource: the server responded with a status of 404 (Not Found)

TFX WebRTC SaaS (Software as a Service )

TFX  is WebRTC based communication platform built entirely on open standards making it extensively scalable. The underlying API completely masks the communication  aspect and lets the user enjoy an interactive communication session. It also supports easy to build widgets framework which can be used to build applications on the TFX platform .

TFX Sessions

TFX sessions is a part of TFX . It is a free Chrome extension WebRTC client that enables parties communicating and collaborating, to have an interactive and immersive experience. You can find it on Chrome Webstore here .

Features of TFX Sessions:

Through TFX, users can have instant multimedia Internet call sessions .

The core  features are :

  • No signin or account management
  • No additional requirement like Flash , Silverlight or Java
  • URL based session management
  • secure WebRTC based communication
  • complete privacy with no user tracking or media flow interruption
  • Ability to share session on social network platforms like Facebook , twitter , linkedin , gmail , google plus etc
Screenshot from 2015-05-19 11:47:42
  • ability to choose between multiple cameras
Screenshot from 2015-05-19 11:53:06

The TFX platform has developer friendly APIs to help build widgets. Some of the pre-built widgets available on TFX are:

  • Coding
  • Drawing
Screenshot from 2015-04-13 16:28:22
  • Multilingual chat
  • Screen sharing

TFX sessions is free for personal use and can be downloaded from Chrome Webstore.

What is the differentiator with other internet call services?

  • No registration , login for account management required
  • Communication is directly between peer to peer ie information privacy.
  • Third party apps , services can be included as widgets on TFX platform.
  • Can be skimmed to be embedded inside Mobile app webview , iframe, other portals etc anytime .

TFX Sessions Integration Models

The 3 possible approaches for TFX Integration  in increasing order of deployment time are  :

  1. WebSite’s widget on TFX chrome extension .
  2. Launch TFX extension in an independent window from website
  3. TFX call from embedded Window inside the website page

1. WebSite’s widget on TFX chrome extension .

This outlines the quickest deliverable approach of building the websites own customized widget on TFX widgets API and deployed on existing TFX communication setup .

Step 1 : Login using websites credentials to access the content

WebSite’s widget on TFX chrome extension

Step 2 : Access the website with the other person inside the TFX “ Pet Store “ Widget

WebSite’s widget on TFX chrome extension

2. Launch TFX in an independent window from “Click to Call” Button on website

This approach outlines the process of launching TFX in an independent window from a click of a button on website. However it is a prerequisite to have TFX extension installed on your Chrome browser beforehand.

Step 1 : Have TFX installed on chrome browser
Step 2 : Trigger and launch TFX chrome extension window on click of button on webpage

Launch TFX extension in an independent window from website

3. TFX call from embedded Window inside Website page

This section if for the third approach which is of being able to make TFX calls from embedded Window inside of the webpage. Refer to sample screen below :

Step 1 : Have TFX embedded in an iframe inside the website

TFX Integration Models (3)
Step 2 : Make session on click of button inside the iframe.

TFX Integration Models (4)

Technical Details about TFX like architecture , widgets development , components description etc can be found here : TFX Platform

To get more information about TFX and/or making custom widgets get i touch with me at or email to

WebRTC Security

Unlike most conventional  real-time systems (e.g., SIP-based soft phones) WebRTC communications  are directly controlled by a Web server over some signalling protocol which may be XMPP , websockets , , Ajax etc . This poses new challenges such as

  • Web browser might expose a JavaScript APIs which allows  web server to place a video call itself.This may cause web pages to secretly record and stream the webcam activity from user’s computer
  • malicious calling services can record the user’s conversation and misuse
  • malicious webpages can lure users via advertising and execute auto calling services .
  • Since JavaScript calling APIs are implemented as browser built-ins , un authorized access to these can also make user’s audio and camera streams vulnerable
  • If program and APIs allow the server to instruct the browser to send arbitrary content, then they can be used to bypass firewalls or mount denial of service attacks.

The general goal of security is to identify and resolve security issues during the design phase so they do not cost service provider time, money, and reputation at a later phase. Security for a large architecture project involves many aspects, there is no one device or methodology to guarantee that an architecture is now “secure” Areas that malicious individuals will attempt to attack include but are not limited to:

  • Improperly coded applications
  • Incorrectly implemented protocols
  • Operating System bugs
  • Social engineering and phishing attacks

As security is a broad topic touching on many sections of WebRTC this section is not meant to address all topics but instead to focus on specific “hot spots”, areas that require special attention due to the unique properties of the WebRTC service. There are several security-related topics that are of particular interest with respect to WebRTC.  They can be grouped into the following areas:

  1. Identity Management
  2. Browser Security
  3. Authentication
  4. Media encryption

The are discussed in detail below :

Identity Management

Support of WebRTC should not increase security risk to telecom network. Any device or software that is in the hands of the customer will be compromised, it is just a mater of time

  • All data received from untrusted sources (i.e. all data from customer controlled devices or software) must be validated.
  • Any data sent to the client will be obtained by malicious users

Provide exceptional protection for our customer’s data and make all reasonable attempts at protecting the customer from their own mistakes that may compromise their own systems. —Ensure that the new service does not adversely impact the data security, privacy, or service of existing customers.

Browser Security

Specific security concerns include:

Cross-site scripting (XSS)

A type vulnerability typically found in Web applications (such as web browsers through breaches of browser security) that enables attackers to inject client-side script into Web pages viewed by other users.

  • A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same origin policy.
  • Cross-site scripting carried out on websites accounted for roughly 80.5% of all security vulnerabilities documented by Symantec as of 2007 according to Wikipedia.
  • Their effect may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigation implemented by the site’s owner.

As the primary method for accessing WebRTC is expected to be using HTML5 enabled browsers there are specific security considerations concerning their use such as; protecting keys and sensitive data from cross-site scripting or cross-domain attacks, websocket use, iframe security, and other issues. —Because the client software will be controlled by the user and because the browser does not, in most cases, run in a protected environment there are additional chances that the WebRTC client will become compromised. This means all data sent to the client could be exposed.

  • keys
  • hashes
  • registration elements (PUID etc.)

Therefore additional care needs to be taken when considering what information is sent to the client, and additional scrutiny needs to be performed on any data coming from the client.


(User Interface redress attack, UI redress attack, UI redressing) is a malicious technique of tricking a Web user into clicking on something different to what the user perceives they are clicking on, thus potentially revealing confidential information or taking control of their computer while clicking on seemingly innocuous web pages. It is a browser security issue that is a vulnerability across a variety of browsers and platforms, a clickjack takes the form of embedded code or a script that can execute without the user’s knowledge, such as clicking on a button that appears to perform another function. —Compromised personal computer with installed adware, viruses, spyware such as trojan horses, etc. can also compromise the browser and obtain anything the browser sees.


Authentication happens on different levels

End user Authentication

Through UID ( unique ID ) of USER

Device Authentication

  • SIM enabled devices follow standard IMS-AKA authentication
  • Non-SIM enabled “devices” are authenticated using user authentication

Application Authentication

  • Model mirrors current application onboarding procedures.
  • Application developers need to establish service agreement
  • Client_Id secrets are exchanged as part of this process.
  • Use  security gateway for authenticating applications

The Browser Threat Model

The browser acts as a TRUSTED COMPUTING BASE (TCB) both from the user’s perspective and to some extent from the  server’s.  HTML and JavaScript (JS) provided by the web server can execute scripts on browser and generate actions and events . However browser  operates in a sandbox that isolates these scripts both from the user’s computer and from server .

Access to Local Resources

The users computer may have lot of private and confidential data on the disk . Browser do make it mandatory that user must explicitly select the file and consent to its upload before doing file upload and transfer transactions . However still it is not very rare that misleading text and buttons can make users click files .  

Another way of accessing local resources is through downloading malicious files to users computer which are executable and may harm users computer.

SOP or Same Origin Policy

SOP  forces scripts from each site to run in their own, isolated, sandboxes.  It enables webpages and scripts from the same origin server to interact with each other’s JS variables, but prevents pages from the different origins or even iframes on the same page to not exchange information.

As part of SOP scripts are allowed to make HTTP requests via the  XMLHttpRequest() API to only those server which have same ORIGIN/domain as that of the originator .

CORS [Cross-Origin Resource Sharing ]

CORS enables multiple web services to intercommunicate . Therefore when a script from origin A executes what would otherwise be a forbidden cross-origin request, the browser instead contacts the target server B to determine whether it is willing to allow cross-origin requests from A.  If it is so willing, the browser then allows the request.  This consent verification process is designed to safely allow cross-origin requests.


Even websockets overcome SOP and establish cross origin transport channels .

Once a WebSockets connection has been established from a script to a site, the script can exchange any traffic it likes without being required to frame it as a series of HTTP request/response transactions.

WebSockets use masking technique to randomize the bits that are being transmitted , thus making it more difficult to generate traffic which resembles a given protocol , thus making it difficult for inspection from flowing traffic .


Jsonp is a hack designed to bypass origin restriction through script tag injection. A JSONp enabled server passes the response in user specified function

when we use <script> tags the domain limitation is ignored ie we can load scripts from any domain .  So when we need to fetch get exchange data just pass callback parameters through scripts . For example

function mycallback(data){
// this is the callback function executed when script returns
alert("hi"+ data);</span>
var script = document.createElement('script');
script.src = '//'

There have been found vulnerabilities in the existing Java and Flash consent verification techniques and handshake.

Security around ICE and TURN


Sender and receiver are able to share media stream after a offer answer handshake. But we already need one in order to do NAT hole-punching. Presuming the ICE server is malicious , in absence of transaction IDs by stun unknow to call scripts , it is not possible for the webpage of receiver to ascertain is the data is forged or original . Thus to prevent this the browser must generate hidden transaction Id’s and should not sharing with call scripts ,even via a diagnostic interface.

IP Location Privacy

As soon as the callee sends their ICE candidates, the caller learns the callee’s IP addresses.  The callee’s server reflexive address reveals a lot of information about the callee’s location.

To prevent server should suppress the start of ICE negotiation until the callee has answered.

Also user may hide their location entirely by forcing all traffic through a TURN server.

Communications Security

Goal of webrtc based call services should be to create channel which is secure  against both message recovery and message modification for all audio / video and data .

Threats from Screen Sharing

With the increasing requirement of screen sharing in web app and communication systems there is always a high threat of oversharing / exposing confidential passwords , pins , security details etc . This may either through some part of screen or some notification whihc pops up .

There is always the case when the user may believe he is sharing a window when in fact they are the entire desktop.

The attacker may request screensharing and make user open his webmail , payment settings or even net-banking accounts .

Long term access to camera and microphone

When user frequently uses a site he / she may want to give the site a long-term access to the camera and microphone ( indicated by ” Always allow on this site ” in chrome ). However the site may be hacked and thus initiate call on users’ computer automatically to secretly listen-in .

False UI shows cut off call while still being active

Unless the user checks his laptops glowing camera light LED or goes and monitors the traffic himself he would not know if there is active call in background, which according to him he had cut off . In such a case an attacker may pretend to cut a call shows red phone signs and supportive text but still keep the session and media stream active placing himself on mute .

During-Call Attack

Even if the calling service cannot directly access keying material ,it  can simply mount a man-in-the-middle attack on the connection. The idea is to mount a bridge capturing all the traffic.

To protect against this it is now mandatory to use https for using getusermedia and otherwise also recommended to keep webrtc comm services on https or use strict fingerprinting .
This section is derived from Security Considerations for WebRTC draft-ietf-rtcweb-security-08

So does existing WebRTC model offer security ?

We know that the forces behind WebRTC standardization are WHATWG, W3C, IETF and strong internet working groups . WebRTC security was already taken into consideration when standards were being build for it . The encryption methods and technologies like DTLS and SRTP were included to safeguard users from intrusions so that the information stays protected.

WebRTC media stack has native built-in features that address security concerns. The peer-to-peer media is already encrypted for privacy . Figure below:

WebRTC media stack Solution Architecture - Google Slides (1)
WebRTC media stack

Media Encryption

Primary issue with supporting DTLS is it can put a heavy load on the SBC’s handling encryption/decryption duties. —Interworking DTLS-SRTP to SDES is CPU intensive

  • SRTP from DTLS-SRTP end flows easily
  • SRTP from SDESC end requires auth+decrypt, and encrypt+auth

Reason:  DTLS-SRTP handshake has both ends choose “half” of the SRTP key The Encrypted Key Transport (EKT) proposed by Cisco solves this problem and provides additional security. Recommendation is to use DTLS-SRTP with EKT enhancements . Note: In order to avoid potential security issues, the SRTP authentication tag length used by the base authentication method must be at least ten octets.

For WebRTC to transfer real time data, the data is first encrypted using the DTLS (Datagram Transport Layer Security) method. This is a protocol built into all the WebRTC supported browsers from the start (Chrome, Firefox and Opera). On a DTLS encrypted connection, eavesdropping and information tampering cannot take place.

Other than DTLS, WebRTC also encrypts video and audio data via the SRTP (Secure Real-Time Protocol) method ensuring that IP communications – your voice and video traffic – can not be heard or seen by unauthorized parties.

What is SRTP ?

The Secure Real-time Transport Protocol (or SRTP) defines a profile of RTP (Real-time Transport Protocol), intended to provide encryption, message authentication and integrity, and replay protection to the RTP data in both unicast and multicast applications.

Earlier models of VOIP communication such as SIP based calls had an option to use only RTP for communication thereby subjecting the endpoint users to lot of problem like compromising media Confidentiality  . However the WebRTC model mandates the use of SRTP hence ruling out insecurities of RTP completely. For encryption and decryption of the data flow SRTP utilizes the Advanced Encryption Standard (AES) as the default cipher.

What is DTLS ?

DTLS allows datagram-based applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery. The DTLS protocol is based on the stream-oriented Transport Layer Security (TLS) protocol .

Together DTLS and SRTP enables the exchange of the cryptographic parameters so that the key exchange takes place in the media plane and are multiplexed on the same ports as the media itself without the need to reveal crypto keys in the SDP.

Today the browser acts as a TRUSTED COMPUTING BASE (TCB) where the HTML and JS act inside of a sandbox that isolates them both from the user’s computer.

A script cannot access user’s webcam , microphone , location , file , desktop capture without user’s explicit consent. When the user allows access, a red dot will appear on that tab, providing a clear indication to the user, that the tab has media access.

Figure depicting browser asking for user’s consent to access Media devices for WebRTC .

Untitled drawing

Figure depicting Media Capture active on browser with red dot .

Untitled drawing (1)
we know that XMLHttpRequest() API can be used to secretly send data from one origin to other and this can be used to secretly send information without user’s knowledge . However now , SAME ORIGIN POLICY (SOP) in browser’s prevents server A from mounting attacks on server B via the user’s browser, which protects both the user (e.g., from misuse of his credentials) and the server B (e.g., from DoS attack).

Security challenges with Web Server based WebRTC service are many for example :

  1. If the both the peers have WebRTC browser then one can place a WebRTC call to callee anytime this might result in denial of service .
  2. Since the media is p2p and also can override firewalls settings through TURN server , it can result in unwanted data being send to peer .
  3. One may secretly make calls to users through website and extract information .
  4. Threat from screen sharing, for example user might mistakenly share his internet banking screen or some confidential information.
  5. Giving long-term access to the camera and microphone for certain sites is also a concern . for example : since next time you visit a site that has access to your microphone and camera , they can secretly be viewing youe webcam and microphone inputs .
  6. Clever use of User Interface to mask a ongoing call can mislead the user into believing that call has been cut while it is secretly still ongoing.
  7. Network attackers can modify an HTTP connection through my Wifi router or hotspot to inject an IFRAME (or a redirect) and then forge the response to initiate a call to himself.
  8. As WebRTC doesn’t yet have an congestion control mechanism , it can eat up a large chunk of user’s bandwidth.
  9. By visiting chrome://webrtc-internals/ in chrome browser alone , one can view the full traces of all webRTC communication happening through his browser . The traces contain all kinds of details like signalling server used , relay servers , TURN servers , peer IP , frame rates etc .
WebRTC Internals

Ofcourse other challenges that arrive with any other webservice based architecture are also applicable here such as :

  1. Malicious Websites which automatically execute the attacker’s scripts.
  2. User can be induced to download harmful executable files and run them.
  3. Improper use of W3C Cross-Origin Resource Sharing (CORS) to bypass SAME ORIGIN POLICY (SOP) .

How can I make my WebRTC solution secure ?

In the recent months everyone has been trying to get into the WebRTC space but at the same time fearing that hackers might be able to listen in on conferences, access user data, or even private networks. Although development and usage around WebRTC is so simple , the security and encryption aspects of it are in the dim light.

Best practices to make your VOIP Solution more secure

A simple WebRTC architecture is shown in the figure below :

WebRTC media stack Solution Architecture - Google Slides (2)

By following the simple steps described below one can ensure a more secure WebRTC implementation . The same applies to healthcare and banking firms looking forth to use WebRTC as a communication solution for their portals .

Ensure that the signalling platform is over a secure protocol such as SIP / HTTPS / WSS . Also since media is p2p , the media contents like audio video channel are between peers directly in full duplex.

To protect against Man-In-The-Middle (MITM) attack the media path should be monitored regularly for no suspicious relay.

User’s that can participate in a call , should be pre registered / Authenticated with a registrar service. Unauthenticated entities should be kept away from session’s reach .

WebRTC authentication certificate
WebRTC authentication certificate

Make sure that ICE values are masked thereby not rendering the caller/ callee’s IP and location to each other through tracing in chrome://webrtc-internals/ or packet detection in Wirehsark on user’s end.

As the signalling server maintains the number of peers , it should be consistently monitored for addition of suspicious peers in a call session. If the number of peers actually present on signalling server is more that the number of peers interacting on WebRTC page then it means that someone is eavesdropping secretly and should be terminated from session access by force.

It was observed that manyatimes non tech savy users simply agree to all permissions request from browser without actually consciously giving consent . Therefore user’s should be made aware of API in websites which ask for undue permissions . For example permission to :

Screenshot from 2015-04-22 15:22:15

Third party API should be thoroughly verified before sending their data on WebRTC DataChannel.

Before Desktop Sharing user’s should be properly notified and advised to close any screen containing sensitive information .

Keystore and master key protection

For storing logs , recording , file , ssh keys or any others ensitive informaton encrypted by keys , we need a safe storage for keys and these tools are handy for password and key management – Dashlane , Lastpass , Bitwarden, 1Password

Auto sign-in security measure for WebRTC apps

Turn User Authentication On and enable Two-Factor Authentication/Bio-metrics.

OTP based signons and captcha checks are also popular approaches to protect signins

Avoid Public Wi-Fi

Even a WebRTC connection can be tampered with an insecure Wifi . Even if the Man-in-middle cannot decipher message content he can make out the packet size , frequency , end parties in siagnling , time delay etc . Also as a precautionary measure enable Remote Lock and Data Wipe and only use authorized apps with permission to sensitive data such as image storage.

Native WebRTC apps

If you use a native WebRTC native app , there are mulitple thinsg that you need to be wary of.

Avoid All Jailbreaks : Jail-breaking a smartphone can enable the user to run unverified or unsupported apps, many of these apps carry security vulnerabilities. Majority of security exploits for Apple’s iOS only affect jailbroken iPhones.

Add a Mobile Security App : Mobile security reports shows that mobile operating systems such as iOS and (especially) Android are increasingly becoming targets for malware. Select a reputable mobile security app that extends the built-in security features of the device’s mobile operating system. Some well-known third-party security vendors offering mobile security apps for iOS, Android and Windows Phone – Avast , Kaspersky , Symantec

Also as a good practise Turn off the Bluetooth, Wi-Fi and NFC when not needed.

Information security

Forum : Huawei

Information security ensures that both physical and digital data is protected from unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction. 

Intentional Security breaches

Although WebRTC already has best secure tools in its spec list which provide end to end encrypted communication over SRTP DTLS as well as media device access mandatory from websites of secure origin over TLS, yet if the endpoints acting as peers themselves are compromised then all this is in vain . Hence security issues arises when

  • the endpoints are recording their media content and storing it on unsafe location such as public file servers or
  • endpoints are inturn re-streaming their incoming their media streams to unsafe streaming servers

Exploiting human vulnerabilities / Unintentional breaches

Phishing , Pretexting , Baiting attacks , Quid pro quo , Tailgating , Water-Holing are soe of the common tactics to steal teh data of a nonsuspecting user . They are as much applicate to WebRTC based communication site as they are to any other trusted website such as banking , customer care contacts , falsh sale portals , cupon / discount sites etc .

Phone phishing – Voice phishing a criminal phone fraud, imporsonating legitimate caller such as a bank or tax agent and using social engineering over the telephone system to gain access to private personal and financial information for the purpose of financial reward.

SMS phishing – SMS is used as a method to do phishing maya tiem to send malicious links posing as legitmate sender

Other social engineering tactics – Trickery , Influencing , Deception , Spying

Impersonation attacks – spear-phishing , emails that attempt to impersonate a trusted individual or a company in an attempt to gain access to corporate finances , Human resource details , sesitive data. Business email compromise (BECs) also known as CEO fraud is a popular example of an impersonation attack. The fake email usually describes a very urgent situation to minimize scrutiny and skepticism.

Network security breaches

Inspite of the fact that webrtc is a p2p streaming framework , there are always signalling server required which do the initial handshake and enable the exchange fo SDP for the media to stream in peer to peer fashion . Some wellknown attacks that compromise networks and remote / cloud server are :

  • Viruses, worms and Trojan horses
  • Zero-day attacks
  • Hacker attacks
  • Denial of service attacks
  • Spyware and adware

It is upto the WebRTC/ VoIP service provider to detect emerging threats before they infiltrate network and compromise data. Some crticial compoenets to enhance security are Firewalls , Access Control Lists , Intrusion detection and prevention systems (IDS/IPS) , Virtual private networks (VPN)

Governance Framework – defines the roles, responsibilities and accountability of each person and ensures that you are meeting compliance.

  • Confidentiality: ensures information is inaccessible to unauthorized people via encryption
  • Integrity: protects information and systems from being modified by unauthorized people; provides accuracy and trustworthyness
  • Availability: ensures authorized people can access the information when needed and that all hardware and software are maintained properly and updated when necessary
  • Authentication, Authorization and Accountability(AAA): 
    validate users autheticity via creds, enforcing policies on network resources after the user has gain access and ensuring accountability by means of monitoring and capturing the events done by the user
  • Non repudiation: is the assurance that someone cannot deny the validity of something. It provides proof of the origin of data and the integrity of the data.

What happens if your VOIP solution is on the verge of being compromised ?

As a first defence tactic , if a orignation ip address is sending malacious or malformed packets which depict an exploitation or attack , trigger and notification for tech team and execute script to block on the origin IP of attacker via security groups in AWS or other ACL list in hosted server . Can also implement temporary firewall block on it and later monitor it for more violations.

Incase a server is compromised beyond repair such as attacker taking control of the file system , drain the ongoing sessions from it and store cached storage with session state variable like CDR enteries . Activate the fallback / standby server and make the current server a honeypot to explore the attackers actions

Commonly attacks involve either

  • exploiting the VoIP system to get free internatoinal calls
  • ransomware activities such as scp the files out of server and leaveing behind a readme.txt file on root location asking for money transfer in return of data
  • bombard brute force DDOS attacks to bring down the system and make it incapible of catering to genuine requests , perhaps with the inetention of giving advantage to competitors .

As the media connections are p2p , even if we kill the signalling server , it will not affect the ongoing media sessions . Only the time duration ( probably 3 – 4 minutes ) it takes to restart the server , is when the users will not be able to connect to signalling server for creating new sessions . therefore incase a system is under attack and non recoverable , just terminate it and respawn other server attaced to the domain name or floating IP or Load balancer.

Most browsers today like Google Chrome and Mozilla Firefox have a good record of auto-updating themselves withing 24 hours of a vulnerability of threat occurring .

If a call is confirmed to be compromised , it should be within the power of Web Application server rendering the WebRTC capable page to cut off the compromised call session by force censing termation request to endpoints or via turning off the TURN services if in use .

References :

TURN server for WebRTC – RFC5766-TURN , Coturn , Xirsys , Twillio

STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) are protocols that can be used to provide NAT traversal for VoIP and WebRTC. These projects provide a VoIP media traffic NAT traversal server and gateway.

TURN Server is a VoIP media traffic NAT traversal server and gateway.

This article describes working with some of the TURN servers 

  • rfc5766-turn-server ( legacy )
  • Xirsys TURN service
  • Twillio’s TURN service
  • Asterisk ICE resolution
  • Kamailio NAT and STUN modules in Server

We know that a STUN only server such as the one below , may not work under firewalls like in enterperises or even in universities resulting in black video , one way video , inconcsistent streaming or even no video experience .

// google STUN 
var iceservers_array=[{urls: ''}] ;

To overcome this we rely on a publicaly avaiable TURN server which can step in and do ICE exchange to seup relay routes for the media stream. Some of the options for both self hosted and TURN as a service are described below .


It is a legacy project mainly archives for reference( links at the end of section) . This is a VoIP gateway for inter network communication which is opensource  MIT licensed .

platforms supported : Any client platform is supported, including Android, iOS, Linux, OS X, Windows, and Windows Phone. This project can be successfully used on other *NIX platforms ( Aamazon EC2) too. It supports flat file or Database based user management system ( MySQL , postgress , redis ). The source code project contains ,  TURN server ,  TURN client messaging library and some sample scripts to test various modules like protocol , relay , security etc .

Protocols : Protocols between the TURN client and the TURN server – UDP, TCP, TLS, and DTLS. Relay protocol – UDP , TCP .

Authentication : The authentication mechanism is using key which is calculated over the user name, the realm, and the user password. Key for the HMAC depends on whether long-term or short-term credentials are in use. For long-term credentials, the key is 16 bytes: key = MD5(username “:” realm “:” SASLprep(password))

Installation :  Since I used my Ubuntu Software center for installing the RFC turn server 5677 .

Screenshot from 2015-03-05 15:22:30

More information is on Ubuntu Manuals :

The content got stored inside /usr/share/rfc5766-turn-server. Also install mysql for record keeping

sudo apt-get install mysql-server

Intall MySQL workbench to monitor the values feed into the turn database server in MysqL. connect to MySQL instance using the following screenshot


The database formed with mysql after successful operation is as follows . We  shall notice that the initial db is absolutely null


Terminal Commands

These terminal command ( binary images ) get stored inside etc/init.d after installing

turnadmin – Its turn relay administration tool used for generating , updating keys and passwords . For generating a key to get long term crdentaial use -k command and for aading or updateing a long -term user use the -a command. Therefore a simple command to generate a key is

format : turnadmin -k -u -r -p

 turnadmin -k -u turnwebrtc -r -p turnwebrtc

The generated key is displayed in console . For example the following screenshot shows this :


To fill in user with long term credentails

Format : turnadmin -a [-b | -e | -M | -N ] -u -r -p

turnadmin -a -M "host=localhost dbname=turn user=turn password=turn" -u altanai -r -p 123456

Check the values reflected in MySQL workbench for long term user table . ( screenshot depicts two entries for altanai and turnwebrtc user )


you can also check it on console using the -l command

format :turnadmin -l –mysql-userdb=””

turnadmin -l --mysql-userdb="host= dbname=turn user=turnwebrtc password=turnwebrtc connect_timeout=30"

or we can also check using the terminal based mySQL client

mysql> use turn;
Database changed
mysql> select * from turnusers_lt;
| name | hmackey |
| altanai | 57bdc681481c4f7626bffcde292c85e7 |
| turnwebrtc | 6066cbe0b5ee14439b2ddfc177268309 |
2 rows in set (0.00 sec)

turnserver – Its command to handle the turnserver itself . We can use the simple turnserver command to start it without any db support using just turnserver. Screenshot for this is


We can use a database like mysql to start it with db connection string

Format : turnserver –mysql-userdb=””

turnserver --mysql-userdb="host= dbname=turn user=turnwebrtc password=turnwebrtc connect_timeout=30"


turnutils_uclient: emulates multiple UDP,TCP,TLS or DTLS clients.

turnutils_peer: simple stateless UDP-only “echo” server. For every incoming UDP packet, it simply echoes it back.

turnutils_stunclient: simple STUN client example that implements RFC 5389 ( using STUN as endpoint to determine the IP address and port allocated to it , keep-alive , check connectivity etc) and RFC 5780 (experimental NAT Behavior Discovery STUN usage) .

turnutils_rfc5769check: checks the correctness of the STUN/TURN protocol implementation. This program will perform several checks and print the result on the screen. It will exit with 0 status if everything is OK, and with (-1) if there was an error in the protocol implementation.


1. Test vectors from RFC 5769 to double-check that our STUN/TURN message encoding algorithms work properly. Run the utility to check all protocols :

$ cd examples
$ ./scripts/

2. TURN functionality test (bare minimum TURN example).

If everything compiled properly, then the following programs must run together successfully, simulating TURN network routing in local loopback networking environment:

console 1 :

$ ./scripts/basic/

console2 :

$ ./scripts/

If the client application produces output and in approximately 22 seconds prints the jitter, loss and round-trip-delay statistics, then everything is fine.


 { 'url': 'stun:'},
 { 'url': 'turn:', 'credential': '123456'}]

Insert the above piece of code on peer connection config . Now call from one network environment to another . For example call from a enterprise network behind a Wifi router to a public internet datacard webrtc agent . The call should connect with video flowing smoothly between the two .

TURN working accross Enterprise firrewall on a WebRTC video platform written with SimpeWebRTC lbrary . Project tango FX

website :

Download the executable from :


Project Coturn evolved from rfc5766-turn-server project with many new advanced TURN specs beyond the original RFC 5766 document. The databses supported are : SQLite , MySQL , PostgreSQL , Redis , MongoDB

Protocols :

The implementation fully supports the following client-to-TURN-server protocols: UDP  , TCP  , TLS  SSL3/TLS1.0/TLS1.1/TLS1.2; ECDHE , DTLS versions 1.0 and 1.2. Supported relay protocols UDP (per RFC 5766) and TCP (per RFC 6062)

Authetication :

Supported message integrity digest algorithms:

  • HMAC-SHA1, with MD5-hashed keys (as required by STUN and TURN standards)
  • HMAC-SHA256, with SHA256-hashed keys (an extension to the STUN and TURN specs)

Supported TURN authentication mechanisms:


Install libopenssl and libevent plus its dev or extra libraries .
OpenSSL has to be installed before libevent2 for TLS beacuse When libevent builds it checks whether OpenSSL has been already installed, and its version.

Download coturn readonly  from

svn checkout coturn-read-only

extract the tar contents
$ tar xvfz turnserver-.tar.gz

go inside the extracted folder and run the following command to build
$ ./configure
$ make
$ make install

Adding users in the format using turnadmin
$ Sudo turnadmin -a -u -r -p

$ Sudo turnadmin -a -u altanai -r -p 123456

Start the turn Server using turnserver from inside of /etc/init.d using the start command

$ sudo /etc/init.d/coturn start
Screenshot from 2015-01-06 12-08-15

The logs are usually stored in /var/log . Screenshot of log file


The default configured port is 3478.If other port is needed, change the file /etc/turnserver.conf


Specify the  values in Peer Connection

{ ‘url’: ‘stun:’},
{ ‘url’: ‘turn:’, ‘credential’: ‘123456’}]


TURN specs:

STUN specs:

  • RFC 3489 – STUN – Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)
  • RFC 5389 – Session Traversal Utilities for NAT (STUN)
  • RFC 5769 – test vectors for STUN protocol testing
  • RFC 5780 – NAT behavior discovery support
  • RFC 7443 – Application-Layer Protocol Negotiation (ALPN) Labels for STUN and TURN


  • RFC 5245 – ICE
  • RFC 5768 – ICE–SIP
  • RFC 6336 – ICE–IANA Registry
  • RFC 6544 – ICE–TCP
  • RFC 5928 – TURN Resolution Mechanism

website :

Chorme WebRTC Internals Page for COTURN connection


Xirsys is a provider for WebRTC infrastructure which included stun and turn server hosting as well .

The process of using their services includes singing up for a account and choosing whether you want a paid service capable of handling more calls simultaneously or free one handling only upto 10 concurrent turn connections .

The dashboard appears like this :


To receive the api one need to make a one time call to their service , the result of which contains the keys to invoke the turn services from webrtc script .

window.addEventListener("load", function (e) {
    let xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function ($evt) {
        if (xhr.readyState == 4 && xhr.status == 200) {
            let res = JSON.parse(xhr.responseText);
            console.log("response: ", res);
           alert( iceservers_array);
    };"PUT", "", true);
    xhr.setRequestHeader("Authorization","Basic " + btoa("altanai:<sec rettoken>"));
    xhr.send(JSON.stringify({"format": "urls"}));

The resulting output should look like ( my keys are hidden with a red rectangle ofcourse )


The process of adding a TURN / STUN to your webrtc script in JS is as follows :

 {"username":"< put your API username>","url":"","credential":"< put your API credentail>"},
 {"username":"< put your API username>","url":"","credential":"< put your API credentail>"}]

website :

Twillio TURN Server

// before unload update status on main site
window.addEventListener("load", function (e) {
    let xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function ($evt) {
        if (xhr.readyState == 4 && xhr.status == 200) {
            let res = JSON.parse(xhr.responseText);
            console.log("response: ", res);
            iceservers_array = res.iceServers;
            console.log("iceservers_array: ", iceservers_array);
    };"POST", "https://<ourdomain>:3000/token", true);
    xhr.send(JSON.stringify({"format": "urls"}));

Asterisk TURN module and service

Asterisk is an open source carrer grade SIP server which also provides Firewall traversal . A github repo containing some asterisk dialplan examples is An article discussing Asterisk and its implementation

Kamailio NAT and STUN modules in Server

It is also interesting to note that majority of popular open source SIP servers also have an implementation , libabary or module for STUN/TURN such as Kamailio’s STUN module

You can read more about the kamailio DNS and NATing here

NAT traversal using STUN and TURN

We know that WebRTC is web based real-time communications on browser-based platform using the browser’s media application programming interface (API) and adding our JavaScript & HTML5 t control the media flow .
WebRTC has enabled developers to build apps/ sites / widgets / plugins capable of delivering simultaneous voice/video/data/screen-sharing capability in a peer to peer fashion.

But something which escapes our attention is the way in which media ia traversing across the network. Ofcourse the webrtc call runs very smoothly when both the peers are on open public internet without any restrictions or firewall blocks . But the real problem begins when one of the peer is behind a Corporate/Enterprise network or using a different Internet service provider with some security restrictions . In such a case the normal ICE capability of WebRTC is not enough , what is required is a NAT traversal mechanism .

STUN and TURN server protocols handle session initiations with handshakes between peers in different network environments . In case of a firewall blocking a STUN peer-to-peer connection, the system fallback to a TURN server which provides the necessary traversing mechanism through the NAT.

Lets study from the start ie ICE . What is it and why is it used ?

ICE (Interactive Connectivity Establishment )  framework ( mandatory by WebRTC standards  ) find network interfaces and ports in Offer / Answer Model to exchange network based information with participating communication clients. ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relay NAT (TURN)

ICE is defined by RFC 5245 – Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols.

Sample WebRTC offer holding ICE candidates :

type: offer, sdp: v=0
o=- 3475901263113717000 2 IN IP4
t=0 0
a=group:BUNDLE audio video data
a=msid-semantic: WMS dZdZMFQRNtY3unof7lTZBInzcRRylLakxtvc
m=audio 9 RTP/SAVPF 111 103 104 9 0 8 106 105 13 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 F1:A8:2E:71:4B:4E:FF:08:0F:18:13:1C:86:7B:FE:BA:BD:67:CF:B1:7F:19:87:33:6E:10:5C:17:42:0A:6C:15
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
m=video 9 RTP/SAVPF 100 116 117 96
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 F1:A8:2E:71:4B:4E:FF:08:0F:18:13:1C:86:7B:FE:BA:BD:67:CF:B1:7F:19:87:33:6E:10:5C:17:42:0A:6C:15
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
m=application 9 DTLS/SCTP 5000
c=IN IP4
a=fingerprint:sha-256 F1:A8:2E:71:4B:4E:FF:08:0F:18:13:1C:86:7B:FE:BA:BD:67:CF:B1:7F:19:87:33:6E:10:5C:17:42:0A:6C:15
a=sctpmap:5000 webrtc-datachannel 1024

Notice the ICE candidates under video and audio . Now take a look at the SDP answer

type: answer, sdp: v=0
o=- 6931590438150302967 2 IN IP4
t=0 0
a=group:BUNDLE audio video data
a=msid-semantic: WMS R98sfBPNQwC20y9HsDBt4to1hTFeP6S0UnsX
m=audio 1 RTP/SAVPF 111 103 104 0 8 106 105 13 126
c=IN IP4
a=rtcp:1 IN IP4
a=fingerprint:sha-256 7B:9A:A7:43:EC:17:BD:9B:49:E4:23:92:8E:48:E4:8C:9A:BE:85:D4:1D:D7:8B:0E:60:C2:AE:67:77:1D:62:70
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
m=video 1 RTP/SAVPF 100 116 117 96
c=IN IP4
a=rtcp:1 IN IP4
a=fingerprint:sha-256 7B:9A:A7:43:EC:17:BD:9B:49:E4:23:92:8E:48:E4:8C:9A:BE:85:D4:1D:D7:8B:0E:60:C2:AE:67:77:1D:62:70
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
m=application 1 DTLS/SCTP 5000
c=IN IP4
a=fingerprint:sha-256 7B:9A:A7:43:EC:17:BD:9B:49:E4:23:92:8E:48:E4:8C:9A:BE:85:D4:1D:D7:8B:0E:60:C2:AE:67:77:1D:62:70
a=sctpmap:5000 webrtc-datachannel 1024

Call Flow for ICE

STUN call flow for WebRTC Offer Answer
STUN call flow for WebRTC Offer Answer

WebRTC needs SDP Offer to be send to the clientB Javascript code from clientA Javascript code . Client B uses this SDP offer to generate an SDP Answer for client A. The SDP ( as seen on chrome://webrtc-internals/ ) includes ICE candidates which punchs open ports in the firewalls.
However incase both sides are symmetric NATs the media flow gets blocked. For such a case TURN is used which tries to give a public ip and port mapped to internal ip and port so as to provide an alternative routing mechanism like a packet-mirror. It can open a DTLS connection and use it to key the SRTP-DTLS media streams, and to send DataChannels over DTLS.

In order to Understand this better consider various scenarios

1 . No Firewall present on either peer . Both connected to open public internet .

Diagrammatic representation of  this shown as follows :

WebRTC signalling and media flow on Open public network
WebRTC signalling and media flow on Open public network

In this case there is no restriction to signal or media flow and the call takes places smoothly in p2p fashion.

2.  Either one or both the peer ( could be many in case of multi conf call ) are present behind a firewall  or  restrictive connection or router configured for intranet .

In such a case the signal may pass with the use of default ICE candidates or simple ppensource google Stun server such as

{ ‘url’: “”}]

Diagram :

WebRTC signalling when peers are behind  firewalls
WebRTC signalling when peers are behind firewalls

However the media is restricted resulting in a black / empty / no video situation for both peers  . To combat such situation a relay mechanism such as TURN is required which essentially maps public ip to private ips thus creating a alternative route for media and data to flow through .

WebRTC media flow when peers are behind NAT . Uses TURN relay mechanism
WebRTC media flow when peers are behind NAT . Uses TURN relay mechanism

Peer config should look like :

var configuration =  {
iceServers: [
{ “url’:”stun::”},
{ “url”:”turn::”}

var pc = new RTCPeerConnection(configuration);

3. When the TURN server is also behind a firewall .  The config file of the turn server need to be altered to map the public and private IP

The diagrammatic description of this is as follows :

WebRTC media flow when peers are behind NAT and TURN server is behind NAT as well . TURN config files bind a public interface to private interface address.
WebRTC media flow when peers are behind NAT and TURN server is behind NAT as well . TURN config files bind a public interface to private interface address .

continue : Streaming / broadcasting Live Video call to non webrtc supported browsers and media players

This blog is in continuation to the attempts / outcomes and problems in building a WebRTC to RTP media framework that successfully stream / broadcast WebRTC content to non webrtc supported browsers ( safari / IE ) / media players ( VLC )

Attempt 4: Stream the content to a WebRTC endpoint which is hidden in a video call . Pick the stream from vp8 object URL send to a streaming server

This process involved the following components :

  • WebRTC API : simplewebrtc on Chrome
  • Transfer mechanism from client to Streaming server:  webrtc media channel

Problems : No streaming server is qualified to handle a direct webrtc input and stream it on network .

Attempt 4.1 : Stream the content to a WebRTC endpoint . Do WebRTC Endpoint to RTP Endpoint bridge using Kurento APIs. 

Use the RTP port and ip address to input into a ffmpeg or gstreamer or VLC terminal command and out put a live H264 stream on another ip and port address .  

This process involved the following components :

  • API : Kurento
  • Transfer mechanism : HTML5 webrtc client -> application server hosting java -> media server -> application for webrtc media to RTP media conversation -> RTP player

Screenshots of attempts with Wowza to stream from a ip and port


problems :

  • The stream was black ie no video content .

Attempt 4.2 : Build a WebRTC Endpoint to Http endpoint in kurento and force the video audio encoding to be that of H264 and PCMU.

code for adding constraints to output media and forcing choice of codecs

MediaPipeline pipeline = kurento.createMediaPipeline();
WebRtcEndpoint webRtcEndpoint = new WebRtcEndpoint.Builder(pipeline).build();
HttpGetEndpoint httpEndpoint=new HttpGetEndpoint.Builder(pipeline).build();

org.kurento.client.Fraction fr= new org.kurento.client.Fraction(1, 30);
VideoCaps vc= new VideoCaps(VideoCodec.H264,fr);

AudioCaps ac= new AudioCaps(AudioCodec.PCMU, 65536);


code for using gstreamer filter to force the output in raw format . It is a alternate solution to above

//basic media operation of 1 pipeline and 2 endpoinst
MediaPipeline pipeline = kurento.createMediaPipeline();
WebRtcEndpoint webRtcEndpoint = new WebRtcEndpoint.Builder(pipeline).build();
RtpEndpoint rtpEndpoint = new RtpEndpoint.Builder(pipeline).build();

//adding Gstream filters
GStreamerFilter filter1 = new GStreamerFilter.Builder(pipeline, &quot;videorate max-rate=30&quot;).withFilterType(FilterType.VIDEO).build();
GStreamerFilter filter2 = new GStreamerFilter.Builder(pipeline, &quot;capsfilter caps=video/x-h264,width=1280,height=720,framerate=30/1&quot;).withFilterType(FilterType.VIDEO).build();
GStreamerFilter filter3 = new GStreamerFilter.Builder(pipeline, &quot;capsfilter caps=audio/x-mpeg,layer=3,rate=48000&quot;).withFilterType(FilterType.AUDIO).build();

//connecting all poin ts to one another
webRtcEndpoint.connect (filter1);
filter1.connect (filter2);
filter2.connect (filter3);
filter3.connect (rtpEndpoint);

// RTP SDP offer and answer
String requestRTPsdp = rtpEndpoint.generateOffer();

problem : The output is still webm

Attempt 5  : Use a RTP SDP Endpoint ( ie a SDP file valid for a given session ) and use it to play the WebRTC media over Wowza streaming server

This process involved the following components

  1. WebRTC Stream and object URL of the blob containing VP8 media
  2. Kurento  WebRTC Endpoint  bridge to generate SDP
  3. Wowza Streaming server

code for kurento to generate a SDP file from WebRTC to RTP bridge

@RequestMapping(value = &quot;/rtpsdp&quot;, method = RequestMethod.POST)
private String processRequestrtpsdp(@RequestBody String sdpOffer)
throws IOException, URISyntaxException, InterruptedException {

//basic media operation of 1 pipeline and 2 endpoinst
MediaPipeline pipeline = kurento.createMediaPipeline();
WebRtcEndpoint webRtcEndpoint = new WebRtcEndpoint.Builder(pipeline).build();
RtpEndpoint rtpEndpoint = new RtpEndpoint.Builder(pipeline).build();

//connecting all poin ts to one another
webRtcEndpoint.connect (rtpEndpoint);

// RTP SDP offer and answer
String requestRTPsdp = rtpEndpoint.generateOffer();

// write the SDP conector to an external file
PrintWriter out = new PrintWriter(&quot;/tmp/test.sdp&quot;);

HttpGetEndpoint httpEndpoint = new HttpGetEndpoint.Builder(pipeline).build();
PlayerEndpoint player = new PlayerEndpoint.Builder(pipeline, requestRTPsdp).build();

// Playing media and opening the default desktop browser;
String videoUrl = httpEndpoint.getUrl();
System.out.println(&quot; ------- video URL -------------&quot;+ videoUrl);

// send the response to front client
String responseSdp = webRtcEndpoint.processOffer(sdpOffer);

return responseSdp;

problems : wowza doesnt not recognize the WebRTC SDP and play the video

screenshot of wowza with SDP input

Screenshot from 2015-01-30 15:28:59

Attempt 5.1 : Use a RTP SDP Endpoint ( ie a SDP file valid for a given session ) and use it to play the WebRTC media over Default Ubuntu media player 

SDP file formed contains contents such as :

o=- 3631611195 3631611195 IN IP4
s=Kurento Media Server
c=IN IP4
t=0 0
m=audio 42802 RTP/AVP 98 99 0
a=rtpmap:98 OPUS/48000/2
a=rtpmap:99 AMR/8000/1
a=rtpmap:0 PCMU/8000
a=ssrc:2713728673 cname:user59375791@host-ad1117df
m=video 35946 RTP/AVP 96 97 100 101
a=rtpmap:96 H263-1998/90000
a=rtpmap:97 VP8/90000
a=rtpmap:100 MP4V-ES/90000
a=rtpmap:101 H264/90000
a=ssrc:93449274 cname:user59375791@host-ad1117df

problem : deformed media

screenshot of playing from a SDP file

Screenshot from 2015-01-29 17:42:21

Attempt 5.2 : Use a RTP SDP Endpoint ( ie a SDP file valid for a given session ) and use it to play the WebRTC media over VLC using socket input

problem : nothing plays

screenshot of VLC connected to play from socket and failure to play anything

Screenshot from 2015-01-21 17:49:52

Attempt 5.3: Create a WebRTC endpoint and connected it to RTP endpoint via media pipelines . Also make the RTP SDP offer and answering the same . Play with ffnpeg / ffplay / gst playbin

String requestRTPsdp = rtpEndpoint.generateOffer();

Write the requestRTPsdp to a file and obtain a RTP connector endpoint with Application/SDP .It plays okay with gst playbin ( 10 secs without audio )

Successful attempt to play from a gst playbin

gst-launch -vvv playbin uri=file:///tmp/test.sdp 

donekurento streaming

but refuses to be played by VLC , ffplay and even wowza . The error generated with

ffmpeg -i test.sdp -vcodec copy -acodec copy -f mpegts output-file.ts


ffmpeg -re -i test.sdp -vcodec h264 -acodec mp3 -f mpegts “udp://”


Could not find codec parameter for stream1 ( video:h263, none ) .Other errors types are , Could not write header for output file output file is empty nothing was encoded

Error screenshots of trying to play the RTP SDP file with ffmpeg

ffmpeg error kurebto1
ffmpeg error kurebto2

Attempt 6 : Use a WebRTC capable media and streaming server ( eg Kurento )  to pick a live stream of VP8 .

Convert the VP8 to H264  ( ffmpeg / RTP endpoint ) .

Convert H264 to Mp4 using MP4 parser and pass to a streaming server  ( wowza)

In process to be updated .

Streaming / broadcasting Live Video call to non webrtc supported browsers and media players

As the title of this article suggests I am going to pen my attempts of streaming / broadcasting Live Video WebRTC call to non WebRTC supported browsers and media players such as VLC , ffplay , default video player in Linux etc .

I am currently attempting to do this by making my own MP4 engine from WebRTC feed . However I am sharing my past experiments in hope of helping someone whose objective is not the same as mine and might get some help from these threads .

Attempt 1 : use one to many brodcasting API :

<table class=”visible”> 
<td style=”text-align: right;”> 
<input type=”text” id=”conference-name” placeholder=”Broadcast Name”> </td> 
<td> <select id=”broadcasting-option”> <option>Audio + Video</option> <option>Only Audio</option> <option>Screen</option> </select> </td> 
<td> <button id=”start-conferencing”>Start Broadcasting</button> </td> </tr> 
<table id=”rooms-list” class=”visible”></table> 
<div id=”participants”></div> 
<script src=”RTCPeerConnection-v1.5.js”>
</script> <script src=”firebase.js”></script> 
<script src=”broadcast.js”></script> 
<script src=”broadcast-ui.js”></script>

It uses API The broadcast is in one direction only where the viewrs are never asked for their mic / webcam permission .

problem : The broadcast is for WebRTC browsers only and doesnt support non webrtc players / browsers

Attempt 1.1: Stream the media directly to nodejs through websocket

window.addEventListener('DOMContentLoaded', function() {

var v = document.getElementById('v');
navigator.getUserMedia = (navigator.getUserMedia ||
navigator.webkitGetUserMedia ||
navigator.mozGetUserMedia ||

if (navigator.getUserMedia) {
// Request access to video only
function(stream) {
var url = window.URL || window.webkitURL;
v.src = url ? url.createObjectURL(stream) : stream;;

var ws = new WebSocket('ws://localhost:3000', 'echo-protocol');
waitForSocketConnection(ws, function(){

console.log(" url.createObjectURL(stream)-----", url.createObjectURL(stream))

console.log("message sent!!!");

function(error) {
alert('Something went wrong. (error code ' + error.code + ')');
else {
alert('Sorry, the browser you are using doesn\'t support getUserMedia');

//Make the function wait until the connection is made...
function waitForSocketConnection(socket, callback){
function () {
if (socket.readyState === 1) {
console.log("Connection is made")
if(callback != null){

} else {
console.log("wait for connection...")
waitForSocketConnection(socket, callback);

}, 5); // wait 5 milisecond for the connection...

problem : The video is in form of buffer and doesnot play

Attempt 2: Record the WebRTC media ( 5 secs each ) into chunks of webm format->  transfer them to other end -> append the chunks together like a regular file 

This process involved the following components :

  • Recorder Javascript library : RecordJs
  • Transfer mechanism : Record using RecordRTC.js -> send to other end for media server -> stitching together the small webm files into big one at runtime and play
  • Programs :

Code for video recorder

navigator.getUserMedia(videoConstraints, function(stream) {

video.onloadedmetadata = function() {
video.width = 320;
video.height = 240;

var options = {
type: isRecordVideo ? 'video' : 'gif',
video: video,
canvas: {
width: canvasWidth_input.value,
height: canvasHeight_input.value

recorder = window.RecordRTC(stream, options);
video.src = URL.createObjectURL(stream);
}, function() {
if (document.getElementById('record-screen').checked) {
if (location.protocol === 'http:')
alert('&amp;amp;amp;amp;amp;lt;https&amp;amp;amp;amp;amp;gt; is mandatory to capture screen.');
alert('Multi-capturing of screen is not allowed. Capturing process is denied. Are you enabled flag: "Enable screen capture support in getUserMedia"?');
} else
alert('Webcam access is denied.');

Code for video append-er

var FILE1 = '1.webm';
var FILE2 = '2.webm';
var FILE3 = '3.webm';
var FILE4 = '4.webm';
var FILE5 = '5.webm';

var NUM_CHUNKS = 5;
var video = document.querySelector('video');

window.MediaSource = window.MediaSource || window.WebKitMediaSource;
if (!!!window.MediaSource) {
alert('MediaSource API is not available');

var mediaSource = new MediaSource();

video.src = window.URL.createObjectURL(mediaSource);

function callback(e) {

var sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vorbis,vp8"');

GET(FILE1, function(uInt8Array) {

var file = new Blob([uInt8Array], {type: 'video/webm'});
var i = 1;

(function readChunk_(i) {

var reader = new FileReader();

reader.onload = function(e) {

sourceBuffer.appendBuffer(new Uint8Array(;

if (i == NUM_CHUNKS) mediaSource.endOfStream();

else {
if (video.paused) {; // Start playing after 1st chunk is appended.



})(i); // Start the recursive call by self calling.

mediaSource.addEventListener('sourceopen', callback, false);
mediaSource.addEventListener('webkitsourceopen', callback, false);
mediaSource.addEventListener('webkitsourceended', function(e) {
logger.log('mediaSource readyState: ' + this.readyState);
}, false);

// function get the video via XHR
function GET(url, callback) {

var xhr = new XMLHttpRequest();'GET', url, true);
xhr.responseType = 'arraybuffer';

xhr.onload = function(e) {

if (xhr.status != 200) {
alert("Unexpected status code " + xhr.status + " for " + url);
return false;

callback(new Uint8Array(xhr.response));

Shortcoming of this approach

  1. The webm files failed to play on most of the media players
  2. The recorder can only either record video or audio file at a time .

Attempt 2.1: Record the WebRTC media ( 5 secs each ) into chunks of webm format ( RecordRTC.js) >  Use Kurento JS script ( kws-media-api,js) to make a HTTP Endpoint to recorded Webm files  -> append the chunks together like a regular file at runtime 

function getByID(id) {
return document.getElementById(id);

var recordAudio = getByID('record-audio'),
recordVideo = getByID('record-video'),
stopRecordingAudio = getByID('stop-recording-audio'),
stopRecordingVideo = getByID('stop-recording-video'),

var canvasWidth_input = getByID('canvas-width-input'),
canvasHeight_input = getByID('canvas-height-input');

var video = getByID('video');
var audio = getByID('audio');

var videoConstraints = {
audio: false,
video: {
mandatory: {},
optional: []

var audioConstraints = {
audio: true,
video: false

const ws_uri = 'ws://localhost:8888/kurento';
var URL_SMALL="http://localhost:8080/streamtomp4/approach1/5561840332.webm";

var audioStream;
var recorder;

recordAudio.onclick = function() {
if (!audioStream)
navigator.getUserMedia(audioConstraints, function(stream) {

if (window.IsChrome) stream = new window.MediaStream(stream.getAudioTracks());
audioStream = stream;

audio.src = URL.createObjectURL(audioStream);
audio.muted = true;;

// "audio" is a default type
recorder = window.RecordRTC(stream, {
type: 'audio'
}, function() {});
else {
audio.src = URL.createObjectURL(audioStream);
audio.muted = true;;
if (recorder) recorder.startRecording();

window.isAudio = true;

this.disabled = true;
stopRecordingAudio.disabled = false;

stopRecordingAudio.onclick = function() {
this.disabled = true;
recordAudio.disabled = false;
audio.src = '';

if (recorder)
recorder.stopRecording(function(url) {
audio.src = url;
audio.muted = false;;

document.getElementById('audio-url-preview').innerHTML = '&amp;amp;amp;amp;amp;lt;a href="' + url + '" target="_blank"&amp;amp;amp;amp;amp;gt;Recorded Audio URL&amp;amp;amp;amp;amp;lt;/a&amp;amp;amp;amp;amp;gt;';

recordVideo.onclick = function() {

function recordVideoOrGIF(isRecordVideo) {
navigator.getUserMedia(videoConstraints, function(stream) {

video.onloadedmetadata = function() {
video.width = 320;
video.height = 240;

var options = {
type: isRecordVideo ? 'video' : 'gif',
video: video,
canvas: {
width: canvasWidth_input.value,
height: canvasHeight_input.value

recorder = window.RecordRTC(stream, options);
video.src = URL.createObjectURL(stream);
}, function() {
if (document.getElementById('record-screen').checked) {
if (location.protocol === 'http:')
alert('&amp;amp;amp;amp;amp;lt;https&amp;amp;amp;amp;amp;gt; is mandatory to capture screen.');
alert('Multi-capturing of screen is not allowed. Capturing process is denied. Are you enabled flag: "Enable screen capture support in getUserMedia"?');
} else
alert('Webcam access is denied.');

window.isAudio = false;

if (isRecordVideo) {
recordVideo.disabled = true;
stopRecordingVideo.disabled = false;
} else {
recordGIF.disabled = true;
stopRecordingGIF.disabled = false;

stopRecordingVideo.onclick = function() {
this.disabled = true;
recordVideo.disabled = false;

if (recorder)
recorder.stopRecording(function(url) {
video.src = url;;
document.getElementById('video-url-preview').innerHTML = '&amp;amp;amp;amp;amp;lt;a href="' + url + '" target="_blank"&amp;amp;amp;amp;amp;gt;Recorded Video URL&amp;amp;amp;amp;amp;lt;/a&amp;amp;amp;amp;amp;gt;';


/*----broadcasting -------------*/

function onerror(error)
console.log( " error occured");

broadcast.onclick = function() {
var videoOutput = document.getElementById("videoOutput");

KwsMedia(ws_uri, function(error, kwsMedia)
if(error) return onerror(error);

// Create pipeline
kwsMedia.create('MediaPipeline', function(error, pipeline)
if(error) return onerror(error);

// Create pipeline media elements (endpoints &amp;amp;amp;amp;amp;amp; filters)
pipeline.create('PlayerEndpoint', {uri: URL_SMALL},
function(error, player)
if(error) return console.error(error);

pipeline.create('HttpGetEndpoint', function(error, httpGet)
if(error) return onerror(error);

// Connect media element between them
player.connect(httpGet, function(error, pipeline)
if(error) return onerror(error);
// Set the video on the video tag
httpGet.getUrl(function(error, url)
if(error) return onerror(error);

videoOutput.src = url;


// Start player
if(error) return onerror(error);


// Subscribe to HttpGetEndpoint EOS event
httpGet.on('EndOfStream', function(event)
console.log("EndOfStream event:", event);


problem : dissecting the live video into small the files and appending to each other on reception is an expensive , time and resource consuming process . Also involves heavy buffering and other problems pertaining to real-time streaming .

Attempt 2.2 : Send the recorded chunks of webm to a port on linux server . Use socket programming to pick up these individual files and play using  VLC player from UDP port of the Linux Server

Screenshot from 2015-01-22 15:32:51

Attempt 2.3: Send the recorded chunks of webm to a port on linux server socket . Use socket programming to pick up these individual webm files and convert to H264 format so that they can be send to a media server. 

This process involved the following components :

  • Recorder Javascript library : RecordJs
  • Transfer mechanism :WebRTC endpoint -> Call handler ( Record in chunks ) -> ffmpeg / gstreamer to put it on RTP -> streaming server like wowza – > viewers
  • Programs : Use HTML webpage Webscoket connection -> nodejs program to write content from websocket to linux socket -> nodejs program to read that socket and print the content on console

Program to transfer the webm recorder files over websocket to nodejs program

//Make the function wait until the connection is made...
function waitForSocketConnection(socket, callback){
function () {
if (socket.readyState === 1) {
console.log("Connection is made")
if(callback != null){

} else {
console.log("wait for connection...")
waitForSocketConnection(socket, callback);

}, 5); // wait 5 milisecond for the connection...

function previewFile() {
var preview = document.querySelector('img');
var file = document.querySelector('input[type=file]').files[0];
var reader = new FileReader();

reader.onloadend = function () {

preview.src = reader.result;
console.log(" reader result ", reader.result);

var video=document.getElementById("v");
console.log(" video played ");

var ws = new WebSocket('ws://localhost:3000', 'echo-protocol');

waitForSocketConnection(ws, function(){
console.log("message sent!!!");


if (file) {
// converts to base64 encoded string of the file data


} else {
preview.src = "";

Program for Linux Sockets sender which creates the socket for the webm files

var net = require('net');
var fs = require('fs');
var socketPath = '/tmp/tfxsocket';
var http = require('http');
var stream = require('stream');
var util = require('util');

var WebSocketServer = require('ws').Server;
var port = 3000;
var serverUrl = "localhost";

var socket;
/*----------http server -----------*/
var server= http.createServer(function (request, response) {


server.listen(port, serverUrl);

console.log('HTTP Server running at ',serverUrl,port);

/*------websocket server ----------*/

var wss = new WebSocketServer({server: server});

wss.on("connection", function(ws) {
console.log("websocket connection open");

ws.on('message', function (message) {
console.log(" stream recived from broadcast client on port 3000 ");

var s = require('net').Socket();

console.log(" send the stream to socketPath",socketPath);

ws.on("close", function() {
console.log("websocket connection close")


Program for Linux Socket Listener using nodejs and socket . Here the socket is in node /tmp/mysocket

var net = require('net');

var client = net.createConnection("/tmp/mysocket");

client.on("connect", function() {
console.log("connected to mysocket");

client.on("data", function(data) {

client.on('end', function() {
console.log('server disconnected');

Output 1: Video Buffer displayed

Screenshot from 2015-01-22 15:35:06 (copy)

Output 2 : Random data from Video displayed

Screenshot from 2015-01-23 12:57:35

ffmpeg format of transfering the content from socket to UDP IP and port

ffmpeg -i unix://tmp/mysocket -f format udp://

problems of this approach : The video was on a passing stage from the socket and contained no information as such when tried to play / show console

Attempt 3 : Send the live WebRTC stream from Kurento WebRTC endpoint to Kurento HTTP endpoint . play using  Mozilla VLC web plugin

VLC mozilla plugin can be embedded by :

autoplay=”yes” loop=”no” hidden=”no”
target=”rtp://@″ />

screenshot of failure on part of Mozilla VLC plugin to play from a WebRTC endpoint

Screenshot from 2015-01-29 10:37:06
Screenshot from 2015-01-29 10:37:17
Screenshot from 2015-01-29 12:06:14

problem : VLC mozilla plugin was unable to play the video


The 4th , 5th and 6th sections of this article are in the next blog :

continue : Streaming / broadcasting Live Video call to non webrtc supported browsers and media players