Webrtc handshake

Interfaces of webrtc and tracks to stream addition

Process to perform webrtc handshake

1.Setup Client side for the caller
PeerConnectionFactory to generate PeerConnections
PeerConnection for every connection to remote peer
MediaStream audio and video from client device

2.caller creates SDP offer for the callee


3.Callee process the offer


4.Callee generates an SDP answer for the caller


5.Caller receives and prcesses the answer from callee


6.Proceed to Add stream

7. Proceed to do ICE for NAT

Webrtc call setup and incoming call callflow between remote peer , peerconnection actory , peerconnection and application

setup a call
receive a call

Interactive Connectivity Establishment (ICE) for NAT traversal

Protocols using offer/answer are difficult to operate through Network Address Translators (NATs) since flow of media packets require IP addresses and ports of media sources and sinks within their messages. Also realtime media emphasises on reduced latency and decreased packet loss .

An extension to the offer/answer model, and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks.
Checks done by STUN and TURN
also allows for address selection for multi-homed and dual-stack hosts

ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.

Gathering Candidate Addresses

An agent identifies all CANDIDATE whic is a transport address. Types:

  • HOST CANDIDATE – directly from a local interface which could be Wifi, Virtual Private Network (VPN) or Mobile IP (MIP)
    if an agent is multihomed ( private and public networks) , it obtains a candidate from each IP address and includes all candidates in its offer.
  • STUN or TURN to obtain additional candidates. Types
    1.translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
    2.addresses on TURN servers (RELAYED CANDIDATES)

Mapping Server Reflexive address

Agent sends the TURN Allocate request from IP address and port X:x,
NAT will create a binding X1′:x1′, mapping this server reflexive candidate to the host candidate X:x ( BASE).
Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate.
Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent.

Allocate Request and response fom TURN – Informing the agent of this relayed candidate

Only STUN based Binding

agent sends a STUN Binding request to its STUN server which will get server reflexive candidate and send back Binding response.

STUN Binding request for connectivity checks on CANDIDATE PAIRS

The candidates are carried in attributes in the SDP offer . The remote peer also follows this process and gather and send lits own sorted list of candidates. Hence CANDIDATE PAIRS from both sides are formed.

PEER REFLEXIVE CANDIDATES – connectivity checks can produce aditional candidates espceialy around symmetric NAT

Since the same address is used for STUN. and media ( RTP/RTCP) Demultiplexing based on packet contents helps to identify which one is which.

TRIGGERED CHECKS – accelerates the process of finding a valid candidate
ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.

ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first

Checks ensure maintaining frozen candidates and pairs with some foundation for media stream

Each candidate pair in the check list has a foundation and a state. States for candidates pairs

1.Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair onthe check list.

2. In-Progress: A check has been sent for this pair, but the transaction is in progress.

3. Succeeded: A check for this pair was already done and produced a successful result.

4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.

5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.

Example of ICE gather state

icegatheringstatechange – gathering

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 58122 typ host generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (srflx)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:4081163164 1 udp 1686052607 37542 typ srflx raddr rport 58122 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:345893049 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2130406062 1 udp 41886207 27190 typ relay raddr rport 37542 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3052096874 1 udp 25108479 28049 typ relay raddr rport 37543 generation 0 ufrag vzpn network-id 1 network-cost 10

icegatheringstatechange – complete

Exmaple Candidate Checking

iceconnectionstatechange : checking

setRemoteDescription L type: answer, sdp: v=0

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4

m=video 9 UDP/TLS/RTP/SAVPF 98 100 96 97 99 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4

addIceCandidate (host)
sdpMid: , sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 56060 typ host generation 0 ufrag ydvf network-id 1 network-cost 10

iceconnectionstatechange : connected

Candidate Nomination for Media Path

selecting low-latency media paths can use various techniques such as actual round-trip time (RTT) measurement
controlling agent gets to nominate which candidate pairs will get used for media amongst the ones that are valid. Ways
regular nomination and aggressive nomination


Ref :

http://w3c.github.io/webrtc-pc/ WebRTC 1.0: Real-time Communication Between Browsers – W3C Editor’s Draft 31 August 2019
RFC 5245 Inter

WebRTC Security

Unlike most conventional  real-time systems (e.g., SIP-based soft phones) WebRTC communications  are directly controlled by a Web server over some signalling protocol which may be XMPP , websockets , socket.io , Ajax etc . This poses new challenges such as

  • Web browser might expose a JavaScript APIs which allows  web server to place a video call itself.This may cause web pages to secretly record and stream the webcam activity from user’s computer
  • malicious calling services can record the user’s conversation and misuse
  • malicious webpages can lure users via advertising and execute auto calling services .
  • Since JavaScript calling APIs are implemented as browser built-ins , un authorized access to these can also make user’s audio and camera streams vulnerable
  • If program and APIs allow the server to instruct the browser to send arbitrary content, then they can be used to bypass firewalls or mount denial of service attacks.

The general goal of security is to identify and resolve security issues during the design phase so they do not cost service provider time, money, and reputation at a later phase. Security for a large architecture project involves many aspects, there is no one device or methodology to guarantee that an architecture is now “secure” Areas that malicious individuals will attempt to attack include but are not limited to:

  • Improperly coded applications
  • Incorrectly implemented protocols
  • Operating System bugs
  • Social engineering and phishing attacks

As security is a broad topic touching on many sections of WebRTC this section is not meant to address all topics but instead to focus on specific “hot spots”, areas that require special attention due to the unique properties of the WebRTC service. There are several security-related topics that are of particular interest with respect to WebRTC.  They can be grouped into the following areas:

  1. Identity Management
  2. Browser Security
  3. Authentication
  4. Media encryption

The are discussed in detail below :

Identity Management

Support of WebRTC should not increase security risk to telecom network. Any device or software that is in the hands of the customer will be compromised, it is just a mater of time

  • All data received from untrusted sources (i.e. all data from customer controlled devices or software) must be validated.
  • Any data sent to the client will be obtained by malicious users

Provide exceptional protection for our customer’s data and make all reasonable attempts at protecting the customer from their own mistakes that may compromise their own systems. —Ensure that the new service does not adversely impact the data security, privacy, or service of existing customers.

Browser Security

Specific security concerns include:

Cross-site scripting (XSS)

A type vulnerability typically found in Web applications (such as web browsers through breaches of browser security) that enables attackers to inject client-side script into Web pages viewed by other users.

  • A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same origin policy.
  • Cross-site scripting carried out on websites accounted for roughly 80.5% of all security vulnerabilities documented by Symantec as of 2007 according to Wikipedia.
  • Their effect may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigation implemented by the site’s owner.

As the primary method for accessing WebRTC is expected to be using HTML5 enabled browsers there are specific security considerations concerning their use such as; protecting keys and sensitive data from cross-site scripting or cross-domain attacks, websocket use, iframe security, and other issues. —Because the client software will be controlled by the user and because the browser does not, in most cases, run in a protected environment there are additional chances that the WebRTC client will become compromised. This means all data sent to the client could be exposed.

  • keys
  • hashes
  • registration elements (PUID etc.)

Therefore additional care needs to be taken when considering what information is sent to the client, and additional scrutiny needs to be performed on any data coming from the client.


(User Interface redress attack, UI redress attack, UI redressing) is a malicious technique of tricking a Web user into clicking on something different to what the user perceives they are clicking on, thus potentially revealing confidential information or taking control of their computer while clicking on seemingly innocuous web pages. It is a browser security issue that is a vulnerability across a variety of browsers and platforms, a clickjack takes the form of embedded code or a script that can execute without the user’s knowledge, such as clicking on a button that appears to perform another function. —Compromised personal computer with installed adware, viruses, spyware such as trojan horses, etc. can also compromise the browser and obtain anything the browser sees.


Authentication happens on different levels

End user Authentication

Through UID ( unique ID ) of USER

Device Authentication

  • SIM enabled devices follow standard IMS-AKA authentication
  • Non-SIM enabled “devices” are authenticated using user authentication

Application Authentication

  • Model mirrors current application onboarding procedures.
  • Application developers need to establish service agreement
  • Client_Id secrets are exchanged as part of this process.
  • Use  security gateway for authenticating applications

The Browser Threat Model

The browser acts as a TRUSTED COMPUTING BASE (TCB) both from the user’s perspective and to some extent from the  server’s.  HTML and JavaScript (JS) provided by the web server can execute scripts on browser and generate actions and events . However browser  operates in a sandbox that isolates these scripts both from the user’s computer and from server .

Access to Local Resources

The users computer may have lot of private and confidential data on the disk . Browser do make it mandatory that user must explicitly select the file and consent to its upload before doing file upload and transfer transactions . However still it is not very rare that misleading text and buttons can make users click files .  

Another way of accessing local resources is through downloading malicious files to users computer which are executable and may harm users computer.

SOP or Same Origin Policy

SOP  forces scripts from each site to run in their own, isolated, sandboxes.  It enables webpages and scripts from the same origin server to interact with each other’s JS variables, but prevents pages from the different origins or even iframes on the same page to not exchange information.

As part of SOP scripts are allowed to make HTTP requests via the  XMLHttpRequest() API to only those server which have same ORIGIN/domain as that of the originator .

CORS [Cross-Origin Resource Sharing ]

CORS enables multiple web services to intercommunicate . Therefore when a script from origin A executes what would otherwise be a forbidden cross-origin request, the browser instead contacts the target server B to determine whether it is willing to allow cross-origin requests from A.  If it is so willing, the browser then allows the request.  This consent verification process is designed to safely allow cross-origin requests.


Even websockets overcome SOP and establish cross origin transport channels .

Once a WebSockets connection has been established from a script to a site, the script can exchange any traffic it likes without being required to frame it as a series of HTTP request/response transactions.

WebSockets use masking technique to randomize the bits that are being transmitted , thus making it more difficult to generate traffic which resembles a given protocol , thus making it difficult for inspection from flowing traffic .


Jsonp is a hack designed to bypass origin restriction through script tag injection. A JSONp enabled server passes the response in user specified function

when we use <script> tags the domain limitation is ignored ie we can load scripts from any domain .  So when we need to fetch get exchange data just pass callback parameters through scripts . For example

function mycallback(data){
// this is the callback function executed when script returns
alert("hi"+ data);</span>
var script = document.createElement('script');
script.src = '//serverb.com/v1/getdata?callback=mycallback'

There have been found vulnerabilities in the existing Java and Flash consent verification techniques and handshake.

Security around ICE and TURN


Sender and receiver are able to share media stream after a offer answer handshake. But we already need one in order to do NAT hole-punching. Presuming the ICE server is malicious , in absence of transaction IDs by stun unknow to call scripts , it is not possible for the webpage of receiver to ascertain is the data is forged or original . Thus to prevent this the browser must generate hidden transaction Id’s and should not sharing with call scripts ,even via a diagnostic interface.

IP Location Privacy

As soon as the callee sends their ICE candidates, the caller learns the callee’s IP addresses.  The callee’s server reflexive address reveals a lot of information about the callee’s location.

To prevent server should suppress the start of ICE negotiation until the callee has answered.

Also user may hide their location entirely by forcing all traffic through a TURN server.

Communications Security

Goal of webrtc based call services should be to create channel which is secure  against both message recovery and message modification for all audio / video and data .

Threats from Screen Sharing

With the increasing requirement of screen sharing in web app and communication systems there is always a high threat of oversharing / exposing confidential passwords , pins , security details etc . This may either through some part of screen or some notification whihc pops up .

There is always the case when the user may believe he is sharing a window when in fact they are the entire desktop.

The attacker may request screensharing and make user open his webmail , payment settings or even net-banking accounts .

Long term access to camera and microphone

When user frequently uses a site he / she may want to give the site a long-term access to the camera and microphone ( indicated by ” Always allow on this site ” in chrome ). However the site may be hacked and thus initiate call on users’ computer automatically to secretly listen-in .

False UI shows cut off call while still being active

Unless the user checks his laptops glowing camera light LED or goes and monitors the traffic himself he would not know if there is active call in background, which according to him he had cut off . In such a case an attacker may pretend to cut a call shows red phone signs and supportive text but still keep the session and media stream active placing himself on mute .

During-Call Attack

Even if the calling service cannot directly access keying material ,it  can simply mount a man-in-the-middle attack on the connection. The idea is to mount a bridge capturing all the traffic.

To protect against this it is now mandatory to use https for using getusermedia and otherwise also recommended to keep webrtc comm services on https or use strict fingerprinting .
This section is derived from Security Considerations for WebRTC draft-ietf-rtcweb-security-08

So does existing WebRTC model offer security ?

We know that the forces behind WebRTC standardization are WHATWG, W3C, IETF and strong internet working groups . WebRTC security was already taken into consideration when standards were being build for it . The encryption methods and technologies like DTLS and SRTP were included to safeguard users from intrusions so that the information stays protected.

WebRTC media stack has native built-in features that address security concerns. The peer-to-peer media is already encrypted for privacy . Figure below:

WebRTC media stack Solution Architecture - Google Slides (1)
WebRTC media stack

Media Encryption

Primary issue with supporting DTLS is it can put a heavy load on the SBC’s handling encryption/decryption duties. —Interworking DTLS-SRTP to SDES is CPU intensive

  • SRTP from DTLS-SRTP end flows easily
  • SRTP from SDESC end requires auth+decrypt, and encrypt+auth

Reason:  DTLS-SRTP handshake has both ends choose “half” of the SRTP key The Encrypted Key Transport (EKT) proposed by Cisco solves this problem and provides additional security. Recommendation is to use DTLS-SRTP with EKT enhancements . Note: In order to avoid potential security issues, the SRTP authentication tag length used by the base authentication method must be at least ten octets.

For WebRTC to transfer real time data, the data is first encrypted using the DTLS (Datagram Transport Layer Security) method. This is a protocol built into all the WebRTC supported browsers from the start (Chrome, Firefox and Opera). On a DTLS encrypted connection, eavesdropping and information tampering cannot take place.

Other than DTLS, WebRTC also encrypts video and audio data via the SRTP (Secure Real-Time Protocol) method ensuring that IP communications – your voice and video traffic – can not be heard or seen by unauthorized parties.

What is SRTP ?

The Secure Real-time Transport Protocol (or SRTP) defines a profile of RTP (Real-time Transport Protocol), intended to provide encryption, message authentication and integrity, and replay protection to the RTP data in both unicast and multicast applications.

Earlier models of VOIP communication such as SIP based calls had an option to use only RTP for communication thereby subjecting the endpoint users to lot of problem like compromising media Confidentiality  . However the WebRTC model mandates the use of SRTP hence ruling out insecurities of RTP completely. For encryption and decryption of the data flow SRTP utilizes the Advanced Encryption Standard (AES) as the default cipher.

What is DTLS ?

DTLS allows datagram-based applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery. The DTLS protocol is based on the stream-oriented Transport Layer Security (TLS) protocol .

Together DTLS and SRTP enables the exchange of the cryptographic parameters so that the key exchange takes place in the media plane and are multiplexed on the same ports as the media itself without the need to reveal crypto keys in the SDP.

Today the browser acts as a TRUSTED COMPUTING BASE (TCB) where the HTML and JS act inside of a sandbox that isolates them both from the user’s computer.

A script cannot access user’s webcam , microphone , location , file , desktop capture without user’s explicit consent. When the user allows access, a red dot will appear on that tab, providing a clear indication to the user, that the tab has media access.

Figure depicting browser asking for user’s consent to access Media devices for WebRTC .

Untitled drawing

Figure depicting Media Capture active on browser with red dot .

Untitled drawing (1)
we know that XMLHttpRequest() API can be used to secretly send data from one origin to other and this can be used to secretly send information without user’s knowledge . However now , SAME ORIGIN POLICY (SOP) in browser’s prevents server A from mounting attacks on server B via the user’s browser, which protects both the user (e.g., from misuse of his credentials) and the server B (e.g., from DoS attack).

Security challenges with Web Server based WebRTC service are many for example :

  1. If the both the peers have WebRTC browser then one can place a WebRTC call to callee anytime this might result in denial of service .
  2. Since the media is p2p and also can override firewalls settings through TURN server , it can result in unwanted data being send to peer .
  3. One may secretly make calls to users through website and extract information .
  4. Threat from screen sharing, for example user might mistakenly share his internet banking screen or some confidential information.
  5. Giving long-term access to the camera and microphone for certain sites is also a concern . for example : since next time you visit a site that has access to your microphone and camera , they can secretly be viewing youe webcam and microphone inputs .
  6. Clever use of User Interface to mask a ongoing call can mislead the user into believing that call has been cut while it is secretly still ongoing.
  7. Network attackers can modify an HTTP connection through my Wifi router or hotspot to inject an IFRAME (or a redirect) and then forge the response to initiate a call to himself.
  8. As WebRTC doesn’t yet have an congestion control mechanism , it can eat up a large chunk of user’s bandwidth.
  9. By visiting chrome://webrtc-internals/ in chrome browser alone , one can view the full traces of all webRTC communication happening through his browser . The traces contain all kinds of details like signalling server used , relay servers , TURN servers , peer IP , frame rates etc .
WebRTC Internals

Ofcourse other challenges that arrive with any other webservice based architecture are also applicable here such as :

  1. Malicious Websites which automatically execute the attacker’s scripts.
  2. User can be induced to download harmful executable files and run them.
  3. Improper use of W3C Cross-Origin Resource Sharing (CORS) to bypass SAME ORIGIN POLICY (SOP) .

How can I make my WebRTC solution secure ?

In the recent months everyone has been trying to get into the WebRTC space but at the same time fearing that hackers might be able to listen in on conferences, access user data, or even private networks. Although development and usage around WebRTC is so simple , the security and encryption aspects of it are in the dim light.

Best practices to make your VOIP Solution more secure

A simple WebRTC architecture is shown in the figure below :

WebRTC media stack Solution Architecture - Google Slides (2)

By following the simple steps described below one can ensure a more secure WebRTC implementation . The same applies to healthcare and banking firms looking forth to use WebRTC as a communication solution for their portals .

Ensure that the signalling platform is over a secure protocol such as SIP / HTTPS / WSS . Also since media is p2p , the media contents like audio video channel are between peers directly in full duplex.

To protect against Man-In-The-Middle (MITM) attack the media path should be monitored regularly for no suspicious relay.

User’s that can participate in a call , should be pre registered / Authenticated with a registrar service. Unauthenticated entities should be kept away from session’s reach .

WebRTC authentication certificate
WebRTC authentication certificate

Make sure that ICE values are masked thereby not rendering the caller/ callee’s IP and location to each other through tracing in chrome://webrtc-internals/ or packet detection in Wirehsark on user’s end.

As the signalling server maintains the number of peers , it should be consistently monitored for addition of suspicious peers in a call session. If the number of peers actually present on signalling server is more that the number of peers interacting on WebRTC page then it means that someone is eavesdropping secretly and should be terminated from session access by force.

It was observed that manyatimes non tech savy users simply agree to all permissions request from browser without actually consciously giving consent . Therefore user’s should be made aware of API in websites which ask for undue permissions . For example permission to :

Screenshot from 2015-04-22 15:22:15

Third party API should be thoroughly verified before sending their data on WebRTC DataChannel.

Before Desktop Sharing user’s should be properly notified and advised to close any screen containing sensitive information .

Keystore and master key protection

For storing logs , recording , file , ssh keys or any others ensitive informaton encrypted by keys , we need a safe storage for keys and these tools are handy for password and key management – Dashlane , Lastpass , Bitwarden, 1Password

Auto sign-in security measure for WebRTC apps

Turn User Authentication On and enable Two-Factor Authentication/Bio-metrics.

OTP based signons and captcha checks are also popular approaches to protect signins

Avoid Public Wi-Fi

Even a WebRTC connection can be tampered with an insecure Wifi . Even if the Man-in-middle cannot decipher message content he can make out the packet size , frequency , end parties in siagnling , time delay etc . Also as a precautionary measure enable Remote Lock and Data Wipe and only use authorized apps with permission to sensitive data such as image storage.

Native WebRTC apps

If you use a native WebRTC native app , there are mulitple thinsg that you need to be wary of.

Avoid All Jailbreaks : Jail-breaking a smartphone can enable the user to run unverified or unsupported apps, many of these apps carry security vulnerabilities. Majority of security exploits for Apple’s iOS only affect jailbroken iPhones.

Add a Mobile Security App : Mobile security reports shows that mobile operating systems such as iOS and (especially) Android are increasingly becoming targets for malware. Select a reputable mobile security app that extends the built-in security features of the device’s mobile operating system. Some well-known third-party security vendors offering mobile security apps for iOS, Android and Windows Phone – Avast , Kaspersky , Symantec

Also as a good practise Turn off the Bluetooth, Wi-Fi and NFC when not needed.

Information security

Forum : Huawei

Information security ensures that both physical and digital data is protected from unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction. 

Intentional Security breaches

Although WebRTC already has best secure tools in its spec list which provide end to end encrypted communication over SRTP DTLS as well as media device access mandatory from websites of secure origin over TLS, yet if the endpoints acting as peers themselves are compromised then all this is in vain . Hence security issues arises when

  • the endpoints are recording their media content and storing it on unsafe location such as public file servers or
  • endpoints are inturn re-streaming their incoming their media streams to unsafe streaming servers

Exploiting human vulnerabilities / Unintentional breaches

Phishing , Pretexting , Baiting attacks , Quid pro quo , Tailgating , Water-Holing are soe of the common tactics to steal teh data of a nonsuspecting user . They are as much applicate to WebRTC based communication site as they are to any other trusted website such as banking , customer care contacts , falsh sale portals , cupon / discount sites etc .

Phone phishing – Voice phishing a criminal phone fraud, imporsonating legitimate caller such as a bank or tax agent and using social engineering over the telephone system to gain access to private personal and financial information for the purpose of financial reward.

SMS phishing – SMS is used as a method to do phishing maya tiem to send malicious links posing as legitmate sender

Other social engineering tactics – Trickery , Influencing , Deception , Spying

Impersonation attacks – spear-phishing , emails that attempt to impersonate a trusted individual or a company in an attempt to gain access to corporate finances , Human resource details , sesitive data. Business email compromise (BECs) also known as CEO fraud is a popular example of an impersonation attack. The fake email usually describes a very urgent situation to minimize scrutiny and skepticism.

Network security breaches

Inspite of the fact that webrtc is a p2p streaming framework , there are always signalling server required which do the initial handshake and enable the exchange fo SDP for the media to stream in peer to peer fashion . Some wellknown attacks that compromise networks and remote / cloud server are :

  • Viruses, worms and Trojan horses
  • Zero-day attacks
  • Hacker attacks
  • Denial of service attacks
  • Spyware and adware

It is upto the WebRTC/ VoIP service provider to detect emerging threats before they infiltrate network and compromise data. Some crticial compoenets to enhance security are Firewalls , Access Control Lists , Intrusion detection and prevention systems (IDS/IPS) , Virtual private networks (VPN)

Governance Framework – defines the roles, responsibilities and accountability of each person and ensures that you are meeting compliance.

  • Confidentiality: ensures information is inaccessible to unauthorized people via encryption
  • Integrity: protects information and systems from being modified by unauthorized people; provides accuracy and trustworthyness
  • Availability: ensures authorized people can access the information when needed and that all hardware and software are maintained properly and updated when necessary
  • Authentication, Authorization and Accountability(AAA): 
    validate users autheticity via creds, enforcing policies on network resources after the user has gain access and ensuring accountability by means of monitoring and capturing the events done by the user
  • Non repudiation: is the assurance that someone cannot deny the validity of something. It provides proof of the origin of data and the integrity of the data.

What happens if your VOIP solution is on the verge of being compromised ?

As a first defence tactic , if a orignation ip address is sending malacious or malformed packets which depict an exploitation or attack , trigger and notification for tech team and execute script to block on the origin IP of attacker via security groups in AWS or other ACL list in hosted server . Can also implement temporary firewall block on it and later monitor it for more violations.

Incase a server is compromised beyond repair such as attacker taking control of the file system , drain the ongoing sessions from it and store cached storage with session state variable like CDR enteries . Activate the fallback / standby server and make the current server a honeypot to explore the attackers actions

Commonly attacks involve either

  • exploiting the VoIP system to get free internatoinal calls
  • ransomware activities such as scp the files out of server and leaveing behind a readme.txt file on root location asking for money transfer in return of data
  • bombard brute force DDOS attacks to bring down the system and make it incapible of catering to genuine requests , perhaps with the inetention of giving advantage to competitors .

As the media connections are p2p , even if we kill the signalling server , it will not affect the ongoing media sessions . Only the time duration ( probably 3 – 4 minutes ) it takes to restart the server , is when the users will not be able to connect to signalling server for creating new sessions . therefore incase a system is under attack and non recoverable , just terminate it and respawn other server attaced to the domain name or floating IP or Load balancer.

Most browsers today like Google Chrome and Mozilla Firefox have a good record of auto-updating themselves withing 24 hours of a vulnerability of threat occurring .

If a call is confirmed to be compromised , it should be within the power of Web Application server rendering the WebRTC capable page to cut off the compromised call session by force censing termation request to endpoints or via turning off the TURN services if in use .

References :

TURN server for WebRTC – RFC5766-TURN-Server , Coturn , Xirsys

STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) are protocols that can be used to provide NAT traversal for VoIP and WebRTC. These projects provide a VoIP media traffic NAT traversal server and gateway.

TURN Server is a VoIP media traffic NAT traversal server and gateway.

I come accross the question of difference between turn and stun a lot . Here I wanted to specify in very clear words that TURN is an extension of STUN .


This is a VoIP gateway for inter network communication which is popular and MIT based .

platforms supported :

Any client platform is supported, including Android, iOS, Linux, OS X, Windows, and Windows Phone. This project can be successfully used on other *NIX platforms ( Aamazon EC2) too. It supports flat file or Database based user management system ( MySQL , postgress , redis ). The source code project contains ,  TURN server ,  TURN client messaging library and some sample scripts to test various modules like protocol , relay , security etc .

Protocols :

protocols between the TURN client and the TURN server – UDP, TCP, TLS, and DTLS. Relay protocol – UDP , TCP .


The authentication mechanism is using key which is calculated over the user name, the realm, and the user password. Key for the HMAC depends on whether long-term or short-term credentials are in use. For long-term credentials, the key is 16 bytes:
key = MD5(username “:” realm “:” SASLprep(password))


Since I used my Ubuntu Software center for installing the RFC turn server 5677 .

Screenshot from 2015-03-05 15:22:30

More information is on Ubuntu Manuals : http://manpages.ubuntu.com/manpages/trusty/man1/turnserver.1.html

The content got stored inside /usr/share/rfc5766-turn-server.

Also install mysql for record keeping

sudo apt-get install mysql-server



Intall MySQL workbench to monitor the values feed into the turn database server in MysqL. connect to MySQL instance using the following screenshot


The database formed with mysql after successful operation is as follows . We  shall notice that the initial db is absolutely null


Terminal Commands

These terminal command ( binary images ) get stored inside etc/init.d after installing

turnadmin –

Its turn relay administration tool used for generating , updating keys and passwords . For generating a key to get long term crdentaial use -k command and for aading or updateing a long -term user use the -a command. Therefore a simple command to generate a key is

format : turnadmin -k -u -r -p
examples : turnadmin -k -u turnwebrtc -r mycompany.com -p turnwebrtc

The generated key is displayed in console . For example the following screenshot shows this :


To fill in user with long term credentails

Format : turnadmin -a [-b | -e | -M | -N ] -u -r -p

exmaple : turnadmin -a -M “host=localhost dbname=turn user=turn password=turn” -u altanai -r mycompany.com -p 123456

Check the values reflected in MySQL workbench for long term user table . ( screenshot depicts two entries for altanai and turnwebrtc user )


you can also check it on console using the -l command

format :turnadmin -l –mysql-userdb=””

example :  turnadmin -l –mysql-userdb=”host= dbname=turn user=turnwebrtc password=turnwebrtc connect_timeout=30″


or we can also check using the terminal based mySQL client

mysql> use turn;
Database changed

mysql> select * from turnusers_lt;
| name | hmackey |
| altanai | 57bdc681481c4f7626bffcde292c85e7 |
| turnwebrtc | 6066cbe0b5ee14439b2ddfc177268309 |
2 rows in set (0.00 sec)

turnserver –

Its command to handle the turnserver itself . We can use the simple turnserver command to start it without any db support using just turnserver. Screenshot for this is


We can use a database like mysql to start it with db connection string

Format : turnserver –mysql-userdb=””

Example : turnserver –mysql-userdb=”host= dbname=turn user=turnwebrtc password=turnwebrtc connect_timeout=30″



emulates multiple UDP,TCP,TLS or DTLS clients.


simple stateless UDP-only “echo” server. For every incoming UDP packet, it simply echoes it back.


simple STUN client example that implements RFC 5389 ( using STUN as endpoint to determine the IP address and port allocated to it , keep-alive , check connectivity etc) and RFC 5780 (experimental NAT Behavior Discovery STUN usage) .


checks the correctness of the STUN/TURN protocol implementation. This program will perform several checks and print the result on the screen. It will exit with 0 status if everything is OK, and with (-1) if there was an error in the protocol implementation.

Specifications :

TURN specifications include :

  • RFC 5766 – base TURN specs
  • RFC 6062 – TCP relaying TURN extension
  • RFC 6156 – IPv6 extension for TURN
  • DTLS
  • Mobile ICE (MICE)

STUN specifications :

  • RFC 3489 – Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) (STUN) to discover the presence and public IP
  • RFC 5389 – STUN serves as a tool for other protocols in NAT traversal. It can be used by an endpoint to determine the IP address and port allocated to it , keep-alive  , check connectivity etc .
  • RFC 5769 – test vectors for STUN protocol . FINGERPRINT, MESSAGE-INTEGRITY, and XOR-MAPPED-ADDRESS involving binary-logical operations (hashing, xor)
  • RFC 5780 – experimental NAT Behavior Discovery STUN usage

ICE specifications :

  • RFC 5245 – ICE
  • RFC 5768 – ICE–SIP
  • RFC 6336 – ICE–IANA Registry
  • RFC 6544 – ICE–TCP
  • RFC 5928 – TURN Resolution Mechanism

Test :

1. Test vectors from RFC 5769 to double-check that our
STUN/TURN message encoding algorithms work properly. Run the utility to check all protocols :

$ cd examples
$ ./scripts/rfc5769.sh

2. TURN functionality test (bare minimum TURN example).

If everything compiled properly, then the following programs must run
together successfully, simulating TURN network routing in local loopback
networking environment:

console 1 :

$ cd examples
$ ./scripts/basic/relay.sh

console2 :

$ cd examples
$ ./scripts/peer.sh

If the client application produces output and in approximately 22 seconds
prints the jitter, loss and round-trip-delay statistics, then everything is


{ ‘url’: ‘stun: altanai@mycompany.com’},
{ ‘url’: ‘turn: altanai@mycompany.com’, ‘credential’: ‘123456’}]

Insert the above piece of code on peer connection config .

Now call from one network environment to another . For example call from a enterprise network behind a Wifi router to a public internet datacard webrtc agent . The call should connect with video flowing smoothly between the two .


website : https://code.google.com/p/rfc5766-turn-server/

Download the executable from : http://turnserver.open-sys.org/downloads/v3.2.5.4/

you can read about setting a carrier grade TURN infrastructure on amazon EC2 here –


Project Coturn evolved from rfc5766-turn-server project with many new advanced TURN specs beyond the original RFC 5766 document.
Here the databses supported are : SQLite , MySQL , PostgreSQL , Redis , MongoDB

Protocols :

The implementation fully supports the following client-to-TURN-server protocols: UDP  , TCP  , TLS  SSL3/TLS1.0/TLS1.1/TLS1.2; ECDHE , DTLS versions 1.0 and 1.2. Supported relay protocols UDP (per RFC 5766) and TCP (per RFC 6062)

Authetication :

Supported message integrity digest algorithms:

  • HMAC-SHA1, with MD5-hashed keys (as required by STUN and TURN standards)
  • HMAC-SHA256, with SHA256-hashed keys (an extension to the STUN and TURN specs)

Supported TURN authentication mechanisms:

Installation :

Install libopenssl and libevent plus its dev or extra libraries .
OpenSSL has to be installed before libevent2 for TLS beacuse When libevent builds it checks whether OpenSSL has been already installed, and its version.

Download coturn readonly  from

svn checkout http://coturn.googlecode.com/svn/trunk/ coturn-read-only

extract the tar contents
$ tar xvfz turnserver-.tar.gz

go inside the extracted folder and run the following command to build
$ ./configure
$ make
$ make install

Adding users in the format using turnadmin
$ Sudo turnadmin -a -u -r -p

$ Sudo turnadmin -a -u altanai -r myserver.com -p 123456

Start the turn Server using turnserver from inside of /etc/init.d using the start command

$ sudo /etc/init.d/coturn start

Screenshot from 2015-01-06 12-08-15

The logs are usually stored in /var/log . Screenshot of log file


The default configured port is 3478.If other port is needed, change the file /etc/turnserver.conf


Specify the  values in Peer Connection

iceServers: [
{ ‘url’: ‘stun: @: ‘},
{ ‘url’: ‘turn: @: ‘, ‘credential’: ”}]


{ ‘url’: ‘stun: altanai@myserver.com’},
{ ‘url’: ‘turn: altanai@myserver.com’, ‘credential’: ‘123456’}]


TURN specs:

STUN specs:

  • RFC 3489 – STUN – Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)
  • RFC 5389 – Session Traversal Utilities for NAT (STUN)
  • RFC 5769 – test vectors for STUN protocol testing
  • RFC 5780 – NAT behavior discovery support
  • RFC 7443 – Application-Layer Protocol Negotiation (ALPN) Labels for STUN and TURN


  • RFC 5245 – ICE
  • RFC 5768 – ICE–SIP
  • RFC 6336 – ICE–IANA Registry
  • RFC 6544 – ICE–TCP
  • RFC 5928 – TURN Resolution Mechanism

website : https://code.google.com/p/coturn/


Xirsys is a provider for WebRTC infrastructure which included stun and turn server hosting as well .

The process of using their services includes singing up for a account and choosing whether you want a paid service capable of handling more calls simultaneously or free one handling only upto 10 concurrent turn connections .

The dashboard appears like this :


To receive the api one need to make a one time call to their service , the result of which contains the keys to invoke the turn services from webrtc script .

&lt;script src=&quot;http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js&quot;&gt;&lt;/script&gt;&lt;script&gt;// &lt;![CDATA[

$.post(&quot;https://api.xirsys.com/getIceServers&quot;, {
ident: &quot;altanai&quot;,
secret: &quot;&lt; your secret key &gt;&quot;,
domain: &quot; &lt; your doemain &gt;&quot;,
application: &quot;default&quot;,
room: &quot;default&quot;,
secure: 1
function(data, status) {
alert(&quot;Data: &quot; + data + &quot;n Status: &quot; + status);
console.log(&quot;Data: &quot; + data + &quot;nnStatus: &quot; + status);

The resulting output should look like ( my keys are hidden with a red rectangle ofcourse )


The process of adding a TURN / STUN to your webrtc script in JS is as follows :

{“username”:”< put your API username>”,”url”:”turn:turn2.xirsys.com:443?transport=udp”,”credential”:”< put your API credentail>”},
{“username”:”< put your API username>”,”url”:”turn:turn2.xirsys.com:443?transport=tcp”,”credential”:”< put your API credentail>”}]

website : http://xirsys.com/technology/

NAT traversal using STUN and TURN

We know that WebRTC is web based real-time communications on browser-based platform using the browser’s media application programming interface (API) and adding our JavaScript & HTML5 t control the media flow .
WebRTC has enabled developers to build apps/ sites / widgets / plugins capable of delivering simultaneous voice/video/data/screen-sharing capability in a peer to peer fashion.

But something which escapes our attention is the way in which media ia traversing across the network. Ofcourse the webrtc call runs very smoothly when both the peers are on open public internet without any restrictions or firewall blocks . But the real problem begins when one of the peer is behind a Corporate/Enterprise network or using a different Internet service provider with some security restrictions . In such a case the normal ICE capability of WebRTC is not enough , what is required is a NAT traversal mechanism .

STUN and TURN server protocols handle session initiations with handshakes between peers in different network environments . In case of a firewall blocking a STUN peer-to-peer connection, the system fallback to a TURN server which provides the necessary traversing mechanism through the NAT.

Lets study from the start ie ICE . What is it and why is it used ?

ICE (Interactive Connectivity Establishment )  framework ( mandatory by WebRTC standards  ) find network interfaces and ports in Offer / Answer Model to exchange network based information with participating communication clients. ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relay NAT (TURN)

ICE is defined by RFC 5245 – Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols.

Sample WebRTC offer holding ICE candidates :

type: offer, sdp: v=0
o=- 3475901263113717000 2 IN IP4
t=0 0
a=group:BUNDLE audio video data
a=msid-semantic: WMS dZdZMFQRNtY3unof7lTZBInzcRRylLakxtvc
m=audio 9 RTP/SAVPF 111 103 104 9 0 8 106 105 13 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 F1:A8:2E:71:4B:4E:FF:08:0F:18:13:1C:86:7B:FE:BA:BD:67:CF:B1:7F:19:87:33:6E:10:5C:17:42:0A:6C:15
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
m=video 9 RTP/SAVPF 100 116 117 96
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 F1:A8:2E:71:4B:4E:FF:08:0F:18:13:1C:86:7B:FE:BA:BD:67:CF:B1:7F:19:87:33:6E:10:5C:17:42:0A:6C:15
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
m=application 9 DTLS/SCTP 5000
c=IN IP4
a=fingerprint:sha-256 F1:A8:2E:71:4B:4E:FF:08:0F:18:13:1C:86:7B:FE:BA:BD:67:CF:B1:7F:19:87:33:6E:10:5C:17:42:0A:6C:15
a=sctpmap:5000 webrtc-datachannel 1024

Notice the ICE candidates under video and audio . Now take a look at the SDP answer

type: answer, sdp: v=0
o=- 6931590438150302967 2 IN IP4
t=0 0
a=group:BUNDLE audio video data
a=msid-semantic: WMS R98sfBPNQwC20y9HsDBt4to1hTFeP6S0UnsX
m=audio 1 RTP/SAVPF 111 103 104 0 8 106 105 13 126
c=IN IP4
a=rtcp:1 IN IP4
a=fingerprint:sha-256 7B:9A:A7:43:EC:17:BD:9B:49:E4:23:92:8E:48:E4:8C:9A:BE:85:D4:1D:D7:8B:0E:60:C2:AE:67:77:1D:62:70
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
m=video 1 RTP/SAVPF 100 116 117 96
c=IN IP4
a=rtcp:1 IN IP4
a=fingerprint:sha-256 7B:9A:A7:43:EC:17:BD:9B:49:E4:23:92:8E:48:E4:8C:9A:BE:85:D4:1D:D7:8B:0E:60:C2:AE:67:77:1D:62:70
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
m=application 1 DTLS/SCTP 5000
c=IN IP4
a=fingerprint:sha-256 7B:9A:A7:43:EC:17:BD:9B:49:E4:23:92:8E:48:E4:8C:9A:BE:85:D4:1D:D7:8B:0E:60:C2:AE:67:77:1D:62:70
a=sctpmap:5000 webrtc-datachannel 1024

Call Flow for ICE

STUN call flow for WebRTC Offer Answer
STUN call flow for WebRTC Offer Answer

WebRTC needs SDP Offer to be send to the clientB Javascript code from clientA Javascript code . Client B uses this SDP offer to generate an SDP Answer for client A. The SDP ( as seen on chrome://webrtc-internals/ ) includes ICE candidates which punchs open ports in the firewalls.
However incase both sides are symmetric NATs the media flow gets blocked. For such a case TURN is used which tries to give a public ip and port mapped to internal ip and port so as to provide an alternative routing mechanism like a packet-mirror. It can open a DTLS connection and use it to key the SRTP-DTLS media streams, and to send DataChannels over DTLS.

In order to Understand this better consider various scenarios

1 . No Firewall present on either peer . Both connected to open public internet .

Diagrammatic representation of  this shown as follows :

WebRTC signalling and media flow on Open public network
WebRTC signalling and media flow on Open public network

In this case there is no restriction to signal or media flow and the call takes places smoothly in p2p fashion.

2.  Either one or both the peer ( could be many in case of multi conf call ) are present behind a firewall  or  restrictive connection or router configured for intranet .

In such a case the signal may pass with the use of default ICE candidates or simple ppensource google Stun server such as

{ ‘url’: “stun:stun.l.google.com:19302”}]

Diagram :

WebRTC signalling when peers are behind  firewalls
WebRTC signalling when peers are behind firewalls

However the media is restricted resulting in a black / empty / no video situation for both peers  . To combat such situation a relay mechanism such as TURN is required which essentially maps public ip to private ips thus creating a alternative route for media and data to flow through .

WebRTC media flow when peers are behind NAT . Uses TURN relay mechanism
WebRTC media flow when peers are behind NAT . Uses TURN relay mechanism

Peer config should look like :

var configuration =  {
iceServers: [
{ “url’:”stun::”},
{ “url”:”turn::”}

var pc = new RTCPeerConnection(configuration);

3. When the TURN server is also behind a firewall .  The config file of the turn server need to be altered to map the public and private IP

The diagrammatic description of this is as follows :

WebRTC media flow when peers are behind NAT and TURN server is behind NAT as well . TURN config files bind a public interface to private interface address.
WebRTC media flow when peers are behind NAT and TURN server is behind NAT as well . TURN config files bind a public interface to private interface address .

SIP VoIP system Architecture

SIP solutioning and architectures  is a subsequent article after SIP introduction, which can be found here.

A VOIP Solution is designed to accommodate the signalling and media both along with integration leads to various external endpoints such as various SIP phones ( desktop, softphones , webRTC ) ,  telecom carriers  , different voip network providers  , enterprise applications  ( Skype , Microsoft Lync  ), Trunks etc .

A sufficiently capable SIP platform should consist of following features :

  • audio calls ( optionally video )
  • media services such as conferencing, voicemail, and IVR,
  • messaging as IM and presence based on SIMPLE,
  • programmable services through standardized APIs and development of new modules
  • near-end and far-end NAT traversal for signalling and media flows
  • interconnectivity with other IP multimedia systems, VoLTE ( optional interconnection with other types of communications networks as GSM or PSTN/ISDN)
  • registry , location and lookup service
  • Backend support like Redis, MySQL, PostgreSQL, Oracle, Radius, LDAP, Diameter
  • serial and parallel forking
  • support for Voip signalling protocols (SIP, H,323, SCCP, MGCP, IAX) and telephony signalling protocols ( ISDN/SS7, FXS/FXO, Sigtran ) either internally via pluggable modules or externally via gateways

Performnace factors :

  • High availability using redundant servers in standby
  • Load balancing
  • IPv4 and IPv6 network layer support
  • TCP , UDP , SCTP transport layer protocol support
  • DNS lookups and hop by hop connectvity

Security considerations :

  • authentication, authorization, and accounting (AAA)
  • Digest authentication and credentials fetched from backend
  • Media Encryption
  • TLS and SRTP support
  • Topology hidding to prevent disclosing IP form internal components in via and route headers
  • Firewalls , blacklist, filters , peak detectors to prevent Dos and Ddos attacks

The article only outlines SIP system architecture  from 3 viewpoints :

  • from Infrastructure standpoint
  • from core voice engineering perspective
  • and accompanying external components required to run and system

Infrastructure Requirements

  • Data Centers with BCP ( Business Continuity Planning ) and DR ( Disaster Recovery )
  • Servers and Clusters for faster and parallel calculating
  • Virtualization
    VMs to make a distributed computing environment with HA ( high availability ) and DRS ( Distributed Resource Scheduling )
  • Storage
    SAN with built-in redundancy for the resiliency of data.
    WORM compliant NAS for storing voice archives over a retention period.
  • Racks, power supplies, battery backups, cages etc.
  • Networking
    DMZs ( Demilitarized Zones)  which are interfacing areas between internal servers in the green zone and outside network
    VLANs for segregation between tenants.
    Connectivity through the public Internet as well as through VPN or dedicated optical fibre network for security.
  • Firewall configuration
  • Load Balancer ( Layer 7 )
  • Reverse Proxies for the security of internal IPs and port
  • Security controls In compliance with ISO/IEC 27000 family – Information security management systems
  • PKI Infrastructure to manage digital certificates
  • Key management with HSM ( hardware security module )
  • truster CA ( Certificate Authority ) to issue publicly signed certificate for TLS ( Https, wss etc)
  • OWASP ( Open Web Application Security Project )  rules compliance

Integral Components of a VOIP SIP based architecture

  • Call Controller
  • Media Manager
  • Recording
  • Softclients
  • logs and PCAP archives
  • CDR generators
  • Session Borer Controllers ( SBCs)

Types of SIP servers are listed below . It is important to understand the roles a SIP server can be moulded to take up which in turn defines its placement in overall voip communication platform such as stateless proxy servers on the border , application and B2BUA server at the core etc

SIP Gateways:

sip entities
SIP platform components

A SIP gateway is an application that interfaces a SIP network to a network utilising another signalling protocol. In terms of the SIP protocol, a gateway is just a special type of user agent, where the user agent acts on behalf of another protocol rather than a human. A gateway terminates the signalling path and can also terminate the media path .

sip gaeways

To PSTN for telephony inter-working
To H.323 for IP Telephony inter-working
Client – originates message
Server – responds to or forwards message

Logical SIP entities are:

  • User Agent Client (UAC): Initiates SIP requests  ….
  • User Agent Server (UAS): Returns SIP responses ….
  • Network Servers ….

Registrar Server

A registrar server accepts SIP REGISTER requests; all other requests receive a 501 Not Implemented response. The contact information from the request is then made available to other SIP servers within the same administrative domain, such as proxies and redirect servers. In a registration request, the To header field contains the name of the resource being registered, and the Contact header fields contain the contact or device URIs.

regsitrar server

Proxy Server

A SIP proxy server receives a SIP request from a user agent or another proxy and acts on behalf of the user agent in forwarding or responding to the request. Just as a router forwards IP packets at the IP layer, a SIP proxy forwards SIP messages at the application layer.

Typically proxy server ( inbound or outbound) have no media capabilities and ignore the SDP . They are mostly bypassed once dialog is established but can add a record-route .
A proxy server usually also has access to a database or a location service to aid it in processing the request (determining the next hop).

proxy server

 1. Stateless Proxy Server
A proxy server can be either stateless or stateful. A stateless proxy server processes each SIP request or response based solely on the message contents. Once the message has been parsed, processed, and forwarded or responded to, no information (such as dialog information) about the message is stored. A stateless proxy never retransmits a message, and does not use any SIP timers

2. Stateful Proxy Server
A stateful proxy server keeps track of requests and responses received in the past, and uses that information in processing future requests and responses. For example, a stateful proxy server starts a timer when a request is forwarded. If no response to the request is received within the timer period, the proxy will retransmit the request, relieving the user agent of this task.

  3 . Forking Proxy Server
A proxy server that receives an INVITE request, then forwards it to a number of locations at the same time, or forks the request. This forking proxy server keeps track of each of the outstanding requests and the response. This is useful if the location service or database lookup returns multiple possible locations for the called party that need to be tried.

Redirect Server

A redirect server is a type of SIP server that responds to, but does not forward, requests. Like a proxy server, a redirect server uses a database or location service to lookup a user. The location information, however, is sent back to the caller in a redirection class response (3xx), which, after the ACK, concludes the transaction. Contact header in response indicates where request should be tried .

redirect server

Application Server

The heart of all call routing setup. It loads and executes scripts for call handling at runtime and maintains transaction states and dialogs for all ongoing calls . Usually the one to rewrite SIP packets adding media relay servers, NAT . Also connects external services like Accounting , CDR , stats to calls .

Developing SIP based applications

Basic SIP methods

SIP defines basic methods such as INVITE, ACK and BYE which can pretty much handle simple call routing with some more advanced processoes too like call forwarding/redirection, call hold with optional Music on hold, call parking, forking, barge etc.

Extending SIP headers

Newer SIP headers defined by more updated SIP RFC’s contina INFO, PRACK, PUBLISH, SUBSCRIBY, NOTIFY, MESSAGE, REFER, UPDATE. But more methods or headers can be added to baseline SIP packets for customization specific to a particular service provider. In case where a unrecognized SIP header is found on a SIP proxy which it either does not suppirt or doesnt understand, it will simply forward it to the specified endpoint.

Call routing Scripts

Interfaces for programming SIP call routing include :
– Call Processing Language—SIP CPL,
– Common Gateway Interface—SIP CGI,
– SIP Servlets,
– Java API for Integrated Networks—JAIN APIs etc .

Some known SIP stacks :

SailFin – SIP servlet container uses GlassFish open source enterprise Application Server platform (GPLv2), obsolete since merger from Sun Java to Oracle.

Mobicents – supports both JSLEE 1.1 and SIP Servlets 1.1 (GPLv2)

Cipango – extension of SIP Servlets to the Jetty HTTP Servlet engine thus compliant with both SIP Servlets 1.1 and HTTP Servlets 2.5 standards.

WeSIP – SIP and HTTP ( J2EE) converged application server build on OpenSER SIP platform

Additionally SIP stacks are supported on almost all popular SIP programming lanaguges which can be imported as lib as used for building call routing scripts to be mounted on SIP servers or endpoints such as :


JSSIP Javascript

Sofia in kamailio , Freswitch

Some popular SIP server also have proprietary scripting language such as
Asterisk Gateway Interface (AGI) , application interface for extending the dialplan with your functionality in the language you choose – PHP, Perl, C, Java, Unix Shell and others

Adding Media Management

Media processing is usually provided by media servers in accordance to the SIP signalling. Bridges, call recording, Voicemail, audio conferencing, and interactive voice response (IVR) are commomly used.

Read more about Media Architecture here

RFC 6230 Media Control Channel Framework decribes framework and protocol for application deployment where the application programming logic and media processing are distributed

Any one such service could be a combination of many smaller services within such as Voicemail is a combitional of prompt playback, runtime controls, Dual-Tone Multi-Frequency (DTMF) collection, and media recording. RFC 6231 Interactive Voice Response (IVR) Control Package for the Media Control Channel Framework.

RTP ( Real Time Transport Protocol )

RTP handles realtime multimedia transport between end to end network components . RFC 3550 .

Image result for RTP packet structure

Packet structure of RTP     

RTP Header contain timestamp , name of media source , codec type and sequence number .

Image result for RTP header structure


– tbd

DTMF( Dual tone Multi Frequency )

delivery options:

  • Inband –  With Inband digits are passed along just like the rest of your voice as normal audio tones with no special coding or markers using the same codec as your voice does and are generated by your phone.
  • Outband  – Incoming stream delivers DTMF signals out-of-audio using either SIP-INFO or RFC-2833 mechanism, independently of codecs – in this case, the DTMF signals are sent separately from the actual audio stream.

TTS ( Text to Speech )

 Alexa Text-to-Speech (TTS) + Amazon Polly

Ivona – multiple language text to speech converter with ssml scripts such as below

              <s><prosody rate="slow">IVONA</prosody> means highest quality speech
              synthesis in various languages.</s>
              <s>It offers both male and female radio quality voices <break/> at a
              sampling rate of 22 kHz <break/> which makes the IVONA voices a
              perfect tool for professional use or individual needs.</s>

check ivona status

service ivona-tts-http status
 tail -f /var/log/tts.log

Collecting and Processing PCAPS

  • VoIP monitor – network packet sniffer with commercial frontend for SIP RTP RTCP SKINNY(SCCP) MGCP WebRTC VoIP protocols

it uses a passive network sniffer (like tcpdump or wireshark) to analyse packets in realtime and transforms all SIP calls with associated RTP streams into database CDR record which is sent over the TCP to MySQL server (remote or local). If enabled saving SIP / RTP packets the sniffer stores each VoIP call into separate files in native pcap format (to local storage).

voip monitor
  • sngrep
  • tcpdump
  • custom made pcap capture and uploader

SIP platform Development

A sufficiently capable SIP platform shoudl consist of following features :

  • audio calls ( optionally video )
  • media services such as conferencing, voicemail, and IVR,
  • messaging as IM and presence based on SIMPLE,
  • programmable services through standardized APIs and development of new modules
  • near-end and far-end NAT traversal for signalling and media flows
  • interconnectivity with other IP multimedia systems, VoLTE ( optional interconnection with other types of communications networks as GSM or PSTN/ISDN)
  • registry , location and lookup service
  • serial and parallel forking

Performance factors :

  • High availability using redundant servers in standby
  • Load balancing
  • IPv4 and IPv6 support

Security considerations :

  • digest authentication and credentials fetched from backend
  • Media Encryption
  • TLS and SRTP support
  • Topology hiding to prevent disclosng IP form internal components in via and route headers
  • Firewalls , blacklist, filters , peak detectors to prevent Dos and Ddos attacks

Add NAT and DNS components

To adapt SIP to modern IP networks with inter network traversal ICE, far and near-end NAT traversal solutions are used. Network Address traversal is crtical to traffic flow between private public network and from behind firewalls and policy controlled networks
One can use any of the VOVIDA-based STUN server, mySTUN , TurnServer, reStund , CoTURN , NATH (PJSIP NAT Helper), ReTURN, or ice4j

Near-end NAT traversal

STUN (session traversal utilities for NAT) – UA itself detect presence of a NAT and learn the public IP address and port assigned using Nating. Then it replaces device local private IP address with it in the SIP and SDP headers. Implemented via STUN, TURN, and ICE.
limitations are that STUN doesnt work for symmetric NAT (single connection has a different mapping with a different/randomly generated port) and also with situations when there are multiple addresses of a end point.

TURN (traversal using relay around NAT) or STUN relay – UA learns the public IP address of the TURN server and asks it to relay incoming packets. Limitatiosn since it handled all incoming and outgong traffic , it must scale to meet traffic requirments and should not become the bottle neck junction or single point of failure.

ICE (interactive connectivity establishment) – UA gathers “candidates of communication” with priorities offered by the remote party. After this client pairs local candidates with received peer candidates and performs offer-answer negotiating by trying connectivity of all pairs, therefore maximising success. The types of candidates :
– host candidate who represents clients’ IP addresses,
– server reflexive candidate for the address that has been resolved from STUN
– and a relayed candidate for the address which has been allocated from a TURN relay by the client.

Far-end NAT traversal

UA is not concerned about NAT at all and communicated using its local IP port. The border controller implies a NAT handling components such as an application layer gateway (ALG) or universal plug and play (UPnP) etc which resolves the private and public network address mapping by act as a back to back user agent (B2BUA).
Far end NAT can also be enabled by deploying a public SIP server which performs media relay (RTP Proxy/Media proxy).

Limitations of this approach
– security risks as they are operating in the public network
– enabling reverse traffic from UAS to UAC behind NAT.

A keep-alive mechanism is used to keep NAT translations of communications between SIP endpoint and its serving SIP servers opened , so that this NAT translation can be reused for routing. It contains client-to-server “ping” keep-alive and corresponding server-to-client “pong” messages. The 2 keep-alive mechanisms: a CRLF keep-alive and a STUN keep-alive message exchange.

The 3 types of SIP URIs,

  • address of record (AOR)
  • fully qualified domain name (FQDN)
  • globally routable user agent (UA) URI
    SIP uniform resource identifiers (URIs) are identified based on DNS resolution since the URI after @ symbol contains hostname , port and protocl for the next hop.

Adding record route headers for locating the correct SIP server for a SIP message can be done by :
– DNS service record (DNS SRV)
– naming authority pointer (NAPTR) DNS resource record

Steps for SIP endpoints locating SIP server

  1. From SIP packet get the NAPTR record to get the protocl to be used
  2. Inspect SRV record to fetch port to use
  3. Inspect A/AAA record to get IPv4 or IPv6 addresses
    ref : RFC 3263 – Locating SIP Servers
    Can use BIND9 server for DNS resolution supports NAPTR/SRV, ENUM, DNSSEC, multidomains, and private trees or public trees.

Cross platform and integration to External Telecommunication provider landscape

connection to IMS such as openIMS
support for Voip signalling protocols (SIP, H,323, SCCP, MGCP, IAX) and telephony signalling protocls ( ISDN/SS7, FXS/FXO, Sigtran ) either internally via pluggable modules or externally via gateways

Database Integration

Need backend , cache , databse integration to npt only store routing rules with temporary varaible values but also account details , call records details, access control lists etc. Should therefore extend integartion with text based db, redis, MySQL, PostrgeSQL, OpenLDAP, and OpenRadius.

The obvious starting milestone before making a full scale carrier grade, SIP based VoIP system is to start by building a PBX for intra enterprise communication. There are readily available solutions to make a IP telephony PBX kamailio , freeswitch , asterisk , Elastix , SipXecs

Call Rate and Accounting

Generally data streams proecssing are used for crtical and voluminious service usage like for
– metering/billing
– server activity,
– website clicks,
– geo-location of devices, people, and physical goods

Call Rates are very crticial for billing and charging the calls . Any updates from the customer or carriers or individuals need to propagate automatically and quickly to avoid discrpencies and neagtive margins. CDRs need to be processed sequentially and incrementally on a record-by-record basis or over sliding time windows, and used for a wide variety of analytics including correlations, aggregations, filtering, and sampling.

To acheieve this the follow setup is ideal to use the new input rate sheet values via web UI console or POST API and propagate it quickly to main DB via AWS SQS which is a queing service and AWS lamda which is a serverless trigger based system . This ensures that any new input rates are updates in realtime and maintin fallback values in s3 bucket too

CDR Processing and Billing

CDR store call detail records along with proof of call with tiemstamps , orignation , destination , duaration , rate etc. At the end of month or any other term , the aggregated CDR are cumulatively processed to generate the bill for a user . This heavy data stream needs to be accurately processed and this can be achiveed by using datapipelines like AWS kinesis or Kafka eventstore .

The prime requirnment for the system is to handle enormous amount of call records data in relatime , cater to a number of producers and consumers .

For security the data is obfuscated into blob using base 64 encoding

AWS kinesis – Kinesis Data Streams is sued for for rapid and continuous data intake and aggregation. The type of data used can include IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data

Pros of data streams

This system can handle high volume of data in realtime and produce call uuid specfic reults which can be consumed by consumers waiting for the processed results

Cons of data streams

If not consumed with a pre-specified time duration the processed results expire and are irretrivable . Self implement publisher to store teh processed reults from kisesis stream to data stores like Redis / RDBMS or other storge locations like s3 , dynamo DB. If pieline crashes during operation , data is lost

Data stream should have low latency igesting contnous data from producer and presenting data to consumer .

It should support data sharding ie number of call records grouped and uses a partition Key ( string MD5 hash) to determine which shard the record goes to. 

There are other external components to setup a VOIP solution apart from Core voice Servers and gateways like the ones listed below, I will try to either add a detailed overall architecture diagram here or write about them in an seprate article . Keep watching this space for updates

  • Payment Gateways
  • Billing and Invoice
  • Fraud Prevention
  • Contacts Integration
  • Call Analytics
  • API services
  • Admin Module
  • Number Management ( DIDs ) and porting
  • Call Tracking
  • Single Sign On and User Account Management with Oauth and SAML
  • Dashboards and Reporting
  • Alert Management
  • Continuous Deployment
  • Automated Validation
  • Queue System
  • External cache

Read about VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform which includes :

  • Scalable and Flexible SIP platform building
  • Cluster SIP telephony Server for High Availability
  • Failure Recovery
  • Multi-tier cluster architecture
  • Role Abstraction / Micro-Service based architecture
  • Distributed Event management and Event-Driven architecture
  • Containerization
  • Autoscaling Cloud Servers
  • Open standards and Data Privacy
  • Flexibility for inter-working – NextGen911 , IMS , PSTN
  • security and Operational Efficiencies

References :

AWS kinesis –https://docs.aws.amazon.com/streams/latest/dev/introduction.html

AWazon docs streaming data – https://aws.amazon.com/streaming-data/

VOIP monitor Archietcture – https://www.voipmonitor.org/doc/Architecture

TTS Ivona – http://developer.ivona.com/en/ttsresources/ssml/ssml.html