This post is about making performance enhancements to a WebRTC app so that they can be used in the area which requires sensitive data to be communicated, cannot afford downtime, fast response and low RTT, need to be secure enough to withstand and hacks and attacks.
As a communication agent become a single HTML page driven client, a lot of authentication, heartbeat sync, web workers, signalling event-driven flow management resides on the same page along with the actual CPU consumption for the audio-video resources and media streams processing. This in turn can make the webpage heavy and many a time could result in a crash due to being ” unresponsive”.
Here are some my best to-dos for making sure the webrtc communication client page runs efficiently
CLS metrics measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.
To have a good user interactionn experiences, the DOM elements should display as less movement as possible so that page appears stable . In the opposite case for a flickering page ( maybe due to notification DOM dynamically pushing the other layout elements ) it is difficult to precisely interact with the page elements such as buttons .
Unoptimized JS code takes longer to execute and impacts network , parse-compileand memory cost.
Some effective tips to spedding up JS execution include
SameSite to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser’s URL bar.
Set-Cookie: promo_shown=1; SameSite=Strict
You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.
Key Performance Indicators (KPIs) are used to evaluate the performance of a website . It is crticial that a webrtc web page must be light weight to acocmodate the signalling control stack javscript libs to be used for offer answer handling and communicating with the signaller on open sockets or long polling mechnism .
Other page interaction crtiteria includes the frames their inetraction and timings for the same.
In the screenhosta ttcjed see the loading tasks which basically depcits the delay by dom elements under transitions owing to user interaction . This ideally should be minimum to keep the page responsive.
The non critical compoenents could then be loaded on async .
However, still, the security challenges with Web Server based WebRTC service are many for example :
If both the peers have a WebRTC browser then one can place a WebRTC call to callee anytime with an auto-answer. This might result in a denial of service(DoS) for the receiver.
Since the media is p2p and also can override firewalls settings through the TURN server, it can result in unwanted/ prohibited data being sent on the network.
Websocket packets are untraceable to detect whether they are used for normal web navigation or to share SDP hence one may secretly make no RTP calls to users through the web server and exchange information.
Threat from screen sharing, for example, a user might mistakenly share his internet banking screen or some confidential information / PII present on the desktop.
Giving long-term access to the camera and microphone for certain sites is also a concern. for example: in an unclosed tab on a site that has access to your microphone and camera, the remote peer can secretly be viewing your webcam and microphone inputs.
Clever use of User Interface to mask an ongoing call can mislead the user into believing that call has been cut while it is secretly still ongoing.
Network attackers can modify an HTTP connection through my Wifi router or hotspot to inject an IFRAME (or a redirect) and then forge the response to initiate a call to themselves.
As WebRTC doesn’t have a strong congestion control mechanism, it can eat up a large chunk of the user’s bandwidth.
By visiting chrome://webrtc-internals/ in chrome browser alone, one can view the full traces of all webRTC communication happening through his browser. The traces contain all kinds of details like signalling server used, relay servers, TURN servers, peer IP, frame rates etc which can jeopardise the security of VoIP service providers.
Ofcourse other challenges that arrive with any other webservice based architecture are also applicable here such as :
Malicious Websites which automatically execute the attacker’s scripts.
User can be induced to download harmful executable files and run them.
Improper use of W3C Cross-Origin Resource Sharing (CORS) to bypass SAME ORIGIN POLICY (SOP)
Unlike most conventional real-time systems (e.g., SIP-based softphones) WebRTC communications are directly controlled by a Web server over some signalling protocol which may be XMPP, WebSockets, socket.io, Ajax etc. This poses new challenges such as
malicious calling services can record the user’s conversation and misuse.
malicious webpages can lure users via advertising and execute auto calling services.
If programs and APIs allow the server to instruct the browser to send arbitrary content, then they can be used to bypass firewalls or mount denial of service attacks.
The general goal of security is to identify and resolve security issues during the design phase so they do not cost service provider time, money, and reputation at a later phase. Security for a large architecture project involves many aspects, there is no one device or methodology to guarantee that an architecture is now “secure” Areas that malicious individuals will attempt to attack include but are not limited to:
Improperly coded applications
Incorrectly implemented protocols
Operating System bugs
Social engineering and phishing attacks
As security is a broad topic touching on many sections of WebRTC this section is not meant to address all topics but instead to focus on specific “hot spots”, areas that require special attention due to the unique properties of the WebRTC service. There are several security-related topics that are of particular interest with respect to WebRTC. The are discussed in detail in sections below.
Today the browser acts as a TRUSTED COMPUTING BASE (TCB) where the HTML and JS act inside of a sandbox that isolates them both from the user’s computer.
With the latest tightening of patches around security concerns in webRTC platforms, a script cannot access a user’s webcam, microphone, location, file, desktop capture without the user’s explicit consent. When the user allows access, a red dot will appear on that tab, providing a clear indication to the user, that the tab has media access.
A type vulnerability typically found in Web applications (such as web browsers through breaches of browser security) that enables attackers to inject client-side script into Web pages viewed by other users.
A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same origin policy.
Cross-site scripting carried out on websites accounted for roughly 80.5% of all security vulnerabilities documented by Symantec as of 2007 according to Wikipedia.
Their effect may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigation implemented by the site’s owner.
As the primary method for accessing WebRTC is expected to be using HTML5 enabled browsers there are specific security considerations concerning their use such as; protecting keys and sensitive data from cross-site scripting or cross-domain attacks, websocket use, iframe security, and other issues. Because the client software will be controlled by the user and because the browser does not, in most cases, run in a protected environment there are additional chances that the WebRTC client will become compromised. This means all data sent to the client could be exposed.
registration elements (PUID etc.)
Therefore additional care needs to be taken when considering what information is sent to the client, and additional scrutiny needs to be performed on any data coming from the client.
(User Interface redress attack, UI redress attack, UI redressing) is a malicious technique of tricking a Web user into clicking on something different to what the user perceives they are clicking on, thus potentially revealing confidential information or taking control of their computer while clicking on seemingly innocuous web pages. It is a browser security issue that is a vulnerability across a variety of browsers and platforms, a clickjack takes the form of embedded code or a script that can execute without the user’s knowledge, such as clicking on a button that appears to perform another function. Compromised personal computer with installed adware, viruses, spyware such as trojan horses, etc. can also compromise the browser and obtain anything the browser sees.
The users computer may have lot of private and confidential data on the disk . Browser do make it mandatory that user must explicitly select the file and consent to its upload before doing file upload and transfer transactions . However still it is not very rare that misleading text and buttons can make users click files .
Another way of accessing local resources is through downloading malicious files to users computer which are executable and may harm users computer.
We know that XMLHttpRequest() API can be used to secretly send data from one origin to other and this can be used to secretly send information without user’s knowledge. However now , SAME ORIGIN POLICY (SOP) in browser’s prevents server A from mounting attacks on server B via the user’s browser, which protects both the user (e.g., from misuse of his credentials) and the server B (e.g., from DoS attack).
SOP forces scripts from each site to run in their own, isolated, sandboxes. It enables webpages and scripts from the same origin server to interact with each other’s JS variables, but prevents pages from the different origins or even iframes on the same page to not exchange information.
As part of SOP scripts are allowed to make HTTP requests via the XMLHttpRequest() API to only those server which have same ORIGIN/domain as that of the originator .
CORS enables multiple web services to intercommunicate . Therefore when a script from origin A executes what would otherwise be a forbidden cross-origin request, the browser instead contacts the target server B to determine whether it is willing to allow cross-origin requests from A. If it is so willing, the browser then allows the request. This consent verification process is designed to safely allow cross-origin requests.
Once a WebSockets connection has been established from a script to a site, the script can exchange any traffic it likes without being required to frame it as a series of HTTP request/response transactions.
Even websockets overcome SOP and establish cross origin transport channels, they pose some challenging scenarios for a secure application deisgn.
WebSockets use masking technique to randomize the bits that are being transmitted , thus making it more difficult to generate traffic which resembles a given protocol , thus making it difficult for inspection from flowing traffic .
Jsonp is a hack designed to bypass origin restriction through script tag injection. A JSONp enabled server passes the response in user specified function
when we use <script> tags the domain limitation is ignored ie we can load scripts from any domain . So when we need to fetch get exchange data just pass callback parameters through scripts . For example
// this is the callback function executed when script returns
var script = document.createElement('script');
script.src = '//serverb.com/v1/getdata?callback=mycallback'
There have been found vulnerabilities in the existing Java and Flash consent verification techniques and handshake.
Sender and receiver are able to share media stream after a offer answer handshake. But we already need one in order to do NAT hole-punching. Presuming the ICE server is malicious , in absence of transaction IDs by stun unknow to call scripts , it is not possible for the webpage of receiver to ascertain is the data is forged or original . Thus to prevent this the browser must generate hidden transaction Id’s and should not sharing with call scripts ,even via a diagnostic interface.
With the increasing requirement of screen sharing in web app and communication systems there is always a high threat of oversharing / exposing confidential passwords , pins , security details etc . This may either through some part of screen or some notification whihc pops up .
There is always the case when the user may believe he is sharing a window when in fact they are the entire desktop.
The attacker may request screensharing and make user open his webmail , payment settings or even net-banking accounts .
When user frequently uses a site he / she may want to give the site a long-term access to the camera and microphone ( indicated by ” Always allow on this site ” in chrome ). However the site may be hacked and thus initiate call on users’ computer automatically to secretly listen-in .
Unless the user checks his laptops glowing camera light LED or goes and monitors the traffic himself he would not know if there is active call in background, which according to him he had cut off . In such a case an attacker may pretend to cut a call shows red phone signs and supportive text but still keep the session and media stream active placing himself on mute .
Even if the calling service cannot directly access keying material ,it can simply mount a man-in-the-middle attack on the connection. The idea is to mount a bridge capturing all the traffic.
To protect against this it is now mandatory to use https for using getusermedia and otherwise also recommended to keep webrtc comm services on https or use strict fingerprinting . This section is derived from Security Considerations for WebRTC draft-ietf-rtcweb-security-08
We know that the forces behind WebRTC standardization are WHATWG, W3C, IETF and strong internet working groups. WebRTC security was already taken into consideration when standards were being build for it . The encryption methods and technologies like DTLS and SRTP were included to safeguard users from intrusions so that the information stays protected.
WebRTC media stack has native built-in features that address security concerns. The peer-to-peer media is already encrypted for privacy . Figure below:
WebRTC encrypts video and audio data via the SRTP (Secure Real-Time Protocol) method ensuring that IP communications – your voice and video traffic – can not be heard or seen by unauthorized parties.
What is SRTP ?
The Secure Real-time Transport Protocol (or SRTP) defines a profile of RTP (Real-time Transport Protocol), intended to provide encryption, message authentication and integrity, and replay protection to the RTP data in both unicast and multicast applications.
Earlier models of VOIP communication such as SIP based calls had an option to use only RTP for communication thereby subjecting the endpoint users to lot of problem like compromising media Confidentiality . However the WebRTC model mandates the use of SRTP hence ruling out insecurities of RTP completely. For encryption and decryption of the data flow SRTP utilizes the Advanced Encryption Standard (AES) as the default cipher.
For such end to end media encryption the shared secret is exchanged between the endpoints.
SDES ( SDP Security Description for Media Stream) ensures that plaintext containing SDP inside a SIP packet can flow end to end securily over TLS. This was a common practise in SIP endpoints in IMS and telco eco-systems to share SRTP secret key. How inview JS stack in browser and open code access SDES is not applicable to Webrtc systesm adn are largely outdated.
Currently DTLS (Datagram Transport Layer Security) is used by webrtc endpoints to multiplex a cryptographic key exchange. For WebRTC to transfer real time data, the data is first encrypted using the DTLS method. DTLS-SRTP handshake has both ends choose “half” of the SRTP key.
(+) Already built into all the WebRTC supported browsers from the start (Chrome, Firefox and Opera).
(+) On a DTLS encrypted connection, eavesdropping and information tampering cannot take place.
(-) Primary issue with supporting DTLS is it can put a heavy load on the SBC’s handling encryption/decryption duties.
(-) Interworking DTLS-SRTP to SDES is CPU intensive
SRTP from DTLS-SRTP end flows easily
SRTP from SDESC end requires auth+decrypt, and encrypt+auth
What is DTLS ?
DTLS allows datagram-based applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery. The DTLS protocol is based on the stream-oriented Transport Layer Security (TLS) protocol .
Together DTLS and SRTP enables the exchange of the cryptographic parameters so that the key exchange takes place in the media plane and are multiplexed on the same ports as the media itself without the need to reveal crypto keys in the SDP.
The media relay points which proxy media stream between endpoints can expose traffic and meta data howver due to end to end encrypted nature of WebRTC they cannot be used to decipher and listen in media packets.
It is important that WebRTC’s SRTP stream is linkedin to another SRTP endpoint and RTP-SRTP gateways should be avoided.
In the recent months everyone has been trying to get into the WebRTC space but at the same time fearing that hackers might be able to listen in on conferences, access user data, or even private networks. Although development and usage around WebRTC is so simple , the security and encryption aspects of it are in the dim light.
A simple WebRTC architecture is shown in the figure below :
By following the simple steps described below one can ensure a more secure WebRTC implementation . The same applies to healthcare and banking firms looking forth to use WebRTC as a communication solution for their portals .
Ensure that the signalling platform is over a secure protocol such as SIP / HTTPS / WSS . Also since media is p2p , the media contents like audio video channel are between peers directly in full duplex.
To protect against Man-In-The-Middle (MITM) attack the media path should be monitored regularly for no suspicious relay.
User’s that can participate in a call , should be pre registered / Authenticated with a registrar service. Unauthenticated entities should be kept away from session’s reach .
Make sure that ICE values are masked thereby not rendering the caller/ callee’s IP and location to each other through tracing in chrome://webrtc-internals/ or packet detection in Wirehsark on user’s end.
As the signalling server maintains the number of peers , it should be consistently monitored for addition of suspicious peers in a call session. If the number of peers actually present on signalling server is more that the number of peers interacting on WebRTC page then it means that someone is eavesdropping secretly and should be terminated from session access by force.
It was observed that many a times non tech savy users simply agree to all permissions request from browser without actually consciously giving consent. Therefore user’s should be made aware of API in websites which ask for undue permissions . For example permission to :
Third party API should be thoroughly verified before sending their data on WebRTC DataChannel.
Before Desktop Sharing user’s should be properly notified and advised to close any screen containing sensitive information.
For storing logs , recording , file , ssh keys or any others ensitive informaton encrypted by keys , we need a safe storage for keys and these tools are handy for password and key management – Dashlane , Lastpass , Bitwarden, 1Password so on.
Auto sign-in for WebRTC apps
Turn User Authentication On and enable Two-Factor Authentication/Bio-metrics. OTP based sign-on and captcha checks are also popular approaches to protect sign-in.
Even a WebRTC e2e encrypted connection can be tampered with on an insecure Wifi. Even though Man-in-middle cannot decipher message content, they can make out intelligible information from the packet size, frequency, end parties’ IP and ports in signalling, time delay for network detection of remote etc. For native clients, a precautionary measure is to enable Remote Lock and Data Wipe. Also advised to only use authorized apps to permit sensitive data such as image storage.
If you use a native WebRTC native app, there are mulitple thinsg that you need to be wary of.
Avoid All Jailbreaks : Jail-breaking a smartphone can enable the user to run unverified or unsupported apps, many of these apps carry security vulnerabilities. Majority of security exploits for Apple’s iOS only affect jailbroken iPhones.
Add a Mobile Security App : Mobile security reports shows that mobile operating systems such as iOS and (especially) Android are increasingly becoming targets for malware. Select a reputable mobile security app that extends the built-in security features of the device’s mobile operating system. Some well-known third-party security vendors offering mobile security apps for iOS, Android and Windows Phone – Avast, Kaspersky, Symantec
Also as a good practise Turn off the Bluetooth, Wi-Fi and NFC when not needed.
Although WebRTC already has best secure tools in its spec list which provide end to end encrypted communication over SRTP DTLS as well as media device access mandatory from websites of secure origin over TLS, yet if the endpoints acting as peers themselves are compromised then all this is in vain . Hence security issues arises when
Endpoints are recording their media content and storing it on unsafe location such as public file servers
Endpoints are inturn re-streaming their incoming their media streams to unsafe streaming servers
Phishing , Pretexting , Baiting attacks , Quid pro quo , Tailgating , Water-Holing are soe of the common tactics to steal teh data of a nonsuspecting user . They are as much applicate to WebRTC based communication site as they are to any other trusted website such as banking , customer care contacts , falsh sale portals , cupon / discount sites etc .
Phonephishing – Voice phishing a criminal phone fraud, imporsonating legitimate caller such as a bank or tax agent and using social engineering over the telephone system to gain access to private personal and financial information for the purpose of financial reward.
Phishing – WebRTC data channel messages can be used as a method to do phishing by send malicious links posing as legitmate sender. It is hard to track such attacks since the data channels are p2p.
Impersonation attacks – spear-phishing , emails that attempt to impersonate a trusted individual or a company in an attempt to gain access to corporate finances , Human resource details , sesitive data. Business email compromise (BECs) also known as CEO fraud is a popular example of an impersonation attack. The fake email usually describes a very urgent situation to minimize scrutiny and skepticism.
Other social engineering tactics – Trickery , Influencing , Deception , Spying
Network security breaches
Inspite of the fact that webrtc is a p2p streaming framework , there are always signalling server required which do the initial handshake and enable the exchange fo SDP for the media to stream in peer to peer fashion . Some wellknown attacks that compromise networks and remote / cloud server are :
Viruses, worms and Trojan horses
Denial of service attacks
Spyware and adware
It is upto the WebRTC/ VoIP service provider to detect emerging threats before they infiltrate network and compromise data. Some crticial compoenets to enhance security are Firewalls , Access Control Lists , Intrusion detection and prevention systems (IDS/IPS) , Virtual private networks (VPN)
GovernanceFramework – defines the roles, responsibilities and accountability of each person and ensures that you are meeting compliance.
Confidentiality: ensures information is inaccessible to unauthorized people via encryption
Integrity: protects information and systems from being modified by unauthorized people; provides accuracy and trustworthyness
Availability: ensures authorized people can access the information when needed and that all hardware and software are maintained properly and updated when necessary
Authentication, Authorization and Accountability(AAA): validate users autheticity via creds, enforcing policies on network resources after the user has gain access and ensuring accountability by means of monitoring and capturing the events done by the user
Non repudiation: is the assurance that someone cannot deny the validity of something. It provides proof of the origin of data and the integrity of the data.
As a first defence tactic , if a orignation ip address is sending malacious or malformed packets which depict an exploitation or attack , trigger and notification for tech team and execute script to block on the origin IP of attacker via security groups in AWS or other ACL list in hosted server . Can also implement temporary firewall block on it and later monitor it for more violations.
Incase a server is compromised beyond repair such as attacker taking control of the file system, drain the ongoing sessions from it and store cached storage with session state variable like CDR enteries. Activate the fallback / standby server and make the current server a honeypot to explore the attackers actions. Common attacks involve either of below techniques:
exploiting the VoIP system to get free internatoinal calls
ransomware activities such as scp the files out of server and leaveing behind a readme.txt file on root location asking for money transfer in return of data
bombard brute force DDOS attacks to bring down the system and make it incapible of catering to genuine requests , perhaps with the inetention of giving advantage to competitors.
As the media connections are p2p, even if we kill the signalling server, it will not affect the ongoing media sessions. Only the time duration ( probably 3 – 4 minutes ) it takes to restart the server , is when the users will not be able to connect to signalling server for creating new sessions. Therefore incase a system is under attack and non recoverable, just terminate it and respawn other server attaced to the domain name or floating IP or Load balancer.
Most browsers today like Google Chrome and Mozilla Firefox have a good record of auto-updating themselves withing 24 hours of a vulnerability of threat occurring.
Third party Call Control ( 3PCC)
If a call is confirmed to be compromised, it should be within the power of Web Application server rendering the WebRTC capable page to cut off the compromised call session by force censing termation request to endpoints or via turning off the TURN services if in use.