In the course of evolution of RAN ( Radio Access layer) technologies 5G outsmarts 4G-2010 which comes in succession after 3G-2000 , 2.5G, 2G -1990 and 1G / PSTN -1980 respectively . Among the mosy striking features of 5G –

  • entirely IP based
  • ability to connect 100x more devices ( IOT favourable )
  • speed upto 10 Gbit/s
  • high peak bit rate
  • high data volume per unit area
  • virtually 0 latency hence high response time

Thus it can accomodate the rapid growth of rich mulimedia application like OTT streaming of HD content, gaming , Augmented reality so on while enabling devices connected to Internet of Things sto onboard the telecommunication backbone with high system spectral efficiency and ubiquitious connectivity .

Infact 5G has seen maximum investment in year 2020 in revamping infrastrcuture as compared to other technologies such as IoT or even Cloud. This could be partly due to high rise in high speed communication for streaming and remote communication owining to steep rise in remote learning adn working from home scenarious.

img source statista – global-telecom-industry-priority-investment-areas


5G is specified to operate over range 1 GHz to 100 GHz.

  • Low-band spectrum (below 2.5 GHz) provides excellent coverage,
  • mid- band spectrum (2.5–10 GHz) provides a combination of good coverage and very high bitrates,
  • high band-spectrum (10–100 GHz) provides the bandwidths needed for the highest bitrates (up to 20 Gb/s) and lowest latencies

Workplan for 5G standardisation and release

The Workplan started in 2014 and is ongoing as of now (2018)

image source : 3GPP “Getting ready for 5G”

3GPP is the standard defining body for telecom and has specified almost all RAN technologies like GSM , GPRS , W-CDMA , UMTS , EDGE , HSPAand LTE before .

Applications of 5G

5G targets three main use case

  • enhanced mobile broadband (eMBB),
  • massive machine type communications (mMTC)
  • ultra-reliable low latency communications (URLLC) (also called critical machine type communications (cMTC))
sources : whitepaper ericsson

General Data Protection Regulation (GDPR) in VoIP

GDPR , Europe’s digital privacy legislation passed in 2018, replaces the 1995 EU Data Protection Directive. It is rules designed to give EU citizens more control over their personal data & strengthen privacy rights. It aims to simplify the regulatory environment for business and citizens.

To read about other Certificates , compliances and Security in VoIP which summaries

  • HIPAA (Health Insurance Portability and Accountability Act) ,
  • SOX( Sarbanes Oxley Act of 2002),
  • Privacy Related Compliance certificates like COPPA (Children’s Online Privacy Protection Act ) of 1998,
  • CPNI (Customer Proprietary Network Information) 2007,
  • GDPR (General Data Protection Regulation)  in European Union 2018,
  • California Consumer Privacy Act (CCPA) 2019,
  • Personal Data Protection Bill (PDP) – India 2018 and
  • also specifications against Robocalls and SPIT ( SPAM over Internet Telephony) among others

Multinational companies will predominantly be regulated by the supervisory authority where they have their “main establishment” or headquarter. However, the issue concerning GDPR is that it not only applies to any organisation operating within the EU, but also to any organisations outside of the EU which offer goods or services to customers or businesses in the EU.

Key Principles of GDPR are

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality (security)
  • Accountability

GDPR consists of 7 projects (DPO, Impact assessment, Portability, Notification of violations, Consent, Profiling, Certification and Lead authority) that will strengthen the control of personal data throughout the European Union.


stakeholders of data protection regulation are
Data Subject – an individual, a resident of the European Union, whose personal data are to be protected

Data Controller – an institution, business or a person processing the personal data e.g. e-commerce website.

Data Protection Officer – a person appointed by the Data Controller responsible for overseeing data protection practices.

Data Processor – a subject (company, institution) processing a data on behalf of the controller. It can be an online CRM app or company storing data in the cloud.

Data Authority – a public institution monitoring implementation of the regulations in the specific EU member country.

Extra-Territorial Scope

Any VoIP service provider may feel that since they are not based out of EU such as officially headquartered in the Asia Pacific or US region they may not be legally binding to GDPR. However, GDPR expands the territorial and material scope of EU data protection law.  It applies to both controllers and processors established in the EU, and those outside the EU, who offer goods or services to or monitor EU data subject.

VoIP service providers as Data Processors

A processor is a “person, public authority, agency or other body which processes personal data on behalf of the controller”.
Most VoIP service providers are multinational in nature with services offered directly or indirectly to all regions. The GDPR imposes direct statutory obligations on data processors, which means they will be subject to direct enforcement by supervisory authorities, fines, and compensation claims by data subjects. However, a processor’s liability will be limited to the extent that it has not complied with it’s statutory and contractual obligations.

Data minimization – It is now a good practise to store and process as less user’s personal data as necessary to render our services effectively. Also to maintain data for only a stipulated time ( approx 90 days of CDR for call details and logs )

Record Keeping, Accountability and governance

To show compliance with GDPR, a service provider maintain detailed records of processing activities. Also, they must implement technological and organisational measures to ensure, and be able to demonstrate, that processing is performed in accordance with the GDPR. Some ways to apply these are :

  • Contracts: putting written contracts in place with organisations that process personal data on your behalf
  • maintaining documentation of your processing activities
  • Organisational policies focus on Data protection by design and default – two-factor auth, strong passwords to guard against brute-force, encryption, focus on security in architecture
  • Rish analysis and impact assessments: for uses of personal data that are likely to result in a high risk to individuals’ interests
  • Audit by Data protection officer
  • Clear Codes of conduct
  • Certifications

As for a VOIP landscape thankfully every call or message session is followed by a CDR ( Calld Detail Record ) or MDR ( Message Detail Record).

Additionally, assign a unique signature to every data-access client the VoIP system and log every read/write operation carried out on data stores whether persistent datastores or system caches.

Privacy Notices to Subjects

User profile data such as :

  • Basic identity information, name, address and ID numbers
  • Web data such as location, IP address, cookie data and RFID tags
  • Health and genetic data
  • Bio-metric data
  • Racial or ethnic data
  • Political opinions
  • Sexual orientation

is protected strictly under GDPR rules

A service provider should provide indepth information to data subjects when collecting their personal data, to ensure fairness and transparency. They must provide the information in an easily accessible form, using clear and plain language.


The GDPR introduces a higher bar for relying on consent , requiring clear affirmative action. Silence, pre ticked boxes or inactivity will not be sufficient to constitute consent. Data subjects can withdraw their consent at any time, and it must be easy for them to do so.

Lawful basis for processing Data now include

In Article 6 of the GDPR , there are six available lawful bases for processing.

(a) Consent: the individual has given clear consent for you to process their personal data for a specific purpose.

(b) Contract: the processing is necessary for a contract you have with the individual, or because they have asked you to take specific steps before entering into a contract.

(c) Legal obligation: the processing is necessary for you to comply with the law (not including contractual obligations).

(d) Vital interests: the processing is necessary to protect someone’s life.

(e) Public task: the processing is necessary for you to perform a task in the public interest or for your official functions, and the task or function has a clear basis in law.

(f) Legitimate interests: the processing is necessary for your legitimate interests or the legitimate interests of a third party, unless there is a good reason to protect the individual’s personal data which overrides those legitimate interests.

File such as PCAPS , Recordings and transcripts of calls hold sensitive information from end users , these should be encryoted and inaccssible to even the dev teams within the org without explicit consent of end user .

Individuals’ Rights

The GDPR provides individuals with new and enhanced rights to Data subjects who will have more control over the processing of their personal data. A data subject access request can only be refused if it is manifestly unfounded or excessive, in particular because of its repetitive character.

Rights of Data Subjets include

  • Right of Access
  • Right to Rectification
  • Right to Be Forgotten
  • Right to Restriction of Processing
  • Right to Data Portability
  • Right to Object
  • Right to Object to Automated Decisionmaking

For a VoIP service provider if a user opts for redaction then none of his calls or messages should be traced in logs . Also replace distinguishable end user identifier such as phone number and sip uri with *** charecters

Provide option for “Account Deletion” and purge account – If a user wished to close his/her account , his/her detaisl should be deleted form the sustem except for the bare bones detaisl which are otherwise required for legal , taxation and accounting requirnments

Breach Notification

A controller is a “person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of processing of personal data”,

A controller will have a mandatory obligation to notify his supervisory authority of a data breach within 72 hours unless the breach is unlikely to result in a risk to the rights of data subjects. Will also have to notify affected data subjects where the breach is likely to result in a “high risk” to their rights. A processor, however, will only be obliged to report data breaches to controllers

International Data Transfers

Data transfers to countries outside the EEA(European Economic Area) continue to be prohibited unless that country ensures an adequate level of protection. The GDPR retains existing transfer mechanisms and provides for additional mechanisms, including approved codes of conduct and certification schemes.

The GDPR prohibits any non-EU court, tribunal or regulator from ordering the disclosure of personal data from EU companies unless it requests such disclosure under an international agreement, such as a mutual legal assistance treaty.

One of the biggest challenges for a service provider is the identification & categorization of GDPR impacted data sets in disparate locations across the enterprise. A dev team must flag tables, attributes and other data objects that are categorically covered under GDPR regulations and then ensure that they are not transferred to a server outside of EU.

In the present age of Virtual shared server instance, cloud computing and VoIP protocol it is operational a very tough task for a communication service provider to ensure that data is not transferred outside of EU such as a VoIP call from origination in US and destination in EU will require information exchanges via SDP, vcard , RTP stream via media proxies etc.


The GDPR provides supervisory authorities with wide-ranging powers to enforce compliance, including the power to impose significant fines. You will face fines of up to €20m or 4% of your total worldwide annual turnover of the preceding financial year. In addition, data subjects can sue you for pecuniary or non-pecuniary damages (i.e. distress). Supervisory authorities will have a discretion as to whether to impose a fine and the level of that fine.

Data Protection officer (DPO)

Under the terms of GDPR, an organisation must appoint a Data Protection Officer (DPO) if it carries out large-scale processing of special categories of data, carries out large scale monitoring of individuals such as behaviour tracking or is a public authority.

Reference :

Media Architecture

To learn about the difference between

  • centralized vs decentralised
  • SFU vs MCU ,
  • multicast vs unicast ,

read – SIP conferecning and Media Bridge

To read more about buildinga scalable VoIP Server Side architecture and

  • Clustering the Servers with common cache for High availiability and prompt failure recovery
  • Multitier archietcture ie seprartion between Data/session and Application Server /Engine layer
  • Micro service based architecture ie diff between proxies like Load balancer, SBC, Backend services , OSS/BSS etc
  • Containerization and Autoscalling

Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

With the sudden onset of Covid-19 and building trend of working-from-home , the demand for building scalable conferncing solution and virtual meeting room has skyrocketed . Here is my advice if you are building a auto- scalable conferencing solution

This article is about media server setup to provide mid to high scale conferencing solution over SIP to various endpoints including SIP softphones , PBXs , Carrier/PSTN and WebRTC.

Media Server Topology

Point to Point

endpoints communicating over unicast
RTP and RTCP tarffic is private between sender and reciver even if the endpoints contains multiple SSRC’s in RTP session
only limitaion to number of stream between the partcipants are the physical limiations such as bandwidth, num of available ports

Point to Point via Middlebox

Same as above but with a middle-box involved


mostly used interoperability for non-interoperable endpoints such as transcoding the codecs or transport convertion
does not use an SSRC of its own and keeps the SSRC for an RTP stream across the translation.

subtypes :

Transport/Relay Achoring

Roles like NAT traversal by pinning the media path to a public address domain relay or TURN server
Middleboxes for auditing or privacy control of particpant’s IP
Other SBC ( Session Border Gateways) like charecteristics are also part of this topology setup

Transport translator

interconnecting networs like mutlicast to unicast
media repacktization to allow other media to connect tgo the session like non RTP protocols

Media transaltor

modified the media inside of RTP streams commonly known as transcoding
can do uptp full encoding / decoding of RTP streams
in many cases it can also act of behalf of non RTP supported endpoints , receivinga nd repsosnidng to feedback reports ad performing FEC ( forward error corrected )

Back-To-Back RTP Session

Mostly like middlebox like trnslator but establishes separte legs RTP session with the endpoints , bridging the two sessions.

takes compleet repsososibility of forwarding teh correct RTP payload and maianting the realtion between the SSRC and CNAMEs

Point to Point using Multicast

Any-Source Multicast (ASM)

traffic from any particpant sent to the multicat group address reaches all other partcipants

Source-Specific Multicast (SSM)

Selective Sender stream to the multicast group which streams it to the recibers

Point to Multipoint using Mesh

many unicast RTP streams making a mesh

Point to Multipoint + Translator

some more varients of this topoplogy are Point to Multi point with Mixer

Media Mixing Mixer

receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through
static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.

Media Swicthing Mixer

RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

The Mixer can reduce bitrtae or switch between sources like active speaker.

SFU ( selective Forwarding Unit)

middlebox can select which of the potential sources ( SSRC) transmitting media will be sent to each of the endpoints. This gtramsission is set up is independant RTP Session.

extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.

On a high level , one can safely assume that for no of peers between 3-6 mesh archietctures make sense however any number above it require centralized media archietcture .

Among teh centralized media archietctures , SFU makes sense for atmomst 6-15 people in a confernece however is teh number of participants exceed that it may need to switch to MCU mode. However there is another architecture which works on Hybrid mode

Point to Multipoint Using Video-Switching MCUs

much like MCU but MCU can switch the bitrate and resilution stream based on active speaker , host or ppt presenter , floor control like charecteristics
This setup can embed the charecteristics of trabnslator , sleector and can ecen do congesyion congrol based on RTCP

To handle a multipoint confernece scenario it acts as a transaltor forwarding the selected RTP stream under its own SSRC, with the appropriate CSRC values and modify the RTCP RRs it forwards between the domains

Transport Protocols

Before getting into indepth discussion of all possible types of Media Archietctures in VoIP system , lets learn about TCP vs UDP

TCP is a reliable connection oriented protocol which sends REQ and receives ACK to establish connection between cmmunicating parties . It sequeentiallys ends packets which can be resent inidvidually when the receiver reciognizes out of order packets . It is thus used for session creation due to its errorx correction and congestion control features .

Once a session is established it automatically shifts to RTP over UDP . UDP even though not as reliable , not guarrenting non-duplication and delivery error correction is used due to its tunneling methodds where packets of other protcols are encapsulated isnide of UDP packet. However to provide ened to end security other methods for Auth and encryption

Audio PCAP storage and Privacy constraints for Media Servers

A Call session produces various traces for offtime monitoring and analysis which can include

CDR ( Call Detail Records ) – to , from numbers , ring time , answer time , duration etc

Signalling PCAPS – collected usually from SIP application server containing the SIP requests, SDP and responses. It shows the call flow sequences for example, who sent the INVITE and who send the BYE or CANCEL. How many times the call was updated or paused/resumed etc .

Media Stats – jitter , buffer , RTT , MOS for all legs and avg values

Audio PCAPS – this is the recording of the RTP stream and RTCP packets between the parties and requires explicit consent from the customer or user . The VoIP companies complying with GDPR cannot record Audio stream for calls and preserve for any purpose like audit , call quality debugging or an inspection by themselves.

Throwing more light on Audio PCAPS storage, assuming the user provides explicit permission to do so , here is the approach for carrying out the recording and storage operations.

Firther more , strict accesscontrol , encryption and annonymisation of the media packets is necessary to obfuscate details of the call session.

References :

RFC 7667 RTP Topologieshttps://tools.ietf.org/html/rfc7667

WebRTC Audio/Video Codecs

Codecs signifies the media stream’s compession and decompression. For peers to have suceesfull excchange of media, they need a common set of codecs to agree upon for the session . The list codecs are sent  between each other as part of offeer and answer or SDP in SIP.

As WebRTC provides containerless bare mediastreamgtrackobjects. Codecs for these tracks is not mandated by webRTC . Yet the codecs are specified by two seprate RFCs

RFC 7878 WebRTC Audio Codec and Processing Requirements specifies least the Opus codec as well as G.711’s PCMA and PCMU formats.

RFC 7742 WebRTC Video Processing and Codec Requirnments specifies support for  VP8 and H.264’s Constrained Baseline profile for video .

In WebRTC video is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to dicuss Audio/Video Codecs processing requirnments only.

Quick links : If you are new to WebRTC read : Introduction to WebRTC is at https://telecom.altanai.com/2013/08/02/what-is-webrtc/ and Layers of WebRTC at https://telecom.altanai.com/2013/07/31/webrtc/

WebRTC Media Stack

Media Stream Trcaks in WebRTC

The MediaStreamTrack interface typically represents a stream of data of audio or video and a MediaStream may contain zero or more MediaStreamTrack objects.

The objects RTCRtpSender and RTCRtpReceiver can be used by the application to get more fine grained control over the transmission and reception of MediaStreamTracks.

Media Flow in VoIP system
Media Flow in WebRTC Call


Video Capture insync with hardware’s capabilities

WebRTC compatible browsers are required to support Whie-balance , light level , autofocus from video source

Video Capture Resolution

Minimum WebRTC video attributes unless specified in SDP ( Session Description protocl ) is minimum 20 FPS and resolution 320 x 240 pixels. 

Also supports mid stream resilution changes such as in screen source fromdesktop sharinig .

SDP attributes for resolution, frame rate, and bitrate

SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

Sender must send limiting the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.

Dynamic FPS control based on actual hardware encoding :

video source capture to adjust frame rate accroding to low bandwidth , poor light conditions and harware supported rate rather than force a higher FPS .

Stream Orientation

support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing with peer


WebRTC is free and opensource and its woring bodies promote royality free codecs too. The working groups RTCWEB and IETF make the sure of the fact that non-royality beraning codec are mandatory while other codecs can be optional in WebRTC non browsers .

WebRTC Browsers MUST implement the VP8 video codec as described in
RFC6386] and H.264 Constrained Baseline as described in [H264].

RFC 7442 WebRTC Video Codec and Processing Requirements

most of the codesc below follow Lossy DCT(discrete cosine transform (DCT) based algorithm for encoding.

Sample SDP from offer in Chrome browser v80 for Linux incliudes these profile :

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123

a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 profile-id=0
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100

a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:122 rtx/90000
a=fmtp:122 apt=102

a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:121 rtx/90000
a=fmtp:121 apt=127

a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125

a=rtpmap:108 H264/90000
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 red/90000
a=rtpmap:120 rtx/90000
a=fmtp:120 apt=124


Developed by on2 and then acquired and opensource by google . Now free of royality fees.

Supported conatiner – 3GP, Ogg, WebM

No limit on frame rate or data rate and provides maximum resolution of 16384×16384 pixels.

libvpx encoder library.

VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes.
encode and decode pixels with an implied 1:1 (square) aspect ratio.

supported simulcast


Video Processor 9 (VP9) is the successor to the older VP8 and comparable to HEVC as they both have simillar bit rates .

Open and free of royalties and any other licensing requirements
Its supported Containers are – MP4, Ogg, WebM

H264/AVC constrained

AVC’s Constrained Baseline (CBP ) profile compliant with WebRTC

Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3 . Contrained baseline is a submet of the main profile , suited to low dealy , low complexity. suited to lower processing device like mobile videos

Multiview Video Coding – can have multiple views of the same scene ,such as stereoscopic video.

Other profiles , which are not supporedt are Baseline (BP) , Extended (XP), Main (MP) , High (HiP) , Progressive High (ProHiP) , High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive

Its supported containers are 3GP, MP4, WebM

Parameter settings:

  • packetization-mode
  • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
  • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
  • Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )

It is a propertiary , patented codec , mianted by MPEG / ITU

AV1 (AOMedia Video 1)

open format designed by the Alliance for Open Media
royality free
especially designed for internet video HTML element and WebRTC
higher data compression rates than VP9 and H.265/HEVC

offers 3 profiles in increasing support for color depths and chroma subsampling.
high, and

supports HDR
supports Varible Frame Rate

Supported container are ISOBMFF, MPEG-TS, MP4, WebM

Stats for Video based media stream track

timestamp 04/05/2020, 14:25:59
ssrc 3929649593
isRemote false
mediaType video
kind video
trackId RTCMediaStreamTrack_sender_2
transportId RTCTransport_0_1
codecId RTCCodec_1_Outbound_96
[codec] VP8 (payloadType: 96)
firCount 0
pliCount 9
nackCount 476
qpSum 912936
[qpSum/framesEncoded] 32.86666666666667
mediaSourceId RTCVideoSource_2
packetsSent 333664
[packetsSent/s] 29.021823604499957
retransmittedPacketsSent 0
bytesSent 342640589
[bytesSent/s] 3685.7715977714947
headerBytesSent 8157584
retransmittedBytesSent 0
framesEncoded 52837
[framesEncoded/s] 30.022576142586164
keyFramesEncoded 31
totalEncodeTime 438.752
[totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
totalEncodedBytesTarget 335009905
[totalEncodedBytesTarget/s] 3602.7091371103397
totalPacketSendDelay 20872.8
[totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
qualityLimitationReason bandwidth
qualityLimitationResolutionChanges 20
encoderImplementation libvpx
Graph for Video Track in chrome://webrtc-internals

Other RTP parameters

RTX(regtranmission ) – packet loss recovery technique for real-time applications with relaxed delay bounds.

Non WebRTC supported Video codecs

Need active realtime media transcoding


Already used for video conferencing on PSTN (Public Switched Telephone Networks), RTSP, and SIP (IP-based videoconferencing) systems.
suited for low bandwidth networks
Although it is not comaptible with WebRTC but many media gateways incldue realtime transcoding existed between H263 based SIP systems and vp8 based webrtc ones to enable video communication between them

H.265 / HEVC

proprietary format and is covered by a number of patents. Licensing is managed by MPEG LA .

Container – Mp4

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints

With the rise of Internet of Things many Endpoints especially IP cameras connected to Raspberry Pi like SOC( system on chiops )n wanted to stream directly to the browser within theor own provate network or even on public network using TURN / STUN.

The figure below shows how such a call flow is possible between an IP cemera ( such as Baby Cam ) and its parent monitoring it over a WebRTC suppported mobile phone browser . The process includes streaming teh content from IOT device on RTSP stream and using realtime trans-coding between H264 and VP8

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints


Audio Level

audio level for speech transmission to avoid users having to manually adjust the playback and to facilitate mixing in conferencing applications.

normalization considering frequencies above 300 Hz, regardless of the sampling rate used.

adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor.

GAIN calculation

  • If the endpoint has control over the entire audio-capture path like a regular phone
    the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
  • If the endpoint does not have control over the entire audio capture like software endpoint
    then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
  • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.

Acoustic Echo Cancellation (AEC)

Endpoints shoudl allow echo control mechsnisms


WebRTC endpoints are should implement audio codecs: OPUS and PCMA / PCMU, along with Comforrt Noise and DTMF events.

Trace for audio codecs supported in chrome (Version 80.0.3987.149 (Official Build) (64-bit) on ubuntu)

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000


stabdardised by IETF

container- Ogg, WebM, MPEG-TS, MP4

supportes multiple comptression algorithms

For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, it is w3C recommenda that Opus be offered before PCMA/PCMU.

AAC (Advanvced Audio Encoding)

part of the MPEG-4 (H.264) standard
supported congainers – MP4, ADTS, 3GP

Lossy compression but has number pf profiles suiting each usecase like high quality surround sound to low-fidelity audio for speech-only use.

G.711 (PCMA and PCMU)

ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding.
vital to interface with the standard teelcom network and carriers

Fixed 64Kbpd bit rate

supports 3GP container formats

G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU


ncoded using Adaptive Differential Pulse Code Modulation (ADPCM) which is suited for voice compression
conatiners used 3GP, AMR-WB

Comfort noise (CN)

artificial background noise which is used to fill gaps in a transmission instead of using pure silence

avoids jarring or RTP Timeout

for streams encoded with G.711 or any other supported codec that does not provide its own CN.
Use of Discontinuous Transmission (DTX) / CN by senders is optional

Internet Low Bitrate Codec (iLBC)

opensource narrow band codec
designed specifically for streaming voice audio

Internet Speech Audio Codec (iSAC)

designed for voice transmissions which are encapsulated within an RTP stream.

DTMF and ‘audio/telephone-event’ media type

endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.

DTMF events list
| 0 | DTMF digit “0”
| 1 | DTMF digit “1”
| 2 | DTMF digit “2”
| 3 | DTMF digit “3”
| 4 | DTMF digit “4”
| 5 | DTMF digit “5”
| 6 | DTMF digit “6”
| 7 | DTMF digit “7”
| 8 | DTMF digit “8”
| 9 | DTMF digit “9”
| 10 | DTMF digit “*”
| 11 | DTMF digit “#”
| 12 | DTMF digit “A”
| 13 | DTMF digit “B”
| 14 | DTMF digit “C”
| 15 | DTMF digit “D”

Stats for Audio Media track

timestamp 04/05/2020, 14:25:59
ssrc 3005719707
isRemote fals
mediaType audio
kind audio
trackId RTCMediaStreamTrack_sender_1
transportId RTCTransport_0_1
codecId RTCCodec_0_Outbound_111
[codec] opus (payloadType: 111)
mediaSourceId RTCAudioSource_1
packetsSent 88277
[packetsSent/s] 50.03762690431027
retransmittedPacketsSent 0
bytesSent 1977974
[bytesSent/s] 150.11288071293083
headerBytesSent 2118648
retransmittedBytesSent 0
Graphs in chrome://webrtc-internals for Audio


m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE

Stats for Datachannel

Statistics RTCDataChannel_1
timestamp 04/05/2020, 14:25:59
label sctp
datachannelid 1
state open
messagesSent 1
[messagesSent/s] 0
bytesSent 228
[bytesSent/s] 0
messagesReceived 1
[messagesReceived/s] 0
bytesReceived 228
[bytesReceived/s] 0

Refrenecs :

Attacks on SIP Networks

Major standards bodies including 3GPP, ITU-T, and ETSI have all adopted SIP as the core signalling protocol for services such as LTE , VoIP, conferencing, Video on Demand (VoD), IPTV (Internet Television), presence, and Instant Messaging (IM) etc. With the continous evolution of SIP as the defacto VoIP protocol , we need to underatdn the risk mitigartion practices around it .

I have written about VoIP and security in these blogs before

For Security around web browser based calling via webrtc i have written

  • Webrtc Security , which describes browser threat modal , access to local resource , Same Orogin Policy (SOP) and Cross Resource Sharing ( CORS) as well as Location sharing , ICE , TUEN and threats to privacy with screen sharing , microgone camera long term access and probable mid call attacks .
  • Genric secrutity of web Application build around hosting platform of webrtc. Includs concepts like Identity management , browser security – cross site security amd clickjacking , Authetication of devices and applications , Media Encryption and regex checking.

Also Written about VoIP security at protocol level with SRTP /DTLS using TLS and specifically using available pre added security modules on kamailio SIP server. It describes Sanity checks , ACL lists with permissions , hiding topology details , countering Flood using pike and Fail2Ban as well as Traffic monitoring and detection .

In this article we will cover types of attacks on SIP systems

Types of attacks on SIP based systems

Registration Hijacking

malicious registrations on registrar by a third party who modifies From header field of a SIP request.

exmaple implementation :
attacker de-registers all existing contacts for a URI
attacker can also register their own device as the appropriate contact address, thereby directing all requests for the affected user to him

solution – Autheticaion of user

Impersonating a Server

attacker impersonates the remote server
user’s request can now be intercepted by some other party
user’s request may be forwarded to insecure locations

Solution –

confidentiality, integrity, and authentication of proxy servers

Proxy/redirect sever, and registrars SHOULD possess a site certificate issued by CA which could be validated by UA

Temparing Message bodies

If users are relying on SIP message bodies to communicate either of

  • session encryption keys for a media session
  • MIME bodies
  • SDP
  • encapsulated telephony signals
    Then the atackers on proxy server can modify the session key or can act as a man-in-the-middle and do eaves droppng

exmaple implementation :
attacker can point RTP media streams to a wiretapping device
can changes Subject header field to appear to users as spam

solution – end to end ecryption over TLS + Digest Authorization

mid-session threats like tearing down session

Request forging
attacker learns the params of the session like To , From tags etc then he can alter ongoing session parameters and even bring it down

example implementation :
attacker inserts a BYE in a ongoing session thereby tearing it down
can insert re INVITE and redierct the stream to wiretaping device

solution – authetication on every request
signing and encrypting of MIME bodies, and transference of credentials with S/MIME

Denial of Service and Amplification

DOS attacks – rendering a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.
dDOS – multiple network hosts to flood a target host with a large amount of network traffic.

Can be created by sending falsified sip requests to other parties such that numerous transactions originating in the backwards direction comes to the target server created congestion.

exmaple implementation :
attackers creates a falsified source IP address and a corresponding Via header field that identify a targeted host as the originator of the request. Then send this to large number of SIP network element . This geneerates DOS aimed at target.

attackers uses falsified Route header field values in a request that identify the target host and then send such messages to forking proxies that will amplify messaging sent to the target.

Flooding with register attacks can deplete available memory and disk resources of a registrar by registering huge numbers of bindings.
Flooding a stateful proxy server causes it to consume computational expense associated with processing a SIP transaction

Solution –
detect flooding and pike in traffic and use ipban to block
challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required), forgoing the normal response retransmission algorithm, and thus behaving statelessly towards unauthenticated requests.

Security mchanisms

Full encryption vs hop by hop encrption

SIP mssages cannot be encrypted end-to-end in their entirety since
message fields such as the Request-URI, Route, and Via need to be visible to proxies in most network architectures
so that SIP requests are routed correctly.
proxy servers need to also update the message with via headers

Thus SIP uses low level security along with hop by hop encrption and auth headers to verify the identity of proxy servers

Transport and Network Layer Security

IPsec – used where set of hosts or administrative domains have an existing trust relationship with one another.

TLS – used where hop-by-hop security is required between hosts with no pre-existing trust association.


Used as an address-of-record for a particular user, signifies that each hop over which the request is forwarded, must be secured with TLS

HTTP Authentication

Reuse of the HTTP Digest authentication via 401 and 407 response codes that implement challenge for autehtication
provides replay protection and one-way authentication.


allows SIP UAs to encrypt MIME bodies within SIP, securing these bodies end-to-end without affecting message headers.
provides end-to-end confidentiality and integrity for message bodies


provides replay protection

SIP over TLS

SIP messages can be secured using TLS. There is also TLS for Datagrams called DTLS.

Security of SIP signalling is different from security of protocols used in concert with SIP like RTP , RTCP. and that will be covered in later topics of this article.

TLS operation consists of two phases: handshake phase and bulk data encryption phase

Handshake phase

Prepare algorithm to be used during TLS session

Server Authentication

server sends its certificate to the client, which then verifies the certificate using a certificate authority’s (CA’s) public key.

Client Authentication

Server sends an additional CertificateRequest message to request the client’s certificate. The client responds with

  1. Certificate message containing the client certificate with the client public key and
  2. CertificateVerify message containing a digest signature of the handshake messages signed by clients private key

Server authenticates client by client’s public key , since only client holding correct private key can sign the message.

prepare the shared secret for bulk data encryption

client generate a pre_master_secret, and encrypt it using the server’s public key obtained from the server’s certificate. The server decrypts the pre_master_secret using its own private key.
Both the server and client then compute a master_secret they share based on the same pre_master_secret. The master_secret is further used to generate the shared symmetric keys for bulk data encryption and message authentication

Public key cryptographic operations such as RSA are much more expensive than shared key cryptography. This is why TLS uses public key cryptography to establish the shared secret key in the handshake phase, and then uses symmetric key cryptography with the negotiated shared secret as the data encryption key.

Stateless proxy servers do not maintain state information about the SIP session and therefore tend to be more scalable. However, many standard application functionalities, such as authentication, authorization, accounting, and call forking require the proxy server to operate in a stateful
mode by keeping different levels of session state information.

Steps :

  1. The SIP proxy server enforces proxy authentication with
    407 Proxy Authentication Required challenge.
  2. UAC provides credentials that verify its claimed identity (e.g., based on MD5 [34] digest algorithm) and retransmits in authorization header

Security of RTP

confidentiality protection of the RTP session and integrity protection of the RTP/RTCP packets requires source authentication of all the packets to ensure no man-in-the-middle (MITM) attack is taking place.

end to end media encryption – SRTP ( Secure RTP )

encodes the voice into encrypted IP packages and transport those via the internet from the transmitter  to receive


  • The Impact of TLS on SIP Server Performance – Charles Shen† Erich Nahum‡ Henning Schulzrinne† Charles Wright , Department of Computer Science, Columbia University,IBM T.J. Watson Research Center

HTTP/2 – offer/answer signaling for WebRTC call

HTTP ( Hyper Text Transfer Protocol ) is the top application layer protocol atop the Tarnsport layer ( TCP ) and the Network layer ( IP )


release in 1997. Since HTTP/1 allowed only 1 req at a time , HTTP/1.1

Allows one one outstanding connection on a TCP session but allowed request pieplinig to achieve concurency.


In 2015, HTTP/2 was released which aimed at reducing latency while delivering heavy graphics, videos and other media cpmponents on web page especially on mobile sites .
optimizes server push and service workers


A key differenet between Http/1.1 and HTTP/2 is the fact that former transmites requests and reponses in plaintext whereas the later encapsulates them into binary format , proving more features and scope for optimzation.

Thus at protocol level , it is all about frames of bytes which are part of stream.

“enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients.”

Hypertext Transfer Protocol Version 2 (HTTP/2) draft-ietf-httpbis-http2-latest

It is important to know that Browsers only implement HTTP/2 under HTTPS, thus TLS connection is must for whichw e need certs ad keys signed by CA ( either self signed using openssl , signed by public CA like godaddy , verisign or letsencrypt)

Compatibility Layer between HTTP1.1 and HTTP2.0 in node

Nodejs >9 provides http2 as native module. Exmaple of using http2 with compatibility layer

const http2 = require('http2');
const options = {
 key: 'ss/key', // path to key
 cert: 'ssl/cert' // path to cert

const server = http2.createSecureServer(options, (req, res) => {
    req.addListener('end', function () {
        file.serve(req, res);

in replacement for existing server http/https server

const https = require('https');
app = https.createServer(options, function (request, response) {
    request.addListener('end', function () {
        file.serve(request, response);


Socket.io/ Websocket over HTTP2

The WebSocket Protocol uses the HTTP/1.1 Upgrade mechanism to transition a TCP connection from HTTP into a WebSocket connection

Due to its multiplexing nature, HTTP/2 does not allow connection-wide header fields or status codes, such as the Upgrade and Connection request-header fields or the 101 (Switching Protocols) response code. These are all required for opening handshake.

Ideally the code shouldve looekd like this with backward compatiability layer , but continue reading update ..

var app = http2
    .createSecureServer(options, (req, res) => {
        req.addListener('end', function () {
            file.serve(req, res);

var io = require('socket.io').listen(app);
io.on('connection', onConnection); // evenet handler onconnection

Error during WebSocket handshake: Unexpected response code: 403

update May 2020 : I tried using the http2 server with websocket like mentioned above ,h owever many many hours of working around WSS over HTTP2 secure server , I consistencly kept faccing the ECONNRESET issues after couple of seconds , which would crash the server

client 403

Therefore leaving the web server to server htmll conetnt I reverted the siganlling back to HTTPs/1.1 given the reasons for sticking with WSS is low latency and existing work that was already put in.

Example Repo : https://github.com/altanai/webrtcdevelopment/tree/htt2.0

Reading Further of exploring HTTP CONNECT methods for setting WS handshake . Will update this section in future if it works .


A “stream” is an independent, bidirectional sequence of frames exchanged between the client and server within an HTTP/2 connection.
A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.

Core http2 module provides new core API (Http2Stream), accessed via a “stream” listener:

const http2 = require('http2');
const options = {
 key: 'ss/key', // path to key
 cert: 'ssl/cert' // path to cert

const server = http2.createSecureServer(options, (stream, headers) => {
    stream.respond({ ':status': 200 });
    stream.end('some text!');

Other features

  • stream multiplexing
  • stream Prioritization
  • header compression
  • Flow Control
  • support for trailer

Persistent , one connection per origin.

With the new binary framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required,

Server Push

bundling multiple assets and resources into a single HTTP/2  and lets the srever proactively push resources to client’s cache .

Server issues PUSH_PROMISE , client validates whether it needs the resource of not. If the client matches it then they will load like regular GET call

The PUSH_PROMISE frame includes a header block that contains a complete set of request header fields that the server attributes to the request.

After sending the PUSH_PROMISE frame, the server can begin delivering the pushed response as a response on a server-initiated stream that uses the promised stream identifier.

Client receives a PUSH_PROMISE frame and can either chooses to accept the pushed response or if it does not wish to receive the pushed response from the server it can can send a RST_STREAM frame, using either the CANCEL or REFUSED_STREAM code and referencing the pushed stream’s identifier.

Push Stream Support


respondWithFile() and respondWithFD() APIs can send raw file data that bypasses the Streams API.

Related technologies


Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets.

Email messages + MIME : transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

MIME in HTTP in WWW : servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated. Browsers typically contain GIF and JPEG image viewers.

MIME header fields

MIME version

MIME-Version: 1.0

Content Type

Content-Type: text/plain

multipart/mixed , text/html, image/jpeg, audio/mp3, video/mp4, and application/msword

content disposition

Content-Disposition: attachment; filename=genome.jpeg;
  modification-date="Wed, 12 Feb 1997 16:29:51 -0500";


References :

Certificates, compliances and Security in VoIP

This article describes various Certificates and compliances, Bill and Acts on data privacy, Security and prevention of Robocalls as adopted by countries around the world pertaining to Interconnected VoIP providers, telecommunications services, wireless telephone companies etc

Compliance certificates by Industry types

HIPAA (Health Insurance Portability and Accountability Act)

Deals with privacy and security of personal medical records and electronic health care transaction

Applicability  : If voip company handles medical information

Includes : 

  • Not allowed Voice mail transcription
  • Should have End-to-End Encryption
  • Restrict  using unsecured WiFi networks to prevent Snooping
  • User security , strong password rules  and mandatory monthly change
  • Secure Firmware on VoIP phones
  • Maintaining Call and Access Logs

SOX( Sarbanes Oxley Act of 2002)

Also known as SOX, SarbOX or Public Company Accounting Reform and Investor Protection Act

Applicability : if managing the communications operations of a regulated, publicly traded company 

Includes : 

  • Retain records which include financial and other sensitive data
  • ways employees are provided or denied access to records or data based on their roles and responsibilities
  • do information audit by a trusted third party. 
  • Retention and deletion of files such as audio files like voicemails, text messages, video clips, declared paper records, storage, and logs of communications activities
  • Physical and digital security controls around cloud-based VoIP applications and the networks

Privacy Related Compliance certificates

COPPA (Children’s Online Privacy Protection Act ) of 1998 

prohibits deceptive marketing to children under the age of 13, or collecting personal information without disclosure to their parents. 

any information is to be passed on to a third party, must be easy for the child’s guardian to review and/or protect

2011 amendment  requires that the data collected was erased after a period of time,

2014 FTC issued guidelines that apps and app stores require “verifiable parental consent.”

CPNI (Customer Proprietary Network Information) 2007

CPNI (Customer Proprietary Network Information) in united states is the information that communication providers  acquire about their subscribers. This Individually identifiable information that is created by a customer’s relationship with a provider, such as data about the frequency, duration, and timing of calls, the information on a customer’s bill, and call identifying information. This processing information is governed strictly by FCC and certification should be renewed on an annual basis

Provider can pass along that information to marketers to sell other services, as long as the customer is notified

In 2007, the FCC explicitly extended the application of the Commission’s CPNI rules of the Telecommunications Act of 1996 to providers of interconnected VoIP service.


Communications Assistance for Law Enforcement Act (CALEA) conduct electronic surveillance by imposing specific obligations on “telecommunications carriers” for assisting law enforcement, including delivering call interception and call identification functionality to the government with a minimum of interference to customer service and privacy.

Read more about CALEA and its roles in VoIP here Regulatory and Legal Considerations with WebRTC development

GDPR (General Data Protection Regulation)  in European Union 2018

Supersedes the 1995 Data Protection Directive

Establishes requirements of organizations that process data, defines the rights of individuals to manage their data, and outlines penalties for those who violate these rights.

No personal data may be processed unless this processing is done under one of six lawful bases specified by the regulation (consent, contract, public task, vital interest, legitimate interest or legal requirement). When the processing is based on consent the data subject has the right to revoke it at any time.

Controllers must notify Supervising Authorities (SA)s of a personal data breach within 72 hours of learning of the breach.

California Consumer Privacy Act (CCPA) 2019

consumer rights relating to the access to, deletion of, and sharing of personal information that is collected by businesses. 

Allows consumers to know whether their personal data is sold or disclosed , to whom .

Allows opt-out right for sales of personal information

Right to deletion – to request a business to delete any personal information about a consumer collected from that consumer

Personal Data Protection Bill (PDP) – India 2018

This bill introduces various private and sensitive protection frameworks  like restriction on retention of personal data, Right to correction and erasure (such as right to be forgotten) , Prohibition and transparency of processing of personal data. It also classifies data fiduciaries  including certain social media intermediaries. 

The Bill amends the Information Technology Act, 2000 to delete the provisions related to compensation payable by companies for failure to protect personal data.

Other data privacy acts similar to GDPR 

  • South Korea’s Personal Information Protection Act  2011
  • Brazil’s Lei Geral de Proteçao de Dados (LGPD)  2020
  • Privacy Amendment (Notifiable Data Breaches) to Australia’s Privacy Act 2018
  • Japan’s Act on Protection of Personal Information 2017
  • Thailand Personal Data Protection Act (PDPA) 2020

Features offered by VOIP companies for Data privacy 

  • Access Control & Logging
  • Auto Data Redaction / Account Deletion policy 
  • SIEM (Security information and event management) alerts 
  • Information security , Encrypted Storage For Recordings & Transcripts
  • Disclosing all third party services that are involved in data processing too
  • Role Based Access Control and 2 Factor Authentication
  • Data Security Audits and appointing  data protection officer to oversee GDPR compliance

Against Robocalls and SPIT ( SPAM over Internet Telephony)

 2009 Truth in Caller ID Act 

Telephone Consumer Protection Act of 1991

Implementation of Do not call registry against use of robocalls, automatic dialers, and other methods of communication

Do-Not-Call Implementation Act of 2003

if a business has an established relationship with a customer, it can continue to call them for up to 18 months. If a consumer calls the company, say, to ask for information about the product or service, the company has three months to get back to him.

if the customer asks to not receive calls, the company must stop calling, or be subject to fines.

Exemptions – Calls from a not-for-profit B organisation , informational messages as flight cancellations , Calls from sales and debt collectors etc

Personal Data Privacy and Security Act 2009

Implemented to curb  identity theft and computer hacking. Sensitive personal identifiable information includes : victim’s name, social security number, home address, fingerprint/biometrics data, date of birth, and bank account numbers.

Any company that is breached must notify the affected individuals by mail, telephone, or email, and the message must include information on the company and how to get in touch with credit reporting agencies

If the breach involves government or national security , company must also contact the Secret Service within fourteen days 

TRACED Act (Telephone Robocall Abuse Criminal Enforcement and Deterrence) 2019

Canadian Radio-television and Telecommunications Commission (CRTC) 2018 -32

A solution mechanism has already been standardised and active in adoption called STIR / SHAKEN ( Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs) described in another article here.

Emergency services 

FCC E911 E911 / VoIP E911 rules

Unlike traditional telephone connections, which are tied to a physical location, VOIP’s packet switched technology allows a particular number to be anywhere making it more difficult for it to reach localised services like emergency numbers of Public Safety Answering Points (PSAPs) . Thus FCC regulations as well as the New and Emerging Technologies 911 Improvement Act of 2008 (NET 911 Act), interconnected VoIP providers are required to provide 911 and E911 service. 

Ref : 



  • Media Capture and streams
  • Peer to peer Connection-
    • RTCPeerConnection, RTCConfiguration, ICE, Offer/Answer, states
  • RTP Media API
    • RTCRtpSender
    • RTCRtpReceiver
    • RTCRtpTransreciver
  • RTCDtlsTransport
  • RTCIceCandidate
    • ICE gathering
  • RTCIceTransport Interface
  • Peer-to-peer Data API
  • Peer-to-peer DTMF
  • Statistics

Peer-to-peer connections

creates p2p communication channel

RTCConfiguration Dictionary

dictionary RTCConfiguration {
  sequence<RTCIceServer> iceServers;
  RTCIceTransportPolicy iceTransportPolicy;
  RTCBundlePolicy bundlePolicy;
  RTCRtcpMuxPolicy rtcpMuxPolicy;
  DOMString peerIdentity;
  sequence<RTCCertificate> certificates;
  [EnforceRange] octet iceCandidatePoolSize = 0;

RTCIceCredentialType Enum

enum RTCIceCredentialType {

supports OAuth 2.0 based authentication. The application, acting as the OAuth Client, is responsible for refreshing the credential information and updating the ICE Agent with fresh new credentials before the accessToken expires. The OAuth Client can use the RTCPeerConnection setConfiguration method to periodically refresh the TURN credentials.

RTCOAuthCredential Dictionary
describe the OAuth auth credential information which is used by the STUN/TURN client (inside the ICE Agent) to authenticate against a STUN/TURN server

dictionary RTCOAuthCredential {
  required DOMString macKey;
  required DOMString accessToken;

RTCIceServer Dictionary

Describe the STUN and TURN servers that can be used by the ICE Agent to establish a connection with a peer.

dictionary RTCIceServer { required (DOMString or sequence<DOMString>) urls; DOMString username; (DOMString or RTCOAuthCredential) credential; RTCIceCredentialType credentialType = "password"; };

Example :

 [{urls: 'stun:stun1.example.net'},
  {urls: ['turns:turn.example.org', 'turn:turn.example.net'],
    username: 'user',
    credential: 'myPassword',
    credentialType: 'password'},
  {urls: 'turns:turn2.example.net',
    username: '22BIjxU93h/IgwEb',
    credential: {
      macKey: 'WmtzanB3ZW9peFhtdm42NzUzNG0=',
      accessToken: 'AAwg3kPHWPfvk9bDFL936wYvkoctMADzQ5VhNDgeMR3+ZlZ35byg972fW8QjpEl7bx91YLBPFsIhsxloWcXPhA=='
    credentialType: 'oauth'}

RTCIceTransportPolicy Enum

ICE candidate policy [JSEP] to select candidates for the ICE connectivity checks

  • relay – use only media relay candidates such as candidates passing through a TURN server. It prevents the remote endpoint/unknown caller from learning the user’s IP addresses
  • all – ICE Agent can use any type of candidate when this value is specified.

RTCBundlePolicy Enum

  • balanced – Gather ICE candidates for each media type (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
  • max-compat – Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
  • max-bundle – Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport.

If the value of configuration.bundlePolicy is set and its value differs from the connection’s bundle policy, throw an InvalidModificationError.

RTCRtcpMuxPolicy Enum

what ICE candidates are gathered to support non-multiplexed RTCP.

  • negotiate – Gather ICE candidates for both RTP and RTCP candidates. If the remote-endpoint is capable of multiplexing RTCP, multiplex RTCP on the RTP candidates. If it is not, use both the RTP and RTCP candidates separately.
  • require – Gather ICE candidates only for RTP and multiplex RTCP on the RTP candidates. If the remote endpoint is not capable of rtcp-mux, session negotiation will fail.

If the value of configuration.rtcpMuxPolicy is set and its value differs from the connection’s rtcpMux policy, throw an InvalidModificationError. If the value is “negotiate” and the user agent does not implement non-muxed RTCP, throw a NotSupportedError.

Offer/Answer Options – VoiceActivityDetection

dictionary RTCOfferAnswerOptions {
  boolean voiceActivityDetection = true;

capable of detecting “silence”

dictionary RTCOfferOptions : RTCOfferAnswerOptions {
  boolean iceRestart = false;
dictionary RTCAnswerOptions : RTCOfferAnswerOptions {};

An RTCPeerConnection object has a signaling state, a connection state, an ICE gathering state, and an ICE connection state.

RTCSignalingState Enum


RTCIceGatheringState Enum


RTCIceConnectionState Enum


An RTCPeerConnection object has an operations chain which ensures that only one asynchronous operation in the chain executes concurrently.

Also an RTCPeerConnection object MUST not be garbage collected as long as any event can cause an event handler to be triggered on the object. When the object’s internal slot is true ie closed, no such event handler can be triggered and it is therefore safe to garbage collect the object.

RTCPeerConnection Interface

interface RTCPeerConnection : EventTarget {

Promise<RTCSessionDescriptionInit> createOffer(optional RTCOfferOptions options);

Promise<RTCSessionDescriptionInit> createAnswer(optional RTCAnswerOptions options);

Promise<void> setLocalDescription(optional RTCSessionDescriptionInit description);
readonly attribute RTCSessionDescription? localDescription;
readonly attribute RTCSessionDescription? currentLocalDescription;
readonly attribute RTCSessionDescription? pendingLocalDescription;
Promise<void> setRemoteDescription(optional RTCSessionDescriptionInit description);
readonly attribute RTCSessionDescription? remoteDescription;
readonly attribute RTCSessionDescription? currentRemoteDescription;
readonly attribute RTCSessionDescription? pendingRemoteDescription;

Promise<void> addIceCandidate(optional RTCIceCandidateInit candidate);
readonly attribute RTCSignalingState signalingState;
readonly attribute RTCIceGatheringState iceGatheringState;
readonly attribute RTCIceConnectionState iceConnectionState;
readonly attribute RTCPeerConnectionState connectionState;
readonly attribute boolean? canTrickleIceCandidates;
void restartIce();
static sequence<RTCIceServer> getDefaultIceServers();

RTCConfiguration getConfiguration();
void setConfiguration(RTCConfiguration configuration);
void close();

attribute EventHandler onnegotiationneeded;
attribute EventHandler onicecandidate;
attribute EventHandler onicecandidateerror;
attribute EventHandler onsignalingstatechange;
attribute EventHandler oniceconnectionstatechange;
attribute EventHandler onicegatheringstatechange;
attribute EventHandler onconnectionstatechange;

CreateOffer() – generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including

  • descriptions of the local MediaStreamTracks attached to this RTCPeerConnection,
  • codec/RTP/RTCP capabilities
  • ICE agent (usernameFragment, password , local candiadtes etc )
  • DTLS connection
var pc = new RTCPeerConnection();
     mandatory: {
        OfferToReceiveAudio: true,
        OfferToReceiveVideo: true
    optional: [{
        VoiceActivityDetection: false
}).then(function(offer) {
	return pc.setLocalDescription(offer);
.then(function() {
// Send the offer to the remote through signaling server

CreateAnswer() – generates an SDPanswer with the supported configuration for the session that is compatible with the parameters in the remote configuration

var pc = new RTCPeerConnection();
  OfferToReceiveAudio: true
  OfferToReceiveVideo: true
.then(function(answer) {
  return pc.setLocalDescription(answer);
.then(function() {
  // Send the answer to the remote through signaling server

Codec preferences of an m= section’s associated transceiver is said to be the value of the RTCRtpTranceiver with the following filtering applied

  • If direction is “sendrecv”, exclude any codecs not included in the intersection of RTCRtpSender.getCapabilities(kind).codecs and RTCRtpReceiver.getCapabilities(kind).codecs.
  • If direction is “sendonly”, exclude any codecs not included in RTCRtpSender.getCapabilities(kind).codecs.
  • If direction is “recvonly”, exclude any codecs not included in RTCRtpReceiver.getCapabilities(kind).codecs.

Legacy Interface Extensions

partial interface RTCPeerConnection {
Promise<void> createOffer(
 RTCSessionDescriptionCallback successCallback,
 RTCPeerConnectionErrorCallback failureCallback,
 optional RTCOfferOptions options);
Promise<void> setLocalDescription(
optional RTCSessionDescriptionInit description,
VoidFunction successCallback,                                  RTCPeerConnectionErrorCallback failureCallback);

Promise<void> createAnswer(
RTCSessionDescriptionCallback successCallback,
RTCPeerConnectionErrorCallback failureCallback);
Promise<void> setRemoteDescription(
optional RTCSessionDescriptionInit description,
VoidFunction successCallback,
                                     RTCPeerConnectionErrorCallback failureCallback);
Promise<void> addIceCandidate(
RTCIceCandidateInit candidate,
VoidFunction successCallback,
RTCPeerConnectionErrorCallback failureCallback);

Session Description Model

enum RTCSdpType {
interface RTCSessionDescription {
  readonly attribute RTCSdpType type;
  readonly attribute DOMString sdp;
  [Default] object toJSON();
dictionary RTCSessionDescriptionInit {
  RTCSdpType type;
  DOMString sdp = "";

Priority and QoS Model which can be



Send and receive MediaStreamTracks over a peer-to-peer connection.
Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.

The actual encoding and transmission of MediaStreamTracks is managed through objects called RTCRtpSenders. Similarly, the reception and decoding of MediaStreamTracks is managed through objects called RTCRtpReceivers. These are associated with one track.

RTCRtpTransceivers are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via the addTrack(), or explicitly when the application uses the addTransceiver(). They are also created when a remote description is applied that includes a new media description.

rtpTransceiver = RTCPeerConnection.addTransceiver(trackOrKind, init);

trackOrKind should be either audio or video othereise a TypeError is thrown

init is optiona . It can contain direction , sendEncodings , streams

RTCPeerConnection Interface

partial interface RTCPeerConnection {
sequence<RTCRtpSender> getSenders();
sequence<RTCRtpReceiver> getReceivers();
sequence<RTCRtpTransceiver> getTransceivers();

RTCRtpSender addTrack(MediaStreamTrack track, MediaStream... streams);
void removeTrack(RTCRtpSender sender);
RTCRtpTransceiver addTransceiver((MediaStreamTrack or DOMString) trackOrKind, optional RTCRtpTransceiverInit init);
attribute EventHandler ontrack;


dictionary RTCRtpTransceiverInit {
  RTCRtpTransceiverDirection direction = "sendrecv";
  sequence<MediaStream> streams = [];
  sequence<RTCRtpEncodingParameters> sendEncodings = [];

RTCRtpTransceiverDirection can be either of


RTCRtpSender Interface

Allows an application to control how a given MediaStreamTrack is encoded and transmitted to a remote peer.

interface RTCRtpSender {
readonly attribute MediaStreamTrack? track;
readonly attribute RTCDtlsTransport? transport;
readonly attribute RTCDtlsTransport? rtcpTransport;
static RTCRtpCapabilities? getCapabilities(DOMString kind);
Promise<void> setParameters(RTCRtpSendParameters parameters);
RTCRtpSendParameters getParameters();
Promise<void> replaceTrack(MediaStreamTrack? withTrack);
void setStreams(MediaStream... streams);
Promise<RTCStatsReport> getStats();

RTCRtpParameters Dictionary

dictionary RTCRtpParameters {
  required sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
  required RTCRtcpParameters rtcp;
  required sequence<RTCRtpCodecParameters> codecs;

RTCRtpSendParameters Dictionary

dictionary RTCRtpSendParameters : RTCRtpParameters {
  required DOMString transactionId;
  required sequence<RTCRtpEncodingParameters> encodings;
  RTCDegradationPreference degradationPreference = "balanced";
  RTCPriorityType priority = "low";

RTCRtpReceiveParameters Dictionary

dictionary RTCRtpReceiveParameters : RTCRtpParameters {
  required sequence<RTCRtpDecodingParameters> encodings;

RTCRtpCodingParameters Dictionary

dictionary RTCRtpCodingParameters {
  DOMString rid;

RTCRtpDecodingParameters Dictionary

dictionary RTCRtpDecodingParameters : RTCRtpCodingParameters {};

RTCRtpEncodingParameters Dictionary

dictionary RTCRtpEncodingParameters : RTCRtpCodingParameters {
  octet codecPayloadType;
  RTCDtxStatus dtx;
  boolean active = true;
  unsigned long ptime;
  unsigned long maxBitrate;
  double maxFramerate;
  double scaleResolutionDownBy;

RTCDtxStatus Enum

disabled- Discontinuous transmission is disabled.
enabled- Discontinuous transmission is enabled if negotiated.

RTCDegradationPreference Enum

enum RTCDegradationPreference {

RTCRtcpParameters Dictionary

dictionary RTCRtcpParameters {
  DOMString cname;
  boolean reducedSize;

RTCRtpHeaderExtensionParameters Dictionary

dictionary RTCRtpHeaderExtensionParameters {
  required DOMString uri;
  required unsigned short id;
  boolean encrypted = false;

RTCRtpCodecParameters Dictionary

dictionary RTCRtpCodecParameters {
  required octet payloadType;
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;

payloadType – identify this codec.
mimeType – codec MIME media type/subtype. Valid media types and subtypes are listed in [IANA-RTP-2]
clockRate – expressed in Hertz
channels – number of channels (mono=1, stereo=2).
sdpFmtpLine – “format specific parameters” field from the “a=fmtp” line in the SDP corresponding to the codec

RTCRtpCapabilities Dictionary

dictionary RTCRtpCapabilities {
  required sequence<RTCRtpCodecCapability> codecs;
  required sequence<RTCRtpHeaderExtensionCapability> headerExtensions;

RTCRtpCodecCapability Dictionary

dictionary RTCRtpCodecCapability {
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;

RTCRtpHeaderExtensionCapability Dictionary

dictionary RTCRtpHeaderExtensionCapability {
  DOMString uri;

Example JS code to RTCRtpCapabilities

const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('audio');
const capabilities = RTCRtpSender.getCapabilities('audio');

Output :

codecs: Array(13)
0: {channels: 2, clockRate: 48000, mimeType: "audio/opus", sdpFmtpLine: "minptime=10;useinbandfec=1"}
1: {channels: 1, clockRate: 16000, mimeType: "audio/ISAC"}
2: {channels: 1, clockRate: 32000, mimeType: "audio/ISAC"}
3: {channels: 1, clockRate: 8000, mimeType: "audio/G722"}
4: {channels: 1, clockRate: 8000, mimeType: "audio/PCMU"}
5: {channels: 1, clockRate: 8000, mimeType: "audio/PCMA"}
6: {channels: 1, clockRate: 32000, mimeType: "audio/CN"}
7: {channels: 1, clockRate: 16000, mimeType: "audio/CN"}
8: {channels: 1, clockRate: 8000, mimeType: "audio/CN"}
9: {channels: 1, clockRate: 48000, mimeType: "audio/telephone-event"}
10: {channels: 1, clockRate: 32000, mimeType: "audio/telephone-event"}
11: {channels: 1, clockRate: 16000, mimeType: "audio/telephone-event"}
12: {channels: 1, clockRate: 8000, mimeType: "audio/telephone-event"}
length: 13

headerExtensions: Array(6)
0: {uri: "urn:ietf:params:rtp-hdrext:ssrc-audio-level"}
1: {uri: "http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time"}
2: {uri: "http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01"}
3: {uri: "urn:ietf:params:rtp-hdrext:sdes:mid"}
4: {uri: "urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id"}
5: {uri: "urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id"}
length: 6

RTCRtpReceiver Interface

allows an application to inspect the receipt of a MediaStreamTrack.

interface RTCRtpReceiver {
  readonly attribute MediaStreamTrack track;
  readonly attribute RTCDtlsTransport? transport;
  readonly attribute RTCDtlsTransport? rtcpTransport;
  static RTCRtpCapabilities? getCapabilities(DOMString kind);
  RTCRtpReceiveParameters getParameters();
  sequence<RTCRtpContributingSource> getContributingSources();
  sequence<RTCRtpSynchronizationSource> getSynchronizationSources();
  Promise<RTCStatsReport> getStats();

dictionary RTCRtpContributingSource

dictionary RTCRtpContributingSource {
  required DOMHighResTimeStamp timestamp;
  required unsigned long source;
  double audioLevel;
  required unsigned long rtpTimestamp;

dictionary RTCRtpSynchronizationSource

dictionary RTCRtpSynchronizationSource : RTCRtpContributingSource {
  boolean voiceActivityFlag;

voiceActivityFlag of type boolean – Only present for audio receivers. Whether the last RTP packet, delivered from this source, contains voice activity (true) or not (false).

RTCRtpTransceiver Interface

Each SDP media section describes one bidirectional SRTP (“Secure Real Time Protocol”) stream. RTCRtpTransceiver describes this permanent pairing of an RTCRtpSender and an RTCRtpReceiver, along with some shared state. It is uniquely identified using its mid property.

Thus it is combination of an RTCRtpSender and an RTCRtpReceiver that share a common mid. An associated transceiver( with mid) is one that’s represented in the last applied session description.

interface RTCRtpTransceiver {

readonly attribute DOMString? mid;
[SameObject] readonly attribute RTCRtpSender sender;
[SameObject] readonly attribute RTCRtpReceiver receiver;

attribute RTCRtpTransceiverDirection direction;
readonly attribute RTCRtpTransceiverDirection? currentDirection;

void stop();

void setCodecPreferences(sequence<RTCRtpCodecCapability> codecs);

Method stop() – Irreversibly marks the transceiver as stopping, unless it is already stopped. This will immediately cause the transceiver’s sender to no longer send, and its receiver to no longer receive.
stopping transceiver will cause future calls to createOffer to generate a zero port in the media description for the corresponding transceiver and stopped transceiver will cause future calls to createOffer or createAnswer to generate a zero port in the media description for the corresponding transceiver

Methods setCodecPreferences() – overrides the default codec preferences used by the user agent.

Example setting codec Preferebec for OPUS in audio

peer = new RTCPeerConnection();    
const transceiver = peer.addTransceiver('audio');
const audiocapabilities = RTCRtpSender.getCapabilities('audio');
let codec = [];

Before setting codec preference for OPUS

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000

After setting codec preference for OPUS audio

m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4
a=rtcp:9 IN IP4

a=msid:hcgvWcGG7WhdzboWk79q39NiO8xkh4ArWhbM f15d77bb-7a6f-4f41-80cd-51a3c40de7b7
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1


Access to information about the Datagram Transport Layer Security (DTLS) transport over which RTP and RTCP packets are sent and received by RTCRtpSender and RTCRtpReceiver objects, as well other data such as SCTP packets sent and received by data channels.
Each RTCDtlsTransport object represents the DTLS transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

interface RTCDtlsTransport : EventTarget {
  [SameObject] readonly attribute RTCIceTransport iceTransport;
  readonly attribute RTCDtlsTransportState state;
  sequence<ArrayBuffer> getRemoteCertificates();
  attribute EventHandler onstatechange;
  attribute EventHandler onerror;

RTCDtlsTransportState Enum

“new”- DTLS has not started negotiating yet.
“connecting” – DTLS is in the process of negotiating a secure connection and verifying the remote fingerprint.
“connected”- DTLS has completed negotiation of a secure connection and verified the remote fingerprint.
“closed” – transport has been closed intentionally like close_notify alert, or calling close().
“failed” – transport has failed as the result of an error like failure to validate the remote fingerprint

RTCDtlsFingerprint dictionary

dictionary RTCDtlsFingerprint {
  DOMString algorithm;
  DOMString value;

Protocols multiplexed with RTP (e.g. data channel) share its component ID. This represents the component-id value 1 when encoded in candidate-attribute while ICE candadte for RTCP has component-id value 2 when encoded in candidate-attribute.


The track event uses the RTCTrackEvent interface.

interface RTCTrackEvent : Event {
  readonly attribute RTCRtpReceiver receiver;
  readonly attribute MediaStreamTrack track;
  [SameObject] readonly attribute FrozenArray<MediaStream> streams;
  readonly attribute RTCRtpTransceiver transceiver;

dictionary RTCTrackEventInit

dictionary RTCTrackEventInit : EventInit {
  required RTCRtpReceiver receiver;
  required MediaStreamTrack track;
  sequence<MediaStream> streams = [];
  required RTCRtpTransceiver transceiver;


This interface candidate Internet Connectivity Establishment (ICE) configuration used to setup RTCPeerconnection. To facilitate routing of media on given peer connection, both endpoints exchange several candidates and then one candidate out of the lot is chosen which will be then used to initiate the connection.

  • candidate – transport address for the candidate that can be used for connectivity checks.
  • component – candidate is an RTP or an RTCP candidate
  • foundation – unique identifier that is the same for any candidates of the same type , helps optimize ICE performance while prioritizing and correlating candidates that appear on multiple RTCIceTransport objects.
  • ip , port
  • priority
  • protocol – tcp/udp
  • relatedAddress , relatedPort
  • sdpMid – candidate’s media stream identification tag
  • sdpMLineIndex

usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).

Interfaces for Connectivity Establishment

describes ICE candidates

interface RTCIceCandidate {
  readonly attribute DOMString candidate;
  readonly attribute DOMString? sdpMid;
  readonly attribute unsigned short? sdpMLineIndex;
  readonly attribute DOMString? foundation;
  readonly attribute RTCIceComponent? component;
  readonly attribute unsigned long? priority;
  readonly attribute DOMString? address;
  readonly attribute RTCIceProtocol? protocol;
  readonly attribute unsigned short? port;
  readonly attribute RTCIceCandidateType? type;
  readonly attribute RTCIceTcpCandidateType? tcpType;
  readonly attribute DOMString? relatedAddress;
  readonly attribute unsigned short? relatedPort;
  readonly attribute DOMString? usernameFragment;
  RTCIceCandidateInit toJSON();

RTCIceProtocol can be either tcp or udp

TCP candidate type which can be either of

  • active – An active TCP candidate is one for which the transport will attempt to open an outbound connection but will not receive incoming connection requests.
  • passive – A passive TCP candidate is one for which the transport will receive incoming connection attempts but not attempt a connection.
  • so – An so candidate is one for which the transport will attempt to open a connection simultaneously with its peer.

UDP candidate type

  • host – actual direct IP address of the remote peer
  • srflx – server reflexive ,  generated by a STUN/TURN server
  • prflx – peer reflexive ,IP address comes from a symmetric NAT between the two peers, usually as an additional candidate during trickle ICE
  • relay – generated using TURN

ICE Candidate UDP Host

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:27784895 1 udp 2122260223 51577 typ host generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:27784895 1 udp 2122260223 51382 typ host generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:27784895 1 udp 2122260223 53600 typ host generation 0 ufrag muSq network-id 1 network-cost 10

ICE Candidate TCP Host

Notice TCP host camdidates for mid 0 , 1 and 3 for video , audio and data media types

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10

ICE Candidate UDP Srflx

Notice 3 candidates for 3 streams sdpMid 0,1 and 2

sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:2163208203 1 udp 1686052607 27177 typ srflx raddr rport 53600 generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:2163208203 1 udp 1686052607 27176 typ srflx raddr rport 51382 generation 0 ufrag muSq network-id 1 network-cost 10
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2163208203 1 udp 1686052607 27175 typ srflx raddr rport 51577 generation 0 ufrag muSq network-id 1 network-cost 10

ICE Candidate (host)

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2880323124 1 udp 2122260223 61622 typ host generation 0 ufrag jsPO network-id 1 network-cost 10

usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).



interface RTCPeerConnectionIceEvent : Event {
  readonly attribute RTCIceCandidate? candidate;
  readonly attribute DOMString? url;


interface RTCPeerConnectionIceErrorEvent : Event {
  readonly attribute DOMString hostCandidate;
  readonly attribute DOMString url;
  readonly attribute unsigned short errorCode;
  readonly attribute USVString errorText;

RTCIceTransport Interface

Access to information about the ICE transport over which packets are sent and received. Each RTCIceTransport object represents the ICE transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

interface RTCIceTransport : EventTarget {
  readonly attribute RTCIceRole role;
  readonly attribute RTCIceComponent component;
  readonly attribute RTCIceTransportState state;
  readonly attribute RTCIceGathererState gatheringState;
  sequence<RTCIceCandidate> getLocalCandidates();
  sequence<RTCIceCandidate> getRemoteCandidates();
  RTCIceCandidatePair? getSelectedCandidatePair();
  RTCIceParameters? getLocalParameters();
  RTCIceParameters? getRemoteParameters();
  attribute EventHandler onstatechange;
  attribute EventHandler ongatheringstatechange;
  attribute EventHandler onselectedcandidatepairchange;

RTCIceParameters Dictionary

dictionary RTCIceParameters {
  DOMString usernameFragment;
  DOMString password;

RTCIceCandidatePair Dictionary

dictionary RTCIceCandidatePair {
  RTCIceCandidate local;
  RTCIceCandidate remote;

RTCIceGathererState Enum


RTCIceTransportState Enum

  • “new” – ICE agent is gathering addresses or is waiting to be given remote candidates 
  • “checking” –
  • “connected” – Found a working candidate pair, but still performing connectivity checks to find a better one.
  • “completed” – Found a working candidate pair and done performing connectivity checks.
  • “disconnected”,
  • “failed”,
  • “closed”

RTCIceRole Enum

“unknown”, // agent who role is not yet defined
“controlling”, // controlling agent
“controlled” // controlled agent

RTCIceComponent Enum

“rtp”, // ICE Transport is used for RTP (or RTCP multiplexing)
“rtcp” // ICE Transport is used for RTCP

Peer-to-peer Data API


Peer-to-peer DTMF


Statistics Model

The browser maintains a set of statistics for monitored objects, in the form of stats objects.
A group of related objects may be referenced by a selector( like MediaStreamTrack that is sent or received by the RTCPeerConnection).

Statistics API extends the RTCPeerConnection interface

partial interface RTCPeerConnection {
  Promise<RTCStatsReport> getStats(optional MediaStreamTrack? selector = null);
  attribute EventHandler onstatsended;

Method getStats()- Gathers stats for the given selector and reports the result asynchronously.

RTCStatsReport Object

map between strings that identify the inspected objects (id attribute in RTCStats instances), and their corresponding RTCStats-derived dictionaries.

interface RTCStatsReport {
  readonly maplike<DOMString, object>;

RTCStats Dictionary

stats object constructed by inspecting a specific monitored object.

dictionary RTCStats {
  required DOMHighResTimeStamp timestamp;
  required RTCStatsType type;
  required DOMString id;


Constructor (DOMString type, RTCStatsEventInit eventInitDict)]

interface RTCStatsEvent : Event {
  readonly attribute RTCStatsReport report;

dictionary RTCStatsEventInit

dictionary RTCStatsEventInit : EventInit {
  required RTCStatsReport report;


  • RTCRTPStreamStats, attributes ssrc, kind, transportId, codecId
  • RTCReceivedRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes packetsReceived, packetsLost, jitter, packetsDiscarded
  • RTCInboundRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes trackId, receiverId, remoteId, framesDecoded, nackCount
  • RTCRemoteInboundRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes localId, bytesReceived, roundTripTime
  • RTCSentRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes packetsSent, bytesSent
  • RTCOutboundRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes trackId, senderId, remoteId, framesEncoded, nackCount
  • RTCRemoteOutboundRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes localId, remoteTimestamp
  • RTCPeerConnectionStats, with attributes dataChannelsOpened, dataChannelsClosed
  • RTCDataChannelStats, with attributes label, protocol, dataChannelIdentifier, state, messagesSent, bytesSent, messagesReceived, bytesReceived
  • RTCMediaStreamStats, with attributes streamIdentifer, trackIds
  • RTCMediaHandlerStats with attributes trackIdentifier, ended
  • RTCSenderVideoTrackAttachmentStats, with all required attributes from its inherited dictionaries
  • RTCSenderAudioTrackAttachmentStats, with all required attributes from its inherited dictionaries
  • RTCAudioHandlerStats with attribute audioLevel
  • RTCVideoHandlerStats with attributes frameWidth, frameHeight, framesPerSecond
  • RTCVideoSenderStats with attribute framesSent
  • RTCVideoReceiverStats with attributes framesReceived, framesDecoded, framesDropped, partialFramesLost
  • RTCCodecStats, with attributes payloadType, codecType, mimeType, clockRate, channels, sdpFmtpLine
  • RTCTransportStats, with attributes bytesSent, bytesReceived, rtcpTransportStatsId, selectedCandidatePairId, localCertificateId, remoteCertificateId
  • RTCIceCandidatePairStats, with attributes transportId, localCandidateId, remoteCandidateId, state, priority, nominated, bytesSent, bytesReceived, totalRoundTripTime, currentRoundTripTime
  • RTCIceCandidateStats, with attributes address, port, protocol, candidateType, url
  • RTCCertificateStats, with attributes fingerprint, fingerprintAlgorithm, base64Certificate, issuerCertificateId

Ref :


To understand the need for implementing an identification verification technique in Internet protocol based network to network communication system , we need to evaluate the existing problem plaguing the VoIP setup .

What is Call ID spoofing ? 

Vulnerability of existing interconnection phone system which is used by robo-callers to mask their identity or to make it appear the call is from a legitimate source, usually originates from voice-over-IP (VOIP) systems.

In this context understand the Caller Line identification CLI/ NCLI techniques used by VoIP and OTT( over the top) providers today.

CLI (Caller Line Identification)

If call goes out on a CLI route ( White Route ) the received party will likely see your callerID information

  • Lawful – Termination is legal on the remote end ie abiding country’s telco infrastructure and stable
  • Expensive – usually with direct or via leased line (TDM) interconnections with the tier-1 carriers.

Non-CLI (Non-Caller Line Identification)

The Caller ID is not visible at the call
If call goes out on a Non-CLI route (Grey Route) goes out on a non-CLI routes they will see either a blocked call or some generic number.

  • Unlawful – questionable legality or maybe violating some providers AUP(Acceptable Use Policy ) on the remote end.
  • Cheaper – low quality , usually via VoIP-GSM gateways

Example include robocalls , tele-marketting / spam etc which are unwilling to share their Caller Id for call receiver, to not be blocked or cancelled.

To overcome the problem of non-verifiable spam , robocalls a suite of protocols and procedures are proposed that can combat caller ID spoofing on VOIP and connected public telephone networks.


Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs

Used by robocallers to mask their identity or to make it appear the call is from a legitimate source
usually orignates from voice-over-IP (VOIP) systems


Standards developed by the Internet Engineering Task Force (IETF) 

For telecommunication service providers implement  certificate management system to create and manage the public and private keys, digital certificates used to sign and verify Caller ID details. 

Adds information to the SIP headers that allow the endpoints along the system to positively identify the origin of the data , such as JSON web tokens encrypted with the provider’s private key, encoded using Base64,

There are three levels of verification, or “attestation”

  • A : Full Attestation
    indicates that the provider recognizes the entire phone number as being registered with the originating subscriber.
  • B : Partial Attestation
    call originated with a known customer but the entire number cannot be verified,
  • C : Gateway Attestation
    call can only be verified as coming from a known gateway

How can the Public Key Infrastructure be used ? 

In an interconnection network , each telephone service provider will obtain its digital certificate from a certificate authority (CA)  that is trusted by other telephone service providers. Calling party signs the SIP Header  caller ID as legitimate . The called party verifies that the calling number is authentic


Originating service provider’s encrypted SIP Identity Header includes the following data:

  1. Attestation level
  2. Date and Time
  3. Calling and Called Numbers
  4. Orig ID for analytics and/or traceback purposes among others
  5. Location of certificate repository
  6. Signature
  7. Encryption algorithm

FCC has also assigned the role of a Secure Telephone Identity Policy Administrator (STI-PA) which oversees that CAs do not provide certificate to spoofing robocallers and enforce the framework for STIR /SHAKEN .

Sample Identity header in SIP requst

INVITE sip:bob@biloxi.example.org SIP/2.0
Via: SIP/2.0/TLS pc33.atlanta.example.com;branch=z9hG4bKnashds8
To: Bob
From: Alice ;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Max-Forwards: 70
Date: Thu, 21 Feb 2002 13:02:03 GMT
Identity-Info: https://atlanta.example.com/atlanta.cer;alg=rsa-sha1
Content-Type: application/sdp
Content-Length: 147

o=UserA 2890844526 2890844526 IN IP4 pc33.atlanta.example.com
s=Session SDP
c=IN IP4 pc33.atlanta.example.com
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000


STIR is based on the SIP protocol and is designed to work with calls being routed through a VOIP network. Since traditional endpoints like POTS and SS7 networks also should be covered under this call authenticity framework , SHAKEN was developed to manage call via IP-to-telephone gateways .

Developed by the Alliance of Telecommunications Industry Solutions (ATIS)

Working Steps  :

  1. When a call is initiated, a SIP INVITE is received by the originating service provider.
  2. Originating service provider verifies the call source and number to determine how to confirm validity.
    1. Full Attestation (A) — The service provider authenticates the calling party AND confirms they are authorized to use this number. An example would be a registered subscriber.
    2. Partial Attestation (B) — The service provider verifies the call origination but cannot confirm that the call source is authorized to use the calling number. An example would be a calling number from behind an enterprise PBX.
    3. Gateway Attestation (C) — The service provider authenticates the call’s origin but cannot verify the source. An example would be a call received from an international gateway.
  3. Create a SIP Identity header that contains information on the calling number, called number, attestation level, and call origination, along with the certificate thus caller ID “signed” as legitimate
  4. SIP INVITE with the SIP Identity header with the certificate is sent to the destination service provider.
  5. Destination service provider verifies the identity of the header and certificate.

Diagrammatic depiction of flow of how Telecom carriers to digitally validates authenticity before receiving or handoff through their network



Video Codecs – H264 , H265 , AV1

Article discusses the popularly adopted current standards for video codecs( compression / decompression) namely MPEG2, H264, H265 and AV1


MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
generic coding of moving pictures and associated audio information
combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

better than MPEG 1

evolved out of the shortcomings of MPEG-1 such as audio compression system limited to two channels (stereo) , No standardized support for interlaced video with poor compression , Only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited for higher resolution video.


  • over-the-air digital television broadcasting and in the DVD-Video standard.
  • TV stations, TV receivers, DVD players, and other equipment
  • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
  • XDCAM – professional file-based video recording format.
  • DVB – Application-specific restrictions on MPEG-2 video in the DVB standard:


Advanced Video Coding (AVC), or H.264 or aka MPEG-4 AVC or ITU-T H.264 / MPEG-4 Part 10 ‘Advanced Video Coding’ (AVC)
introduced in 2004

Better than MPEG2

40-50% bit rate reduction compared to MPEG-2

Support Up to 4K (4,096×2,304) and 59.94 fps
21 profiles ; 17 levels

Compression Model

Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find the ones that are redundant within the subsequent frames ie not changed such as background sections in video. These areas are replaced with a short information, referencing the original pixels(intraframe motion prediction) using mathematical function and direction of motion

Hybrid spatial-temporal prediction model
Flexible partition of Macro Block(MB), sub MB for motion estimation
Intra Prediction (extrapolate already decoded neighbouring pixels for prediction)
Introduced multi-view extension
9 directional modes for intra prediction
Macro Blocks structure with maximum size of 16×16
Entropy coding is CABAC(Context-adaptive binary arithmetic coding) and CAVLC(Context-adaptive variable-length coding )


  • most deployed video compression standard
  • Delivers high definition video images over direct-broadcast satellite-based television services,
  • Digital storage media and Blu-Ray disc formats,
  • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems and DVB
  • Mobile video, media players, video chat


High Efficiency Video Coding (HEVC), or H.265 or MPEG-H HEVC
video compression standard designed to substantially improve coding efficiency
stream high-quality videos in congested network environments or bandwidth constrained mobile networks
Jan 2013
product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

better than H264

overcome shortage of bandwidth, spectrum, storage
bandwidth savings of approx. 45% over H.264 encoded content

resolutions up to 8192×4320, including 8K UHD
Supports up to 300 fps
3 approved profiles, draft for additional 5 ; 13 levels
Whereas macroblocks can span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving it the ability to compress information more efficiently.

multiview encoding – stereoscopic video coding standard for video compression that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It also packs a large amount of inter-view statistical dependencies.

Compression Model

Enhanced Hybrid spatial-temporal prediction model
CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures

Motion Estimation – Intra prediction with more nodes, asymmetric partitions in Inter Prediction)
Individual rectangular regions that divide the image are independent

Paralleling processing computing – decoding process can be split up across multiple parallel process threads, taking advantage multi-core processors.

Wavefront Parallel Processing (WPP)- sort of decision tree that grants a more productive and effectual compression.
33 directional nodes – DC intra prediction , planar prediction. , Adaptive Motion Vector Prediction
Entropy coding is only CABAC


  • cater to growing HD content for multi platform delivery
  • differentiated and premium 4K content

reduced bitrate enables broadcasters and OTT vendors to bundle more channels / content on existing delivery mediums
also provide greater video quality experience at same bitrate

Using ffmpeg for H265 encoding

I took a h264 file (640×480) , duration 30 seconds of size 39,08,744 bytes (3.9 MB on disk) and converted using ffnpeg

After conversion it was a HEVC (Parameter Sets in Bitstream) , MPEG-4 movie – 621 KB only !!! without any loss of clarity.

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4                                              ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers   built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)   configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr   libavutil      56. 22.100 / 56. 22.100   libavcodec     58. 35.100 / 58. 35.100   libavformat    58. 20.100 / 58. 20.100   libavdevice    58.  5.100 / 58.  5.100   libavfilter     7. 40.101 /  7. 40.101   libavresample   4.  0.  0 /  4.  0.  0   libswscale      5.  3.100 /  5.  3.100   libswresample   3.  3.100 /  3.  3.100   libpostproc    55.  3.100 / 55.  3.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':   Metadata:     major_brand     : isom     minor_version   : 1     compatible_brands: isomavc1     creation_time   : 2019-06-23T04:58:13.000000Z   Duration: 00:00:29.84, start: 0.000000, bitrate: 1047 kb/s     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 1046 kb/s, 25 fps, 25 tbr, 25k tbn, 50k tbc (default)     Metadata:       creation_time   : 2019-06-23T04:58:13.000000Z       handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1 Codec AVOption b (set bitrate (in bits/s)) specified for output file #0 (output.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream. Stream mapping:   Stream #0:0 -> #0:0 (h264 (native) -> hevc (libx265)) Press [q] to stop, [?] for help x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9 x265 [info]: build info [Mac OS X][clang 10.0.1][64 bit] 8bit+10bit+12bit x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 x265 [info]: Main profile, Level-3 (Main tier) x265 [info]: Thread pool created using 4 threads x265 [info]: Slices                              : 1 x265 [info]: frame threads / pool features       : 2 / wpp(8 rows) x265 [warning]: Source height < 720p; disabling lookahead-slices x265 [info]: Coding QT: max CU size, min CU size : 64 / 8 x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3 x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00 x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2 x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0 x265 [info]: References / ref-limit  cu / depth  : 3 / off / on x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1 x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60 x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra x265 [info]: tools: strong-intra-smoothing deblock sao Output #0, mp4, to 'output.mp4':   Metadata:     major_brand     : isom     minor_version   : 1     compatible_brands: isomavc1     encoder         : Lavf58.20.100     Stream #0:0(und): Video: hevc (libx265) (hev1 / 0x31766568), yuv420p, 640x480, q=2-31, 25 fps, 12800 tbn, 25 tbc (default)     Metadata:       creation_time   : 2019-06-23T04:58:13.000000Z       handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1       encoder         : Lavc58.35.100 libx265 frame=  746 fps= 64 q=-0.0 Lsize=     606kB time=00:00:29.72 bitrate= 167.2kbits/s speed=2.56x     video:594kB audio:0kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 2.018159% x265 [info]: frame I:      3, Avg QP:27.18  kb/s: 1884.53  x265 [info]: frame P:    179, Avg QP:27.32  kb/s: 523.32   x265 [info]: frame B:    564, Avg QP:35.17  kb/s: 38.69    x265 [info]: Weighted P-Frames: Y:5.6% UV:5.0% x265 [info]: consecutive B-frames: 1.6% 3.8% 9.3% 53.3% 31.9%  encoded 746 frames in 11.60s (64.31 fps), 162.40 kb/s, Avg QP:33.25

if you get error like

Unknown encoder 'libx265'

then reinstall ffmpeg with h265 support


Realtime High quality video encoder
product of product of the Alliance for Open Media (AOM)
Contained by Matroska , WebM , ISOBMFF , RTP (WebRTC)

better than H265

AV1 is royalty free and overcomes the patent complexities around H265/HVEC


  • Video transmission over internet , voip , multi conference
  • Virtual / Augmented reality
  • self driving cars streaming
  • intended for use in HTML5 web video and WebRTC together with the Opus audio format

Audio and Acoustic Signal Processing

Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions and Audio Signal Processing focuses on the computational methods for intentionally altering auditory signals or sounds, in order to achieve a particular goal.

Application of audio Signal processing in general

  • storage
  • data compression
  • music information retrieval
  • speech processing ( emotion recognition/sentiment analysis , NLP)
  • localization
  • acoustic detection
  • Transmission / Broadcasting – enhance their fidelity or optimize for bandwidth or latency.
  • noise cancellation
  • acoustic fingerprinting
  • sound recognition ( speaker Identification , biometric speech verification , voice commands )
  • synthesis – electronic generation of audio signals. Speech synthesisers can generate human like speech.
  • enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.)

Effects for audio streams processing

  • delay or echo
    To simulate reverberation effect, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be of order 35 milliseconds or above.
    Implemented using tape delays or bucket-brigade devices.
  • flanger
    delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms).
    signal would fall out-of-phase with its partner, producing a phasing comb filter effect and then speed up until it was back in phase with the master
  • phaser
    signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
  • chorus
    delayed version of the signal is added to the original signal. above 5 ms to be audible. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
  • equalization
    frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters.
    overdrive effects such as the use of a fuzz box can be used to produce distorted sounds, such as for imitating robotic voices or to simulate distorted radiotelephone traffic
  • pitch shift
    shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
  • time stretching
    changing the speed of an audio signal without affecting its pitch.
  • resonators
    emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
  • modulation
    change the frequency or amplitude of a carrier signal in relation to a predefined signal.
  • compression
    reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
  • 3D audio effects
    place sounds outside the stereo basis
  • reverse echo
    swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
  • wave field synthesis
    spatial audio rendering technique for the creation of virtual acoustic environments

ASP application in Telephony and mobile phones, by ITU (International Telegraph Union)

  • Acoustic echo control
    aims to eliminate the acoustic feedback, which is particularly problematic in the speakerphone use-case during bidirectional voice
  • Noise control
    microphone doesn’t only pick up the desired speech signal, but often also unwanted background noise. Noise control tries to minimize those unwanted signals . Multi-microphone AASP, has enabled the suppression of directional interferers.
  • Gain control
    how loud a speech signal should be when leaving a telephony transmitter as well as when it is being played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively during operation in real-time.
  • Linear filtering
    ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
  • Speech coding: from analog POTS based call to G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder is a big leap in terms of call capacity. other speech coders with varying tradeoffs between compression ratio, speech quality, and computational complexity have been also made available. AASP provides higher quality wideband speech (approximately 150 Hz to 7 kHz).

ASP applications in music playback

AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs, such as listening to music, watching videos, and gaming

  • Post-processing
    techniques as equalization and filtering allow the user to adjust the timbre of the audio such as bass boost and parametric equalization. Other techniques like adding reverberation, pitch shift, time stretching etc
  • Audio (de)coding: audio contianers like mp3 and AAC define how music is distributed, stored, and consumed also in Online music streaming services

ASP for virtual assitants

Virtual Assistance include a variety of servies from Apple’s Siri, Microsoft’s Cortana , Google’s Now , Alexa etc. ASP is used in

  • Speech enhancement
    multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
  • Speech recognition (speech-to-text): this draws ideas from multiple disciplinary fields including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major contribution to recognition accuracy improvement in speech recognition by AASP.
  • Speech synthesis (text-to-speech): this technology has come a very long way from its very robotic sounding introduction in the 1930s to making synthesized speech sound more and more natural.

Other areas of ASP

  • Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).

Ref :
wikipedia – https://en.wikipedia.org/wiki/Audio_signal_processing
IEEE – https://signalprocessingsociety.org/publications-resources/blog/audio-and-acoustic-signal-processing%E2%80%99s-major-impact-smartphones

JavaScript Session Establishment Protocol (JSEP) in WebRTC handshake

This article is aimed at explaning the intricacies and detailed offer answer flow in webrtc handshake and JSEP . You can read the following artciles on WebRTC as prereq before reading through this one

WebRTC API – Peerconnection , getUserMedia , Datachannel , DataStaats

JSEP (JavaScript Session Establishment Protocol)

JSEP (JavaScript Session Establishment Protocol) is used during signalling via w3c RTCPeerConnectionAPI interface to set up a multimedia session. The multimedia session description specfies the crtical components of setting up a session between local and remote such as transport ports , protcol , profiles . It also handles the intercation with ICE state machine

Offer/Answer Excahange Flow

prereq : Setup Client side for the caller
PeerConnectionFactory to generate PeerConnections
PeerConnection for every connection to remote peer
MediaStream audio and video from client device

  1. Side initiating the session creates a offer by CreateOffer() API
aPromise = myPeerConnection.createOffer([options]);

options is type of RTC Offer Options

  • iceRestart
  • offerToReceiveAudio ( legacy)
  • offerToReceiveVideo ( legacy)
  • voiceActivityDetection

2. The application then stores the offer in local config as setLocalDescriptionAPI()

 myPeerConnection.createOffer().then(function(offer) {
    return myPeerConnection.setLocalDescription(offer);

3. Offer is sent to remote side using its choice of signalling ( SIP , WS , HTTP, XMPP .. )

4. Remote party stores it use setRemoteDescription() API

.then(function () {
  return createMyStream();

4. Remote part generates an answer using createAnswer() API

aPromise = RTCPeerConnection.createAnswer([options]);

5. Remote party stores the answer in its local config using setLocalDescription() API

6. Answer is transferred to Initiator side using choice of signalling ( SIP , WS , HTTP, XMPP .. ) again

7. Initiating side stores it use setRemoteDescription() API

Interfaces of webrtc and tracks to stream addition

Process to perform webrtc handshake

Webrtc call setup and incoming call callflow between remote peer , peerconnection actory , peerconnection and application

setup a call
receive a call

Signalling state Transitions on PeerConnection

As the caller initiates a new RTCPeerConnection() , the RTCSignalingState state is “stable” as remote and local descriptions are empty

As the caller initiates call and calls createOffer() , he now has offer SDP and procced to store offer locally with setLocalDescription(offer) the RTCSignalingState state is “have-local-offer” . After than caller send the offer to callee over signalling channel

Simillarily as the calle recives the offer , it starts with RTCSignalingState stable and then proceeds to store the Remote’s offer using setRemoteDescription(offer) , its state is now “have-remote-offer”

The callee generates a provsional answer and for caller and stores it locally , state transitiosn to “have-local-pranswer” . the pranswer SDP is send to caller over signalling channel again .

Caller stores the callee’s pr answer SDP and state updates to “have-remote-pranswer”

Once there is no offer/answer exchange in progress the state again changes to ” stable “.

State schanges to “closed” if RTCpeerConnection is closed

img : https://w3c.github.io/webrtc-pc

Detailed Offer / Answer SDP

Local Offer created by side initiating the session / Caller

The first offfer called initial offer can have dummy date for contact line such as to prevent leaking a local Ip address

c=IN IP4
a=rtcp:9 IN IP4

“o=” line contains <username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>

o=- 4445251981417004127 2 IN IP4 127.0.0.

shows username – and 4445251981417004127 as session id. Same username “-” is specified in “s=” line

“t=” line shows <start time> <stop time>

t=0 0

Full session Block example

type: offer, sdp: v=0
o=- 4445251981417004127 2 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2

Media Section : An m= section is generated for each RtpTransceiver that has been added to the PeerConnection. For the initial offer since no ports are available yet , dummy port 9 can be sadded. However if it is bundle only then port value is set to 0. Later the port value will be set to the port value of default ICE candidate.

DTLS filed “UDP/TLS/RTP/SAVPF” is followed by the list of codecs in order of priority.

“c=” line in msection too must be filled with dummy values if IP as no candidates are available yet .




“a=ice-ufrag” , “a=ice-pwd” , “a=fingerprint” , “a=setup” , “a=tls-id”

Media Stream Identification attribute “a-mid:”

For each media format on the m= line, “a=rtpmap” for “rtx” with the clock rate of codec and “a=fmtp” to reference the payload type of the primary codec.  “a=rtcp-fb” specified RTCP feedback

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1

Audio Block exmaple

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3968544080 cname:da0nYe1oYR8AvVNp
a=ssrc:3968544080 msid:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2 7525d75c-ffe7-4038-8b71-653d249e63bb
a=ssrc:3968544080 mslabel:DYVK4IA4kA8LvnIYWjXhRzMgSGicnwVutWE2
a=ssrc:3968544080 label:7525d75c-ffe7-4038-8b71-653d249e63bb

// remove video section for simplicity

Data Block is created if data channle has been created with m= section for data.

“a=sctp-port” line referencing the SCTP port number set to 5000

 “a=max-message-size”  set to 262144 here

Data Block example

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 1D:C8:1F:18:D2:AB:B7:68:CC:DC:A8:8D:6B:1D:70:11:06:E9:19:D2:22:CE:A5:F3:BE:82:00:ED:99:58:20:4A

Subsequent Offers

When createOffer is called a second (or later) time, or is called after a local description has already been installed, the processig is different due to gathered ICE candidates . However the <session-version> is not changed .

Additionally m section is updated if RtpTransceiver is added or removed

Each “m=” and c=” line MUST be filled in with the port, relevant RTP profile, and address of the default candidate for the m= section

If the m= section is not bundled into another m= section, update the “a=rtcp” with port and address of RTCP camdidate and add “a=camdidate” with  “a=end-of-candidates” 

Local Answer created by side receiving the session/ Callee

When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer. 

 Each offered m= section will have an associated RtpTransceiver

Remote Destination / Callee can reject the m section by setting port in m line to 0 . It can reject msection if neither of the offered media format are supported , RtpTransceiver is stoopped etc.

For the initial offer the dummy port value of 9 is set as no ICE candudate is avaible yet . Simillarly  “c=” line must contain the “dummy” value “IN IP4” too.

The <proto> field MUST be set to exactly match the <proto> field for the corresponding m= line in the offer.

type: answer, sdp: v=0
o=- 5730481682283561642 3 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT

Audio section

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=msid:KGmQ9mTmvTaWlHTQ0B0YP36QIxOYNeB3i2nT e817fe0f-1cc0-4901-9fd9-e810289cc85d
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3260997313 cname:FxLUKuXrLQe0r1rn

Video section removed for simplicity

Data stream

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 B9:9C:8A:A9:E9:09:0C:FB:52:2A:D3:18:7B:A9:D4:EC:B3:00:77:72:27:51:EC:5F:82:BE:11:7F:C7:CF:43:43

Subsequent Answers

 Port value would normally be set to the port of the default ICE candidate for this m= section. For the exmaple above

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

will be changes with relevant port adress such as

type: offer, sdp: v=0
o=- 6407282338169184323 3 IN IP4
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS bSrCUCFybGovIy0FUhPTZAr9ToRmx8I09nEj
m=audio 55375 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 55375 typ host generation 0 network-id 1 network-c

Simillarly m video and data line will also get ports

m=video 53877 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 53877 typ host generation 0 network-id 1 network-cost 10
m=application 57991 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=candidate:2880323124 1 udp 2122260223 57991 typ host generation 0 network-id 1 network-cost 10

If the answer contains any “a=ice-options” attributes where “trickle” is listed as an attribute, update the PeerConnection canTrickle property to be true. 

Modifying Offer/answer SDP

SDP returned from createOffer or createAnswer MUST NOT be changed before passing it to setLocalDescription.
After calling setLocalDescription with an offer or answer, the application MAY modify the SDP to reduce its capabilities before sending it to the far side

Assume we have a MCU at location and want the video stream to relay via a Media Server.

SDP Parsing

SDP is used for session parsing and contians sequence of line with key value pairs. SDP is read, line-by-line, and converted to a data structure that contains the deserialized information.

JSEP SDP bears a lot of simillarity to SIP SDP explained here : SIP and SDP Messages Explained

Session-Level Parsing

Line “v=” , “o=”,”b=” and “a=” are processed . The “i=”, “u=”, “e=”, “p=”, “t=”, “r=”, “z=”, and “k=” lines are not used by this specification; they MUST be checked for syntax but their values are not used. Line “c=” is checked for syntax and ICE mismatch detection

“a= ” attribute could be : “a=group” , “s=”ice-lite” , “a=ice-pwd”, “a=ice-options” , “a=fingerprint”, “a=setup” , a=tls-id”, “a=identity” , “a=extmap”

Media Section Parsing

Line “m=” for media , proto , port , fmt in RTP

Attributes “a=” can be

“a=rtpmap” or “a=fmtp”

map from an RTP payload type number to a media encoding name that identifies the payload format.

a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]
m=audio 49230 RTP/AVP 96 97 98
a=rtpmap:96 L8/8000
a=rtpmap:97 L16/8000
a=rtpmap:98 L16/11025/2

“a=ptime” , “a=maxptime”

dierction as  “a=sendrecv” , a=recvonly , a=sendonly , a=inactive

Muxing as “a=rtcp-mux” ,


RTCP attributes “a=rtcp” , “a=rtcp-rsize”

Line “c=” is checked .

Line “b=” for bandiwtdh , bwtype

Attribites for “a=” could be “a=ice-ufrag”, “a=”ice-pwd”, “a=ice-options” , “a=candidate”, “a=remote-candidate” , a=end-of-candidates” and “a=fingerprint”

Semantics Verification

Interactive Connectivity Establishment (ICE) for NAT traversal

Protocols using offer/answer are difficult to operate through Network Address Translators (NATs) since flow of media packets require IP addresses and ports of media sources and sinks within their messages. Also realtime media emphasises on reduced latency and decreased packet loss .

An extension to the offer/answer model, and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks.
Checks done by STUN and TURN
also allows for address selection for multi-homed and dual-stack hosts

ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.

ICE Gathering

Caller and callee performs checks to finalize the protocol and routing needed to establish a peer connection . Number of candudates are proposed till they mutually agree upon one . Peerconnection then uses that candiadte detaisl to initiate the connection .

While Applying a Local Description at the media engine level if m= section is new, WebRTC media stacks begins gathering candidates for it.

RTCPeerconnection specified canTrickleIceCandidates . ICE trickling is the process of continuing to send candidates after the initial offer or answer has already been sent to the other peer.

ICE TransportRole is responsible for Choosing a candidate pair

ICE layer sets one peer as controlling and other as controlled agent . The controling agent makes the final decision as to which candidate pair to choose.

Final selected canduadte in SDP

a=group:BUNDLE 0 1 2
a=msid-semantic: WMS 9Cv3eIelHVuhxrGfxSvUsfokNu4eb4R9PYw2

m=audio 59937 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 x.x.x.x
a=rtcp:9 IN IP4
a=candidate:2880323124 1 udp 2122260223 x.x.x.x 59937 typ host generation 0 network-id 1 network-cost 10
a=candidate:3844981444 1 tcp 1518280447 x.x.x.x 9 typ host tcptype active generation 0 network-id 1 network-cost 10

An agent identifies all CANDIDATE whic is a transport address. Types:

  • HOST CANDIDATE – directly from a local interface which could be Wifi, Virtual Private Network (VPN) or Mobile IP (MIP)
    if an agent is multihomed ( private and public networks) , it obtains a candidate from each IP address and includes all candidates in its offer.
  • STUN or TURN to obtain additional candidates. Types
    • translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
    • addresses on TURN servers (RELAYED CANDIDATES)

Mapping Server Reflexive address

Agent sends the TURN Allocate request from IP address and port X:x,
NAT will create a binding X1′:x1′, mapping this server reflexive candidate to the host candidate X:x ( BASE).
Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate.
Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent.

Allocate Request and response fom TURN – Informing the agent of this relayed candidate

Only STUN based Binding

agent sends a STUN Binding request to its STUN server which will get server reflexive candidate and send back Binding response.

STUN Binding request for connectivity checks on CANDIDATE PAIRS

The candidates are carried in attributes in the SDP offer . The remote peer also follows this process and gather and send lits own sorted list of candidates. Hence CANDIDATE PAIRS from both sides are formed.

PEER REFLEXIVE CANDIDATES – connectivity checks can produce aditional candidates espceialy around symmetric NAT

Since the same address is used for STUN. and media ( RTP/RTCP) Demultiplexing based on packet contents helps to identify which one is which.

Checks : ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first.

TRIGGERED CHECKS – accelerates the process of finding a valid candidate
ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.

Checks ensure maintaining frozen candidates and pairs with some foundation for media stream

Each candidate pair in the check list has a foundation and a state. States for candidates pairs

1.Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair onthe check list.

2. In-Progress: A check has been sent for this pair, but the transaction is in progress.

3. Succeeded: A check for this pair was already done and produced a successful result.

4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.

5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.

Example of ICE gather state

icegatheringstatechange – gathering

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 58122 typ host generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (srflx)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:4081163164 1 udp 1686052607 37542 typ srflx raddr rport 58122 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:345893049 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2130406062 1 udp 41886207 27190 typ relay raddr rport 37542 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3052096874 1 udp 25108479 28049 typ relay raddr rport 37543 generation 0 ufrag vzpn network-id 1 network-cost 10

icegatheringstatechange – complete

Examaple Candidate Checking

iceconnectionstatechange : checking

setRemoteDescription L type: answer, sdp: v=0

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4

m=video 9 UDP/TLS/RTP/SAVPF 98 100 96 97 99 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4

addIceCandidate (host)
sdpMid: , sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 56060 typ host generation 0 ufrag ydvf network-id 1 network-cost 10

iceconnectionstatechange : connected

Candidate Nomination for Media Path

selecting low-latency media paths can use various techniques such as actual round-trip time (RTT) measurement
controlling agent gets to nominate which candidate pairs will get used for media amongst the ones that are valid. Ways
regular nomination and aggressive nomination


To read More on WebRTC Communication as a platform

WebRTC Media Stack

WebRTC service’s

References :

http://w3c.github.io/webrtc-pc/ WebRTC 1.0: Real-time Communication Between Browsers – W3C Editor’s Draft 31 August 2019
RFC 5245 Inter

Websockets as VOIP signal transport medium

Web resources are usually build on request/response paradigm such as HTTP , SIP messages . This means that server responds only when a client requests it to. This made web intercations very slow and unsuited for VOIP signalling
Long Poll involved repeated polling checks to load new server resources by itself instead of client made explicit request
AJAX and multipart XHR tried to patch the problem by selective reloading however they still required that client perform the mapping for an incomig reply to map to correct request.
However due to overhead latency involved with HTTP transaction and its working mode to open new TCP connetion for every request and reponse and add HTTP headers, none of them were suited to realtime operations

Websocket is the current (2017) most idelistic solution to perform realtime sigalling suited to VOIP requirnments due to its nature os establish a socket .

Websocket Protocol

Enables two-way communication between a client running untrusted code in a controlled environment to a remote host that has opted-in to communications from that code.

protocol consists of an opening handshake followed by basic message framing, layered over TCP.
handshake is interpreted by HTTP servers as an Upgrade request.

Secure websocket example :

Request URL: wss://site.com:8084/socket.io/?transport=websocket&sid=hh3Dib_aBWgqyO1IAAEL
Request Method: GET
Status Code: 101 Switching Protocols

Response Headers
Connection: Upgrade
Sec-WebSocket-Accept: UVhTdFOWfywGyQTKDRZyGuhkfls=
Sec-WebSocket-Extensions: permessage-deflate
Upgrade: websocket

Request Headers
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: no-cache
Connection: Upgrade
Host: site.com:8085
Origin: https://site.com:8084
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: 06FNaHge8GLGVuPFxV2fAQ==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36

Query String parameters
transport: websocket
sid: hh3Dib_aBWgqyO1IAAEL

Working with websockets

A new websocket can be opned with ws or wss and it can have sub protocols like in example .

var wsconnection = new WebSocket('wss://voipsever.com', ['soap', 'xmpp']);

It can be attached with event handlers

wsconnection.onopen = function () {
wsconnection.onerror = function (error) {
  console.log('WebSocket Error ' + error);
wsconnection.onmessage = function (e) {
  console.log('message received : ' + e.data);

Send Data on websocket

message string


Blob or ArrayBuffer object to send binary data
Ex : Sending canvas ImageData as ArrayBuffer

var img = canvas_context.getImageData(0, 0, 400, 320);
var binary = new Uint8Array(img.data.length);
for (var i = 0; i < img.data.length; i++) {
  binary[i] = img.data[i];

Ex : sending file as Blob

var file = document.querySelector('input[type="file"]').files[0];

Closing the connection

if (socket.readyState === WebSocket.OPEN) {

Registry for Close codes for WS
1000 Normal Closure [IESG_HYBI] [RFC6455]
1001 Going Away [IESG_HYBI] [RFC6455]
1002 Protocol error [IESG_HYBI] [RFC6455]
1003 Unsupported Data [IESG_HYBI] [RFC6455]
1004 Reserved [IESG_HYBI] [RFC6455]
1005 No Status Rcvd [IESG_HYBI] [RFC6455]
1006 Abnormal Closure [IESG_HYBI] [RFC6455]
1007 Invalid frame payload data [IESG_HYBI] [RFC6455]
1008 Policy Violation [IESG_HYBI] [RFC6455]
1009 Message Too Big [IESG_HYBI] [RFC6455]
1010 Mandatory Ext. [IESG_HYBI] [RFC6455]
1011 Internal Error [IESG_HYBI] [RFC6455][RFC Errata 3227]
1012 Service Restart [Alexey_Melnikov] [http://www.ietf.org/mail-archive/web/hybi/current/msg09670.html]
1013 Try Again Later [Alexey_Melnikov] [http://www.ietf.org/mail-archive/web/hybi/current/msg09670.html]
1014 The server was acting as a gateway or proxy and received an invalid response from the upstream server. This is similar to 502 HTTP Status Code. [Alexey_Melnikov] [https://www.ietf.org/mail-archive/web/hybi/current/msg10748.html]
1015 TLS handshake [IESG_HYBI] [RFC6455]
1016-3999 Unassigned
4000-4999 Reserved for Private Use [RFC6455]

WebSocket Subprotocol Name Registry

  • MBWS.huawei.com MBWS
  • MBLWS.huawei.com MBLWS
  • soap soap
  • wamp WAMP (“The WebSocket Application Messaging Protocol”)
  • v10.stomp Name: STOMP 1.0 specification
  • v11.stomp Name: STOMP 1.1 specification
  • v12.stomp Name: STOMP 1.2 specification
  • ocpp1.2 OCPP 1.2 open charge alliance
  • ocpp1.5 OCPP 1.5 open charge alliance
  • ocpp1.6 OCPP 1.6 open charge alliance
  • ocpp2.0 OCPP 2.0 open charge alliance
  • ocpp2.0.1 OCPP 2.0.1
  • rfb RFB [RFC6143]
  • sip WebSocket Transport for SIP (Session Initiation Protocol) [RFC7118]
  • notificationchannel-netapi-rest.openmobilealliance.org OMA RESTful Network API for Notification Channel
  • wpcp Web Process Control Protocol (WPCP)
  • amqp Advanced Message Queuing Protocol (AMQP) 1.0+
  • mqtt mqtt [MQTT Version 5.0]
  • jsflow jsFlow pubsub/queue protocol
  • rwpcp Reverse Web Process Control Protocol (RWPCP)
  • xmpp WebSocket Transport for the Extensible Messaging and Presence Protocol (XMPP) [RFC7395]
  • ship SHIP – Smart Home IP SHIP (Smart Home IP) is a an IP based approach to plug and play home automation and smart energy / energy efficiency, which can easily be extended to additional domains such as Ambient Assisted Living (AAL). SHIP can be used solely on the customer premises or can be integrated into a cloud based solution.
  • mielecloudconnect Miele Cloud Connect Protocol This protocol is used to securely connect household or professional appliances to an internet service portal via a public communication network in order to enable remote services.
  • v10.pcp.sap.com Push Channel Protocol
  • msrp WebSocket Transport for MSRP (Message Session Relay Protocol) [RFC7977]
  • v1.saltyrtc.org
  • TLCP-2.0.0.lightstreamer.com TLCP (Text Lightstreamer Client Protocol)
  • bfcp WebSocket Transport for BFCP (Binary Floor Control Protocol)
  • sldp.softvelum.com Softvelum Low Delay Protocol SLDP is a low latency live streaming protocol for delivering media from servers to MSE-based browsers and WebSocket-enabled applications.
  • opcua+uacp OPC UA Connection Protocol
  • opcua+uajson OPC UA JSON Encoding
  • v1.swindon-lattice+json Swindon Web Server Protocol (JSON encoding)
  • v1.usp USP (Broadband Forum User Services Platform)
  • mles-websocket mles-websocket
  • coap Constrained Application Protocol (CoAP) [RFC8323]
  • TLCP-2.1.0.lightstreamer.com TLCP (Text Lightstreamer Client Protocol)
  • sqlnet.oracle.com sqlnet This protocol is used for communication between Oracle database client and database server, and its usage as subprotocol of websocket is primarly geared towards cloud deployments. sqlnet supports bi-directional data transfer and is full duplex in nature.
  • oneM2M.R2.0.json oneM2M R2.0 JSON
  • oneM2M.R2.0.xml oneM2M R2.0 XML
  • oneM2M.R2.0.cbor oneM2M R2.0 CBOR
  • transit Transit
  • 2016.serverpush.dash.mpeg.org MPEG-DASH-ServerPush-23009-6-2017
  • 2018.mmt.mpeg.org MPEG-MMT-23008-1-2018
  • webrtc.softvelum.com Softvelum WebSocket signaling protocol WebRTC live streaming requires WebSocket-based signaling protocol for every specific implementation. Softvelum products will use this subprotocol for signaling

websocket libraries

C++: libwebsockets
Erlang: Shirasu.ws
Java: Jetty
Node.JS: ws
PHP: Ratchet, phpws

Ref :
RFC 6455 – The websocket protocol
Websocket Protocol Registeries : http://www.iana.org/assignments/websocket/websocket.xml
IANA websocket -https://www.iana.org/assignments/websocket/websocket.xhtml

Wifi 6

Wi‑Fi is a trademark of the Wi-Fi Alliance
family of radio technologies commonly used for wireless local area networking (WLAN) of devices.

Current and older Wifi standards

standards operate on varying frequencies, deliver different bandwidth, and support different numbers of channels.

802.11a – transmits at 5 GHz frequency band of the radio spectrum with 54 megabits of data per second.
uses orthogonal frequency-division multiplexing (OFDM) which splits that radio signal into several sub-signals before they reach a receiver to reduces interference.

802.11b – transmits at 2.4 GHz with speed of 11 megabits of data per second,
uses complementary code keying (CCK) modulation to improve speeds.

802.11g – transmits at 2.4 GHz but faster upto 54 megabits of data per second.
uses OFDM coding

802.11n – speeds 140 megabits per second
backward compatible with a, b and g.
can transmit up to four streams of data, each at a maximum of 150 megabits per second, but most routers only allow for two or three streams.

802.11ac – n on the 2.4 GHz band and ac on the 5 GHz band.
backward compatible with 802.11n and thus others
450 megabits per second on a single stream
allows for transmission on multiple spatial streams upo 8
called 5G WiFi because of its frequency band
Very High Throughput (VHT)

Wifi 6

Wi-Fi CERTIFIED 6 networks enable lower battery consumption in devices, making it a solid choice for any environment, including smart home and Internet of Things (IoT) uses.

Wifi Compoents

  • wireless access point (AP) allows wireless devices to connect to the wireless network.
    takes the bandwidth coming from a router and stretches it so that many devices can go on the network from farther distances away.
    give useful data about the devices on the network, provide proactive security, and serve many other practical purposes.
  • Wireless routers are hardware devices that Internet service providers use to connect you to their cable or xDSL Internet network.
    combines the networking functions of a wireless access point and a router.
  • mobile hotspot – feature on smartphones with both tethered and untethered connections
    share your wireless network connection with other devices

Wifi performance

Wi-Fi operational range depends on factors such as the frequency band, radio power output, receiver sensitivity, antenna gain and antenna type as well as the modulation techniquea and propagation charestristics of the signal

Transmitter power
Compared to cell phones and similar technology, Wi-Fi transmitters are low power devices. In general, the maximum amount of power that a Wi-Fi device can transmit is limited by local regulations, such as FCC Part 15 in the US. Equivalent isotropically radiated power (EIRP) in the European Union is limited to 20 dBm (100 mW).

An access point compliant with either 802.11b or 802.11g, using the stock omnidirectional antenna might have a range of 100 m


  • Wired Equivalent Privacy WEP
    client connects to a WEP-protected network, the WEP key is added to some data to create an “initialization vector”/ IV
  • WiFi Protected Access version 2 (WPA2)
    successor to WEP and WPA
    uses either TKIP or Advanced Encryption Standard (AES) encryption
  • WiFi Protected Setup (WPS)
    ties a hard-coded PIN to the router for setup is vulnerabile for exploitation by hackers
  • WPA3™
    Use the latest security methods , higher grader security protocls
    Disallow outdated legacy protocols
    Require use of Protected Management Frames (PMF)
    increased protections from password guessing attempts
    better password protection through Simultaneous Authentication of Equals (SAE), which replaces Pre-shared Key (PSK) in WPA2-Personal.
  • WPA3-Enterprise
    192-bit minimum-strength security protocols and cryptographic tools
    Authenticated encryption: 256-bit Galois/Counter Mode Protocol (GCMP-256)
    Key derivation and confirmation: 384-bit Hashed Message Authentication Mode (HMAC) with Secure Hash Algorithm (HMAC-SHA384)
    Key establishment and authentication: Elliptic Curve Diffie-Hellman (ECDH) exchange and Elliptic Curve Digital Signature Algorithm (ECDSA) using a 384-bit elliptic curve
    Robust management frame protection: 256-bit Broadcast/Multicast Integrity Protocol Galois Message Authentication Code (BIP-GMAC-256)

Ref :

Long Term Evolution (LTE), VOLTE and VOWifi

LTE stands for Long Term Evolution and is a registered trademark owned by ETSI (European Telecommunications Standards Institute) for the wireless data communications technology and a development of the GSM/UMTS standards.

  • Both radio and core network evolution
  • All-IP packet-switched architecture
  • standardised by 3GPP
  • lower CAPEX ans OPEX involved

LTE evolved from an earlier 3GPP system known as the Universal Mobile Telecommunication System (UMTS), which in turn evolved from the Global System for Mobile Communications (GSM). Also it is aligned with 4G (fourth-generation mobile)

LTE is backward compatible with GSM/EDGE/UMTS/CDMA/WCDMA systems on existing 2G and 3G spectrum , even hand-over and roaming to existing mobile networks.

Motivation for evolution

Wireless/cellular technology standards are constantly evolving for better efficiency and performance.LTE evolved as a result of rapid increase of mobile data usage. Applications such as voice over IP (VOIP), streaming multimedia, videoconferencing , cellular modemetc.

It provides packet-switched traffic with seamless mobility and higher qos than predecessors. Also high data rate, throughput, low latency and packet optimized radioaccess technology on flexible bandwidth deployments.

Timeline of Evolution 

  • GSM  : calls  on circuit switching ( CS ) between 2 parties for communication. Dedicated circuits are used for voice and SMS.
  • GPRS : packet switching (PS) is introduced for data services
  • UMTS / 3G : network elements begin evolving into PS . No changes to core.
  • EPC / LTE/VOLTE : No circuit switched domain at all .


Peak Data Rate
uplink – 75Mbps(20MHz bandwidth)
downlink – 150 Mbps(UE Category 4, 2×2 MIMO, 20MHz bandwidth) , 300 Mbps(UE category 5, 4×4 MIMO, 20MHz bandwidth)

carrier bandwidth

range from 1.4 MHz up to 20 MHz. Ultimately bandwidth used by carrier depends on frequency band and the amount of spectrum available with a network operator

Mobility 350 km/h

Multiple Access Schemes
uplink: SC-FDMA (Single Carrier Frequency Division Multiple Access) 50Mbps+ (20MHz spectrum)
downlink: OFDM (Orthogonal Frequency Division Multiple Access) 100Mbps+ (20MHz spectrum)

Multi-Antenna Technology , Multi-user collaborative MIMO for Uplink and TxAA, spatial multiplexing, CDD ,max 4×4 array for downlink


5 – 100km with slight degradation after 30km

LTE architecture supports hard QoS and guaranteed bit rate (GBR) for radio bearers.


All interfaces between network nodes are IP based
Duplexing – Time Division Duplex (TDD) , Frequency Division Duplex (FDD) and half duplex FD

MIMO ( Multiple Input Multiple Output ) transmissions –

Allows the base station to transmit several data streams over the same carrier simultaneously.
Modulation Schemes

QPSK, 16QAM, 64QAM(optional)

LTE Architecture

Primarily composed of

User Equipment (UE)

  • Mobile Termination (MT)
  • Terminal Equipment (TE) 
  • Universal Integrated Circuit Card (UICC) : also known as the SIM card for LTE equipments. It runs an application known as the Universal Subscriber Identity Module (USIM).

2. Evolved UMTS Terrestrial Radio Access Network (E-UTRAN)

handles the radio communications between the mobile and the evolved packet core. High level representation for  eNodeB or eNB

role of eNB :

sends and receives radio transmissions to all the mobiles using the analogue and digital signal processing functions of the LTE air interface.

eNB also controls the low-level operation of all its mobiles, by sending them signalling messages such as handover commands.

3. Evolved Packet Core (EPC)

This sub system resembles IMS environment.

Packet Data Network (PDN) Gateway (P-GW) communicates with the outside world simillar to GGSN ( GPRS support node ) and SGSN ( serving GPRS support node ) in UMTS and GSM.

Home Subscriber Server (HSS) is a central database that contains information about all the network operator’s subscribers. Almost simillar to HLR/AAA in 2G /3G architcture.

Mobility management entity (MME) controls the high-level operation

For a roming user in Visited-PLMN , he is connected with the E-UTRAN, MME and S-GW of the visited LTE network. However, LTE/SAE allows the P-GW of either the visited or the home network to be used, as shown in below:

For roaming prepaid charging , accounting flows are made to access prepaid customer data, via P-Gateways or CSCF in an IMS environment.


LTE devices capable of CAT6 speeds (Category 6 )
Increased peak data rate – downlink 3 Gbps, Uplink 1.5 Gbps ( 1 Gbps = 1000 Mbps)
Spectral efficiency from 16bps/Hz in R8 to 30 bps/Hz in R10
Carrier Aggregation (CA)
Enhanced use of multi-antenna techniques
Support for Relay Nodes (RN)

References :