EEP (formerly HEP): Extensible Encapsulation Protocol with HOMER

EEP duplicates an IP datagram, encapsulates it, and sends it to a remote collector for real-time monitoring of SIP-specific alerts and notifications. HEP is popular among many SIP servers including FreeSWITCH, OpenSIPS, Kamailio, and RTPengine (as an external module).

  • intended for passive duplication of traffic for remote collection
  • can be used for audit storage and analysis
  • does not alter the original datagram or headers

HOMER is a packet and event capture system popular for VoIP/RTC monitoring, based on HEP/EEP (Extensible Encapsulation Protocol).
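On the wire, HEP v3 frames the duplicated packet as a "HEP3" header followed by typed chunks (IP family, addresses, ports, timestamps, the SIP payload). Below is a minimal Python sketch of that encoding; the chunk type IDs follow the public HEP3 specification as I recall it, and the capture ID and addresses are illustrative, so verify against your collector before relying on it:

```python
import socket
import struct
import time

def hep3_chunk(chunk_type, payload, vendor=0x0000):
    # generic chunk: vendor-id(2) + type-id(2) + length(2, incl. 6-byte header) + payload
    return struct.pack("!HHH", vendor, chunk_type, 6 + len(payload)) + payload

def hep3_packet(src_ip, dst_ip, src_port, dst_port, sip_msg, capture_id=2001):
    now = time.time()
    chunks = b"".join([
        hep3_chunk(0x0001, struct.pack("!B", 2)),                   # IP family: AF_INET
        hep3_chunk(0x0002, struct.pack("!B", 17)),                  # transport protocol: UDP
        hep3_chunk(0x0003, socket.inet_aton(src_ip)),               # source IPv4
        hep3_chunk(0x0004, socket.inet_aton(dst_ip)),               # destination IPv4
        hep3_chunk(0x0007, struct.pack("!H", src_port)),            # source port
        hep3_chunk(0x0008, struct.pack("!H", dst_port)),            # destination port
        hep3_chunk(0x0009, struct.pack("!I", int(now))),            # timestamp (seconds)
        hep3_chunk(0x000a, struct.pack("!I", int(now % 1 * 1e6))),  # timestamp (microseconds)
        hep3_chunk(0x000b, struct.pack("!B", 1)),                   # payload protocol type: SIP
        hep3_chunk(0x000c, struct.pack("!I", capture_id)),          # capture agent ID
        hep3_chunk(0x000f, sip_msg.encode()),                       # the duplicated SIP message
    ])
    # packet header: "HEP3" magic + total length (header included)
    return b"HEP3" + struct.pack("!H", 6 + len(chunks)) + chunks

pkt = hep3_packet("10.0.0.1", "10.0.0.2", 5060, 5060,
                  "OPTIONS sip:ping SIP/2.0\r\n\r\n")
# the resulting datagram would then be sent over UDP to the HOMER capture port (e.g. 9060)
```

The original packet itself is never modified; the copy travels inside the 0x000f payload chunk.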

SIP Server Integration

Integrating HOMER and the HOMER Encapsulation Protocol (HEP) with a SIP server brings SIP/SDP payload retention with precise timestamping, better monitoring and anomaly detection in call traffic, correlation of sessions, logs and reports, as well as charts and statistics for SIP and RTP/RTCP packets. We cover the sipcapture and siptrace modules in the project sipcapture_siptrace_hep.

Kamailio and OpenSIPS HEP integrations are structurally similar. In Kamailio, the SIPCAPTURE [2] module enables support for –

● Monitoring/mirroring port
● HEP encapsulation protocol mode (HEP v1, v2, v3)

Figure: OpenSIPS capturing

Figure: OpenSIPS integration with an external capturing agent via a proxy agent (which can be HOMER)

To achieve that, load and configure the SipCapture module in the routing script.

Snippets from the Kamailio HOMER Docker installation as a collector

git clone
cd homer-docker
docker-compose build
docker-compose up

Output snippets from the screen while the installation takes place

Creating network "homer-docker_default" with the default driver
Creating volume "homer-docker_homer-data-semaphore" with default driver
Creating volume "homer-docker_homer-data-mysql" with default driver
Creating volume "homer-docker_homer-data-dashboard" with default driver
Pulling mysql (mysql:5.6)...
5.6: Pulling from library/mysql
Creating mysql ... done
Creating homer-webapp   ... done
Creating homer-cron      ... done
Creating homer-kamailio  ... done
Creating bootstrap-mysql ... done
Attaching to mysql, homer-webapp, bootstrap-mysql, homer-cron, homer-kamailio
homer-webapp | Homer web app, waiting for MySQL
homer-cron   | Homer cron container, waiting for MySQL
homer-kamailio | Kamailio, waiting for MySQL
bootstrap-mysql | Mysql is now running.
bootstrap-mysql | Beginning initial data load....
bootstrap-mysql | Creating Databases...
bootstrap-mysql | Creating Tables...
homer-kamailio | Kamailio container detected MySQL is running & bootstrapped
homer-kamailio |  0(22) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(22) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | config file ok, exiting...
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(23) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: sipcapture [sipcapture.c:480]: parse_table_names(): INFO: table name:sip_capture
homer-webapp | Homer web app container detected MySQL is running & bootstrapped
homer-webapp | Module php5 already enabled

Capture tools

Dialog module

Storing dialogs in a MySQL DB requires initialising MySQL:

#!define WITH_MYSQL
#!ifdef WITH_MYSQL
loadmodule ""
#!ifdef WITH_MYSQL
# - database URL - used to connect to database server by modules such
#       as: auth_db, acc, usrloc, a.s.o.
#!ifndef DBURL
#!define DBURL "mysql://root:kamailio@localhost/kamailio"
loadmodule ""
# ----- dialog params ------
modparam("dialog", "dlg_flag", 10)
modparam("dialog", "track_cseq_updates", 0)
modparam("dialog", "dlg_match_mode", 2)
modparam("dialog", "timeout_avp", "$avp(i:10)")
modparam("dialog", "enable_stats", 1)
modparam("dialog", "db_url", DBURL)
modparam("dialog", "db_mode", 1)
modparam("dialog", "db_update_period", 120)
modparam("dialog", "table_name", "dialog")

Setting db_mode – synchronisation of dialog information from memory to an underlying database has the following options:
0 – NO_DB – the memory content is not flushed into DB;
1 – REALTIME – any dialog information changes will be reflected into the database immediately.
2 – DELAYED – the dialog information changes will be flushed into DB periodically, based on a timer routine.
3 – SHUTDOWN – the dialog information will be flushed into DB only at shutdown – no runtime updates.

Note:

  • use the same hash_size when using a different Kamailio instance to restore dialogs

Database table for dialog

  1. install MySQL
  2. define root (with DB create permissions) and user (with database read/write permissions) in kamctlrc
vi /usr/local/etc/kamailio/kamctlrc
  • Dialog table schema
| name | type | size | default | null | key | extra attributes | description |
| id | unsigned int | 10 | | no | primary | autoincrement | unique ID |
| hash_entry | unsigned int | 10 | | no | | | Number of the hash entry in the dialog hash table |
| hash_id | unsigned int | 10 | | no | | | The ID on the hash entry |
| callid | string | 255 | | no | | | Call-ID of the dialog |
| from_uri | string | 128 | | no | | | URI of the From header (as per INVITE) |
| from_tag | string | 64 | | no | | | From tag; a dialog is identified by the Call-ID along with two tags, one from each participant |
| to_uri | string | 128 | | no | | | URI of the To header (as per INVITE) |
| to_tag | string | 64 | | no | | | To tag; a dialog is identified by the Call-ID along with two tags, one from each participant |
| caller_cseq | string | 20 | | no | | | Last CSeq number on the caller side |
| callee_cseq | string | 20 | | no | | | Last CSeq number on the callee side |
| caller_route_set | string | 512 | | yes | | | Route set on the caller side |
| callee_route_set | string | 512 | | yes | | | Route set on the callee side |
| caller_contact | string | 128 | | no | | | Caller's contact URI |
| callee_contact | string | 128 | | no | | | Callee's contact URI |
| caller_sock | string | 64 | | no | | | Local socket used to communicate with the caller |
| callee_sock | string | 64 | | no | | | Local socket used to communicate with the callee |
| state | unsigned int | 10 | | no | | | The state of the dialog |
| start_time | unsigned int | 10 | | no | | | The timestamp (Unix time) when the dialog was confirmed |
| timeout | unsigned int | 10 | 0 | no | | | The timestamp (Unix time) when the dialog will expire |
| sflags | unsigned int | 10 | 0 | no | | | The flags set for the dialog, accessible from the config file |
| iflags | unsigned int | 10 | 0 | no | | | The internal flags for the dialog |
| toroute_name | string | 32 | | yes | | | The name of the route to be executed at dialog timeout |
| req_uri | string | 128 | | no | | | The URI of the initial request in the dialog |
| xdata | string | 512 | | yes | | | Extra data associated with the dialog (e.g., serialized profiles) |

Siptrace module

The siptrace module offers the possibility to store incoming and outgoing SIP messages in a database and/or duplicate them to a capturing server (using HEP, the Homer Encapsulation Protocol, or plain SIP mode).

loadmodule ""
modparam("siptrace", "duplicate_uri", "sip:")
modparam("siptrace", "hep_mode_on", 1)
modparam("siptrace", "trace_to_database", 0)
modparam("siptrace", "trace_flag", 22)
modparam("siptrace", "trace_on", 1)

Integrate it with the request route to start duplicating the SIP messages.
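A minimal sketch of what that could look like in kamailio.cfg, assuming the module parameters above (the flag value must match the trace_flag modparam; adapt to your routing logic):

```
request_route {
    # ... normal request processing ...

    # duplicate this message to the capture server and
    # mark the transaction for tracing (matches trace_flag 22)
    sip_trace();
    setflag(22);

    # ... relay the request ...
}
```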


  • trace_mode –
1 – uses core events triggered when receiving or sending SIP traffic to mirror traffic to a SIP capture server using HEP
0 – no automatic mirroring of SIP traffic via HEP


duplicate_uri – the address, in the form of a SIP URI, where a duplicate of the traced message is sent. It always uses UDP.

modparam("siptrace", "duplicate_uri", "sip:")

To check the duplicated messages arriving:

ngrep -W byline -d any port 9060 -q

RPC commands

RPC commands can turn SIP tracing on or off:

kamcmd> siptrace.status on   

and to check

kamcmd> siptrace.status check

Store sip_trace in database

modparam("siptrace", "trace_to_database", 1)
modparam("siptrace", "db_url", DBURL)
modparam("siptrace", "table", "sip_trace")

where the sip_trace table description is:

| Field       | Type             | Null | Key | Default             | Extra          |
| id          | int(10) unsigned | NO   | PRI | NULL                | auto_increment |
| time_stamp  | datetime         | NO   | MUL | 2000-01-01 00:00:01 |                |
| time_us     | int(10) unsigned | NO   |     | 0                   |                |
| callid      | varchar(255)     | NO   | MUL |                     |                |
| traced_user | varchar(128)     | NO   | MUL |                     |                |
| msg         | mediumtext       | NO   |     | NULL                |                |
| method      | varchar(50)      | NO   |     |                     |                |
| status      | varchar(128)     | NO   |     |                     |                |
| fromip      | varchar(50)      | NO   | MUL |                     |                |
| toip        | varchar(50)      | NO   |     |                     |                |
| fromtag     | varchar(64)      | NO   |     |                     |                |
| totag       | varchar(64)      | NO   |     |                     |                |
| direction   | varchar(4)       | NO   |     |                     |                |

Sample database storage for SIP traces:

select * from sip_trace;

| id | time_stamp          | time_us | callid  | traced_user | msg         | method | status | fromip                   | toip                     | fromtag  | totag    | direction |
|  1 | 2019-07-18 09:00:18 |  417484 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | INVITE sip:altanai@sip_addr;transport=udp SIP/2.0
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport
Max-Forwards: 70
Contact: <sip:derek@call_addr:7086;transport=udp>
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Content-Type: application/sdp
Supported: replaces
User-Agent: Bria 3 release 3.5.5 stamp 71243
Content-Length: 214

o=- 1563440415743829 1 IN IP4 local_addr
s=Bria 3 release 3.5.5 stamp 71243
c=IN IP4 local_addr
t=0 0
m=audio 59814 RTP/AVP 9 8 0 101
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv                                                                                                                                                                                      | INVITE |        | udp:caller_addr:27982 | udp:sip_pvt_addr:5060   | de523549 |          | in        |

|  2 | 2019-07-18 09:00:18 |  421675 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | SIP/2.0 100 trying -- your call is important to us
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport=27982;received=caller_addr
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Server: kamailio (5.2.3 (x86_64/linux))
Content-Length: 0                                                                                                                                                                                                                                                                                                                                                                                                                                                           | ACK    |        | udp:caller_addr:27982 | udp:local_addr:5060   | de523549 | b2d8ad3f | in       |


Multi-Protocol Go HEP Capture Agent (heplify)

sudo tar -C /usr/local -xzf go1.11.2.linux-amd64.tar.gz

Move the package to /usr/local/go:

mv go 

Either add go bin to ~/.profile

export PATH=$PATH:/usr/local/go/bin

and apply

source ~/.profile

or set GOROOT and GOPATH:

export GOROOT=/usr/local/go
export GOPATH=$HOME/heplify
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

Installation of dependencies:

go get

Clone the heplify repo and run make.



A new OSS capture-agent framework suitable for SIP, XMPP and more. With internal method filtering, encryption and authentication it does look very promising; however, since I have not personally tried it yet, I will leave this space TBD for the future.


Others include sipgrep, HEPipe and nProbe.


HEPop is a multi-protocol HEP server and switch in Node.js: a stand-alone HEP capture server designed for HOMER7, capable of emitting indexed datasets and tagged timeseries to multiple backends.

node hepop.js -c /app/myconfig.js

PCAP monitoring -> Homer Server -> Notification and Fraud Prevention

A real-time monitoring and alerting setup from HOMER can safeguard against VoIP-specific attacks and suspicious activity through early warning. Attacks such as DDoS, SIP SQL injection, parser exploits, remote manipulation and hijacking, as well as resource enumeration, are common for a cloud telephony provider.

Additionally, HOMER provides session quality metrics using variables that include [1]:

SD = Session Defects

ISA = Ineffective Session Attempts

AHR = Average HOP Requests

ASR = Answer Seizure Ratio
[(‘200’ / (INVITES – AUTH – SUM(3XX))) * 100]

NER = Network Efficiency Ratio
[(‘200’ + (‘486′,’487′,’603’) / (INVITES -AUTH-(SUM(30x)) * 100]
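The ASR and NER formulas above can be sketched as plain functions. This is a hedged example: the counter names are illustrative, and exactly which responses are excluded depends on how your collector aggregates them.

```python
# Sketch of the ASR / NER formulas quoted above, computed from
# per-period SIP response counters (counter names are illustrative).
def asr(ok_200, invites, auth_challenges, redirects_3xx):
    """Answer Seizure Ratio: answered calls over effective attempts."""
    attempts = invites - auth_challenges - redirects_3xx
    return ok_200 / attempts * 100

def ner(ok_200, busy_486, cancelled_487, declined_603,
        invites, auth_challenges, redirects_3xx):
    """Network Efficiency Ratio: busy/cancel/decline count as
    network-successful outcomes alongside 200 OK."""
    attempts = invites - auth_challenges - redirects_3xx
    return (ok_200 + busy_486 + cancelled_487 + declined_603) / attempts * 100

# e.g. 1000 INVITEs of which 400 are auth retries and 0 redirects:
# 480 answered gives ASR 80%; adding 60 busy/cancel/decline gives NER 90%
```

A low ASR with a normal NER usually points at user behaviour (unanswered or declined calls) rather than network failure.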

HOMER Web Interface or Custom Dashboard

Some more visualizations for inter-team communication, such as for a NOC team, can include:

HOMER integration with InfluxDB

Install the real-time time-series DB:

sudo dpkg -i influxdb_1.7.7_amd64.deb


 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

2019-07-19T07:03:04.603494Z	info	InfluxDB starting	{"log_id": "0GjGVvbW000", "version": "1.7.7", "branch": "1.7", "commit": "f8fdf652f348fc9980997fe1c972e2b79ddd13b0"}
2019-07-19T07:03:04.603756Z	info	Go runtime	{"log_id": "0GjGVvbW000", "version": "go1.11", "maxprocs": 1}
2019-07-19T07:03:04.707567Z	info	Using data dir	{"log_id": "0GjGVvbW000", "service": "store", "path": "/var/lib/influxdb/data"}

HOMER integration with Grafana


For Kamailio integration follow github instructions on

References:


[2] HEP/EEP –

[3] kamailio sipdump module –


[5] HOMER Big Data –

Energy Efficient VoIP systems

Data centres are the concentrated processing units of the amazing Internet that is driving the technological innovation of our generation and has become the backbone of our global economy. Data centres not only process, store and carry textual data; a vast amount of computing is for multimedia content, which could range from social media to video streaming or VoIP calls. In this article let us analyze the energy efficiency, carbon footprint and scope of improvements for a VoIP-related data centre which hosts SIP and related RTC signalling and media servers and processes CDRs and/or media files for playback or recording.

Just like in a regular IT data centre, storage, computing power and network capacity define the usage of the server. Uninterrupted electricity is also of paramount importance, as any blackout could drop ongoing calls and lead to loss of revenue for the service provider, not to forget the loss caused to the parties engaged in the call.

Increasing power consumption by the telecom sector over the years

Global PPA (power purchase agreement) volumes by sector, 2009-2019. IEA, Paris
source: CDP, The 3% Solution, 2013 [19]

Typical VoIP Setup: whether on a cloud infrastructure provider or in a hosted data centre, approximately 7 servers are required even for an SME (small to medium enterprise) communication and VoIP system:

  • 2 signalling servers, primary and standby, for HA
  • 2 media servers for MCU, media bridges or IVR playback etc.
  • 1 for CDRs, logs, call analytics, stats and other supplementary operations
  • 1 for the dev or engineering team
  • 1 edge server, which could be an API server, a gateway or a load balancer
Sample VoIP system

VoIP solutions are more energy expensive, unless aggressive power saving schemes are in place

Comparison of energy efficiency in PSTN and VoIP systems [14]

While PSTN and other hybrid scenarios relied on audio-only communication, great pains were taken to make the embedded systems involved energy efficient, which is not really the case with all-digital, software-based VoIP.

Power Consumption

Mobile phone: a typical smartphone with a 4,000 mAh (4 Ah) battery gets 1 full charge cycle a day. Daily consumption = 4 Ah * 3.7 V = 14.8 Wh.

Laptop: with a 14-15″ screen, a laptop can draw 60 watts in active use depending on the model. Running 8 hours a day gives 60 * 8 = 480 Wh (0.480 kWh) of energy consumed in a day.

Desktop PC: running on 50-60 Hz mains, a desktop can draw up to 200 W in active use. For 8 hours, energy usage is 200 * 8 = 1,600 Wh (1.6 kWh) a day.

Server: even though servers are invisible to the request maker, they cater to the request at the other end of the internet.

| Server | Purpose | Server CPU consumption | Clients | Client CPU consumption |
| Application | Hosts an application, which can be run through a web browser or customized client software. | medium | Any network device with access. | low |
| Computing | Makes available CPU and memory to the client. This type of server might be a supercomputer or mainframe. | high | Any networked computer that requires more CPU power and RAM to complete an activity. | medium |
| Database | Maintains and provides access to any database. | low | Any form of software that requires access to structured data. | low |
| File | Makes available shared files and folders across a network. | medium | Any client that needs access to shared resources. | low |
| Game | Provisions a multiplayer game environment. | high | Personal computers, tablets, smartphones, or game consoles. | high |
| Mail | Hosts your email and makes it available across the network. | medium | Users of email applications. | low |
| Media | Enables media streaming of digital video or audio over a network. | high | Web and mobile applications. | high |
| Print | Shares printers over a network. | low | Any device that needs to print. | low |
| Web | Hosts webpages either on the internet or on private internal networks. | medium | Any device with a browser. | medium |
CPU consumption of various server types and their clients

A server typically runs at 850 W, consuming 0.850 kWh of energy in an hour, and since servers are usually up 24*7 that totals

0.850 * 24 = 20.4 kWh a day [2].

VoIP System (7 VMs): for a setup of 7 VMs (possibly on the same PM), the total energy consumed in a day is

20.4 * 7 = 142.8 kWh.

Data centre: the data centre building consists of the infrastructure to support the servers, disks and networking equipment it contains. However, for simplicity, I will only use the consumption of the servers and ignore the cooling units, networking, backup battery charging, generators, lighting, fire suppression, maintenance etc.

A high-tier DC can have 100 megawatts of capacity, with each 52U rack using 25 kW of power. 100,000 kW / 25 kW = 4,000 racks * 52 (U) = 208,000 1U servers. This number scales down depending on how much energy each server uses and on idle servers.

Total energy: 100,000 kW * 24 hours = 2,400,000 kWh a day.
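The back-of-the-envelope arithmetic above can be collected in a few lines, with the figures exactly as assumed in the text:

```python
# Daily energy estimates, using the assumptions stated in the article.
HOURS_PER_DAY = 24

laptop_kwh  = 60 / 1000 * 8            # 60 W for 8 h  -> 0.48 kWh/day
desktop_kwh = 200 / 1000 * 8           # 200 W for 8 h -> 1.6 kWh/day
server_kwh  = 0.850 * HOURS_PER_DAY    # 850 W, always on -> 20.4 kWh/day
voip_kwh    = server_kwh * 7           # 7 VMs/servers -> 142.8 kWh/day
dc_kwh      = 100_000 * HOURS_PER_DAY  # 100 MW facility -> 2,400,000 kWh/day

racks      = 100_000 // 25             # 4,000 racks of 25 kW each
servers_1u = racks * 52                # 208,000 1U servers in 52U racks
```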

Carbon Footprint

Carbon footprint, in the context of this article, refers to the amount of greenhouse gas (consisting mostly of CO2) caused by electricity consumption. The unit is the carbon emission equivalent of the electricity consumed, in kg CO2 per kWh.

In doing this calculation I have assumed 0.233 kg CO2 per kWh which could be less or more depending on the generation profile of the electricity provider as well as the heat produced by the machine.

Laptop: aside from production, which could account for 61.4 kg (135.5 lbs) of CO2, a 60 W laptop will produce about 0.112 kg CO2-eq per day.

Desktop PC: aside from production cost and heating, the GHG and CO2-eq emission from running a desktop for a day (8 hours) is 1.6 * 0.233 = 0.3728 kg CO2 per day.

Server: 20.4 * 0.233 = 4.7532 kg CO2 per day.

VoIP System (7 VMs): again ignoring the GHG emissions of associated components, 142.8 * 0.233 = 33.2724 kg CO2 per day. It is to be noted that DCs (data centres) use the term PUE (Power Usage Effectiveness) to showcase their energy efficiency, and energy efficiency certification uses the same in ratings.

Data centre: the electrical carbon footprint (an approximate calculation not counting cooling, infrastructure maintenance, lighting and possibly idle servers in the data centre) is 2,400,000 * 0.233 = 559,200 kg CO2 per day.

It is to be noted that a common figure should not be extrapolated like this to derive carbon emission. The emission depends on the fuel mix of the electricity generation as well as the life cycle assessment (LCA) of carbon-equivalent emission. Countries with heavy reliance on renewables have a lower CO2 footprint per kWh, ~0.013 kg CO2 per kWh in Sweden, while others may be higher, such as 0.819 kg CO2 per kWh in Estonia [1].
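As a sketch, the footprint numbers above are just the daily kWh figures multiplied by a grid emission factor; varying the factor shows how strongly the fuel mix dominates the result:

```python
# Daily CO2 from daily kWh times a grid emission factor (kg CO2/kWh).
# 0.233 is the article's assumed average; Sweden (~0.013) and
# Estonia (~0.819) roughly bracket the range [1].
def daily_co2(kwh_per_day, factor=0.233):
    return kwh_per_day * factor

server_co2 = daily_co2(20.4)         # ~4.75 kg CO2/day
voip_co2   = daily_co2(142.8)        # ~33.27 kg CO2/day for the 7-VM setup
dc_co2     = daily_co2(2_400_000)    # ~559,200 kg CO2/day for the 100 MW DC
sweden_srv = daily_co2(20.4, 0.013)  # the same server on a cleaner grid
```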

Flatten the Curve from Tech and Internet usage

Rack servers tend to be the main perpetrators of wasted energy and represent the largest portion of the IT energy load in a typical data center.

A decade ago, small enterprise IT facilities were quick to create data centres for hosting applications from hospitals, banks and insurance companies. While some of these are likely to have been upgraded to shared server instances running on IaaS providers, many are still serving traffic or remain in place for lack of effort to upgrade.

With advancements in P2P technologies such as dApps, the Bitcoin network, P2P WebRTC streaming and more edge-computed ML continuing to disrupt existing trends, consumption will most likely increase manyfold.

According to the Cambridge Center for Alternative Finance (CCAF), Bitcoin currently consumes around 110 Terawatt Hours per year — 0.55% of global electricity production

Harvard Business Review [12]

“the emissions generated by watching 30 minutes of Netflix (1.6 kg of CO2) is the same as driving almost four miles.” 

EnergyInnovation [13]

Cloud Computing and Energy efficiency

Cloud computing (SaaS, PaaS, IaaS and also CPaaS) minimizes power consumption, and consequently IT costs, via virtualization, clustering and dynamic configuration.

Cloud infrastructure vendors such as Amazon, Google and Microsoft, with their adoption of energy-efficient computing and credible transparency, have alleviated some of the stress that would have been created if on-site self-hosted data centres were still as mainstream as a decade ago.

Even as cloud providers give on-demand access to shared resources in large-scale distributed computing, the ease of getting on board has in turn created a surge in cloud-hosted online applications, and consequently higher power consumption, operating costs and CO2 emissions.

Components of energy consumption in a data centre

As shown, CPU, memory and storage incur 45% of the costs but consume 26% of the total energy, whereas power distribution and cooling cost 25% yet consume >50% of the total energy.

Energy forecast for data centres

As reported by Nature [3], widely cited forecasts suggest that the total electricity demand of ICT (Information and Communication Technology) will accelerate, and while consumer devices such as smart TVs, laptops and mobiles are becoming energy efficient, data centres and network devices will demand bigger portions. As reported in 2018, 200 TWh (terawatt hours) of energy was being consumed by data centres. Although there are no figures for telecom or specifically for IP cloud telephony, the enormous multimedia data flowing in every session is enough to assume the figure must be huge.

Energy efficiency in data centres has also been the subject of many papers and studies. Many tech advancements and measures have so far kept the growth in the tech sector's energy requirements linear/flat.

Past and projected growth rate of total US data center energy use from 2000 until 2020, also illustrating how much faster data center energy use would grow if the industry, hypothetically, did not make any further efficiency improvements after 2010. (Source: US Department of Energy, Lawrence Berkeley National Laboratory)

Some noteworthy innovations made in data centres for energy efficiency include –

  1. Energy Star efficiency requirements
  • Average server utilization
  • Server power scaling at low utilization
  • Average power draw of hard disk drives
  • Average power draw of network ports
  • Average infrastructure efficiency (i.e., PUE)

PUE = Total Facility power / IT equipment power

Standard: 2.0, Good: 1.4, Better: 1.1

A low PUE indicates greater efficiency, since more of the power is then used by the IT gear. Ideally 1 would be the perfect score, where all power is used only by the IT gear.

2. Optimizing the cooling system, which takes a lot of focus, is also not touched upon here, but it can be understood in great detail from many sources, including one on how Google uses AI for cooling its data centres [6].

3. Throttle-down drives: a mechanism that reduces energy consumption on idle processors, so that when a server is running at its typical 20% utilization it is not drawing full power.

Energy efficiency is vital not only to productivity and performance but also to a carbon-neutral tech sector and economy. There is ample scope for designing energy-efficient applications and platforms. Some approaches are described below:

Energy Efficiency in VoIP Architecture and design

Low energy consumption not only lowers operating costs but also helps the environment by reducing carbon emissions.

1. Server Virtualization

Consolidating multiple independent servers onto a single underlying physical server retains the logical separation while containing energy costs and maximizing utilization. VMs (virtual machines) are instances of virtualized portions of the same server and can be independently accessed using their own IP and network settings.

To reduce electricity usage in our labs and data centers, we use smart power distribution units to monitor our lab equipment. We increase server utilization by using virtual machines. Our Cisco Customer Experience labs use a check-in, check-out system of automation pods to allow lab employees to set up configurations virtually and then release equipment when they are finished with it.

Cisco 2020 Environment Technical Review [20]

Models to place VMs on PMs (physical machines) have been proposed by Dong et al. [8], Huang [9], and Tian et al. [10].

2. Decommissioning old/outdated servers

While this is the most obvious way to increase efficiency, it is also the toughest, since a legacy application, or a small portion of one, may be running on a server that service providers are not keen on updating, or updates do not exist and it is past end of life yet somehow still in use. It is important to identify such components. Check if an old GlassFish or BEA WebLogic SIP servlet server needs updating and/or migration!

3. Plan HA (high availability) efficiently

Redundant servers take only partial loads, if any at all, so they can be activated in full swing when failover happens on another server. With quick start-up times and forward-looking monitoring, analyzers can watch logs for upcoming failures or predictable downtime, and infra scripts can bring up pre-designed containers in seconds if not minutes. It is not wise to create more than 1 standby server which does no essential work but consumes as much power.

4. Consolidate individual applications on a server

Map the maximum predictable load and deduce the percentage consumption from it. In view of these figures it is best to consolidate application servers onto a single server. A distributed microservice-based architecture can also support consolidation by running each major application in its own Docker container. Consolidation ensures that

  • All data can be stored and accessed centrally, which reduces the likelihood of data duplication.
  • while a server is drawing full power, it is also showing reliable utilization.
  • there is a single point to prevent intrusion, provide security and fix vulnerabilities against malware (ransomware, viruses, spyware, trojans)

5. Reduce redundancy

While it is common practice to store multiple copies of data such as CDRs (call detail records) and to archive historical logs for later auditing, it is not the most energy-efficient way, since it ends up wasting storage space. It is in fact a better approach to keep only the critical parts, discard the rest, and definitely to run background tasks that compress older and less-referenced logs.

6. Power management

Powering down idle servers or putting unused servers to sleep is an effective way to reduce operating power, but it is often ignored by IT departments for fear of slower performance and loss of call continuity if a server does go down. However, power management leads to real energy savings and should be weighed accordingly.

7. Common storage such as Network Attached Storage

Power consumption is roughly linear in the number of storage modules used. Storage redundancy needs to be right-sized to avoid rapid consumption of available storage space, the CPU cycles needed to reference and index it, and the associated power consumption [7].

The process of maximizing storage capacity utilization by drawing from a common pool of shared storage on a need basis also allows for flexibility.

It is sensible to take such data offline, reducing clutter on the production system, while keeping the existing data quickly retrievable.

8. Sharing other IT resources

Sharing Central Processing Units (CPUs), disk drives, and memory optimizes electrical usage. Short-term load shifting, combined with throttling resources up and down as demand dictates, improves long-term hardware energy efficiency. [7]

Hardware-based approaches such as Energy Star ratings, air conditioning, placement of server racks, air flow, cabling etc. have not been touched upon in this article; they can be read about in the Energy Star report [5].

9. DMZ / Perimeter network

The perimeter network (also known as DMZ, demilitarized zone, or screened subnet) is a zone where resources and services accessible from outside the organization are available. It is often used as a barrier between the secure internal green zone within the company and outside partners/suppliers, such as external organization gateways. Typical DMZ components include:

  • Load balancers
  • API gateways
  • SBC ( Session Border controllers)
  • Media Gateways

Ways to cut down on CPU consumption in DMZ machines

  1. Scrutinize incoming traffic only; trust outgoing traffic.

2. Use hardware/network firewalls to monitor and block instead of software-defined ones. A hardware firewall can be a standalone physical device or form part of another device on your network. Physical devices like routers, for example, already have a built-in firewall.

Other types of firewalls

  • Application-layer firewalls can be a physical appliance, or software-based, like a plug-in or a filter. These types of firewalls target your applications. For example, they could affect how requests for HTTP connections are inspected across each of your applications.
  • Packet filtering firewalls scrutinize each data packet as it travels through your network. Based on rules you configure, they decide whether to block the specific packet or not. For example firewalls can block SSH/RDP for remote management.
  • Circuit-level firewalls check whether TCP and UDP connections across your network are valid before data is exchanged. For example, this type of firewall might first check whether the source and destination addresses, the user, the time, and date meet certain defined rules.
  • Proxy server firewalls secure the traffic into and out of a network by monitoring, filtering, and caching data requests to and from the network.
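The packet-filtering behaviour described above can be sketched as a tiny ordered rule matcher (the rule shape and function names here are hypothetical, not any real firewall's API):

```javascript
// Minimal sketch of packet-filter rule matching (hypothetical rule shape).
// Rules are evaluated in order; first match wins, default policy is "allow".
const rules = [
  { proto: "tcp", dstPort: 22, action: "block" },   // block SSH
  { proto: "tcp", dstPort: 3389, action: "block" }, // block RDP
];

function filterPacket(packet, ruleSet) {
  for (const rule of ruleSet) {
    if (rule.proto === packet.proto && rule.dstPort === packet.dstPort) {
      return rule.action;
    }
  }
  return "allow"; // default policy
}
```

A real packet filter would also match on source/destination addresses and direction, but the first-match-wins evaluation order is the essential idea.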

Energy Efficiency in VoIP Applications and algorithms

In theory, energy-efficient algorithms take less processing power, run fewer CPU cycles and consume less memory. For experiments with WebRTC and SIP VoIP systems, CPU performance can be a reliable factor to consider for carbon emissions. Here is a list of approaches for including energy as one of the parameters when programming RTC applications.

  1. Take advantage of multi-core architectures

Multi-core processor chips allow simultaneous processing of multiple tasks, which leads to higher efficiency. A shared power source and shared cooling also improve efficiency; it is the same logic as consolidating one power supply for a rack instead of an individual power supply for each server on the rack.
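As a minimal illustration of exploiting multiple cores, independent tasks can be partitioned into per-core groups (the names are hypothetical; in Node.js each group could then be handed to a worker thread):

```javascript
// Illustrative sketch: spread independent tasks round-robin across cores.
// In Node.js, each returned group could be dispatched to a worker_threads
// Worker so the cores are kept busy in parallel.
function partitionTasks(tasks, coreCount) {
  const groups = Array.from({ length: coreCount }, () => []);
  tasks.forEach((task, i) => groups[i % coreCount].push(task));
  return groups;
}
```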

2. Reduce Buffering

Input/output buffers pile up computed packets or blocks which may come into use in the near future but may be discarded altogether in the event of a skip or shutdown. For example, in the case of Video on Demand (VoD), a buffered video of 1 hour is of not much use if the viewer decides to cancel the video session after 10 minutes.
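A simple way to limit this waste is to cap how far ahead the buffer may run; a minimal sketch with illustrative names, not a real player API:

```javascript
// Sketch of a bounded prefetch buffer: capping how far ahead we buffer
// limits wasted work (and energy) if the viewer skips or quits early.
class BoundedPrefetchBuffer {
  constructor(maxChunks) {
    this.maxChunks = maxChunks;
    this.chunks = [];
  }
  // Refuses new chunks (returns false) once the cap is reached,
  // signalling the producer to pause prefetching.
  push(chunk) {
    if (this.chunks.length >= this.maxChunks) return false;
    this.chunks.push(chunk);
    return true;
  }
  // Consuming a chunk frees a slot for further prefetch.
  shift() {
    return this.chunks.shift();
  }
}
```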

3. Optimize memory access algorithms

4. Network energy management that varies with demand

The newer generations of network equipment pack more throughput per unit of power. There are active energy management measures that can also be applied to reduce energy usage as network demand varies. In a telecommunication system, a trade-off between power consumption and network performance is almost always made.

  1. Quick switching of the network speed to match the amount of data that is currently transmitted. A demand-following streaming session will maintain QoS and avoid imbalance while also reducing power consumption.

  2. Avoid sudden bursts and peaks and/or align them with energy availability.


Across hardware generations, efficiency improvements show up in:

  • computational performance (i.e., computations/second per server),
  • electrical efficiency of computations (i.e., computations per kWh),
  • storage capacity (i.e., TB per drive), and
  • port speeds (i.e., Gb per port)

5. Task Scheduling algorithms

Some recently researched frameworks and models take CO2 emission into perspective while allocating resources according to a queuing model. The most efficient ones bring down not only the carbon footprint but also the high operating cost [11].

Scheduling and monitoring techniques have been applied to achieve a cost-effective and power-aware cloud environment by reducing resource exploitation.
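One way such a scheduler can factor in CO2 is to defer deferrable batch jobs to the hour with the lowest forecast grid carbon intensity; a hedged sketch where the forecast shape and field names are assumptions:

```javascript
// Carbon-aware scheduling sketch: given an (hypothetical) hourly forecast
// of grid carbon intensity, pick the cleanest hour to run deferrable work.
function bestHour(forecast) {
  // forecast: array of { hour, gCO2PerKWh }
  return forecast.reduce((best, f) =>
    f.gCO2PerKWh < best.gCO2PerKWh ? f : best
  ).hour;
}
```

A production scheduler would combine this with deadlines and queue lengths, but the core decision is the same minimisation.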

6. Centralised operation – RTP topologies (Mesh, MCU and SFU)

Instead of operating many servers at low CPU utilization at the edge, near the client's end, this approach consolidates the processing power onto fewer servers that operate at higher utilization.

Modern machine learning programs are computationally intensive, and their integration into VoIP systems for tagging, sentiment analysis and voice quality analysis increasingly adds strain to the already heavy processing a media server performs for transcoding and multiplexing.

Media server using an SFU (Selective Forwarding Unit) to transmit media streams

As an example, in a five-party call an SFU client sends one upstream but receives 4 downstreams, which reduces the load on the server but increases it on the clients.
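The load trade-off between topologies can be put into rough numbers; this back-of-envelope helper (illustrative only) matches the example above, where an SFU client in a 5-party call sends 1 upstream and receives 4 downstreams:

```javascript
// Back-of-envelope stream counts per topology for an n-party call.
function streamCounts(n, topology) {
  switch (topology) {
    case "mesh": // every client exchanges media with every peer directly
      return { clientUp: n - 1, clientDown: n - 1, serverStreams: 0 };
    case "sfu":  // server receives n upstreams, forwards each to n - 1 peers
      return { clientUp: 1, clientDown: n - 1, serverStreams: n + n * (n - 1) };
    case "mcu":  // server mixes everything into one stream per client
      return { clientUp: 1, clientDown: 1, serverStreams: 2 * n };
    default:
      throw new Error("unknown topology");
  }
}
```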

7. Distributing workload based on server performance

Aggregating tasks and running them as serverless, asynchronous jobs instead of standalone processes is a very efficient way to cut down idle-running wastage. Additionally, categorizing workloads based on server performance can also reduce power consumption by using idle servers efficiently. Thermal-aware workload distribution also helps reduce power consumption and, consequently, the electricity consumed in cooling.

8. Reduce re-authentication and challenge-response mechanisms when they can be avoided.

There exist multiple modes to authenticate and authorize users' and applications' access to server content.

Over the network

  • password-based auth,
  • third-party based auth (OAuth),
  • 2-factor authentication (phone/SMS based),
  • multi-factor auth (SMS / email / other media),
  • token auth (custom USB device / smart card),
  • biometric auth (physical human characteristics / scanners),
  • transactional auth (location, hour of day, browser/machine type)

Computer recognition authentication

  • Single sign-on

Authentication protocols

  • Kerberos – Key Distribution Center (KDC) using a Ticket Granting Server (TGS)

A callflow involves AAA while creating the session and may require occasional re-authentication to reaffirm that the user is the intended one. Doing re-authentication too often increases power consumption; this can be countered by caching and timeout mechanisms.
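The caching-and-timeout counter-measure can be sketched as a small TTL cache of successful authentications (the names and the injected clock are illustrative):

```javascript
// Sketch: remember a successful auth result for a TTL so that every request
// does not trigger a fresh challenge-response round trip.
// `now` is injectable so the cache is easy to test deterministically.
function makeAuthCache(ttlMs, now = Date.now) {
  const cache = new Map(); // user -> expiry timestamp (ms)
  return {
    remember(user) { cache.set(user, now() + ttlMs); },
    // Fresh entries skip re-authentication; stale or unknown users don't.
    isFresh(user) {
      const expiry = cache.get(user);
      return expiry !== undefined && now() < expiry;
    },
  };
}
```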

Point of presence and handover based on carbon footprint in different geographies

  1. Include the carbon emission of the datacentre in consideration before engaging the server in the call path from the load balancer gateway.

  2. Choose the point of presence (PoP) for a server according to the carbon emission factor of its geography.

US states' carbon emission rates from electricity generation (2018 report). Source: [16]
UK greenhouse gas reporting. Source: [17]
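The PoP selection described above can be sketched as picking the candidate with the lowest regional emission factor (the regions and figures below are made up for illustration):

```javascript
// Illustrative: route a new call through the PoP whose regional grid has
// the lowest emission factor (kg CO2 per kWh). Figures are invented.
const pops = [
  { region: "region-a", emissionFactor: 0.45 },
  { region: "region-b", emissionFactor: 0.12 },
  { region: "region-c", emissionFactor: 0.3 },
];

function greenestPop(candidates) {
  return candidates.reduce((best, p) =>
    p.emissionFactor < best.emissionFactor ? p : best
  );
}
```

In practice this would be one weighted input alongside latency and capacity, not the sole routing criterion.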

Energy Efficiency in WebRTC browser applications and native applications

For video conferencing over the browser, WebRTC has emerged as the default standard. The efficiency of such WebRTC browser-based video conferencing web applications can be enhanced in the following ways:

1. Use VoIP Push Notifications to Avoid Persistent Connections

2. Voice activity detection (mute the spectators), and join with video true, audio false for attendees
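One way to apply this is to derive getUserMedia-style constraints from the participant's role, so spectators never capture or upload media at all (the role names are assumptions):

```javascript
// Sketch: role-based media constraints so non-speaking participants
// don't capture (and encode/upload) media they will never send.
function mediaConstraintsFor(role) {
  switch (role) {
    case "speaker":   return { video: true, audio: true };
    case "attendee":  return { video: true, audio: false };  // joins muted
    case "spectator": return { video: false, audio: false }; // receive only
    default:          return { video: false, audio: false };
  }
}
```

In a browser this object would be passed to `navigator.mediaDevices.getUserMedia(...)` for roles that capture media.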

Energy efficiency in VoIP phones

If all eligible VoIP phones sold in the United States were ENERGY STAR certified, the energy cost savings would grow to more than $65 million each year and 1.2 billion pounds of annual greenhouse gas emissions would be prevented, equivalent to the emissions from more than 119,000 vehicles.

Energy Star [15]

Low-energy-consuming embedded hardware on most phones keeps the average consumption low. An analog phone can consume between 0.07 W and 9.27 W, while a VoIP phone can consume 0.1 W to 3.5 W of standby power.

Off-mode power is often less than standby power, since the phone is in a low-power mode during idle hours such as night. According to Energy Star, the sound transmission mechanism also plays a key role, and hybrid phones consume more power.

Power allowance (W) for each of the below features of the device:

  • 1.0 watt for Gigabit Ethernet
  • 0.2 watt for Energy-Efficient Ethernet (IEEE 802.3az) compliant Gigabit Ethernet

Additional proxy incentive (W) for the ability to maintain network presence while in a low-power mode and intelligently wake when needed:

  • 0.3 watt for base capability
  • 0.5 watt for remote wake

Government bodies and groups to track Energy efficiency of Telecom and IP telephony

  • Alliance for Telecommunications Industry Solutions (ATIS) – defines the Telecommunications Energy Efficiency Ratio (TEER); its measurement method covers all power conversion and power distribution from the front end of the system to the data wire plug, including application-specific integrated circuits (ASICs).
  • European Telecommunications Standards Institute (ETSI)
  • International Telecommunication Union (ITU)
  • U.S. Department of Energy (DOE), Environmental Protection Agency (EPA)

External links

Amazon :

Cisco :

3CX :

The purpose of the article is to raise awareness about carbon footprint, from application programs to architecture design techniques to data centres and cumulative performance. It gives stakeholders (customers, programmers, architects, managers, …) a direction to choose the less carbon-emitting approach whenever possible, since every bit counts to help the environment.




[3] nature :

[4] Center of Expertise for Energy Efficiency in Data Centers at the US Department of Energy’s Lawrence Berkeley National Laboratory in Berkeley, California.

[5] Energy Star –



[8] Yin K, Wang S, Wang G, Cai Z, Chen Y. Optimizing deployment of VMs in cloud computing environment. In: Proceedings of the 3rd international conference on computer science and network technology. IEEE; 2013. p. 703–06.

[9] Huang W, Li X, Qian Z. An energy efficient virtual machine placement algorithm with balanced resource utilization. In: Proceedings of the seventh IEEE international conference on innovative mobile and internet services in ubiquitous computing; 2013. p. 313–19.

[10] W. Tian, C.S. Yeo, R. Xue, Y. Zhong, Power-aware scheduling of real-time virtual machines in cloud data centers considering fixed processing intervals. Proc IEEE, 1 (2012), pp. 269-273

[11] H. Chen, X. Zhu, H. Guo, J. Zhu, X. Qin, J. Wu, Towards energy-efficient scheduling for real-time tasks under uncertain cloud computing environment. J Syst Softw, 99 (2015), pp. 20-35



[14] F. Bota, F. Khuhawar, M. Mellia and M. Meo, “Comparison of energy efficiency in PSTN and VoIP systems,” 2012 Third International Conference on Future Systems: Where Energy, Computing and Communication Meet (e-Energy), 2012, pp. 1-4, doi: 10.1145/2208828.2208834.


[16] eGRID summary table 2018 for carbon emission rates in US states:

[17] UK greenhouse gas reporting –



[21] It’s Not Easy Being Green by Peter Xiang Gao, Andrew R. Curtis, Bernard Wong, S. Keshav
Cheriton School of Computer Sc

TeleMedicine and WebRTC

An anywhere, anytime telemedicine communication tool accessible on any device. The solution provides a lightweight signalling server which drops out as soon as the call is connected, thus ensuring absolutely private calls without relaying or involving any central server in any call-related data or media. This ensures doctor-patient details are not processed, stored or recorded by our servers.

The solution enables doctors, nurses, medical practitioners and patients to do:

  • High definition Audio/video calls 
  • End to end encrypted p2p chats 
  • Integration with HMS ( hospital management system ) to fetch history of the patients 
  • Screens sharing to show reports without transferring them as files 
  • Include more concerned people (e.g. consulting doctors) using the mesh-based peer-to-peer conferencing feature.

Confidentiality and Privacy

For the privacy and security of certain health information, only HIPAA (Health Insurance Portability and Accountability Act of 1996) compliant video-conferencing tools can be used for telemedicine in the US.

Telemedicine scenario Callflow

Callflow for attended call transfer and a 2-way conference in a telemedicine scenario between a patient, hospital attendant, doctor and a nurse.

References :

Performance of WebRTC sites and electron apps

This post is about making performance enhancements to a WebRTC app so that it can be used in areas which require sensitive data to be communicated, cannot afford downtime, need fast response and low RTT, and need to be secure enough to withstand hacks and attacks.

Best practices for WebRTC single page applications

As communication clients become single-page driven, a lot of authentication, heartbeat sync, web workers and signalling event-driven flow management resides on the same page, along with the actual CPU consumption for the audio/video resources and media streams. This in turn can make the webpage heavy and many a time results in a crash due to the page being "unresponsive".

Here are some of my best to-dos for making sure a WebRTC communication client page runs efficiently.

Visual stability and CLS (Cumulative Layout Shift)

The CLS metric measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.

To provide a good user interaction experience, the DOM elements should display as little movement as possible so that the page appears stable. In the opposite case, on a flickering page (maybe due to a notification DOM element dynamically pushing the other layout elements), it is difficult to precisely interact with page elements such as buttons.
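In simplified form, a CLS score aggregates layout-shift entries like this (mirroring the entry shape reported by a `layout-shift` PerformanceObserver; note the browser metric has since moved to session-window aggregation, so this is the original, simpler sum):

```javascript
// Simplified CLS aggregation: sum layout-shift entry values, skipping
// shifts that happened right after user input (as the browser does).
function cumulativeLayoutShift(entries) {
  return entries
    .filter((e) => !e.hadRecentInput)
    .reduce((sum, e) => sum + e.value, 0);
}
```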

Minimize main thread work

The main thread is where a browser process runs all the JavaScript in your page, as well as performing layout, reflows, and garbage collection. Therefore, long JavaScript tasks can block the thread and make the page unresponsive.

Deprecation of XMLHttpRequest on the main thread

Reduce JavaScript execution time

Unoptimized JS code takes longer to execute and impacts network, parse/compile and memory cost.

If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.

Some effective tips for speeding up JS execution include:

  • Minifying and compressing code
  • Removing unused code and console.logs
  • Applying caching to save lookup time
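The caching tip, for instance, can be as simple as memoizing a pure but expensive function:

```javascript
// Memoization sketch: cache results of a pure, expensive function so
// repeated calls with the same argument skip recomputation entirely.
function memoize(fn) {
  const cache = new Map();
  return (arg) => {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg);
  };
}
```

This only pays off for pure functions; memoizing anything with side effects or unbounded argument spaces trades correctness or memory for speed.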



Security Concerns

In one of my previous posts I wrote about security threats to a WebRTC solution. It covers the main 4 ways in which WebRTC solution providers and users are vulnerable:

  1. Identity Management ,
  2. Browser Security ,
  3. Authentication and
  4. Media encryption.

WebRTC Security

  • Identity Management
  • Browser Security
  • Authentication
  • Media encryption
  • Browser Threat Model
  • Best practices for WebRTC comm agents
  • ICE/TURN challenges

Cookies – security vs persistent state

Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.

While adding cookies we must ensure that if SameSite=None, the cookie is also marked Secure:

Set-Cookie: widget_session=abc123; SameSite=None; Secure

With SameSite set to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser's URL bar.

Set-Cookie: promo_shown=1; SameSite=Strict

You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.

Performance monitoring

Key Performance Indicators (KPIs) are used to evaluate the performance of a website. It is critical that a WebRTC web page be lightweight, to accommodate the signalling control stack JavaScript libs used for offer/answer handling and for communicating with the signaller over open sockets or a long-polling mechanism.

Lighthouse results

The Lighthouse tab in Chrome developer tools shows relevant areas of improvement on the webpage across Performance, Accessibility, Best Practices, Search Engine Optimization and Progressive Web App.

It also shows individual categories and comments.

Time to render and Page load

Page attributes under Chrome developer tools depict the page load and rendering time for every element, including scripts and markup. Specifically it has:

  • Time to Title
  • Time to render
  • Time to interact

Networking attributes are to be configured based on DNS mapping and the host provider. These can be evaluated based on Chrome developer tool reports.

Task interaction time

Other page interaction criteria include the frames, their interaction and the timings for the same.

In the attached screenshot, see the loading tasks, which basically depict the delay caused by DOM elements under transitions owing to user interaction. This should ideally be minimal, to keep the page responsive.

Page’s total memory



The above functions (old and new) estimate the memory usage of the entire web page.

These calls can be used to correlate new JS code with its impact on memory and subsequently find any memory leaks. These memory metrics can also be used for A/B testing.

DNS lookup Time

Services such as Pingdom or WebPageTest can quickly calculate your website's DNS lookup times.

Load/stress testing can be performed with tools such as LoadStorm and JMeter.

Page weight and PRPL

Loading assets over a CDN, minifying scripts and reducing the overall weight of the page are good ways to keep the page light and active and prevent any Chrome tab crashes.

PRPL expands to Push/preload, Render, Pre-cache, Lazy load:

  • Push (or preload) the most critical resources.
  • Render the initial route as soon as possible.
  • Pre-cache remaining assets.
  • Lazy load other routes and non-critical assets.

Preload is a declarative fetch request that tells the browser to request a resource as soon as possible. Hence it should be used for critical assets.

<link rel="preload" as="script" href="critical.js">

The non-critical components can then be loaded asynchronously.

Lazy loading must be used for large files, like JS payloads, which are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.

Web Workers

Web Workers are a simple means for web content to run scripts in background threads. The Worker interface spawns real OS-level threads.

By acting as a proxy, service workers can fetch assets directly from the cache rather than the server on repeat visits. 

Conversion Rates over analytics Tools

Google Analytics is a good way of deducing conversion and bounce rates.

Codecs weight

Video codecs: VP8 vs VP9

SFU vs MCU or p2p media flow

References :


In the course of the evolution of RAN (Radio Access Network) technologies, 5G succeeds 4G (2010), which came after 3G (2000), 2.5G, 2G (1990) and 1G/PSTN (1980) respectively. Among the most striking features of 5G are:

  • entirely IP based
  • ability to connect 100x more devices (IoT favourable)
  • speeds up to 10 Gbit/s
  • high peak bit rate
  • high data volume per unit area
  • virtually zero latency, hence fast response times

Thus it can accommodate the rapid growth of rich multimedia applications like OTT streaming of HD content, gaming, augmented reality and so on, while enabling devices connected to the Internet of Things to onboard the telecommunication backbone with high system spectral efficiency and ubiquitous connectivity.

In fact, 5G saw maximum investment in 2020 for revamping infrastructure as compared to other technologies such as IoT or even cloud. This could be partly due to the steep rise in high-speed communication for streaming and remote communication, owing to the sharp growth of remote learning and working-from-home scenarios.

img source statista – global-telecom-industry-priority-investment-areas


5G is specified to operate over range 1 GHz to 100 GHz.

  • Low-band spectrum (below 2.5 GHz) – excellent coverage
  • Mid-band spectrum (2.5–10 GHz) – a combination of good coverage and very high bitrates
  • High-band spectrum (10–100 GHz) – the bandwidths needed for the highest bitrates (up to 20 Gb/s) and lowest latencies

Workplan for 5G standardisation and release

The Workplan started in 2014 and is ongoing as of now (2018)

image source : 3GPP “Getting ready for 5G”

3GPP is the standards-defining body for telecom and has previously specified almost all RAN technologies, like GSM, GPRS, W-CDMA, UMTS, EDGE, HSPA and LTE.

Applications of 5G

5G targets three main use cases:

  • enhanced mobile broadband (eMBB),
  • massive machine type communications (mMTC)
  • ultra-reliable low latency communications (URLLC) (also called critical machine type communications (cMTC))
Source: Ericsson whitepaper

General Data Protection Regulation (GDPR) in VoIP

GDPR, Europe’s digital privacy legislation passed in 2018, replaces the 1995 EU Data Protection Directive. It is a set of rules designed to give EU citizens more control over their personal data and strengthen privacy rights. It aims to simplify the regulatory environment for business and citizens.

To read about other certificates, compliances and security in VoIP, see the summary covering:

  • HIPAA (Health Insurance Portability and Accountability Act) ,
  • SOX( Sarbanes Oxley Act of 2002),
  • Privacy Related Compliance certificates like COPPA (Children’s Online Privacy Protection Act ) of 1998,
  • CPNI (Customer Proprietary Network Information) 2007,
  • GDPR (General Data Protection Regulation)  in European Union 2018,
  • California Consumer Privacy Act (CCPA) 2019,
  • Personal Data Protection Bill (PDP) – India 2018 and
  • also specifications against Robocalls and SPIT ( SPAM over Internet Telephony) among others

Multinational companies will predominantly be regulated by the supervisory authority where they have their “main establishment” or headquarter. However, the issue concerning GDPR is that it not only applies to any organisation operating within the EU, but also to any organisations outside of the EU which offer goods or services to customers or businesses in the EU.

Key Principles of GDPR are

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality (security)
  • Accountability

GDPR consists of eight workstreams (DPO, impact assessment, portability, notification of violations, consent, profiling, certification and lead authority) that will strengthen the control of personal data throughout the European Union.


The stakeholders of the data protection regulation are:

Data Subject – an individual, a resident of the European Union, whose personal data are to be protected.

Data Controller – an institution, business or a person processing the personal data e.g. e-commerce website.

Data Protection Officer – a person appointed by the Data Controller responsible for overseeing data protection practices.

Data Processor – a subject (company, institution) processing a data on behalf of the controller. It can be an online CRM app or company storing data in the cloud.

Data Authority – a public institution monitoring implementation of the regulations in the specific EU member country.

Extra-Territorial Scope

Any VoIP service provider may feel that, since they are not based out of the EU (e.g. officially headquartered in the Asia-Pacific or US region), they may not be legally bound by GDPR. However, GDPR expands the territorial and material scope of EU data protection law. It applies to both controllers and processors established in the EU, and those outside the EU who offer goods or services to, or monitor, EU data subjects.

VoIP service providers as Data Processors

A processor is a “person, public authority, agency or other body which processes personal data on behalf of the controller”.
Most VoIP service providers are multinational in nature, with services offered directly or indirectly to all regions. The GDPR imposes direct statutory obligations on data processors, which means they will be subject to direct enforcement by supervisory authorities, fines, and compensation claims by data subjects. However, a processor's liability will be limited to the extent that it has not complied with its statutory and contractual obligations.

Data minimization – It is now good practice to store and process as little of the user's personal data as is necessary to render our services effectively, and to retain data for only a stipulated time (approx. 90 days of CDRs for call details and logs).
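A retention sketch matching the 90-day figure above (the record shape is hypothetical):

```javascript
// Data-minimisation sketch: drop CDR entries older than the retention
// window (90 days here, matching the text). Timestamps are in ms.
const RETENTION_MS = 90 * 24 * 60 * 60 * 1000;

function purgeExpired(cdrs, nowMs) {
  return cdrs.filter((r) => nowMs - r.createdAtMs <= RETENTION_MS);
}
```

A real purge job would also cover backups and caches, since GDPR applies to every copy of the data.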

Record Keeping, Accountability and governance

To show compliance with GDPR, a service provider must maintain detailed records of processing activities. Also, they must implement technological and organisational measures to ensure, and be able to demonstrate, that processing is performed in accordance with the GDPR. Some ways to apply these are:

  • Contracts: putting written contracts in place with organisations that process personal data on your behalf
  • maintaining documentation of your processing activities
  • Organisational policies focus on Data protection by design and default – two-factor auth, strong passwords to guard against brute-force, encryption, focus on security in architecture
  • Risk analysis and impact assessments: for uses of personal data that are likely to result in a high risk to individuals’ interests
  • Audit by Data protection officer
  • Clear Codes of conduct
  • Certifications

As for the VoIP landscape, thankfully every call or message session is followed by a CDR (Call Detail Record) or MDR (Message Detail Record).

Additionally, assign a unique signature to every data-access client in the VoIP system and log every read/write operation carried out on data stores, whether persistent datastores or system caches.

Privacy Notices to Subjects

User profile data such as :

  • Basic identity information, name, address and ID numbers
  • Web data such as location, IP address, cookie data and RFID tags
  • Health and genetic data
  • Bio-metric data
  • Racial or ethnic data
  • Political opinions
  • Sexual orientation

is strictly protected under GDPR rules.

A service provider should provide in-depth information to data subjects when collecting their personal data, to ensure fairness and transparency. They must provide the information in an easily accessible form, using clear and plain language.


The GDPR introduces a higher bar for relying on consent, requiring clear affirmative action. Silence, pre-ticked boxes or inactivity will not be sufficient to constitute consent. Data subjects can withdraw their consent at any time, and it must be easy for them to do so.

Lawful bases for processing data

In Article 6 of the GDPR, there are six available lawful bases for processing:

(a) Consent: the individual has given clear consent for you to process their personal data for a specific purpose.

(b) Contract: the processing is necessary for a contract you have with the individual, or because they have asked you to take specific steps before entering into a contract.

(c) Legal obligation: the processing is necessary for you to comply with the law (not including contractual obligations).

(d) Vital interests: the processing is necessary to protect someone’s life.

(e) Public task: the processing is necessary for you to perform a task in the public interest or for your official functions, and the task or function has a clear basis in law.

(f) Legitimate interests: the processing is necessary for your legitimate interests or the legitimate interests of a third party, unless there is a good reason to protect the individual’s personal data which overrides those legitimate interests.

Files such as PCAPs, recordings and transcripts of calls hold sensitive information from end users; these should be encrypted and inaccessible even to the dev teams within the org without the explicit consent of the end user.

Individuals’ Rights

The GDPR provides individuals with new and enhanced rights to Data subjects who will have more control over the processing of their personal data. A data subject access request can only be refused if it is manifestly unfounded or excessive, in particular because of its repetitive character.

The rights of data subjects include:

  • Right of Access
  • Right to Rectification
  • Right to Be Forgotten
  • Right to Restriction of Processing
  • Right to Data Portability
  • Right to Object
  • Right to Object to Automated Decision-making

For a VoIP service provider, if a user opts for redaction then none of their calls or messages should be traceable in logs. Also replace distinguishable end-user identifiers, such as phone numbers and SIP URIs, with *** characters.
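A redaction helper along these lines might look like the following sketch (the exact masking policy is an assumption):

```javascript
// Log-redaction sketch: mask the user part of a SIP URI, or most digits
// of a phone number, before the identifier ever reaches the logs.
function maskIdentifier(id) {
  const sip = id.match(/^(sips?:)([^@]+)@(.+)$/);
  if (sip) return `${sip[1]}***@${sip[3]}`;
  // Fallback: treat as a phone number, keeping only the last two digits.
  return id.replace(/.(?=..)/g, "*");
}
```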

Provide an option for “Account Deletion” and purge the account – if users wish to close their account, their details should be deleted from the system, except for the bare-bones details which are otherwise required for legal, taxation and accounting requirements.

Breach Notification

A controller is a “person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of processing of personal data”,

A controller will have a mandatory obligation to notify its supervisory authority of a data breach within 72 hours, unless the breach is unlikely to result in a risk to the rights of data subjects. It will also have to notify affected data subjects where the breach is likely to result in a “high risk” to their rights. A processor, however, will only be obliged to report data breaches to controllers.

International Data Transfers

Data transfers to countries outside the EEA(European Economic Area) continue to be prohibited unless that country ensures an adequate level of protection. The GDPR retains existing transfer mechanisms and provides for additional mechanisms, including approved codes of conduct and certification schemes.

The GDPR prohibits any non-EU court, tribunal or regulator from ordering the disclosure of personal data from EU companies unless it requests such disclosure under an international agreement, such as a mutual legal assistance treaty.

One of the biggest challenges for a service provider is the identification & categorization of GDPR impacted data sets in disparate locations across the enterprise. A dev team must flag tables, attributes and other data objects that are categorically covered under GDPR regulations and then ensure that they are not transferred to a server outside of EU.

In the present age of shared virtual server instances, cloud computing and VoIP protocols, it is operationally a very tough task for a communication service provider to ensure that data is not transferred outside of the EU; for example, a VoIP call originating in the US and terminating in the EU will require information exchanges via SDP, vCard, and RTP streams via media proxies etc.


The GDPR provides supervisory authorities with wide-ranging powers to enforce compliance, including the power to impose significant fines. You will face fines of up to €20m or 4% of your total worldwide annual turnover of the preceding financial year. In addition, data subjects can sue you for pecuniary or non-pecuniary damages (i.e. distress). Supervisory authorities will have a discretion as to whether to impose a fine and the level of that fine.

Data Protection officer (DPO)

Under the terms of GDPR, an organisation must appoint a Data Protection Officer (DPO) if it carries out large-scale processing of special categories of data, carries out large scale monitoring of individuals such as behaviour tracking or is a public authority.

Reference :

Media Architecture , RTP topologies

With the sudden onset of Covid-19 and the building trend of working from home, the demand for scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution.

This article is about a media server setup to provide a mid-to-high-scale conferencing solution over SIP to various endpoints, including SIP softphones, PBXs, carrier/PSTN and WebRTC.

Point to Point

Endpoints communicate over unicast.
RTP and RTCP traffic is private between the sender and receiver, even if the endpoints contain multiple SSRCs in the RTP session.

Advantages of P2P:

  • Facilitates private communication between the parties
  • The only limitations to the number of streams between the participants are physical ones, such as bandwidth and the number of available ports

Point to Point via Middlebox

Same as above but with a middle-box involved


    Mostly used for interoperability between non-interoperable endpoints, such as transcoding codecs or transport conversion.
    It does not use an SSRC of its own and keeps the SSRC of an RTP stream intact across the translation.

    Subtypes of middlebox:

    Transport/Relay Anchoring

    Performs roles like NAT traversal by pinning the media path to a public-address-domain relay or TURN server.

    Middleboxes for auditing or privacy control of participant’s IP

    Other SBC ( Session Border Gateways) like characteristics are also part of this topology setup

    Transport translator

    interconnects networks, e.g. multicast to unicast

    handles media packetization to allow other media, such as non-RTP protocols, to connect to the session

    Media translator

    modifies the media inside RTP streams, commonly known as transcoding

    can do up to full encoding/decoding of RTP streams

    in many cases it can also act on behalf of non-RTP-capable endpoints, receiving and responding to feedback reports and performing FEC (Forward Error Correction)

    Back-To-Back RTP Session

    Mostly like a translator middlebox, but it establishes a separate RTP session leg with each endpoint, bridging the two sessions.

    Takes complete responsibility for forwarding the correct RTP payload and maintaining the relation between SSRCs and CNAMEs

    Advantages of Back-To-Back RTP Session
  • The B2BUA / media bridge takes responsibility for relaying media and managing congestion
    Disadvantages of Back-To-Back RTP Session
  • It can be subjected to a MITM attack or carry a backdoor to eavesdrop on conversations

    Point to Multipoint using Multicast

    Any-Source Multicast (ASM)

    traffic from any participant sent to the multicast group address reaches all other participants

    Source-Specific Multicast (SSM)

    a selected sender streams to the multicast group, which distributes the stream to the receivers

    Point to Multipoint using Mesh

    many unicast RTP streams forming a mesh

    Point to Multipoint + Translator

    Some more variants of this topology are Point to Multipoint with Mixer

    Media Mixing Mixer

    receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.

    Media Switching Mixer

    RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

    The Mixer can reduce bitrate or switch between sources like active speakers.

    SFU ( Selective Forwarding Unit)

    Middlebox can select which of the potential sources ( SSRC) transmitting media will be sent to each of the endpoints. This transmission is set up as an independent RTP Session.

    Extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.

    Advantages of SFU
  • Low latency and low jitter-buffer requirement, by avoiding re-encoding
    Disadvantages of SFU
  • Unable to manage the network and control the sender's bitrate

    On a high level, one can safely assume that, given the current average internet bandwidth, a mesh architecture makes sense for 3-6 peers; any number above that requires a centralized media architecture.

    Among the centralized media architectures, an SFU makes sense for at most 6-15 people in a conference; if the number of participants exceeds that, it may need to switch to MCU mode.
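The rule of thumb above can be sketched as a simple selection function. The thresholds are the ones quoted in the text, not hard limits; real systems would also weigh bandwidth, CPU and simulcast support:

```python
def pick_topology(participants: int) -> str:
    """Pick a conferencing topology from participant count.

    Thresholds follow the article's rule of thumb:
    mesh up to ~6 peers, SFU up to ~15, MCU beyond that.
    """
    if participants <= 2:
        return "p2p"          # plain point-to-point call
    if participants <= 6:
        return "mesh"         # full mesh of unicast RTP streams
    if participants <= 15:
        return "sfu"          # selective forwarding, no re-encoding
    return "mcu"              # mixing/compositing media server

print(pick_topology(4))   # mesh
print(pick_topology(12))  # sfu
print(pick_topology(40))  # mcu
```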

    Other Hybrid Topologies

    There are various topologies for multi-endpoint conferences. Hybrid topologies include forwarding video while mixing audio, auto-switching between configurations as load increases or decreases, or switching based on plan (paid premium vs free).

    Hybrid model

    Some endpoints receive forwarded streams while others receive mixed/composited streams.

    Serverless models

    Centralized topology in which one endpoint serves as an MCU or SFU.

    Used by Jitsi and Skype

    Point to Multipoint Using Video-Switching MCUs

    Much like an MCU, but unlike an MCU it can switch the forwarded stream's bitrate and resolution based on the active speaker, host or presenter, and floor-control-like characteristics.

    This setup can embed the characteristics of a translator and selector, and can even do congestion control based on RTCP.

    To handle a multipoint conference scenario, it acts as a translator forwarding the selected RTP stream under its own SSRC, with the appropriate CSRC values, and modifies the RTCP RRs it forwards between the domains.

    Cascaded SFUs

    Chaining SFUs reduces latency while also enabling scalability; however, it takes a toll on server network as well as endpoint resources.

    Transport Protocols

    Before getting into an in-depth discussion of all possible types of Media Architectures in VoIP systems, let us learn about TCP vs UDP

    TCP is a reliable, connection-oriented protocol that establishes a connection between the communicating parties with a SYN / SYN-ACK / ACK handshake. It sends packets sequentially, and individual packets can be resent when the receiver recognizes out-of-order or missing packets. It is thus used for session creation thanks to its error-correction and congestion-control features.

    Once a session is established, media typically flows as RTP over UDP. UDP, even though less reliable (it guarantees neither non-duplication nor delivery with error correction), is used for media due to its low overhead, and packets of other protocols are easily encapsulated (tunnelled) inside UDP packets. To provide end-to-end security, additional methods for authentication and encryption are layered on top.
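To make the RTP-over-UDP layering concrete, a minimal RTP header can be packed with Python's struct module; the 12-byte fixed layout follows RFC 3550 (no CSRC list, no extensions), shown here purely for illustration:

```python
import struct

def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Build a minimal 12-byte RTP header (RFC 3550, no CSRC, no extensions)."""
    version = 2
    first_byte = version << 6            # V=2, P=0, X=0, CC=0
    second_byte = payload_type & 0x7F    # M=0, 7-bit payload type
    return struct.pack("!BBHII", first_byte, second_byte,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

hdr = rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
assert len(hdr) == 12 and hdr[0] == 0x80   # version bits set correctly
# In a real sender this header + encoded audio would go into a UDP datagram
# via socket.sendto(); signalling (SIP) would ride over TCP/TLS separately.
```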

    Audio PCAP storage and Privacy constraints for Media Servers

    A Call session produces various traces for offtime monitoring and analysis which can include

    CDR (Call Detail Records) – to/from numbers, ring time, answer time, duration, etc.

    Signalling PCAPs – collected usually from the SIP application server, containing the SIP requests, SDP and responses. They show the call flow sequence, for example who sent the INVITE and who sent the BYE or CANCEL, and how many times the call was updated or paused/resumed.

    Media stats – jitter, buffer, RTT and MOS for all legs, plus average values

    Audio PCAPs – recordings of the RTP stream and RTCP packets between the parties; these require explicit consent from the customer or user. VoIP companies complying with GDPR cannot record and preserve audio streams for any purpose, be it audit, call-quality debugging or self-inspection, without that consent.

    Throwing more light on audio PCAP storage: assuming the user provides explicit permission, here is an approach for carrying out the recording and storage operations.

    Furthermore, strict access control, encryption and anonymisation of the media packets are necessary to obfuscate the details of the call session.
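As one illustration of anonymising stored call metadata, SIP URIs can be pseudonymised with a keyed hash, so that records stay correlatable without exposing identities. This is a minimal sketch; the key handling and naming scheme are assumptions, not a standard:

```python
import hmac, hashlib

SECRET_KEY = b"rotate-me-regularly"   # hypothetical key; keep in a KMS, not in code

def pseudonymise_uri(sip_uri: str) -> str:
    """Replace a SIP URI with a stable keyed pseudonym for stored traces."""
    digest = hmac.new(SECRET_KEY, sip_uri.encode(), hashlib.sha256).hexdigest()
    return "sip:anon-" + digest[:16] + "@example.invalid"

a = pseudonymise_uri("sip:alice@example.com")
b = pseudonymise_uri("sip:alice@example.com")
assert a == b               # the same caller always maps to the same pseudonym
assert "alice" not in a     # the original identity is not visible in the record
```

Because the hash is keyed, rotating or destroying the key makes old pseudonyms unlinkable, which helps honour deletion requests.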

    References:

    To learn about the differences between media server topologies

    • centralized vs decentralised,
    • SFU vs MCU ,
    • multicast vs unicast ,

    Read – SIP conferencing and Media Bridges

    SIP conferencing and Media Bridges

    SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP must not only set up the multimedia streams but also provide conference control for building communication and collaboration apps for new and customisable solutions.

    To read more about building a scalable VoIP server-side architecture and

    • Clustering the servers with a common cache for high availability and prompt failure recovery
    • Multi-tier architecture, i.e. separation between the data/session layer and the application server/engine layer
    • Microservice-based architecture, i.e. the differences between proxies such as load balancers, SBCs, backend services, OSS/BSS, etc.
    • Containerization and autoscaling

    Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

    VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

    I have been contemplating the points that make a developer successful at building solutions and services for a Telecom Application Server. The trend has shown many variations: from pure IN programs like VPN and prepaid billing logic, to SIP servlets for call parking and call completion, and from SIP servlets to JAIN SLEE open-standard-based communication.

    WebRTC Audio/Video Codecs

    Codecs signify the media stream's compression and decompression. For peers to have a successful exchange of media, they need to agree on a common set of codecs for the session. The list of codecs is exchanged as part of the offer and answer (the SDP in SIP).

    As WebRTC provides containerless bare MediaStreamTrack objects, the codecs for these tracks are not mandated by WebRTC itself. Instead, the codecs are specified by two separate RFCs:

    RFC 7874 WebRTC Audio Codec and Processing Requirements specifies at least the Opus codec as well as G.711's PCMA and PCMU formats.

    RFC 7742 WebRTC Video Processing and Codec Requirements specifies support for VP8 and H.264's Constrained Baseline profile for video.

    In WebRTC, media is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to discuss the audio/video codec processing requirements only.

    Quick links: if you are new to WebRTC, read Introduction to WebRTC and Layers of WebRTC.

    WebRTC Media Stack

    Media Stream Tracks in WebRTC

    The MediaStreamTrack interface typically represents a stream of data of audio or video and a MediaStream may contain zero or more MediaStreamTrack objects.

    The objects RTCRtpSender and RTCRtpReceiver can be used by the application to get more fine grained control over the transmission and reception of MediaStreamTracks.

    Media Flow in VoIP system
    Media Flow in WebRTC Call


    Video capture in sync with the hardware's capabilities

    WebRTC-compatible browsers are required to support white balance, light level and autofocus from the video source

    Video Capture Resolution

    The minimum WebRTC video attributes, unless specified otherwise in the SDP (Session Description Protocol), are 20 FPS and a resolution of 320 x 240 pixels.

    It also supports mid-stream resolution changes, such as a screen source from desktop sharing.

    SDP attributes for resolution, frame rate, and bitrate

    SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

    The sender must limit the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.
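As a sketch, the maximum resolution advertised in an a=imageattr line can be pulled out with a regex. The attribute syntax here is a simplified subset of RFC 6236, enough to show the idea:

```python
import re

def max_resolution(imageattr_line: str):
    """Extract the max x/y from a (simplified) a=imageattr SDP line."""
    m = re.search(r"x=\[\d+[:-](\d+)\].*y=\[\d+[:-](\d+)\]", imageattr_line)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2))

line = "a=imageattr:97 recv [x=[0-1280],y=[0-720]]"
print(max_resolution(line))  # (1280, 720)
```

A sender would compare its intended encode size against this tuple and downscale before encoding if needed.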

    Dynamic FPS control based on actual hardware encoding :

    the video source capture should adjust the frame rate according to low bandwidth, poor light conditions and the hardware-supported rate, rather than forcing a higher FPS

    Stream Orientation

    support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing them with the peer


    WebRTC is free and open source, and its working bodies promote royalty-free codecs too. The working groups RTCWEB and IETF ensure that non-royalty-bearing codecs are mandatory, while other codecs can be optional in WebRTC non-browsers.

    WebRTC browsers MUST implement the VP8 video codec as described in
    [RFC6386] and H.264 Constrained Baseline as described in [H264].

    RFC 7742 WebRTC Video Processing and Codec Requirements

    Most of the codecs below follow a lossy DCT (discrete cosine transform) based algorithm for encoding.

    A sample SDP offer from the Chrome browser v80 for Linux includes these profiles:

    m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123
    a=rtpmap:96 VP8/90000
    a=rtcp-fb:96 goog-remb
    a=rtcp-fb:96 transport-cc
    a=rtcp-fb:96 ccm fir
    a=rtcp-fb:96 nack
    a=rtcp-fb:96 nack pli
    a=rtpmap:97 rtx/90000
    a=fmtp:97 apt=96
    a=rtpmap:98 VP9/90000
    a=rtcp-fb:98 goog-remb
    a=rtcp-fb:98 transport-cc
    a=rtcp-fb:98 ccm fir
    a=rtcp-fb:98 nack
    a=rtcp-fb:98 nack pli
    a=fmtp:98 profile-id=0
    a=rtpmap:99 rtx/90000
    a=fmtp:99 apt=98
    a=rtpmap:100 VP9/90000
    a=rtcp-fb:100 goog-remb
    a=rtcp-fb:100 transport-cc
    a=rtcp-fb:100 ccm fir
    a=rtcp-fb:100 nack
    a=rtcp-fb:100 nack pli
    a=fmtp:100 profile-id=2
    a=rtpmap:101 rtx/90000
    a=fmtp:101 apt=100
    a=rtpmap:102 H264/90000
    a=rtcp-fb:102 goog-remb
    a=rtcp-fb:102 transport-cc
    a=rtcp-fb:102 ccm fir
    a=rtcp-fb:102 nack
    a=rtcp-fb:102 nack pli
    a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
    a=rtpmap:122 rtx/90000
    a=fmtp:122 apt=102
    a=rtpmap:127 H264/90000
    a=rtcp-fb:127 goog-remb
    a=rtcp-fb:127 transport-cc
    a=rtcp-fb:127 ccm fir
    a=rtcp-fb:127 nack
    a=rtcp-fb:127 nack pli
    a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
    a=rtpmap:121 rtx/90000
    a=fmtp:121 apt=127
    a=rtpmap:125 H264/90000
    a=rtcp-fb:125 goog-remb
    a=rtcp-fb:125 transport-cc
    a=rtcp-fb:125 ccm fir
    a=rtcp-fb:125 nack
    a=rtcp-fb:125 nack pli
    a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
    a=rtpmap:107 rtx/90000
    a=fmtp:107 apt=125
    a=rtpmap:108 H264/90000
    a=rtcp-fb:108 goog-remb
    a=rtcp-fb:108 transport-cc
    a=rtcp-fb:108 ccm fir
    a=rtcp-fb:108 nack
    a=rtcp-fb:108 nack pli
    a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
    a=rtpmap:109 rtx/90000
    a=fmtp:109 apt=108
    a=rtpmap:124 red/90000
    a=rtpmap:120 rtx/90000
    a=fmtp:120 apt=124
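The rtpmap lines in an offer like the one above can be folded into a payload-type to codec map with a few lines of parsing. This is a sketch, not a full SDP parser (fmtp and rtcp-fb lines are ignored):

```python
def parse_rtpmap(sdp: str) -> dict:
    """Map RTP payload type -> codec name/clock from a=rtpmap lines."""
    codecs = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[int(pt)] = encoding        # e.g. "VP8/90000"
    return codecs

sdp = """a=rtpmap:96 VP8/90000
a=rtpmap:97 rtx/90000
a=rtpmap:102 H264/90000"""
print(parse_rtpmap(sdp))  # {96: 'VP8/90000', 97: 'rtx/90000', 102: 'H264/90000'}
```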


    VP8

    Developed by On2, then acquired and open-sourced by Google; now free of royalty fees.

    Supported containers – 3GP, Ogg, WebM

    No limit on frame rate or data rate, and a maximum resolution of 16384x16384 pixels.

    Reference implementation: the libvpx encoder library.

    VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes.
    They encode and decode pixels with an implied 1:1 (square) aspect ratio.

    Simulcast is supported.
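The max-fs attribute counts frame size in macroblocks (16x16 pixel blocks), so an encoder can check a target resolution against a receiver's limit as below. The attribute semantics follow the VP8 RTP payload spec (RFC 7741); treat this as an illustrative check, not a full negotiation:

```python
def frame_size_in_macroblocks(width: int, height: int) -> int:
    """Frame size in 16x16 macroblocks, the unit used by the max-fs attribute."""
    return ((width + 15) // 16) * ((height + 15) // 16)

def fits_receiver_limit(width: int, height: int, max_fs: int) -> bool:
    """True if an encoded resolution respects the receiver's max-fs."""
    return frame_size_in_macroblocks(width, height) <= max_fs

# 640x480 is 40x30 = 1200 macroblocks
print(frame_size_in_macroblocks(640, 480))    # 1200
print(fits_receiver_limit(1920, 1080, 1200))  # False
```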


    VP9

    Video Processor 9 (VP9) is the successor to the older VP8 and comparable to HEVC, as they both achieve similar bit rates.

    Open and free of royalties and any other licensing requirements.
    Its supported containers are – MP4, Ogg, WebM

    H.264/AVC Constrained

    AVC's Constrained Baseline (CBP) profile is the one compliant with WebRTC.

    Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3. Constrained Baseline is a subset of the Main profile, suited to low delay and low complexity, i.e. to lower-processing devices like mobiles.

    Multiview Video Coding – can carry multiple views of the same scene, such as stereoscopic video.

    Other profiles, which are not supported, are Baseline (BP), Extended (XP), Main (MP), High (HiP), Progressive High (ProHiP), High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive.

    Its supported containers are 3GP, MP4, WebM

    Parameter settings:

    • packetization-mode
    • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
    • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
    • Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )

    It is a proprietary, patented codec, maintained by MPEG/ITU.

    AV1 (AOMedia Video 1)

    open format designed by the Alliance for Open Media
    royalty free
    especially designed for internet video HTML element and WebRTC
    higher data compression rates than VP9 and H.265/HEVC

    offers 3 profiles (main, high and professional) in increasing support for colour depths and chroma subsampling.

    supports HDR
    supports Variable Frame Rate

    Supported container are ISOBMFF, MPEG-TS, MP4, WebM

    Stats for Video based media stream track

    timestamp 04/05/2020, 14:25:59
    ssrc 3929649593
    isRemote false
    mediaType video
    kind video
    trackId RTCMediaStreamTrack_sender_2
    transportId RTCTransport_0_1
    codecId RTCCodec_1_Outbound_96
    [codec] VP8 (payloadType: 96)
    firCount 0
    pliCount 9
    nackCount 476
    qpSum 912936
    [qpSum/framesEncoded] 32.86666666666667
    mediaSourceId RTCVideoSource_2
    packetsSent 333664
    [packetsSent/s] 29.021823604499957
    retransmittedPacketsSent 0
    bytesSent 342640589
    [bytesSent/s] 3685.7715977714947
    headerBytesSent 8157584
    retransmittedBytesSent 0
    framesEncoded 52837
    [framesEncoded/s] 30.022576142586164
    keyFramesEncoded 31
    totalEncodeTime 438.752
    [totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
    totalEncodedBytesTarget 335009905
    [totalEncodedBytesTarget/s] 3602.7091371103397
    totalPacketSendDelay 20872.8
    [totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
    qualityLimitationReason bandwidth
    qualityLimitationResolutionChanges 20
    encoderImplementation libvpx
    Graph for Video Track in chrome://webrtc-internals

    Other RTP parameters

    RTX (retransmission) – a packet-loss recovery technique for real-time applications with relaxed delay bounds.

    Non WebRTC supported Video codecs

    Need active realtime media transcoding


    H.263

    Already used for video conferencing on PSTN (Public Switched Telephone Network), RTSP and SIP (IP-based videoconferencing) systems.
    Suited for low-bandwidth networks.
    Although it is not compatible with WebRTC, many media gateways include real-time transcoding between H.263-based SIP systems and VP8-based WebRTC ones to enable video communication between them.

    H.265 / HEVC

    a proprietary format covered by a number of patents. Licensing is managed by MPEG LA.

    Container – Mp4

    Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints

    With the rise of the Internet of Things, many endpoints, especially IP cameras connected to Raspberry Pi like SoCs (systems on chip), want to stream directly to the browser within their own private network, or even over the public network using TURN/STUN.

    The figure below shows how such a call flow is possible between an IP camera (such as a baby cam) and the parent monitoring it over a WebRTC-supported mobile phone browser. The process includes streaming the content from the IoT device over RTSP and real-time transcoding between H.264 and VP8.

    Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints


    Audio Level

    the audio level for speech transmission should be normalized, to avoid users having to manually adjust playback and to facilitate mixing in conferencing applications

    normalization considers frequencies above 300 Hz, regardless of the sampling rate used

    the level is adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor

    GAIN calculation

    • If the endpoint has control over the entire audio-capture path like a regular phone
      the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
    • If the endpoint does not have control over the entire audio capture like software endpoint
      then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
    • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.
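A toy AGC loop following the +/- 6 dB rule above might look like this. It is illustrative only: it uses dBFS as a stand-in for dBm0, and real AGCs work on windowed RMS with attack/release smoothing rather than per-block jumps:

```python
import math

TARGET_DB = -19.0   # target speech level from the text (approx. -19 dBm0)
TOLERANCE_DB = 6.0  # allowed deviation before the gain is adjusted

def level_db(samples, full_scale=32768.0):
    """RMS level of a block of 16-bit samples, in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return -120.0 if rms == 0 else 20.0 * math.log10(rms / full_scale)

def agc_gain(samples):
    """Linear gain that would bring the block back to the target level,
    or 1.0 if it is already within the +/- 6 dB window."""
    db = level_db(samples)
    if abs(db - TARGET_DB) <= TOLERANCE_DB:
        return 1.0
    return 10 ** ((TARGET_DB - db) / 20.0)

quiet = [200] * 160           # a very quiet block: gain > 1 boosts it
print(agc_gain(quiet) > 1.0)  # True
```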

    Acoustic Echo Cancellation (AEC)

    Endpoints should provide echo control mechanisms


    WebRTC endpoints are required to implement the audio codecs Opus and PCMA/PCMU, along with Comfort Noise and DTMF events.

    Trace for audio codecs supported in chrome (Version 80.0.3987.149 (Official Build) (64-bit) on ubuntu)

    m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

    a=rtpmap:111 opus/48000/2
    a=rtcp-fb:111 transport-cc
    a=fmtp:111 minptime=10;useinbandfec=1
    a=rtpmap:103 ISAC/16000
    a=rtpmap:104 ISAC/32000
    a=rtpmap:9 G722/8000
    a=rtpmap:0 PCMU/8000
    a=rtpmap:8 PCMA/8000
    a=rtpmap:106 CN/32000
    a=rtpmap:105 CN/16000
    a=rtpmap:13 CN/8000
    a=rtpmap:110 telephone-event/48000
    a=rtpmap:112 telephone-event/32000
    a=rtpmap:113 telephone-event/16000
    a=rtpmap:126 telephone-event/8000


    Opus

    standardised by the IETF

    containers – Ogg, WebM, MPEG-TS, MP4

    supports multiple compression algorithms

    For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, it is recommended that Opus be offered before PCMA/PCMU.

    AAC (Advanced Audio Coding)

    part of the MPEG-4 standard
    supported containers – MP4, ADTS, 3GP

    Lossy compression, but it has a number of profiles suiting each use case, from high-quality surround sound to low-fidelity audio for speech-only use.

    G.711 (PCMA and PCMU)

    ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding.
    vital for interfacing with the standard telecom network and carriers

    Fixed 64 kbps bit rate (8 kHz sampling x 8 bits per sample)

    supports 3GP container formats

    G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU
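G.711's µ-law companding can be sketched in a few lines; this follows the classic reference algorithm (bias 0x84, 8 logarithmic segments, inverted output bits) and is shown for illustration rather than production use:

```python
def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit linear PCM sample to a G.711 mu-law byte."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS     # clip, then add bias
    exponent, mask = 7, 0x4000
    while exponent > 0 and not magnitude & mask:  # find the segment
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # invert all bits

print(hex(linear_to_ulaw(0)))   # 0xff
print(hex(linear_to_ulaw(-1)))  # 0x7f
```

At 8000 samples/s and one byte per sample, this is exactly the fixed 64 kbps rate quoted above.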


    Encoded using Adaptive Differential Pulse Code Modulation (ADPCM), which is suited for voice compression
    containers used – 3GP, AMR-WB

    Comfort noise (CN)

    artificial background noise used to fill gaps in a transmission instead of pure silence

    avoids jarring silence and RTP timeouts

    used for streams encoded with G.711 or any other supported codec that does not provide its own CN.
    Use of Discontinuous Transmission (DTX) / CN by senders is optional

    Internet Low Bitrate Codec (iLBC)

    open-source narrowband codec
    designed specifically for streaming voice audio

    Internet Speech Audio Codec (iSAC)

    designed for voice transmissions which are encapsulated within an RTP stream.

    DTMF and ‘audio/telephone-event’ media type

    endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.

    DTMF events list
    | 0 | DTMF digit “0”
    | 1 | DTMF digit “1”
    | 2 | DTMF digit “2”
    | 3 | DTMF digit “3”
    | 4 | DTMF digit “4”
    | 5 | DTMF digit “5”
    | 6 | DTMF digit “6”
    | 7 | DTMF digit “7”
    | 8 | DTMF digit “8”
    | 9 | DTMF digit “9”
    | 10 | DTMF digit “*”
    | 11 | DTMF digit “#”
    | 12 | DTMF digit “A”
    | 13 | DTMF digit “B”
    | 14 | DTMF digit “C”
    | 15 | DTMF digit “D”
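The event table above is small enough to express directly as a lookup, following the RFC 4733 named-event codes:

```python
# DTMF named events (RFC 4733): event codes 0-15 cover the 16 DTMF digits.
DTMF_EVENTS = {str(d): d for d in range(10)}
DTMF_EVENTS.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})

def dtmf_event_code(digit: str) -> int:
    """Event code to place in the telephone-event payload for a digit."""
    return DTMF_EVENTS[digit.upper()]

print(dtmf_event_code("#"))  # 11
print(dtmf_event_code("5"))  # 5
```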

    Stats for Audio Media track

    timestamp 04/05/2020, 14:25:59
    ssrc 3005719707
    isRemote false
    mediaType audio
    kind audio
    trackId RTCMediaStreamTrack_sender_1
    transportId RTCTransport_0_1
    codecId RTCCodec_0_Outbound_111
    [codec] opus (payloadType: 111)
    mediaSourceId RTCAudioSource_1
    packetsSent 88277
    [packetsSent/s] 50.03762690431027
    retransmittedPacketsSent 0
    bytesSent 1977974
    [bytesSent/s] 150.11288071293083
    headerBytesSent 2118648
    retransmittedBytesSent 0
    Graphs in chrome://webrtc-internals for Audio


    m=application 9 UDP/DTLS/SCTP webrtc-datachannel
    c=IN IP4
    a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE

    Stats for Datachannel

    Statistics RTCDataChannel_1
    timestamp 04/05/2020, 14:25:59
    label sctp
    datachannelid 1
    state open
    messagesSent 1
    [messagesSent/s] 0
    bytesSent 228
    [bytesSent/s] 0
    messagesReceived 1
    [messagesReceived/s] 0
    bytesReceived 228
    [bytesReceived/s] 0


    Attacks on SIP Networks

    Major standards bodies including 3GPP, ITU-T and ETSI have all adopted SIP as the core signalling protocol for services such as LTE, VoIP, conferencing, Video on Demand (VoD), IPTV (internet television), presence and Instant Messaging (IM). With the continuous evolution of SIP as the de facto VoIP protocol, we need to understand the risk mitigation practices around it.

    I have written about VoIP and security in these blogs before

    For security around web-browser-based calling via WebRTC, I have written:

    • WebRTC Security, which describes the browser threat model, access to local resources, the Same Origin Policy (SOP) and Cross-Origin Resource Sharing (CORS), as well as location sharing, ICE, TURN, and threats to privacy from screen sharing, long-term microphone/camera access and probable mid-call attacks.
    • Generic security of the web application built around the hosting platform of WebRTC. It includes concepts like identity management, browser security (cross-site security and clickjacking), authentication of devices and applications, media encryption and regex checking.

    I have also written about VoIP security at the protocol level with SRTP/DTLS using TLS, and specifically using the available security modules on the Kamailio SIP server: sanity checks, ACL lists with permissions, hiding topology details, countering floods using pike and Fail2Ban, as well as traffic monitoring and detection.

    In this article we will cover types of attacks on SIP systems

    Types of attacks on SIP based systems

    Registration Hijacking

    malicious registrations on a registrar by a third party who modifies the From header field of a SIP request.

    example implementation:
    the attacker de-registers all existing contacts for a URI
    the attacker can also register their own device as the appropriate contact address, thereby directing all requests for the affected user to them

    solution – authentication of users

    Impersonating a Server

    the attacker impersonates the remote server
    the user's requests can now be intercepted by some other party
    the user's requests may be forwarded to insecure locations

    Solution –

    confidentiality, integrity, and authentication of proxy servers

    Proxy/redirect servers and registrars SHOULD possess a site certificate issued by a CA which can be validated by the UA

    Tampering with Message Bodies

    If users are relying on SIP message bodies to communicate either of

    • session encryption keys for a media session
    • MIME bodies
    • SDP
    • encapsulated telephony signals
      then attackers on the proxy server can modify the session key, or act as a man-in-the-middle and eavesdrop

    example implementation:
    the attacker can point RTP media streams to a wiretapping device
    the attacker can change the Subject header field so messages appear to users as spam

    solution – end-to-end encryption over TLS + Digest authentication

    Mid-session threats like tearing down a session

    Request forging
    if an attacker learns the parameters of the session (To and From tags, etc.), they can alter ongoing session parameters and even bring the session down

    example implementation :
    the attacker inserts a BYE into an ongoing session, thereby tearing it down
    the attacker can insert a re-INVITE and redirect the stream to a wiretapping device

    solution – authentication on every request;
    signing and encrypting of MIME bodies, and transference of credentials with S/MIME

    Denial of Service and Amplification

    DoS attacks – rendering a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.
    DDoS – using multiple network hosts to flood a target host with a large amount of network traffic.

    These can be created by sending falsified SIP requests to other parties such that numerous transactions originating in the backwards direction converge on the target server, creating congestion.

    example implementation:
    the attacker creates a falsified source IP address and a corresponding Via header field that identify a targeted host as the originator of the request, then sends this to a large number of SIP network elements, generating a DoS aimed at the target.

    the attacker uses falsified Route header field values in a request that identify the target host, and then sends such messages to forking proxies that amplify the messaging sent to the target.

    Flooding a registrar with REGISTER requests can deplete its available memory and disk resources by registering huge numbers of bindings.
    Flooding a stateful proxy server causes it to consume the computational expense associated with processing SIP transactions.

    Solution –
    detect floods and spikes in traffic, and use an IP ban to block offenders
    challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required), forgoing the normal response retransmission algorithm, and thus behaving statelessly towards unauthenticated requests.
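A pike-style flood detector can be sketched as a per-source sliding-window counter. This is a toy illustration with hypothetical tuning values; Kamailio's pike module actually tracks source IPs in a tree with decaying counters:

```python
import time
from collections import defaultdict, deque

WINDOW_S = 2.0     # sampling window, seconds (hypothetical tuning)
THRESHOLD = 16     # max requests per source per window (hypothetical tuning)

class FloodDetector:
    def __init__(self):
        self.hits = defaultdict(deque)   # source IP -> timestamps of recent requests

    def allow(self, src_ip: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[src_ip]
        while q and now - q[0] > WINDOW_S:   # drop hits outside the window
            q.popleft()
        q.append(now)
        return len(q) <= THRESHOLD

fd = FloodDetector()
results = [fd.allow("203.0.113.7", now=float(i) * 0.01) for i in range(40)]
print(results[0], results[-1])  # True False  (40 hits in 0.4s trips the limit)
```

A caller that returns False would be answered statelessly (or dropped) and its address handed to an IP-ban list.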

    Security mechanisms

    Full encryption vs hop-by-hop encryption

    SIP messages cannot be encrypted end-to-end in their entirety, since
    message fields such as the Request-URI, Route and Via need to be visible to proxies in most network architectures
    so that SIP requests are routed correctly,
    and proxy servers need to update the message with Via headers.

    Thus SIP uses lower-layer security along with hop-by-hop encryption and auth headers to verify the identity of proxy servers.

    Transport and Network Layer Security

    IPsec – used where set of hosts or administrative domains have an existing trust relationship with one another.

    TLS – used where hop-by-hop security is required between hosts with no pre-existing trust association.

    SIPS URI Scheme

    Used as an address-of-record for a particular user, signifies that each hop over which the request is forwarded, must be secured with TLS

    HTTP Authentication

    Reuses HTTP Digest authentication via 401 and 407 response codes that implement a challenge for authentication;
    provides replay protection and one-way authentication.


    S/MIME

    allows SIP UAs to encrypt MIME bodies within SIP, securing these bodies end-to-end without affecting message headers.
    provides end-to-end confidentiality and integrity for message bodies,
    as well as replay protection

    SIP over TLS

    SIP messages can be secured using TLS. There is also TLS for Datagrams called DTLS.

    Security of SIP signalling is different from the security of protocols used in concert with SIP, like RTP and RTCP; that will be covered in later topics of this article.

    TLS operation consists of two phases: handshake phase and bulk data encryption phase

    Handshake phase

    Prepare the algorithms to be used during the TLS session

    Server Authentication

    server sends its certificate to the client, which then verifies the certificate using a certificate authority’s (CA’s) public key.

    Client Authentication

    Server sends an additional CertificateRequest message to request the client’s certificate. The client responds with

    1. Certificate message containing the client certificate with the client public key and
    2. CertificateVerify message containing a digest signature of the handshake messages, signed by the client's private key

    The server authenticates the client using the client's public key, since only a client holding the correct private key can sign the message.

    prepare the shared secret for bulk data encryption

    The client generates a pre_master_secret and encrypts it using the server's public key obtained from the server's certificate. The server decrypts the pre_master_secret using its own private key.
    Both the server and client then compute a master_secret they share based on the same pre_master_secret. The master_secret is further used to generate the shared symmetric keys for bulk data encryption and message authentication

    Public key cryptographic operations such as RSA are much more expensive than shared key cryptography. This is why TLS uses public key cryptography to establish the shared secret key in the handshake phase, and then uses symmetric key cryptography with the negotiated shared secret as the data encryption key.

    Stateless proxy servers do not maintain state information about the SIP session and therefore tend to be more scalable. However, many standard application functionalities, such as authentication, authorization, accounting, and call forking require the proxy server to operate in a stateful
    mode by keeping different levels of session state information.

    Steps:

    1. The SIP proxy server enforces proxy authentication with a
      407 Proxy Authentication Required challenge.
    2. The UAC provides credentials that verify its claimed identity (e.g., based on the MD5 [34] digest algorithm) and retransmits the request with a Proxy-Authorization header.

    Security of RTP

    Confidentiality protection of the RTP session and integrity protection of RTP/RTCP packets require source authentication of all packets, to ensure no man-in-the-middle (MITM) attack is taking place.

    End-to-end media encryption – SRTP (Secure RTP)

    SRTP encrypts the voice payload into secured IP packets and transports them over the Internet from the transmitter to the receiver.


    • The Impact of TLS on SIP Server Performance – Charles Shen, Erich Nahum, Henning Schulzrinne, Charles Wright; Department of Computer Science, Columbia University / IBM T.J. Watson Research Center

    HTTP/2 – offer/answer signaling for WebRTC call

    HTTP (Hyper Text Transfer Protocol) is the application-layer protocol atop the transport layer (TCP) and the network layer (IP).


    HTTP/1.1 was released in 1997. Since HTTP/1.0 allowed only one request at a time, HTTP/1.1 still allowed only one outstanding request per TCP session but introduced request pipelining to achieve concurrency.


    In 2015, HTTP/2 was released, aimed at reducing latency while delivering heavy graphics, videos and other media components on web pages, especially on mobile sites.
    It also optimizes server push and works with service workers.


    A key difference between HTTP/1.1 and HTTP/2 is that the former transmits requests and responses in plaintext, whereas the latter encapsulates them in a binary format, providing more features and scope for optimization.

    Thus, at the protocol level, it is all about frames of bytes which are part of a stream.

    “enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients.”

    Hypertext Transfer Protocol Version 2 (HTTP/2) draft-ietf-httpbis-http2-latest

    It is important to know that browsers only implement HTTP/2 over HTTPS, so a TLS connection is a must, for which we need certs and keys signed by a CA (either self-signed using openssl, or signed by a public CA like GoDaddy, Verisign or Let’s Encrypt).

    Compatibility layer between HTTP/1.1 and HTTP/2 in Node

    Node.js >9 provides http2 as a native module. Example of using http2 with the compatibility layer:

    const http2 = require('http2');
    const fs = require('fs');
    const options = {
        key: fs.readFileSync('ssl/key'),  // path to key
        cert: fs.readFileSync('ssl/cert') // path to cert
    };
    const server = http2.createSecureServer(options, (req, res) => {
        req.addListener('end', function () {
            file.serve(req, res); // `file` is a static file server instance
        });
    });

    Drop-in replacement for an existing http/https server:

    const https = require('https');
    const app = https.createServer(options, function (request, response) {
        request.addListener('end', function () {
            file.serve(request, response); // `file` is a static file server instance
        });
    });
    WebSocket over HTTP/2

    The WebSocket Protocol uses the HTTP/1.1 Upgrade mechanism to transition a TCP connection from HTTP into a WebSocket connection

    Due to its multiplexing nature, HTTP/2 does not allow connection-wide header fields or status codes, such as the Upgrade and Connection request-header fields or the 101 (Switching Protocols) response code, which are all required for the opening handshake.

    Ideally the code should have looked like this with the backward-compatibility layer, but continue reading for the update below.

    var app = http2.createSecureServer(options, (req, res) => {
        req.addListener('end', function () {
            file.serve(req, res);
        });
    });
    var io = require('').listen(app); // module name elided in the original; origins '*:*'
    io.on('connection', onConnection); // event handler onConnection

    Error during WebSocket handshake: Unexpected response code: 403

    Update May 2020: I tried using the http2 server with WebSocket as mentioned above; however, after many hours of working around WSS over an HTTP/2 secure server, I consistently kept facing ECONNRESET issues after a couple of seconds, which would crash the server.

    client 403
    server ECONNRESET

    Therefore, leaving the web server to serve HTML content, I reverted the signalling back to HTTPS/1.1, given that the reasons for sticking with WSS were low latency and the existing work already put in.

    Example Repo :

    Reading further on exploring the HTTP CONNECT method for setting up the WS handshake. Will update this section in the future if it works.


    A “stream” is an independent, bidirectional sequence of frames exchanged between the client and server within an HTTP/2 connection.
    A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.

    The core http2 module provides a new core API (Http2Stream), accessed via a “stream” listener:

    const http2 = require('http2');
    const fs = require('fs');
    const options = {
        key: fs.readFileSync('ssl/key'),  // path to key
        cert: fs.readFileSync('ssl/cert') // path to cert
    };
    const server = http2.createSecureServer(options);
    server.on('stream', (stream, headers) => {
        stream.respond({ ':status': 200 });
        stream.end('some text!');
    });

    Other features

    • Stream multiplexing
    • Stream prioritization
    • Header compression
    • Flow control
    • Support for trailers

    Persistent, one connection per origin

    With the new binary framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required.

    Server Push

    Bundles multiple assets and resources into a single HTTP/2 connection and lets the server proactively push resources into the client’s cache.

    The server issues a PUSH_PROMISE, and the client validates whether it needs the resource or not. If it matches, the resource loads like a regular GET call.

    The PUSH_PROMISE frame includes a header block that contains a complete set of request header fields that the server attributes to the request.

    After sending the PUSH_PROMISE frame, the server can begin delivering the pushed response as a response on a server-initiated stream that uses the promised stream identifier.

    When a client receives a PUSH_PROMISE frame, it can either accept the pushed response or, if it does not wish to receive it, send a RST_STREAM frame using either the CANCEL or REFUSED_STREAM code, referencing the pushed stream’s identifier.

    Push Stream Support


    respondWithFile() and respondWithFD() APIs can send raw file data that bypasses the Streams API.

    Related technologies


    Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets.

    Email messages + MIME : transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

    MIME in HTTP on the WWW: servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated. Browsers typically contain GIF and JPEG image viewers.

    MIME header fields

    MIME version

    MIME-Version: 1.0

    Content Type

    Content-Type: text/plain

    multipart/mixed, text/html, image/jpeg, audio/mp3, video/mp4, and application/msword

    Content Disposition

    Content-Disposition: attachment; filename=genome.jpeg;
      modification-date="Wed, 12 Feb 1997 16:29:51 -0500";
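    As a small illustration, a hypothetical parser (plain JavaScript, names are my own) that splits such a header line into its value and parameters:

```javascript
// Hypothetical helper: parse a simple MIME header line into name, value and parameters
function parseMimeHeader(line) {
  const idx = line.indexOf(':');
  const name = line.slice(0, idx).trim();
  const [value, ...params] = line.slice(idx + 1).split(';');
  const parameters = {};
  for (const p of params) {
    const eq = p.indexOf('=');
    if (eq === -1) continue;
    // strip surrounding quotes from quoted parameter values
    parameters[p.slice(0, eq).trim()] = p.slice(eq + 1).trim().replace(/^"|"$/g, '');
  }
  return { name, value: value.trim(), parameters };
}

const h = parseMimeHeader('Content-Disposition: attachment; filename=genome.jpeg');
console.log(h);
```

    Real-world parsing needs to handle folding, comments and quoted semicolons, but the value/parameter split is the essential shape.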



    Certificates, compliances and Security in VoIP

    This article describes various certificates, compliances, bills and acts on data privacy, security, and prevention of robocalls, as adopted by countries around the world, pertaining to interconnected VoIP providers, telecommunications services, wireless telephone companies, etc.

    Compliance certificates by Industry types

    HIPAA (Health Insurance Portability and Accountability Act)

    Deals with privacy and security of personal medical records and electronic health care transactions

    Applicability: if a VoIP company handles medical information

    Includes : 

    • No voicemail transcription allowed
    • Should have end-to-end encryption
    • Restrict the use of unsecured WiFi networks to prevent snooping
    • User security, strong password rules, and mandatory monthly change
    • Secure firmware on VoIP phones
    • Maintaining call and access logs

    SOX( Sarbanes Oxley Act of 2002)

    Also known as SOX, SarbOX or Public Company Accounting Reform and Investor Protection Act

    Applicability : if managing the communications operations of a regulated, publicly traded company 

    Includes : 

    • Retain records which include financial and other sensitive data
    • Define the ways employees are provided or denied access to records or data based on their roles and responsibilities
    • Have information audits done by a trusted third party
    • Retention and deletion of files such as audio files like voicemails, text messages, video clips, declared paper records, storage, and logs of communications activities
    • Physical and digital security controls around cloud-based VoIP applications and the networks

    Privacy Related Compliance certificates

    COPPA (Children’s Online Privacy Protection Act ) of 1998 

    Prohibits deceptive marketing to children under the age of 13, or collecting their personal information without disclosure to their parents.

    If any information is to be passed on to a third party, it must be easy for the child’s guardian to review and/or protect it.

    A 2011 amendment requires that collected data be erased after a period of time.

    In 2014 the FTC issued guidelines that apps and app stores require “verifiable parental consent.”

    CPNI (Customer Proprietary Network Information) 2007

    CPNI (Customer Proprietary Network Information) in the United States is the information that communication providers acquire about their subscribers: individually identifiable information created by a customer’s relationship with a provider, such as data about the frequency, duration, and timing of calls, the information on a customer’s bill, and call-identifying information. Processing of this information is governed strictly by the FCC, and certification must be renewed on an annual basis.

    A provider can pass along that information to marketers to sell other services, as long as the customer is notified.

    In 2007, the FCC explicitly extended the application of the Commission’s CPNI rules of the Telecommunications Act of 1996 to providers of interconnected VoIP service.


    The Communications Assistance for Law Enforcement Act (CALEA) enables electronic surveillance by imposing specific obligations on “telecommunications carriers” for assisting law enforcement, including delivering call interception and call identification functionality to the government with a minimum of interference to customer service and privacy.

    Read more about CALEA and its roles in VoIP here Regulatory and Legal Considerations with WebRTC development

    GDPR (General Data Protection Regulation)  in European Union 2018

    Supersedes the 1995 Data Protection Directive

    Establishes requirements of organizations that process data, defines the rights of individuals to manage their data, and outlines penalties for those who violate these rights.

    No personal data may be processed unless this processing is done under one of six lawful bases specified by the regulation (consent, contract, public task, vital interest, legitimate interest or legal requirement). When the processing is based on consent the data subject has the right to revoke it at any time.

    Controllers must notify Supervising Authorities (SA)s of a personal data breach within 72 hours of learning of the breach.

    California Consumer Privacy Act (CCPA) 2019

    consumer rights relating to the access to, deletion of, and sharing of personal information that is collected by businesses. 

    Allows consumers to know whether their personal data is sold or disclosed, and to whom.

    Allows opt-out right for sales of personal information

    Right to deletion – to request a business to delete any personal information about a consumer collected from that consumer

    Personal Data Protection Bill (PDP) – India 2018

    This bill introduces various private and sensitive protection frameworks  like restriction on retention of personal data, Right to correction and erasure (such as right to be forgotten) , Prohibition and transparency of processing of personal data. It also classifies data fiduciaries  including certain social media intermediaries. 

    The Bill amends the Information Technology Act, 2000 to delete the provisions related to compensation payable by companies for failure to protect personal data.

    Other data privacy acts similar to GDPR 

    • South Korea’s Personal Information Protection Act  2011
    • Brazil’s Lei Geral de Proteçao de Dados (LGPD)  2020
    • Privacy Amendment (Notifiable Data Breaches) to Australia’s Privacy Act 2018
    • Japan’s Act on Protection of Personal Information 2017
    • Thailand Personal Data Protection Act (PDPA) 2020

    Features offered by VOIP companies for Data privacy 

    • Access Control & Logging
    • Auto Data Redaction / Account Deletion policy 
    • SIEM (Security information and event management) alerts 
    • Information security , Encrypted Storage For Recordings & Transcripts
    • Disclosing all third party services that are involved in data processing too
    • Role Based Access Control and 2 Factor Authentication
    • Data Security Audits and appointing  data protection officer to oversee GDPR compliance

    Against Robocalls and SPIT ( SPAM over Internet Telephony)

    Truth in Caller ID Act of 2009

    Telephone Consumer Protection Act of 1991

    Implementation of Do not call registry against use of robocalls, automatic dialers, and other methods of communication

    Do-Not-Call Implementation Act of 2003

    If a business has an established relationship with a customer, it can continue to call them for up to 18 months. If a consumer calls the company, say, to ask for information about a product or service, the company has three months to get back to them.

    If the customer asks not to receive calls, the company must stop calling, or be subject to fines.

    Exemptions – calls from not-for-profit organisations, informational messages such as flight cancellations, calls from sales and debt collectors, etc.

    Personal Data Privacy and Security Act 2009

    Implemented to curb  identity theft and computer hacking. Sensitive personal identifiable information includes : victim’s name, social security number, home address, fingerprint/biometrics data, date of birth, and bank account numbers.

    Any company that is breached must notify the affected individuals by mail, telephone, or email, and the message must include information on the company and how to get in touch with credit reporting agencies

    If the breach involves government or national security, the company must also contact the Secret Service within fourteen days.

    TRACED Act (Telephone Robocall Abuse Criminal Enforcement and Deterrence) 2019

    Canadian Radio-television and Telecommunications Commission (CRTC) 2018-32

    A solution mechanism has already been standardised and active in adoption called STIR / SHAKEN ( Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs) described in another article here.

    Emergency services 

    FCC E911 / VoIP E911 rules

    Unlike traditional telephone connections, which are tied to a physical location, VoIP’s packet-switched technology allows a particular number to be anywhere, making it more difficult to reach localised services like the emergency numbers of Public Safety Answering Points (PSAPs). Thus, under FCC regulations as well as the New and Emerging Technologies 911 Improvement Act of 2008 (NET 911 Act), interconnected VoIP providers are required to provide 911 and E911 service.


    WebRTC APIs


    • Media Capture and streams
    • Peer to peer Connection
      • RTCPeerConnection, RTCConfiguration, ICE, Offer/Answer, states
    • RTP Media API
      • RTCRtpSender
      • RTCRtpReceiver
      • RTCRtpTransceiver
      • SDP Semantics
    • RTCDTLS Transport
    • RTCIceCandidate
      • ICE gathering
    • RTCIceTransport Interface
    • Peer-to-peer Data API
    • Peer-to-peer DTMF
    • Statistics

    Peer-to-peer connections

    creates p2p communication channel

    RTCConfiguration Dictionary

    dictionary RTCConfiguration {
      sequence<RTCIceServer> iceServers;
      RTCIceTransportPolicy iceTransportPolicy;
      RTCBundlePolicy bundlePolicy;
      RTCRtcpMuxPolicy rtcpMuxPolicy;
      DOMString peerIdentity;
      sequence<RTCCertificate> certificates;
      [EnforceRange] octet iceCandidatePoolSize = 0;
    };

    RTCIceCredentialType Enum

    enum RTCIceCredentialType {
      "password",
      "oauth"
    };

    supports OAuth 2.0 based authentication. The application, acting as the OAuth Client, is responsible for refreshing the credential information and updating the ICE Agent with fresh new credentials before the accessToken expires. The OAuth Client can use the RTCPeerConnection setConfiguration method to periodically refresh the TURN credentials.

    RTCOAuthCredential Dictionary

    Describes the OAuth auth credential information which is used by the STUN/TURN client (inside the ICE Agent) to authenticate against a STUN/TURN server

    dictionary RTCOAuthCredential {
      required DOMString macKey;
      required DOMString accessToken;
    };

    RTCIceServer Dictionary

    Describes the STUN and TURN servers that can be used by the ICE Agent to establish a connection with a peer.

    dictionary RTCIceServer {
      required (DOMString or sequence<DOMString>) urls;
      DOMString username;
      (DOMString or RTCOAuthCredential) credential;
      RTCIceCredentialType credentialType = "password";
    };

    Example :

     [{urls: ''},
      {urls: ['', ''],
       username: 'user',
       credential: 'myPassword',
       credentialType: 'password'},
      {urls: '',
       username: '22BIjxU93h/IgwEb',
       credential: {
         macKey: 'WmtzanB3ZW9peFhtdm42NzUzNG0=',
         accessToken: 'AAwg3kPHWPfvk9bDFL936wYvkoctMADzQ5VhNDgeMR3+ZlZ35byg972fW8QjpEl7bx91YLBPFsIhsxloWcXPhA=='
       },
       credentialType: 'oauth'}]

    RTCIceTransportPolicy Enum

    ICE candidate policy [JSEP] to select candidates for the ICE connectivity checks

    • relay – use only media relay candidates such as candidates passing through a TURN server. It prevents the remote endpoint/unknown caller from learning the user’s IP addresses
    • all – ICE Agent can use any type of candidate when this value is specified.

    RTCBundlePolicy Enum

    • balanced – Gather ICE candidates for each media type (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
    • max-compat – Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
    • max-bundle – Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport.

    If the value of configuration.bundlePolicy is set and its value differs from the connection’s bundle policy, throw an InvalidModificationError.

    RTCRtcpMuxPolicy Enum

    Indicates what ICE candidates are gathered to support non-multiplexed RTCP.

    • negotiate – Gather ICE candidates for both RTP and RTCP candidates. If the remote-endpoint is capable of multiplexing RTCP, multiplex RTCP on the RTP candidates. If it is not, use both the RTP and RTCP candidates separately.
    • require – Gather ICE candidates only for RTP and multiplex RTCP on the RTP candidates. If the remote endpoint is not capable of rtcp-mux, session negotiation will fail.

    If the value of configuration.rtcpMuxPolicy is set and its value differs from the connection’s rtcpMux policy, throw an InvalidModificationError. If the value is “negotiate” and the user agent does not implement non-muxed RTCP, throw a NotSupportedError.
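    Putting these policies together, an illustrative configuration object (the TURN URL and credentials below are made-up placeholders, not recommendations) that a browser application would pass to new RTCPeerConnection(config):

```javascript
// Illustrative RTCConfiguration; in a browser: new RTCPeerConnection(config)
const config = {
  iceServers: [
    // hypothetical TURN server and credentials
    { urls: 'turns:turn.example.org', username: 'user', credential: 'pass' }
  ],
  iceTransportPolicy: 'relay',  // hide the caller's host IPs behind the TURN relay
  bundlePolicy: 'max-bundle',   // one transport for all media tracks and data channels
  rtcpMuxPolicy: 'require'      // RTCP multiplexed on the RTP candidates
};
```

    Choosing 'relay' trades connection setup options for privacy, and 'max-bundle' plus 'require' minimizes the number of transports and ICE candidates gathered.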

    Offer/Answer Options – VoiceActivityDetection

    dictionary RTCOfferAnswerOptions {
      boolean voiceActivityDetection = true;
    };

    capable of detecting “silence”

    dictionary RTCOfferOptions : RTCOfferAnswerOptions {
      boolean iceRestart = false;
    };
    dictionary RTCAnswerOptions : RTCOfferAnswerOptions {};

    An RTCPeerConnection object has a signaling state, a connection state, an ICE gathering state, and an ICE connection state.

    RTCSignalingState Enum

    stable, have-local-offer, have-remote-offer, have-local-pranswer, have-remote-pranswer, closed

    RTCIceGatheringState Enum

    new, gathering, complete

    RTCIceConnectionState Enum

    new, checking, connected, completed, disconnected, failed, closed

    An RTCPeerConnection object has an operations chain which ensures that only one asynchronous operation in the chain executes concurrently.

    Also, an RTCPeerConnection object MUST NOT be garbage collected as long as any event can cause an event handler to be triggered on the object. When the object’s internal [[IsClosed]] slot is true, i.e. closed, no such event handler can be triggered and it is therefore safe to garbage collect the object.

    RTCPeerConnection Interface

    interface RTCPeerConnection : EventTarget {
    Promise<RTCSessionDescriptionInit> createOffer(optional RTCOfferOptions options);
    Promise<RTCSessionDescriptionInit> createAnswer(optional RTCAnswerOptions options);
    Promise<void> setLocalDescription(optional RTCSessionDescriptionInit description);
    readonly attribute RTCSessionDescription? localDescription;
    readonly attribute RTCSessionDescription? currentLocalDescription;
    readonly attribute RTCSessionDescription? pendingLocalDescription;
    Promise<void> setRemoteDescription(optional RTCSessionDescriptionInit description);
    readonly attribute RTCSessionDescription? remoteDescription;
    readonly attribute RTCSessionDescription? currentRemoteDescription;
    readonly attribute RTCSessionDescription? pendingRemoteDescription;
    Promise<void> addIceCandidate(optional RTCIceCandidateInit candidate);
    readonly attribute RTCSignalingState signalingState;
    readonly attribute RTCIceGatheringState iceGatheringState;
    readonly attribute RTCIceConnectionState iceConnectionState;
    readonly attribute RTCPeerConnectionState connectionState;
    readonly attribute boolean? canTrickleIceCandidates;
    void restartIce();
    static sequence<RTCIceServer> getDefaultIceServers();
    RTCConfiguration getConfiguration();
    void setConfiguration(RTCConfiguration configuration);
    void close();
    attribute EventHandler onnegotiationneeded;
    attribute EventHandler onicecandidate;
    attribute EventHandler onicecandidateerror;
    attribute EventHandler onsignalingstatechange;
    attribute EventHandler oniceconnectionstatechange;
    attribute EventHandler onicegatheringstatechange;
    attribute EventHandler onconnectionstatechange;
    };

    createOffer() – generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including

    • descriptions of the local MediaStreamTracks attached to this RTCPeerConnection,
    • codec/RTP/RTCP capabilities
    • ICE agent parameters (usernameFragment, password, local candidates, etc.)
    • DTLS connection
    var pc = new RTCPeerConnection();
    pc.createOffer({
        mandatory: {
            OfferToReceiveAudio: true,
            OfferToReceiveVideo: true
        },
        optional: [{
            VoiceActivityDetection: false
        }]
    }).then(function(offer) {
    	return pc.setLocalDescription(offer);
    })
    .then(function() {
    // Send the offer to the remote through signaling server
    });

    createAnswer() – generates an SDP answer with the supported configuration for the session that is compatible with the parameters in the remote configuration

    var pc = new RTCPeerConnection();
    pc.createAnswer({
      OfferToReceiveAudio: true,
      OfferToReceiveVideo: true
    })
    .then(function(answer) {
      return pc.setLocalDescription(answer);
    })
    .then(function() {
      // Send the answer to the remote through signaling server
    });

    The codec preferences of an m= section’s associated transceiver are the value of that RTCRtpTransceiver’s codec preferences, with the following filtering applied:

    • If direction is “sendrecv”, exclude any codecs not included in the intersection of RTCRtpSender.getCapabilities(kind).codecs and RTCRtpReceiver.getCapabilities(kind).codecs.
    • If direction is “sendonly”, exclude any codecs not included in RTCRtpSender.getCapabilities(kind).codecs.
    • If direction is “recvonly”, exclude any codecs not included in RTCRtpReceiver.getCapabilities(kind).codecs.

    Legacy Interface Extensions

    partial interface RTCPeerConnection {
      Promise<void> createOffer(
          RTCSessionDescriptionCallback successCallback,
          RTCPeerConnectionErrorCallback failureCallback,
          optional RTCOfferOptions options);
      Promise<void> setLocalDescription(
          optional RTCSessionDescriptionInit description,
          VoidFunction successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
      Promise<void> createAnswer(
          RTCSessionDescriptionCallback successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
      Promise<void> setRemoteDescription(
          optional RTCSessionDescriptionInit description,
          VoidFunction successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
      Promise<void> addIceCandidate(
          RTCIceCandidateInit candidate,
          VoidFunction successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
    };

    Session Description Model

    enum RTCSdpType {
      "offer",
      "pranswer",
      "answer",
      "rollback"
    };

    interface RTCSessionDescription {
      readonly attribute RTCSdpType type;
      readonly attribute DOMString sdp;
      [Default] object toJSON();
    };

    dictionary RTCSessionDescriptionInit {
      RTCSdpType type;
      DOMString sdp = "";
    };

    Priority and QoS model (RTCPriorityType), which can be one of: very-low, low, medium, high.


    RTP Media API

    Send and receive MediaStreamTracks over a peer-to-peer connection.
    Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.

    RTCRtpSender objects manage the encoding and transmission of MediaStreamTracks.

    RTCRtpReceiver objects manage the reception and decoding of MediaStreamTracks; each is associated with one track.

    The RTCRtpTransceiver interface describes a permanent pairing of an RTCRtpSender and an RTCRtpReceiver.

    Each transceiver is uniquely identified by its mid (media id) property, taken from the corresponding m-line.

    Transceivers are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via addTrack(), or explicitly when the application uses addTransceiver(). They are also created when a remote description is applied that includes a new media description.

    rtpTransceiver = RTCPeerConnection.addTransceiver(trackOrKind, init);

    trackOrKind should be a MediaStreamTrack or a kind string (“audio” or “video”); otherwise a TypeError is thrown.

    init is optional – it can contain direction, sendEncodings, and streams.

    RTCPeerConnection Interface

    partial interface RTCPeerConnection {
    sequence<RTCRtpSender> getSenders();
    sequence<RTCRtpReceiver> getReceivers();
    sequence<RTCRtpTransceiver> getTransceivers();
    RTCRtpSender addTrack(MediaStreamTrack track, MediaStream... streams);
    void removeTrack(RTCRtpSender sender);
    RTCRtpTransceiver addTransceiver((MediaStreamTrack or DOMString) trackOrKind, optional RTCRtpTransceiverInit init);
    attribute EventHandler ontrack;
    };


    dictionary RTCRtpTransceiverInit {
      RTCRtpTransceiverDirection direction = "sendrecv";
      sequence<MediaStream> streams = [];
      sequence<RTCRtpEncodingParameters> sendEncodings = [];
    };

    RTCRtpTransceiverDirection can be one of: sendrecv, sendonly, recvonly, inactive.


    RTCRtpSender Interface

    Allows an application to control how a given MediaStreamTrack is encoded and transmitted to a remote peer.

    interface RTCRtpSender {
    readonly attribute MediaStreamTrack? track;
    readonly attribute RTCDtlsTransport? transport;
    readonly attribute RTCDtlsTransport? rtcpTransport;
    static RTCRtpCapabilities? getCapabilities(DOMString kind);
    Promise<void> setParameters(RTCRtpSendParameters parameters);
    RTCRtpSendParameters getParameters();
    Promise<void> replaceTrack(MediaStreamTrack? withTrack);
    void setStreams(MediaStream... streams);
    Promise<RTCStatsReport> getStats();
    };

    RTCRtpParameters Dictionary

    dictionary RTCRtpParameters {
      required sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
      required RTCRtcpParameters rtcp;
      required sequence<RTCRtpCodecParameters> codecs;
    };

    RTCRtpSendParameters Dictionary

    dictionary RTCRtpSendParameters : RTCRtpParameters {
      required DOMString transactionId;
      required sequence<RTCRtpEncodingParameters> encodings;
      RTCDegradationPreference degradationPreference = "balanced";
      RTCPriorityType priority = "low";
    };

    RTCRtpReceiveParameters Dictionary

    dictionary RTCRtpReceiveParameters : RTCRtpParameters {
      required sequence<RTCRtpDecodingParameters> encodings;
    };

    RTCRtpCodingParameters Dictionary

    dictionary RTCRtpCodingParameters {
      DOMString rid;
    };

    RTCRtpDecodingParameters Dictionary

    dictionary RTCRtpDecodingParameters : RTCRtpCodingParameters {};

    RTCRtpEncodingParameters Dictionary

    dictionary RTCRtpEncodingParameters : RTCRtpCodingParameters {
      octet codecPayloadType;
      RTCDtxStatus dtx;
      boolean active = true;
      unsigned long ptime;
      unsigned long maxBitrate;
      double maxFramerate;
      double scaleResolutionDownBy;
    };

    RTCDtxStatus Enum

    • disabled – Discontinuous transmission is disabled.
    • enabled – Discontinuous transmission is enabled if negotiated.

    RTCDegradationPreference Enum

    enum RTCDegradationPreference {
      "maintain-framerate",
      "maintain-resolution",
      "balanced"
    };

    RTCRtcpParameters Dictionary

    dictionary RTCRtcpParameters {
      DOMString cname;
      boolean reducedSize;
    };

    RTCRtpHeaderExtensionParameters Dictionary

    dictionary RTCRtpHeaderExtensionParameters {
      required DOMString uri;
      required unsigned short id;
      boolean encrypted = false;
    };

    RTCRtpCodecParameters Dictionary

    dictionary RTCRtpCodecParameters {
      required octet payloadType;
      required DOMString mimeType;
      required unsigned long clockRate;
      unsigned short channels;
      DOMString sdpFmtpLine;
    };

    • payloadType – identifies this codec.
    • mimeType – codec MIME media type/subtype. Valid media types and subtypes are listed in [IANA-RTP-2].
    • clockRate – expressed in Hertz.
    • channels – number of channels (mono=1, stereo=2).
    • sdpFmtpLine – “format specific parameters” field from the “a=fmtp” line in the SDP corresponding to the codec.
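    To relate these fields back to SDP, here is a hypothetical helper (the function name and parsing are my own, plain JavaScript) that maps a codec’s a=rtpmap / a=fmtp lines into the RTCRtpCodecParameters shape:

```javascript
// Hypothetical helper: map SDP a=rtpmap / a=fmtp lines for one codec into
// the shape of an RTCRtpCodecParameters dictionary
function codecParamsFromSdp(rtpmapLine, fmtpLine) {
  const m = rtpmapLine.match(/^a=rtpmap:(\d+) ([^/]+)\/(\d+)(?:\/(\d+))?$/);
  if (!m) throw new Error('not an a=rtpmap line');
  const params = {
    payloadType: Number(m[1]),
    mimeType: 'audio/' + m[2], // assumes an audio m= section
    clockRate: Number(m[3])
  };
  if (m[4]) params.channels = Number(m[4]);
  if (fmtpLine) params.sdpFmtpLine = fmtpLine.replace(/^a=fmtp:\d+ /, '');
  return params;
}

const opus = codecParamsFromSdp(
  'a=rtpmap:111 opus/48000/2',
  'a=fmtp:111 minptime=10;useinbandfec=1'
);
console.log(opus);
```

    The payload type ties the dictionary back to the RTP packets, while sdpFmtpLine carries the codec-specific knobs negotiated in the SDP.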

    RTCRtpCapabilities Dictionary

    dictionary RTCRtpCapabilities {
      required sequence<RTCRtpCodecCapability> codecs;
      required sequence<RTCRtpHeaderExtensionCapability> headerExtensions;
    };

    RTCRtpCodecCapability Dictionary

    dictionary RTCRtpCodecCapability {
      required DOMString mimeType;
      required unsigned long clockRate;
      unsigned short channels;
      DOMString sdpFmtpLine;
    };

    RTCRtpHeaderExtensionCapability Dictionary

    dictionary RTCRtpHeaderExtensionCapability {
      DOMString uri;
    };

    Example JS code to query RTCRtpCapabilities

    const pc = new RTCPeerConnection();
    const transceiver = pc.addTransceiver('audio');
    const capabilities = RTCRtpSender.getCapabilities('audio');
    console.log(capabilities);

    Output :

    codecs: Array(13)
    0: {channels: 2, clockRate: 48000, mimeType: "audio/opus", sdpFmtpLine: "minptime=10;useinbandfec=1"}
    1: {channels: 1, clockRate: 16000, mimeType: "audio/ISAC"}
    2: {channels: 1, clockRate: 32000, mimeType: "audio/ISAC"}
    3: {channels: 1, clockRate: 8000, mimeType: "audio/G722"}
    4: {channels: 1, clockRate: 8000, mimeType: "audio/PCMU"}
    5: {channels: 1, clockRate: 8000, mimeType: "audio/PCMA"}
    6: {channels: 1, clockRate: 32000, mimeType: "audio/CN"}
    7: {channels: 1, clockRate: 16000, mimeType: "audio/CN"}
    8: {channels: 1, clockRate: 8000, mimeType: "audio/CN"}
    9: {channels: 1, clockRate: 48000, mimeType: "audio/telephone-event"}
    10: {channels: 1, clockRate: 32000, mimeType: "audio/telephone-event"}
    11: {channels: 1, clockRate: 16000, mimeType: "audio/telephone-event"}
    12: {channels: 1, clockRate: 8000, mimeType: "audio/telephone-event"}
    length: 13
    headerExtensions: Array(6)
    0: {uri: "urn:ietf:params:rtp-hdrext:ssrc-audio-level"}
    1: {uri: ""}
    2: {uri: ""}
    3: {uri: "urn:ietf:params:rtp-hdrext:sdes:mid"}
    4: {uri: "urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id"}
    5: {uri: "urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id"}
    length: 6

    RTCRtpReceiver Interface

    allows an application to inspect the receipt of a MediaStreamTrack.

    interface RTCRtpReceiver {
      readonly attribute MediaStreamTrack track;
      readonly attribute RTCDtlsTransport? transport;
      readonly attribute RTCDtlsTransport? rtcpTransport;
      static RTCRtpCapabilities? getCapabilities(DOMString kind);
      RTCRtpReceiveParameters getParameters();
      sequence<RTCRtpContributingSource> getContributingSources();
      sequence<RTCRtpSynchronizationSource> getSynchronizationSources();
      Promise<RTCStatsReport> getStats();
    };

    dictionary RTCRtpContributingSource

    dictionary RTCRtpContributingSource {
      required DOMHighResTimeStamp timestamp;
      required unsigned long source;
      double audioLevel;
      required unsigned long rtpTimestamp;
    };

    dictionary RTCRtpSynchronizationSource

    dictionary RTCRtpSynchronizationSource : RTCRtpContributingSource {
      boolean voiceActivityFlag;
    };

    voiceActivityFlag of type boolean – Only present for audio receivers. Whether the last RTP packet, delivered from this source, contains voice activity (true) or not (false).
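
Since these dictionaries are plain data, logic over them can be sketched without a live connection. A hypothetical helper that picks the loudest contributing source from a getContributingSources()-style array (the sample values below are made up):

```javascript
// Hypothetical helper: pick the contributing source with the highest
// audioLevel (0.0..1.0) from a getContributingSources()-style array.
function loudestSource(sources) {
  return sources.reduce(
    (best, s) => ((s.audioLevel ?? 0) > (best.audioLevel ?? 0) ? s : best));
}

// Made-up RTCRtpContributingSource-shaped samples.
const sources = [
  { timestamp: 1000, source: 0x1111, rtpTimestamp: 160, audioLevel: 0.12 },
  { timestamp: 1000, source: 0x2222, rtpTimestamp: 160, audioLevel: 0.87 },
  { timestamp: 1000, source: 0x3333, rtpTimestamp: 160, audioLevel: 0.30 },
];
// loudestSource(sources).source → 0x2222
```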

    RTCRtpTransceiver Interface

    Each SDP media section describes one bidirectional SRTP (Secure Real-time Transport Protocol) stream. RTCRtpTransceiver describes this permanent pairing of an RTCRtpSender and an RTCRtpReceiver, along with some shared state. It is uniquely identified using its mid property.

    Thus it is a combination of an RTCRtpSender and an RTCRtpReceiver that share a common mid. An associated transceiver (one with a mid) is one that is represented in the last applied session description.

    interface RTCRtpTransceiver {
      readonly attribute DOMString? mid;
      [SameObject] readonly attribute RTCRtpSender sender;
      [SameObject] readonly attribute RTCRtpReceiver receiver;
      attribute RTCRtpTransceiverDirection direction;
      readonly attribute RTCRtpTransceiverDirection? currentDirection;
      void stop();
      void setCodecPreferences(sequence<RTCRtpCodecCapability> codecs);
    };

    Method stop() – Irreversibly marks the transceiver as stopping, unless it is already stopped. This immediately causes the transceiver’s sender to stop sending and its receiver to stop receiving. A stopping transceiver causes future calls to createOffer to generate a zero port in the media description for the corresponding transceiver; a stopped transceiver causes future calls to createOffer or createAnswer to do the same.

    Method setCodecPreferences() – overrides the default codec preferences used by the user agent.

    Example setting codec preference for Opus audio

    const peer = new RTCPeerConnection();
    const transceiver = peer.addTransceiver('audio');
    const audiocapabilities = RTCRtpSender.getCapabilities('audio');
    // keep only the Opus entries from the sender capabilities
    const codecs = audiocapabilities.codecs.filter(
      (c) => c.mimeType === 'audio/opus');
    transceiver.setCodecPreferences(codecs);

    Before setting codec preference for OPUS

    m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
    c=IN IP4
    a=rtcp:9 IN IP4

    a=rtpmap:111 opus/48000/2
    a=rtcp-fb:111 transport-cc
    a=fmtp:111 minptime=10;useinbandfec=1
    a=rtpmap:103 ISAC/16000
    a=rtpmap:104 ISAC/32000
    a=rtpmap:9 G722/8000
    a=rtpmap:0 PCMU/8000
    a=rtpmap:8 PCMA/8000
    a=rtpmap:106 CN/32000
    a=rtpmap:105 CN/16000
    a=rtpmap:13 CN/8000
    a=rtpmap:110 telephone-event/48000
    a=rtpmap:112 telephone-event/32000
    a=rtpmap:113 telephone-event/16000
    a=rtpmap:126 telephone-event/8000

    After setting codec preference for OPUS audio

    m=audio 9 UDP/TLS/RTP/SAVPF 111
    c=IN IP4
    a=rtcp:9 IN IP4

    a=msid:hcgvWcGG7WhdzboWk79q39NiO8xkh4ArWhbM f15d77bb-7a6f-4f41-80cd-51a3c40de7b7
    a=rtpmap:111 opus/48000/2
    a=rtcp-fb:111 transport-cc
    a=fmtp:111 minptime=10;useinbandfec=1


    RTCDtlsTransport Interface

    Access to information about the Datagram Transport Layer Security (DTLS) transport over which RTP and RTCP packets are sent and received by RTCRtpSender and RTCRtpReceiver objects, as well as other data such as SCTP packets sent and received by data channels.
    Each RTCDtlsTransport object represents the DTLS transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

    interface RTCDtlsTransport : EventTarget {
      [SameObject] readonly attribute RTCIceTransport iceTransport;
      readonly attribute RTCDtlsTransportState state;
      sequence<ArrayBuffer> getRemoteCertificates();
      attribute EventHandler onstatechange;
      attribute EventHandler onerror;
    };

    RTCDtlsTransportState Enum

    “new”- DTLS has not started negotiating yet.
    “connecting” – DTLS is in the process of negotiating a secure connection and verifying the remote fingerprint.
    “connected”- DTLS has completed negotiation of a secure connection and verified the remote fingerprint.
    “closed” – the transport has been closed intentionally, e.g. by receipt of a close_notify alert or a call to close().
    “failed” – the transport has failed as the result of an error, such as failure to validate the remote fingerprint.

    RTCDtlsFingerprint dictionary

    dictionary RTCDtlsFingerprint {
      DOMString algorithm;
      DOMString value;
    };

    Protocols multiplexed with RTP (e.g. data channel) share its component ID. An ICE candidate for RTP has component-id value 1 when encoded in a candidate-attribute, while an ICE candidate for RTCP has component-id value 2.


    RTCTrackEvent Interface

    The track event uses the RTCTrackEvent interface.

    interface RTCTrackEvent : Event {
      readonly attribute RTCRtpReceiver receiver;
      readonly attribute MediaStreamTrack track;
      [SameObject] readonly attribute FrozenArray<MediaStream> streams;
      readonly attribute RTCRtpTransceiver transceiver;
    };

    dictionary RTCTrackEventInit

    dictionary RTCTrackEventInit : EventInit {
      required RTCRtpReceiver receiver;
      required MediaStreamTrack track;
      sequence<MediaStream> streams = [];
      required RTCRtpTransceiver transceiver;
    };


    This interface describes an Internet Connectivity Establishment (ICE) candidate used to set up the RTCPeerConnection. To facilitate routing of media on a given peer connection, both endpoints exchange several candidates, and then one candidate out of the lot is chosen, which is then used to initiate the connection.

    • candidate – transport address for the candidate that can be used for connectivity checks.
    • component – whether the candidate is an RTP or an RTCP candidate.
    • foundation – unique identifier that is the same for any candidates of the same type; helps optimize ICE performance while prioritizing and correlating candidates that appear on multiple RTCIceTransport objects.
    • ip, port
    • priority
    • protocol – tcp/udp
    • relatedAddress, relatedPort
    • sdpMid – candidate’s media stream identification tag.
    • sdpMLineIndex

    usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).

    Interfaces for Connectivity Establishment

    The RTCIceCandidate interface describes ICE candidates.

    interface RTCIceCandidate {
      readonly attribute DOMString candidate;
      readonly attribute DOMString? sdpMid;
      readonly attribute unsigned short? sdpMLineIndex;
      readonly attribute DOMString? foundation;
      readonly attribute RTCIceComponent? component;
      readonly attribute unsigned long? priority;
      readonly attribute DOMString? address;
      readonly attribute RTCIceProtocol? protocol;
      readonly attribute unsigned short? port;
      readonly attribute RTCIceCandidateType? type;
      readonly attribute RTCIceTcpCandidateType? tcpType;
      readonly attribute DOMString? relatedAddress;
      readonly attribute unsigned short? relatedPort;
      readonly attribute DOMString? usernameFragment;
      RTCIceCandidateInit toJSON();
    };

    RTCIceProtocol can be either tcp or udp

    TCP candidate type, which can be one of:

    • active – An active TCP candidate is one for which the transport will attempt to open an outbound connection but will not receive incoming connection requests.
    • passive – A passive TCP candidate is one for which the transport will receive incoming connection attempts but not attempt a connection.
    • so – An so candidate is one for which the transport will attempt to open a connection simultaneously with its peer.

    UDP candidate type

    • host – actual direct IP address of the remote peer
    • srflx – server reflexive, generated by a STUN/TURN server
    • prflx – peer reflexive; the IP address comes from a symmetric NAT between the two peers, usually as an additional candidate during trickle ICE
    • relay – generated using TURN

    ICE Candidate UDP Host

    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:27784895 1 udp 2122260223 51577 typ host generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:27784895 1 udp 2122260223 51382 typ host generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:27784895 1 udp 2122260223 53600 typ host generation 0 ufrag muSq network-id 1 network-cost 10
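
The priority values in the candidate lines above follow the RFC 5245 recommended formula: priority = 2^24 * (type preference) + 2^8 * (local preference) + (256 - component ID). A sketch that parses one of the sample host candidates (whose address fields were elided above) and reproduces its priority; host candidates use type preference 126, and the local preference 32542 is inferred from this sample's priority value:

```javascript
// Parse the core fields of a candidate-attribute string (simplified;
// the address field is elided in the samples above, so it is skipped).
function parseCandidate(line) {
  const p = line.replace(/^candidate:/, '').split(' ');
  return {
    foundation: p[0],
    component: Number(p[1]),
    protocol: p[2],
    priority: Number(p[3]),
    port: Number(p[4]),   // address elided in the samples above
    type: p[6],           // value after the "typ" keyword
  };
}

// RFC 5245 recommended priority formula.
function icePriority(typePref, localPref, component) {
  return 2 ** 24 * typePref + 2 ** 8 * localPref + (256 - component);
}

const c = parseCandidate(
  'candidate:27784895 1 udp 2122260223 51577 typ host generation 0');
// icePriority(126, 32542, 1) === c.priority → true
```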

    ICE Candidate TCP Host

    Notice TCP host candidates for mid 0, 1 and 2 for video, audio and data media types

    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10

    ICE Candidate UDP Srflx

    Notice 3 candidates for 3 streams sdpMid 0,1 and 2

    sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:2163208203 1 udp 1686052607 27177 typ srflx raddr rport 53600 generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:2163208203 1 udp 1686052607 27176 typ srflx raddr rport 51382 generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2163208203 1 udp 1686052607 27175 typ srflx raddr rport 51577 generation 0 ufrag muSq network-id 1 network-cost 10

    ICE Candidate (host)

    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2880323124 1 udp 2122260223 61622 typ host generation 0 ufrag jsPO network-id 1 network-cost 10




    RTCPeerConnectionIceEvent Interface

    interface RTCPeerConnectionIceEvent : Event {
      readonly attribute RTCIceCandidate? candidate;
      readonly attribute DOMString? url;
    };


    RTCPeerConnectionIceErrorEvent Interface

    interface RTCPeerConnectionIceErrorEvent : Event {
      readonly attribute DOMString hostCandidate;
      readonly attribute DOMString url;
      readonly attribute unsigned short errorCode;
      readonly attribute USVString errorText;
    };

    RTCIceTransport Interface

    Access to information about the ICE transport over which packets are sent and received. Each RTCIceTransport object represents the ICE transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

    interface RTCIceTransport : EventTarget {
      readonly attribute RTCIceRole role;
      readonly attribute RTCIceComponent component;
      readonly attribute RTCIceTransportState state;
      readonly attribute RTCIceGathererState gatheringState;
      sequence<RTCIceCandidate> getLocalCandidates();
      sequence<RTCIceCandidate> getRemoteCandidates();
      RTCIceCandidatePair? getSelectedCandidatePair();
      RTCIceParameters? getLocalParameters();
      RTCIceParameters? getRemoteParameters();
      attribute EventHandler onstatechange;
      attribute EventHandler ongatheringstatechange;
      attribute EventHandler onselectedcandidatepairchange;
    };

    RTCIceParameters Dictionary

    dictionary RTCIceParameters {
      DOMString usernameFragment;
      DOMString password;
    };

    RTCIceCandidatePair Dictionary

    dictionary RTCIceCandidatePair {
      RTCIceCandidate local;
      RTCIceCandidate remote;
    };

    RTCIceGathererState Enum

    • “new” – the ICE transport was just created and has not started gathering candidates
    • “gathering” – the transport is in the process of gathering candidates
    • “complete” – the transport has finished gathering candidates for now

    RTCIceTransportState Enum

    • “new” – ICE agent is gathering addresses or is waiting to be given remote candidates
    • “checking” – ICE agent is checking candidate pairs but has not yet found a working one
    • “connected” – found a working candidate pair, but still performing connectivity checks to find a better one
    • “completed” – found a working candidate pair and done performing connectivity checks
    • “disconnected” – connectivity checks on previously working pairs are now failing; the transport may recover
    • “failed” – checked all candidate pairs and failed to find a working one
    • “closed” – the transport has shut down and is no longer responding

    RTCIceRole Enum

    “unknown”, // agent whose role is not yet defined
    “controlling”, // controlling agent
    “controlled” // controlled agent

    RTCIceComponent Enum

    “rtp”, // ICE Transport is used for RTP (or RTCP multiplexing)
    “rtcp” // ICE Transport is used for RTCP

    Peer-to-peer Data API


    Peer-to-peer DTMF


    Statistics Model

    The browser maintains a set of statistics for monitored objects, in the form of stats objects.
    A group of related objects may be referenced by a selector (like a MediaStreamTrack that is sent or received by the RTCPeerConnection).

    Statistics API extends the RTCPeerConnection interface

    partial interface RTCPeerConnection {
      Promise<RTCStatsReport> getStats(optional MediaStreamTrack? selector = null);
      attribute EventHandler onstatsended;
    };

    Method getStats()- Gathers stats for the given selector and reports the result asynchronously.

    RTCStatsReport Object

    A map between strings that identify the inspected objects (the id attribute in RTCStats instances) and their corresponding RTCStats-derived dictionaries.

    interface RTCStatsReport {
      readonly maplike<DOMString, object>;
    };

    RTCStats Dictionary

    stats object constructed by inspecting a specific monitored object.

    dictionary RTCStats {
      required DOMHighResTimeStamp timestamp;
      required RTCStatsType type;
      required DOMString id;
    };


    RTCStatsEvent Interface

    [Constructor(DOMString type, RTCStatsEventInit eventInitDict)]
    interface RTCStatsEvent : Event {
      readonly attribute RTCStatsReport report;
    };

    dictionary RTCStatsEventInit

    dictionary RTCStatsEventInit : EventInit {
      required RTCStatsReport report;
    };


    • RTCRTPStreamStats, attributes ssrc, kind, transportId, codecId
    • RTCReceivedRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes packetsReceived, packetsLost, jitter, packetsDiscarded
    • RTCInboundRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes trackId, receiverId, remoteId, framesDecoded, nackCount
    • RTCRemoteInboundRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes localId, bytesReceived, roundTripTime
    • RTCSentRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes packetsSent, bytesSent
    • RTCOutboundRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes trackId, senderId, remoteId, framesEncoded, nackCount
    • RTCRemoteOutboundRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes localId, remoteTimestamp
    • RTCPeerConnectionStats, with attributes dataChannelsOpened, dataChannelsClosed
    • RTCDataChannelStats, with attributes label, protocol, dataChannelIdentifier, state, messagesSent, bytesSent, messagesReceived, bytesReceived
    • RTCMediaStreamStats, with attributes streamIdentifer, trackIds
    • RTCMediaHandlerStats with attributes trackIdentifier, ended
    • RTCSenderVideoTrackAttachmentStats, with all required attributes from its inherited dictionaries
    • RTCSenderAudioTrackAttachmentStats, with all required attributes from its inherited dictionaries
    • RTCAudioHandlerStats with attribute audioLevel
    • RTCVideoHandlerStats with attributes frameWidth, frameHeight, framesPerSecond
    • RTCVideoSenderStats with attribute framesSent
    • RTCVideoReceiverStats with attributes framesReceived, framesDecoded, framesDropped, partialFramesLost
    • RTCCodecStats, with attributes payloadType, codecType, mimeType, clockRate, channels, sdpFmtpLine
    • RTCTransportStats, with attributes bytesSent, bytesReceived, rtcpTransportStatsId, selectedCandidatePairId, localCertificateId, remoteCertificateId
    • RTCIceCandidatePairStats, with attributes transportId, localCandidateId, remoteCandidateId, state, priority, nominated, bytesSent, bytesReceived, totalRoundTripTime, currentRoundTripTime
    • RTCIceCandidateStats, with attributes address, port, protocol, candidateType, url
    • RTCCertificateStats, with attributes fingerprint, fingerprintAlgorithm, base64Certificate, issuerCertificateId
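
A sketch of how a stats consumer might walk an RTCStatsReport-style map to derive a loss fraction from the inbound-rtp entries; in a browser the map would come from pc.getStats(), here a plain Map with made-up values stands in:

```javascript
// Hypothetical helper: compute the packet loss fraction from the
// inbound-rtp entries of an RTCStatsReport (a maplike of id → stats).
function inboundLossFraction(report) {
  let received = 0, lost = 0;
  for (const stats of report.values()) {
    if (stats.type === 'inbound-rtp') {
      received += stats.packetsReceived;
      lost += stats.packetsLost;
    }
  }
  return lost / (received + lost);
}

// Stand-in for a report from pc.getStats() (ids and values are made up).
const report = new Map([
  ['RTCInboundRTPAudioStream_1', {
    type: 'inbound-rtp', packetsReceived: 950, packetsLost: 50 }],
  ['RTCTransport_0_1', { type: 'transport', bytesReceived: 123456 }],
]);
// inboundLossFraction(report) → 0.05
```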


    CLI/NCLI, Robocalls and STIR/SHAKEN

    To understand the need for implementing an identity-verification technique in Internet-protocol-based network-to-network communication systems, we need to evaluate the existing problems plaguing the VoIP setup.

    What is Caller ID spoofing?

    A vulnerability of the existing interconnected phone system, used by robo-callers to mask their identity or to make it appear the call is from a legitimate source; it usually originates from voice-over-IP (VoIP) systems.

    In this context, consider the Caller Line Identification (CLI/NCLI) techniques used by VoIP and OTT (over-the-top) providers today.

    CLI (Caller Line Identification)

    If a call goes out on a CLI route (White Route), the receiving party will likely see your caller ID information.

    • Lawful – termination is legal on the remote end, i.e. it abides by the country’s telco regulations, and stable.
    • Expensive – usually with direct or leased-line (TDM) interconnections with the tier-1 carriers.

    Non-CLI (Non-Caller Line Identification)

    The caller ID is not visible to the called party.
    If a call goes out on a Non-CLI route (Grey Route), the receiver will see either a blocked call or some generic number.

    • Unlawful – of questionable legality, or maybe violating some provider’s AUP (Acceptable Use Policy) on the remote end.
    • Cheaper – low quality, usually via VoIP-GSM gateways.

    Examples include robocalls, tele-marketing/spam etc., which are unwilling to share their caller ID with the call receiver so as not to be blocked or cancelled.

    To overcome the problem of non-verifiable spam and robocalls, a suite of protocols and procedures has been proposed that can combat caller ID spoofing on VoIP and connected public telephone networks.


    Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs



    A suite of standards developed by the Internet Engineering Task Force (IETF).

    Telecommunication service providers implement a certificate management system to create and manage the public and private keys and digital certificates used to sign and verify caller ID details.

    STIR adds information to the SIP headers that allows endpoints along the path to positively identify the origin of the call: a JSON Web Token signed with the provider’s private key and encoded using Base64.

    There are three levels of verification, or “attestation”

    • A : Full Attestation
      indicates that the provider recognizes the entire phone number as being registered with the originating subscriber.
    • B : Partial Attestation
      the call originated with a known customer but the entire number cannot be verified.
    • C : Gateway Attestation
      the call can only be verified as coming from a known gateway.

    How can the Public Key Infrastructure be used?

    In an interconnection network, each telephone service provider obtains its digital certificate from a certificate authority (CA) that is trusted by the other telephone service providers. The calling party signs the SIP Identity header, asserting the caller ID as legitimate. The called party verifies that the calling number is authentic.


    The originating service provider’s signed SIP Identity header includes the following data:

    1. Attestation level
    2. Date and Time
    3. Calling and Called Numbers
    4. Orig ID for analytics and/or traceback purposes among others
    5. Location of certificate repository
    6. Signature
    7. Encryption algorithm

    The FCC has also assigned the role of a Secure Telephone Identity Policy Administrator (STI-PA), which oversees that CAs do not provide certificates to spoofing robocallers and enforces the framework for STIR/SHAKEN.

    Sample Identity header in SIP request

    INVITE SIP/2.0
    Via: SIP/2.0/TLS;branch=z9hG4bKnashds8
    To: Bob
    From: Alice ;tag=1928301774
    Call-ID: a84b4c76e66710
    CSeq: 314159 INVITE
    Max-Forwards: 70
    Date: Thu, 21 Feb 2002 13:02:03 GMT
    Content-Type: application/sdp
    Content-Length: 147
    o=UserA 2890844526 2890844526 IN IP4
    s=Session SDP
    c=IN IP4
    t=0 0
    m=audio 49172 RTP/AVP 0
    a=rtpmap:0 PCMU/8000


    STIR is based on the SIP protocol and is designed to work with calls being routed through a VoIP network. Since traditional endpoints like POTS and SS7 networks should also be covered under this call-authenticity framework, SHAKEN was developed to manage calls via IP-to-telephone gateways.

    SHAKEN was developed by the Alliance for Telecommunications Industry Solutions (ATIS).

    Working steps:

    1. When a call is initiated, a SIP INVITE is received by the originating service provider.
    2. Originating service provider verifies the call source and number to determine how to confirm validity.
      1. Full Attestation (A) — The service provider authenticates the calling party AND confirms they are authorized to use this number. An example would be a registered subscriber.
      2. Partial Attestation (B) — The service provider verifies the call origination but cannot confirm that the call source is authorized to use the calling number. An example would be a calling number from behind an enterprise PBX.
      3. Gateway Attestation (C) — The service provider authenticates the call’s origin but cannot verify the source. An example would be a call received from an international gateway.
    3. The originating provider creates a SIP Identity header that contains information on the calling number, called number, attestation level, and call origination, along with the certificate; the caller ID is thus “signed” as legitimate.
    4. The SIP INVITE with the SIP Identity header and certificate is sent to the destination service provider.
    5. The destination service provider verifies the Identity header and certificate.

    Diagrammatic depiction of how telecom carriers digitally validate call authenticity before receiving or handing off calls through their network.



    Video Codecs – H264 , H265 , AV1

    This article discusses the popularly adopted current standards for video codecs (compression/decompression), namely MPEG2, H264, H265 and AV1.

    MPEG 2

    MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
    Generic coding of moving pictures and associated audio information.
    A combination of lossy video compression and lossy audio data compression methods which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

    Better than MPEG 1

    It evolved out of the shortcomings of MPEG-1, such as an audio compression system limited to two channels (stereo), no standardized support for interlaced video with poor compression, and only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited for higher-resolution video.


    • over-the-air digital television broadcasting and in the DVD-Video standard.
    • TV stations, TV receivers, DVD players, and other equipment
    • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
    • XDCAM – professional file-based video recording format.
    • DVB – application-specific restrictions on MPEG-2 video in the DVB standard.


    H264

    Advanced Video Coding (AVC), a.k.a. H.264, MPEG-4 AVC, or ITU-T H.264 / MPEG-4 Part 10 ‘Advanced Video Coding’ (AVC)
    introduced in 2004

    Better than MPEG2

    40-50% bit rate reduction compared to MPEG-2

    Support Up to 4K (4,096×2,304) and 59.94 fps
    21 profiles ; 17 levels

    Compression Model

    Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find areas that are redundant within subsequent frames, i.e. unchanged, such as background sections of the video. These areas are replaced with short information referencing the original pixels (inter-frame motion prediction), using a mathematical function and the direction of motion.
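
The inter-frame idea above can be sketched as block matching: search the previous frame for the offset that minimises the sum of absolute differences (SAD) against the current block. A toy 1-D sketch, not any codec's actual search:

```javascript
// Toy 1-D motion search: find the offset into prevFrame whose window
// best matches curBlock (minimum sum of absolute differences).
function bestMatchOffset(prevFrame, curBlock) {
  let bestOffset = 0, bestSad = Infinity;
  for (let off = 0; off + curBlock.length <= prevFrame.length; off++) {
    let sad = 0;
    for (let i = 0; i < curBlock.length; i++) {
      sad += Math.abs(prevFrame[off + i] - curBlock[i]);
    }
    if (sad < bestSad) { bestSad = sad; bestOffset = off; }
  }
  return bestOffset;
}

const prevFrame = [10, 10, 10, 90, 91, 92, 10, 10];
const curBlock  = [90, 91, 92];   // the block "moved" to offset 3
// bestMatchOffset(prevFrame, curBlock) → 3
```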

    Hybrid spatial-temporal prediction model
    Flexible partition of Macro Block (MB), sub-MB for motion estimation
    Intra prediction (extrapolate already decoded neighbouring pixels for prediction)
    Introduced multi-view extension
    9 directional modes for intra prediction
    Macro block structure with maximum size of 16×16
    Entropy coding is CABAC (Context-Adaptive Binary Arithmetic Coding) and CAVLC (Context-Adaptive Variable-Length Coding)


    • most deployed video compression standard
    • Delivers high definition video images over direct-broadcast satellite-based television services,
    • Digital storage media and Blu-Ray disc formats,
    • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
    • Security and surveillance systems and DVB
    • Mobile video, media players, video chat


    H265

    High Efficiency Video Coding (HEVC), or H.265 or MPEG-H HEVC
    A video compression standard designed to substantially improve coding efficiency and stream high-quality video in congested network environments or bandwidth-constrained mobile networks.
    Released in January 2013; a product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

    better than H264

    overcome shortage of bandwidth, spectrum, storage
    bandwidth savings of approx. 45% over H.264 encoded content

    resolutions up to 8192×4320, including 8K UHD
    Supports up to 300 fps
    3 approved profiles, draft for additional 5 ; 13 levels
    Whereas H.264 macroblocks can span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving HEVC the ability to compress information more efficiently.

    Multiview encoding – a stereoscopic video coding standard that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It also exploits a large amount of inter-view statistical dependencies.

    Compression Model

    Enhanced Hybrid spatial-temporal prediction model
    CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures

    Motion estimation – intra prediction with more modes, asymmetric partitions in inter prediction
    Individual rectangular regions that divide the image are independent

    Parallel processing – the decoding process can be split across multiple parallel threads, taking advantage of multi-core processors.

    Wavefront Parallel Processing (WPP) – rows of CTUs are decoded in parallel, each row reusing decisions from the row above, which grants a more productive and effectual compression.
    33 directional modes for intra prediction – DC intra prediction, planar prediction, Adaptive Motion Vector Prediction
    Entropy coding is only CABAC


    • cater to growing HD content for multi platform delivery
    • differentiated and premium 4K content

    The reduced bitrate enables broadcasters and OTT vendors to bundle more channels/content on existing delivery mediums, and also to provide a greater video-quality experience at the same bitrate.

    Using ffmpeg for H265 encoding

    I took an H.264 file (640×480), duration 30 seconds, of size 3,908,744 bytes (3.9 MB on disk) and converted it using ffmpeg.

    After conversion it was an HEVC (Parameter Sets in Bitstream) MPEG-4 movie of only 621 KB, without any loss of clarity!

    > ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4
    ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers
      built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
      configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
      libavutil      56. 22.100 / 56. 22.100
      libavcodec     58. 35.100 / 58. 35.100
      libavformat    58. 20.100 / 58. 20.100
      libavdevice    58.  5.100 / 58.  5.100
      libavfilter     7. 40.101 /  7. 40.101
      libavresample   4.  0.  0 /  4.  0.  0
      libswscale      5.  3.100 /  5.  3.100
      libswresample   3.  3.100 /  3.  3.100
      libpostproc    55.  3.100 / 55.  3.100
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 1
        compatible_brands: isomavc1
        creation_time   : 2019-06-23T04:58:13.000000Z
      Duration: 00:00:29.84, start: 0.000000, bitrate: 1047 kb/s
        Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 1046 kb/s, 25 fps, 25 tbr, 25k tbn, 50k tbc (default)
        Metadata:
          creation_time   : 2019-06-23T04:58:13.000000Z
          handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
    Codec AVOption b (set bitrate (in bits/s)) specified for output file #0 (output.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream.
    Stream mapping:
      Stream #0:0 -> #0:0 (h264 (native) -> hevc (libx265))
    Press [q] to stop, [?] for help
    x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
    x265 [info]: build info [Mac OS X][clang 10.0.1][64 bit] 8bit+10bit+12bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
    x265 [info]: Main profile, Level-3 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: Slices                              : 1
    x265 [info]: frame threads / pool features       : 2 / wpp(8 rows)
    x265 [warning]: Source height < 720p; disabling lookahead-slices
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
    x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
    x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
    x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
    x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
    x265 [info]: tools: strong-intra-smoothing deblock sao
    Output #0, mp4, to 'output.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 1
        compatible_brands: isomavc1
        encoder         : Lavf58.20.100
      Stream #0:0(und): Video: hevc (libx265) (hev1 / 0x31766568), yuv420p, 640x480, q=2-31, 25 fps, 12800 tbn, 25 tbc (default)
        Metadata:
          creation_time   : 2019-06-23T04:58:13.000000Z
          handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
          encoder         : Lavc58.35.100 libx265
    frame=  746 fps= 64 q=-0.0 Lsize=     606kB time=00:00:29.72 bitrate= 167.2kbits/s speed=2.56x
    video:594kB audio:0kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 2.018159%
    x265 [info]: frame I:      3, Avg QP:27.18  kb/s: 1884.53
    x265 [info]: frame P:    179, Avg QP:27.32  kb/s: 523.32
    x265 [info]: frame B:    564, Avg QP:35.17  kb/s: 38.69
    x265 [info]: Weighted P-Frames: Y:5.6% UV:5.0%
    x265 [info]: consecutive B-frames: 1.6% 3.8% 9.3% 53.3% 31.9%
    encoded 746 frames in 11.60s (64.31 fps), 162.40 kb/s, Avg QP:33.25
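    The reported sizes and bitrates can be cross-checked with a little arithmetic; this sketch takes the byte counts from the numbers above (reading ffmpeg's 606kB as 606 × 1024 bytes):

```python
# Sanity-check of the H.264 -> H.265 conversion numbers reported above
SRC_BYTES = 3_908_744      # source h264 file size
DST_BYTES = 606 * 1024     # ffmpeg reported Lsize = 606kB
DURATION_S = 29.84         # clip duration from the ffmpeg probe

src_kbps = SRC_BYTES * 8 / DURATION_S / 1000
dst_kbps = DST_BYTES * 8 / DURATION_S / 1000

print(f"source ~{src_kbps:.0f} kb/s, output ~{dst_kbps:.0f} kb/s")
print(f"~{SRC_BYTES / DST_BYTES:.1f}x smaller at the same resolution")
```

    The ~1048 kb/s and ~166 kb/s results line up with the 1047 kb/s input and 167.2 kbits/s output bitrates ffmpeg itself reports.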

    If you get an error like

    Unknown encoder 'libx265'

    then reinstall ffmpeg with H.265 (libx265) support.
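    As a sketch of that fix, assuming macOS with Homebrew (matching the build banner above); package managers and configure options differ per platform, so treat the exact commands as illustrative:

```shell
# Check whether the installed ffmpeg already exposes the x265 encoder
ffmpeg -hide_banner -encoders | grep 265

# macOS / Homebrew: reinstall ffmpeg (current formulas include libx265)
brew reinstall ffmpeg

# Building from source instead: enable the GPL'd libx265 explicitly
# ./configure --enable-gpl --enable-libx265 && make && make install
```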


    AV1

    Realtime, high-quality video encoder, a product of the Alliance for Open Media (AOM)
    Can be contained in Matroska, WebM, ISOBMFF and RTP (WebRTC)

    Offers better compression efficiency than H265

    AV1 is royalty free and overcomes the patent complexities around H265/HEVC


    • video transmission over the internet, VoIP, multi-party conferencing
    • virtual / augmented reality
    • streaming for self-driving cars
    • intended for use in HTML5 web video and WebRTC together with the Opus audio format
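    For a quick comparison with the libx265 command shown earlier, an AV1 encode of the same clip could look like this (a sketch assuming an ffmpeg build with libaom; the file name reuses the one above, and CRF 30 is an arbitrary quality choice):

```shell
# Constant-quality AV1 encode: -b:v 0 together with -crf selects CRF mode in libaom
ffmpeg -i pivideo3.mp4 -c:v libaom-av1 -crf 30 -b:v 0 -c:a aac -b:a 128k output_av1.mkv
```

    Matroska is used as the output container here since it is one of the AV1 containers listed above.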

    Audio and Acoustic Signal Processing

    Audio signals are electronic representations of sound waves, longitudinal waves that travel through air as compressions and rarefactions. Audio signal processing focuses on computational methods for intentionally altering such auditory signals or sounds in order to achieve a particular goal.

    Applications of audio signal processing in general

    • storage
    • data compression
    • music information retrieval
    • speech processing (emotion recognition/sentiment analysis, NLP)
    • localization
    • acoustic detection
    • transmission / broadcasting – enhancing fidelity or optimizing for bandwidth or latency
    • noise cancellation
    • acoustic fingerprinting
    • sound recognition (speaker identification, biometric speech verification, voice commands)
    • synthesis – electronic generation of audio signals; speech synthesizers can generate human-like speech
    • enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition)
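    As a tiny illustration of the synthesis item above, here is a pure-Python sketch that generates one second of a 440 Hz sine tone as floating-point samples (the sample rate and amplitude are arbitrary choices for the example):

```python
import math

SAMPLE_RATE = 8000   # samples per second (telephony-grade rate)
AMPLITUDE = 0.5      # linear scale, 1.0 = full scale

def sine_tone(freq_hz, duration_s, rate=SAMPLE_RATE, amp=AMPLITUDE):
    """Generate duration_s seconds of a sine tone as float samples."""
    n = int(rate * duration_s)
    return [amp * math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]

tone = sine_tone(440.0, 1.0)   # one second of A4
print(len(tone), max(tone))
```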

    Effects for audio streams processing

    • delay or echo
      To simulate reverberation effect, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be of order 35 milliseconds or above.
      Implemented using tape delays or bucket-brigade devices.
    • flanger
      a delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms).
      as the delay varies, the delayed signal drifts out of phase with the original, producing a sweeping comb-filter effect, then comes back into phase as the delay returns
    • phaser
      signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
    • chorus
      a delayed version of the signal is added to the original signal; the delay must be above about 5 ms to be audible. Often the delayed signals are slightly pitch-shifted to more realistically convey the effect of multiple voices.
    • equalization
      frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters.
    • overdrive
      effects such as the use of a fuzz box can be used to produce distorted sounds, for example to imitate robotic voices or to simulate distorted radiotelephone traffic
    • pitch shift
      shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
    • time stretching
      changing the speed of an audio signal without affecting its pitch.
    • resonators
      emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
    • modulation
      change the frequency or amplitude of a carrier signal in relation to a predefined signal.
    • compression
      reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
    • 3D audio effects
      place sounds outside the stereo basis
    • reverse echo
      swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
    • wave field synthesis
      spatial audio rendering technique for the creation of virtual acoustic environments
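    The delay/echo effect at the top of this list can be sketched in a few lines of pure Python: a delayed, attenuated copy of the signal is mixed back into the original (the delay and gain values below are illustrative):

```python
def add_echo(samples, delay_samples, gain=0.5):
    """Mix a delayed, attenuated copy of the signal into itself.

    For a perceivable echo at an 8 kHz sample rate, delay_samples
    should correspond to >= 35 ms, i.e. >= 280 samples.
    """
    out = list(samples)
    for i in range(delay_samples, len(samples)):
        out[i] += gain * samples[i - delay_samples]
    return out

# Send an impulse through the echo: the delayed copy appears at half amplitude
signal = [1.0] + [0.0] * 499
echoed = add_echo(signal, delay_samples=280, gain=0.5)
print(echoed[0], echoed[280])   # 1.0 0.5
```

    Chaining several such delays with decreasing gains approximates the reverberation effect described above.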

    ASP applications in telephony and mobile phones, standardized by the ITU (International Telecommunication Union)

    • Acoustic echo control
      aims to eliminate acoustic feedback, which is particularly problematic in the speakerphone use case during bidirectional voice calls
    • Noise control
      the microphone picks up not only the desired speech signal but often also unwanted background noise. Noise control tries to minimize those unwanted signals. Multi-microphone AASP has enabled the suppression of directional interferers.
    • Gain control
      determines how loud a speech signal should be when leaving a telephony transmitter and when it is played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively in real time during operation.
    • Linear filtering
      ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
    • Speech coding: the move from analog POTS-based calls to the G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder was a big leap in terms of call capacity. Other speech coders with varying tradeoffs between compression ratio, speech quality, and computational complexity have since become available. AASP also provides higher-quality wideband speech (approximately 150 Hz to 7 kHz).
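    The G.711 narrowband coder mentioned above relies on mu-law (or A-law) companding before 8-bit quantization. A sketch of the continuous mu-law compression formula follows; note this is the textbook formula, not the full segmented table that G.711 actually standardizes:

```python
import math

MU = 255  # compression parameter used by G.711 mu-law

def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to [-1, 1] with mu-law companding."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1) / mu, y)

# Quiet samples are boosted before uniform 8-bit quantization,
# giving them finer effective resolution than loud samples
print(mu_law_compress(0.01))   # a small input is mapped well up the scale
print(mu_law_expand(mu_law_compress(0.5)))
```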

    ASP applications in music playback

    AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs, such as listening to music, watching videos, and gaming.

    • Post-processing
      techniques such as equalization and filtering allow the user to adjust the timbre of the audio, e.g. bass boost and parametric equalization. Other techniques include adding reverberation, pitch shifting, time stretching, etc.
    • Audio (de)coding: audio coding formats like MP3 and AAC define how music is distributed, stored, and consumed, including in online music streaming services

    ASP for virtual assistants

    Virtual assistants include a variety of services such as Apple’s Siri, Microsoft’s Cortana, Google Now, and Amazon’s Alexa. ASP is used in

    • Speech enhancement
      multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
    • Speech recognition (speech-to-text): this draws ideas from multiple disciplinary fields including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major contribution to recognition accuracy improvement in speech recognition by AASP.
    • Speech synthesis (text-to-speech): this technology has come a very long way from its very robotic sounding introduction in the 1930s to making synthesized speech sound more and more natural.
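    The multi-microphone beamforming mentioned under speech enhancement can be sketched as a delay-and-sum beamformer: each microphone signal is shifted by its known steering delay and the aligned signals are averaged, reinforcing the target direction (integer sample delays are assumed for simplicity):

```python
def delay_and_sum(mic_signals, delays):
    """Align each mic signal by its integer steering delay and average.

    mic_signals: list of equal-length sample lists
    delays: per-mic arrival delay (in samples) of the target source
    """
    n = len(mic_signals[0])
    out = [0.0] * n
    for sig, d in zip(mic_signals, delays):
        for i in range(n):
            j = i + d            # undo the propagation delay
            out[i] += sig[j] if 0 <= j < n else 0.0
    return [v / len(mic_signals) for v in out]

# A pulse reaching mic 0 at t=10 and mic 1 at t=13 (3-sample lag)
mic0 = [0.0] * 50; mic0[10] = 1.0
mic1 = [0.0] * 50; mic1[13] = 1.0
aligned = delay_and_sum([mic0, mic1], delays=[0, 3])
print(aligned[10])   # 1.0: both copies add coherently at t=10
```

    Signals arriving from other directions do not line up after the shifts and so are attenuated by the averaging.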

    Other areas of ASP

    • Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).
