VoIP API design

  • Public API endpoints
  • Internal API gateways
  • API Rate Limiter
    • Token based Rate Limiting
    • Token bucket filter
    • Hierarchical Token Bucket (HTB)
    • Fair Queuing
    • CBQ (Class-Based Queuing)
    • Modular QoS Command-Line Interface (MQC) Shaping
  • Throttling

VoIP manages call setup and teardown over IP networks. APIs can expose public or internal endpoints to create and manage calls and conferences, add-on services like recording and transcription, or even authentication and heartbeats. This article lists some external programmable call control APIs, internal APIs for billing and health, as well as rate limiting.

Public API endpoints

Programmatic call control APIs

  1. Making a Call

HTTP POST https://www.altteelcom.com/voice/call


to: '+14155551212',
from: '+18668675310'

Callback params

statusCallback: 'https://www.myapp.com/events',
statusCallbackEvent: ['initiated', 'answered'],
statusCallbackMethod: 'POST'


Sample response

"from": "+9999999999",
"to": "+111111111",
"status": "ongoing",
"date_created": "Mon, 5 Sep 2020 20:36:28 +0000",
"start_time": "Mon, 5 Sep 2020 20:36:29 +0000",
"date_updated": "Mon, 5 Sep 2020 20:36:44 +0000",
"direction": "outbound",
"duration": "",
"end_time": "",
"price": "-0.03000",
"price_unit": "USD"

The response can additionally contain the call SID, the app version, and URIs for recording, transcription, payment and other services for this call.
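As a rough illustration of the request above, here is a minimal Python sketch that only builds (it does not send) the POST request. The JSON body shape, the helper name `build_call_request` and the E.164 check are assumptions for illustration; the endpoint is the example URL from this section, not a documented API.

```python
import json
import re
from urllib.request import Request

# Loose E.164 check: '+', a non-zero first digit, 2 to 15 digits in total (assumption)
E164 = re.compile(r"^\+[1-9]\d{1,14}$")

CALL_URL = "https://www.altteelcom.com/voice/call"  # example endpoint from this section


def build_call_request(to: str, from_: str, callback_url: str) -> Request:
    """Build (but do not send) a POST request that places an outbound call."""
    for number in (to, from_):
        if not E164.match(number):
            raise ValueError(f"not an E.164 number: {number}")
    body = {
        "to": to,
        "from": from_,
        "statusCallback": callback_url,
        "statusCallbackEvent": ["initiated", "answered"],
        "statusCallbackMethod": "POST",
    }
    return Request(
        CALL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request("+14155551212", "+18668675310", "https://www.myapp.com/events")
print(req.get_method(), req.full_url)
```

Validating numbers client-side is a design choice that saves a round trip; the server would still need to validate them.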

2. Ending an ongoing Call

HTTP PATCH https://www.altteelcom.com/voice/call/callid001


status: 'end'

This sets the end time of the call and triggers the events for CDR processing.
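A minimal sketch of what this update could do on the server side, assuming the field names of the sample response (`status`, `start_time`, `end_time`, `duration`). The `end_call` helper, the `completed` status value and the `call.ended` event name are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def end_call(call: dict, now: datetime) -> dict:
    """Mark an ongoing call as ended and return a (hypothetical) CDR event."""
    if call["status"] != "ongoing":
        raise ValueError("only ongoing calls can be ended")
    start = parsedate_to_datetime(call["start_time"])  # RFC 2822 date, as in the response
    call["status"] = "completed"
    call["end_time"] = format_datetime(now)
    call["date_updated"] = call["end_time"]
    # duration in whole seconds, kept as a string like the other response fields
    call["duration"] = str(int((now - start).total_seconds()))
    return {"event": "call.ended", "call": dict(call)}


call = {
    "status": "ongoing",
    "start_time": "Mon, 5 Sep 2020 20:36:29 +0000",
    "end_time": "",
    "duration": "",
    "date_updated": "",
}
event = end_call(call, datetime(2020, 9, 5, 20, 37, 29, tzinfo=timezone.utc))
print(call["duration"])  # 60
```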

Services API

  • Call Recording
  • Call transcription

Conference APIs
HTTP POST https://www.altteelcom.com/voice/conferences

  • creating a conf
  • fetching conf based on date or room name
  • updating an ongoing conf
  • ending a conf
  • set IVR announcement on ongoing conf

Auth API

CDR API

HTTP POST https://www.altteelcom.com/cdr

  • get CDRs (filtered per call, by a specific date, or by account)
  • bulk export of CDR
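A small sketch of CDR filtering and bulk CSV export. The record fields (`call_id`, `account`, `date`, `duration`) and the in-memory list are hypothetical stand-ins for a real CDR store.

```python
import csv
import io
from datetime import date

# Hypothetical CDR records
CDRS = [
    {"call_id": "callid001", "account": "acc-1", "date": date(2020, 9, 5), "duration": 60},
    {"call_id": "callid002", "account": "acc-2", "date": date(2020, 9, 5), "duration": 12},
    {"call_id": "callid003", "account": "acc-1", "date": date(2020, 9, 6), "duration": 300},
]


def get_cdrs(records, call_id=None, account=None, on=None):
    """Return CDRs matching every filter that was supplied (None means 'any')."""
    return [
        r for r in records
        if (call_id is None or r["call_id"] == call_id)
        and (account is None or r["account"] == account)
        and (on is None or r["date"] == on)
    ]


def export_csv(records) -> str:
    """Bulk-export CDRs as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["call_id", "account", "date", "duration"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```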

Internal API gateways

API Rate Limiter

The noisy neighbour problem occurs when one client monopolizes the bandwidth, I/O, CPU or other resources, which negatively affects performance for other users. Throttling, i.e. limiting how many requests a client can make, is a good way to solve this problem.

Auto scaling vs Load balancer vs Rate Limiter

  • Auto scaling : horizontal or vertical scaling can counter incoming traffic
    • (-) takes time to scale out, thus cannot solve the noisy neighbour problem immediately
  • Load balancer : can limit the number of simultaneous requests, and can reject them or send them to a queue for later processing
    • (-) the LB's behaviour is indiscriminate (it cannot distinguish between the costs of different operations)
    • (-) the LB cannot ensure uniform distribution of operations among all servers
  • Rate Limiter : can understand the cost of each operation and perform throttling accordingly

A rate limiter should be low latency, accurate and scalable.

Rate limiter inside the service process vs Rate limiter as its own process (daemon)

  • Rate limiter inside the service process
    • (+) faster, no IPC
    • (+) resistant to interprocess call failures
    • (-) service memory needs to allocate space for rate limiters
  • Rate limiter as its own process outside the service, as a daemon
    • (+) programming-language agnostic
    • (+) uses its own memory space, more predictable
    • widely used for auto discovery of service hosts

Token based Rate Limiting

Provides admission control.

Token bucket filter

Defines a user's quota in terms of an average rate and a burst capacity.
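The token bucket can be sketched in a few lines: tokens refill continuously at the average rate and are capped at the burst capacity, and a request is admitted only if enough tokens remain. This is a generic single-process sketch (class and parameter names are illustrative); the injectable `clock` makes it easy to test.

```python
import time


class TokenBucket:
    """Refill `rate` tokens per second, up to `capacity` (the burst size)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject (throttle)."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


t = [0.0]  # fake clock for the example
bucket = TokenBucket(rate=2, capacity=5, clock=lambda: t[0])
print([bucket.allow() for _ in range(6)])  # [True, True, True, True, True, False]
t[0] = 1.0  # one second later, 2 tokens have been refilled
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
```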

Hierarchical Token Bucket (HTB)

Uses the deficit round-robin algorithm for fair queuing.
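Deficit round-robin itself can be sketched as follows. This is the generic DRR algorithm over per-class queues, not the Linux HTB code; class names and quantum values are illustrative. A class with a larger quantum (here, paying users) gets a proportionally larger share of bytes.

```python
from collections import deque


def deficit_round_robin(queues: dict, quantum: dict, rounds: int) -> list:
    """Deficit round-robin over per-class packet queues.

    queues:  class name -> deque of packet sizes (bytes)
    quantum: class name -> bytes added to that class's deficit each round
    Returns the (class, size) pairs in the order they are sent.
    """
    deficit = {name: 0 for name in queues}
    sent = []
    for _ in range(rounds):
        for name, q in queues.items():
            if not q:
                deficit[name] = 0  # idle classes do not bank credit
                continue
            deficit[name] += quantum[name]
            # send head-of-line packets while the deficit covers them
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not q:
                deficit[name] = 0
    return sent


queues = {"paid": deque([500, 500, 500]), "free": deque([500, 500, 500])}
sent = deficit_round_robin(queues, {"paid": 1000, "free": 500}, rounds=2)
```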

Fair Queuing

give paying users a bandwidth fraction of 25%

priority queuing

decide 1 packet/ms for free or reduced-rate users

when a sender is idle, fair queuing distributes that sender's bandwidth among the other senders

CBQ (Class-Based Queuing)

Shaping is performed using link idle time calculations based on the timing of dequeue events and underlying link bandwidth. Input classes that tried to send too much were restricted, unless the node was permitted to “borrow” bandwidth from a sibling.

Modular QoS Command-Line Interface (MQC) Shaping

Implement traffic shaping for a specific type of traffic using a traffic policy.

  • When the rate of packets matching the specified traffic classifier exceeds the rate limit, the device buffers the excess packets.
  • When there are sufficient tokens in the token bucket, the device forwards the buffered packets at an even rate.
  • When the buffer queue is full, the device discards the buffered packets.

When a packet arrives and there are not enough tokens in the bucket, the device can
  • delay the packet until the bucket is ready / shaping
  • drop the packet / Policing
  • mark the packet as non-compliant
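The three options differ only in what happens to a non-conforming packet. A hedged sketch of the decision, assuming a single token bucket where one token equals one byte; the function name and the choice that marked packets do not consume tokens are illustrative assumptions.

```python
from enum import Enum


class Action(Enum):
    SHAPE = "shape"    # delay until enough tokens have accumulated
    POLICE = "police"  # drop the packet
    MARK = "mark"      # forward, but mark as non-compliant (e.g. lower priority)


def handle_packet(tokens: float, rate: float, size: float, action: Action):
    """Decide what happens to a packet of `size` bytes when the bucket holds `tokens`.

    `rate` is the refill rate in tokens per second.
    Returns (forwarded, delay_seconds, marked, tokens_left).
    """
    if tokens >= size:
        return True, 0.0, False, tokens - size  # conforming packet
    if action is Action.SHAPE:
        wait = (size - tokens) / rate  # time until the bucket holds `size` tokens
        return True, wait, False, 0.0  # those tokens are then spent on this packet
    if action is Action.POLICE:
        return False, 0.0, False, tokens
    return True, 0.0, True, tokens  # MARK: forward without consuming tokens
```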

Failure management on Rate Limiter

  • Node crash : just fewer requests throttled
  • Leaky bucket
  • tokens can go negative

System Design for API gateway

Important points for designing an API gateway

  • Serialize data in a compact binary format
  • allocate a buffer in memory, build a frequency-count hash table, and flush it once full or after a time interval to calculate counters
  • aggregation on API gateway on the fly
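The in-memory frequency-count buffer from the list above could look roughly like this: count keys in a hash table and flush the aggregated counts downstream when the table is full or a time limit expires. `CountBuffer` and its `sink` callback are hypothetical names.

```python
import time
from collections import Counter


class CountBuffer:
    """In-memory frequency-count table flushed when full or after `max_age` seconds."""

    def __init__(self, max_keys: int, max_age: float, sink, clock=time.monotonic):
        self.max_keys = max_keys
        self.max_age = max_age
        self.sink = sink  # callable that receives the aggregated counts
        self.clock = clock
        self.counts = Counter()
        self.started = clock()

    def add(self, key: str) -> None:
        self.counts[key] += 1
        if len(self.counts) >= self.max_keys or self.clock() - self.started >= self.max_age:
            self.flush()

    def flush(self) -> None:
        if self.counts:
            self.sink(dict(self.counts))  # ship one aggregate instead of many events
        self.counts = Counter()
        self.started = self.clock()
```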
Frontend Service / Partitioned Service / Backend Service

  • Frontend service : lightweight web service
    • request validation
    • auth / authorization
    • TLS (SSL) termination
    • server side encryption
    • rate limiting (throttling)
    • request deduplication
  • Caching layer between frontend and backend
  • Partitioned service : leader selection + quorum
  • Backend service

Distributed messaging system (fast and slow paths) for API

A distributed messaging system such as Apache Kafka or AWS Kinesis internally splits a stream of messages across several partitions (shards in Kinesis), where each partition can be placed on a separate machine in a clustered system.

Applications of this system design

  • Find heavy hitters ( Top K problem )
  • Popular products / trends
  • Volatile stocks
  • DDoS Attack Prevention
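For the Top K (heavy hitters) use case, a minimal sketch of the final step: merge per-partition counts and keep the K largest with a heap. Exact counts are assumed here; very large streams often use an approximate structure such as a count-min sketch instead.

```python
import heapq
from collections import Counter


def top_k(partition_counts, k: int):
    """Merge per-partition frequency counts and return the k heaviest hitters."""
    total = Counter()
    for counts in partition_counts:
        total.update(counts)  # sums counts for keys seen on several partitions
    # nlargest keeps a heap of size k, so this is O(n log k)
    return heapq.nlargest(k, total.items(), key=lambda kv: kv[1])


partitions = [{"A": 5, "B": 3}, {"B": 4, "C": 1}, {"A": 1, "D": 2}]
print(top_k(partitions, 2))  # [('B', 7), ('A', 6)]
```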


High Availability and Scalability in VoIP platforms

Load Balancers

Software LB vs Layer 4 / hardware LB

  • Software LB : Amazon ELB (Elastic Load Balancer); used by applications in the cloud; DNS load balancing
  • Layer 4 / hardware LB : F5 BIG-IP load balancer, Cisco Systems Catalyst, Barracuda load balancer, ADN (application delivery network); used by network address translators (NATs)

The LB pings each server for its health status and greylists servers that are unhealthy. It rechecks after a while, and if a server is healthy again (responds with a pong), it resumes sending traffic to it.

The LB itself should also be distributed across different data centres in a primary-secondary setup for HA.

Networking protocol

TCP load balancer vs HTTP load balancer vs SIP-based LB (Kamailio / OpenSIPS)

  • TCP load balancer : can forward the packet without inspecting its content
    • (+) fast, can handle millions of requests per second
  • HTTP load balancer : terminates the connection and looks inside the request to make a load balancing decision, for example by using a cookie or a header
  • SIP-based LB such as Kamailio / OpenSIPS : domain specific to VoIP
    • (+) handles SIP routing based on SIP headers and prevents flooding attacks and other malicious or malformed packets from reaching the application server

Load balancing algorithms

  • Weighted Scheduling Algorithm
  • Round Robin Algorithm
  • Least Connection First Scheduling
  • Least response time algorithm
  • Hash based algorithm (send the request based on a hashed value, such as the IP address or the request URL)
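Three of these strategies (round robin, least connections, hash based) sketched over a fixed list of servers. The `Balancer` class is illustrative; weighted and response-time-based scheduling would additionally need per-server weights or latency measurements.

```python
import hashlib
import itertools


class Balancer:
    """Round robin, least connections and hash-based selection over fixed servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rr = itertools.cycle(self.servers)
        self.active = {s: 0 for s in self.servers}  # open connections per server

    def round_robin(self):
        return next(self._rr)

    def least_connections(self):
        return min(self.servers, key=lambda s: self.active[s])

    def hashed(self, key: str):
        # stable hash (unlike Python's salted hash()) so the same caller IP
        # or URL keeps landing on the same server
        digest = hashlib.sha1(key.encode()).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]
```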
Forward proxy vs Reverse proxy (load balancer)

  • Forward proxy : allows multiple clients to route traffic to an external server; public-facing endpoint for outgoing traffic
  • Reverse proxy : accepts client requests for a server and returns the server's response to the client, i.e. routes traffic on behalf of multiple servers
    • balances load and acts as the endpoint for incoming traffic
    • adds a level of abstraction and security, and compression
    • used in SBCs (session border controllers) and gateways

Service Discovery

Client-side service discovery uses a broadcast or heartbeat mechanism to keep track of active servers and deactivates unresponsive or failed servers. Some approaches for service discovery:

  1. Mesh
    • (-) network traffic grows quadratically with the number of nodes, since every node heartbeats every other node
  2. Gossip
  3. Distributed cache
  4. Coordination service
    • (-) requires the coordination service to do leader selection
    • (-) needs consensus
    • (-) Raft or PBFT for managing failures
  5. Random leader selection
    • (+) quicker
    • (-) may not guarantee a single leader
    • (-) split-brain problem

Keepalive, unregistering unhealthy nodes

Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint.
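A minimal sketch of keepalive-based registration: instances send heartbeats, and any instance not heard from within a TTL is unregistered on the next lookup. This is an illustrative in-memory `Registry`, not the API of Consul, etcd or ZooKeeper.

```python
import time


class Registry:
    """Service registry that drops instances whose keepalive has expired."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.last_seen = {}  # (service, "host:port") -> time of last heartbeat

    def heartbeat(self, service: str, address: str) -> None:
        """Register a new instance or refresh an existing one."""
        self.last_seen[(service, address)] = self.clock()

    def lookup(self, service: str) -> list:
        """Return live addresses for `service`, unregistering expired ones."""
        now = self.clock()
        expired = [k for k, t in self.last_seen.items() if now - t > self.ttl]
        for key in expired:
            del self.last_seen[key]
        return sorted(addr for (svc, addr) in self.last_seen if svc == service)
```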


Usually there is a trade-off between liveness and safety.

  1. Single leader replication
    • (-) vulnerable to data loss if the leader goes down before replication completes
    • used in SQL databases
  2. Multi-leader replication
  3. Leaderless replication
    • (-) increases latencies
    • (-) quorum based on majority; cannot function if a majority of nodes are down
    • used in Cassandra
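For leaderless (Dynamo-style) replication, the usual quorum condition is that every read quorum overlaps every write quorum, i.e. w + r > n, and a majority of n nodes is n // 2 + 1. A tiny sketch of both checks:

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True if every read quorum intersects every write quorum (w + r > n)."""
    return w + r > n


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1
```

With n = 5, a majority is 3, so the cluster tolerates two failed nodes but not three.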

Data Store Replication

For Relational Database

For NoSQL database replication and HA

Quick Response / Low latency

Message format

Textual vs Binary message format

  • Textual message format : human readable, like JSON and XML
    • names for every field add to the size
  • Binary message format : difficult to comprehend; needs a shared schema between sender and receiver to serialize and deserialize
    • no field names (or only tags), which reduces message size
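To make the size difference concrete, a sketch that encodes the same record as compact JSON and as a fixed binary layout with Python's `struct`. The record and the `!IIi` schema are illustrative; real systems typically use a schema language such as Protocol Buffers, Avro or Thrift.

```python
import json
import struct

call = {"call_id": 1001, "duration": 60, "price_cents": -3}

# Textual: every message repeats every field name
text = json.dumps(call, separators=(",", ":")).encode()

# Binary: field order and types come from a schema both sides share
# ("!IIi" = network byte order, two unsigned 32-bit ints, one signed 32-bit int)
SCHEMA = struct.Struct("!IIi")
FIELDS = ["call_id", "duration", "price_cents"]
binary = SCHEMA.pack(*(call[f] for f in FIELDS))

decoded = dict(zip(FIELDS, SCHEMA.unpack(binary)))
print(len(text), len(binary))
```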

Gateways for faster routing and caching to services

Gateways are a single entry point to route user requests to backend services.

Separate hot storage from cold storage

Hot storage is frequently accessed data, which must be near the server.

cold storage is less frequently accessed data such as archives

  • object storage
  • slow access


To make a system

  • scalable : use partitioning
  • reliable : use replication and checkpointing to not lose data in failures
  • fast : use in-memory processing

According to the CAP theorem, when a network partition occurs a distributed system has to choose between Consistency and Availability, so there has to be a trade-off according to the requirements.


Partition strategy can be based on various ways such as :-

  • Name-based partition
  • Geographic partition
  • Hashed value of an identifier (such as a name)
    • (-) can lead to hot partitions (high density for frequently accessed identifiers)
    • (-) high-density spots, for example all messages with a null key going to the same partition
    • (-) doesn't scale
  • Event-time-based hash
    • (+) data is spread evenly over time

To create well-distributed partitions, we could split a hot partition into 2 partitions or dedicate partitions to frequently accessed items. An effective partitioning key considers

  • Cardinality : total number of unique keys for a use case. High cardinality leads to better distribution.
    • high-cardinality keys : names, email addresses, URLs, since they have high variation
    • low-cardinality keys : boolean flags such as gender M/F
  • Selectivity : number of messages with each key. High selectivity leads to hotspots, hence low selectivity is better for even distribution.
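A sketch of hash partitioning plus two quick key metrics. The stable MD5-based hash and the metric definitions (number of unique keys, and the largest number of messages sharing one key) are illustrative choices, not a standard API.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Hash partitioning with a stable digest (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def key_stats(keys):
    """Return (cardinality, max selectivity): unique keys, and messages on the busiest key."""
    counts = Counter(keys)
    return len(counts), counts.most_common(1)[0][1]


emails = ["alice@x.com", "bob@x.com", "carol@x.com", "alice@x.com"]
print(key_stats(emails))                # (3, 2)
print(key_stats(["M", "F", "M", "M"]))  # (2, 3): low cardinality, skewed
```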


Scale Out, not Up!

Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management, used in DevOps. I have covered this in detail in the article on VoIP and DevOps.

Multiple PoPs (point of presence)

For a VoIP system catering to many clients across the globe, or accessing multiple carriers meant for different countries based on prefix matching, there should be a local PoP in the most used regions. Typically these regions include the US east and west coasts, UK / Germany (London), and Asia Pacific (Mumbai, Hong Kong and Australia).
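Prefix matching for PoP or carrier selection is usually a longest-prefix match. A minimal sketch with a hypothetical prefix table (the prefixes and PoP names are illustrative, not a real routing plan):

```python
# Hypothetical prefix -> PoP table
POP_BY_PREFIX = {
    "1": "us-east",
    "1415": "us-west",
    "44": "london",
    "91": "mumbai",
    "852": "hong-kong",
    "61": "australia",
}


def route(number: str, table=POP_BY_PREFIX, default="us-east") -> str:
    """Pick the PoP whose prefix is the longest match for an E.164 number."""
    digits = number.lstrip("+")
    for length in range(len(digits), 0, -1):  # try the longest prefix first
        pop = table.get(digits[:length])
        if pop is not None:
            return pop
    return default
```

A dictionary probe per prefix length is enough for small tables; large rate decks are often held in a trie instead.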

Minimal Latency and lowest amount of traffic via public internet

Creating multiple PoPs and enabling private traffic via VPN between them ensures that we use the backbone of our cloud provider (such as AWS) or data centre, instead of traversing the public internet, which is slower and less secure.

Hopping over private interfaces between the cloud servers, and maintaining a private connection and keepalive between them, helps optimize the traffic flow while keeping the RTT and latency low.

HA (High Availability)

Some factors affecting Dependability are

  • Eventual Consistency
  • MultiRegion failover
  • Disaster Recovery

A high-availability (HA) architecture implies dependability, usually via redundant application servers for backup: a primary and a standby. These applications are configured so that if the primary fails, the other can take over its operations without significant loss of data or impact on business operations.

Downtime / SLA of 5 9’s in aggregate failures

Four 9's of availability (99.99%) on each service component gives a downtime of about 53 minutes per service each year. However, for ten components that all have to be up, the aggregate availability is $(0.9999)^{10} \approx 0.999$, i.e. 99.9%, which is a downtime of about 8.76 hours each year.

Thus, aggregate failure should be taken into consideration while designing reliable systems.
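The arithmetic above, worked in a few lines (assuming ten independent components in series, all of which must be up):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes(availability: float) -> float:
    """Expected downtime per year for a given availability (0..1)."""
    return (1 - availability) * MINUTES_PER_YEAR


per_service = 0.9999           # four 9s per component
aggregate = per_service ** 10  # ten components in series: all must be up

print(round(downtime_minutes(per_service), 1))     # ~52.6 minutes per service per year
print(round(aggregate, 5))                         # ~0.999, i.e. 99.9%
print(round(downtime_minutes(aggregate) / 60, 2))  # ~8.76 hours per year
```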

HA for Proxy / Load balancer (LB)

An LB is the first point of contact for outbound calls. It usually does not save dialog information in memory or a database, but it still holds transaction information in memory. If the LB crashes and has to restart, it should

  • come back up quickly
  • be able to handle in dialogue requests
  • handle new incoming dialogue requests in a stateless manner
  • verify auth/authorization details from requests even after restart

HA for Call Control app server

App server is where all the business logic for call flow management resides and it maintains the dialog information in memory.

Issues with in-memory call states : if the VM or server hosting the call control app server goes down or is disconnected, live calls are affected, and this in turn causes revenue loss, primarily because the state variable holding the call duration would not be passed on to the CDR/billing service when the call terminates. For long-distance, multi-telco-endpoint calls running for hours, this could be a significant loss.

  • Standby app server configuration and shared memory : if the primary app server crashes, the standby app server should be ready to take its place and read the dialog states from the shared memory.
  • Live load-balanced secondary app server + external cache for state variables : a cluster of master-slave caches like Redis is a good way of maintaining the dialog state, and of reading it back once the app server recovers from a failed state or when a secondary server finds a variable missing from its local memory.

Media Server HA

Assume the Kamailio-RTPengine duo as app server and media server. These components can reside in the same or different VMs. In case of a media server crash, while restoring the restarted RTPengine or assigning a secondary backup RTPengine, the state of all live calls should be loaded without dropping any calls and causing loss of revenue. This is achieved by

  • an external cache such as Redis,
  • quick switchover from the primary to the secondary/fallback media server, and
  • floating IPs for media servers that ensure call continuity in spite of a failure of the active media server.

Architecturally it looks the same as fig above on HA for the SIP app server.

Security against malicious attacks

Attacks and security compromises pose a very significant threat to a VoIP platform.

MITM attacks

Man-in-the-middle attacks can be countered by

  • End-to-end encryption of media using SRTP and of signalling using TLS
  • A strong SIP auth mechanism using challenges and credentials, where the password is composed of mixed alphanumeric characters and is at least 12 characters long
  • Authorization / whitelisting based on IP ranges in CIDR notation
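IP whitelisting in CIDR notation can be checked with Python's standard `ipaddress` module. The ranges below are documentation-only example networks, standing in for real carrier or customer ranges.

```python
import ipaddress

# Hypothetical allow-list of carrier/customer ranges in CIDR notation
ALLOWED = [ipaddress.ip_network(c) for c in ("203.0.113.0/24", "198.51.100.16/28")]


def is_allowed(source_ip: str) -> bool:
    """True if the request's source IP falls inside any whitelisted CIDR block."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED)
```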

DDOS attacks

DDOS renders a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.

DDoS uses multiple network hosts to flood a target host with a large amount of network traffic. It can be created by sending falsified SIP requests to other parties, such that numerous transactions originating in the backward direction come to the target server and create congestion.

It can be countered by

  • detecting flooding in traffic and using Fail2ban to block the offending sources
  • challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required)
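A minimal sliding-window flood detector, in the spirit of per-source rate checks in SIP proxies. The threshold and window are illustrative, and in practice flagged IPs would be logged so that a tool like Fail2ban can ban them; this is not the implementation of any particular module.

```python
import time
from collections import defaultdict, deque


class FloodDetector:
    """Flag a source IP that sends more than `limit` requests within `window` seconds."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.seen = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_flooding(self, ip: str) -> bool:
        now = self.clock()
        q = self.seen[ip]
        q.append(now)
        while q and now - q[0] > self.window:  # forget requests outside the window
            q.popleft()
        return len(q) > self.limit
```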

Read about SIP security practices in detail: https://telecom.altanai.com/2020/04/12/sip-security/

Other important factors leading to security

  • Keystores and certificate expiry tracker
  • privileges and roles
  • Test cases and code coverage
  • Reviewers approval before code merge
  • Window for QA setup and testing , to give go ahead before deployment

Identifying outages and Alerting

Raise event notification alerts to designated developers for any anomalous behavior. Depending on the severity of the situation, the alert could be a call or an SMS.

Logging and Alerting for a VoIP CPaaS platform

Sources for alert manager

  • Build failed ( code crashes, Jenkins error)
  • Deployment failed (from Kubernetes, Chef, Docker ..)
  • configuration errors (setting up VPN etc.)
  • Server logs
  • Server health
  • HOMER alerts (SIP call responses 4xx, 5xx, 6xx)
  • PCAP alerts ( Malformed SIP SDP ..)
  • Internal Smoke test ( auto testing procedure done routinely to check live systems )
  • Support tickets from customer complaints ( treat these as high priority since they are directly impacting customers)


The test bed and QA framework play a very critical role in the final product's credibility and quality.

Performance Testing

  • Stress Testing : push the system to its breaking point
  • Load Testing : test at 2x to 3x the expected load
  • Soak Testing : typical load over a long period of time (to identify leaks)

Robust QA framework (stress and monkey testing) to identify potential bottlenecks before going live

A QA framework validates the services and call flows on a staging environment before changes are pushed to production. Architectural changes in particular should be validated thoroughly on the staging QA framework before making the cut. The qualities of an efficient QA platform are :

Generic nature – the QA framework should be adaptable to different environments such as dev, staging, prod

Containerized – it should be easy to spin up the QA env for large-scale or small-scale testing, hence it should be dockerized

CICD Integration and Automation – integrate the test cases tightly with git post-push hooks and pull request creation

Keep as few external dependencies as possible; for example, a telecom carrier can be simulated using a PBX like FreeSWITCH or Asterisk

Asynchronous Run – test cases should be able to run asynchronously, such as a separate SIPp XML scenario for each use case

Sample Testcases for VoIP

  • Authentication before establishing a session
  • Balance and account checks before establishing a session, like whitelisting, blacklisting, restricted permissions in a particular geography
  • Transport security and adaptability checks : TLS, UDP, TCP
  • Codec support validation
  • DTMF transmission and detection
  • Cross checking CDR values with actual call initiator and terminator party
  • cross checking call uuid and stats
  • Validating for media and related timeouts

QA framework tools – Robot Framework

traffic monitor – VoIPmonitor

customer simulator – SIPp scripts

network traffic analyser – Wireshark

pcap collector – tcpdump, sngrep

Distributed Data Store

A Distributed Database Design could have many components. It could work on static datastore like

  • SQL DB where schema is important
    • MySQL
    • PostgreSQL
    • Spanner – Globally-distributed database from Google
  • NoSQL DB to store records such as JSON documents
    • Cassandra – Distributed column-oriented database
  • Cache for low latency retrievals
    • Memcached – Distributed memory caching system
    • Redis – Distributed memory caching system with persistence and value types
  • Data lakes for heavy sized data
    • AWS s3 object storage
    • blob storage
  • File System
    • Google File System (GFS) – Distributed file system
    • Hadoop File System (HDFS) – Open source Distributed file system

or work on realtime data streams

  • Batch processing ( Hadoop Mapreduce)
  • Stream processing ( Kafka + spark)
    • Kafka – Pub/sub message queue
  • Cloud native stream processing ( kinesis)

Each component has its own pros and cons. The choice depends on the requirements and the scope of system behaviour, like

  • users / customer usage and expectations,
  • Scale (read and write)
  • Performance
  • Cost

  • Users / customers
    • Who uses the system?
    • How will the system be used?
  • Scale (read / write)
    • Reads / writes per second?
    • Size of data per request?
    • cps (calls or clicks per second)?
    • Spikes in traffic?
  • Performance
    • Write-to-read delay?
    • p99 latency for read queries?
    • Eventual consistency (prefer quick, stale data over no data at all)
    • Redundancy for failure management
  • Cost
    • Should the design minimize the cost of development?
    • Should the design minimize the cost of maintenance?

Some fundamental constraints while designing a distributed data structure :-

p99 latency : 99% of the requests should be faster than given latency. In other words only 1% of the requests are allowed to be slower.

Request latency:
    min: 0.1
    max: 7.2
    median: 0.2
    p95: 0.5
    p99: 1.3
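Computing a percentile from raw samples is straightforward with the nearest-rank method. The sample latencies below are hypothetical (not the figures above) and are chosen so that the tail is visible.

```python
import math


def percentile(samples, p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank; multiply first to avoid float error
    return ordered[max(rank, 1) - 1]


# Hypothetical sample of 1,000 request latencies (seconds)
latencies = [0.2] * 980 + [1.3] * 15 + [7.2] * 5

print(percentile(latencies, 50), percentile(latencies, 99), max(latencies))  # 0.2 1.3 7.2
```

Only the slowest 1% of these requests (10 of 1,000) exceed the p99 value.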

Individual Events vs Aggregate Data

  • Individual events (like every click or every call metric)
    • (+) fast writes
    • (+) can customize / recalculate data from the raw events
    • (-) slow reads
    • (-) costlier for large-scale implementations (many events)
  • Aggregate data (clicks per minute, outgoing calls per minute)
    • (+) faster reads
    • (+) data is ready for decision making / statistics
    • (-) can only query the data as it was aggregated (no raw data)
    • (-) requires a data aggregation pipeline
    • (-) hard to fix errors

Aggregation can happen on the fly, which suits real-time data with a low expected data delay (minutes), or as batch processing in the background, where a delay of minutes to hours is acceptable.
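Rolling individual events up into aggregates can be sketched as below; the event fields (`ts`, `direction`) are hypothetical. The raw list can still be re-aggregated differently later, whereas the per-minute dict can only answer per-minute questions.

```python
from collections import Counter
from datetime import datetime


def calls_per_minute(events):
    """Roll individual call events up into {minute: outbound call count}."""
    counts = Counter()
    for e in events:
        if e["direction"] == "outbound":
            minute = e["ts"].replace(second=0, microsecond=0)  # truncate to the minute
            counts[minute] += 1
    return dict(counts)


events = [  # hypothetical raw events
    {"ts": datetime(2020, 9, 5, 20, 36, 28), "direction": "outbound"},
    {"ts": datetime(2020, 9, 5, 20, 36, 40), "direction": "inbound"},
    {"ts": datetime(2020, 9, 5, 20, 36, 50), "direction": "outbound"},
    {"ts": datetime(2020, 9, 5, 20, 37, 5), "direction": "outbound"},
]
per_minute = calls_per_minute(events)
```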

Push vs Pull Architecture

Push : a processing server manages the state of variables in memory and pushes them to the data store.

  • (-) a crashed processing server means all in-memory data is lost

Pull : a temporary data structure such as a queue holds the stream of data, and the processing service pulls from it to process the data before pushing it to the data store.

  • (+) a crashed server has no effect on data temporarily held in the queue, and a new server can simply take over where the previous processing server left off.
  • (+) can use checkpointing
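A minimal sketch of the pull model with checkpointing: the consumer reads from an append-only log (a stand-in for a queue or stream) and commits its aggregate together with the next offset, so a replacement consumer resumes exactly where the last commit left off. The `store` dict stands in for a durable store where this commit would be atomic; all names are illustrative.

```python
def poll(log: list, store: dict, max_events: int) -> None:
    """Pull up to max_events from `log`, aggregate, then commit results and offset together."""
    offset = store.get("offset", 0)  # checkpoint: first event not yet processed
    batch = log[offset:offset + max_events]
    totals = dict(store.get("totals", {}))
    for key in batch:
        totals[key] = totals.get(key, 0) + 1
    # commit the aggregate and the new offset in one step (atomic in a real store)
    store.update(totals=totals, offset=offset + len(batch))


log = ["call", "call", "sms", "call"]
store = {}
poll(log, store, 2)   # first consumer processes two events, then "crashes"
poll(log, store, 10)  # a new consumer resumes from the checkpoint
```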

SQL vs NoSQL

  • SQL : structured and strict schema, relational data with joins
    • (+) faster lookup by index
    • (-) data intensive workloads
    • used for account information
  • NoSQL : semi-structured data, dynamic or flexible schema
    • (+) high throughput for IOPS (input/output operations per second)
    • best suited for rapid ingest of clickstream and log data, leaderboard or scoring data, metadata / lookup tables
    • DynamoDB – Document-oriented database from Amazon
    • MongoDB – Document-oriented database

A NoSQL database can be of type

  • Wide column
  • Document
  • Key value
  • Graph

Cassandra is a wide-column database that supports asynchronous masterless replication.

HBase is also a wide-column database, but it uses master-based replication.

MongoDB is a document-oriented database that uses leader-based replication.

SQL scaling patterns include:

  • Federation/ federated database system : transparently maps multiple autonomous database systems into a single virtual/federated database.
    • (-) slow, since it accesses multiple data stores to get the value
  • Sharding / horizontal partition
  • Denormalization : even though normalization is more memory efficient, denormalization can enhance read performance by adding redundant precomputed data to the DB or by grouping related data.
    • Normalizing data reduces data warehouse disk space by reducing data duplication and dimension cardinality. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.
  • SQL Tuning : “iterative process of improving SQL statement performance to meet specific, measurable, and achievable goals”

InfluxDB : to store time-series data

AWS Redshift

Apache Hadoop


Embedded data store : RocksDB

Message Queues(Buffering) vs Batch Processing

Distributed event management, monitoring and working on incoming realtime data instead of stored Database is the preferred way to churn realtime analysis and updates. The multiple ways to handle incoming data are

  1. Batch processing – has lags to produce results, not time critical
  2. Data stream – realtime response
  3. Message Queues – ensures timely sequence and order

  • Buffering : add events to a buffer that can be read
    • (+) can handle each event
  • Batching : add events to a batch and send it when the batch is full
    • (+) cost effective
    • (+) ensures throughput
    • (-) if some events in a batch fail, should the whole batch fail?
    • (-) not suited for real-time processing
    • e.g. S3-like object storage + Hadoop MapReduce for processing

Timeouts

  • Connection timeout : use latency percentiles to calculate this
  • Request timeout

Retries

  • exponential backoff : increase the waiting time after each failed try
  • jitter : adds randomness to retry intervals to spread out the load.
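Exponential backoff with jitter can be sketched as the "full jitter" variant: the ceiling grows as base * 2**attempt up to a cap, and the actual delay is drawn uniformly below it. Parameter values are illustrative; `rng` is injectable only to make the sketch testable.

```python
import random


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 10.0, rng=random.random):
    """Full-jitter exponential backoff: delay n is random in [0, min(cap, base * 2**n))."""
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * 2 ** n)  # exponential growth, bounded by cap
        delays.append(rng() * ceiling)     # jitter spreads retries from many clients
    return delays


print(backoff_delays(5))  # e.g. five increasing-on-average random delays
```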

Grouping events into object storage and Message Brokers

slower than stream processing but faster than batch processing.

Distributed Event management and Event Driven architecture using streams

In an event-driven architecture, a producer component performs an action which creates an event that a consumer/listener subscribes to and consumes.

  • (+) time sensitive
  • (+) asynchronous
  • (+) decoupled
  • (+) easy scaling and elasticity
  • (+) heterogeneous
  • (+) continuous
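A minimal in-process sketch of the producer/consumer decoupling described above. Real deployments put a broker (such as Kafka or Kinesis) between the two and deliver asynchronously; this synchronous `EventBus` is only illustrative.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process pub/sub: producers emit events, subscribers react."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # event type -> list of handlers

    def subscribe(self, event_type: str, handler) -> None:
        self.subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver to every subscriber; the producer does not know who they are."""
        handlers = self.subscribers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
billing = []
bus.subscribe("call.ended", billing.append)  # e.g. a CDR/billing consumer
delivered = bus.publish("call.ended", {"call_id": "callid001"})
```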

Expanding the stream pipeline

Event Streams decouple the source and sink applications. The event source and event sinks (such as webhooks) can asynchronously communicate with each other through events.

Options for stream processing architectures

  • Apache Kafka
  • Apache Spark
  • Amazon kinesis
  • Google Cloud Dataflow
  • Spring Cloud Data Flow

Here is a post from earlier which discusses – Scalable and Flexible SIP platform building, Multi geography Scaled via Universal Router, Cluster SIP telephony Server for High Availability, Failure Recovery, Multi-tier cluster architecture, Role Abstraction / Micro-Service based architecture, Load Balancer / Message Dispatcher, Back end Dynamic Routing and REST API services, Containerization and Auto Deployment, Auto scaling Cloud Servers using containerized images.

Lambda Architecture

Stream processing on top of MapReduce and a stream processing engine. In lambda architecture we can send events to the batch system and the stream processing system in parallel. The results are stitched together at query time.
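The query-time stitching can be as simple as adding the batch view to the speed-layer increments accumulated since the last batch run. A sketch with hypothetical per-account call counts:

```python
def query(batch_view: dict, speed_view: dict, key: str) -> int:
    """Lambda read path: the batch view is complete up to the last batch run;
    the speed view holds only increments seen by the stream layer since then."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)


batch_view = {"acc-1": 120}            # e.g. calls per account, recomputed by the batch job
speed_view = {"acc-1": 3, "acc-2": 1}  # real-time increments since that run
print(query(batch_view, speed_view, "acc-1"), query(batch_view, speed_view, "acc-2"))  # 123 1
```

When the next batch run completes, its view absorbs those increments and the speed view is reset for that period.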


Apache Kafka is used as source which is a framework implementation of a software bus using stream-processing. “.. high-throughput, low-latency platform for handling real-time data feeds”.

Apache Spark : Data partitioning and in memory aggregation.

Distributed cache for call control Servers

Dedicated Cache Cluster vs Co-located cache

  • Dedicated cache cluster
    • isolates the cache from the service
    • cache and service do not share memory and CPU
    • can scale independently
    • can be used by many microservices
    • flexibility in choosing hardware
  • Co-located cache
    • doesn't require separate hardware
    • low operational and hardware cost
    • scales together with the service

Choosing cache host

  • Mod function
    • (-) when a cache host is added or removed, most keys map to a different host, so it is unsuitable for prod
  • Consistent hashing (Chord)
    • maps each value to a point on a circle
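A compact consistent-hashing sketch with virtual nodes. It uses MD5 only as a stable, evenly spread hash (not for security); host names and the vnode count are illustrative. Adding a host only moves keys onto that host, unlike the mod function, where most keys move.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Map a string to a point on a 2**32 ring with a stable hash."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, hosts, vnodes: int = 100):
        self.ring = []  # sorted list of (point, host)
        for host in hosts:
            self.add(host, vnodes)

    def add(self, host: str, vnodes: int = 100) -> None:
        # virtual nodes spread each host around the circle for a more even split
        for i in range(vnodes):
            bisect.insort(self.ring, (_point(f"{host}#{i}"), host))

    def remove(self, host: str) -> None:
        self.ring = [(p, h) for p, h in self.ring if h != host]

    def host_for(self, key: str) -> str:
        """First host clockwise from the key's point (wrapping around the circle)."""
        i = bisect.bisect(self.ring, (_point(key), ""))
        return self.ring[i % len(self.ring)][1]
```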

Cache Replacement

Least Recently Used Cache Replacement
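LRU replacement can be sketched with `collections.OrderedDict`, which keeps keys in insertion order and lets us move a key to the end on every access:

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used first, most recent last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
```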

Consistency and High Availiability in Cache setup

Read replicas live in different data centres for disaster recovery.

Strong consistency using Master Slave

Circuits – fail fast, wait for circuit to recover before using again

Design patterns for a circuit-based setup to gracefully handle exceptions using a fallback.

Circuit breaker : stops the client from repeatedly trying to execute an operation once the error threshold is crossed.

Use an isolated thread pool per circuit, and ensure full recovery before calling the service again.

(+) Circuit breaker event causes the entire circuit to repair itself before attempting operations.
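A minimal single-threaded circuit breaker sketch. It simplifies the error threshold to a count of consecutive failures and allows one trial call after a cool-down (the "half-open" state); names and parameters are illustrative, and a production breaker would also need thread safety and per-dependency isolation.

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; after `reset_after` seconds,
    let one trial call through (half-open) and close again if it succeeds."""

    def __init__(self, threshold: int, reset_after: float, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # fail fast while open
            # half-open: fall through and let one trial call reach the service
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```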


EEP (formerly HEP) Extensible Encapsulation Protocol with HOMER

EEP duplicates an IP datagram, encapsulates it and sends it to a remote collector for real-time monitoring and SIP-specific alerts and notifications. HEP is supported by many SIP servers, including FreeSWITCH, OpenSIPS, Kamailio and RTPengine, as an external module.

  • intended for passive duplication for remote collection
  • can be used for audit storage and analysis
  • does not alter the original datagram or headers
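To illustrate what encapsulation means here, a sketch that wraps a copy of a captured SIP message in a HEPv3-style envelope: a "HEP3" header carrying the total length, followed by type-length-value chunks. The chunk type IDs below are taken from our reading of the HEPv3 specification and should be verified against it before use; the captured SIP message itself is copied unchanged.

```python
import socket
import struct


def _chunk(type_id: int, value: bytes, vendor: int = 0x0000) -> bytes:
    # chunk = vendor id (u16) + type id (u16) + length incl. this 6-byte header (u16) + value
    return struct.pack("!HHH", vendor, type_id, 6 + len(value)) + value


def encode_hep3(src_ip, src_port, dst_ip, dst_port, ts_sec, ts_usec, sip: bytes, agent_id=1):
    """Wrap a copy of a captured SIP message in a HEPv3 envelope (the original is untouched)."""
    chunks = b"".join([
        _chunk(0x0001, struct.pack("!B", 2)),         # IP family: IPv4
        _chunk(0x0002, struct.pack("!B", 17)),        # IP protocol: UDP
        _chunk(0x0003, socket.inet_aton(src_ip)),     # IPv4 source address
        _chunk(0x0004, socket.inet_aton(dst_ip)),     # IPv4 destination address
        _chunk(0x0007, struct.pack("!H", src_port)),  # source port
        _chunk(0x0008, struct.pack("!H", dst_port)),  # destination port
        _chunk(0x0009, struct.pack("!I", ts_sec)),    # capture timestamp, seconds
        _chunk(0x000A, struct.pack("!I", ts_usec)),   # capture timestamp, microseconds
        _chunk(0x000B, struct.pack("!B", 1)),         # captured protocol: SIP
        _chunk(0x000C, struct.pack("!I", agent_id)),  # capture agent id
        _chunk(0x000F, sip),                          # the duplicated payload itself
    ])
    # header = "HEP3" magic + total packet length (u16), header included
    return b"HEP3" + struct.pack("!H", 6 + len(chunks)) + chunks


sip_msg = b"OPTIONS sip:bob@example.com SIP/2.0\r\n\r\n"
pkt = encode_hep3("10.0.0.1", 5060, "10.0.0.2", 5060, 1599338188, 0, sip_msg)
```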

HOMER is a packet and event capture system popular for VoIP/RTC monitoring, based on HEP/EEP (Extensible Encapsulation Protocol).

SIP Server Integration

Integrating HOMER and the HOMER Encapsulation Protocol (HEP) with a SIP server brings SIP/SDP payload retention with precise timestamping, better monitoring and anomaly detection in call traffic, and event correlation across sessions, logs and reports, as well as charts and statistics for SIP and RTP/RTCP packets. We read about the sipcapture and siptrace modules in project sipcapture_siptrace_hep.

Kamailio and OpenSIPS HEP integrations are structurally similar. In Kamailio, the SIPCAPTURE [2] module enables support for –

● Monitoring/mirroring port
● HEP encapsulation protocol mode (HEP v1, v2, v3)

Figure Opensips Capturing ( credits http://www.opensips.org)

Figure showing OpenSIPS integration with an external capturing agent via a proxy agent (which can be HOMER)

To achieve that, load and configure the SipCapture module in the routing script.

Snippets for the Kamailio HOMER Docker installation as a collector

git clone https://github.com/sipcapture/homer-docker.git
cd homer-docker
docker-compose build
docker-compose up

Output snippets from the screen while the installation takes place

Creating network "homer-docker_default" with the default driver
Creating volume "homer-docker_homer-data-semaphore" with default driver
Creating volume "homer-docker_homer-data-mysql" with default driver
Creating volume "homer-docker_homer-data-dashboard" with default driver
Pulling mysql (mysql:5.6)...
5.6: Pulling from library/mysql
Creating mysql ... done
Creating homer-webapp   ... done
Creating homer-cron      ... done
Creating homer-kamailio  ... done
Creating bootstrap-mysql ... done
Attaching to mysql, homer-webapp, bootstrap-mysql, homer-cron, homer-kamailio
homer-webapp | Homer web app, waiting for MySQL
homer-cron   | Homer cron container, waiting for MySQL
homer-kamailio | Kamailio, waiting for MySQL
bootstrap-mysql | Mysql is now running.
bootstrap-mysql | Beginning initial data load....
bootstrap-mysql | Creating Databases...
bootstrap-mysql | Creating Tables...
homer-kamailio | Kamailio container detected MySQL is running & bootstrapped
homer-kamailio |  0(22) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(22) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | config file ok, exiting...
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(23) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: sipcapture [sipcapture.c:480]: parse_table_names(): INFO: table name:sip_capture
homer-webapp | Homer web app container detected MySQL is running & bootstrapped
homer-webapp | Module php5 already enabled

Capture tools

Dialog module

Storing dialogs in a MySQL database requires MySQL to be initialised first.

#!define WITH_MYSQL

#!ifdef WITH_MYSQL
loadmodule "db_mysql.so"
# - database URL - used to connect to database server by modules such
#       as: auth_db, acc, usrloc, a.s.o.
#!ifndef DBURL
#!define DBURL "mysql://root:kamailio@localhost/kamailio"
#!endif
#!endif

loadmodule "dialog.so"

# ----- dialog params ------
modparam("dialog", "dlg_flag", 10)               # flag that marks requests creating a dialog
modparam("dialog", "track_cseq_updates", 0)
modparam("dialog", "dlg_match_mode", 2)
modparam("dialog", "timeout_avp", "$avp(i:10)")  # AVP holding a per-dialog timeout
modparam("dialog", "enable_stats", 1)
modparam("dialog", "db_url", DBURL)              # where dialogs are persisted
modparam("dialog", "db_mode", 1)                 # 1 = REALTIME, see the options below
modparam("dialog", "db_update_period", 120)      # only used with db_mode 2 (DELAYED)
modparam("dialog", "table_name", "dialog")

Setting db_mode controls how dialog information is synchronised from memory to the underlying database. The options are:
0 – NO_DB – the memory content is not flushed into the DB;
1 – REALTIME – any change to dialog information is reflected in the database immediately;
2 – DELAYED – dialog information changes are flushed into the DB periodically, based on a timer routine;
3 – SHUTDOWN – dialog information is flushed into the DB only at shutdown, with no runtime updates.

Note:

  • use the same hash_size when restoring dialogs into a different Kamailio instance.

Database table for dialogs

  1. Install MySQL.
  2. Define the root user (with database create permissions) and a user with database read/write permissions in kamctlrc:
vi /usr/local/etc/kamailio/kamctlrc
  • Dialog table schema
| name             | type         | size | default | null | key     | extra attributes | description |
| id               | unsigned int | 10   |         | no   | primary | autoincrement    | Unique ID |
| hash_entry       | unsigned int | 10   |         | no   |         |                  | Number of the hash entry in the dialog hash table |
| hash_id          | unsigned int | 10   |         | no   |         |                  | The ID on the hash entry |
| callid           | string       | 255  |         | no   |         |                  | Call-ID of the dialog |
| from_uri         | string       | 128  |         | no   |         |                  | URI of the From header (as per INVITE) |
| from_tag         | string       | 64   |         | no   |         |                  | From tag; together with the Call-ID and the To tag it identifies the dialog |
| to_uri           | string       | 128  |         | no   |         |                  | URI of the To header (as per INVITE) |
| to_tag           | string       | 64   |         | no   |         |                  | To tag; together with the Call-ID and the From tag it identifies the dialog |
| caller_cseq      | string       | 20   |         | no   |         |                  | Last CSeq number on the caller side |
| callee_cseq      | string       | 20   |         | no   |         |                  | Last CSeq number on the callee side |
| caller_route_set | string       | 512  |         | yes  |         |                  | Route set on the caller side |
| callee_route_set | string       | 512  |         | yes  |         |                  | Route set on the callee side |
| caller_contact   | string       | 128  |         | no   |         |                  | Caller's contact URI |
| callee_contact   | string       | 128  |         | no   |         |                  | Callee's contact URI |
| caller_sock      | string       | 64   |         | no   |         |                  | Local socket used to communicate with the caller |
| callee_sock      | string       | 64   |         | no   |         |                  | Local socket used to communicate with the callee |
| state            | unsigned int | 10   |         | no   |         |                  | The state of the dialog |
| start_time       | unsigned int | 10   |         | no   |         |                  | Timestamp (Unix time) when the dialog was confirmed |
| timeout          | unsigned int | 10   | 0       | no   |         |                  | Timestamp (Unix time) when the dialog will expire |
| sflags           | unsigned int | 10   | 0       | no   |         |                  | Flags set for the dialog, accessible from the config file |
| iflags           | unsigned int | 10   | 0       | no   |         |                  | Internal flags of the dialog |
| toroute_name     | string       | 32   |         | yes  |         |                  | Name of the route to be executed at dialog timeout |
| req_uri          | string       | 128  |         | no   |         |                  | URI of the initial request in the dialog |
| xdata            | string       | 512  |         | yes  |         |                  | Extra data associated with the dialog (e.g., serialized profiles) |
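To illustrate how the persisted dialog table can be read back, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for the MySQL dialog table (only a few of the columns above) and assumes the state codes from the Kamailio dialog module source (1 unconfirmed, 2 early, 3 confirmed without ACK, 4 confirmed, 5 deleted). The sample rows and the active_dialogs helper are hypothetical.

```python
import sqlite3

# Dialog state codes as defined in Kamailio's dialog module (dlg_hash.h).
# Treat this mapping as an assumption and check it against your Kamailio version.
DLG_STATES = {
    1: "unconfirmed",
    2: "early",
    3: "confirmed_na",  # final reply seen, ACK not yet received
    4: "confirmed",
    5: "deleted",
}


def active_dialogs(conn: sqlite3.Connection, now: int) -> list[tuple[str, str, int]]:
    """Return (callid, state name, seconds until timeout) for confirmed, unexpired dialogs."""
    rows = conn.execute(
        "SELECT callid, state, timeout FROM dialog "
        "WHERE state = 4 AND timeout > ? ORDER BY callid",
        (now,),
    )
    return [(callid, DLG_STATES[state], timeout - now) for callid, state, timeout in rows]


if __name__ == "__main__":
    # In-memory SQLite stand-in for a few columns of the MySQL dialog table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dialog (callid TEXT, state INTEGER, start_time INTEGER, timeout INTEGER)"
    )
    conn.executemany(
        "INSERT INTO dialog VALUES (?, ?, ?, ?)",
        [
            ("call-a", 4, 1000, 1500),  # confirmed, expires later
            ("call-b", 2, 1100, 0),     # early dialog, not yet answered
            ("call-c", 4, 900, 1100),   # confirmed but already past its timeout
        ],
    )
    print(active_dialogs(conn, now=1200))  # [('call-a', 'confirmed', 300)]
```

Against the real database the same query would go through a MySQL driver, whose placeholder style may differ from SQLite's `?`.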

Siptrace module

The siptrace module can store incoming and outgoing SIP messages in a database and/or duplicate them to a capture server (using HEP, the Homer Encapsulation Protocol, or plain SIP mode).

loadmodule "siptrace.so"
modparam("siptrace", "duplicate_uri", "sip:")
modparam("siptrace", "hep_mode_on", 1)
modparam("siptrace", "trace_to_database", 0)
modparam("siptrace", "trace_flag", 22)
modparam("siptrace", "trace_on", 1)

Integrate it with the request route to start duplicating the SIP messages.


  • trace_mode: 1 – use the core events triggered when receiving or sending SIP traffic to mirror that traffic to a SIP capture server using HEP; 0 – no automatic mirroring of SIP traffic via HEP.


duplicate_uri: the address, in the form of a SIP URI, where a duplicate of each traced message is sent. It always uses UDP.

modparam("siptrace", "duplicate_uri", "sip:")

To check that the duplicated messages are arriving:

ngrep -W byline -d any port 9060 -q
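Beyond eyeballing the packets with ngrep, the HEP datagrams can be decoded programmatically. The sketch below is a minimal HEPv3 decoder, assuming siptrace is configured to emit HEP version 3 (siptrace's hep_version parameter; check the default for your Kamailio release) and using the generic chunk IDs from the HEP specification [2]. Only a subset of chunks is decoded; the listen helper and the synthetic packet are illustrative, not part of HOMER.

```python
import socket
import struct

# Generic HEPv3 chunk types (vendor ID 0x0000); only a subset is decoded here.
CHUNK_NAMES = {
    0x0001: "ip_family",   # 2 = IPv4, 10 = IPv6
    0x0002: "ip_proto",    # 17 = UDP, 6 = TCP
    0x0003: "src_ip",
    0x0004: "dst_ip",
    0x0007: "src_port",
    0x0008: "dst_port",
    0x0009: "ts_sec",
    0x000A: "ts_usec",
    0x000B: "proto_type",  # 1 = SIP
    0x000C: "capture_id",
    0x000F: "payload",
}


def chunk(ctype: int, body: bytes) -> bytes:
    """Encode one generic chunk; its length field includes the 6-byte chunk header."""
    return struct.pack("!HHH", 0, ctype, 6 + len(body)) + body


def parse_hep3(packet: bytes) -> dict:
    """Decode the known chunks of a HEPv3 datagram into a dict."""
    if packet[:4] != b"HEP3":
        raise ValueError("not a HEPv3 packet")
    (total_len,) = struct.unpack("!H", packet[4:6])  # includes the 6-byte HEP header
    offset, fields = 6, {}
    while offset + 6 <= total_len:
        vendor, ctype, clen = struct.unpack("!HHH", packet[offset:offset + 6])
        if clen < 6:
            raise ValueError("corrupt chunk length")
        body = packet[offset + 6:offset + clen]
        name = CHUNK_NAMES.get(ctype) if vendor == 0 else None
        if name in ("src_ip", "dst_ip"):
            fields[name] = socket.inet_ntoa(body)
        elif name == "payload":
            fields[name] = body.decode("utf-8", errors="replace")
        elif name is not None:
            fields[name] = int.from_bytes(body, "big")
        offset += clen
    return fields


def listen(port: int = 9060) -> None:
    """Print every HEP packet received on the UDP port used in the ngrep check."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, _addr = sock.recvfrom(65535)
        print(parse_hep3(data))


if __name__ == "__main__":
    # Build a synthetic packet instead of listening, so the sketch runs offline.
    body = (
        chunk(0x0001, b"\x02")
        + chunk(0x0003, socket.inet_aton("10.0.0.1"))
        + chunk(0x0004, socket.inet_aton("10.0.0.2"))
        + chunk(0x0008, struct.pack("!H", 5060))
        + chunk(0x000B, b"\x01")
        + chunk(0x000F, b"OPTIONS sip:probe@example.com SIP/2.0\r\n\r\n")
    )
    packet = b"HEP3" + struct.pack("!H", 6 + len(body)) + body
    print(parse_hep3(packet))
```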

RPC commands

Turn SIP tracing on or off:

kamcmd> siptrace.status on   

and check its status:

kamcmd> siptrace.status check

Store sip_trace in database

modparam("siptrace", "trace_to_database", 1)
modparam("siptrace", "db_url", DBURL)
modparam("siptrace", "table", "sip_trace")

where the sip_trace table is described as:

| Field       | Type             | Null | Key | Default             | Extra          |
| id          | int(10) unsigned | NO   | PRI | NULL                | auto_increment |
| time_stamp  | datetime         | NO   | MUL | 2000-01-01 00:00:01 |                |
| time_us     | int(10) unsigned | NO   |     | 0                   |                |
| callid      | varchar(255)     | NO   | MUL |                     |                |
| traced_user | varchar(128)     | NO   | MUL |                     |                |
| msg         | mediumtext       | NO   |     | NULL                |                |
| method      | varchar(50)      | NO   |     |                     |                |
| status      | varchar(128)     | NO   |     |                     |                |
| fromip      | varchar(50)      | NO   | MUL |                     |                |
| toip        | varchar(50)      | NO   |     |                     |                |
| fromtag     | varchar(64)      | NO   |     |                     |                |
| totag       | varchar(64)      | NO   |     |                     |                |
| direction   | varchar(4)       | NO   |     |                     |                |

Sample database storage for SIP traces:

select * from sip_trace;

| id | time_stamp          | time_us | callid  | traced_user | msg         | method | status | fromip                   | toip                     | fromtag  | totag    | direction |
|  1 | 2019-07-18 09:00:18 |  417484 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | INVITE sip:altanai@sip_addr;transport=udp SIP/2.0
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport
Max-Forwards: 70
Contact: <sip:derek@call_addr:7086;transport=udp>
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Content-Type: application/sdp
Supported: replaces
User-Agent: Bria 3 release 3.5.5 stamp 71243
Content-Length: 214

o=- 1563440415743829 1 IN IP4 local_addr
s=Bria 3 release 3.5.5 stamp 71243
c=IN IP4 local_addr
t=0 0
m=audio 59814 RTP/AVP 9 8 0 101
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv                                                                                                                                                                                      | INVITE |        | udp:caller_addr:27982 | udp:sip_pvt_addr:5060   | de523549 |          | in        |

|  2 | 2019-07-18 09:00:18 |  421675 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | SIP/2.0 100 trying -- your call is important to us
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport=27982;received=caller_addr
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Server: kamailio (5.2.3 (x86_64/linux))
Content-Length: 0                                                                                                                                                                                                                                                                                                                                                                                                                                                           | ACK    |        | udp:caller_addr:27982 | udp:local_addr:5060   | de523549 | b2d8ad3f | in       |
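The msg column holds the raw SIP text, so simple analysis (counting methods, reading From/To tags) only needs a small parser. A minimal sketch, assuming CRLF or LF line endings and ignoring compact header forms and folded header lines; parse_sip is a hypothetical helper, not part of siptrace.

```python
def parse_sip(msg: str) -> dict:
    """Split a raw SIP message into start-line fields, headers and body.

    Header names are lower-cased and repeated headers (e.g. Via) keep their order.
    Compact header forms and folded header lines are not handled.
    """
    head, _sep, body = msg.replace("\r\n", "\n").partition("\n\n")
    lines = head.split("\n")
    parts = lines[0].strip().split(" ", 2)
    parsed: dict = {"headers": {}, "body": body}
    if parts[0] == "SIP/2.0":  # response, e.g. "SIP/2.0 100 trying"
        parsed["status"] = int(parts[1])
        parsed["reason"] = parts[2] if len(parts) > 2 else ""
    else:  # request, e.g. "INVITE sip:user@host SIP/2.0"
        parsed["method"] = parts[0]
        parsed["request_uri"] = parts[1]
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            parsed["headers"].setdefault(name.strip().lower(), []).append(value.strip())
    return parsed


if __name__ == "__main__":
    reply = (
        "SIP/2.0 100 trying -- your call is important to us\r\n"
        "To: <sip:altanai@sip_addr>\r\n"
        "From: <sip:derek@sip_addr>;tag=de523549\r\n"
        "Content-Length: 0\r\n\r\n"
    )
    parsed = parse_sip(reply)
    print(parsed["status"], parsed["headers"]["from"])
```

Only the first colon on each header line is treated as the separator, so URIs with ports (as in the Via header above) survive intact.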


heplify, a multi-protocol HEP capture agent written in Go: https://github.com/sipcapture/heplify

wget https://dl.google.com/go/go1.11.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.11.2.linux-amd64.tar.gz

The archive is extracted into /usr/local/go.

Either add the Go bin directory to ~/.profile

export PATH=$PATH:/usr/local/go/bin

and apply it

source ~/.profile

or set GOROOT and GOPATH

export GOROOT=/usr/local/go
export GOPATH=$HOME/heplify
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

Install the dependencies

go get

Clone the heplify repo and build it with make.



A new open-source capture agent framework, with capture suitable for SIP, XMPP and more. With internal method filtering, encryption and authentication it looks very promising; however, since I have not personally tried it yet, I will leave this space TBD for the future.



Other options include Sipgrep, HEPipe and nProbe.


A multi-protocol HEP server and switch in Node.js: a stand-alone HEP capture server designed for HOMER 7, capable of emitting indexed datasets and tagged time series to multiple backends.


node hepop.js -c /app/myconfig.js

PCAP monitoring -> Homer Server -> Notification and Fraud Prevention

A real-time monitoring and alerting setup built on Homer can best safeguard against VoIP-specific attacks and suspicious activity by giving early warning. Attacks such as DDoS, SIP SQL injection, parser attacks, remote manipulation and hijacking, as well as resource enumeration, are common for a cloud telephony provider.

Additionally, Homer measures session quality using variables that include [1]:

SD = Session Defects

ISA = Ineffective Session Attempts

AHR = Average HOP Requests

ASR = Answer Seizure Ratio
[(‘200’ / (INVITES – AUTH – SUM(3XX))) * 100]

NER = Network Efficiency Ratio
[((‘200’ + ‘486’ + ‘487’ + ‘603’) / (INVITES – AUTH – SUM(3XX))) * 100]
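Both ratios share the same denominator (effective attempts: INVITEs minus auth challenges minus 3xx redirects); NER simply counts busy (486), cancelled (487) and declined (603) calls as successfully handled by the network. A minimal sketch, assuming per-interval counters are already available; the counter values are hypothetical.

```python
def _seizures(invites: int, auth: int, replies: dict[int, int]) -> int:
    """Common denominator: INVITEs minus auth challenges minus 3xx redirects."""
    redirects = sum(count for code, count in replies.items() if 300 <= code < 400)
    seizures = invites - auth - redirects
    if seizures <= 0:
        raise ValueError("no effective call attempts in this interval")
    return seizures


def asr(invites: int, auth: int, replies: dict[int, int]) -> float:
    """Answer Seizure Ratio (%): answered calls over effective attempts."""
    return 100.0 * replies.get(200, 0) / _seizures(invites, auth, replies)


def ner(invites: int, auth: int, replies: dict[int, int]) -> float:
    """Network Efficiency Ratio (%): answered, busy, cancelled or declined calls over effective attempts."""
    completed = sum(replies.get(code, 0) for code in (200, 486, 487, 603))
    return 100.0 * completed / _seizures(invites, auth, replies)


if __name__ == "__main__":
    # Hypothetical counters for one interval: reply code -> count.
    replies = {200: 60, 302: 10, 486: 10, 487: 5, 603: 5, 503: 10}
    print(asr(120, 10, replies))  # 60.0
    print(ner(120, 10, replies))  # 80.0
```

Here the 503 replies lower NER, because a server failure is a genuine network inefficiency, whereas a busy callee is not.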

HOMER Web Interface or Custom Dashboard

Additional visualizations for inter-team communication, for example with the NOC team, can include:

Homer integration with InfluxDB

Install InfluxDB, a real-time time-series database:

wget https://dl.influxdata.com/influxdb/releases/influxdb_1.7.7_amd64.deb
sudo dpkg -i influxdb_1.7.7_amd64.deb


 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

2019-07-19T07:03:04.603494Z	info	InfluxDB starting	{"log_id": "0GjGVvbW000", "version": "1.7.7", "branch": "1.7", "commit": "f8fdf652f348fc9980997fe1c972e2b79ddd13b0"}
2019-07-19T07:03:04.603756Z	info	Go runtime	{"log_id": "0GjGVvbW000", "version": "go1.11", "maxprocs": 1}
2019-07-19T07:03:04.707567Z	info	Using data dir	{"log_id": "0GjGVvbW000", "service": "store", "path": "/var/lib/influxdb/data"}
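Once InfluxDB is running, call-quality metrics such as the ASR and NER ratios can be written to it as time series. A minimal sketch, assuming InfluxDB 1.x and its plain-text line protocol posted to the /write HTTP endpoint; the measurement name sip_kpi, the node tag and the homer_stats database are hypothetical, and HOMER's own InfluxDB integration may use a different schema.

```python
import urllib.request


def _escape(value: str, chars: str) -> str:
    """Backslash-escape characters that line protocol treats as delimiters."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _field(value) -> str:
    """Format a field value: booleans, integers (with an 'i' suffix), floats or strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one point as: measurement,tag=value field=value timestamp."""
    tag_part = "".join(
        f",{_escape(str(k), ', =')}={_escape(str(v), ', =')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape(str(k), ', =')}={_field(v)}" for k, v in sorted(fields.items()))
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {ts_ns}"


def write_points(lines: list[str], url: str = "http://localhost:8086/write?db=homer_stats") -> int:
    """POST points to an InfluxDB 1.x /write endpoint; returns the HTTP status (204 on success)."""
    req = urllib.request.Request(url, data="\n".join(lines).encode(), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    line = to_line_protocol(
        "sip_kpi", {"node": "kamailio01"}, {"asr": 60.0, "ner": 80.0, "invites": 120}, 1563519784000000000
    )
    print(line)
    # write_points([line])  # uncomment once InfluxDB is listening on localhost:8086
```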

For Kamailio integration, follow the instructions at https://github.com/altanai/kamailioexamples

References :

[1] Alexandr Dubovikov, Homer SIP Capture, Kamailio World 2013 – https://www.kamailio.org/events/2013-KamailioWorld/13-Alexandr.Dubovikov-Homer-SIP-Capture.pdf

[2] HEP/EEP – https://github.com/sipcapture/hep

[3] Kamailio sipdump module – https://www.kamailio.org/docs/modules/devel/modules/sipdump.html

[4] HEPop – https://github.com/sipcapture/HEPop

[5] HOMER Big Data – https://github.com/sipcapture/homer/wiki/Homer-Bigdata

Energy Efficient VoIP systems

Data centres are the concentrated processing units of the Internet, which drives the technological innovation of our generation and has become the backbone of the global economy. Data centres not only process, store and carry textual data; a vast amount of their computing goes into multimedia content, ranging from social media to video streaming and VoIP calls. In this article let us analyse the energy efficiency, carbon footprint and scope for improvement of a VoIP data centre that hosts SIP and related RTC signalling and media servers and processes CDRs and/or media files for playback or recording.

Just like in a regular IT data centre, storage, computing power and network capacity define how a server is used. An uninterrupted electricity supply is also of paramount importance, as any blackout could drop ongoing calls and cause loss of revenue for the service provider, not to mention the loss caused to the parties engaged in the call.

Increasing Power consumption by telecom Sector over the years

Global PPA (power purchase agreement) volumes by sector, 2009-2019, IEA, Paris https://www.iea.org/data-and-statistics/charts/global-ppa-volumes-by-sector-2009-2019
Source: CDP, The 3% Solution, 2013 [19]

Typical VoIP setup: whether on a cloud infrastructure provider or in a hosted data centre, approximately 7 servers are required even for an SME (small to medium enterprise) communication and VoIP system:

  • 2 signalling servers, primary and standby for HA,
  • 2 media servers for MCU or media bridges, IVR playback etc.,
  • 1 for CDRs, logs or call analytics, stats and other supplementary operations,
  • 1 for the dev or engineering team,
  • 1 edge server, which could be an API server, a gateway or a load balancer.
Sample VoIP system

VoIP solutions are more energy expensive, unless aggressive power saving schemes are in place

Comparison of energy efficiency in PSTN and VoIP systems [14]

While PSTN and other hybrid systems carried audio-only communication, and their embedded systems took great pains to be energy efficient, this is not really the case with all-digital, software-based VoIP.

Power Consumption

Mobile phone: a typical smartphone with a 4,000 mAh (4 Ah) battery that goes through one full charge cycle a day. Daily consumption = 4 Ah * 3.7 V = 14.8 Wh.

Laptop: with a 14–15″ screen, a laptop can draw 60 W in active use, depending on the model. Running 8 hours a day, that is 60 * 8 = 480 Wh (0.480 kWh) of energy consumed in a day.

Desktop PC: running on 50-60 Hz mains power, a desktop can draw up to 200 W in active use. Over 8 hours that is 200 * 8 = 1600 Wh (1.6 kWh) of energy a day.

Server: even though servers are invisible to whoever makes a request, they serve that request at the other end of the Internet.

| Server      | Purpose | Server CPU consumption | Clients | Client CPU consumption |
| Application | Hosts an application, which can be run through a web browser or customized client software. | medium | Any network device with access. | low |
| Computing   | Makes available CPU and memory to the client. This type of server might be a supercomputer or mainframe. | high | Any networked computer that requires more CPU power and RAM to complete an activity. | medium |
| Database    | Maintains and provides access to any database. | low | Any form of software that requires access to structured data. | low |
| File        | Makes available shared files and folders across a network. | medium | Any client that needs access to shared resources. | low |
| Game        | Provisions a multiplayer game environment. | high | Personal computers, tablets, smartphones, or game consoles. | high |
| Mail        | Hosts your email and makes it available across the network. | medium | Users of email applications. | low |
| Media       | Enables media streaming of digital video or audio over a network. | high | Web and mobile applications. | high |
| Print       | Shares printers over a network. | low | Any device that needs to print. | low |
| Web         | Hosts webpages either on the internet or on private internal networks. | medium | Any device with a browser. | medium |
CPU consumption of various server types and their clients

A typical server draws around 850 W, i.e. it uses 850 Wh (0.850 kWh) of energy in an hour, and since servers are usually up 24*7 that totals to

0.850 * 24 = 20.4 kWh a day [2].

VoIP system (7 VMs): for a setup of 7 VMs (possibly on the same physical machine), the total energy consumed in a day is

20.4 * 7 = 142.8 kWh.

Data centre: the data centre building contains the infrastructure that supports its servers, disks and networking equipment. For simplicity, however, I will only count the consumption of the servers and ignore the cooling units, networking, backup battery charging, generators, lighting, fire suppression, maintenance etc.

A high-tier DC can have 100 MW of capacity, with each 52U rack using 25 kW of power. 100,000 kW / 25 kW = 4,000 racks, and 4,000 racks * 52U = 208,000 1U servers. This number scales down depending on how much energy each server uses and how many servers are idle.

Total energy: 100,000 kW * 24 hours = 2,400,000 kWh per day.
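The per-day figures above can be reproduced with a short Python sketch. Every input (4 Ah at 3.7 V, a 60 W laptop, a 200 W desktop, an 850 W server, 7 servers, a 100 MW facility with 25 kW 52U racks) is taken from the text; the helper names are illustrative.

```python
HOURS_PER_DAY = 24


def energy_wh(power_w: float, hours: float) -> float:
    """Energy (Wh) = power (W) x time (h)."""
    return power_w * hours


def daily_kwh() -> dict[str, float]:
    """Per-day energy figures from the text, converted to kWh."""
    server_wh = energy_wh(850, HOURS_PER_DAY)  # 850 W, up 24x7
    return {
        "phone": 4.0 * 3.7 / 1000,            # 4 Ah x 3.7 V, one full cycle a day
        "laptop": energy_wh(60, 8) / 1000,    # 60 W for 8 hours
        "desktop": energy_wh(200, 8) / 1000,  # 200 W for 8 hours
        "server": server_wh / 1000,
        "voip_7_vms": 7 * server_wh / 1000,   # the seven servers of the typical setup
    }


def data_centre_capacity(total_kw: int = 100_000, rack_kw: int = 25, rack_units: int = 52) -> tuple[int, int, int]:
    """Racks, 1U server slots and daily energy (kWh) of a facility running at full capacity."""
    racks = total_kw // rack_kw
    return racks, racks * rack_units, total_kw * HOURS_PER_DAY


if __name__ == "__main__":
    for device, kwh in daily_kwh().items():
        print(f"{device:11s} {kwh:10.4f} kWh/day")
    racks, servers, kwh = data_centre_capacity()
    print(f"{racks} racks, {servers} 1U servers, {kwh:,} kWh/day")
```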

Carbon Footprint

Carbon footprint in the context of this article refers to the amount of greenhouse gas (consisting mainly of CO2) emitted as a result of electricity consumption. It is obtained by multiplying the electricity consumed (kWh) by an emission factor expressed in kg CO2 per kWh.

In doing this calculation I have assumed 0.233 kg CO2 per kWh, which could be lower or higher depending on the generation profile of the electricity provider as well as the heat produced by the machine.

Laptop: aside from production, which could account for 61.4 kg (135.5 lbs) of CO2, a 60 W laptop running 8 hours (0.480 kWh) produces 0.480 * 0.233 ≈ 0.112 kg CO2 eq per day.

Desktop PC: aside from production cost and heating, the GHG (CO2 eq) emission from running a desktop for a day (8 hours) is 1.6 * 0.233 = 0.3728 kg CO2 per day.

Server: 20.4 * 0.233 = 4.7532 kg CO2 per day.

VoIP system (7 VMs): again ignoring the GHG emissions of associated components, 142.8 * 0.233 = 33.2724 kg CO2 per day. Note that DCs (data centres) use PUE (Power Usage Effectiveness) to showcase their energy efficiency, and energy efficiency certifications use the same metric in their ratings.

Data centre: the electrical carbon footprint (an approximate calculation, not counting cooling, infrastructure maintenance, lighting and possibly idle servers in the data centre) is 2,400,000 * 0.233 = 559,200 kg CO2 per day.

Note that a single common factor should not be extrapolated like this to derive carbon emissions. Emissions depend on the fuel mix of electricity generation as well as on the life cycle assessment (LCA) of carbon-equivalent emissions. Countries with heavy reliance on renewables have a lower CO2 footprint per kWh (about 0.013 kg CO2 per kWh in Sweden), while others have a higher one (0.819 kg CO2 per kWh in Estonia) [1].
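The carbon figures follow one rule: kg CO2 per day = kWh per day × emission factor (kg CO2 per kWh). The sketch below reproduces them with the 0.233 factor assumed in the text and also shows how strongly the grid mix matters, using the Swedish and Estonian factors quoted above; the names are illustrative.

```python
DEFAULT_FACTOR = 0.233  # kg CO2e per kWh, the assumption used in the text

# Daily energy in kWh, as derived in the text.
DAILY_KWH = {
    "laptop": 0.480,
    "desktop": 1.6,
    "server": 20.4,
    "voip_7_vms": 142.8,
    "data_centre": 2_400_000,
}

# Grid emission factors quoted in the text [1].
GRID_FACTORS = {"default": DEFAULT_FACTOR, "sweden": 0.013, "estonia": 0.819}


def daily_co2_kg(kwh: float, factor: float = DEFAULT_FACTOR) -> float:
    """Emissions (kg CO2e per day) = energy (kWh per day) x factor (kg CO2e per kWh)."""
    return kwh * factor


if __name__ == "__main__":
    for item, kwh in DAILY_KWH.items():
        print(f"{item:12s} {daily_co2_kg(kwh):14.4f} kg CO2e/day")
    for grid, factor in GRID_FACTORS.items():
        kg = daily_co2_kg(DAILY_KWH["voip_7_vms"], factor)
        print(f"VoIP system on the {grid} grid: {kg:.2f} kg CO2e/day")
```

The same 7-server VoIP system ranges from under 2 kg to over 116 kg CO2e per day depending only on where its electricity comes from.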

Flatten the Curve from Tech and Internet usage

Rack servers tend to be the main perpetrators of wasted energy and represent the largest portion of the IT energy load in a typical data center.

A decade ago, small enterprise IT facilities were quick to create data centres to host applications for hospitals, banks and insurance companies. While some of these have likely been upgraded to shared server instances running on IaaS providers, most are still serving traffic or remain in place for lack of effort to upgrade.

With advances in P2P technologies such as dApps, the Bitcoin network and P2P WebRTC streaming, and with more ML computed at the edge, disruptions to the existing trend will continue, most likely resulting in a manyfold increase in consumption.

According to the Cambridge Centre for Alternative Finance (CCAF), Bitcoin currently consumes around 110 terawatt-hours per year, or 0.55% of global electricity production.

Harvard Business Review [12]

“the emissions generated by watching 30 minutes of Netflix (1.6 kg of CO2) is the same as driving almost four miles.” 

EnergyInnovation [13]

Cloud Computing and Energy efficiency

Cloud computing (SaaS, PaaS, IaaS and also CPaaS) minimizes power consumption, and consequently IT costs, via virtualization, clustering and dynamic configuration.

Cloud infrastructure vendors such as Amazon, Google and Microsoft, through their adoption of energy-efficient computing and credible transparency, have alleviated some of the stress that would have arisen if on-site, self-hosted data centres were still as mainstream as they were a decade ago.

Even as cloud providers give on-demand access to shared resources in large-scale distributed computing, the ease of getting on board has in turn created a surge in cloud-hosted online applications, and consequently higher power consumption, higher operating costs and higher CO2 emissions.

Components of energy Consumption in Data Centre

As shown, CPU, memory and storage incur 45% of the costs and consume 26% of the total energy; power distribution and cooling, however, cost 25% but consume more than 50% of the total energy.

Energy forecast for data centres

As reported by Nature [3], widely cited forecasts suggest that the total electricity demand of ICT (Information and Communication Technology) will accelerate, and while consumer devices such as smart TVs, laptops and mobiles are becoming more energy efficient, data centres and network devices will demand bigger portions. As reported in 2018, data centres were consuming 200 TWh (terawatt-hours) of energy per year. Although there are no figures for telecom, or specifically for IP cloud telephony, the enormous multimedia data flowing in every session suggests the figure must be huge.

Energy efficiency in data centres has also been the subject of many papers and studies. Technological advances and efficiency measures have so far kept the growth in the tech sector's energy requirements linear or flat.

Past and projected growth rate of total US data center energy use from 2000 until 2020. It also illustrates how much faster data center energy use would grow if the industry, hypothetically, did not make any further efficiency improvements after 2010. (Source: US Department of Energy, Lawrence Berkeley National Laboratory)

Some noteworthy innovations made in data centres for energy efficiency include:

  1. Energy Star efficiency requirements
  • Average server utilization
  • Server power scaling at low utilization
  • Average power draw of hard disk drives
  • Average power draw of network ports
  • Average infrastructure efficiency (i.e., PUE)

PUE = Total Facility power / IT equipment power

Standard: 2.0, Good: 1.4, Better: 1.1

A lower PUE indicates greater efficiency, since more of the power is then used by the IT gear. Ideally, 1 would be the perfect score, where all power is used only by the IT gear.
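A short worked example makes the ratio concrete: a PUE of 1.4 means that for every 1 kW reaching IT equipment, another 0.4 kW goes to cooling, power distribution and other overhead, i.e. about 29% of the facility's power. The sketch below computes PUE and buckets it using the three reference points quoted above; the facility figures and the label beyond those three points are hypothetical.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_share(pue_value: float) -> float:
    """Fraction of facility power that never reaches the IT gear (cooling, distribution, ...)."""
    return 1.0 - 1.0 / pue_value


def rating(pue_value: float) -> str:
    """Bucket a PUE using the reference points quoted in the text (2.0, 1.4, 1.1)."""
    if pue_value <= 1.1:
        return "better"
    if pue_value <= 1.4:
        return "good"
    if pue_value <= 2.0:
        return "standard"
    return "worse than standard"  # hypothetical label, not from the source


if __name__ == "__main__":
    value = pue(1_400, 1_000)  # hypothetical: 1.4 MW facility draw, 1 MW of IT load
    print(value, rating(value), f"{overhead_share(value):.0%} overhead")
```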

2. Optimizing the cooling system, which receives a lot of attention, is not covered here but is explained in great detail by many sources, including this one on how Google uses AI to cool its data centres [6].

3. Throttle-down drive: a device that reduces energy consumption on idle processors, so that a server running at its typical 20% utilization does not draw full power.

Energy efficiency is vital not only to productivity and performance but also to a carbon-neutral tech sector and economy. There is ample scope for designing energy-efficient applications and platforms. Some approaches are described below:

Energy Efficiency in VoIP Architecture and design

Low energy consumption not only lowers operating costs but also helps the environment by reducing carbon emissions.

1. Server Virtualization

Consolidating multiple independent servers onto a single underlying physical server retains their logical separation while lowering energy costs and maximizing utilization. VMs (virtual machines) are virtualized portions of the same server and can be accessed independently using their own IP and network settings.

To reduce electricity usage in our labs and data centers, we use smart power distribution units to monitor our lab equipment. We increase server utilization by using virtual machines. Our Cisco Customer Experience labs use a check-in, check-out system of automation pods to allow lab employees to set up configurations virtually and then release equipment when they are finished with it.

Cisco 2020 Environment Technical Review [20]

Models for placing VMs on PMs (physical machines) have been proposed by Dong et al. [8], Huang [9] and Tian et al. [10].
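The core of VM placement is a bin-packing problem: fit VM demands onto as few physical machines as possible so the rest can be switched off. The sketch below uses the classic first-fit decreasing heuristic as a simple stand-in; it is not the model of Dong et al., Huang or Tian et al., and the per-VM loads for the 7-VM setup are hypothetical.

```python
def first_fit_decreasing(vm_load: dict[str, int], capacity: int) -> list[list[str]]:
    """Pack VMs onto as few physical machines as possible (first-fit decreasing).

    vm_load maps VM name -> expected peak load, in the same units as capacity.
    """
    hosts: list[list[str]] = []
    free: list[int] = []
    for name, load in sorted(vm_load.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"{name} does not fit on any single host")
        for i, room in enumerate(free):
            if load <= room:  # first host with enough spare capacity
                hosts[i].append(name)
                free[i] -= load
                break
        else:  # no existing host fits: power on another one
            hosts.append([name])
            free.append(capacity - load)
    return hosts


if __name__ == "__main__":
    # Hypothetical peak CPU loads (% of one physical machine) for the 7-VM setup.
    loads = {"sig1": 30, "sig2": 10, "media1": 45, "media2": 40, "cdr": 25, "dev": 15, "edge": 20}
    for n, vms in enumerate(first_fit_decreasing(loads, capacity=100), start=1):
        print(f"PM{n}: {vms}")
```

With these numbers the seven VMs fit on two machines, but the primary and standby signalling servers land on the same one. A real placement would add anti-affinity constraints so HA pairs never share a physical machine.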

2. Decommissioning old / outdated servers

While this is the most obvious way to increase efficiency, it is also the toughest, since legacy applications, or a small portion of them, may be running on a server that the service provider is not keen on updating, or for which no updates exist because it is past end of life yet somehow still in use. It is important to identify such components. Check, for example, whether an old GlassFish or BEA WebLogic SIP Servlet server needs updating and/or migration!

3. Plan HA (high availability) efficiently

Redundant servers carry only partial load, if any, so that they can be activated in full swing when another server fails over. With quick load-up times and forward-looking monitoring, analysers can watch logs for upcoming failures or predictable downtime, and infrastructure scripts can bring up pre-designed containers within seconds or minutes. It isn't wise to create more than one standby server, since a standby does no essential work but consumes as much power.

4. Consolidate individual applications on a server

Map the maximum predictable load and deduce the corresponding percentage consumption. In view of these figures, it is best to consolidate application servers to run on a single server. A distributed, microservice-based architecture can also support consolidation by running each major application in its own Docker container. Consolidation ensures that:

  • all data can be stored and accessed centrally, which reduces the likelihood of data duplication;
  • while a server is drawing full power, it also shows correspondingly high utilization;
  • there is a single point at which to prevent intrusion, provide security and fix vulnerabilities against malware (ransomware, viruses, spyware, trojans).

5. Reduce redundancy

While it is common practice to store multiple copies of data such as CDRs (call detail records) and to archive historical logs for later auditing, this is not the most energy-efficient approach, since it ends up wasting storage space. A better approach is to keep only the critical parts, discard the rest, and run background tasks that compress older, less frequently referenced logs.
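A background compression task of that kind can be very small. The sketch below gzips log files older than a cut-off and removes the originals; the 30-day threshold, the *.log pattern and the file names are assumptions, and a real deployment would schedule it (for example via cron) away from call-processing hosts.

```python
import gzip
import os
import shutil
import tempfile
import time
from pathlib import Path


def compress_old_logs(directory: str, max_age_days: int = 30, pattern: str = "*.log") -> list[str]:
    """Gzip log files older than max_age_days and delete the originals.

    Returns the names of the compressed files.
    """
    cutoff = time.time() - max_age_days * 86_400
    done = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.stat().st_mtime < cutoff:
            target = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()  # keep only the compressed copy
            done.append(target.name)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp, "cdr-2019-06.log")
        old.write_text("callid001,60\n")
        Path(tmp, "cdr-2019-07.log").write_text("callid002,30\n")
        forty_days_ago = time.time() - 40 * 86_400
        os.utime(old, (forty_days_ago, forty_days_ago))  # pretend the file is old
        print(compress_old_logs(tmp))  # ['cdr-2019-06.log.gz']
```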

6. Power management

Powering down idle servers or putting unused servers to sleep is an effective way to reduce operating power, but IT departments often ignore it for fear of slower performance and broken call continuity if a server does go down. However, power management offers real energy savings, and these should be weighed accordingly.
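One way to weigh savings against risk is an explicit policy: only servers below an idle threshold are candidates, and a minimum number of servers always stays up. A minimal sketch, assuming utilization is measured over some observation window; the threshold, the min_active floor and the server names are hypothetical.

```python
def servers_to_sleep(utilization: dict[str, float], idle_threshold: float = 0.05, min_active: int = 2) -> list[str]:
    """Choose idle servers to power down while keeping at least min_active running.

    utilization maps server name -> average CPU utilization (0.0-1.0) over the
    last observation window. The least-utilized idle servers are chosen first.
    """
    idle = sorted((load, name) for name, load in utilization.items() if load < idle_threshold)
    budget = max(0, len(utilization) - min_active)
    return [name for _load, name in idle[:budget]]


if __name__ == "__main__":
    # Hypothetical overnight utilization figures.
    usage = {"media1": 0.40, "media2": 0.02, "cdr": 0.30, "dev": 0.01}
    print(servers_to_sleep(usage))                # ['dev', 'media2']
    print(servers_to_sleep(usage, min_active=3))  # ['dev']
```

A production policy would also consider each server's role (for example never sleeping an HA standby that must take over instantly) and its wake-up time.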

7. Common storage such as Network Attached Storage

Power consumption is roughly linear in the number of storage modules used. Storage redundancy needs to be right-sized to avoid rapidly consuming the available storage space, the CPU cycles needed to reference and index the data, and the associated power [7].

Maximizing storage capacity utilization by drawing from a common pool of shared storage on a need basis also allows for flexibility.

It is sensible to take inactive data offline, reducing clutter on the production system and keeping the remaining data quickly retrievable.

8. Sharing other IT resources

Sharing Central Processing Units (CPUs), disk drives and memory optimizes electrical usage. Short-term load shifting, combined with throttling resources up and down as demand dictates, improves long-term hardware energy efficiency [7].

Hardware-based approaches such as Energy Star ratings, air conditioning, placement of server racks, airflow, cabling etc. have not been covered in this article; they can be read about in the Energy Star report [5].

9. DMZ / Perimeter network

The perimeter network (also known as DMZ, demilitarized zone, or screened subnet) is a zone where resources and services accessible from outside the organization are made available. It is often used as a barrier between the company's internal, secure green zone and outside partners or suppliers, such as external organizations' gateways.

  • Load balancers
  • API gateways
  • SBC ( Session Border controllers)
  • Media Gateways

Ways to cut down on CPU consumption in DMZ machines

  1. Scrutinize incoming traffic only; trust outgoing traffic.

2. Use hardware / network firewalls to monitor and block traffic instead of software-defined ones. A hardware firewall can be a standalone physical device or form part of another device on your network. Physical devices like routers, for example, already have a built-in firewall.

Other types of firewalls

  • Application-layer firewalls can be a physical appliance, or software-based, like a plug-in or a filter. These types of firewalls target your applications. For example, they could affect how requests for HTTP connections are inspected across each of your applications.
  • Packet filtering firewalls scrutinize each data packet as it travels through your network. Based on rules you configure, they decide whether to block the specific packet or not. For example firewalls can block SSH/RDP for remote management.
  • Circuit-level firewalls check whether TCP and UDP connections across your network are valid before data is exchanged. For example, this type of firewall might first check whether the source and destination addresses, the user, the time, and date meet certain defined rules.
  • Proxy server firewalls secure the traffic into and out of a network by monitoring, filtering, and caching data requests to and from the network.

Energy Efficiency in VoIP Applications and algorithms

In theory, energy-efficient algorithms take less processing power, run fewer CPU cycles and consume less memory. For experiments with WebRTC and SIP VoIP systems, CPU performance can be a reliable factor to consider for carbon emissions. Here is a list of approaches for including energy as one of the parameters when programming RTC applications.

  1. Take advantage of multi-core applications

Multi-core processor chips allow simultaneous processing of multiple tasks, which leads to higher efficiency. A shared power source and shared cooling improve efficiency further. The same logic applies to using one consolidated power supply for a rack instead of an individual power supply for each server in the rack.

2. Reduce Buffering

Input/output buffers pile up computed packets or blocks that will be used in the near future but may be discarded altogether in the event of a skip or shutdown. For example, in video on demand (VoD), a buffered hour of video is of little use if the viewer cancels the session after 10 minutes.

3. Optimize memory access algorithms

4. Network energy Management to vary as per demand

Newer generations of network equipment pack more throughput per unit of power. Active energy management measures can also be applied to reduce energy usage as network demand varies. In a telecommunication system, a trade-off between power consumption and network performance is almost always made.

  1. Quickly switch the network speed to match the amount of data currently being transmitted. A demand-following streaming session maintains QoS and avoids imbalance while also reducing power consumption.

2. Avoid sudden bursts and peaks and/or align them with energy availability.


  • computational performance (i.e., computations/second per server),
  • electrical efficiency of computations (i.e., computations per kWh),
  • storage capacity (i.e., TB per drive), and
  • port speeds (i.e., Gb per port)

5. Task Scheduling algorithms

Some recent research frameworks and models take CO2 emissions into account while allocating resources according to a queuing model. The most efficient ones bring down not only the carbon footprint but also the high operating cost [11].

Scheduling and monitoring techniques have been applied to achieve a cost-effective, power-aware cloud environment by reducing resource exploitation.
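A simple illustration of carbon-aware scheduling: deferrable VoIP back-office jobs (transcription, CDR roll-ups, backups) are placed into the hours with the lowest grid carbon intensity, largest jobs first, with a cap on jobs per hour. This greedy heuristic is a stand-in for illustration, not the queuing-based frameworks cited in [11]; the job sizes and hourly intensities are hypothetical.

```python
def schedule_jobs(jobs: dict[str, float], intensity: dict[int, float], slot_capacity: int = 1):
    """Greedily place deferrable jobs into the lowest-carbon hourly slots.

    jobs maps job name -> energy in kWh; intensity maps hour -> kg CO2e per kWh.
    Returns (plan, total_kg) where plan maps job name -> hour.
    """
    used = {hour: 0 for hour in intensity}
    cleanest_first = sorted(intensity, key=intensity.get)
    plan: dict[str, int] = {}
    # Biggest jobs first, so they get the cleanest hours.
    for name, _kwh in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        for hour in cleanest_first:
            if used[hour] < slot_capacity:
                plan[name] = hour
                used[hour] += 1
                break
        else:
            raise RuntimeError(f"no free slot left for {name}")
    total_kg = sum(jobs[name] * intensity[hour] for name, hour in plan.items())
    return plan, total_kg


if __name__ == "__main__":
    # Hypothetical batch jobs and hourly grid intensities.
    jobs = {"transcription": 4.0, "cdr_rollup": 2.0, "backup": 1.0}
    intensity = {0: 0.30, 1: 0.20, 12: 0.10}
    plan, kg = schedule_jobs(jobs, intensity)
    print(plan, round(kg, 2))
```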

6. Centralised operation – RTP topology (Mesh, MCU and SFU)

Instead of operating many servers at low CPU utilization at the edge near the clients, combine the processing power onto fewer servers that operate at higher utilization.

Modern machine learning programs are computationally intensive, and their integration into VoIP systems for tagging, sentiment analysis and voice quality analysis adds further strain to the already heavy transcoding and multiplexing load on media servers.

Media server using an SFU (Selective Forwarding Unit) to transmit media streams

As an example, in a five-party call each SFU client sends one upstream but receives four downstreams. The server only forwards the streams instead of decoding and mixing them, which reduces the load on the server but increases it on the clients.
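The trade-off between the three topologies can be made concrete by counting media streams in an n-party call. This sketch counts streams only (not bitrate or codec cost) and assumes each participant publishes one stream; an MCU additionally decodes, mixes and re-encodes on the server, which is where its CPU cost comes from.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamCounts:
    client_up: int    # streams each client sends
    client_down: int  # streams each client receives
    server_in: int    # streams the media server receives (0 for mesh)
    server_out: int   # streams the media server sends (0 for mesh)

def stream_counts(topology: str, n: int) -> StreamCounts:
    """Per-client and server stream counts for an n-party call."""
    if n < 2:
        raise ValueError("a call needs at least two participants")
    if topology == "mesh":
        # Every client sends to and receives from every other client directly.
        return StreamCounts(n - 1, n - 1, 0, 0)
    if topology == "sfu":
        # Clients upload once; the SFU forwards each stream to the other n-1 clients.
        return StreamCounts(1, n - 1, n, n * (n - 1))
    if topology == "mcu":
        # The MCU mixes everything into one composite stream per client.
        return StreamCounts(1, 1, n, n)
    raise ValueError(f"unknown topology: {topology}")

for topo in ("mesh", "sfu", "mcu"):
    print(topo, stream_counts(topo, 5))
```

For n = 5 the SFU row gives one upstream and four downstreams per client, matching the example above, while the server forwards 20 streams without touching the media payload.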

7. Distributing workload based on server performance

Aggregating tasks and running them as serverless, asynchronous jobs instead of standalone processes is a very efficient way to cut down idle running waste. Additionally, categorizing server workloads based on server performance can reduce power consumption by using idle servers efficiently. Thermal-aware workload distribution also helps reduce power consumption and, consequently, the electricity consumed in cooling.

8. Reduce re-authentication and challenge-response exchanges when they can be avoided.

There are multiple modes to authenticate and authorize user and application access to server content:

Over the network

  • password-based auth,
  • third-party auth (OAuth),
  • two-factor authentication (phone/SMS based),
  • multi-factor auth (SMS / email / other media),
  • token auth (custom USB device / smart card),
  • biometric auth (physical human characteristics / scanners),
  • transactional auth (location, hour of day, browser / machine type)

Computer recognition authentication

  • Single sign-on

Authentication protocols

  • Kerberos – Key Distribution Center (KDC) using a Ticket Granting Server (TGS)

A call flow involves AAA (authentication, authorization and accounting) while creating the session and may require occasional re-authentication to reaffirm that the user is the intended one. Re-authenticating too often increases power consumption; this can be countered with caching and timeout mechanisms.
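A caching and timeout mechanism of this kind can be sketched as a small credential cache: a successful authentication is trusted for a fixed time-to-live, and the expensive check (for example, a challenge-response round trip or a database lookup) runs again only after expiry. The `verify` callback, the 300-second TTL and the injectable clock are assumptions for illustration; a real SIP server would use its own nonce and session lifetime rules.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple

class AuthCache:
    """Skip repeat authentication for a user/credential pair until a TTL expires."""

    def __init__(self, verify: Callable[[str, str], bool], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._verify = verify  # the expensive check (challenge-response, DB lookup)
        self._ttl_s = ttl_s
        self._clock = clock
        # (user, SHA-256 of credential) -> time until which the result is trusted
        self._valid_until: Dict[Tuple[str, str], float] = {}

    def authenticate(self, user: str, credential: str) -> bool:
        key = (user, hashlib.sha256(credential.encode()).hexdigest())
        now = self._clock()
        expiry = self._valid_until.get(key)
        if expiry is not None and now < expiry:
            return True  # cached success: no new challenge needed
        if self._verify(user, credential):
            self._valid_until[key] = now + self._ttl_s
            return True
        self._valid_until.pop(key, None)
        return False

calls = []
def slow_verify(user: str, credential: str) -> bool:
    calls.append(user)              # count how often the expensive check runs
    return credential == "secret"   # hypothetical stand-in for a real check

now = [0.0]                         # fake clock for the example
cache = AuthCache(slow_verify, ttl_s=300, clock=lambda: now[0])
cache.authenticate("alice", "secret")  # t=0: runs the check
now[0] = 100
cache.authenticate("alice", "secret")  # t=100: served from the cache
now[0] = 400
cache.authenticate("alice", "secret")  # t=400: expired, runs the check again
print(len(calls))                      # 2
```

Keying the cache on a hash of the credential (not just the user) means a wrong password is never accepted from the cache.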

Point of presence and handover based on carbon footprint in different regions

  1. Take the carbon emissions of the data centre into consideration before engaging a server in the call path from the load balancer gateway.

2. Choose the point of presence (PoP) for a server according to the carbon emission factor of its region.
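Putting these two points together, a load balancer gateway could pick, among the PoPs that still meet a latency budget for the call, the one whose grid region has the lowest emission factor. The sketch is hypothetical: the PoP names, RTTs, emission factors and the 150 ms RTT budget are made-up inputs; real factors would come from sources such as [16] and [17].

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class PoP:
    name: str
    rtt_ms: float          # measured round-trip time from the caller
    kg_co2_per_kwh: float  # grid emission factor of the PoP's region

def choose_pop(pops: Sequence[PoP], max_rtt_ms: float = 150.0) -> Optional[PoP]:
    """Lowest-carbon PoP that still meets the latency budget.

    Ties on emission factor are broken by lower RTT. Returns None when no PoP
    meets the budget, so the caller can fall back to latency-only routing."""
    eligible = [p for p in pops if p.rtt_ms <= max_rtt_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda p: (p.kg_co2_per_kwh, p.rtt_ms))

# Hypothetical candidates for one caller.
pops = [
    PoP("pop-a", rtt_ms=40, kg_co2_per_kwh=0.40),
    PoP("pop-b", rtt_ms=90, kg_co2_per_kwh=0.25),
    PoP("pop-c", rtt_ms=180, kg_co2_per_kwh=0.05),  # cleanest, but too far away
]
print(choose_pop(pops).name)  # pop-b
```

The latency budget keeps voice quality as a hard constraint, with carbon used only to rank the PoPs that already qualify.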

US states' carbon emission rates from electricity generation (2018 report). Source: [16]
UK greenhouse gas reporting. Source: [17]

Energy Efficiency in WebRTC browser applications and native applications

For video conferencing over the browser, WebRTC has emerged as the default standard. The efficiency of such WebRTC browser-based video conferencing web applications can be enhanced in the following ways:

1. Use VoIP push notifications to avoid persistent connections

2. Voice activity detection (mute the spectators), and join attendees with video enabled and audio disabled (video: true, audio: false)
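Voice activity detection decides whether an audio frame contains speech, so that silent participants need not send audio and receivers need not process it. Production detectors, such as the one in WebRTC, are considerably more sophisticated; the sketch below is only a simple energy-threshold detector over 16-bit PCM frames, and the -40 dBFS threshold is an assumption.

```python
import math
from typing import Sequence

def frame_dbfs(samples: Sequence[int]) -> float:
    """RMS level of a 16-bit PCM frame in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")

def is_voice(samples: Sequence[int], threshold_dbfs: float = -40.0) -> bool:
    """True when the frame is louder than the (assumed) speech threshold."""
    return frame_dbfs(samples) > threshold_dbfs

# One 20 ms frame at 8 kHz = 160 samples.
silence = [0] * 160
tone = [int(8000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(160)]
print(is_voice(silence), is_voice(tone))  # False True
```

A real detector also adds a hangover period so that short pauses between words do not toggle the microphone on and off.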

Energy efficiency in VoIP phones

If all eligible VoIP phones sold in the United States were ENERGY STAR certified, the energy cost savings would grow to more than $65 million each year and 1.2 billion pounds of annual greenhouse gas emissions would be prevented, equivalent to the emissions from more than 119,000 vehicles.

Source: ENERGY STAR [15]

Low-energy embedded hardware in most phones keeps the average consumption low. An analog phone can consume between 0.07 W and 9.27 W, while a VoIP phone can consume 0.1 W to 3.5 W of standby power.
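To put these standby figures in perspective, a device with a constant draw of P watts uses P × 8,760 h / 1,000 kWh per year. A minimal helper, assuming constant draw all year and ignoring leap years:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h

def annual_kwh(watts: float) -> float:
    """Energy used in a year by a device with constant power draw."""
    return watts * HOURS_PER_YEAR / 1000

for w in (0.1, 3.5):
    print(f"{w} W -> {annual_kwh(w):.2f} kWh/year")
# 0.1 W -> 0.88 kWh/year
# 3.5 W -> 30.66 kWh/year
```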

Off-mode power is often lower than standby power, since the phone is in a low-power mode during idle hours such as the night. According to ENERGY STAR, the sound transmission mechanism also plays a key role, and hybrid phones consume more power.

Power allowance (W) for each of the following device features:

  • 1.0 watt for Gigabit Ethernet
  • 0.2 watt for Energy Efficient Ethernet (IEEE 802.3az) compliant Gigabit Ethernet

Additional proxy incentive (W) for the ability to maintain network presence while in a low-power mode and intelligently wake when needed:

  • 0.3 watt for base capability
  • 0.5 watt for remote wake

Government bodies and industry groups that track the energy efficiency of telecom and IP telephony:

  • Alliance for Telecommunications Industry Solutions (ATIS)
    • Telecommunications Energy Efficiency Ratio (TEER): the measurement method covers all power conversion and power distribution from the front end of the system to the data wire plug, including application-specific integrated circuits (ASICs).
  • European Telecommunications Standards Institute (ETSI)
  • International Telecommunication Union (ITU)
  • U.S. Department of Energy (DOE), Environmental Protection Agency (EPA)

External links

Amazon : https://sustainability.aboutamazon.com/environment/sustainable-operations/carbon-footprint

Cisco : https://www.cisco.com/c/dam/m/en_us/about/csr/esg-hub/_pdf/2020_Environment_Technical_Review.pdf

3CX : https://askozia.com/voip/how-can-i-save-energy-with-green-voip-and-my-ip-pbx/

The purpose of this article is to raise awareness of the carbon footprint of everything from application programs and architecture design techniques to data centres and their cumulative performance. It gives stakeholders (customers, programmers, architects, managers, …) a direction for choosing the less carbon-emitting approach whenever possible, since every bit counts to help the environment.


[1] rensmart.com https://www.rensmart.com/Calculators/KWH-to-CO2

[2] https://www.zdnet.com/article/toolkit-calculate-datacenter-server-power-usage/

[3] Nature: https://www.nature.com/articles/d41586-018-06610-y

[4] Center of Expertise for Energy Efficiency in Data Centers at the US Department of Energy’s Lawrence Berkeley National Laboratory in Berkeley, California. https://datacenters.lbl.gov/

[5] ENERGY STAR – https://www.energystar.gov/sites/default/files/asset/document/DataCenter-Top12-Brochure-Final.pdf

[6] https://www.blog.google/inside-google/infrastructure/safety-first-ai-autonomous-data-center-cooling-and-industrial-control/

[7] https://www.energy.gov/sites/default/files/2013/10/f3/eedatacenterbestpractices.pdf

[8] Yin K, Wang S, Wang G, Cai Z, Chen Y. Optimizing deployment of VMs in cloud computing environment. In: Proceedings of the 3rd international conference on computer science and network technology. IEEE; 2013. p. 703–06.

[9] Huang W, Li X, Qian Z. An energy efficient virtual machine placement algorithm with balanced resource utilization. In: Proceedings of the seventh IEEE international conference on innovative mobile and internet services in ubiquitous computing; 2013. p. 313–19.

[10] W. Tian, C.S. Yeo, R. Xue, Y. Zhong, Power-aware scheduling of real-time virtual machines in cloud data centers considering fixed processing intervals, Proc IEEE, 1 (2012), pp. 269-273

[11] H. Chen, X. Zhu, H. Guo, J. Zhu, X. Qin, J. Wu, Towards energy-efficient scheduling for real-time tasks under uncertain cloud computing environment, J Syst Softw, 99 (2015), pp. 20-35

[12] https://hbr.org/2021/05/how-much-energy-does-bitcoin-actually-consume

[13] https://energyinnovation.org/2020/03/17/how-much-energy-do-data-centers-really-use/

[14] F. Bota, F. Khuhawar, M. Mellia and M. Meo, “Comparison of energy efficiency in PSTN and VoIP systems,” 2012 Third International Conference on Future Systems: Where Energy, Computing and Communication Meet (e-Energy), 2012, pp. 1-4, doi: 10.1145/2208828.2208834. https://citeseerx.ist.psu.edu/viewdoc/download?doi=

[15] https://www.energystar.gov/products/office_equipment/voice_over_internet_protocol_voip_phone

[16] eGRID summary tables 2018 for carbon emission rates in US states: https://www.epa.gov/sites/default/files/2020-01/documents/egrid2018_summary_tables.pdf

[17] UK greenhouse gas reporting – https://www.gov.uk/government/publications/greenhouse-gas-reporting-conversion-factors-2018

[19] http://assets.worldwildlife.org/publications/575/files/original/The_3_Percent_Solution_-_June_10.pdf?1371151781

[20] https://www.cisco.com/c/dam/m/en_us/about/csr/esg-hub/_pdf/2020_Environment_Technical_Review.pdf

[21] It’s Not Easy Being Green by Peter Xiang Gao, Andrew R. Curtis, Bernard Wong, S. Keshav, Cheriton School of Computer Science – https://dl.acm.org/doi/pdf/10.1145/2342356.2342398

Why Lua is a good choice for Scripting call configurations in SIP servers like Kamailio and Freeswitch

Programming in SIP servers enables the IP telephony provider to add complex control that is difficult to realise with simple dialplan XML and IVR menus. This is best handled by a program that runs on a scripting engine embedded in the telecom application server and is invoked by SIP requests or responses in the session. This may include:

  • using policy control or dynamic input to control call routing or blacklisting
  • transcription for voicemail
  • media file playback with dynamic text-to-speech, and so on.

Common programming engines supported by Freeswitch, OpenSIPS, Kamailio and Asterisk may include Python, Java, C++ and JavaScript. OpenSIPS and Kamailio also offer XML-RPC, HTTP APIs and WebSockets as additional means of adding call control logic to the telephony server.

Kamailio modules
OpenSIPS modules
Freeswitch modules

Lua (https://www.lua.org) is a small, powerful and lightweight scripting language, mostly used for embedded and gaming use cases. Among many programming engines supported by FreeSWITCH and Kamailio, Lua is very handy to add business logic to call control by integrating with the telecom server.

Among the many choices, Lua is the preferred language for scripting in SIP servers because it:

  1. Does not require recompilation
    • saves the effort of restarting the Freeswitch server when loading an updated script
    • this in turn avoids service disruption for the time the server would have taken to shut down and restart
  2. Can run synchronously or asynchronously
    • lua: runs in the current thread and waits for the script to complete
    • luarun: runs in a separate thread and returns immediately

Freeswitch Lua Integration

To load the program from the dialplan:

<action application="lua" data="mainprog.lua"/>

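1. In the program, we can get the server status and print it to the console log:

local api = freeswitch.API()
local status = api:execute("status")
freeswitch.consoleLog("INFO", status .. "\n")

2. We can also check whether the session is active and play a file into the call:

if session:ready() then
  session:streamFile("ivr/ivr-welcome_to_freeswitch.wav")
end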

3. Program to answer the call, play a file and hang up using session class methods:

-- Answer call, play a prompt, hang up
session:answer()

-- Create a string with path and filename of a sound file
pathsep = '/'
-- Windows users do this instead: pathsep = '\\'
prompt = "ivr" .. pathsep .. "ivr-welcome_to_freeswitch.wav"

-- Play the prompt
freeswitch.consoleLog("WARNING", "About to play '" .. prompt .. "'\n")
session:streamFile(prompt)

-- Hangup
session:hangup()
freeswitch.consoleLog("WARNING", "After hangup\n")

Console log from a call to extension 5000 that runs this script:
[INFO] mod_dialplan_xml.c:637 Processing altanai <altanai>->5000 in context public
EXECUTE sofia/internal/altanai@x.x.x.x lua(/etc/freeswitch/dialplan/lua_session_answer_prompt_hangup.lua)
[DEBUG] switch_channel.c:3781 (sofia/internal/altanai@x.x.x.x) Callstate Change EARLY -> ACTIVE
[WARNING] switch_cpp.cpp:1376 About to play 'ivr/ivr-welcome_to_freeswitch.wav
[DEBUG] switch_ivr_play_say.c:1942 done playing file /usr/share/freeswitch/sounds/en/us/callie/ivr/ivr-welcome_to_freeswitch.wav
[DEBUG] switch_cpp.cpp:731 CoreSession::hangup
[NOTICE] switch_cpp.cpp:733 Hangup sofia/internal/altanai@x.x.x.x [CS_EXECUTE] [NORMAL_CLEARING]
[WARNING] switch_cpp.cpp:1376 After hangup

Other methods:

  • Initiate a new session: session:originate()
  • Record audio: session:recordFile()

5. Fire and consume Events

freeswitch.Event() and freeswitch.EventConsumer() can be used to fire new events and consume events, respectively. For instance, to run a callback function on hangup, use session:setHangupHook().

6. IVR menus: freeswitch.IVRMenu()

More examples



  1. lua https://www.lua.org/
  2. freeswitch https://freeswitch.org/confluence/display/FREESWITCH/Lua+API+Reference

Telemedicine and WebRTC

An anywhere, anytime telemedicine communication tool accessible on any device. The solution provides a lightweight signalling server that drops out as soon as the call is connected, keeping calls private without relaying or involving any central server in call-related data or media. This ensures doctor and patient details are not processed, stored or recorded by our servers.

The solution enables doctors, nurses, medical practitioners and patients to have:

  • High-definition audio/video calls
  • End-to-end encrypted p2p chats
  • Integration with HMS (hospital management system) to fetch the history of patients
  • Screen sharing to show reports without transferring them as files
  • More concerned doctors included using the mesh-based peer-to-peer conferencing feature

Confidentiality and Privacy

For privacy and security of certain health information, only HIPAA (Health Insurance Portability and Accountability Act of 1996) compliant video-conferencing tools can be used for telemedicine in the US.

Telemedicine scenario Callflow

Call flow for attended call transfer and two-way conference in a telemedicine scenario between a patient, a hospital attendant, a doctor and a nurse

References :

Performance of WebRTC sites and Electron apps

This post is about making performance enhancements to a WebRTC app so that it can be used in areas that require sensitive data to be communicated, cannot afford downtime, need fast response and low RTT, and must be secure enough to withstand hacks and attacks.

WebRTC Clients

  1. Single-page applications (HTML5 + JS + CSS) running in a browser engine on the OS
  2. Electron app
    • Facebook Messenger, Slack and Twitch are some of the RTC-based applications that have Electron clients as well.
  3. Web-view on mobile
    • (-) may lack support for advanced WebRTC APIs, e.g., MediaRecorder
  4. Native applications on mobile OS (Android, iOS)
  5. Hybrid applications (React Native)
  6. Embedded devices (set-top box, IP camera, robots on Raspberry Pi)
    1. raw codec libraries and GStreamer/FFmpeg scripts to create an RTSP stream

Codec weight

opus (111, minptime=10;useinbandfec=1)
VP8 (96)
frameWidth 640
frameHeight 480
framesPerSecond 30
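These stats show why codec choice dominates bandwidth: an uncompressed 640×480 stream at 30 fps is already above 100 Mbit/s before VP8 compresses it. The arithmetic below assumes 8-bit YUV 4:2:0 (I420) frames, i.e., 1.5 bytes per pixel, the raw format commonly fed to video encoders.

```python
def raw_video_bitrate_bps(width: int, height: int, fps: float,
                          bytes_per_pixel: float = 1.5) -> float:
    """Uncompressed bitrate; 1.5 bytes/pixel corresponds to 8-bit YUV 4:2:0."""
    return width * height * bytes_per_pixel * 8 * fps

raw = raw_video_bitrate_bps(640, 480, 30)
print(f"{raw / 1e6:.1f} Mbit/s")  # 110.6 Mbit/s
```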

DNS lookup Time

Services such as Pingdom (https://tools.pingdom.com/) or WebPageTest can quickly calculate your website’s DNS lookup times.
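For a quick local check alongside those services, a lookup can be timed directly around the standard resolver call. This is a rough sketch: the timing includes any OS or resolver cache hits, so repeat lookups are usually much faster than the first, and the hostname in the comment is a placeholder.

```python
import socket
import time

def dns_lookup_ms(host: str, port: int = 443) -> float:
    """Wall-clock time of one getaddrinfo() resolution, in milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - start) * 1000

# Example (needs network access; hostname is a placeholder):
#   print(f"{dns_lookup_ms('www.example.com'):.1f} ms")
```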

Load/stress testing for caching and lookup times can be performed with tools such as LoadStorm or JMeter.

Alternatively, use a persistent WebSocket-like setup in place of non-reused TCP connections, such as repeated HTTP requests or polling, to set up signalling.

Bandwidth Estimation

Bandwidth can be estimated using:

RTCP Receiver Reports, which the receiver sends periodically to summarize the packet loss rate, jitter, etc.
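The two main receiver-report fields can be computed as RFC 3550 describes: "fraction lost" is the share of expected packets lost since the previous report, as an 8-bit fixed-point value, and interarrival jitter is a running estimate smoothed with a gain of 1/16. A minimal sketch; timestamps are in RTP clock units and the function names are illustrative.

```python
from typing import Sequence, Tuple

def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """RFC 3550 'fraction lost': 8-bit fixed point (lost / expected * 256)."""
    lost_interval = expected_interval - received_interval
    if expected_interval == 0 or lost_interval <= 0:
        return 0  # duplicates can make received exceed expected
    return (lost_interval << 8) // expected_interval

def interarrival_jitter(packets: Sequence[Tuple[int, int]]) -> float:
    """Running jitter estimate per RFC 3550: J += (|D| - J) / 16.

    `packets` holds (rtp_timestamp, arrival_time) pairs in the same RTP clock
    units, in arrival order."""
    jitter = 0.0
    for (s_prev, r_prev), (s_cur, r_cur) in zip(packets, packets[1:]):
        d = (r_cur - r_prev) - (s_cur - s_prev)  # change in transit time
        jitter += (abs(d) - jitter) / 16
    return jitter

print(fraction_lost(100, 95))  # 12 (5% of 256, truncated)
# 8 kHz audio, 20 ms packets (160 units); the third packet arrives 10 units late.
print(interarrival_jitter([(0, 1000), (160, 1160), (320, 1330)]))  # 0.625
```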


TWCC (Transport-Wide Congestion Control) has the receiver report the arrival time of every packet, so the sender can calculate inter-packet delay variation and estimate the bandwidth on the sender side

a=rtcp-fb:96 transport-cc
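With the per-packet arrival times that transport-cc feedback provides, the sender can compute the delay variation between consecutive packets: the change in arrival spacing minus the change in send spacing. A persistently positive value suggests queues are building and the estimate should come down. This sketch only computes the variation plus a crude signal; the filtering and overuse detection of Google Congestion Control are omitted, and the millisecond timestamps and 1 ms threshold are hypothetical.

```python
from typing import List, Sequence

def delay_variations(send_ms: Sequence[float], arrival_ms: Sequence[float]) -> List[float]:
    """d(i) = (t_i - t_{i-1}) - (T_i - T_{i-1}), with T = send time, t = arrival time."""
    if len(send_ms) != len(arrival_ms):
        raise ValueError("send and arrival lists must align")
    return [
        (arrival_ms[i] - arrival_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        for i in range(1, len(send_ms))
    ]

def queue_building(variations: Sequence[float], threshold_ms: float = 1.0) -> bool:
    """Crude signal: mean variation above a (hypothetical) threshold."""
    return bool(variations) and sum(variations) / len(variations) > threshold_ms

sent = [0, 10, 20, 30, 40]
arrived = [50, 62, 75, 89, 104]  # spacing grows: packets are queuing
print(delay_variations(sent, arrived))                  # [2, 3, 4, 5]
print(queue_building(delay_variations(sent, arrived)))  # True
```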

REMB (Receiver Estimated Maximum Bitrate) carries a bandwidth estimate computed on the receiver side, for example from packet arrival delay and loss, back to the sender

  • used to configure the bitrate in video encoding
  • used to avoid congestion or slow media transmission
a=rtcp-fb:96 goog-remb

Best practices for WebRTC web clients

As communication agents become single-HTML-page clients, a lot of authentication, heartbeat sync, web workers and event-driven signalling flow management reside on the same page, along with the CPU-intensive processing of audio/video resources and media streams. This can make the web page heavy and often results in a crash because the page becomes “unresponsive”.

Here are some of my best to-dos for making sure the WebRTC communication client page runs efficiently:

Visual stability and CLS (Cumulative Layout Shift)

The CLS metric measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.

For a good user interaction experience, DOM elements should move as little as possible so that the page appears stable. On a flickering page (perhaps because a notification element dynamically pushes other layout elements), it is difficult to interact precisely with page elements such as buttons.
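Each individual layout shift is scored as its impact fraction (the share of the viewport affected by the unstable elements) multiplied by its distance fraction (the largest distance an element moved, relative to the viewport's largest dimension). This sketch follows the lifetime-sum definition above; note that Chrome has since redefined CLS as the worst burst ("session window") of shifts, and shifts right after user input are excluded.

```python
from typing import Iterable, Tuple

def layout_shift_score(impact_fraction: float, distance_fraction: float) -> float:
    """Score of one layout shift: impact fraction x distance fraction."""
    for f in (impact_fraction, distance_fraction):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    return impact_fraction * distance_fraction

def cumulative_layout_shift(shifts: Iterable[Tuple[float, float]]) -> float:
    """Sum of all unexpected shift scores (the lifetime-sum definition)."""
    return sum(layout_shift_score(i, d) for i, d in shifts)

# A notification banner pushes the call controls: the affected area is 75% of the
# viewport and the controls move 25% of the viewport's largest dimension.
print(layout_shift_score(0.75, 0.25))  # 0.1875
```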

Minimize main thread work

The main thread is where a browser runs all the JavaScript in your page, as well as where it performs layout, reflows and garbage collection. Therefore, long JS processes can block the thread and make the page unresponsive.

Deprecation of synchronous XMLHttpRequest on the main thread

Reduce JavaScript execution time

Unoptimized JS code takes longer to execute and increases network, parse/compile and memory costs.

If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.

Some effective tips for speeding up JS execution include:

  • minifying and compressing code
  • Removing the unused code and console.logs
  • Apply caching to save lookup time

Cookies – Security vs persistent state

Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.

When setting cookies, ensure that if SameSite=None is used, the cookie is also marked Secure:

Set-Cookie: widget_session=abc123; SameSite=None; Secure

If SameSite is set to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser’s URL bar.

Set-Cookie: promo_shown=1; SameSite=Strict

You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.
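On the server side, the SameSite=None-requires-Secure rule can be enforced when the header is built. A sketch using Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8 or later); the helper name and its validation are illustrative, not part of any framework.

```python
from http.cookies import SimpleCookie

def set_cookie_header(name: str, value: str, samesite: str = "Lax",
                      secure: bool = True) -> str:
    """Build a Set-Cookie header, refusing SameSite=None without Secure."""
    if samesite not in ("Strict", "Lax", "None"):
        raise ValueError(f"invalid SameSite value: {samesite}")
    if samesite == "None" and not secure:
        raise ValueError("SameSite=None cookies must also be Secure")
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["samesite"] = samesite
    if secure:
        cookie[name]["secure"] = True
    return cookie.output()

print(set_cookie_header("widget_session", "abc123", samesite="None"))
print(set_cookie_header("promo_shown", "1", samesite="Strict", secure=False))
```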

Web Client Performance monitoring

Key Performance Indicators (KPIs) are used to evaluate the performance of a website. It is critical that a WebRTC web page be lightweight enough to accommodate the signalling control stack JavaScript libraries used for offer/answer handling and for communicating with the signaller over open sockets or a long-polling mechanism.

Lighthouse results

The Lighthouse tab in Chrome developer tools shows relevant areas of improvement for the web page across Performance, Accessibility, Best Practices, Search Engine Optimization and Progressive Web App.

It also shows individual categories and comments.

Time to render and Page load

Page attributes in Chrome developer tools depict the page load and rendering time for every element, including scripts and markup. Specifically, they include:

  • Time to title
  • Time to render
  • Time to interact

Networking attributes should be configured based on DNS mapping and the host provider. These can be evaluated using Chrome developer tool reports.

Task interaction time

Other page interaction criteria include the frames, their interactions and their timings.

The attached screenshot shows the loading tasks, which depict the delay caused by DOM elements in transition due to user interaction. Ideally this should be minimal to keep the page responsive.

Page’s total memory



The above functions (old and new) estimate the memory usage of the entire web page.

These calls can be used to correlate new JS code with its impact on memory and subsequently find any memory leaks. These memory metrics can also be used for A/B testing.

Page weight and PRPL

Loading assets over a CDN, minifying scripts and reducing the overall weight of the page are good ways to keep the page light and responsive and to prevent Chrome tab crashes.

PRPL stands for Push (or preload), Render, Pre-cache, Lazy load:

  • Push (or preload) the most important resources.
  • Render the initial route as soon as possible.
  • Pre-cache remaining assets.
  • Lazy load other routes and non-critical assets.

Preload is a declarative fetch request that tells the browser to request a resource as soon as possible. Hence it should be used for critical assets.

<link rel="preload" as="script" href="critical.js">

Non-critical components can then be loaded asynchronously.

Lazy loading should be used for large files, such as JS payloads, that are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.

Web Workers

Web Workers are a simple means for web content to run scripts in background threads. The Worker interface spawns real OS-level threads.

By acting as a proxy, service workers can fetch assets directly from the cache rather than the server on repeat visits. 

Native Applications on mobile OS

Threads and Cores



CPU profiling

Energy consumption

References :

5G and IMS

In the evolution of RAN (Radio Access Network) technologies, 5G succeeds 4G (2010), which came after 3G (2000), 2.5G, 2G (1990) and 1G (1980). Among the most striking features of 5G are:

  • IP-based protocols
  • ability to connect 100x more devices (IoT-friendly)
  • speeds up to 10 Gbit/s
  • high peak bit rate
  • high data volume per unit area
  • ultra-low latency, hence fast response times

5G + IMS can accommodate the rapid growth of rich multimedia applications like OTT streaming of HD content, gaming, augmented reality and so on, while enabling devices connected to the Internet of Things to onboard the telecommunication backbone with high system spectral efficiency and ubiquitous connectivity.


In fact, 5G saw the most investment in 2020 for revamping infrastructure, compared to other technologies such as IoT or even cloud. This could be partly due to the surge in demand for high-speed communication for streaming and remote communication, owing to the steep rise in remote learning and work-from-home scenarios.

Image source: Statista – global-telecom-industry-priority-investment-areas


5G is specified to operate over a range from below 1 GHz up to 100 GHz:

  • low-band spectrum (below 2.5 GHz) – excellent coverage,
  • mid-band spectrum (2.5–10 GHz) – a combination of good coverage and very high bitrates,
  • high-band spectrum (10–100 GHz) – the bandwidths needed for the highest bitrates (up to 20 Gb/s) and lowest latencies

Workplan for 5G standardisation and release

The work plan started in 2014 and was still ongoing as of 2018.

Image source: 3GPP “Getting ready for 5G”

3GPP is the standards body for mobile telecom and has previously specified most RAN technologies, such as GSM, GPRS, W-CDMA, UMTS, EDGE, HSPA and LTE.

5G Core Network

5G Core Network like LTE

5G + IMS

SDN + NFV for 5G deployment

SDN separates the virtualized network infrastructure from its logical architecture, which automates configuration for routing, security, etc.

It also helps in the management of infrastructure for scaling and availability.

Software-defined Networking (SDN) and Network Functions Virtualization (NFV) are advancing the deployment of 5G systems. The separation of user and control planes makes the system very modular, thereby broadening its applicability to various traffic types:

  • IMS signalling
  • Smart city sensors, cameras
  • Web services
  • Self-driving cars
  • Real-Time Communications / VoIP
  • Augmented Reality (AR), Virtual Reality (VR)
  • Real Time Gaming
  • Mission Critical Data / Push to Talk (MCPTT)
  • buffered streaming (non-conversational video)

Dynamic Network Slicing

Network slicing allows mobile operators to partition a single network into multiple virtual networks. This allows a network operator to use one physical network to cater to many kinds of service networks with varying use cases around bandwidth, network latency, processing, resiliency and business requirements.

Dynamic network slicing allows network resources like radio networks, wire access, core, transport and edge networks to be divided into multiple logical networks to meet the requirements of diverse use cases. [2]

  • Horizontal Slicing (Infrastructure Sharing): the virtual infrastructure is shared between different tenants for control and operations (think IaaS)
  • Vertical Slicing (QoS Slicing): creating service instances

Service Based Architecture (SBA)

Virtualization and slicing allow us to create Service Based Architectures (SBA). This allows control and user plane separation (CUPS). It also allows separation between the access and core networks.

The modular function design allows concurrent access to services as well as decoupling of stateless processors from the stateful backend (database).

  • (+) network capability exposure
  • (+) scalability
  • (+) redundancy

Applications of 5G

5G targets three main use cases:

  • enhanced mobile broadband (eMBB),
  • massive machine type communications (mMTC)
  • ultra-reliable low latency communications (URLLC) (also called critical machine type communications (cMTC))
Source: Ericsson whitepaper


General Data Protection Regulation (GDPR) in VoIP

GDPR, Europe’s digital privacy legislation that came into effect in 2018, replaces the 1995 EU Data Protection Directive. It is a set of rules designed to give EU citizens more control over their personal data and to strengthen privacy rights. It aims to simplify the regulatory environment for businesses and citizens.

Certificates, compliance and security requirements relevant to VoIP include:

  • HIPAA (Health Insurance Portability and Accountability Act),
  • SOX (Sarbanes-Oxley Act of 2002),
  • privacy-related compliance certificates like COPPA (Children’s Online Privacy Protection Act) of 1998,
  • CPNI (Customer Proprietary Network Information) 2007,
  • GDPR (General Data Protection Regulation) in the European Union 2018,
  • California Consumer Privacy Act (CCPA) 2019,
  • Personal Data Protection Bill (PDP) – India 2018, and
  • specifications against robocalls and SPIT (Spam over Internet Telephony), among others

Multinational companies will predominantly be regulated by the supervisory authority where they have their “main establishment” or headquarter. However, the issue concerning GDPR is that it not only applies to any organisation operating within the EU, but also to any organisations outside of the EU which offer goods or services to customers or businesses in the EU.

Key Principles of GDPR are

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality (security)
  • Accountability

GDPR consists of the following projects (DPO, impact assessment, portability, notification of violations, consent, profiling, certification and lead authority) that strengthen the control of personal data throughout the European Union.


Stakeholders of the data protection regulation are:
Data Subject – an individual, a resident of the European Union, whose personal data are to be protected

Data Controller – an institution, business or a person processing the personal data e.g. e-commerce website.

Data Protection Officer – a person appointed by the Data Controller responsible for overseeing data protection practices.

Data Processor – a subject (company, institution) processing data on behalf of the controller. It can be an online CRM app or a company storing data in the cloud.

Data Authority – a public institution monitoring implementation of the regulations in the specific EU member country.

Extra-Territorial Scope

Any VoIP service provider may feel that, since it is not based in the EU (for example, officially headquartered in the Asia Pacific or US region), it is not legally bound by GDPR. However, GDPR expands the territorial and material scope of EU data protection law. It applies both to controllers and processors established in the EU and to those outside the EU who offer goods or services to, or monitor the behaviour of, EU data subjects.

VoIP service providers as Data Processors

A processor is a “person, public authority, agency or other body which processes personal data on behalf of the controller”.
Most VoIP service providers are multinational in nature, with services offered directly or indirectly to all regions. The GDPR imposes direct statutory obligations on data processors, which means they will be subject to direct enforcement by supervisory authorities, fines, and compensation claims by data subjects. However, a processor’s liability will be limited to the extent that it has not complied with its statutory and contractual obligations.

Data minimization – It is now good practice to store and process only as much of a user’s personal data as is necessary to render our services effectively, and to retain data only for a stipulated time (approx. 90 days of CDRs for call details and logs).
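A retention rule like the roughly 90-day window above can be applied as a periodic purge job. The sketch assumes a hypothetical minimal CDR record with a call ID and a timezone-aware end timestamp held in memory; a real system would run the equivalent delete against its CDR store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

@dataclass(frozen=True)
class CDR:              # hypothetical minimal call detail record
    call_id: str
    ended_at: datetime  # timezone-aware UTC timestamp

def purge_expired(cdrs: List[CDR], now: datetime,
                  retention: timedelta = timedelta(days=90)) -> Tuple[List[CDR], List[CDR]]:
    """Split records into (kept, purged) using the retention window."""
    cutoff = now - retention
    kept = [c for c in cdrs if c.ended_at >= cutoff]
    purged = [c for c in cdrs if c.ended_at < cutoff]
    return kept, purged

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
records = [CDR("call-1", now - timedelta(days=10)),
           CDR("call-2", now - timedelta(days=120))]
kept, purged = purge_expired(records, now)
print([c.call_id for c in kept], [c.call_id for c in purged])  # ['call-1'] ['call-2']
```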

Record Keeping, Accountability and governance

To show compliance with GDPR, a service provider must maintain detailed records of processing activities. It must also implement technological and organisational measures to ensure, and be able to demonstrate, that processing is performed in accordance with the GDPR. Some ways to apply these are:

  • Contracts: putting written contracts in place with organisations that process personal data on your behalf
  • maintaining documentation of your processing activities
  • Organisational policies focus on Data protection by design and default – two-factor auth, strong passwords to guard against brute-force, encryption, focus on security in architecture
  • Risk analysis and impact assessments: for uses of personal data that are likely to result in a high risk to individuals’ interests
  • Audit by Data protection officer
  • Clear Codes of conduct
  • Certifications

In a VoIP landscape, every call or message session is conveniently followed by a CDR (Call Detail Record) or MDR (Message Detail Record).

Additionally, assign a unique signature to every data-access client in the VoIP system and log every read/write operation carried out on data stores, whether persistent datastores or system caches.
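The per-client access logging described above can be sketched as a thin wrapper around a data store that records which client performed which operation on which key. The client identifiers, the in-memory dict standing in for a datastore or cache, and the log-entry format are all assumptions; a production audit trail would also need to be tamper-evident and stored separately from the data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    client_id: str
    operation: str  # "read" or "write"
    key: str

class AuditedStore:
    """Key-value store wrapper that logs every read and write per client."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.audit_log: List[AuditEntry] = []

    def _log(self, client_id: str, operation: str, key: str) -> None:
        self.audit_log.append(
            AuditEntry(datetime.now(timezone.utc), client_id, operation, key))

    def read(self, client_id: str, key: str) -> Optional[Any]:
        self._log(client_id, "read", key)
        return self._data.get(key)

    def write(self, client_id: str, key: str, value: Any) -> None:
        self._log(client_id, "write", key)
        self._data[key] = value

store = AuditedStore()
store.write("billing-service", "cdr:call-1", {"duration_s": 42})
store.read("support-dashboard", "cdr:call-1")
for e in store.audit_log:
    print(e.client_id, e.operation, e.key)
```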

Privacy Notices to Subjects

User profile data such as :

  • Basic identity information, name, address and ID numbers
  • Web data such as location, IP address, cookie data and RFID tags
  • Health and genetic data
  • Bio-metric data
  • Racial or ethnic data
  • Political opinions
  • Sexual orientation

is protected strictly under GDPR rules

A service provider should provide in-depth information to data subjects when collecting their personal data, to ensure fairness and transparency. It must provide the information in an easily accessible form, using clear and plain language.


The GDPR introduces a higher bar for relying on consent, requiring clear affirmative action. Silence, pre-ticked boxes or inactivity will not be sufficient to constitute consent. Data subjects can withdraw their consent at any time, and it must be easy for them to do so.

Lawful bases for processing data

Article 6 of the GDPR sets out six available lawful bases for processing:

(a) Consent: the individual has given clear consent for you to process their personal data for a specific purpose.

(b) Contract: the processing is necessary for a contract you have with the individual, or because they have asked you to take specific steps before entering into a contract.

(c) Legal obligation: the processing is necessary for you to comply with the law (not including contractual obligations).

(d) Vital interests: the processing is necessary to protect someone’s life.

(e) Public task: the processing is necessary for you to perform a task in the public interest or for your official functions, and the task or function has a clear basis in law.

(f) Legitimate interests: the processing is necessary for your legitimate interests or the legitimate interests of a third party, unless there is a good reason to protect the individual’s personal data which overrides those legitimate interests.

Files such as PCAPs, recordings and transcripts of calls hold sensitive information from end users; these should be encrypted and inaccessible even to the dev teams within the organisation without the explicit consent of the end user.

Individuals’ Rights

The GDPR provides data subjects with new and enhanced rights, giving them more control over the processing of their personal data. A data subject access request can only be refused if it is manifestly unfounded or excessive, in particular because of its repetitive character.

Rights of data subjects include:

  • Right of Access
  • Right to Rectification
  • Right to Be Forgotten
  • Right to Restriction of Processing
  • Right to Data Portability
  • Right to Object
  • Right to Object to Automated Decision-making

For a VoIP service provider, if a user opts for redaction, none of their calls or messages should be traceable in logs. Distinguishable end-user identifiers such as phone numbers and SIP URIs should also be replaced with *** characters.
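A minimal sketch of such log redaction, assuming log lines are plain strings, phone numbers appear in E.164-like form ("+" followed by digits) and SIP URIs start with `sip:` or `sips:`. The patterns and the `redact` helper are illustrative assumptions, not an exhaustive identifier matcher.

```python
import re

# Illustrative patterns (assumptions, not exhaustive): sip:/sips: URIs and
# E.164-style numbers written as "+" followed by 7 to 15 digits.
SIP_URI = re.compile(r"\b(sips?):[^\s;>\"]+")
PHONE = re.compile(r"\+\d{7,15}\b")


def redact(line: str) -> str:
    """Mask SIP URIs first (keeping the scheme), then any remaining phone numbers."""
    line = SIP_URI.sub(r"\1:***", line)
    return PHONE.sub("***", line)


print(redact('INVITE sip:+14155551212@example.com SIP/2.0'))
# INVITE sip:*** SIP/2.0
print(redact('From: "+18668675310" <sips:alice@example.com>;tag=1'))
# From: "***" <sips:***>;tag=1
```

Applying the redaction at the logging layer, before lines are written, keeps identifiers out of every downstream log store.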

Provide an option for “Account Deletion” that purges the account: if users wish to close their account, their details should be deleted from the system, except for the bare-bones details that are otherwise required for legal, taxation and accounting purposes.

Breach Notification

A controller is a “person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of processing of personal data”.

A controller has a mandatory obligation to notify its supervisory authority of a data breach within 72 hours of becoming aware of it, unless the breach is unlikely to result in a risk to the rights of data subjects. It must also notify affected data subjects where the breach is likely to result in a “high risk” to their rights. A processor, however, is only obliged to report data breaches to controllers.

International Data Transfers

Data transfers to countries outside the EEA (European Economic Area) continue to be prohibited unless that country ensures an adequate level of protection. The GDPR retains existing transfer mechanisms and provides for additional mechanisms, including approved codes of conduct and certification schemes.

The GDPR prohibits any non-EU court, tribunal or regulator from ordering the disclosure of personal data from EU companies unless it requests such disclosure under an international agreement, such as a mutual legal assistance treaty.

One of the biggest challenges for a service provider is identifying and categorising GDPR-impacted data sets in disparate locations across the enterprise. A dev team must flag the tables, attributes and other data objects that are categorically covered by GDPR regulations, and then ensure that they are not transferred to a server outside the EU.
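One way to make such flags enforceable is to keep them in a data catalog and check it before any export or replication job runs. The sketch below assumes a hypothetical catalog (`GDPR_COLUMNS`), hypothetical table and column names, and hypothetical cloud region names treated as EU/EEA; it only illustrates the "flag, then guard the transfer" idea.

```python
from typing import Iterable, List

# Hypothetical data catalog: (table, column) pairs holding GDPR-covered personal data.
GDPR_COLUMNS = {
    ("cdr", "caller_number"),
    ("cdr", "callee_number"),
    ("recordings", "audio_uri"),
}
# Hypothetical region names treated as inside the EU/EEA.
EU_REGIONS = {"eu-west-1", "eu-central-1", "eu-north-1"}


class TransferBlocked(Exception):
    """Raised when flagged personal data would leave the EU/EEA."""


def check_transfer(table: str, columns: Iterable[str], dest_region: str) -> List[str]:
    """Return the flagged columns, or raise if they would be sent outside the EU."""
    flagged = sorted(c for c in columns if (table, c) in GDPR_COLUMNS)
    if flagged and dest_region not in EU_REGIONS:
        raise TransferBlocked(f"{table}.{flagged} must not be sent to {dest_region}")
    return flagged
```

Non-personal columns such as call duration pass through to any region, while a job copying `cdr.caller_number` to a non-EU region fails before any data moves.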

In the present age of virtual shared server instances, cloud computing and VoIP, it is operationally very difficult for a communication service provider to ensure that data is not transferred outside the EU. For example, a VoIP call originating in the US with a destination in the EU requires information exchanges via SDP, vCard, RTP streams through media proxies, etc.


The GDPR provides supervisory authorities with wide-ranging powers to enforce compliance, including the power to impose significant fines. You may face fines of up to €20m or 4% of your total worldwide annual turnover of the preceding financial year, whichever is higher. In addition, data subjects can sue you for pecuniary or non-pecuniary damages (i.e. distress). Supervisory authorities have discretion as to whether to impose a fine and the level of that fine.
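For the upper tier of fines (GDPR Article 83(5)), the ceiling is the greater of the fixed amount and the turnover-based amount, so the turnover figure decides which one binds. A small worked sketch, with a hypothetical helper name:

```python
def max_gdpr_fine_eur(annual_turnover_eur: int) -> float:
    """Upper-tier GDPR cap: the higher of EUR 20 million or 4% of the total
    worldwide annual turnover of the preceding financial year."""
    return max(20_000_000.0, annual_turnover_eur * 4 / 100)


print(max_gdpr_fine_eur(100_000_000))    # 4% is only EUR 4m, so the EUR 20m cap applies
print(max_gdpr_fine_eur(2_000_000_000))  # 4% is EUR 80m, which exceeds EUR 20m
```

This is only the statutory ceiling; the actual fine is set by the supervisory authority.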

Data Protection officer (DPO)

Under the terms of GDPR, an organisation must appoint a Data Protection Officer (DPO) if it carries out large-scale processing of special categories of data, carries out large scale monitoring of individuals such as behaviour tracking or is a public authority.


Media Architecture, RTP topologies

With the sudden onset of Covid-19 and the growing trend of working from home, the demand for scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution.

This article is about media server setups that provide mid- to high-scale conferencing over SIP to various endpoints, including SIP softphones, PBXs, carrier/PSTN and WebRTC.

Point to Point

Endpoints communicate over unicast. RTP and RTCP traffic is private between sender and receiver, even if the endpoints contain multiple SSRCs in the RTP session.

Advantages and disadvantages of P2P

  • (+) Facilitates private communication between the parties
  • (-) The only limits on the number of streams between the participants are physical ones, such as bandwidth and the number of available ports

Point to Point via Middlebox

Same as above, but with a middlebox involved. Middlebox types are:

Translator

Mostly used for interoperability between non-interoperable endpoints, for example transcoding codecs or converting transport. It does not use an SSRC of its own and keeps the SSRC of an RTP stream across the translation.

Subtypes of translator middlebox:

Transport/Relay Anchoring

Roles like NAT traversal by pinning the media path to a public address domain relay or TURN server

Middleboxes for auditing or privacy control of participant’s IP

Other SBC (Session Border Controller)-like characteristics are also part of this topology setup.

Transport translator

interconnecting networks like multicast to unicast

media packetization to allow other media to connect to the session like non-RTP protocols

Media translator

Modifies the media inside RTP streams, commonly known as transcoding.

It can go as far as full decoding and re-encoding of RTP streams. In many cases it can also act on behalf of endpoints that lack the relevant RTP support, receiving and responding to feedback reports and performing FEC (forward error correction).

Back-To-Back RTP Session

Much like a translator middlebox, but it establishes a separate RTP session leg with each endpoint and bridges the two sessions.

It takes complete responsibility for forwarding the correct RTP payload and maintains the relation between the SSRCs and CNAMEs.

Advantages and disadvantages of a back-to-back RTP session

  • (+) The B2BUA / media bridge takes responsibility for relaying media and managing congestion
  • (-) It can be subjected to MITM attacks or have a backdoor to eavesdrop on conversations

Point to Multipoint using Multicast

Any-Source Multicast (ASM)

Traffic from any participant sent to the multicast group address reaches all other participants.

Source-Specific Multicast (SSM)

A selected sender streams to the multicast group, which distributes the stream to the receivers.

Point to Multipoint using Mesh

Many unicast RTP streams forming a mesh between the participants.

Point to Multipoint + Translator

Further variants of this topology are Point to Multipoint with a mixer:

Media Mixing Mixer

Receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.

Media Switching Mixer

An RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

The mixer can reduce the bitrate or switch between sources, such as the active speakers.

SFU (Selective Forwarding Unit)

The middlebox can select which of the potential sources (SSRCs) transmitting media will be sent to each of the endpoints. This transmission is set up as an independent RTP session.

Extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.

Advantages and disadvantages of SFU

  • (+) Low latency and low jitter-buffer requirements, by avoiding re-encoding
  • (+) Saves encoding/decoding CPU utilization at the server
  • (-) Unable to manage the network and control bitrate
  • (-) Creates a higher load on receivers compared with an MCU

At a high level, one can reasonably assume that, given the current average internet bandwidth, mesh architectures make sense for 3-6 peers, while any number above that requires a centralized media architecture.

Among the centralized media architectures, an SFU makes sense for at most 6-15 people in a conference; if the number of participants exceeds that, it may need to switch to MCU mode.
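The reason mesh stops scaling early is visible from the per-participant stream counts. The sketch below counts streams for one media type, ignoring simulcast/SVC layers and RTCP; the helper name is hypothetical.

```python
from typing import Tuple


def streams_per_participant(n: int, topology: str) -> Tuple[int, int]:
    """(uplink, downlink) streams per participant for one media type,
    ignoring simulcast/SVC layers and RTCP."""
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    if topology == "mesh":
        return n - 1, n - 1   # send to and receive from every other peer
    if topology == "sfu":
        return 1, n - 1       # send once, receive each forwarded peer stream
    if topology == "mcu":
        return 1, 1           # send once, receive one mixed/composited stream
    raise ValueError(f"unknown topology: {topology}")


for n in (3, 6, 15):
    print(n, {t: streams_per_participant(n, t) for t in ("mesh", "sfu", "mcu")})
```

In a mesh every participant's uplink grows with the room size, which is usually the first bottleneck; an SFU caps the uplink at one stream, and an MCU also caps the downlink at the cost of server-side mixing.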


Simulcast

The sender encodes the stream in multiple variations, and the SFU decides which endpoint should receive which variation.

Advantages and disadvantages of SFU + Simulcast

  • (+) Simulcast can ensure endpoints receive a media stream that matches their requirements/bandwidth/display
  • (-) Uplink bandwidth requirement is high
  • (-) CPU intensive for the sender, which encodes many variations of the outgoing stream
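The SFU's per-receiver decision can be sketched as "pick the best simulcast layer that fits". This assumes the SFU already has a bandwidth estimate for each receiver and knows the receiver's rendered video height; the `Layer` type, layer ids and bitrates are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    rid: str            # simulcast layer id; the names used below are illustrative
    height: int         # encoded frame height in pixels
    bitrate_kbps: int   # approximate target bitrate of the layer


def pick_layer(layers: List[Layer], available_kbps: int, max_height: int) -> Optional[Layer]:
    """Highest-bitrate layer that fits the receiver's estimated bandwidth and
    display size; None means forward no video to this receiver for now."""
    fitting = [layer for layer in layers
               if layer.bitrate_kbps <= available_kbps and layer.height <= max_height]
    return max(fitting, key=lambda layer: layer.bitrate_kbps, default=None)


layers = [Layer("q", 180, 150), Layer("h", 360, 500), Layer("f", 720, 1500)]
print(pick_layer(layers, available_kbps=2000, max_height=720).rid)  # f
print(pick_layer(layers, available_kbps=600, max_height=720).rid)   # h
print(pick_layer(layers, available_kbps=2000, max_height=360).rid)  # h (small tile)
```

Because the sender already produced every layer, switching a receiver between layers costs the SFU no encoding work, which is exactly what the uplink and sender CPU pay for.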

SVC (Scalable Video Coding)

Encodes in multiple layers based on various modalities, such as:

  • Signal-to-noise ratio
  • Temporal
  • Spatial
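With temporal scalability, an SFU can lower a receiver's frame rate simply by not forwarding the higher temporal layers, with no re-encoding. This sketch models each frame as `(frame_index, temporal_layer_id)` using a common 3-layer pattern; in practice the layer id comes from the codec payload descriptor or an RTP header extension, and the SFU must also rewrite RTP sequence numbers after dropping packets, which is omitted here.

```python
from typing import List, Tuple

# A frame is modeled as (frame_index, temporal_layer_id).
Frame = Tuple[int, int]


def filter_temporal(frames: List[Frame], max_tid: int) -> List[Frame]:
    """Forward only frames whose temporal layer id is <= max_tid."""
    return [f for f in frames if f[1] <= max_tid]


# Common 3-layer temporal pattern: T0 T2 T1 T2 | T0 T2 T1 T2 ...
stream = list(enumerate([0, 2, 1, 2] * 2))
half_rate = filter_temporal(stream, max_tid=1)     # T0 + T1: half the frame rate
quarter_rate = filter_temporal(stream, max_tid=0)  # T0 only: a quarter of the frame rate
```

Spatial and quality (SNR) layers are dropped in the same spirit, but which packets can be removed depends on the codec's layer dependencies.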

Hybrid Topologies

There are various topologies for multi-endpoint conferences. Hybrid topologies include forwarding video while mixing audio, or auto-switching between configurations as load increases or decreases, or depending on whether the user is on a paid premium or a free plan.

Hybrid model of forwarding and mixed streams

Some endpoints receive forwarded streams while others receive mixed/composited streams.

Serverless models

Centralized topology in which one endpoint serves as an MCU or SFU.

Used by Jitsi and Skype

Point to Multipoint Using Video-Switching MCUs

Much like an MCU, but it can switch the stream's bitrate and resolution based on the active speaker, host or presenter, and floor-control-like characteristics.

This setup can embed the characteristics of a translator and a selector, and can even do congestion control based on RTCP.

To handle a multipoint conference scenario, it acts as a translator forwarding the selected RTP stream under its own SSRC, with the appropriate CSRC values, and modifies the RTCP RRs it forwards between the domains.

Cascaded SFUs

Chaining SFUs reduces latency while also enabling scalability; however, it takes a toll on server network as well as endpoint resources.

Transport Protocols

Before getting into an in-depth discussion of all possible types of Media Architectures in VoIP systems, let us learn about TCP vs UDP.

TCP is a reliable, connection-oriented protocol that establishes a connection between the communicating parties with a handshake (SYN, SYN-ACK, ACK). It sends packets in sequence, and lost packets are retransmitted individually when the receiver detects gaps or out-of-order delivery. Because of its error-recovery and congestion-control features, it is used for session creation.

Once a session is established, the media itself is carried as RTP over UDP. Although UDP is less reliable (it does not guarantee delivery, ordering or non-duplication), it avoids retransmission delays that are useless for real-time media, and its simple datagram framing makes it easy to encapsulate (tunnel) packets of other protocols inside UDP packets. Since UDP provides no security of its own, other methods for authentication and encryption are used to provide end-to-end security.
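To make "RTP over UDP" concrete: each UDP datagram's payload starts with the 12-byte fixed RTP header defined in RFC 3550 (section 5.1), optionally followed by CSRC identifiers added by mixers. A minimal parser sketch (header extensions and padding removal are not handled):

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: List[int]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header (RFC 3550, section 5.1) from a UDP payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # number of CSRC entries that follow
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return RtpHeader(version=b0 >> 6, padding=bool(b0 & 0x20),
                     extension=bool(b0 & 0x10), marker=bool(b1 & 0x80),
                     payload_type=b1 & 0x7F, sequence=seq, timestamp=ts,
                     ssrc=ssrc, csrcs=csrcs)
```

The payload type ties the packet back to the codec negotiated in SDP, while the sequence number and timestamp are what receivers use for loss detection and jitter estimation.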

Audio PCAP storage and Privacy constraints for Media Servers

A call session produces various traces for offline monitoring and analysis, which can include:

CDR (Call Detail Records) – to and from numbers, ring time, answer time, duration, etc.

Signalling PCAPs – usually collected from the SIP application server, containing the SIP requests, SDP and responses. They show the call flow sequence: for example, who sent the INVITE and who sent the BYE or CANCEL, how many times the call was updated or paused/resumed, etc.

Media stats – jitter, buffer, RTT and MOS for all legs, along with average values

Audio PCAPs – the recording of the RTP stream and RTCP packets between the parties, which requires explicit consent from the customer or user. VoIP companies complying with GDPR cannot, on their own initiative, record the audio stream of calls and preserve it for any purpose, such as audits, call-quality debugging or internal inspection.
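The jitter value in the media stats is normally the RTP interarrival jitter from RFC 3550 (section 6.4.1): for consecutive packets, D is the change in relative transit time, and the running estimate is updated as J = J + (|D| - J)/16. A minimal sketch, assuming arrival times in seconds and RTP timestamps from a single SSRC in arrival order:

```python
from typing import Iterable, Tuple


def interarrival_jitter(samples: Iterable[Tuple[float, int]], clock_rate: int) -> float:
    """Interarrival jitter estimate as defined in RFC 3550, section 6.4.1.

    samples: (arrival_time_seconds, rtp_timestamp) per received packet, in order.
    Returns the jitter in RTP timestamp units (divide by clock_rate for seconds).
    """
    jitter = 0.0
    prev_transit = None
    for arrival, rtp_ts in samples:
        transit = arrival * clock_rate - rtp_ts   # relative transit time in ts units
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0         # exponential smoothing with gain 1/16
        prev_transit = transit
    return jitter


# 20 ms G.711 packets (160 timestamp units at 8 kHz); the second packet is 10 ms late.
j = interarrival_jitter([(0.0, 0), (0.03, 160), (0.04, 320)], clock_rate=8000)
```

Only differences in transit time matter, so the unknown clock offset between sender and receiver cancels out.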

Throwing more light on audio PCAP storage: assuming the user provides explicit permission to record, here is the approach for carrying out the recording and storage operations.

Furthermore, strict access control, encryption and anonymisation of the media packets are necessary to obfuscate the details of the call session.
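For the identifiers that accompany stored traces (phone numbers, SIP URIs), one option is keyed hashing: the same caller always maps to the same token, so traces can still be correlated, but the token does not reveal the number. This is a sketch using the standard library's HMAC-SHA256; the key handling, the `anon-` prefix and the 16-character truncation are assumptions. Note that this is pseudonymisation rather than full anonymisation: whoever holds the key can re-link tokens by recomputing them, so the key must be stored separately and access-controlled.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a phone number or SIP URI with a keyed, non-reversible token.

    The same identifier always yields the same token under one key, so traces
    can still be grouped per caller without exposing who the caller is.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:16]


token = pseudonymise("+14155551212", key=b"kept-in-a-separate-secrets-store")
```

Rotating the key breaks linkability between old and new traces, which can itself be used as a retention control.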

References :

To learn about the differences between media server topologies:

  • centralized vs decentralized
  • SFU vs MCU
  • multicast vs unicast

Read – SIP conferencing and Media Bridge

SIP conferencing and Media Bridges

SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP is expected not only to set up multimedia streams but also to provide conference control for building new, customisable communication and collaboration apps.

To read more about building a scalable VoIP server-side architecture, including:

  • Clustering the servers with a common cache for high availability and prompt failure recovery
  • Multi-tier architecture, i.e. separation between the data/session layer and the application server/engine layer
  • Microservice-based architecture, i.e. separation between proxies like load balancers, SBCs, backend services, OSS/BSS, etc.
  • Containerization and autoscaling

Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

VoIP/ OTT / Telecom Solution startup’s strategy for building a scalable flexible SIP platform

Scalable and flexible platform. Let’s go in depth into how one can achieve scalability in SIP platforms: multi-geography scaling via a universal router, clustered SIP telephony servers for high availability, multi-tier cluster architecture, role abstraction / microservice-based architecture, distributed event management and event-driven architecture, containerization, autoscaling, security, policies and market differentiators, and ticketing and issue tracking.

WebRTC Audio/Video Codecs

Codecs handle the compression and decompression of a media stream. For peers to exchange media successfully, they need to agree on a common set of codecs for the session. The lists of codecs are exchanged between the peers as part of the offer and answer, i.e. the SDP in SIP.
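A simplified view of that agreement: the usable codecs are the ones present in both lists, typically kept in the offerer's order of preference. This sketch compares codec names only; real offer/answer (RFC 3264) also matches clock rates, channels and fmtp parameters, and an answerer may reorder. The helper name is hypothetical.

```python
from typing import List


def negotiate(offered: List[str], supported: List[str]) -> List[str]:
    """Codecs from the offer that the answerer also supports, in the offerer's
    order of preference (names compared case-insensitively)."""
    local = {name.lower() for name in supported}
    return [name for name in offered if name.lower() in local]


common = negotiate(["opus", "ISAC", "G722", "PCMU", "PCMA"], ["PCMA", "PCMU", "opus"])
print(common)  # ['opus', 'PCMU', 'PCMA']
```

An empty result means there is no common codec, and a SIP UA would typically reject the offer (for example with 488 Not Acceptable Here) unless a media gateway transcodes.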

WebRTC provides container-less, bare MediaStreamTrack objects, and WebRTC itself does not mandate the codecs for these tracks. Instead, the codecs are specified by two separate RFCs:

  1. RFC 7874 WebRTC Audio Codec and Processing Requirements specifies at least the Opus codec as well as G.711’s PCMA and PCMU formats.
  2. RFC 7742 WebRTC Video Processing and Codec Requirements specifies support for VP8 and H.264’s Constrained Baseline profile for video.

In WebRTC, media is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to discuss the audio/video codec processing requirements only.

WebRTC is free and open source, and its working bodies also promote royalty-free codecs. The IETF RTCWEB working group defines which codecs are mandatory to implement, while other codecs remain optional; the requirements differ between WebRTC browsers and WebRTC non-browsers.

WebRTC Browsers MUST implement the VP8 video codec as described in [RFC6386] and H.264 Constrained Baseline as described in [H264].

RFC 7742 WebRTC Video Processing and Codec Requirements
Media Flow in WebRTC Call

WebRTC Video Codecs

Most of the codecs below use lossy, DCT (discrete cosine transform) based algorithms for encoding. A sample SDP offer from the Chrome browser v80 for Linux includes these profiles:

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123

a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 profile-id=0
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100

a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:122 rtx/90000
a=fmtp:122 apt=102

a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:121 rtx/90000
a=fmtp:121 apt=127

a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125

a=rtpmap:108 H264/90000
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 red/90000
a=rtpmap:120 rtx/90000
a=fmtp:120 apt=124
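Reading an offer like the one above is mostly a matter of joining `a=rtpmap` and `a=fmtp` lines by payload type; each `rtx` payload type points at the codec it retransmits through `apt=` (for example 97 → 96 for VP8). A minimal sketch that handles only these two attribute types:

```python
import re
from typing import Dict

RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)")
FMTP = re.compile(r"^a=fmtp:(\d+) (.+)$")


def parse_codecs(sdp: str) -> Dict[int, dict]:
    """Map each payload type to its codec name, clock rate and fmtp parameters.
    For rtx entries, 'retransmits' holds the payload type named by apt=."""
    codecs: Dict[int, dict] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        m = RTPMAP.match(line)
        if m:
            entry = codecs.setdefault(int(m[1]), {"params": {}})
            entry.update(name=m[2], clock=int(m[3]))
            continue
        m = FMTP.match(line)
        if m:
            entry = codecs.setdefault(int(m[1]), {"params": {}})
            for param in m[2].split(";"):
                key, sep, value = param.strip().partition("=")
                if sep:
                    entry["params"][key] = value
    # Resolve retransmission payload types to the primary codec they protect.
    for entry in codecs.values():
        if entry.get("name") == "rtx" and "apt" in entry["params"]:
            entry["retransmits"] = int(entry["params"]["apt"])
    return codecs
```

The same joining works for the audio section, where for example payload type 111 carries `opus/48000/2` with `minptime=10;useinbandfec=1`.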


VP8

Developed by On2 Technologies, then acquired and open-sourced by Google.

libvpx encoder library.

  • Supported containers – 3GP, Ogg, WebM
  • (+) Supports simulcast
  • (+) Now free of royalty fees

No limit on frame rate or data rate and provides maximum resolution of 16384×16384 pixels.

VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes. Pixels are encoded and decoded with an implied 1:1 (square) aspect ratio.


VP9

Video Processor 9 (VP9) is the successor to the older VP8 and is comparable to HEVC, as both achieve similar bitrates.

  • Supported containers – MP4, Ogg, WebM
  • (+) Open and free of royalties and any other licensing requirements

H264/AVC constrained

AVC’s Constrained Baseline Profile (CBP) is compliant with WebRTC.

  • Proprietary, patented codec, maintained by MPEG / ITU

Relevant profiles are H.264 Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3. Constrained Baseline is a subset of the Main profile, suited to low-delay, low-complexity use on devices with less processing power, such as mobile video.

Multiview Video Coding – can carry multiple views of the same scene, such as stereoscopic video.

Other profiles, which are not supported, are Baseline (BP), Extended (XP), Main (MP), High (HiP), Progressive High (ProHiP), High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive.

  • supported containers are 3GP, MP4, WebM

Parameter settings:

  • packetization-mode
  • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
  • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
  • Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )
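The Chrome offer above carries two H.264 variants that differ only in `profile-level-id` (42001f and 42e01f). Per RFC 6184, that value is three hex bytes: profile_idc, profile-iop (constraint flags) and level_idc. This sketch only distinguishes Baseline from Constrained Baseline for profile_idc 0x42, reports other profiles numerically, and ignores special cases such as level 1b.

```python
def describe_profile_level_id(plid: str) -> str:
    """Decode an H.264 profile-level-id (RFC 6184): profile_idc, profile-iop, level_idc."""
    raw = bytes.fromhex(plid)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be 3 bytes (6 hex digits)")
    profile_idc, profile_iop, level_idc = raw
    if profile_idc == 0x42:
        # constraint_set1_flag (0x40) marks the Constrained Baseline subset
        profile = "Constrained Baseline" if profile_iop & 0x40 else "Baseline"
    else:
        profile = f"profile_idc {profile_idc}"
    return f"{profile}, level {level_idc // 10}.{level_idc % 10}"


print(describe_profile_level_id("42e01f"))  # Constrained Baseline, level 3.1
print(describe_profile_level_id("42001f"))  # Baseline, level 3.1
```

Both variants in the offer are level 3.1; each is also offered with packetization-mode 0 and 1, which is why four H.264 payload types appear.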

AV1 (AOMedia Video 1)

An open format designed by the Alliance for Open Media. It is royalty-free and designed especially for internet video, such as the HTML video element and WebRTC.

  • higher data compression rates than VP9 and H.265/HEVC

Offers 3 profiles with increasing support for color depths and chroma subsampling:

  1. Main
  2. High
  3. Professional

  • supports HDR
  • supports Variable Frame Rate
  • Supported container are ISOBMFF, MPEG-TS, MP4, WebM

Stats for Video based media stream track

timestamp 04/05/2020, 14:25:59
ssrc 3929649593
isRemote false
mediaType video
kind video
trackId RTCMediaStreamTrack_sender_2
transportId RTCTransport_0_1
codecId RTCCodec_1_Outbound_96
[codec] VP8 (payloadType: 96)
firCount 0
pliCount 9
nackCount 476
qpSum 912936
[qpSum/framesEncoded] 32.86666666666667
mediaSourceId RTCVideoSource_2
packetsSent 333664
[packetsSent/s] 29.021823604499957
retransmittedPacketsSent 0
bytesSent 342640589
[bytesSent/s] 3685.7715977714947
headerBytesSent 8157584
retransmittedBytesSent 0
framesEncoded 52837
[framesEncoded/s] 30.022576142586164
keyFramesEncoded 31
totalEncodeTime 438.752
[totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
totalEncodedBytesTarget 335009905
[totalEncodedBytesTarget/s] 3602.7091371103397
totalPacketSendDelay 20872.8
[totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
qualityLimitationReason bandwidth
qualityLimitationResolutionChanges 20
encoderImplementation libvpx
Graph for Video Track in chrome://webrtc-internals
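The bracketed entries such as [bytesSent/s] and [qpSum/framesEncoded] are derived values, and they appear to be computed over the interval between consecutive samples rather than over the whole call: the cumulative ratio qpSum / framesEncoded = 912936 / 52837 is about 17.3, while the bracketed value shows 32.87. A sketch of that derivation from two stats samples, assuming each sample is a dict of cumulative counters plus a `timestamp` in milliseconds (as in RTCStats objects); the helper names are hypothetical.

```python
from typing import Mapping


def rate_per_second(prev: Mapping[str, float], curr: Mapping[str, float], key: str) -> float:
    """Per-second rate of a cumulative counter between two stats samples."""
    dt_s = (curr["timestamp"] - prev["timestamp"]) / 1000.0
    if dt_s <= 0:
        raise ValueError("samples must be in increasing timestamp order")
    return (curr[key] - prev[key]) / dt_s


def interval_avg_qp(prev: Mapping[str, float], curr: Mapping[str, float]) -> float:
    """Average QP of the frames encoded between the two samples."""
    frames = curr["framesEncoded"] - prev["framesEncoded"]
    return (curr["qpSum"] - prev["qpSum"]) / frames if frames else 0.0
```

Interval values react to current conditions (here qualityLimitationReason is bandwidth), while the raw counters only show call-lifetime totals.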

Non WebRTC supported Video codecs

Need active realtime media transcoding


H.263

Already used for video conferencing on PSTN (Public Switched Telephone Networks), RTSP and SIP (IP-based videoconferencing) systems.

  • Suited for low-bandwidth networks
  • (-) Not compatible with WebRTC
    • but many media gateways include real-time transcoding between H.263-based SIP systems and VP8-based WebRTC ones, enabling video communication between them

H.265 / HEVC

A proprietary format covered by a number of patents. Licensing is managed by MPEG LA.

  • Container – MP4

Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints

With the rise of the Internet of Things, many endpoints, especially IP cameras connected to Raspberry Pi-like SoCs (systems on chip), need to stream directly to the browser within their own private network, or even over a public network using TURN/STUN.

The figure below shows how such a call flow is possible between an IP camera (such as a baby cam) and a parent monitoring it from a WebRTC-supported mobile phone browser. The process includes streaming the content from the IoT device as an RTSP stream and real-time transcoding between H.264 and VP8.

Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints

WebRTC Audio Codecs


WebRTC endpoints are required to implement the audio codecs Opus and PCMA/PCMU, along with Comfort Noise and DTMF events.

Trace of the audio codecs supported in Chrome (Version 80.0.3987.149 (Official Build) (64-bit) on Ubuntu):

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000


Opus

Opus is a lossy audio compression format developed by the Internet Engineering Task Force (IETF), targeting a broad range of interactive real-time applications over the Internet, from speech to music, and supports multiple compression algorithms.

  • Constant and variable bitrate encoding – 6 kbit/s to 510 kbit/s
  • frame sizes – 2.5 ms to 60 ms
  • sampling rates – 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced).
  • container- Ogg, WebM, MPEG-TS, MP4

As an open format standardized through RFC 6716, a reference implementation is provided under the 3-clause BSD license. All known software patents which cover Opus are licensed under royalty-free terms.

  • (+) Flexible, suited for speech (via SILK) and music (via CELT)
  • (+) Supports mono and stereo
  • (+) Built-in FEC (Forward Error Correction), thus resilient to packet loss
  • (+) Adjustable compression for unpredictable networks
  • (-) Highly CPU intensive (unsuitable for embedded devices like the RPi)
  • (-) Processing and memory intensive

For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, RFC 7874 recommends that Opus be offered before PCMA/PCMU.

AAC (Advanced Audio Coding)

Part of the MPEG-4 standard. It uses lossy compression but has a number of profiles suiting each use case, from high-quality surround sound to low-fidelity audio for speech-only use.

  • supported containers – MP4, ADTS, 3GP

G.711 (PCMA and PCMU)

G.711 is an ITU standard (1972) for audio compression. It is primarily used in telephony.

The ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding. It is vital for interfacing with the standard telecom network and carriers. G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU.

It is the required standard in many voice-based systems and technologies, for example in H.320 and H.323 specifications.

  • Fixed 64 kbit/s bitrate
  • supports 3GP container formats
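The fixed 64 kbit/s follows from G.711's design: each sample is companded to an 8-bit code, at 8000 samples per second (8 × 8000 = 64000 bit/s). Below is a sketch of µ-law (PCMU) companding in the segment/mantissa style of widely used reference-style implementations (bias 132, clip at 32635); it is an illustration, not checked bit-for-bit against the ITU-T reference code.

```python
BIAS = 0x84    # 132, added before the segment search as in common G.711 implementations
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit signed PCM sample to an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not magnitude & mask:   # find the segment (highest set bit)
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF   # mu-law codes are bit-inverted


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code back to a 16-bit signed PCM sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude
```

The logarithmic segments keep the quantization step small for quiet samples and large for loud ones, which is why 8 bits per sample are enough for telephone-quality speech. A-law (PCMA) uses a similar segmented scheme with different constants.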


G.722

ITU standard (1988), encoded using Adaptive Differential Pulse Code Modulation (ADPCM), which is suited for voice compression.

  • 7 kHz wideband audio codec
  • Bitrate 48, 56 and 64 kbit/s.
  • containers used 3GP, AMR-WB

G.722 offers improved speech quality due to a wider speech bandwidth of 50-7000 Hz, compared to 300–3400 Hz for G.711.

Comfort noise (CN)

Artificial background noise used to fill gaps in a transmission instead of pure silence. It prevents jarring silences and RTP timeouts.

It should be used for streams encoded with G.711 or any other supported codec that does not provide its own CN. Use of Discontinuous Transmission (DTX) / CN by senders is optional.

Internet Low Bitrate Codec (iLBC)

An open-source narrowband speech codec for VoIP and streaming audio.

  • 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames.
  • Defined by IETF RFCs 3951 and 3952.

Internet Speech Audio Codec (iSAC)

iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. It is designed for voice transmissions which are encapsulated within an RTP stream.

  • 16 kHz or 32 kHz sampling frequency
  • adaptive and variable bit rate of 12 to 52 kbps.


Speex

A patent-free audio compression format designed for speech, and also a free software speech codec used in VoIP applications and podcasts. It is now considered obsolete, with Opus as its official successor.

AMR-WB (Adaptive Multi-Rate Wideband) is a patented wideband speech coding standard that provides improved speech quality. This codec is generally available on mobile phones.

  • Wider speech bandwidth of 50–7000 Hz
  • Data rate between 6.6 and 23.85 kbit/s

DTMF and ‘audio/telephone-event’ media type

endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.

DTMF events list
| 0 | DTMF digit “0”
| 1 | DTMF digit “1”
| 2 | DTMF digit “2”
| 3 | DTMF digit “3”
| 4 | DTMF digit “4”
| 5 | DTMF digit “5”
| 6 | DTMF digit “6”
| 7 | DTMF digit “7”
| 8 | DTMF digit “8”
| 9 | DTMF digit “9”
| 10 | DTMF digit “*”
| 11 | DTMF digit “#”
| 12 | DTMF digit “A”
| 13 | DTMF digit “B”
| 14 | DTMF digit “C”
| 15 | DTMF digit “D”
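These event codes travel in the RTP payload of the telephone-event type defined in RFC 4733: an 8-bit event code, an E (end-of-event) bit, a reserved R bit, a 6-bit volume (power level in -dBm0) and a 16-bit duration in RTP timestamp units. A minimal sketch of building that 4-byte payload; the RTP header and the usual repetition of the final (E=1) packet are left out.

```python
import struct

DTMF_EVENTS = {**{str(d): d for d in range(10)}, "*": 10, "#": 11,
               "A": 12, "B": 13, "C": 14, "D": 15}


def telephone_event_payload(key: str, end: bool, volume: int, duration: int) -> bytes:
    """Build the 4-byte RFC 4733 telephone-event payload for one DTMF key.

    volume is the power level in -dBm0 (0..63); duration is in RTP timestamp units.
    """
    if not 0 <= volume <= 63 or not 0 <= duration <= 0xFFFF:
        raise ValueError("volume must be 0..63 and duration 0..65535")
    flags = (0x80 if end else 0x00) | volume   # E bit, R bit left 0, 6-bit volume
    return struct.pack("!BBH", DTMF_EVENTS[key], flags, duration)


# Final packet for digit 5 after 100 ms at an 8000 Hz telephone-event clock.
payload = telephone_event_payload("5", end=True, volume=10, duration=800)
```

The duration must use the clock rate of the negotiated telephone-event payload type, so the same 100 ms would be 4800 units with telephone-event/48000.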

Stats for Audio Media track

Stats for Audio Media include

  • headerBytesSent
  • packetsSent
  • bytesSent
timestamp 04/05/2020, 14:25:59
ssrc 3005719707
isRemote false
mediaType audio
kind audio
trackId RTCMediaStreamTrack_sender_1
transportId RTCTransport_0_1
codecId RTCCodec_0_Outbound_111
[codec] opus (payloadType: 111)
mediaSourceId RTCAudioSource_1
packetsSent 88277
[packetsSent/s] 50.03762690431027
retransmittedPacketsSent 0
bytesSent 1977974
[bytesSent/s] 150.11288071293083
headerBytesSent 2118648
retransmittedBytesSent 0
Graphs in chrome://webrtc-internals for Audio


m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4
a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE

Stats for Datachannel

Statistics RTCDataChannel_1
timestamp 04/05/2020, 14:25:59
label sctp
datachannelid 1
state open
messagesSent 1
[messagesSent/s] 0
bytesSent 228
[bytesSent/s] 0
messagesReceived 1
[messagesReceived/s] 0
bytesReceived 228
[bytesReceived/s] 0

References:

Quick links: If you are new to WebRTC, read Introduction to WebRTC at https://telecom.altanai.com/2013/08/02/what-is-webrtc/

Layers of WebRTC at https://telecom.altanai.com/2013/07/31/webrtc/

Attacks on SIP Networks

Major standards bodies, including 3GPP, ITU-T and ETSI, have all adopted SIP as the core signalling protocol for services such as LTE, VoIP, conferencing, Video on Demand (VoD), IPTV (Internet Protocol Television), presence and Instant Messaging (IM). With the continuous evolution of SIP as the de facto VoIP protocol, we need to understand the risk-mitigation practices around it.

Types of attacks on SIP based systems

Registration Hijacking

Malicious registrations on a registrar by a third party who modifies the From header field of a SIP request.

Example implementation:
the attacker de-registers all existing contacts for a URI
the attacker can also register their own device as the appropriate contact address, thereby directing all requests for the affected user to themselves

Solution – authentication of users

Impersonating a Server

The attacker impersonates the remote server:
the user’s request can then be intercepted by some other party
the user’s request may be forwarded to insecure locations

Solution: confidentiality, integrity and authentication of proxy servers

Proxy/redirect servers and registrars SHOULD possess a site certificate issued by a CA, which can be validated by the UA.

Tampering with Message Bodies

If users are relying on SIP message bodies to communicate any of

  • session encryption keys for a media session
  • MIME bodies
  • SDP
  • encapsulated telephony signals

then attackers on a proxy server can modify the session key, or act as a man-in-the-middle and eavesdrop.

Example implementation:
the attacker can point RTP media streams to a wiretapping device
the attacker can change the Subject header field to make messages appear to users as spam

Solution – encryption over TLS (with S/MIME for end-to-end protection of bodies) + Digest Authorization

Mid-session threats, like tearing down a session

Request forging: an attacker who learns the parameters of the session, like the To and From tags, can alter ongoing session parameters and even bring the session down.

Example implementation:
the attacker inserts a BYE into an ongoing session, thereby tearing it down
the attacker can insert a re-INVITE and redirect the stream to a wiretapping device

Solution – authentication on every request;
signing and encrypting of MIME bodies, and transference of credentials with S/MIME

DOS (Denial of Service) Amplification

DoS – rendering a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.
DDoS – using multiple network hosts to flood a target host with a large amount of network traffic.

It can be created by sending falsified SIP requests to other parties, such that numerous transactions originating in the backwards direction reach the target server and create congestion. Some examples of DoS attack implementations:

  • Attackers create a falsified source IP address and a corresponding Via header field that identify a targeted host as the originator of the request, then send the request to a large number of SIP network elements. This generates a DoS aimed at the target.
  • Attackers use falsified Route header field values in a request that identify the target host, and then send such messages to forking proxies that will amplify the messaging sent to the target.
  • Flooding with REGISTER requests can deplete the available memory and disk resources of a registrar by registering huge numbers of bindings.
  • Flooding a stateful proxy server makes it consume the computational expense associated with processing each SIP transaction.

Solution – detect flooding and traffic spikes, and use IP bans to block offending sources.
Challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required), forgoing the normal response retransmission algorithm and thus behaving statelessly towards unauthenticated requests.
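The "detect flooding, then ban" idea can be sketched as a sliding-window request counter per source IP with a temporary ban. The `FloodDetector` class, its thresholds and ban duration are hypothetical and would need tuning; a production SIP proxy would usually do this in its own flood-detection modules or at the firewall, before any transaction state is created.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class FloodDetector:
    """Sliding-window request counter per source IP with a temporary ban."""

    def __init__(self, max_requests: int = 50, window_s: float = 2.0, ban_s: float = 300.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.ban_s = ban_s
        self.history: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return False if the request from ip should be dropped."""
        now = time.monotonic() if now is None else now
        if self.banned_until.get(ip, 0.0) > now:
            return False                      # still banned: drop without responding
        hits = self.history[ip]
        hits.append(now)
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()                    # forget requests outside the window
        if len(hits) > self.max_requests:
            self.banned_until[ip] = now + self.ban_s
            hits.clear()
            return False
        return True
```

Dropping banned traffic silently, rather than answering it, avoids turning the proxy itself into an amplifier for spoofed sources.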

Security mechanisms

Full encryption vs hop-by-hop encryption

SIP messages cannot be encrypted end-to-end in their entirety, since message fields such as the Request-URI, Route, and Via need to be visible to proxies in most network architectures so that SIP requests are routed correctly. Proxy servers also need to update the message, for example by adding Via headers.

Thus SIP relies on lower-layer security mechanisms for hop-by-hop encryption, along with authentication headers to verify the identity of proxy servers.
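As an illustration of why Via, Max-Forwards, and the Request-URI must stay readable, here is a hypothetical proxy forwarding step (`forwardRequest` is made up) that decrements Max-Forwards and prepends its own Via. It assumes a raw SIP message with CRLF line endings and ignores header folding and compact header forms.

```javascript
// Hypothetical proxy forwarding step: the proxy must read and rewrite headers,
// so they cannot be hidden from it by end-to-end encryption.
function forwardRequest(rawMessage, proxyHost, branch) {
  const [head, ...bodyParts] = rawMessage.split('\r\n\r\n');
  const lines = head.split('\r\n');
  const requestLine = lines[0];
  const headers = lines.slice(1);

  const mfIndex = headers.findIndex((h) => /^Max-Forwards:/i.test(h));
  if (mfIndex === -1) throw new Error('missing Max-Forwards');
  const hops = parseInt(headers[mfIndex].split(':')[1], 10);
  if (hops <= 0) return null; // a real proxy would answer 483 Too Many Hops
  headers[mfIndex] = `Max-Forwards: ${hops - 1}`;

  // the new top-most Via makes responses retrace the path through this proxy
  const via = `Via: SIP/2.0/TLS ${proxyHost};branch=${branch}`;
  return [requestLine, via, ...headers].join('\r\n') + '\r\n\r\n' + bodyParts.join('\r\n\r\n');
}
```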

Transport and Network Layer Security

IPsec – used where a set of hosts or administrative domains have an existing trust relationship with one another.

TLS – used where hop-by-hop security is required between hosts with no pre-existing trust association.


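A SIPS URI (sips: scheme) can be used as an address-of-record for a particular user. It signifies that each hop over which the request is forwarded, up to the SIP entity responsible for the domain of the Request-URI, must be secured with TLS.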
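A small sketch of how a UA or proxy might act on the URI scheme when choosing the next-hop transport. `selectTransport` and its return shape are hypothetical; 5060 and 5061 are the default ports for SIP and SIP over TLS.

```javascript
// Hypothetical next-hop transport selection: a sips: Request-URI means every
// hop must use TLS, so plain UDP/TCP is never an acceptable fallback.
function selectTransport(requestUri, preferred = 'udp') {
  const scheme = requestUri.slice(0, requestUri.indexOf(':')).toLowerCase();
  if (scheme === 'sips') {
    return { transport: 'tls', defaultPort: 5061 };
  }
  if (scheme === 'sip') {
    return { transport: preferred, defaultPort: preferred === 'tls' ? 5061 : 5060 };
  }
  throw new Error(`unsupported URI scheme: ${scheme}`);
}
```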

HTTP Authentication

SIP reuses HTTP Digest authentication: the 401 (Unauthorized) and 407 (Proxy Authentication Required) response codes carry the authentication challenge.
Digest provides replay protection and one-way authentication.
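The Digest response that a UAC computes can be sketched with Node's `crypto` module. This follows the RFC 2617 formulas (MD5, optional `qop=auth`); `digestResponse` is a hypothetical helper name.

```javascript
const crypto = require('crypto');

const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

// Digest response as defined for HTTP Digest and reused by SIP.
// With qop="auth", the nonce count and client nonce are mixed in, which is
// what lets a server detect replayed credentials.
function digestResponse({ username, realm, password, method, uri, nonce, qop, nc, cnonce }) {
  const ha1 = md5(`${username}:${realm}:${password}`);
  const ha2 = md5(`${method}:${uri}`);
  if (qop === 'auth') {
    return md5(`${ha1}:${nonce}:${nc}:${cnonce}:${qop}:${ha2}`);
  }
  return md5(`${ha1}:${nonce}:${ha2}`); // RFC 2069-compatible form, no qop
}
```

In SIP, `method` is the request method (for example REGISTER or INVITE) and `uri` is the Request-URI; the password itself never crosses the wire.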


S/MIME allows SIP UAs to encrypt MIME bodies within SIP, securing these bodies end-to-end without affecting message headers.
It provides end-to-end confidentiality and integrity for message bodies.


provides replay protection

SIP over TLS

SIP messages can be secured using TLS. There is also a variant of TLS for datagrams called DTLS.

Security of SIP signalling is different from the security of protocols used in concert with SIP, such as RTP and RTCP; those are covered in later topics of this article.

TLS operation consists of two phases: the handshake phase and the bulk data encryption phase.

Handshake phase

Negotiate the algorithms to be used during the TLS session

Server Authentication

The server sends its certificate to the client, which then verifies the certificate using a certificate authority’s (CA’s) public key.

Client Authentication

The server sends an additional CertificateRequest message to request the client’s certificate. The client responds with:

  1. a Certificate message containing the client certificate with the client public key, and
  2. a CertificateVerify message containing a digital signature over the handshake messages, signed with the client’s private key

The server authenticates the client by verifying this signature with the client’s public key, since only a client holding the correct private key can produce it.
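The CertificateVerify step can be sketched as a plain sign/verify over the handshake transcript. This is a simplification: a freshly generated RSA key stands in for the client certificate's key, and a made-up string of message names stands in for the real handshake bytes.

```javascript
const crypto = require('crypto');

// Stand-in for the client certificate's key pair; in TLS the public key
// reaches the server inside the client's Certificate message.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Hypothetical transcript: the handshake messages exchanged so far.
const transcript = Buffer.from(
  'ClientHello|ServerHello|Certificate|CertificateRequest|ServerHelloDone|Certificate|ClientKeyExchange'
);

// Client side: CertificateVerify carries a signature over the transcript.
const certificateVerify = crypto.sign('sha256', transcript, privateKey);

// Server side: only the holder of the matching private key could produce it.
const clientAuthenticated = crypto.verify('sha256', transcript, publicKey, certificateVerify);

// A modified transcript no longer matches the signature.
const tampered = Buffer.from('ClientHello|ServerHello|FORGED');
const tamperedAccepted = crypto.verify('sha256', tampered, publicKey, certificateVerify);
```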

Prepare the shared secret for bulk data encryption

The client generates a pre_master_secret and encrypts it using the server’s public key, obtained from the server’s certificate. The server decrypts the pre_master_secret using its own private key.
Both the server and the client then compute a shared master_secret based on the same pre_master_secret. The master_secret is further used to generate the shared symmetric keys for bulk data encryption and message authentication.

Public key cryptographic operations such as RSA are much more expensive than shared key cryptography. This is why TLS uses public key cryptography to establish the shared secret key in the handshake phase, and then uses symmetric key cryptography with the negotiated shared secret as the data encryption key.
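The hybrid scheme can be sketched with Node's `crypto` module: RSA is used once to transport a 48-byte pre_master_secret, and AES-GCM handles the bulk data. This is not TLS itself: HKDF is swapped in for the TLS 1.2 PRF, the `'key expansion'` label and record format are illustrative, and Node's default OAEP padding is used instead of the PKCS#1 v1.5 padding of TLS RSA key exchange.

```javascript
const crypto = require('crypto');

// Server key pair; the public half would arrive in the server's certificate.
const serverKeys = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// 1. Client: random pre_master_secret, encrypted with the server's public key.
const preMasterSecret = crypto.randomBytes(48);
const encryptedPms = crypto.publicEncrypt(serverKeys.publicKey, preMasterSecret);

// 2. Server: recovers the pre_master_secret with its private key.
const serverPms = crypto.privateDecrypt(serverKeys.privateKey, encryptedPms);

// 3. Both sides derive the same symmetric key from the shared secret and the
//    hello randoms (HKDF used here in place of the TLS PRF).
const clientRandom = crypto.randomBytes(32);
const serverRandom = crypto.randomBytes(32);
const deriveKey = (pms) =>
  Buffer.from(crypto.hkdfSync('sha256', pms, Buffer.concat([clientRandom, serverRandom]), 'key expansion', 16));
const clientKey = deriveKey(preMasterSecret);
const serverKey = deriveKey(serverPms);

// 4. Bulk data: cheap symmetric AES-128-GCM instead of RSA per message.
function sealRecord(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-128-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function openRecord(key, { iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv('aes-128-gcm', key, iv);
  decipher.setAuthTag(tag); // decryption fails if the record was altered
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString();
}

const record = sealRecord(clientKey, 'INVITE sip:bob@biloxi.example.org SIP/2.0');
const received = openRecord(serverKey, record);
```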

Stateless proxy servers do not maintain state information about the SIP session and therefore tend to be more scalable. However, many standard application functionalities, such as authentication, authorization, accounting, and call forking, require the proxy server to operate in a stateful mode by keeping different levels of session state information.

Steps:

  1. The SIP proxy server enforces proxy authentication with a 407 (Proxy Authentication Required) challenge.
  2. The UAC provides credentials that verify its claimed identity (e.g., based on the MD5 digest algorithm) and retransmits the request with them in the Proxy-Authorization header.
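Those two steps, and the reason they need state, can be sketched as follows: the proxy has to remember which nonces it issued, which a purely stateless proxy cannot do. `ProxyAuthenticator`, the in-memory user table, the 30-second nonce lifetime, and single-use nonces are assumptions for illustration; the response check uses the Digest form without `qop`.

```javascript
const crypto = require('crypto');

const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

// Hypothetical stateful proxy authenticator.
class ProxyAuthenticator {
  constructor(realm, users, nonceTtlMs = 30000) {
    this.realm = realm;
    this.users = users;       // username -> password (demo only)
    this.nonceTtlMs = nonceTtlMs;
    this.nonces = new Map();  // issued nonce -> expiry timestamp (the state)
  }

  // Step 1: issue a 407 challenge with a fresh nonce.
  challenge(now = Date.now()) {
    const nonce = crypto.randomBytes(16).toString('hex');
    this.nonces.set(nonce, now + this.nonceTtlMs);
    const header = `Proxy-Authenticate: Digest realm="${this.realm}", nonce="${nonce}", algorithm=MD5`;
    return { status: 407, header, nonce };
  }

  // Step 2: check the credentials retransmitted by the UAC.
  verify({ username, nonce, method, uri, response }, now = Date.now()) {
    const expiry = this.nonces.get(nonce);
    if (expiry === undefined || now > expiry) return false; // unknown or stale nonce
    const password = this.users[username];
    if (password === undefined) return false;
    const ha1 = md5(`${username}:${this.realm}:${password}`);
    const ha2 = md5(`${method}:${uri}`);
    const ok = response === md5(`${ha1}:${nonce}:${ha2}`);
    if (ok) this.nonces.delete(nonce); // single use, so a replay fails
    return ok;
  }
}
```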

Security of RTP

Confidentiality protection of the RTP session and integrity protection of the RTP/RTCP packets require source authentication of all the packets, to ensure that no man-in-the-middle (MITM) attack is taking place.

End-to-end media encryption – SRTP (Secure RTP)

SRTP turns the voice into encrypted IP packets and transports them over the internet from the transmitter to the receiver.
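A simplified, non-interoperable sketch of what SRTP does to each packet: the 12-byte RTP header stays in clear, the payload is encrypted with AES-128 in counter mode, and an 80-bit truncated HMAC-SHA1 tag authenticates header plus ciphertext (SRTP's default transforms). Key derivation from a master key, the roll-over counter, and the exact SRTP IV layout are deliberately left out; the keys and the `buildIv` layout are assumptions.

```javascript
const crypto = require('crypto');

// Hypothetical session keys, given directly instead of derived from a master key.
const encKey = crypto.randomBytes(16);
const authKey = crypto.randomBytes(20);

// Simplified counter block built from SSRC and sequence number.
function buildIv(ssrc, seq) {
  const iv = Buffer.alloc(16);
  iv.writeUInt32BE(ssrc, 4);
  iv.writeUInt16BE(seq, 12);
  return iv;
}

// Assumes a 12-byte RTP header with no CSRC list or extensions.
function protect(rtpHeader, payload) {
  const seq = rtpHeader.readUInt16BE(2);
  const ssrc = rtpHeader.readUInt32BE(8);
  const cipher = crypto.createCipheriv('aes-128-ctr', encKey, buildIv(ssrc, seq));
  const packet = Buffer.concat([rtpHeader, cipher.update(payload), cipher.final()]);
  // 80-bit truncated HMAC-SHA1 over header + encrypted payload
  const tag = crypto.createHmac('sha1', authKey).update(packet).digest().subarray(0, 10);
  return Buffer.concat([packet, tag]);
}

function unprotect(srtpPacket) {
  const packet = srtpPacket.subarray(0, srtpPacket.length - 10);
  const tag = srtpPacket.subarray(srtpPacket.length - 10);
  const expected = crypto.createHmac('sha1', authKey).update(packet).digest().subarray(0, 10);
  if (!crypto.timingSafeEqual(tag, expected)) throw new Error('authentication failed');
  const header = packet.subarray(0, 12);
  const iv = buildIv(header.readUInt32BE(8), header.readUInt16BE(2));
  const decipher = crypto.createDecipheriv('aes-128-ctr', encKey, iv);
  return Buffer.concat([decipher.update(packet.subarray(12)), decipher.final()]);
}
```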


  • The Impact of TLS on SIP Server Performance – Charles Shen, Erich Nahum, Henning Schulzrinne, Charles Wright; Department of Computer Science, Columbia University; IBM T.J. Watson Research Center

I have written about VoIP and security in these blogs before

For security around web browser-based calling via WebRTC, the articles below discuss security practices in general:

  • WebRTC Security, which describes the browser threat model, access to local resources, Same Origin Policy (SOP) and Cross-Origin Resource Sharing (CORS), as well as location sharing, ICE, TURN, threats to privacy from screen sharing and long-term microphone and camera access, and probable mid-call attacks.
  • Generic security of web applications built around a WebRTC hosting platform. Includes concepts like identity management, browser security (cross-site security and clickjacking), authentication of devices and applications, media encryption, and regex checking.

HTTP/2 – offer/answer signaling for WebRTC call

HTTP (Hyper Text Transfer Protocol) is an application layer protocol that runs atop the transport layer (TCP) and the network layer (IP).


HTTP/1.1 was released in 1997. HTTP/1.0 typically handled only one request per TCP connection. HTTP/1.1 keeps a TCP connection open for multiple requests and allows request pipelining to achieve some concurrency, but responses on a connection must still be returned in request order.


HTTP/2 was released in 2015. It aims at reducing latency while delivering heavy graphics, videos, and other media components on web pages, especially on mobile sites.
It also adds server push, which works alongside service workers.


  • Header compression (HPACK)
  • Connection reuse: a single TCP connection
    • All frames (e.g. HEADERS, DATA, etc.) are sent over a single TCP connection
  • Binary framing layer
    • Prioritization
    • Flow control
    • Server push
  • Request → Stream
    • Streams multiplexed
    • Streams prioritized
  • (+) Low latency / improves end-user perceived latency
  • (+) Retains the semantics of HTTP/1.1


A key difference between HTTP/1.1 and HTTP/2 is that the former transmits requests and responses in plaintext, whereas the latter encapsulates them in a binary format, providing more features and scope for optimization. Thus at the protocol level, it is all about frames of bytes that are part of a stream.

  • HTTP messages are decomposed into one or more frames, each beginning with a 9-byte header (length, type, flags, stream identifier)
    • HEADERS for metadata
    • DATA for payload
    • RST_STREAM to cancel a stream
    • …..
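The fixed 9-byte frame header can be parsed and built in a few lines. This sketch covers only the header, not frame payloads; `parseFrameHeader` and `encodeFrameHeader` are hypothetical helper names, and the type codes follow RFC 7540.

```javascript
// HTTP/2 frame type codes (RFC 7540).
const FRAME_TYPES = {
  0x0: 'DATA', 0x1: 'HEADERS', 0x2: 'PRIORITY', 0x3: 'RST_STREAM', 0x4: 'SETTINGS',
  0x5: 'PUSH_PROMISE', 0x6: 'PING', 0x7: 'GOAWAY', 0x8: 'WINDOW_UPDATE', 0x9: 'CONTINUATION',
};

// Frame header layout:
// 24-bit payload length | 8-bit type | 8-bit flags | 1 reserved bit + 31-bit stream id
function parseFrameHeader(buf) {
  if (buf.length < 9) throw new Error('need 9 bytes');
  const length = buf.readUIntBE(0, 3);
  const type = FRAME_TYPES[buf[3]] || `UNKNOWN(0x${buf[3].toString(16)})`;
  const flags = buf[4];
  const streamId = buf.readUInt32BE(5) & 0x7fffffff; // drop the reserved bit
  return { length, type, flags, streamId };
}

function encodeFrameHeader(length, typeCode, flags, streamId) {
  const buf = Buffer.alloc(9);
  buf.writeUIntBE(length, 0, 3);
  buf[3] = typeCode;
  buf[4] = flags;
  buf.writeUInt32BE(streamId & 0x7fffffff, 5);
  return buf;
}
```

For example, `encodeFrameHeader(16384, 0x0, 0x1, 3)` describes a 16 KB DATA frame on stream 3 with the END_STREAM flag set.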

“enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients.”

Hypertext Transfer Protocol Version 2 (HTTP/2) draft-ietf-httpbis-http2-latest

It is important to know that browsers only implement HTTP/2 over HTTPS, so a TLS connection is a must. For this we need a key and a certificate, either self-signed (e.g. generated with openssl) or signed by a public CA such as GoDaddy, VeriSign, or Let’s Encrypt.

Data Flow

DATA frames are subject to per-stream and connection flow control

  • Flow control allows the client to pause stream delivery, and resume it later
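A minimal sketch of the bookkeeping behind this pause/resume: a sender tracks a connection window and one window per stream, DATA frames consume both, and WINDOW_UPDATE frames refill them. The `FlowControl` class is hypothetical; the 65,535-byte initial window is the HTTP/2 default.

```javascript
// Default initial flow-control window for the connection and each stream.
const DEFAULT_WINDOW = 65535;

class FlowControl {
  constructor() {
    this.connectionWindow = DEFAULT_WINDOW;
    this.streamWindows = new Map(); // stream id -> remaining window
  }

  openStream(id) {
    this.streamWindows.set(id, DEFAULT_WINDOW);
  }

  // How many DATA bytes may be sent on this stream right now.
  sendable(id) {
    return Math.min(this.connectionWindow, this.streamWindows.get(id) ?? 0);
  }

  // Sending DATA consumes both the stream window and the connection window.
  consume(id, bytes) {
    if (bytes > this.sendable(id)) throw new Error('flow-control window exceeded');
    this.connectionWindow -= bytes;
    this.streamWindows.set(id, this.streamWindows.get(id) - bytes);
  }

  // WINDOW_UPDATE on stream 0 refills the connection, otherwise the stream.
  windowUpdate(id, increment) {
    if (id === 0) this.connectionWindow += increment;
    else this.streamWindows.set(id, (this.streamWindows.get(id) ?? 0) + increment);
  }
}
```

A receiver pauses a stream simply by withholding WINDOW_UPDATE frames for it, and resumes delivery by sending one.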

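Compatibility layer between HTTP/1.1 and HTTP/2 in Node.js

Node.js versions above 9 provide http2 as a native module. Example of using http2 with the compatibility layer:

const http2 = require('http2');
const fs = require('fs');
const nodeStatic = require('node-static'); // assumed static file server providing file.serve()
const file = new nodeStatic.Server('./public');
const options = {
  key: fs.readFileSync('ssl/key'),   // private key contents, not just the path
  cert: fs.readFileSync('ssl/cert'), // certificate contents
  allowHTTP1: true                   // also accept HTTP/1.1 clients on the same port
};

const server = http2.createSecureServer(options, (req, res) => {
  req.addListener('end', function () {
    file.serve(req, res);
  }).resume(); // 'end' only fires once the request stream has been consumed
});
server.listen(8443); // port is illustrative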

This replaces the existing http/https server, which looked like this:

const https = require('https');
const app = https.createServer(options, function (request, response) {
  request.addListener('end', function () {
    file.serve(request, response);
  }).resume();
});


Socket.io / WebSocket over HTTP/2

The WebSocket Protocol uses the HTTP/1.1 Upgrade mechanism to transition a TCP connection from HTTP into a WebSocket connection

Due to its multiplexing nature, HTTP/2 does not allow connection-wide header fields or status codes, such as the Upgrade and Connection request-header fields or the 101 (Switching Protocols) response code. These are all required for the opening handshake.
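For reference, this is the HTTP/1.1 opening handshake that HTTP/2 cannot carry: the server answers with a 101 status and Upgrade/Connection headers, and proves it understood the request by hashing the client's Sec-WebSocket-Key with the fixed GUID from RFC 6455. `upgradeResponse` is a hypothetical helper, not part of socket.io.

```javascript
const crypto = require('crypto');

// GUID fixed by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Server side of the HTTP/1.1 Upgrade: base64(SHA-1(key + GUID)) goes back in
// Sec-WebSocket-Accept of a 101 response.
function upgradeResponse(secWebSocketKey) {
  const accept = crypto.createHash('sha1').update(secWebSocketKey + WS_GUID).digest('base64');
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${accept}`,
    '',
    '',
  ].join('\r\n');
}
```

Every line of this response (the 101 code, Upgrade, Connection) is among the things HTTP/2 forbids, which is why WebSocket over HTTP/2 needs a different bootstrap.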

Ideally the code should have looked like this with the backward compatibility layer, but continue reading the update below.

var app = http2
  .createSecureServer(options, (req, res) => {
    req.addListener('end', function () {
      file.serve(req, res);
    }).resume();
  })
  .listen(8443); // port is illustrative

var io = require('socket.io').listen(app); // attach socket.io (2.x style) to the server
io.on('connection', onConnection); // onConnection: the app's connection event handler

Error during WebSocket handshake: Unexpected response code: 403

Update May 2020: I tried using the HTTP/2 server with WebSocket as shown above. However, after many hours of working around WSS over an HTTP/2 secure server, I consistently kept facing ECONNRESET issues after a couple of seconds, which would crash the server.

client 403

Therefore, leaving the web server to serve HTML content, I reverted the signalling back to HTTPS/1.1, given that the reasons for sticking with WSS are low latency and the existing work that had already been put in.

Example Repo : https://github.com/altanai/webrtcdevelopment/tree/htt2.0

Further reading: exploring the HTTP CONNECT method for setting up the WebSocket handshake. I will update this section in the future if it works.


A “stream” is an independent, bidirectional sequence of frames exchanged between the client and server within an HTTP/2 connection.
A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.
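The interleaving can be illustrated with a toy demultiplexer. Frames are modelled as plain `{ streamId, data }` objects rather than binary frames, and `demultiplex` is a hypothetical name; the odd/even stream identifier rule and the reserved stream 0 come from the HTTP/2 specification.

```javascript
// Reassemble interleaved frames from one connection into per-stream bodies.
// Client-initiated streams use odd ids, server-initiated (pushed) streams use
// even ids, and stream 0 carries connection-level frames only.
function demultiplex(frames) {
  const streams = new Map(); // stream id -> { initiator, chunks }
  for (const { streamId, data } of frames) {
    if (streamId === 0) continue; // e.g. SETTINGS, PING
    if (!streams.has(streamId)) {
      streams.set(streamId, { initiator: streamId % 2 === 1 ? 'client' : 'server', chunks: [] });
    }
    streams.get(streamId).chunks.push(data);
  }
  const result = {};
  for (const [id, { initiator, chunks }] of streams) {
    result[id] = { initiator, body: chunks.join('') };
  }
  return result;
}
```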

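The core http2 module also provides a new core API (Http2Stream), accessed via a “stream” listener:

const http2 = require('http2');
const fs = require('fs');
const options = {
  key: fs.readFileSync('ssl/key'),  // private key
  cert: fs.readFileSync('ssl/cert') // certificate
};

const server = http2.createSecureServer(options);
// each incoming request arrives as an Http2Stream plus its headers
server.on('stream', (stream, headers) => {
  stream.respond({ ':status': 200, 'content-type': 'text/plain' });
  stream.end('some text!');
});
server.listen(8443); // port is illustrative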

Other features

  • stream multiplexing
  • stream Prioritization
  • header compression
  • Flow Control
  • support for trailers

Persistent, one connection per origin

With the new binary framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required.

Server Push

Server push bundles multiple assets and resources into a single HTTP/2 connection and lets the server proactively push resources into the client’s cache.

The server issues a PUSH_PROMISE, and the client validates whether it needs the resource or not. If the client accepts it, the resource is loaded just as with a regular GET request.

The PUSH_PROMISE frame includes a header block that contains a complete set of request header fields that the server attributes to the request.

After sending the PUSH_PROMISE frame, the server can begin delivering the pushed response as a response on a server-initiated stream that uses the promised stream identifier.

The client receives the PUSH_PROMISE frame and can either accept the pushed response or, if it does not wish to receive the pushed response from the server, send a RST_STREAM frame using either the CANCEL or REFUSED_STREAM code and referencing the pushed stream’s identifier.
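With Node's http2 core API, a push is started with `stream.pushStream()`, which sends the PUSH_PROMISE and hands back a new stream for the pushed response. A minimal sketch, assuming a cleartext (h2c) server so it can be exercised locally; browsers only accept HTTP/2 over TLS, so a real deployment would use `http2.createSecureServer()`. The paths and content are made up.

```javascript
const http2 = require('http2');

// Cleartext h2c server for local testing only.
const server = http2.createServer();

server.on('stream', (stream, headers) => {
  if (headers[':path'] !== '/') {
    stream.respond({ ':status': 404 });
    stream.end();
    return;
  }
  stream.respond({ ':status': 200, 'content-type': 'text/html' });
  // PUSH_PROMISE for /style.css, sent before the HTML body that references it;
  // only possible if the client has not disabled push in its SETTINGS.
  if (stream.pushAllowed) {
    stream.pushStream({ ':path': '/style.css' }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
      pushStream.end('body { margin: 0 }');
    });
  }
  stream.end('<link rel="stylesheet" href="/style.css"><p>hello</p>');
});
```

Call `server.listen(port)` to start it. On the client side, pushed responses surface as `'stream'` events on the session, and a client that does not want one simply resets that stream.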

Push Stream Support


respondWithFile() and respondWithFD() APIs can send raw file data that bypasses the Streams API.

Related technologies


Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets.

Email messages + MIME: transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

MIME in HTTP in WWW: servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated. Browsers typically contain GIF and JPEG image viewers.

MIME header fields

  • MIME-Version
MIME-Version: 1.0
  • Content-Type
Content-Type: text/plain

Other common types: multipart/mixed, text/html, image/jpeg, audio/mp3, video/mp4, and application/msword

  • Content-Disposition
Content-Disposition: attachment; filename=genome.jpeg;
  modification-date="Wed, 12 Feb 1997 16:29:51 -0500";
  • Content-Transfer-Encoding
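A small sketch that assembles a multipart/mixed body using the header fields listed above; `buildMultipart` and the boundary format are hypothetical. Binary parts are base64-encoded with lines wrapped at 76 characters, as MIME requires.

```javascript
const crypto = require('crypto');

// Build a multipart/mixed message: each part carries its own MIME headers and
// the parts are separated by a boundary string that must not occur in the content.
function buildMultipart(parts) {
  const boundary = `----boundary-${crypto.randomBytes(8).toString('hex')}`;
  const body = parts
    .map(({ contentType, disposition, content }) => {
      const headers = [`Content-Type: ${contentType}`];
      if (disposition) headers.push(`Content-Disposition: ${disposition}`);
      if (Buffer.isBuffer(content)) {
        headers.push('Content-Transfer-Encoding: base64');
        content = (content.toString('base64').match(/.{1,76}/g) || []).join('\r\n');
      }
      return `--${boundary}\r\n${headers.join('\r\n')}\r\n\r\n${content}\r\n`;
    })
    .join('');
  return {
    headers: `MIME-Version: 1.0\r\nContent-Type: multipart/mixed; boundary="${boundary}"`,
    body: `${body}--${boundary}--\r\n`, // closing delimiter ends with "--"
  };
}
```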


Source: istlsfastyet.com


Certificates, compliances and Security in VoIP

This article describes various certificates and compliances, bills and acts on data privacy, and security and prevention of robocalls, as adopted by countries around the world, pertaining to interconnected VoIP providers, telecommunications services, wireless telephone companies, etc.

Compliance certificates by Industry types

HIPAA (Health Insurance Portability and Accountability Act)

Deals with the privacy and security of personal medical records and electronic health care transactions

Applicability: if a VoIP company handles medical information

Includes:

  • Voicemail transcription not allowed
  • End-to-end encryption required
  • Restricting the use of unsecured Wi-Fi networks to prevent snooping
  • User security: strong password rules and mandatory monthly password changes
  • Secure firmware on VoIP phones
  • Maintaining call and access logs

SOX (Sarbanes-Oxley Act of 2002)

Also known as SOX, SarbOx, or the Public Company Accounting Reform and Investor Protection Act

Applicability: if managing the communications operations of a regulated, publicly traded company

Includes:

  • Retaining records, which include financial and other sensitive data
  • Controlling how employees are granted or denied access to records or data, based on their roles and responsibilities
  • Information audits by a trusted third party
  • Retention and deletion of files, such as audio files like voicemails, text messages, video clips, declared paper records, storage, and logs of communications activities
  • Physical and digital security controls around cloud-based VoIP applications and networks

Privacy Related Compliance certificates

COPPA (Children’s Online Privacy Protection Act) of 1998

Prohibits deceptive marketing to children under the age of 13, and collecting their personal information without disclosure to their parents.

If any information is to be passed on to a third party, this must be easy for the child’s guardian to review and/or protect.

The 2011 amendment requires that collected data be erased after a period of time.

In 2014, the FTC issued guidelines requiring apps and app stores to obtain “verifiable parental consent.”

CPNI (Customer Proprietary Network Information) 2007

CPNI (Customer Proprietary Network Information) in the United States is the information that communication providers acquire about their subscribers. It is individually identifiable information created by a customer’s relationship with a provider, such as data about the frequency, duration, and timing of calls, the information on a customer’s bill, and call identifying information. The processing of this information is strictly governed by the FCC, and certification should be renewed on an annual basis.

A provider can pass along that information to marketers to sell other services, as long as the customer is notified.

In 2007, the FCC explicitly extended the application of the Commission’s CPNI rules of the Telecommunications Act of 1996 to providers of interconnected VoIP service.


The Communications Assistance for Law Enforcement Act (CALEA) supports electronic surveillance by imposing specific obligations on “telecommunications carriers” to assist law enforcement, including delivering call interception and call identification functionality to the government with a minimum of interference to customer service and privacy.

Read more about CALEA and its roles in VoIP here Regulatory and Legal Considerations with WebRTC development

GDPR (General Data Protection Regulation) in the European Union, 2018

Supersedes the 1995 Data Protection Directive.

Establishes requirements for organizations that process data, defines the rights of individuals to manage their data, and outlines penalties for those who violate these rights.

No personal data may be processed unless the processing is done under one of the six lawful bases specified by the regulation (consent, contract, public task, vital interest, legitimate interest, or legal requirement). When the processing is based on consent, the data subject has the right to revoke it at any time.

Controllers must notify supervisory authorities (SAs) of a personal data breach within 72 hours of learning of the breach.

California Consumer Privacy Act (CCPA) 2019

Grants consumer rights relating to the access to, deletion of, and sharing of personal information that is collected by businesses.

Allows consumers to know whether their personal data is sold or disclosed, and to whom.

Gives consumers the right to opt out of the sale of their personal information.

Right to deletion – to request a business to delete any personal information about a consumer collected from that consumer

Personal Data Protection Bill (PDP) – India 2018

This bill introduces various frameworks for protecting private and sensitive data, such as restrictions on the retention of personal data, the right to correction and erasure (such as the right to be forgotten), and prohibition and transparency of processing of personal data. It also classifies data fiduciaries, including certain social media intermediaries.

The Bill amends the Information Technology Act, 2000 to delete the provisions related to compensation payable by companies for failure to protect personal data.

Other data privacy acts similar to GDPR 

  • South Korea’s Personal Information Protection Act  2011
  • Brazil’s Lei Geral de Proteção de Dados (LGPD) 2020
  • Privacy Amendment (Notifiable Data Breaches) to Australia’s Privacy Act 2018
  • Japan’s Act on Protection of Personal Information 2017
  • Thailand Personal Data Protection Act (PDPA) 2020

Features offered by VoIP companies for data privacy

  • Access control and logging
  • Auto data redaction / account deletion policy
  • SIEM (Security Information and Event Management) alerts
  • Information security: encrypted storage for recordings and transcripts
  • Disclosing all third-party services involved in data processing
  • Role-based access control and two-factor authentication
  • Data security audits and appointing a data protection officer to oversee GDPR compliance

Against Robocalls and SPIT (Spam over Internet Telephony)

2009 Truth in Caller ID Act

Telephone Consumer Protection Act of 1991

Implementation of the Do Not Call registry against the use of robocalls, automatic dialers, and other methods of communication

Do-Not-Call Implementation Act of 2003

If a business has an established relationship with a customer, it can continue to call them for up to 18 months. If a consumer calls the company, say, to ask for information about a product or service, the company has three months to get back to them.

If the customer asks not to receive calls, the company must stop calling or be subject to fines.

Exemptions – calls from not-for-profit organisations, informational messages such as flight cancellations, calls from sales and debt collectors, etc.

Personal Data Privacy and Security Act 2009

Implemented to curb identity theft and computer hacking. Sensitive personally identifiable information includes the victim’s name, social security number, home address, fingerprint/biometric data, date of birth, and bank account numbers.

Any company that is breached must notify the affected individuals by mail, telephone, or email, and the message must include information on the company and how to get in touch with credit reporting agencies.

If the breach involves government or national security, the company must also contact the Secret Service within fourteen days.

TRACED Act (Telephone Robocall Abuse Criminal Enforcement and Deterrence) 2019

Canadian Radio-television and Telecommunications Commission (CRTC) 2018-32

A solution mechanism called STIR/SHAKEN (Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs) has already been standardised and is being actively adopted; it is described in another article here.

Emergency services 

FCC E911 / VoIP E911 rules

Unlike traditional telephone connections, which are tied to a physical location, VoIP’s packet-switched technology allows a particular number to be used from anywhere, making it more difficult to reach localised services like the emergency numbers of Public Safety Answering Points (PSAPs). Thus, under FCC regulations as well as the New and Emerging Technologies 911 Improvement Act of 2008 (NET 911 Act), interconnected VoIP providers are required to provide 911 and E911 service.



To understand the need for implementing an identity verification technique in IP-based network-to-network communication systems, we need to evaluate the existing problem plaguing VoIP setups.

What is Caller ID spoofing?

A vulnerability of the existing interconnected phone system, used by robocallers to mask their identity or to make it appear that a call is from a legitimate source. Such calls usually originate from voice-over-IP (VoIP) systems.

In this context, let us understand the Caller Line Identification (CLI) and Non-CLI (NCLI) techniques used by VoIP and OTT (over-the-top) providers today.

CLI (Caller Line Identification)

If a call goes out on a CLI route (White Route), the called party will likely see your caller ID information.

  • Lawful – termination is legal on the remote end, i.e. it abides by the country’s telco infrastructure, and is stable
  • Expensive – usually with direct or leased-line (TDM) interconnections with tier-1 carriers

Non-CLI (Non-Caller Line Identification)

The caller ID is not visible on the call.
If a call goes out on a Non-CLI route (Grey Route), the called party will see either a blocked call or some generic number.

  • Unlawful – questionable legality, or possibly violating some provider’s AUP (Acceptable Use Policy) on the remote end
  • Cheaper – low quality, usually via VoIP-GSM gateways

Examples include robocalls and telemarketing/spam calls, which are unwilling to share their caller ID with the call receiver so as not to be blocked or cancelled.

To overcome the problem of non-verifiable spam and robocalls, a suite of protocols and procedures has been proposed that can combat caller ID spoofing on VoIP and connected public telephone networks.


Secure Telephony Identity Revisited (STIR) / Signature-based Handling of Asserted information using toKENs (SHAKEN)

STIR/SHAKEN targets caller ID spoofing, which robocallers use to mask their identity or to make it appear that a call is from a legitimate source, and which usually originates from voice-over-IP (VoIP) systems.


STIR – standards developed by the Internet Engineering Task Force (IETF)

Telecommunication service providers implement a certificate management system to create and manage the public and private keys and the digital certificates used to sign and verify caller ID details.

STIR adds information to the SIP headers that allows endpoints along the path to positively identify the origin of the data: a JSON Web Token (PASSporT) signed with the provider’s private key and Base64url-encoded.

There are three levels of verification, or “attestation”:

  • A: Full Attestation
    indicates that the provider recognizes the entire phone number as being registered with the originating subscriber.
  • B: Partial Attestation
    the call originated with a known customer, but the entire number cannot be verified.
  • C: Gateway Attestation
    the call can only be verified as coming from a known gateway.

How can Public Key Infrastructure be used?

In an interconnection network, each telephone service provider obtains its digital certificate from a certificate authority (CA) that is trusted by other telephone service providers. The calling party’s provider signs the caller ID in the SIP header as legitimate, and the called party’s provider verifies that the calling number is authentic.


The originating service provider’s signed SIP Identity header includes the following data:

  1. Attestation level
  2. Date and time
  3. Calling and called numbers
  4. Orig ID for analytics and/or traceback purposes, among others
  5. Location of the certificate repository
  6. Signature
  7. Signing algorithm
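A sketch of signing and verifying a SHAKEN PASSporT with Node's `crypto` module, assuming a freshly generated P-256 key pair stands in for the provider's certificate key. The claim names (`attest`, `dest`, `iat`, `orig`, `origid`) and header fields (`alg`, `ppt`, `typ`, `x5u`) follow the SHAKEN PASSporT format; all values, the certificate URL, and the helper names are made up. A real verifier would also fetch and validate the certificate chain at `x5u` and check that `iat` is fresh.

```javascript
const crypto = require('crypto');

// Hypothetical provider key pair (ES256 = ECDSA on P-256 with SHA-256).
const { publicKey, privateKey } = crypto.generateKeyPairSync('ec', {
  namedCurve: 'prime256v1',
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});

const b64url = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');

// Originating side: build header and claims, then sign "header.payload".
function signPassport({ attest, orig, dest, origid, x5u, iat = Math.floor(Date.now() / 1000) }) {
  const header = { alg: 'ES256', ppt: 'shaken', typ: 'passport', x5u };
  const payload = { attest, dest: { tn: [dest] }, iat, orig: { tn: orig }, origid };
  const signingInput = `${b64url(header)}.${b64url(payload)}`;
  // JWS expects the raw r||s signature form rather than DER
  const signature = crypto.sign('sha256', Buffer.from(signingInput), { key: privateKey, dsaEncoding: 'ieee-p1363' });
  return `${signingInput}.${signature.toString('base64url')}`;
}

// Terminating side: check the signature, then read the claims.
function verifyPassport(token, key) {
  const [h, p, s] = token.split('.');
  const ok = crypto.verify('sha256', Buffer.from(`${h}.${p}`), { key, dsaEncoding: 'ieee-p1363' }, Buffer.from(s, 'base64url'));
  return ok ? JSON.parse(Buffer.from(p, 'base64url').toString()) : null;
}
```

Any change to the calling number after signing invalidates the signature, which is what makes spoofed caller IDs detectable.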

The FCC has also established the role of a Secure Telephone Identity Policy Administrator (STI-PA), which ensures that CAs do not issue certificates to spoofing robocallers and enforces the framework for STIR/SHAKEN.

Sample SIP request with an Identity-Info header

INVITE sip:bob@biloxi.example.org SIP/2.0
Via: SIP/2.0/TLS pc33.atlanta.example.com;branch=z9hG4bKnashds8
To: Bob <sip:bob@biloxi.example.org>
From: Alice <sip:alice@atlanta.example.com>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Max-Forwards: 70
Date: Thu, 21 Feb 2002 13:02:03 GMT
Identity-Info: <https://atlanta.example.com/atlanta.cer>;alg=rsa-sha1
Content-Type: application/sdp
Content-Length: 147

v=0
o=UserA 2890844526 2890844526 IN IP4 pc33.atlanta.example.com
s=Session SDP
c=IN IP4 pc33.atlanta.example.com
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000


STIR is based on the SIP protocol and is designed to work with calls being routed through a VoIP network. Since traditional endpoints, such as POTS phones and SS7 networks, should also be covered by this call authenticity framework, SHAKEN was developed to manage calls via IP-to-telephone gateways.

SHAKEN – developed by the Alliance for Telecommunications Industry Solutions (ATIS)

Working steps:

  1. When a call is initiated, a SIP INVITE is received by the originating service provider.
  2. The originating service provider verifies the call source and number to determine how to confirm validity.
    1. Full Attestation (A) – The service provider authenticates the calling party AND confirms they are authorized to use this number. An example would be a registered subscriber.
    2. Partial Attestation (B) – The service provider verifies the call origination but cannot confirm that the call source is authorized to use the calling number. An example would be a calling number from behind an enterprise PBX.
    3. Gateway Attestation (C) – The service provider authenticates the call’s origin but cannot verify the source. An example would be a call received from an international gateway.
  3. Create a SIP Identity header that contains information on the calling number, called number, attestation level, and call origination, along with a reference to the certificate, so that the caller ID is “signed” as legitimate.
  4. The SIP INVITE with the SIP Identity header is sent to the destination service provider.
  5. The destination service provider verifies the Identity header signature using the referenced certificate.
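Step 2 and the header assembly in step 3 can be sketched as a small policy function plus a formatter. `chooseAttestation`, its input flags, and returning `null` when nothing can be attested are assumptions; the header layout roughly follows the RFC 8224 Identity header form used by SHAKEN, with an already-signed token passed in.

```javascript
// Hypothetical originating-provider policy: map what the provider knows about
// the call to an attestation level.
function chooseAttestation({ customerAuthenticated, numberAuthorized, fromGateway }) {
  if (customerAuthenticated && numberAuthorized) return 'A'; // full, e.g. registered subscriber
  if (customerAuthenticated) return 'B';                     // partial, e.g. behind an enterprise PBX
  if (fromGateway) return 'C';                               // gateway, e.g. international ingress
  return null; // nothing the provider can vouch for
}

// Assemble the Identity header around an already-signed PASSporT token.
function identityHeader(token, certUrl) {
  return `Identity: ${token};info=<${certUrl}>;alg=ES256;ppt="shaken"`;
}
```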

Diagrammatic depiction of how telecom carriers digitally validate call authenticity before receiving or handing off calls through their network