Low Latency Media streaming

Low latency is imperative for use cases that require mission-critical communication such as the emergency call for first responders, interactive collaboration and communication services, real-time remote object detection etc. Other use cases where low latency is essential are banking communication, financial trading communication, VR gaming etc. When low latency streaming is combined with high definition (HD) quality, the complication grows tenfold. Some instance where good video quality is as important as sensitivity to delay is telehealth for patient-doctor communication. 

Measuring Latency

A NTP Time server  measures the number of seconds that have elapsed since January 1, 1970. NTP time stamp can represent time values to an accuracy of ± 0.1 nanoseconds (ns). In RTP spec it is a 64-bit double word where top 32 bits represent seconds, and the bottom 32 bits represent fractions of a second.

For measuring latency in RTP howver NTP time server is not used , instead the audio capture clock forms the basis for NTP time calcuilation.

RTP/(RTP sample rate) = (NTP + offset) x scale

The latency is then calculated with accurate mappings between the NTP time and RTP media time stamps by sending RTCP packets for each media stream.

Latency can be induced at various points in the systems

  1. Transmitter Latency in the capture, encoding and/or packetization
  2.  Network latency including gateways, load balancing, buffering
  3.  Media path having TURN servers for Network address traversal, delay due to low bandwidth, transcoding delays in media servers.
  4.  Receiver delays in playback due to buffer delay, playout delay by decoder due to hardware constraints

The delay can be caused by one or many stages of the path in the media stream and would be a cumulative sum of all individual delays. For this reason, TCP is a bad candidate due to its latency incurred in packet reordering and fair congestion control mechanisms. While TCP continues to provide signalling transport, the media is streamed over RTP/UDP.

Latency reduction

Although modern media stacks such as WebRTC are designed to be adpatble for dynamic network condition, the issues of bandwidth unpredictbaility leads to packet loss and evenetually low Qos.

Effective techniques to reduce latency

  • Dynamic network analysis and bandwidth estimation for adaptive streaming ensure low latency stream reception at the remote end.
  • Silence suppression: This is an effective way to save bandwidth
    • (+) typical bandwidth reduction after using silence suppression ~50%
  • Noise filtering and Background blurring are also efficient ways to reduce the network traffic 
  • Forward error correction or redundant encoding, techniques help to recover from packet loss faster than retransmission requests via NACK 
  • Increased Compression can also optimize packetization and transmission of raw data that targets low latency
  • Predictive decoding and end point controlled congestion

Ineffective techniques that do not improve QoS even if reduce latency 

  • minimizing headers in every PDU: Some extra headers such as CSRC or timestamp can be removed to create RTPLite but significant disadvantages include having to functionalities using custom logic as
    • (-) removing timestamp would lead to issues in cross-media sync ( lip-sync) or jitter, packet loss sync
    • (-) Removing contributing source identifiers (SSRC ) could lead to issues in managing source identity in multicast or via media gateway.
  • Too many TURN, STUN servers and candidates collection 
  • lowering resolution or bitrate may achieve low latency but is far from the hi-definition experience that users expect.

Lip Sync ( Audio Video Synchronization)

Many real time communicationa nd streaming platforms have seprate audio and video processing pipelines. The output of these two can go out of sync due to differing latency or speed and may may appear out of sync at the playback on freceivers end. As teh skew increases the viewers perceive it as bad quality.

According to convention, at the input to the encoder, the audio should not lead the video by more than 15 ms and should not lag the video by more than 45 ms. Generally the lip sync tolerance is at +- 15 ms.

Synchronize clocks

The clock helps in various ways to make the streaming faster by calculating delays with precision

  • NTP timestamps help the endpoints measure their delays.
  • Sequence numbers detect losses as it increases by 1 for each packet transmitted. The timestamps increase by the time covered by a packet which is useful for correcting timing order, playout delay compensation etc.

To compute transmission time the sender and receiver clocks need to be synchronized with miliseconds of precision. But this is unliley in a heterogenious enviornment as hosts may have different clock speeds. Clock Synchronization can be acheived in various ways

  1. NTP synchronization : For multimedia conferences the NTP timestamp from RTCP SR is used to give a common time reference that can associate these independant timesatmps witha wall clock shared time. Thei allows media synchronization between sender in a single session. Additionally RFC 3550 specifies one media-timestamp in the RTP data header and a mapping between such timestamp and a globally synchronized clock( NTP), carried as RTCP timestamp mappings.

2. MultiCast synchronization: receivers synronize to sender’s RTP Header media timestamp

  • (-) good approach for multicast sessions

3. Round Trip Time measurement as a workaround to calculate clock cync. A round trip propagation delay can help the host sync its clock with peers.

roundtrip_time = send timestamp - reflected timestamp
tramission_time = roundtrip_time / 2
  • (-) this approach assumes equal time for sending and receiving packet which is not the case in cellular networks. thus not suited for networks which are time assymetrical.
  • (-) transmission time can also vary with jitter. ( some packets arrive in bursts and some be delayed)
  • (-) subjected to [acket loss

4. Adjust playout delay via Marker Bit : Marker bit indicates begiining of talk spurt. This lets the receiver adjust the playout elay to compensate fir the different click rates between sender and receiver and/or network delay jitter.

Marker Bit Header in RTP with GSM payload

Receivers can perform delay adaption using marker bit as long as the reordering time of market bit packer with respect to other packets is less than the playout delay. Else the receiver waits for the next talkspurt.

Sequence of RTP with GSM payload

Simillar examples frmo WebRTC RTP dumps

Congestion Control in Real Time Systems

Congestion is when we have reached the peak limit the network can supportor the peak bandwidth the network path can handle. There could be many reasons for congestion as limits by ISP, high usage at certain time, failure on some network resources causing other relay to be overloaded so on. Congestion can result in

  • droppping excess packets => high packet loss
  • increased buffering -> overloaded packets will queue and cause eratic delivery -> high jitter
  • progressively increasing round trip time
  • congestion control algorithsm send explicit notification which trigger other nodes to actiavte congestion control.

A real time communication system maybe efficient performing encoding/decoding but will be eventually limited by the network. Sending congestion dynmically helps the platform to adapt and ensure a satisfactory quality without lossing too many packet at network path. There has been extensive reserach on the subject of congestion control in both TCP and UDP transports. Simplistic methods use ACK’s for packet drop and OWD (one way delay) to derieve that some congestion may be occuring and go into avoidance mode by reducing the bitrate and/or sending window.

UDP/RTP streams have the support of well deisgned RTCP feedbacks to proactively deduce congestion situation before it happenes. Some WebRTC approaches work around the problem of congestion by providing simulcast , SVC(Temporal/frame rate, spatial/picture size , SNR/Quality/Fidelity ) , redundant encoding etc. Following attributes can help to infer congestion in a network

  • increasing RTT ( Round Trip Time)
  • increasing OWD( One Way Delay)
  • occurance of Packet Loss
  • Queing Delay Gradient = queue length in bits / Capacity of bottleneck link

Feedback loop

A feedback loop between video encoder and congestion controller can significantly help the host from experiencing bad QoS.

MTU determination

Maximum transmission Unit ( MTU) determine how large a packet can be to be send via a network. MTU differs on the path and path discovery is an effective way to determine the largest packet size that can be sent.

RTCP Feedback

To avoid having to expend on retransmission and faulty gaps in playback, the system needs to cummulatively detect congestion. The ways to detect congestion are looking at self’s sending buffer and observing receivers feedback. RTP supports mechanisms that allow a form of congestion control on longer time scales.

Resolving congestion

Some popular approaches to overcome congestion are limiting speed and sending less

  • Throttling video frame acquistion at the sender when the send buffer is full
  • change the audio/video encoding rate
  • Reduce video frame rate, or video image size at the transmitter
  • modifying the source encoder

The aim of these algorithms is usually a tradeoff between Throughout and Latency. Hence maximizaing throughout and penalizing delays is a formula use often to come up with more mordern congestion cntrol algorithms.

  • NADA ( Network Assisted Dynamic Adaption) which uses a loss vs delay algorithm using OWD,
  • SCREAM ( Sefl Clocked Rate Adaptation for multimedia)
  • GCC ( google congestion control) uses kalman filter on end to end OWD( one way delay) and compares against an adaptive threhsold to throttle the sending rate.

Transport Protocol Optimization

A low latency transport such as UDP is most appropriate for real time transmission of media packet, due to smaller packet and ack less opeartion. A TCP transport for such delay sensitive enviornments is not ideal. Some points that show TCP unsuited for RTP are :

  • Missing packet discovery and retrasmission will take atleast one round trip time using ack which either results in audible gap in playout or the retransmitted packet being discarded by altogether by decoder buffer.
  • TCP packets are heavier than UDP.
  • TCP cannot support multicast
  • TCP congestion control is inaplicable to real time media as it reduces the congestion window when packet loss is detected. This is unsuited to codecs which have a specific sampling like PCM is 64 kb/s + header overhead.

Fasten session establishment

Lower layer protocols are always required to have control over resources in switches, routers or other network relay points. However, RTP provides ways to ensure a fast real-time stream and its feedback including timestamps and control synchronization of different streams to the application layer.

  1. Mux the stream: RTP and RTCP share the same socket and connection, instead of using two separate connections.
  • (+) Since the same port is used fewer ICE candidates gathering is required in WebRTC.

2. Prioritize PoP ( points of presence) under quality control over open internet relays points.

3. Parallelize AAA ( authentication and authorization) with session establishing instead of serializing it.

TradeOff of Latency vs Quality

To achieve reliable transmission the media needs to be compressed as much ( made smaller) which may lead to loss of certain picture quality.

Lossy vs Lossless compression

Loss Less Compression will incur higher latency as compared to lossy compression.

Loss Less CompressionLossy Compression
(+) Better pictue quality(-) lower picture quality
(-) higher power consumption(+) lower power consumption at encoder and decoder
suited for file storagesuited for real time streaming

Intra frame vs Inter frame compressssion

Intra frameInter frame
Intra frame compression reduce bits to decribe a single frame ( lie JPEG)Reduce the bit to decode a series of frame by femoving duplicate information
I – frame : a complete picture without any loss
P – frame : partical picture with delat from previous frame
B – frame : a partical pictureusing modification from previous and future pictures.
suited for images

Container Formats for Streaming

Some time ago Flash + RTMP was the popular choice for streaming. Streaming involves segmenting a audio or audio inot smaller chunks which can be easily transmistted over networks. Container formats contain an encoded video and audio track in a single file which is then streamed using the streaming protocol. It is encoded in different resolutions and bitrates. Once receivevd it is again stored in a container format (MP4, FLV).

ABR formats are HTTP-based, media-streaming communications protocols. As the connection gets slower, the protocol adjusts the requested bitrate to the available bandwidth. Therefore, it can work on different network bandwidths, such as 3G or 4G.

TCP based streaming

  • (-) slow start due to three way handshake , TCP Slow Start and congestion avoidance phase
  • (+) SACK to overcoming reseding the whole chain for a lost packet

UDP based streaming

One of the first television broadcasting techinques, such as in IPTV over fibre with repeaters , was multicast broadcasting with MPEG Transport Stream content over UDP.  This is suited for internal closed networks but not as much for external networks with issues such as interference, shaping, traffic congestion channels, hardware errors, damaged cables, and software-level problems. In this case, not only low latency is required, but also retransmission of lost packets.

  • (-) needs FEC for ovevrcomming lost packets which causes overheads

Real-Time Messaging Protocol (RTMP)

Developed by Macromedia and acquired by Adobe in 2005. Orignally developed to support Flash streaming, RTMP enables segmented streaming. RTMP Codecs

  • Audio Codecs: AAC, AAC-LC, HE-AAC+V1 and V2, OPUS, MP3, SPEEX, VORBIS
  • Video Codecs: H.264, VP6, VP8

RTMP is widely used for ingest and usually has low latency ~5s. RTMP works on TCP usind default port 1935. RTMFP uses UDP by replaced chunked stream. It has many variations such as

  • (-) HTTP incompatible.
  • (-) may get blocked by some frewalls
  • (-) delay from 2-3 s , upto 30 s
  • (+) multicast supported
  • (+) Low buffering
  • (-) non support for vp9/HEVC/AV1

RTMP forms several virtual channels on which audio, video, metadata, etc. are transmitted.

RTSP (Real-Time Streaming Protocol)

RTSP is primarily used for streaming and controlling media such as audio and video on IP networks. It relies on extrenal codecs and security. It is

not typically used for low-latency streaming because of its design and the way it handles data transfer:
RTSP uses TCP (Transmission Control Protocol) as its transport protocol. This protocol is designed to provide reliability and error correction, but it also introduces additional overhead and latency compared to UDP.
RTSP cannot adjust the video and audio bitrate, resolution, and frame rate in real-time to minimize the impact of network congestion and to achieve the best possible quality under the current network conditions as modern streaming technologies do such as WebRTC.

  • (-) legacy requies system software
  • (+) often used by Survillance system or IOT systes with higher latency tolerance
  • (-) mostly compatible with android mobile clients

LL ( Low Latency) – HTTP Live Streaming (HLS)

HLS can adapt the bitrate of the video to the actual speed of the connection using container format such as mp4. Apple’s HLS protocol used the MPEG transport stream container format or .ts (MPEG-TS)

  • Audio Codecs: AAC-LC, HE-AAC+ v1 & v2, xHE-AAC, FLAC, Apple Lossless
  • Video Codecs: H.265, H.264

The sub 2 seond latency using fragmented 200ms chunks .

  • (+) relies on ABR to produce ultra high quality streams
  • (+) widely compatible even HTML5 video players
  • (+) secure
  • (-) higher latency ( sub 2 second) than WebRTC
  • (-) propertiary : HLS capable encoders are not accessible or affordable


MPEG DASH (MPEG DASH (Dynamic Adaptive Streaming over HTTP)) is primarily used for streaming on-demand video and audio content over HTTP. A HTTP based streaming protocol by MPEG, as an alternative to HLS. MPEG-DASH uses .mp4 containers as HLS streams are delivered in .ts format.  

MPEG DASH uses HTTP (Hypertext Transfer Protocol) as its transport protocol, which allows for better compatibility with firewalls and other network devices, but it also introduces additional overhead and latency, hence unsuitable for low latency streaming

  • (+) supports adaptive streaming (ABR)
  • (+) open source
  • (-) incompatible to apple devvices
  • (-) not provided out of box with browsers, requires sytem software for playback

WebRTC (Web Based Real Time Communication)

WebRTC is the most used p2p Web bsed streaming with sub second latency. Inbuild Latency control in WebRTC

Transport protocol

WebRTC uses UDP as its transport protocol, which allows for faster and more efficient data transfer compared to TCP (Transmission Control Protocol). UDP does not require the same level of error correction and retransmission as TCP, which results in lower latency.

P2P (Peer-to-Peer) Connections

WebRTC allows for P2P connections between clients, which reduces the number of hops that data must travel through and thus reduces latency. WebRTC also use Data channel API which can transmit data p2p swiftly such as state changes or messages.

Choice of Codecs

Most WebRTC providers use vp9 codec ( succesor to vp8) for video compression which is great at providing high quality with reduced data.

Network Adaptation

WebRTC adapts to networks in realtime by adjusting bitrate, resolution, framework with changes in networks conditions. This auto adjusting quality also helps WebRTC mitigate the losses under congestion buildup.

  • (+) open source and support for other open source standardized techs such as Vp8, Vp9 , Av1 video codecs , OPUS audio codec. Also supported H264 certains profiles and other telecom codecs
  • (+) secure E2E SRTP
  • (-) evolving still

SRT (Secure Reliable Transport)

Streaming protocol by SRT alliance. Originally developed as open source video streaming potocol by  Haivision and Wowza. This technology has found use primarily in streaming high-quality, low-latency video over the internet, including live events and enterprise video applications.

Packet recovery using  selective and immediate (NAK-based) re-transmission for low latency streaming , advanced codecs and video processing technologies and extended Interoperability have helped its cause.

  • (+) Sub second latency as WebRTC.
  • (+) codec agnostic
  • (+) secure
  • (+) compatible

 Each unit of SRT media or control data created by an application begins with the SRT packet header.

Scalable Video Encoding ( SVC) for low latency decoding

SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. 

RTCRtpCodecCapability dictionary provides information about codec capabilities such as codec MIME media type/subtype,codec clock rate expressed in Hertz, maximum number of channels (mono=1, stereo=2). With the advent of SVC there has been a rising interest in motion-compensation and intra prediction, transform and entropy coding, deblocking, unit packetization. The base layer of an SVC bit-stream is generally coded in compliance with H.264/MPEG4-AVC.

An MP4 file contains media sample data, sample timing information, sample size and location information, and sample packetization information, etc.

CMAF (Common Media Application Format)

Format to enable HTTP based streaming. It is compatible with DASH and HLS players by using a new uniform transport container file. Apple and Microsoft proposed CMAF to the Moving Pictures Expert Group (MPEG) in 2016 . CMAF features

  • (+) Simpler
  • (+) chunked encoder and chunked tranfer
  • (-) 3-5 s of latency
  • (+) increases CDN efficiency

CMAF Addressable MEdia Objects : CMAF header , CMAF segment , CMAF chunk , CMAF track file.

Logical compoennets of CMAF

CMAF Track: contains encoded media samples, including audio, video, and subtitles. Media samples are stored in a CMAF specified container. Tracks are made up of a CMAF Header and one or more CMAF Fragments.

CMAF Switching Set: contains alternative tracks that can be switched and spliced at CMAF Fragment boundaries to adaptively stream the same content at different bit rates and resolutions.

Aligned CMAF Switching Set: two or more CMAF Switching Sets encoded from the same source with alternative encodings; for example, different codecs, and time aligned to each other.

CMAF Selection Set: a group of switching sets of the same media type that may include alternative content (for example, different languages or camera angles) or alternative encodings (for example,different codecs).

CMAF Presentation: one or more presentation time synchronized selection sets.

Fanout Multicast / Anycast

Connection-oriented protocols find it difficult to scale out to large numbers as compared to connectionless protocols like IP/UDP. It is not advisable to implement hybrid, multipoint or cascading architectures for low latency networks as every proxy node will add its buffering delays. More on various RTP topologies is mentioned in the article below.

Ref :

Fault Tolerance and Error Correction in WebRTC

Fluctuating Networks

WebRTC has build in capabilities to detect network glitches and adapt itself to changing situations. Some of the methodologies used are listed below.

Dynamic Bandwidth estimation

Bandwidth are dependant on network strength and is affected by the other users on the network. Under hetrogenious network conditions Bandwidth estimation is a critical step to improve call quality and end user exeprince.


An unreliable network / fluctiating one will cause some packets to be delivered on time and some to be delayed more thn others, causing them to come in bursts. JitterBuffer is an effective methodology for Jitter management which ensures a steady delivery of apckets even when the peers transmit at flucting rates.

A jitter buffer is a buffer that consumes packets as soon as they arrive and keep them untill the frame can be fully reconstructed. At the point when all apckets have bee filled in buffer ( in any order ) it emiits it for decoding which the play can playback to user. Note that serveral RTP packet can have the same timestamp is they are part of the same video frame.

  • (+) dynamically manages unordered packets and reconstrcts a frame after accumulating all packets
  • (-) can introduce latency for packets that arrive early
  • (-) Need active resisizing by means of feedback
    • for hi speed and goog network jitterbuffer can ve small sized
    • for congested and disruptive networks it is better to keep a longer buffer which can also add some latency
  • (-) buffer has limited capacity so the packet can expire if not received within a duration “jitterBufferDealy”.

SDP renegotiation


Demand for High Quality Video

Applications telehealth, advertising or broadcasting on WebRTC media streams

Tradeoff between Latency vs Quality

Reduced resolution, framerate, bit rate are effective for congestion control however not suited to the case of High defintaion video conferecing such as gaming , telehealth of broadcast of concert as it may hinder with user experience.

Layering for adaptive streaming

using the I-frame , P-frame and B frame efficiently in the codec combines with predictive machine learning models make packet loss unnoticible to the human eye. Marker ( M bit) in the RTP packet structure marks keyframes.

  • (-) more complex compression algorithms

Better compression algorithms vs CPU compute

A better performing compression algorithm produces fewer bits to encode the same video quality as its predecessor.

  • (-) Higher performing compression engines most always has higher energy consumption and carbon footprint
  • (+) resilent to network fluctuations

Full INTRA-frame Request (FIR)

Requests a key frame to decode the frame. Can be used when a new peer joins the conference a key frane is required to start decoding its video strea,.

Picture Loss Indication (PLI)

Partial frames given to decoder are unprocessable, then PLI message is send to the sender. As the sender receives pli message it will produce new I-frames to help the reciver decore the frames.

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
request a full key frame from the sender , when new memeber enters the session.request a full key frame from the sender, when partial frames were given to the decoder, but it was unable to decode them
causes of making PLI request could be decoder crash or heavy loss

Redundant Encoding (RED) in Media Packets

Recovers packet loss under lossy networks by adding extra bits of information in following packets.

  • (+) good for unpredictable networks

LBRR ( low bit-rate redundancy) – tbd


Congestion is created when a network path has reached its maximum limits which could be due to

  • failures(switches, routers, cables, fibres ..)
  • over subscription and operating at peak bandwidth.
  • broadcast storms
  • Inapt BGP routing and congestion detection
  • BGP is responsisble for finiding the shortest routable path for a packet

The direct consequences of congestion for any network transport can be

  • High Latency
  • Connection Timeouts
  • Low throughput
  • Packet loss
  • Queueing delay

With respect to WebRTC streams too, if a network has congestion, the buffer will overflow and packets will be droppped. Due to excessive dropping of packets both transmission time and jitter increases.To overcome this adaptive buffereing is used as jitter increases or decreases.

Feedback Loop

A congestion notifier and detection algorithm can analyze the RTCP metrics for possible congestion in the network route and suggest options to overcome it. Part of Adaptive Bitrate and Bandwidth Estimation process.

Overcome congestion with lower bitrate

Rate limiting the sending information is one way to overcome congestion, even though it could lead to bad call quality at the reciver’s end and non typical for realtime communciation systems

Reduce frame quality and resolution

Full HD constraint
vga constraints

Congestion control Algorithms : Google Congestion Control ( GCC)

Bandwidth estimation and congestion control are ofetn paird in as a operational unit. Primarily packet loss and inter packet arrival times drives the bandwidth estimation and enable GCC to flagcongestion.

  • On the receiver side TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB(Receiver Estimated Maximum Bitrate ) exchange the bandwodth estimates.
  • On the sender side TWCC(Transport wide congestion control) can be used.

Other congestion control algorithms

  • QUIC Loss Detection and Congestion Control RFC 9002
  • Coupled Congestion Control for RTP Media rfc8699
  • NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
  • Self-Clocked Rate Adaptation for Multimedia RMCAT WG
  • SCReAM – Mobile optimised congestion control algorithm by Ericson

Low Network Strength and High Packet Loss

Packet loss is the loss of packets in transmission which could be owing to

  • network resources and path
  • transmission medium congestion
  • applications inability to absord delayed packets.
  • Maximum Transmission Unit size : measure of how large a single apcket can be.

Recovering Lost packets

High definition video stream requires low/no packet loss and fast recovery if any. RTP intrinsically has no means for recovering packet loss. Instead, low bit rate redundancy can be added to packets themselves to make up for any loss. Retransmission of lost packets can be a feature developed over RTP using sequence numbers head in RTP.

Acknowledgement to identify packet loss

A receiver can notifiy the sender of the possible concerns around packet loss by means of sendings acks.

  • Selective Acknowledgement (SACK) : notifies the sender of multiple packets and thereby indicating gaps
  • Negative Acknowledgements (NACK ) : notifies the sender of packets lost
    • RTCP Packet Type 193 denotes NACK.
  • (+) higher NACK count is suggestive of high packet loss
  • (-) round trip time for NACK to send and waiting for packet to be retransmitted and receive in response can cause significant delay

Forward Error Correction (FEC)

The sender proactively send redundant data such that lost packets dont affact the stream on receiver’s end.

  • (+) receiver doesnt have to request for exgtra data to be sent , the sender does it by itself at RTP level
  • (+) less delay than NACK which incurs round trip time
  • (-) involve extra bandwidth.

Long distance Calls and High Round Trip Time

Geographical distances can add significant delay in Transmission time.Transmission time is an important metric in the Call Quality analysis however calculating transmission time as sthe different of timestamp of sending and timestamp of receiving requires perfect sync of systems clock which is unreliable.

transmission_time = timestamp_send - timestamp_receive

For this reason RTT( Round Trip Time)is a better means to avoid clock synchoronization errors.

transmission_time = rtt /2 

Using Receiver reports and Sender Reports from RTCP to adjust to network conditions

Sender and receiver reports (SR and RR) provide a highlight of the connection and media quality streaming on this connection.

RTCP Senders report for WebRTC media stream
RTCP receivers report for WebRTC media stream

Low Latency Media Streaming

Latency is calculated from getting user media encoding transmission , network delays , buffering , decoding and playback. There are many factors involved in latency management such as queing delays , media path, CPU utilization etc.

Optimize Compute resource

  • mobile agents have lesser computative power
  • Camera with features such as auto focus or other adjustments will taker more time to cappture
  • network should be of suited bandwidth and strength

Reduce information to be encoded and sent

  • Subject focus and blurring backgroud
  • Filtering noise at source
  • Voice Activity Detection (VAD)
    • send extra data in FEC only is there is voice activity detected in packet
  • Echo Cancellation

Measuring latency

Since we know that synchorinizaing clocks in distributed systems is a tough task and mostly avoided by wither using NTP or using other means of synchronization

NTP Synchronization of Audio Video Sync

During the buffereng of incoming [ackets ( which canrage from few ten of miliseconds to few hundred milisecond ) the streams are synchronized.

Time used by RTP for sync is NTP and RTP based ( which are not required to be in sync).

  • NTP Timestamp : 64-bit unsigned value that indicates the time at which this RTCP SR packet was sent. Formatted as fractional seconds since Jan 1, 1900
  • RTP Timestamp : RTP timestamp corresponds to the same instant as the NTP timestamp. Expressed in the units of the RTP media clock.
    • Majority of video formats use a 90kHz clock.
    • For receiver to sync audio and video streams these two streasm must be from same clock
Frame 300: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface 0 (outbound)
Packet type: Sender Report (200)
Length: 6 (28 bytes)
Sender SSRC: 0x39a659b4 (967203252)
Timestamp, MSW: 3855754463 (0xe5d224df)
Timestamp, LSW: 2364654374 (0x8cf1c326)
[MSW and LSW as NTP timestamp: Mar 8, 2022 18:54:23.550563999 UTC]
RTP timestamp: 1110449770

Demand for higher security on WebRTC’s CPaaS

Webrtc uses Stream Control Transmission Protocol (SCTP) over DTLS connection as an alternative to TCP and UDP.

Features :

  • multihoming : one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths
  • Multistreaming transmit several independent streams of chunks in parallel
  • SCTP has similarities to TCP retransmission and partial reliability like UDP.
  • Heartbest to keep connection alive with exponential backoff if packet hasnt arrived.
  • Validation and acknowledgment mechanisms protect against flooding attack

SCTP frames data as datagrams and not as a byte stream

  • (+) SCTP enables WebRTC to be multiplexing
  • (+) It has flow control and congestion avoidance support
  • (+)  

End to End Encryption

End to end encryption model of WebRTC is a good defence to MIM ( man in middle ) attacks howver it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and Realtime communication platfroms in this article WebRTC App and webpage Security.

Minimize Public-private mapping pairs vai RTCP-mux

Traditionally 2 separte ports for RTP aand RTCP were used in SIP / RTP based realtime communications systems. Thus demultiplexisng of the traffic of these data streams is peformed at the transport later.

With rtcp-mux the NAT tarversal si simplified as onlya single port is used for media and control messages .

  • (+) easier to manage security by gathering ICE candidates for a single port only instead of 2
  • (+) increases the systesm capacity for media session using the same number of ports
  • (+) further simplified using BUNDLE as all media session and their control messages flow on the same port .
  • WebRTC has rtcp-mux capabilities thus simplifying the ICE candidate pairing

References :

AEC (Echo Cancellation) and AGC (Gain Control) in WebRTC

Echo is the sound of your own voice reverberating. If the amplitude of such a sound is high and intervals exceed 25 ms, it becomes disruptive to the conversation. Its types can be acoustic or hybrid. Echo cancellers need to eliminate the echo while still preserving call quality and not disrupting tones such as DTMF.

Acoustic Echo 

Usually the background or reflected noise which is an undesired voiceband energy transfers from the speaker to the microphone and into the communication network. Mostly found in a hands-free set or speakerphone. In a multiparty call scenario, it could also occur due to unmatched volume levels, challenging network conditions on one party, background noise, double talk or even proximity between user and microphone

Hybrid / Electronic Echo in PSTN phones

In a public telephone system, local loop wiring is done using two-wire connections carrying bidirectional voice signals. In PBX, a two-to-four wire conversion is done using a hybrid circuit which does not perform perfect impedance matches resulting in a Hybrid echo.

echo AEC
Hybrid / Electronic Echo in PSTN phones

Echo Cancellation

An efficient echo canceller should cancel out the entire echo tail while not leading to any packet loss. It needs to be adaptive to changing IP network bandwidth and algorithm should function equally well in conference scenarios  where there may be more than one echo sources. Benchmarking tools like MOS (Mean Opinion scores ) are used to gauge the  results. Often voice quality enhancement technologies are also integrated into AEC modules, such as :

  • automatic Gain control ( AGC) ,
  • Noise Reduction
  • Confort Noise Generator ( CNG)
  • Non linear processor
  • tone Disabler for SS& and DTMF tones
echo AEC 2
Automatic Echo Cancellation

WebRTC Echo Cancellation

WebRTC now actively detects and removes echo especially the local system echo resonance.

Noise Suppression in WebRTC

Noise suppression automatically filters the audio to remove background noise.

Automatic Gain Control (AGC)

AGC works as a circuit. When the average audio level is low , circuit raises it and if the audio level is high the circuit brings it down.

  • (+) AGC frees the user from manually tuning the audio level.
  • (-) During a pause too , agc tries to bring audio level to standard setting making background noises louder.
  • (-) subesquent audio processing make gain control progressively worse.

Audio Compressor : Due to the drawbacks with AGC , Audio Compressers carry the operation more sophistically by looking at amplitude of the sound.

(-) not ideal for music which had varrying sound amplitude.

Audio Peak Limiter : Limiters simply keep the audio from exceeding a set maximum level.

(+) well suited for avoiding loud noise such as door slam from entering the processing pipeline.

Audio Expanders :increase the dynamic (loudness) range of audio that has been overly processed.

(+) suited for over compressed audio transmissiono such as Satellite relays

Audio Filters :attenuate audio frequencies either above or below certain points within the audio range.

AGC in webRTC


aspectRatio: true
autoGainControl: true
brightness: true
channelCount: true
colorTemperature: true
contrast: true
deviceId: true
echoCancellation: true
exposureCompensation: true
exposureMode: true
exposureTime: true
facingMode: true
focusDistance: true
focusMode: true
frameRate: true
groupId: true
height: true
iso: true
latency: true
noiseSuppression: true
pan: true
pointsOfInterest: true
resizeMode: true
sampleRate: true
sampleSize: true
saturation: true
sharpness: true
tilt: true
torch: true
whiteBalanceMode: true
width: true
zoom: true

WebRTC Get User MEdia with various values of autoGainControl

References :

VoIP API design

  • Public API endpoints
  • Internal API gateways
  • API Rate Limiter
    • Token based Rate Limiting
    • Token bucket filter
    • Hierarchical Token Bucket (HTB)
    • Fair Queing
    • CBQ (Class Based Queing)
    • Modular QoS command-Line interface (MQC) Shaping
  • Throttling

VoIP manages Call setup and teardown using IP protocol. The APIs can be used to provide public or internal endpoinst to create mnage calls , conference addon services like recording , tgranscription or even do auth and heartbeat. This article lists some external programmable Call Control APIs, internal APIs for biling , health as well as Rate limitting.

Public API endpoints

Programmatic call control APIs

  1. Making a Call

HTTP POST https://www.altteelcom.com/voice/call


to: '+14155551212',
from: '+18668675310'

Calback params

statusCallback: 'https://www.myapp.com/events',
statusCallbackEvent: ['initiated', 'answered'],
statusCallbackMethod: 'POST'


"from": "+9999999999"
"to": "+111111111",
"status": "ongoing"

"date_created": "Mon, 5 Sep 2020 20:36:28 +0000"
"start_time": "Mon, 5 Sep 2020 20:36:29 +0000"
"date_updated": "Mon, 5 Sep 2020 20:36:44 +0000"
"direction": "outbound",
"duration": ""
"end_time": ""

"price": "-0.03000"
"price_unit": "USD"

The response can additional have SID and app version and other URI for recording , transcription , apyment and other services for this call .

2. Ending an ongoing Call

HTTP UPDATE https://www.altteelcom.com/voice/call/callid001


status: 'end'

This updates the end time of the call and sets the evenst for CDR processing

Services API

  • Call Reording
  • Call transcription

Confernece APIs
HTTP POST https://www.altteelcom.com/voice/conferences

  • creating a conf
  • fetching conf based on date or room name
  • updating a ongoing conf
  • ending a conf
  • set IVR announcement on ongoing conf

Auth API


HTTP POST https://www.altteelcom.com/cdr

  • get CDR ( filtered per cal or acc to specific date or account)
  • bulk export of CDR

Internal API gateways

API Rate Limiter

Noisy neighbour is when one of the clients monoplizes the bandwidth using most of the i/o or cpu or other resources which can negatively affect the performance for other users . Throttling is a good way to solve this problem by limit.

Auto scaling Load balancerRate Limiter
horizotal or vertical scalling can countger incoming trafficLB can limit number of simultaneous requests. It can reject or send to queue for later operationCan intelligently understand the cost of each operation and perform throttling.
(-) takes time to scale out thus cannot solve noisy neighbour problem immediately(-) but the LB’s behaviour is indiscriminate ( cannot distinguish between the cost of diff operations)
(-) LB cannot ensure uniform distribution of distribution of operations among all servers.

A rate limiter should have low latency, accurate and scalable.

RateLimiter inside the serviceprocessRate Limiter as its own process outside as a daemon
(+) faster , no IPC
(+) reisstnt to interprocess call failures
(+) programming langiage agnostic daemon
(+) uses its own memory space, more predictable
(-) service meory needs to allocate space for rate limiters
widely used for auto discovery of service host

Token based Rate Limiting

 provides admission contro

Token bucket filter

define a users quota in terms average rate and burst capacity

Hierarchical Token Bucket ( HTB)

 uses the deficit round-robin algorithm for fair queuing

Fair Queing

give paying users a bandwidth fraction of 25%

priority queuing

decide 1 packet/ms for free or reduce rate user

distributes that sender’s bandwidth among the other senders

CBQ ( Class Based Queing)

Shaping is performed using link idle time calculations based on the timing of dequeue events and underlying link bandwidth. Input classes that tried to send too much were restricted, unless the node was permitted to “borrow” bandwidth from a sibling.

Modular QoS command-Line interface (MQC) Shaping

mplement traffic shaping for a specific type of traffic using a traffic policy

  • When the rate of packets matching the specified traffic classifier exceeds the rate limit, the device buffers the excess packets.
  • When there are sufficient tokens in the token bucket, the device forwards the buffered packets at an even rate.
  • When the buffer queue is full, the device discards the buffered packets.


  • delay the packet until the bucket is ready / shaping
  • drop the packet / Policing
  • mark the packet as non-compliant

Failure management on Rate Limiter

  • Node Crash : just less requests trolled
  • Leaky bucket
  • tokens can go into -ve

System Design for API gateway

Important points for design API gateway

  • Serialize data in company binary format
  • allocate buffer in memory and build frequency count hash table and flash once full or based on time to calculate counters
  • aggregation on API gateway on the fly
Frontend ServicePartitioned ServiceBackend Service
Lightweight web service
Request Validation
Auth / Authorization
TLS(SSL ) termination
Server sode encryption
Rate Limiting(throttling)
Request deduplication
Caching layer between frontend and backend
Leader Selection + Quorem

Distributed messaging system( fast and slow paths) for API

A distributed messahing system such as Apache kafka or AWs kinesis, internally splits a msg accross serveral partitions where each parition can be placed on a single shard in a seprate machine on a clustered system.

Applications of this system design

  • Find heavy hitters ( Top K problem )
  • Popular products / trends
  • Voltaile stocks
  • DDoS Attack Prevention

References :

High availiability and Scalibility in VoIP platforms

Load Balancers

Load Balancer(LB) is the initial point of interaction between the client application and the core system. It is pivotal in the distribution of the load across multiple servers and ensuring the client is connected to the nearest VoIP/SIP application server to minimize latency. However, the load balancers are also susceptible to security breaches and DOS attacks as they have a public-facing interface. This section lists the protocols, types and algorithms used popularly in Load balancers of VoIP systems.

software LBLayer 4 / hardware LB
Amazon ELB ( eleastic load balanecr)
F5 BIG-IP load balancer
CISCO system catalyst
Barracuda load balancer
used by applications in cloud
ADN (Application delivery network)
used by  network address translators (NATs) 
DNS load balancing
examples and roles of software and hardware based load balancers

Load Balancers(LB) ping each server for health status and greylists servers that are unhealthy( respond late) as they may be overloaded or experiencing congestion. The LB monitors it rechecks after a while and if a server is healthy ( ie if a server responds with responds with status update) it can resume sending traffic to it. LB should also be distributed to different data centres in primary-secondary setup for HA.

Networking protocol

TCP LoadbalancerHTTP load balancersSIP based LB as Kamailio/ Opensips
can forwrad the packet without inspecting the content of the packet.terminate the connection and look inside the request to make a load balcing decsiion for exmaple by using a cookie or a header.domain specific to VoIP
(+) fast, can handle million of req per second(+) handle SIP routing based on SIP headers and prevent flooding atacks and other malicious malformed packets from reaching application server

Load balancing algorithms

  • Weighted Scheduling Algorithm
  • Round Robin Algorithm
  • Least Connection First Scheduling
  • Lest response time algorithms
  • Hash based algorithm ( send req based on hashed value such as suing IP address of request URL)
LoadbalancerReverse Proxy
forward proxy server allows multiple clients to route traffic to an external serveraccepts clients requestd for server and also returns the server’s response to the client ie routes traffic on behalf of multiple servers.
Balances load and incoming traffic endpointpublic facing endpoint for outgoing traffic
 additional level of abstraction and security, compression
used in SBC (session border controllers) and gateways

Service Discovery

Client-side or even backend service discovery uses a broadcasting or heartbeat mechanism to keep track of active servers and deactivates unresponsive or failed servers. This process of maintaining active servers helps in faster connection time. Some approaches to Service Discovery

  1. Mesh
    1. (-) exponentially incresing network traffic
  2. Gossip
  3. Distributed cache
  4. Coordination service with Service
    • (-) requesres coordination service for leader selection
    • (-) needs consensus
    • (-) RAFT and pbFT for mnaging failures
  5. Random leader selection
    • (+) quicker
    • (-) may not gurantee one leader
    • (-)split brain problem

Keepalive, unregistering unhealthy nodes

Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint.


Usuallay there is a tradeoff between liveness and safety.

  1. Single leader replication
    • (-) vulnerable to loss of data is leader goes down before replication completes
    • used to in sql
  2. multileader replication and
  3. leaderless replication
    • (-) increases latencies
    • (-) quorem based on majority , cannot function is majority node are not down
    • used in cassandra

Data Store Replication

For Relatonal Dataabase

For NoSQL databse replication and HA

Quick Response / Low latency

Message format

Textual Message formatBinary Message Format
human readbale like
json xml
diff to comprehend , need shared schema between sender and receiver to serilaize and deserialze ,
names for every field adds to size no field name or only tags , reduces message size

Gateways for faster routing and caching to services

gateways are single entry point to route user requests to backend services .

Separate hot storage from cold storage

hot storage is frquently accessed data which must be near to server

cold storage is less frequently accessed data such as archives

  • object storage
  • slow access


To make a system :-

  • scalable : use partitioing
  • reliable : use replication and checkpointing to not loose data in failures
  • fast : use in -memory usage

According to CAP theorem Consistency and Availiability are difficult to achieve together and there has to be a tradeof acc to requirnments.


Partition strategy can be based on various ways such as :-

  • Name based partition
  • geographic partition
  • names’ hashed value based on identifier
    • (-) can lead to hot partitions ( high density in areas of freq accessible identioers )
    • (-) high density spots for example all messages with a null key to go to the same partition
    • (-) doesnt scale
  • event time based hash
    • (+) data is spread evenly over time

To create a well distributed partition we could spread hot partition into 2 partitions or dedicate partitions for freq accessible items. An effective partitioning keys uses

  • Cardinality : total num of unique keys for a usecase. High cardinality leads to better distribution.
    • high cardinatility keys : names , email address , url since they have high variatioln
    • low cardinatlity keys : boolean flags such as gender M/F
  • Selectivity : number of message with each key. High selectivity leads to hotspots and hence low selectivity is better for even distribution.


Scale Out not Up !

Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management used in DevOps. I have mentioned this in detail on the article on VoIP and DevOps below.

Multiple PoPs (point of presence)

for a VOIP system catering to many clients accross the globe or accessing multiple carriers meant for different counteries based on Prefix matching , there should be alocal PoP in most used regions . typically these regions include – US east – west coasts, UK – germany of London , Asia Pacific – Mumbai ,Hong Kong and Australia.

Minimal Latency and lowest amount of tarffic via public internet

Creating multiple POPs and enabling private traffic via VPN in between them ensures that we use the backbone of our cloud provider such as AWS or datacentre instead of traversing via public internet which is slower and more insecure. Hopping on a private interface between the cloud server and maintaining a private connection and keepalive between them helps optimize the traffic flow while keeping the RTT and latency low.

HA ( High-availability )

Some factors affecting Dependability are

  • Eventual Consistency
  • MultiRegion failover
  • Disaster Recovery

A high-availability (HA) architecture implies Dependability.Usually via existence of redundant applications servers for backups: a primary and a standby. These applications are configured so that if primary fails, the other can take over its operations without significant loss of data or impact to business operations.

Downtime / SLA of 5 9’s in aggregate failures

4 9’s of availiability on each service components gives a downtime of 53 mins per service each year. However in aggregate failure this could amlount to (99.99)10 = 99.9 downtime which is 8-10 hours each year.

Thus, aggregate failure should be taken into consideration while designing reliable systems.

HA for Proxy / Load balancer (LB)

A LB is the first point of contact for outbound calls and usually does not save the dialogue information into memory or database but still contain the transaction information in memory. In case the LB crashes and has to restart, it should

  • have a quick uptime
  • be able to handle in dialogue requests
  • handle new incoming dialogue requests in a stateless manner
  • verify auth/authorization details from requests even after restart

HA for Call Control app server

App server is where all the business logic for call flow management resides and it maintains the dialog information in memory.

Issues with in-memory call states : If the VM or server hosting the call control app server is down or disconnected, then live calls are affected, this, in turn, causes revenue loss. Primarily since the state variable holding the call duration would be able to pass onto the CDR/ billing service upon the termination of the call. For long-distance, multi telco endpoint calls running hours this could be a significant loss.

  • Standby app server configuration and shared memory : If the primary app server crashes the standby app server should be ready to take its place and reads the dialog states from the shared memory.
  • Live load balanced secondary app server + external cache for state varaibles : External cache for state variables: a cluster of master-slave caches like Redis is a good way of maintaining the dialogue state and reading from it once the app server recovers from a failed state or when a secondary server figures it has a missing variable in local memory.

Media Server HA

Assuming the kamailio-RTPengine duo as App server and Media Server. These components can reside in same or different VMs. Incase of media server crash, during the process of restoring restarted RTpengine or assigning a secondary backup RTpengine , it should load the state of all live calls without dropping any and causing loss of revenue . This is achived by

  • external cache such as Redis ,
  • quick switchover from primary to secondary/fallback media server and
  • floating IPs for media servers that ensures call continuity inspite of failure on active media server.

Architecturally it looks the same as fig above on HA for the SIP app server.

Security against malicious attacks

Attacks and security compromisation pose a very signficant threat to a VoIP platform.

MITM attacks

Man in midddle attacks can be counetred by

  • End to end encryption of media using SRTP and signals using TLS
  • Strong SIP auth mechanism using challenges and creds where password is composed of mixed alphanumeric charecters and atleast 12 digits long
  • Authorization / whitelisting based on IP which adheres to CIDR notation

DDOS attacks

DDOS renders a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.

dDOS – multiple network hosts to flood a target host with a large amount of network traffic. Can be created by sending falsified sip requests to other parties such that numerous transactions originating in the backwards direction comes to the target server created congestion.

Can be counetred by

  • detect flooding and q in traffic and use Fail2ban to block
  • challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required)

Read about SIP security practices in deatils https://telecom.altanai.com/2020/04/12/sip-security/

Other important factors leading to security

  • Keystores and certificate expiry tracker
  • priveligges and roles
  • Test cases and code coverage
  • Reviewers approval before code merge
  • Window for QA setup and testing , to give go ahead before deployment

Identifying outages and Alerting

Raise Event notification alerts to designated developers for any anolous behavior. It could be call based or SMS basef alert based on the sevirity of the situtaion .

Logging and Alerting for a VoIP CPaaS platform .
Raise Event notification alerts to designated developers for any anolous behavior. It could be call based or SMS basef alert based on the sevirity of the situtaion.

Sources for alert manager

  • Build failed ( code crashes, Jenkins error)
  • Deployment failed ( from Kubernetes , codechef, docker ..)
  • configuration errors ( setting VPN etc )
  • Server logs
  • Server health
  • homer alerts ( SIP calls responses 4xx,5xx,6xx)
  • PCAP alerts ( Malformed SIP SDP ..)
  • Internal Smoke test ( auto testing procedure done routinely to check live systems )
  • Support tickets from customer complaints ( treat these as high priority since they are directly impacting customers)


The test bed and QA framework play a very crticial role in final product’s credibility and quality.

Performance Testing

  • Stress Testing : take to breaking
  • Load Testing : 2x to 3x testing
  • Soak Testing : typical network load to long time ( identify leaks )

Robust QA framework( stress and monkey testing) to identify potential bottlenecks before going live

A QA framework basically validates the services and callflows on staging envrionment before pushing changes to production. Any architectural changes should especially be validated throughly on staginng QA framework befire making the cut. The qualities of an efficient QA platform are :

Genric nature – QA framework should be adatable to different envrionments such as dev , staging , prod

Containerized – it should be easy to spn the QA env to do large scale or small scale testing and hence it should be dockerized

CICD Integration and Automation – integrate the testcases tightly with gt post push and pull request creation . Minimal Latency and lowest amount of tarffic via public internet

Keep as less external dependecies as possible for exmaple a telecom carrier can be simulated by using an PBX like freeswitch or asterix

Asynchronous Run – Test cases should be able to run asynchronously. Such as seprate sipp xml script for reach usecase

Sample Testcases for VoIP

  • Authentication before establish a session
  • Balance and account check before establishing a session like whitelisting , blacklisting , restricted permission in a particular geography
  • Transport security and adaptibility checks , TLS , UDP , TCP
  • codec support validation
  • DTMF and detection
  • Cross checking CDR values with actual call initiator and terminator party
  • cross checking call uuid and stats
  • Validating for media and related timeouts

QA frameworks tools – Robot framework

traffic monitor – VOIP monitor

customer simulator – sipP scripts

network traffic analyser – wireshark

pcap collevcter – tcpdump , sngrep

Distributed Data Store

A Distributed Database Design could have many components. It could work on static datastore like

  • SQL DB where schema is important
    • MySQL
    • postgress
    • Spanner – Globally-distributed database from Google
  • NoSQL DB for to store records in json
    • Cassandra – Distributed column-oriented database
  • Cache for low latency retrivals
    • Memcached – Distributed memory caching system
    • Redis – Distributed memory caching system with persistence and value types
  • Data lakes for heavy sized data
    • AWS s3 object storage
    • blob storage
  • File System
    • Google File System (GFS) – Distributed file system
    • Hadoop File System (HDFS) – Open source Distributed file system

or work on realtime data streams

  • Batch processing ( Hadoop Mapreduce)
  • Stream processing ( Kafka + spark)
    • Kafka – Pub/sub message queue
  • Cloud native stream processing ( kinesis)

Each component has its own pros and cons. The choice depends on requirnments and scope for system behaviour like

  • users/customer usuage and expectation ,
  • Scale ( read and write )
  • Performnace
  • Cost
Users/customersScale ( read / write)PerformanceCost
Who uses the system ?
How the system will be used?
Read / writes per second ?
Size of data per request ?
cps ( calls or click per second) ?
write to read delay ?
p99 latency for read querries ?
should design minimize the cost of development ?

should design mikn ize the cost of mantainance ?
spikes in traffic eventual consistency ( prefer quick stale data ) as compared to no data at all
redundancy for failure management

Some fundamental constrains while design distributed data structure :-

p99 latency : 99% of the requests should be faster than given latency. In other words only 1% of the requests are allowed to be slower.

Request latency:
    min: 0.1
    max: 7.2
    median: 0.2
    p95: 0.5
    p99: 1.3

Inidiviual Events vs Aggregate Data

Inidividual Events ( like every click or every call metric)Aggregate Data ( clicks per minute, outgoing calls per minute)
(+) fast write
(+) can customize/ recalculate data from raw
(+) faster reads
(+) data is fready for decision making / statistics
(-) slow reads
(-) costlier for large scale implementations ( many events )
(-) can only query in the data as was aggregates ( no raw )
(-) requires data aggregation pipeline
(-) hard to fix errors
suitable for realtime / data on fly
low expected data delay ( minutes )
suitable for batch processing in background where delay is acceptable from mintes to hours

Push vs Pull Architecture

Push : A processing server manages state of varaible in memory and pushes them to data store.

  • (-) crashed processingserver means all data is lost

Pull : A temporary data strcyture such as a queue manages the stream of data and processing service pull from it to process before pusging to data stoore.

  • (+) a crashed server has to effect on temporarily queue held data and new server can simply take on where previous processing server left.
  • (+) can use checkpointing
Structured and Strict schema
Relational data with joins
Semi-structured data
Dynamic or flexible schema
(+) faster lookup by index(-) data intensive workload
(+) high throughput for IOPS (Input/output operations per second )
used for
Account information
best suitable for
Rapid ingest of clickstream and log data
Leaderboard or scoring data
Metadata/lookup tables
DynamoDB – Document-oriented database from Amazon
MongoDB – Document-oriented database

A NoSQL databse can be of type

  • Quorem
  • Document
  • Key value
  • Graph

Cassandra is wide column supports asyn master less replication

Hinge base also a quorem based db also has master based preplication

MongoDB documente orientd DB used leacder based replication

SQL scaling patterns include:

  • Federation/ federated database system : transparently maps multiple autonomous database systems into a single virtual/federated database.
    • (-) slow since it access multiple data storages to get the value
  • Sharding / horizontal partition
  • Denormalization : Even though normalization is more memory efficient denormalization can enhance read performance by additing redundant pre computed data in db or grouping related data.
    • Normalizing data reduces data warehouse disk space by reducing data duplication and dimension cardinality. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.
  • SQL Tuning : “iterative process of improving SQL statement performance to meet specific, measurable, and achievable goals”

Influx DB : to store time series data

AWS Redshift

Apache Hadoop


Embeed Data : RocksDB

Message Queues(Buffering) vs Batch Processing

Distributed event management, monitoring and working on incoming realtime data instead of stored Database is the preferred way to churn realtime analysis and updates. The multiple ways to handle incoming data are

  1. Batch processing – has lags to produce results, not time crtical
  2. Data stream – realtime response
  3. Message Queues – ensures timely sequence and order
Add events to buffer that can be read Add events to batch and send when batch is full
(+) can handle each event(+) cost effective
(+) ensures throughput
(-) if some events in batch fail should whole batch fail ?
(-) not suited for real time processing
S3 like objects storage + Hadoop Mapreduce for processing


  • Connection timeout : use latency percentiles to calculate this
  • Request timeout


  • exponential backoff : increase waiting time each try
  • jitter : adds rabdomness to retry intervals to spread out the load.

Grouping events into object storage and Message Brokers

slower than stream processing but faster than batch processing.

Distributed Event management and Event Driven architecture using streams

In event driven archietcture a produce components performs and action which creates an event thata consumer/listener would subscribes to consume.

  • (+) time sensitive
  • (+)Asynch
  • (+) Decoupled
  • (+) Easy scaling and Elasticity
  • (+) Heterogeneous
  • (+) contginious

Expanding the stream pipeline

Event Streams decouple the source and sink applications. The event source and event sinks (such as webhooks) can asynchronously communicate with each other through events.

Options for stream processing architectures

  • Apache Kafka
  • Apache Spark
  • Amazon kinesis
  • Google Cloud Data Flow
  • Spring Cloud Data Flow

Here is a post from earlier which discusses – Scalable and Flexible SIP platform building, Multi geography Scaled via Universal Router, Cluster SIP telephony Server for High Availability, Failure Recovery, Multi-tier cluster architecture, Role Abstraction / Micro-Service based architecture, Load Balancer / Message Dispatcher, Back end Dynamic Routing and REST API services, Containerization and Auto Deployment, Auto scaling Cloud Servers using containerized images.

Lambda Architecture

Stream processing on top of map reduce and stream processing engine. In lambda architecture we can send events to batch system and stream processing system in parallel. The results are stiched together at query time.

Lambda Archietcture : stream processing on top of map reduce and stream processing engine. Send events to batch system and stream processing system in parallel. The results are stiched together at query time.

Apache Kafka is used as source which is a framework implementation of a software bus using stream-processing. “.. high-throughput, low-latency platform for handling real-time data feeds”.

Apache Spark : Data partitioning and in memory aggregation.

Distributed cache for call control Servers

Dedicated Cache ClusterCo located cache
Isolates cache fro service
Cache and service do not share memory and CPU
can scale independently
can be used by many microservices
flexibility in choosing hardware
doesnt require seprate hardware
low operational and hardware cost
scales together with the service

Choosing cache host

  • Mod function
    • (-) behaves differently when a new client is added or one is removed , unsuitable for prod
  • Consistent hashing ( chord)
    • maps each value to a point on circle

Cache Replacement

Least Recently Used Cache Replacement

Consistency and High Availiability in Cache setup

ReadReplicas live in differenet data centre for disaster recovery.

Strong consistency using Master Slave

Circuits – fail fast, wait for circuit to recover before using again

Design patterns for a circuit base setup to gracefully handle exceptions using fallback.

Circuit breaker : stops client from repeatedly trying to exceute by calculate the error threshold.

Isolated thread pool in circuits and ensure full recovery before calling the service again.

(+) Circuit breaker event causes the entire circuit to repair itself before attempting operations.

References :

Software Defined Networks ( SDN) and Network Function Virtulaization ( NFV) for Communication networks

Innovations in telecommunication today are largely driven by the advancements in Open source tech tools, standards and stacks. IP-based video and voice communication systems, Unified Communication systems such as Enterprise CPaaS platforms or even an external independent VoIP provider. The challenge for service providers today is that operating costs are growing faster than revenues. A large number of growing systems and vendors make operation a complex and expensive process.

Discrepancies between traffic growth and revenue growth (Source: Accenture)

Maintaining a network for communication service providers can be a complex and challenging task for several reasons:

  1. Network maintenance and upgrades: Service providers must constantly maintain and upgrade their networks to ensure that they are able to provide reliable service to their customers. This can involve replacing outdated equipment, installing new technology, and troubleshooting issues that arise.
  2. Managing traffic: Service providers must manage the traffic on their networks to ensure that it is distributed efficiently and that users are able to access the services they need. This can be a challenge, especially when the network is congested or there are unexpected spikes in traffic.
  3. Ensuring security: Communication networks are vulnerable to a variety of security threats, including hacking, malware, and denial of service attacks. Service providers must take measures to protect their networks and their customers’ data from these threats.
  4. Managing costs: Maintaining a communication network can be expensive, and service providers must find ways to manage costs while still providing high-quality service to their customers.
  5. Meeting regulatory requirements: Service providers must comply with a variety of regulations, including those related to privacy, data protection, and network security. Failing to comply with these regulations can have serious consequences, including fines and reputational damage.

Network Virtualisation

Network virtualization is the process of creating a virtual version of a network, including the hardware, network topology, and protocols, using software. This allows multiple virtual networks to be created and run on the same physical infrastructure, which can be used to isolate different network environments, test new network configurations, or provide network resources as a service.


  • NFV is SW-defined network functions with separation of HW and SW. Once network elements are SW-based, network HW can be managed as a pool of resources
  • SDN is Interconnecting Virtual Network Functions with separation of control and data plane. Orchestration together with SW domain

There are several ways to implement network virtualization, including using software-defined networking (SDN) technologies, which allow the network to be controlled and managed using software, and using virtualization technologies such as virtual LANs (VLANs) or virtual private networks (VPNs) to create isolated network segments within a larger network. In a virtualized network the setup network functionalities are SW-based over COTS HW. Multiple roles can be made over same HW.

Network Virtualisation for Telcos

Network Virtualisation is an opportunity to build mouldable networks and redefine the architecture to make the infrastructure uniform.Virtual network services lowered CAPEX. Lessening dependencies on proprietary hardware and dedicated appliances.

  • (+) Improves management of risk in a changing and ambiguous environment
  • (+) capacity alteration Network flexibility
  • (+) scalability
  • (+) Service provisioning speed
  • (+) holistic management:
  • (+) granular security

There are several approaches to network virtualization that service providers can use, including:

  1. Network Function Virtualization (NFV): NFV involves virtualizing network functions, such as routers, firewalls, and load balancers, and running them on standard servers or other off-the-shelf hardware using virtualization platforms like VMware or OpenStack.
  2. Software-Defined Networking (SDN): SDN involves separating the control plane (which determines how data is routed through the network) from the data plane (which carries the actual data). This allows the control plane to be more flexible and responsive to changes in the network.
  3. Virtual Private Network (VPN): A VPN allows service providers to create virtual private networks (VPNs) over the public Internet, allowing them to securely connect users to the resources they need.

Service providers can use network virtualization to reduce costs, increase flexibility, and improve the scalability and reliability of their networks. Managed Service Providers (MSPs) can use a single viewpoint and toolset to manage virtual networking, computing and storage resources. However, implementing network virtualization can also be complex and require significant investments in hardware, software, and training.

Software Defined Network (SDN)

A software-defined network (SDN) is a networking architecture that uses software provisioning interfaces to control and manage the flow of traffic in a network. In an SDN, the control plane, which determines how data is routed through the network, is separated from the data plane, which carries the actual data traffic.

The main benefit of an SDN is that it allows the control of the network to be abstracted from the underlying hardware. This makes it possible to use software to dynamically configure the network, rather than relying on fixed configurations that are set using hardware switches and routers. SDN allows network administrators to easily and quickly change the way that data is routed through the network, which can be useful in a variety of scenarios. For example, an SDN can be used to optimize the flow of traffic in a data center, or to quickly reconfigure a network in response to changing traffic patterns or security threats such as DDoS.

SDN planes

Image Credits : Shqip: Arkitektura SDN, 27 June 2021, From Wikimedia Commons, the free media repository Source https://www.researchgate.net/publication/332970813_Security_for_5G_and_Beyond
  1. Control plane: The control plane is the part of the SDN that determines how data is routed through the network. It consists of a central controller, which is a software application that runs on a server, and a series of software agents that run on the network devices (such as switches and routers). The controller communicates with the agents using a protocol such as OpenFlow, which allows it to control the flow of traffic in the network.
  2. Data plane: The data plane is the part of the SDN that carries the actual data traffic. It consists of the network devices (such as switches and routers) that forward data packets through the network.
  3. Management plane: The management plane is the part of the SDN that is responsible for configuring and managing the network. It consists of a set of tools and applications that allow network administrators to monitor and control the network.
  4. Application plane: The application plane is the part of the SDN that consists of the applications that run on the network. These applications may include things like web servers, email servers, and database servers.

Software-defined network functions separates hardware and software. Once network elements are Software-based, network harware can be managed as a pool of resources. Separating route/switching intelligence from packet forwarding reduces hardware prices as routers and switches must compete on price-performance features.

SDN interconnects Virtual Network Function and orchestrated with SW domain. Enables separation of control and data plane.Setting up networks in an SDN can be as easy as creating VM instances, and the way SDNs can be set up is a far better complement to VMs than plain old physical networks. SDNs enable “network experimentation without impact”. Overcome SNMP limitations and experiment with new network configurations without being hamstrung by their consequences.

  • Infrastructure Savings
  • Reducing margin of Error : By eliminating manual intervention, SDNs enable resellers to reduce configuration and deployment errors that can impact the network.
  • Operational Savings: SDNs lower operating expenses. Network services can be packaged for application owners, freeing up the networking team.
  • Flexibility: SDNs create flexibility in how the network can be used and operated. Resellers can write their own network services using standard development tools.
  • Better Management gives Better visibility into the network, computing, and storage

SDN protocols : OpenFlow, NETCONF. Its applications could be

  • Bandwidth on Demand or test networks.
  • Platform Virtualization for emulation/simulation of Network Nodes (BSS/MSS)
  • SDN based Application Layer Traffic Optimization
  • Intrusion Detection System that can interact with controller in terms of capturing packets, analyzing them for anomaly and sharing results real-time / near real-time with controller.
  • Software-Defined Branch and SD-WAN
  • IP Multi-Media Subsystem (IMS)
  • Session Border Control (SBC)
  • Video Servers
  • Voice Servers
  • Universal Customer Premises Equipment (uCPE)
  • Content Delivery Networks (CDN)
  • Network Monitoring
  • Network Slicing
  • Service Delivery
  • Network security functions such as firewalls, IDS, IPS, vRR, NAT 

Network functions virtualization (NFV)

NFV provides the basic networking functions and SDN assumes higher-level management responsibility to orchestrate overall network operations.


Network Function Virtualization (NFV) is a technology that allows network functions, such as routers, firewalls, and load balancers, to be implemented in software rather than hardware. This allows these functions to be run on standard servers or other off-the-shelf hardware, rather than dedicated appliances.

In an NFV system, network functions are implemented as software called Virtual Network Functions (VNFs). These VNFs are run on virtualization platforms, such as VMware or OpenStack, which allow multiple VNFs to be run on the same physical hardware. To use NFV, a service provider will first define the network functions that it needs in its network, and then create VNFs for each of these functions. These VNFs can then be deployed on virtualization platforms and used to build the service provider’s network.

One of the main benefits of NFV is that it allows service providers to be more flexible and agile in building and managing their networks. Because VNFs can be easily added, removed, or scaled up or down as needed, service providers can quickly respond to changes in demand or new business opportunities. NFV decouples network functions from proprietary hardware appliances (routers, firewalls, VPN terminators, SD-WAN, etc.) and delivers equivalent network functionality without the need for specialized hardware. And this way it helps service providers reduce costs, as they can use standard hardware rather than specialized appliances ( vendor lockins) to implement their network functions.

IMS Virtual Network Functions (VNFs)

IMS. Image Credits Unknown

A traditional appliance based IMS setup is dedicated to every single service, limited hardware/people/process leveraging.Some drawbacks of this approach is

  • Not suited for Heterogeneous Networks that are evolving – inflexible
  • Higher footprint cost per customer/service – high OPEX
  • New services would need a new dedicated network thus high maintenance cost for solios of operation

Virtualisation will help to redesign the network architecture. In an IMS (IP Multimedia Subsystem) system, VNFs might be used to implement a variety of functions, including:

  1. Call Session Control Function (CSCF): The CSCF is responsible for managing call sessions and routing signaling messages between the IMS network and other networks.
  2. Media Gateway Control Function (MGCF): The MGCF is responsible for translating between different media formats, such as voice and video, and for controlling media gateways that connect the IMS network to other networks.
  3. Home Subscriber Server (HSS): The HSS is a database that stores information about IMS subscribers, including their profiles and service subscriptions.
  4. Serving Gateway (S-GW): The S-GW is responsible for routing data packets between the IMS network and the user’s device.
  5. Policy and Charging Rules Function (PCRF): The PCRF is responsible for enforcing policy decisions and charging rules for IMS services.
  6. IP-SM-GW (SMS Gateway): The IP-SM-GW is responsible for routing SMS messages between the IMS network and other networks.
  7. Presence Server: The presence server is responsible for managing presence information (such as availability status) for IMS subscribers.
Multi-tenant subscriber and service environment. Keeping traffic local but with common services & management

Local Data Centre can rapidly build Network Intelligence rationalisation using Real Time Network Analytics on virtul STB, EPC, NAT, BRAS, PE, DHCP , PCRF etc. Core can be simplified and centralised with common and standard interfaces within core network and services to interact with OSS and BSS (standardized billing and fulfillment process).


OpenStack is an open-source virtualization platform. It enables service providers to deploy virtual network functions (VNFs) using commercial off-the-shelf (COTS) server hardware.  OpenStack is widely used in the telecommunications industry, as it allows service providers to build and manage large-scale cloud computing environments that can be used to deliver a wide range of services, including virtualized infrastructure, NFV, and containerized applications. Applying Openstack to virtualize networks :

  1. Infrastructure as a Service (IaaS): OpenStack can be used to create and manage virtualized infrastructure, including compute, storage, and networking resources. This allows service providers to offer users the ability to spin up and manage virtual machines, storage volumes, and other resources on demand.
  2. Network Function Virtualization (NFV): OpenStack can be used as a platform for virtualizing network functions, such as routers, firewalls, and load balancers, and running them on standard servers or other off-the-shelf hardware.
  3. Container orchestration: OpenStack can be used to manage containerized applications, allowing service providers to deploy and scale applications more quickly and efficiently.
Image Credits OpenStack Wiki
Example of  OpenStack implementation. Image source: OpenStack Wiki


More to read :

EEP (formely HEP) Extensible Encapsulation Protocol with HOMER

EEP duplicates and IP datagram and encapsulates and sends for remote relatime monitoring for SIP specific alerts and notifications . HEP is popular among many SIP servers including Freeswitch , Opensips, Kamailio, RTP engine as an external module .

  • intended for passive duplicated for remote collection
  • can be used for audit storage and analysis
  • does not alter the orignal datagram or headers

HOMER is Packet and Event capture system popular fpr VOIP/RTC Monitoring based on HEP/EEP (Extensible Encapsulation protocol)

SIP Server Integration

Homer and homer encapsulation protocl (HEP) integration with sip server brings the capabilities to SIP/SDP payload retention with precise timestamping better monitor and detect anomilies in call tarffic and events correlation of session ,logs , reports also the power to bring charts and statictics for SIP and RTP/RTCP packets etc. We read about sipcapture and sip trace modules in project sipcapture_siptrace_hep.

Both Kamailio and Opensips HEP Integration are structurally simmilar. In kamailio SIPCAPTURE [2] module enables support for –

● Monitoring/mirroring port
● HEP encapsulation protocol mode (HEP v1, v2, v3)

Figure Opensips Capturing ( credits http://www.opensips.org)

Figure showing Opensips integartion with external capturing agent via proxy agent ( which can be HOMER)

To achieve that, load and configure the SipCapture module in the routing script.

Snippets fro Kamailio Homer docker installation as a collector

git clone https://github.com/sipcapture/homer-docker.git
cd homer-docker
docker-compose build
docker-compose up

Outsnippets from screen while the installation takes place

Creating network "homer-docker_default" with the default driver
Creating volume "homer-docker_homer-data-semaphore" with default driver
Creating volume "homer-docker_homer-data-mysql" with default driver
Creating volume "homer-docker_homer-data-dashboard" with default driver
Pulling mysql (mysql:5.6)...
5.6: Pulling from library/mysql
Creating mysql ... done
Creating homer-webapp   ... done
Creating homer-cron      ... done
Creating homer-kamailio  ... done
Creating bootstrap-mysql ... done
Attaching to mysql, homer-webapp, bootstrap-mysql, homer-cron, homer-kamailio
homer-webapp | Homer web app, waiting for MySQL
homer-cron   | Homer cron container, waiting for MySQL
homer-kamailio | Kamailio, waiting for MySQL
bootstrap-mysql | Mysql is now running.
bootstrap-mysql | Beginning initial data load....
bootstrap-mysql | Creating Databases...
bootstrap-mysql | Creating Tables...
omer-kamailio | Kamailio container detected MySQL is running & bootstrapped
homer-kamailio |  0(22) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(22) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | config file ok, exiting...
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(23) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: sipcapture [sipcapture.c:480]: parse_table_names(): INFO: table name:sip_capture
homer-webapp | Homer web app container detected MySQL is running & bootstrapped
homer-webapp | Module php5 already enabled

Capture tools

Dialoge module

storing dialogs in mysql DB , requires initialising mysql

#!define WITH_MYSQL
#!ifdef WITH_MYSQL
loadmodule "db_mysql.so"
#!ifdef WITH_MYSQL
# - database URL - used to connect to database server by modules such
#       as: auth_db, acc, usrloc, a.s.o.
#!ifndef DBURL
#!define DBURL "mysql://root:kamailio@localhost/kamailio"
loadmodule "dialog.so"
# ----- dialog params ------
modparam("dialog", "dlg_flag", 10)
modparam("dialog", "track_cseq_updates", 0)
modparam("dialog", "dlg_match_mode", 2)
modparam("dialog", "timeout_avp", "$avp(i:10)")
modparam("dialog", "enable_stats", 1)
modparam("dialog", "db_url", DBURL)
modparam("dialog", "db_mode", 1)
modparam("dialog", "db_update_period", 120)
modparam("dialog", "table_name", "dialog")

seting db_mode – synchronisation of dialog information from memory to an underlying database has following options
0 – NO_DB – the memory content is not flushed into DB;
1 – REALTIME – any dialog information changes will be reflected into the database immediately.
2 – DELAYED – the dialog information changes will be flushed into DB periodically, based on a timer routine.
3 – SHUTDOWN – the dialog information will be flushed into DB only at shutdown – no runtime updates.

note :

  • use the same hash_size while using diff kamailio to restore dialogs

database table for dialogue

  1. install mysql
  2. define root ( with db create permissions ) and user ( with database read wrote ) permission in kamctlrc
vi /usr/local/etc/kamailio/kamctlrc
  • Dialogue table schema *
name type size default null key extra attributes description
id unsigned int 10 no primary autoincrement unique ID
hash_entry unsigned int 10 no Number of the hash entry in the dialog hash table
hash_id unsigned int 10 no The ID on the hash entry
callid string 255 no Call-ID of the dialog
from_uri string 128 no URI of the FROM header (as per INVITE)
from_tag string 64 no identify a dialog, which is the combination of the Call-ID along with two tags, one from participant in the dialog.
to_uri string 128 no URI of the TO header (as per INVITE)
to_tag string 64 no identify a dialog, which is the combination of the Call-ID along with two tags, one from participant in the dialog.
caller_cseq string 20 no Last Cseq number on the caller side.
callee_cseq string 20 no Last Cseq number on the caller side.
caller_route_set string 512 yes Route set on the caller side.
callee_route_set string 512 yes Route set on on the caller side.
caller_contact string 128 no Caller's contact uri.
callee_contact string 128 no Callee's contact uri.
caller_sock string 64 no Local socket used to communicate with caller
callee_sock string 64 no Local socket used to communicate with callee
state unsigned int 10 no The state of the dialog.
start_time unsigned int 10 no The timestamp (unix time) when the dialog was confirmed.
timeout unsigned int 10 0 no The timestamp (unix time) when the dialog will expire.
sflags unsigned int 10 0 no The flags to set for dialog and accesible from config file.
iflags unsigned int 10 0 no The internal flags for dialog.
toroute_name string 32 yes The name of route to be executed at dialog timeout.
req_uri string 128 no The URI of initial request in dialog
xdata string 512 yes Extra data associated to the dialog (e.g., serialized profiles).

Siptrace module

SIPtrace module offer a possibility to store incoming and outgoing SIP messages in a database and/or duplicate to the capturing server (using HEP, the Homer encapsulation protocol, or plain SIP mode).

loadmodule "siptrace.so"
modparam("siptrace", "duplicate_uri", "sip:")
modparam("siptrace", "hep_mode_on", 1)
modparam("siptrace", "trace_to_database", 0)
modparam("siptrace", "trace_flag", 22)
modparam("siptrace", "trace_on", 1)

integrating iut with request route to start duplicating the sip messages


  • trace_mode * 1 – uses core events triggered when receiving or sending SIP traffic to mirror traffic to a SIP capture server using HEP 0 – no automatic mirroring of SIP traffic via HEP.


address in form of a SIP URI where to send a duplicate of traced message. It uses UDP all the time.

modparam("siptrace", "duplicate_uri", "sip:")

to check the duplicate messages arriving

ngrep -W byline -d any port 9060 -q

RPC commands

Can ruen sip trace on or off

kamcmd> siptrace.status on   

and to check

kamcmd> siptrace.status check

Store sip_trace in database

modparam("siptrace", "trace_to_database", 1)
modparam("siptrace", "db_url", DBURL)
modparam("siptrace", "table", "sip_trace")

where the sip_trace tabel description is

| Field       | Type             | Null | Key | Default             | Extra          |
| id          | int(10) unsigned | NO   | PRI | NULL                | auto_increment |
| time_stamp  | datetime         | NO   | MUL | 2000-01-01 00:00:01 |                |
| time_us     | int(10) unsigned | NO   |     | 0                   |                |
| callid      | varchar(255)     | NO   | MUL |                     |                |
| traced_user | varchar(128)     | NO   | MUL |                     |                |
| msg         | mediumtext       | NO   |     | NULL                |                |
| method      | varchar(50)      | NO   |     |                     |                |
| status      | varchar(128)     | NO   |     |                     |                |
| fromip      | varchar(50)      | NO   | MUL |                     |                |
| toip        | varchar(50)      | NO   |     |                     |                |
| fromtag     | varchar(64)      | NO   |     |                     |                |
| totag       | varchar(64)      | NO   |     |                     |                |
| direction   | varchar(4)       | NO   |     |                     |                |

sample databse storage for sip traces

select * from sip_trace;

| id | time_stamp          | time_us | callid  | traced_user | msg         | method | status | fromip                   | toip                     | fromtag  | totag    | direction |
|  1 | 2019-07-18 09:00:18 |  417484 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | INVITE sip:altanai@sip_addr;transport=udp SIP/2.0
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport
Max-Forwards: 70
Contact: <sip:derek@call_addr:7086;transport=udp>
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Content-Type: application/sdp
Supported: replaces
User-Agent: Bria 3 release 3.5.5 stamp 71243
Content-Length: 214

o=- 1563440415743829 1 IN IP4 local_addr
s=Bria 3 release 3.5.5 stamp 71243
c=IN IP4 local_addr
t=0 0
m=audio 59814 RTP/AVP 9 8 0 101
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv                                                                                                                                                                                      | INVITE |        | udp:caller_addr:27982 | udp:sip_pvt_addr:5060   | de523549 |          | in        |

|  2 | 2019-07-18 09:00:18 |  421675 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | SIP/2.0 100 trying -- your call is important to us
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport=27982;received=caller_addr
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Server: kamailio (5.2.3 (x86_64/linux))
Content-Length: 0                                                                                                                                                                                                                                                                                                                                                                                                                                                           | ACK    |        | udp:caller_addr:27982 | udp:local_addr:5060   | de523549 | b2d8ad3f | in       |


Multi-Protocol Go HEP Capture Agent made   https://github.com/sipcapture/heplify

wget https://dl.google.com/go/go1.11.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.11.2.linux-amd64.tar.gz

move package to /usr/local/go

mv go 

Either add go bin to ~/.profile

export PATH=$PATH:/usr/local/go/bin

and apply

source ~/.profile

or set GO ROOT , and GOPATH

export GOROOT=/usr/local/go
export GOPATH=$HOME/heplify
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

installation of dependencies

go get

clone heplify repo and make



New OSS Capture-Agent framework with capture suitable for SIP, XMPP and more. With internal method filtering , encryption and authetication this does look very promising howevr since I have perosnally not tried it yet , I will leave this space TBD for future



Other include Sipgrep , HEPipe and nProbe


Multi-Protocol HEP Server & Switch in NodeJS. stand-alone HEP Capture Server designed for HOMER7 capable of emitting indexed datasets and tagged timeseries to multiple backends


node hepop.js -c /app/myconfig.js

PCAP monitoring -> Homer Server -> Notification and Fraud Prevention

A realtime monitoring and alerting setup fom homer can best safeguard on VoIP specific attacks and suspecious activity by early warning . Some list of attacks such as DDOS , SIP SQL injections , parser , remote manipulation hijacking as cell as resource enumeration are common ifor a cloud telephony provider.

Adiitionally homer provide session quality using varables that include [1]

SD = Session Defects

ISA = Ineffective Session Attempts

AHR = Average HOP Requests

ASR = Answer Seizure Ratio
[(‘200’ / (INVITES – AUTH – SUM(3XX))) * 100]

NER = Network Efficiency Ratio
[(‘200’ + (‘486′,’487′,’603’) / (INVITES -AUTH-(SUM(30x)) * 100]

HOMER Web Interface or Custom Dashboard

Some more visualization for inter team communication such as NOC team can include

Homer Integration with influx DB

time series Reltiem DB install

wget https://dl.influxdata.com/influxdb/releases/influxdb_1.7.7_amd64.deb
sudo dpkg -i influxdb_1.7.7_amd64.deb


 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

2019-07-19T07:03:04.603494Z	info	InfluxDB starting	{"log_id": "0GjGVvbW000", "version": "1.7.7", "branch": "1.7", "commit": "f8fdf652f348fc9980997fe1c972e2b79ddd13b0"}
2019-07-19T07:03:04.603756Z	info	Go runtime	{"log_id": "0GjGVvbW000", "version": "go1.11", "maxprocs": 1}
2019-07-19T07:03:04.707567Z	info	Using data dir	{"log_id": "0GjGVvbW000", "service": "store", "path": "/var/lib/influxdb/data"}

For Kamailio integration follow github instructions on https://github.com/altanai/kamailioexamples

References :

[1] https://www.kamailio.org/events/2013-KamailioWorld/13-Alexandr.Dubovikov-Homer-SIP-Capture.pdf

[2] HEP/EEP – https://github.com/sipcapture/hep

[3] kamailio sipdump module – https://www.kamailio.org/docs/modules/devel/modules/sipdump.html

[4] https://github.com/sipcapture/HEPop

[5] HOMER Big Data – https://github.com/sipcapture/homer/wiki/Homer-Bigdata

Energy Efficient VoIP systems

Data Centres are the concentrated processing units for the amazing Internet that is driving the technological innovation of our generation and has become the backbone of our global economy. DataCentres not only process , store and carry textual data rather a vast amount of computing is for multimedia content which could range from social media to, video streaming or VoIP calls. In this article let us analyze the energy effiiciency , carbon footprint and scope of improvements for a VoIP related data centre which hosts SIP and related RTC technology signalling and media servers and process CDRs and/or media files for playback or recordings.

Just like a regular IT datacentre , storage, computing power and network capacity define the usage of the server.Also unobstructed electricty of of paramount importance as any blackout could drop ongoing calls and lead to loss of revenue for the service provider not to forget the loss caused to parties engaged in call.

Increasing Power consumption by telecom Sector over the years

Global PPA(power purchase agreements) volumes by sector, 2009-2019, IEA, Global PPA volumes by sector, 2009-2019, IEA, Paris https://www.iea.org/data-and-statistics/charts/global-ppa-volumes-by-sector-2009-2019
source : CDP The 3% solution 2013 [19]

Typical VoIP Setup : Whether a cloud Infrstrcture provider of a hosted data centre , an aproximate number of 7 servers is required even for SME ( small to medium enterprises ) communication system and VoIP systems

  • 2 signalling servers primary and standby for HA ,
  • 2 media server for MCU of media bridges or IVR playback etc ,
  • 1 for CDR , logs or call analytics , stats and other supplementary operation
  • 1 for dev or engineering team .
  • 1 edge server could be API server or a gateway or laodbalancer.
Sample voIP system

VoIP solutions are more energy expensive, unless aggressive power saving schemes are in place

Comparison of energy efficiency in PSTN and VoIP systems [14]

While PSTN and other hybrid scenarios relied on audio only communication the embedded systems involved took great pain to make then energy efficient which is not really the case with all digital and software based VoIP.

Power Consumption

Mobile phone :  Typical smartphone with 4,000mAh ( 4 Ah) battery that gets 1 full cycle of usage a day. Daily consumption =4Ah*3.7V=14.8 Wh

Laptop : With  14–15″ screen, a laptop can draw 60 watts power in active use depending on model. Runing 8 hours a day can be 60 * 8 = 480 Wh ( 0.480 kWh) energy consumed in a day.

Desktop PC : Runing at 50-60 Hz frequency , can upto draw 200 W power in active use. For 8 hours energy usage 200 * 8 = 1600 Wh ( 1.6 kWh ) energy a day.

Server : Even though servers are virtual to the request maker , they caters to the request on the other end of the internet.

ServerPurposeServer CPU consumptionClientsClient CPU consumption
ApplicationHosts an application, which can be run through a web browser or customized client software.mediumAny network device with access.low
ComputingMakes available CPU and memory to the client. This type of server might be a supercomputer or mainframe.highAny networked computer that requires more CPU power and RAM to complete an activity.medium
DatabaseMaintains and provides access to any database.lowAny form of software that requires access to structured data.low
FileMakes available shared files and folders across a network.mediumAny client that needs access to shared resources.low
GameProvisions a multiplayer game environment.highPersonal computers, tablets, smartphones, or game consoles.high
MailHosts your email and makes it available across the network.mediumUser of email applications.low
MediaEnables media streaming of digital video or audio over a network.highWeb and mobile applications.high
PrintShares printers over a network.lowAny device that needs to print.low
WebHosts webpages either on the internet or on private internal networks.mediumAny device with a browser.medium
CPU consumption of various server types and their clients

Typically runing on 850 Wh ( 0.850 kwh ) of energy in an hour and since server are usually up 24*7 that totals to

0.850 * 24 = 20.4 kWh a day [2].

VoIP System ( 7 VM’s) : For a setup of 7 VM’s ( could on a the same PM), total energy consumed in a day

20.4 * 7 = 142.8 kWh.

Data centre: The data centre building consists of the infrastructure to support the servers, disks and networking equipment it contains. However, for simplicity, I will only use the consumption of servers and ignore the cooling units, networking, backup batteries charging, generators, lightning, fire suppression, maintenance etc.

High tier DC can have 100 Megawatts of capacity having each rack was using 25 kW of power in a 52U Rack. 100,000 kW / 25 kW = 4,000 racks * 52(U) = 208000 1U servers. This number scales down depending on how much energy each server uses and idle servers.

Total energy 100,000 kW * 24 hours = 2400000 kWh

Carbon Footprint

Carbon footprint in the context of this article refers to the amount of greenhouse gas ( consisting majorly of Co2) caused by electricity consumption. The unit is carbon emission equivalent of the total amount of electricity consumed kg CO2 per kWh.

In doing this calculation I have assumed 0.233 kg CO2 per kWh which could be less or more depending on the generation profile of the electricity provider as well as the heat produced by the machine.

Laptop: Aside from the production which could be 61.4 kg (135.5 lbs) of Co2, a 60W laptop will produce 0.112 kg co2 eq per day.

Desktop PC: Aside from production cost and heating, the GWG and co2 eq emission from running a desktop for a day ( 8 hours) produces 1.6* 0.233 = 0.3728 kg CO2 per kWh

Server : 20.4 * 0.233 = 4.7532 kg CO2 per kWh per day .

VoIP System ( 7 VM’s): Again ignoring the GWG emission of associated components, 142.8 * 0.233 = 33.2724 kg CO2 per kWh per day. It is to be noted that DC’s ( datacentres) use the term PUE ( Power Usage Effectiveness) to showcase their energy efficiency and energy efficiency certification uses the same in ratings.

Data centre: electrical carbon footprint( approximate calculation not counting the cooling, infra maintenance, lightning and possibly idle servers in datacentre) is 2400000 * 0.233 = 559200 kg CO2 per kWh per day

It is to be noted that a common figure should not be extrapolated like this to derive carbon emission. The emission depends on the fuel mix of the electricity generation as well as the life cycle assessment (LCA) of carbon equivalent emission. Countries with heavy reliance on renewables have lower co2 footprint per kWh ~ 0.013 kg co2 per kWh Sweden while others may have higher such as 0.819 kg CO2 per kWh Estonia [1].

Flatten the Curve from Tech and Internet usage

Rack servers tend to be the main perpetrators of wasting energy and represent the largest portion of the IT
energy load in a typical data center.

A decade ago, small enterprise IT facilities were quick to create data centres for hosting applications from hospitals, banks, insurance companies. While some of these is likely to have been upgrade to shared server instances runing on IaaS providers, most of them are still serving traffic or stays there for the lack of effort to upgrade.

With the advancement in p2p technlogies such as dApps , bitcoion network , p2p webrtc streaming , more edge computed ML continue to create disruptions in existing trend , most likely to result in in many fold increase in consumption.

According to the Cambridge Center for Alternative Finance (CCAF), Bitcoin currently consumes around 110 Terawatt Hours per year — 0.55% of global electricity production

Harward Business Review [12]

“the emissions generated by watching 30 minutes of Netflix (1.6 kg of CO2) is the same as driving almost four miles.” 

EnergyInnovation [13]

Cloud Computing and Energy efficiency

Cloud computing ( SaaS, PaaS , IaaS and also CPaaS) minimize power consumption and consequently IT costs via virtualization, clustering and dynamic configuration.

With cloud infrastrcture vendors such as Amazon , Google , microsoft .. and their adoption of energy efficiency computing and credible transparency has alleviated some of the stress that could have been made if onsite self – hosted data centres were used as often in mainstream as a decade ago.

Even as cloud providers gives on -demand access to shared resources in large scale distributed computing , the ease of getting on board has inturn created a surge in cloud hosted online applications consequently high power consumption, more operation costs and higher CO2 emissions.

Components of energy Consumption in Data Centre

As shown CPU, Memory, and Storage incur 45% of the costs and consume 26% of the total energy , however power distribution and colling cost 25% but consumer >50% of total energy.

Energy forcast for Data Centres

As reported by nature [3] the widely cited forcasts suggested thte total electrcity demand of ICT ( Informatioin and Communication technology ) will accelerate and while consumer devices such as smart TV , laptops and mobile are becoming energy effcient , the data centres and network devices will demand bigger portions. Reported in 2018 , 200 Twh( terawatt hours) of energy was being consumed by data centers . Although there are no figures for the telecom or specifically IP cloud telephony , the assumption that enormous multimedia data flows in every session is enouogh to assume the figure must be huge.

Energy eficiency in data centres have also been the subject of many papers and studies. Many of the tech advancements and measures have so far been able to keep the growth in energy requirnments by tech sector to a linear/ flat one.

past and projected growth rate of total US data center energy use from 2000 until 2020. It also illustrates how much faster data center energy use would grow if the industry, hypothetically, did not make any further efficiency improvements after 2010. (Source: US Department of Energy, Lawrence Berkeley National Laboratory)

Some noteworthy innovations made in Data centre for energy efficiency include –

  1. Star efficiency requirnments
  • Average server utilization
  • Server power scaling at low utilization
  • Average power draw of hard disk drives
  • Average power draw of network ports
  • Average infrastructure efficiency (i.e., PUE)

PUE = Total Facility power / IT equipment power

Standard 2.0, Good 1.4 , Better 1.1

Low PUE indicates greater efficicny since more power would then be used by It gear . Idealistically 1 should be the perfect score where all power was used only by the IT gears.

2. Optimizing the cooling system which takes a lot of focus is also not touched upon here but can be understood in great detail from very many sources including one here on how google uses AI for cooling its Datacentres [6]

3. Throttle-down drive ,a device that reduces energy consumption on idle processors, so that when a server is running at its typical 20% utilization it is not drawing full power

Energy efficiency is vital to not only productivity and performance but also to carbon neutral tech and economy. There is ample scope to designing energy efficient applications and platfroms. Some approaches are described below:

Energy Efficiency in VoIP Architecture and design

Low Energy consumption not only lowers operating cost but also helps the enviornment by reducing carbon emission.

1.Server Virtualization

By consolidating multiple independant servers to a single underlying physical server helps retain the logical sepration while also maintaining the energy costs and maximizinng utilization . VM’s( Virtual machines) are instances of virtaulized portions on the same server and can be independetly accesed using its own IP and network settings.

To reduce electricity usage in our labs and data centers, we use smart power distribution units to monitor
our lab equipment. We increase server utilization by using virtual machines. Our Cisco Customer
Experience labs use a check-in, check-out system of automation pods to allow lab employees to set up
configurations virtually and then release equipment when they are finished with it.

Cisco 2020 Environment Technical Review [20]

Models to place VMs on PM ( physical machine ) have been proposed by Dong et al[8] , Huang[9]  ,Tian et al [10]

2.Decommissioning old / outdated servers

While this is the most obvious way to increase efficiency , it is also the toughest since legacy applications or a small portion of it may be running on a server that service providers are not keen on updating or updates do not exist and it is past end of life yet somehow still in use. It is important to identify such components. Check if maybe an old glassfish or bea weblogic SIP servlet server needs updating and/or migration !

3.Plan HA ( high availability ) efficiently

Redundant servers take only if at all any , partial loads so they can be activated in full swing when failover happens in other server. With quick load up times and forward looking monitoring , the analyzers can monitor logs for upcoming failure or predictable downtime and infra script can bring up pre designed containers in seconds if not minutes. It isn’t wise to create more than 1 standby server which does no essential work but consumes as much power.

4.Consolidate individual applications on a Server

Map the maximum precitable load and deduce the percentage comsuption with teh same . In view of these figures it is best to consolidate applications servers to be run on a single server . A distributed microservice based architecture can also support consolidation by runing each major application in its own dockerized container. Consolidation ensures that

  • All data can be stored and accessed centrally, which reduces the likelihood of data duplication.
  • while a server is drawing full power , it is also showing relataible utilization.
  • Single point to prevent intrusion , provide security and fix vulnerabilities against malware like ( ransomware , viruses , spyware , trojans)

5.Reduce redundancy

While it is a common practise to store multiple copies of data such as CDR ( call detail records ) and archiev historical logs for later auditing , it is not the most energy efficient way since it ends up wasting stoarge space. It is infact a better approach to skim only the crtical parts and diacard the rest and definetely implement background tasks to compress the older and less referenced logs.

6.Power management

Powering down idle server or putting unused server to sleep is an effective way to reduce operating power but is often ignored by the IT department in view of risking slower performance and failure in call continuity in case a server does go down. However power management leads to potential energy savings and should be weighted accordingly.

7.Common Storage such as Network Attached Storage

Power consumption is roughly linear to the number of storage modules used. Storage redundancy needs to be
right-sized to avoid rapid consumption of avaible storage space , CPU cycles to refer and index them, its associated power consumption [7].

The process of maximizing storage capacity utilization by drawing from a common pool of shared storage on need baisis also allows for flexixbility.

It is sensible to take the data offline thereby reducing clutter on production system and make the existing data quickly retrievable.

8.Sharing other IT resources

Central Processing Units (CPU), disk drives, and memory optimizes electrical usage. Short term load shifting combined with throttling resources up and down as demand dictates improves long term hardware energy efficiency. [7]

Hardware based approaches such as energy star rating, air conditoning , placement of server racks , air flow , cabling etc have not been touched upon in this article they can be read from energystar report here [5] .

9. DMZ / Perimeter network

The perimeter network (also known as DMZ, demilitarized zone, and screened subnet) is a zone where resources and services accessible from outside the organization are available. Often used as barrier between internal secure green zone within company and outside partners / suppliers such as external organization gateways.

  • Load balancers
  • API gateways
  • SBC ( Session Border controllers)
  • Media Gateways

Ways to cut down on CPU consumption in DMZ machines

  1. Scrutinize incoming traffic only , trust outgoing traffic .

2. Use hardware / network firewalls to monitor and block instead of software defined ones . Hardware firewall can be a standalone physical device or form part of another device on your network. Physical devices like routers, for example, already have a built-in firewall. 

Other types of firewalls

  • Application-layer firewalls can be a physical appliance, or software-based, like a plug-in or a filter. These types of firewalls target your applications. For example, they could affect how requests for HTTP connections are inspected across each of your applications.
  • Packet filtering firewalls scrutinize each data packet as it travels through your network. Based on rules you configure, they decide whether to block the specific packet or not. For example firewalls can block SSH/RDP for remote management.
  • Circuit-level firewalls check whether TCP and UDP connections across your network are valid before data is exchanged. For example, this type of firewall might first check whether the source and destination addresses, the user, the time, and date meet certain defined rules.
  • Proxy server firewalls secure the traffic into and out of a network by monitor, filter, and cache data requests to and from the network.

Energy Efficiency in VoIP Applications and algorithms

In theory, energy efficient algorithms would take less processing power , run fewer CPU cycles and consume less memory. For the experiments with WebRTC and SIP VoIP systems CPU performance can be reliable factor to consider for carbon emissions . Here is list of approaches to include energy as of the parameters in programing for RTC applications.

  1. Take advanatge of Multi Core applications

Multi-core processor chips allow simultaneous processing of multiple tasks, which leads to higher efficiency. Same power source and shared cooling leads to better efficiency . It is the same logic which applied to consolidating one power supply for a rach isntead of individual power supply to each servers on rack.

2. Reduce Buffering

Input/Output buffer pile up comuted packets or blocks which will come inot use in near future but may be discarded all together in event of skip or shutdown. For example in case of video on Demand ( VoD) , a buffered video of 1 hour is of not much use if viewer decides to cancel the video session after 10 minutes .

3. Optimize memeory access algorithms

4. Network energy Management to vary as per demand

The newer generations of network equipment pack more throughput per unit of power. There are active energy management measures that can also be applied to reduce energy usage as network demand varies. In a telecoomunication system , almost always a tradeof between power consumption and network performance is made.

  1. Quick switching of speed of the network to match the amount of data that is currently transmitted. A demand following streaming session will maingtain the QoS , avoid imbalance while also reducing power consumption.

2. Avoid sudden burst and peaks and/or align them with energy availaibility .


  • computational performance (i.e., computations/second per server),
  • electrical efficiency of computations (i.e., computations per kWh),
  • storage capacity (i.e., TB per drive), and
  • port speeds (i.e., Gb per port)

5. Task Scheduling algorithms

Some recent researched frameworks and models take Co2 emission into prespective , while allocating resources according to queuing model. The most efficient ones not only bring down the carbon footprint but also the high operating cost [11].

Scheduling and monitoring techniques have been applied to achieve a cost effective and power-aware cloud environment by reducing the resource exploitation

6. Centralised operation – RTP topology ( Mesh , MCU and SFU)

Instead of operating many servers at low CPU utilization, at edge of client’s end, combines the processing
power onto fewer servers that operate at higher utilization.

Modern machine learning programs are computationally intensive, and their integration in VoIP systems for tagging , sentiment analysis , voice quality analysis is increasingly adding additional strain already heavy processing of media server in transcoding and multiplexing .

Media Server using SFU ( Selective Forwarding unit) to transmit mediastrem

As an example a SFU client sends one upstream but receives 4 downstreams which reduces the load on server but increases on clients .

7. Distributing workload based on server performance

Aggregating tasks and runing them as Serverless , asynchronous jobs instead of standalone processes is very efficient way to cut down idle runing wastage. Additioally catagorizing server workloads based on server performance can also reduce power consumption by using idle servers efficiently. Thermal aware workload distribution also helps reducing power consumption and consequently electricity consumption in cooling .

8 . Reduce reauthetication and challemge response mechanism when it can be avoided.

There exists multiple modes to authenticate and authorize users and application access to server content

Over the network

  • password based auth ,
  • third party based auth ( Oauth)
  • 2 factors authetication( phone/sms based) ,
  • multi factor auth ( sms / email / other media) ,
  • token auth ( custom USB device/ smart card ) ,
  • biometric auth (physical human charecteristics / scanners ) ,
  • transactional auth ( location , hour of day , browser/ machine type)

Computer recognition authentication

  • Single sign-on

Authentication protocols

  • Kerbos – Key Distribution Center (KDC) using a Ticker gransting Server ( TGS)

A callflow involves AAA while creating the session and may require occsional re authetication to reafform the user is intended one. Doing re-authtication too often increases the power consumption and can be countered by caching and timeout mechanism.

Point of presence and handover using Carbon footprint in different demographics

  1. Include Carbon emission from Datacentre in condieration before engaging the server in call path from load balancer gateway

2. Use point of presence ( PoP) for server according to their carbon emission factor in the demography .

Us states carbon emission rate from electricity generation (2018 report ) Source : [16]
UK greenhouse gas reporting source : [17]

Energy Efficiency in WebRTC browser applications and native applications

In a Video conferencing the over browser, WebRTC has emerged as te the default standard . The efficiency of sch webrtc browser based video conferencing web applications can be enhanced in the following ways :

1.Use VoIP Push Notifications to Avoid Persistent Connections

2. Voice Activity detection ( Mute the spectators ) and join with video true , audio false for attendeees

Energy efficiency in VoIP phones

If all eligible VoIP phones sold in the United States were ENERGY STAR certified, the energy cost savings would grow to more than $65 million each year and 1.2 billion pounds of annual greenhouse gas emissions would be prevented, equivalent to the emissions from more than 119,000 vehicles.

Energystart [15]

Low-energy-consuming embedded hardware on most phones keep the average consumption low . A analog phone can consume power between 0.07 W to 9.27 W while a VoIP phone can consume 0.1W to 3.5 W of standby power.

Off mode power is often less than standby power since phone is on low power model during idle hours such as night . According to energy star Sund transmission mechnism also plays a key role and hybrid phones consume more power.

Power allowance (W) for each of the below features of the device:

  • 1.0 watt for Gigabit Ethernet
  • 0.2 watt for Energy Efficiency Ethernet 802.3az compliant Gigabit Ethernet

Additional proxy incentive(W) for the ability to maintain network presence while in a low power mode and intelligently wake when needed

  • 0.3 watt for base capability
  • 0.5 watt for remote wake

Government bodies and groups to track Energy efficiency of Telecom and IP telephony

  • Alliance for Telecommunications Industry Solutions (ATIS)
  • Telecommunications Energy Efficiency Ratio (TEER)
  • measurement method covers all power conversion and power distribution from the front end of the
    system to the data wire plug, including application-specific integrated circuits (ASICs).
  • European Telecommunications Standards Institute (ETSI)
  • International Telecommunication Union (ITU)
  • U.S. Department of Energy (DOE), Environmental Protection Agency (EPA)

External links

Amazon : https://sustainability.aboutamazon.com/environment/sustainable-operations/carbon-footprint

Cisco : https://www.cisco.com/c/dam/m/en_us/about/csr/esg-hub/_pdf/2020_Environment_Technical_Review.pdf

3CX : https://askozia.com/voip/how-can-i-save-energy-with-green-voip-and-my-ip-pbx/

The purpose of the article is to raise awareness about carbon footprint from application programs to archietcture designs techniques to data centres and commuulative performance. It gives a direction to stakeholders (customers , programmers , architects , mangers , … ) to choose less carbon emitting approach whenever possible since every bit counts to help the environment.


[1] rensmart.com https://www.rensmart.com/Calculators/KWH-to-CO2

[2] https://www.zdnet.com/article/toolkit-calculate-datacenter-server-power-usage/

[3] nature : https://www.nature.com/articles/d41586-018-06610-y

[4] Center of Expertise for Energy Efficiency in Data Centers at the US Department of Energy’s Lawrence Berkeley National Laboratory in Berkeley, California. https://datacenters.lbl.gov/

[5] energy Star – https://www.energystar.gov/sites/default/files/asset/document/DataCenter-Top12-Brochure-Final.pdf

[6] https://www.blog.google/inside-google/infrastructure/safety-first-ai-autonomous-data-center-cooling-and-industrial-control/

[7] https://www.energy.gov/sites/default/files/2013/10/f3/eedatacenterbestpractices.pdf

[8] Yin K, Wang S, Wang G, Cai Z, Chen Y. Optimizing deployment of VMs in cloud computing environment. In: Proceedings of the 3rd international conference on computer science and network technology. IEEE; 2013. p. 703–06.

[9] Huang W, Li X, Qian Z. An energy efficient virtual machine placement algorithm with balanced resource utilization. In: Proceedings of the seventh IEEE international conference on innovative mobile and internet services in ubiquitous computing; 2013. p. 313–19.

[10] W. Tian, C.S. Yeo, R. Xue, Y. Zhong Power-aware schedulingof real-time virtual machines in cloud data centers considering fixed processing intervalsProc IEEE, 1 (2012), pp. 269-273

[11] H. Chen, X. Zhu, H. Guo, J. Zhu, X. Qin, J. Wu Towards energy-efficient scheduling for real-time tasks under uncertain Cloud computing environmentJ Syst Softw, 99 (2015), pp. 20-35

[12] https://hbr.org/2021/05/how-much-energy-does-bitcoin-actually-consume

[13] https://energyinnovation.org/2020/03/17/how-much-energy-do-data-centers-really-use/

[14] F. Bota, F. Khuhawar, M. Mellia and M. Meo, “Comparison of energy efficiency in PSTN and VoIP systems,” 2012 Third International Conference on Future Systems: Where Energy, Computing and Communication Meet (e-Energy), 2012, pp. 1-4, doi: 10.1145/2208828.2208834. https://citeseerx.ist.psu.edu/viewdoc/download?doi=

[15] https://www.energystar.gov/products/office_equipment/voice_over_internet_protocol_voip_phone

[16] egrid summary table 2018 for carbon emission rate in Us states : https://www.epa.gov/sites/default/files/2020-01/documents/egrid2018_summary_tables.pdf

[17] UK greenhourse gas reporting – https://www.gov.uk/government/publications/greenhouse-gas-reporting-conversion-factors-2018

[19] http://assets.worldwildlife.org/publications/575/files/original/The_3_Percent_Solution_-_June_10.pdf?1371151781

[20] https://www.cisco.com/c/dam/m/en_us/about/csr/esg-hub/_pdf/2020_Environment_Technical_Review.pdf

[21] It’s Not Easy Being Green by Peter Xiang Gao, Andrew R. Curtis, Bernard Wong, S. Keshav
Cheriton School of Computer Sc https://dl.acm.org/doi/pdf/10.1145/2342356.2342398

SIP Trunks

With the dawn of IP telephony service and cloud communication platforms in recent years, the SIP has caught the attention of many application developers. while SIP is essentially a session management multimedia signalling protocol its generic stack can be used for various use cases from IoT camera streaming sessions to call centres even auto calling for purpose of sharing OTP(one-time password) etc. In this I will highlight the usecase of large calltraffic and the use of SIP trunks.

SIP based trunking can provide significant cost savings and business process improvements by supporting the native SIP protocol that controls the VoIP systems used in call centres and business communication platforms.

  • (+) unified communication
  • (+) lower telco network
  • (+) streamline operations for multicountry/ geography

Traditional trunk call

In the past, telephone systems used trunk lines to connect different parts of the network. Trunk lines were long-distance communication lines that connected telephone exchanges in different locations. Trunk calls were calls made over these trunk lines. They were typically used for long-distance communication, as they allowed calls to be made between exchanges that were geographically far apart. Trunk calls were generally more expensive than local calls, as they involved the use of long-distance communication lines.

Traditional trunk calls operated like a circuit with local loops , trunk lines and switching offices. The telco acted as carriers that sell of lease communication lines to facilitate communication over long distances using local exchanges and interexchange carriers.

In the early days of telephone systems, trunk lines were typically made of copper wires or cables. Later, trunk lines were replaced with satellite links and fiber optic cables, which provided higher capacity and faster transmission speeds. Today, with the widespread adoption of VoIP (Voice over Internet Protocol) technology, many telephone systems no longer use trunk lines in the traditional sense. Instead, they use virtual connections, such as SIP trunks (Session Initiation Protocol trunks), which allow organizations to make and receive phone calls over the internet. SIP trunks are generally more flexible and cost-effective than traditional trunk lines, and do not require the installation of additional hardware.

Voice trunk Lines in SS7 based Next Generations IN networks used media gate ways and MGCP, H323 protocols

Image credits : Unknown

SIP trunk (older) systems

SIP is a protocol that is commonly used in VoIP (Voice over Internet Protocol) systems to set up, modify, and terminate sessions that involve the exchange of audio, video, and other media. SIP Trunks are virtual voice channels (or paths) which deliver media (voice, video, IM) over an IP network to a designated endpoint. SIP Trunks can be thought of as a virtual line or concurrent call path. SIP Trunks are delivered over an IP connection like Tier One Carrier or Voice Optimized Recommended or UDP. SIP Trunk may be over-subscribed ie can have more numbers than trunks for example G.711 – 17 calls over T1 or G.729a – 45 calls over T1. SIP Trunking can be provided as one-way or two-way lines. Direct Inward Dialing (DIDs) can be used for toll-free number service.

Centralized SIP Trunk Model

Centralized SIP Trunk Model is designed to aggregate all calls from all sites and funnels them into a single entry point. Each site has its own SIP trunk termination of the appropriate capacity for calls to and from that site.

Such SIP trunks models offer benefits in three significant areas:

  1. Cost savings, arising from many factors including reduced telecommunications network charges and streamlined operations.
  2. Unified communications, where voice, video, email, text and other messaging technologies are combined to provide greater flexibility for users by enabling new ways to transfer information and manage connectivity. Many SIP trunk providers offer advanced features such as call forwarding, call waiting, and voicemail, which can improve the overall communication experience for employees.
  3. Business Continuity and Disaster Recovery, where the right physical configuration in conjunction with intelligence in the network can be leveraged to provide uninterrupted communications and alternative means to stay connected for employees in the event of system bottlenecks or failures.

SIP trunking is an IP-based alternative to ISDN trunking services

SIP Trunking is a low-cost IP-based alternative to ISDN offering for medium to large businesses needing upwards of several tens of channels in a trunk, often across multiple sites, with IP VPN access. 

  • (+) Optimal utilization of bandwidth by delivering both data and voice in the same bandwidth

A telephony company such as a telecom service provider may expose SIP trunks as a means of connecting inbound or outbound calls through its telecom network. For the integrator ( or the service provider managing the other enedpoint of the call leg ) it can be no different that a traditional phone call.The SIP signalling however is useful for enabling better session understaning using standard SIP requests and responses as compared to SS7 or PRI lines.

Planning to set up SIP trunk

•Cost analysis
•Assess traffic volumes and patterns
•Assess network design implications
•Emergency call policy
•Define production user community phases
•Define user community to pilot
•Evaluate future new services
•Assess security precautions

The steps to set up a SIP trunk connection may vary depending on the specific provider and the equipment being used. However, here are some general steps that are often involved in the process:

  1. Choose a SIP trunk provider: Research and compare different SIP trunk providers to find one that meets your organization’s needs and budget.
  2. Sign up for a SIP trunk account: Follow the provider’s instructions to sign up for a SIP trunk account. This may involve completing an online form, providing contact information and payment details, and selecting the desired features and services.
  3. Configure your VoIP phone system: Consult your VoIP phone system’s documentation to learn how to configure it to work with a SIP trunk. This may involve specifying the SIP trunk’s IP address and port number, as well as any authentication credentials that are required.
  4. Test the connection: Once the SIP trunk is set up, it is a good idea to test the connection to ensure that it is working properly. Make a few test calls to verify that the connection is functioning as expected.
  5. Use the SIP trunk: Once the SIP trunk is set up and tested, it can be used to make and receive calls using your VoIP phone system.

SIP Trunking platform has to integrate with multiple networks seamlessly. Components for setting up a SIP trunking system requires atleast these

  • Compliance with standrad signalling protol, like SIP.
  • SBC( Session Border Controller ) facing the private PBX
  • Gateway for specific endpoints such as PSTN gateway , public internet gateway etc
  • L3/L4 Layer switches
  • Telco operator lines
  • Codec support

Kamailio is an open-source SIP (Session Initiation Protocol) server that can be used to create a SIP trunk. Kamailio can be PBX used to connect different locations within an organization, enabling employees to communicate with each other using their VoIP phones. Kamailio can also be used to set up a SIP trunk in a number of ways. For example, it can be used to connect an organization’s VoIP phone system to the public telephone network, allowing employees to make and receive calls from outside the organization.


Kamailio is a highly flexible and customizable SIP server that can be configured to meet the specific needs of an organization. It offers a range of features and functionality, including call routing, load balancing, and security. Kamailio is a popular choice for organizations that want to set up a SIP trunk because it is open-source and can be customized to meet their specific needs.

Features of SIP trunking

SIP trunk with VoIP phone systems are often preferred over traditional phone systems because they are generally more flexible and cost-effective. They allow employees to make and receive calls from any device with an internet connection, including desk phones, smartphones, and laptops. They can be easily scaled up or down to meet changing communication needs and do not require the installation of additional physical hardware. Some factors to consider when evaluating SIP trunks include:

  1. Cost: It is important to compare the costs of different SIP trunk providers and consider factors such as monthly fees, per-minute charges, and any additional fees for features or services.
  2. Coverage: Make sure that the SIP trunk provider has coverage in the areas where your organization needs to make and receive calls.
  3. Quality: The quality of a SIP trunk can vary greatly depending on the provider and the connection. Be sure to research the provider’s reputation for call quality and reliability.
  4. Features: Different SIP trunk providers may offer different features, such as call forwarding, call waiting, and voicemail. Consider which features are important to your organization and make sure that the SIP trunk provider offers them.
  5. Customer support: It is important to choose a SIP trunk provider that offers reliable customer support in case you experience any issues with your service.

Other features that are good to have is integration to existing backend for OSS/BSS stack. Some of the feature set for a carrier grade SIP trunking solution are listed here

  • Inbound and outbound trunks
  • Number Import/Export
  • Security
    • Dynamic registeration of users
    • Authentication and Authorization
    • Security (SRTP)
  • Cost Savings
    • Low cost for large traffic volumes instead of charges of call per second
    • CDR for tracing and monitoring call failures
  • Clear media stream ( no robotic or choopy audio). Good MOS score
  • realtime traffic monitoring to rule out bad players.
  • Inbound and Outbound call – Call Establishment, Rejection, Termination
  • DDI: Direct Dialling-In ranges can be provided on the SIP Trunk
  • CLIP(Calling Line Identification Presentation )/CLIR Calling Line Identification Presentation Restriction) for Inbound and Outbound
  • Call Management
    • AUTH Code Screening
    • Combined Screening
    • Data Call Screening
    • Local Screening
    • Anonymous Call Rejection: Anonymous Call Rejection
    • Incoming Call Barring: bar receiving of calls to certain extensions
    • Outgoing Call Barring: Restrict calls to certain numbers
    • Incoming Call Diversion – unconditional, busy, and unreachable
    • Call Admission Control: Call Admission Control (CAC) is a mechanism to restrict the number of simultaneous sessions (calls) 
    • Incoming Call Diversion (DestNo not reachable, CAC exceeded, unconditional)
  • Geographic and Non-Geographic Number Support
  • Multiple Codec Support
  • Emergency Calling: Emergency Calls are routed on a priority basis irrespective of the customer’s available channel

Trunking inbound services voice can be used to support contact centres, conferencing, number translation services etc. Regulatory requirements for the operation of the customer in the PSTN of respective countries must be met with Country Specific Emergency Calling support Enhanced feature set for SIP trunking should include the features of the SIP Trunking with Multicountry support

  • Enhanced CAC(Call Admission Control) – Directional & Network
  • Global Dial Plan Support
  • Proactive MCID (Malicious CallerId) Identification and tracing
  • Call Distribution(CD)
  • Intelligent Routing involving machine learning and constant feedback
    • Origin Based Routing
    • Menu Routing
    • Origin Dependent Routing (ODR)
    • PIN Routing
    • Dynamic Route Select
    • Time-Dependent Routing (TDR)
    • Uniform Load Distribution(ULD)
    • International Routing
    • Mobile Routing
    • Payphone Routing
  • Product Association

Ultimately, the most useful SIP trunk for your organization will depend on your specific needs and budget. It is a good idea to research and compare different SIP trunk providers to find the one that best meets your organization’s needs.

Future of SIP trunks

SIP trunking systems are likely to continue to be an important part of the telecommunications landscape in the future. As more and more organizations adopt WebRTC or SRT based VoIP (Voice over Internet Protocol) technology for their phone systems, the demand for SIP trunks is likely to continue to grow. One trend that is expected to shape the future of SIP trunking is the increasing adoption of cloud-based communication systems. As more organizations move their communication systems to the cloud, they are likely to turn to SIP trunks as a way to connect their phone systems to the public telephone network and enable remote communication. Another trend that is expected to impact the future of SIP trunking is the increasing adoption of 5G technology. 5G networks offer faster speeds and lower latency, which may make it possible to use SIP trunks for real-time communication applications such as interactive and/or immersive video conferencing.

Why Lua is a good choice for Scripting call configurations in SIP servers like Kamailio and Freeswitch

Programing in SIP servers enables the IP telephony provider to add complex control that is difficult to realise with simple dialplan XML and IVR menus. These are best handled by using a program that is compiled with the telecom application server and invoked by SIP requests or responses in the session. This may include

  • using policy control or dynamic input to control call routing or blacklisting
  • transcription for voicemail
  • media file playback with dynamic text to speech ….so on.

Common Freeswitch , opensips , Kamailio and Astersik suppored programing engines may include python, java, c++, javascript. Opensips and kamailio also include XML_RPC, HTTP API and Websockets as additional means of adding call control login in telephony sever.

Kamailo modules
Opensips modules
Freeswitch modules

Lua (https://www.lua.org) is a small, powerful and lightweight scripting language, mostly used for embedded and gaming use cases. Among many programming engines supported by FreeSWITCH and Kamailio, Lua is very handy to add business logic to call control by integrating with the telecom server.

Form the a multiple choice, Lua is the prefered language for scripting in SIP server which is due to

  1. Does not requie recompilation
    • Saves on the effort to resatrt the freeswitch server while loading updated script
    • this in turn saves service disruption for the time server woulve taken to shutdown and restart
  2. Can ve sync or asyn
    • lua : runs in current thread and waits for script completion
    • luarun : runs in seprate thread and returns immediately

Freeswitch Lua Integration

To load the program

<action aplication="lua" data="mainprog.lua">

1. In the program, we could get status and print to console log

local api = freeswitch.API()
local status = api:execute("status")

2. we could also check is session is active and play a file inot the call

if session:ready() then

3.Program to answer call , play file and hangup using session class methods

-- Answer call, play a prompt, hang up

-- Create a string with path and filename of a sound file
pathsep = '/'
-- Windows users do this instead pathsep = ''
prompt ="ivr" ..pathsep .."ivr-welcome_to_freeswitch.wav"

-- Play the prompt
freeswitch.consoleLog("WARNING","About to play '" .. prompt .."'n")

-- Hangup
freeswitch.consoleLog("WARNING","After hangup")


[INFO] mod_dialplan_xml.c:637 Processing altanai <altanai>->5000 in context public
EXECUTE sofia/internal/altanai@x.x.x.x lua(/etc/freeswitch/dialplan/lua_session_answer_prompt_hangup.lua)
[DEBUG] switch_channel.c:3781 (sofia/internal/altanai@x.x.x.x) Callstate Change EARLY -> ACTIVE
[WARNING] switch_cpp.cpp:1376 About to play 'ivr/ivr-welcome_to_freeswitch.wav
[DEBUG] switch_ivr_play_say.c:1942 done playing file /usr/share/freeswitch/sounds/en/us/callie/ivr/ivr-welcome_to_freeswitch.wav
[DEBUG] switch_cpp.cpp:731 CoreSession::hangup
[NOTICE] switch_cpp.cpp:733 Hangup sofia/internal/altanai@x.x.x.x [CS_EXECUTE] [NORMAL_CLEARING]
[WARNING] switch_cpp.cpp:1376 After hangup

other methods :

  • Initiate new session session:originate()
  • Record Audio session:recordFile()

5. Fire and consume Events

freeswitch.Event() and freeswitch.eventConsume() can be used to fire new events and consume events respectively. For instance to fire callback function on hangup session:setHangupHook()

6. IVR menus freeswitch:IVRMenu()

More examples



  1. lua https://www.lua.org/
  2. freeswitch https://freeswitch.org/confluence/display/FREESWITCH/Lua+API+Reference

TeleMedicine and WebRTC

Anywhere anytime Telemedicine communication tool accessible on any device.  The solution provides a low eight signalling server which drops out as soon as call is connected thus ensuring absolutely private calls without relaying or involving any central server in any call related data or media . This ensure doctor patient details are not processed , stored or recorded by our servers.

The solution enables doctors / nurses / medical practitioners and patients  to do

  • High definition Audio/video calls 
  • End to end encrypted p2p chats 
  • Integration with HMS ( hospital management system ) to fetch history of the patients 
  • Screens sharing to show reports without transferring them as files 
  • Include more concerned people of doctors using Mesh based peer to peer conferencing feature.      

Confidentialty and Privacy

For privacy and security of certain health information only HIPAA (Health Insurance Portability and Accountability Act of 1996) compliant video-conferencing tools can only be used for Telemedicine in US.

Telemedicine scenario Callflow

Calllfow for Attended Call Transfer and 2 way conference in a Telemedicine scenario between Patient , hospital attendant , doctor and a nurse

References :

Performance of WebRTC sites and electron apps

This post is about making performance enhancements to a WebRTC app so that they can be used in the area which requires sensitive data to be communicated, cannot afford downtime, fast response and low RTT, need to be secure enough to withstand and hacks and attacks.

WebRTC Clients

  1. Single-page applications (HTML5 + Js + CSS) on browser engine on OS
  2. Electron app 
    • Facebook Messenger, slack , twitch are some of the RTC based applications which have have electron clients as well.
  3. Web-view on mobile 
    • (-) doesn’t have advanced Webrtc API support eg, Media Recorder
  4. Native Applications on mobile OS( Android, iOS)
  5. Hybrid Applications (React Native)
  6. Embedded Device ( set-top box, IP camera, robots on raspberry pi)
    1. raw codecs libraries and gstt=reamer/FFmpeg script to create RTSP stream

Codecs weight

opus (111, minptime=10;useinbandfec=1)
VP8 (96)
frameWidth 640
frameHeight 480
framesPerSecond 30

DNS lookup Time

Services such as Pingdom (https://tools.pingdom.com/) or WebPageTest can quickly calculate your website’s DNS lookup times.

Load / sgtress testing for Caching and lookup times can be perfomed over tools such as LoadStorm , JMeter.

Alternatively use websoscket like setup inplace of non reusable TCP connection like HTTP or polling to set up signalling.

Bandwidth Estimation

Bandwidth can be estmated by

RTCP Receiver Reports which periodically summary to indicate packet loss rate and jitter etc from receiver.


TWCC (Transport Wide congestion Control ) calculates the intra-packet delays to estimate the Sender Side Bandwidth

a=rtcp-fb:96 transport-cc

REMB( Receiver Side Bandwidth Estimation) provide bandwidth estimation  by measuing the packet loss

  • used to configure the bitrate in video encoding
  • used to avoid congestion or slow media transmission
a=rtcp-fb:96 goog-remb

Best practices for WebRTC web clients

As a communication agent become a single HTML page driven client, a lot of authentication, heartbeat sync, web workers, signalling event-driven flow management resides on the same page along with the actual CPU consumption for the audio-video resources and media streams processing. This in turn can make the webpage heavy and many a time could result in a crash due to being ” unresponsive”.

Here are some my best to-dos for making sure the webrtc communication client page runs efficiently

Visual stability and CLS ( Cummulative Layout Shift)

CLS metrics measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.

To have a good user interactionn experiences, the DOM elements should display as less movement as possible so that page appears stable . In the opposite case for a flickering page ( maybe due to notification DOM dynamically pushing the other layout elements ) it is difficult to precisely interact with the page elements such as buttons .

Minimize main thread work

The main thread is where a browser processes runs all the JavaScript in your page, as well as to perform layout, reflows, and garbage collection. therefore long js processes can block the thread and make the page unresponsive.

Deprication of XMLHTTP request on main thread

Reduce Javacsipt execution Time

Unoptimized JS code takes longer to execute and impacts network , parse-compileand memory cost.

If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.

Some effective tips to spedding up JS execution include

  • minifying and compressing code
  • Removing the unused code and console.logs
  • Apply caching to save lookup time

Cookies – Security vs persistent state

Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.

While adding cookies we must ensure that if SameSite =None , the cookies must be secure

Set-Cookie: widget_session=abc123; SameSite=None; Secure

SameSite to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser’s URL bar. 

Set-Cookie: promo_shown=1; SameSite=Strict

You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.

Web Client Performance monitoring

Key Performance Indicators (KPIs) are used to evaluate the performance of a website . It is crticial that a webrtc web page must be light weight to acocmodate the signalling control stack javscript libs to be used for offer answer handling and communicating with the signaller on open sockets or long polling mechnism .

Lighthouse results

Lighthouse tab in chrome developer tools shows relavnat areas of imporevemnt on the webpage from performmace , Accesibility , Best Practices , Search Engine optimization and progressive Web App

Also shsows individual categories and comments

Time to render and Page load

Page attributes under Chrome developers control depicts the page load and redering time for every element includeing scripts and markup. Specifically it has

  • Time to Title
  • Time to render
  • Time to inetract

Networking attributes to be cofigured based on DNS mapping and host provider. These Can be evalutaed based on chrome developer tool reports

Task interaction time

Other page interaction crtiteria includes the frames their inetraction and timings for the same.

In the screenhosta ttcjed see the loading tasks which basically depcits the delay by dom elements under transitions owing to user interaction . This ideally should be minimum to keep the page responsive.

Page’s total memeory



The above functions ( old and new ) estimates the memory usage of the entire web page

these calls can be used to correlate new JS code with the impact on memery and subsewuntly find if there are any memeory leaks. Can also use these memery metrics to do A/B testing .

Page weight and PRPL

Loading assests over CDN , minfying sripts and reducing over all weight of the page are good ways to keep the page light and active and prevent any chrome tab crashes.

PRPL expands to Push/preload , Render , PreCache , Lazy load

  • Render the initial route as soon as possible.
  • Pre-cache remaining assets.
  • Lazy load other routes and non-critical assets.

Preload is a declarative fetch request that tells the browser to request a resource as soon as possible. Hence should be used for crticial assests .

<link rel="preload" as="script" href="critical.js">

The non critical compoenents could then be loaded on async .

Lazy load must be used for large files like js paylaods which are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.

Web Workers

Web Workers are a simple means for web content to run scripts in background threads.The Worker interface spawns real OS-level threads

By acting as a proxy, service workers can fetch assets directly from the cache rather than the server on repeat visits. 

Native Applications on mobile OS

Threads and Cores



CPU profiling

Energy consumption

References :

5G and IMS

In the course of evolution of RAN ( Radio Access layer) technologies, 5G outsmarts 4G-2010 which comes in succession after 3G-2000, 2.5G, 2G -1990 and 1G/PSTN -1980 respectively. Among the most striking features of 5G are :-

  • IP based protocols
  • ability to connect 100x more devices ( IOT favourable )
  • speed upto 10 Gbit/s
  • high peak bit rate
  • high data volume per unit area
  • virtually 0 latency hence high response time

5G + IMS can accommodate the rapid growth of rich multimedia applications like OTT streaming of HD content, gaming, Augmented reality so on while enabling devices connected to the Internet of Things to onboard the telecommunication backbone with high system spectral efficiency and ubiquitous connectivity.


Infact 5G has seen maximum investment in year 2020 in revamping infrastrcuture as compared to other technologies such as IoT or even Cloud. This could be partly due to high rise in high speed communication for streaming and remote communication owining to steep rise in remote learning adn working from home scenarious.

img source statista – global-telecom-industry-priority-investment-areas


5G is specified to operate over range 1 GHz to 100 GHz.

  • Low-band spectrum (below 2.5 GHz) – excellent coverage,
  • mid- band spectrum (2.5–10 GHz) – a combination of good coverage and very high bitrates,
  • high band-spectrum (10–100 GHz) – the bandwidths needed for the highest bitrates (up to 20 Gb/s) and lowest latencies

Workplan for 5G standardisation and release

The Workplan started in 2014 and is ongoing as of now (2018). UPdate

image source : 3GPP “Getting ready for 5G”

3GPP is the standard defining body for telecom and has specified almost all RAN technologies like GSM , GPRS , W-CDMA , UMTS , EDGE , HSPAand LTE before .

5G Core Network

5G Core Network like LTE

5G + IMS

SDN + NFV for 5G deployment

SDN separates the virtualized network infrastructure from its logical architecture. which automates configuration for routing, security etc. 

It also helps in the management of infrastructure for scaling and availability.

Software-defined Networking (SDN) and Network Functions Virtualization (NFV) are advancing the deployment of 5G systems. The separation of user and control plane are essentially making the system very modular thereby increasing the application to various traffic types 

  • IMS signalling
  • Smart city sensors, cameras 
  • Web services 
  • Self-driving cars 
  • Real-Time Communications / VoIP
  • Augment Reality(AR) , Virtual Reality ( VR)
  • Real Time Gaming
  • Mission Critical Data / Push to Talk ( MCPTT)
  • buffered streaming ( non conversational Video)

Dynamic Network Slicing

Network Slicing allows mobile operators to partition a single network into multiple virtual networks. This allow network operator to use one physical network to cater to many kinds of service networks with varrying usecases around bandwidth, network latency, processing, resiliency, business requirnments.

Dynamic Network Slicing allows the network resources like radio networks, wire access, core, transport and edge networks to be divided into multiple logical networks to meet requirnments of diverse use cases. [2]

Horizontal Slicing (Infrastructure Sharing)Vertical Slicing (QoS Slicing)
The virtual infristructure is shared between different tenants for control and operations ( think IaaS)creating service instances

Service Based Architecture (SBA)

Virtualization and slicing allow us to create Service Based Architectures ( SBA). This allows control plane and user plane sepration( CUPS). It also allows sepration between access and core network.

The modular function design allows concurrent access to services as well as decoupling of stateless processors and statefull backend ( database).

  • (+) network capability exposure
  • (+) scalability
  • (+) redundancy

Applications of 5G

5G targets three main use case

  • enhanced mobile broadband (eMBB),
  • massive machine type communications (mMTC)
  • ultra-reliable low latency communications (URLLC) (also called critical machine type communications (cMTC))
sources : whitepaper ericsson


General Data Protection Regulation (GDPR) in VoIP

GDPR, Europe’s digital privacy legislation passed in 2018, replaces the 1995 EU Data Protection Directive. It is rules designed to give EU citizens more control over their personal data & strengthen privacy rights. It aims to simplify the regulatory environment for business and citizens.

To read about other Certificates , compliances and Security in VoIP which summaries

  • HIPAA (Health Insurance Portability and Accountability Act) ,
  • SOX( Sarbanes Oxley Act of 2002),
  • Privacy Related Compliance certificates like COPPA (Children’s Online Privacy Protection Act ) of 1998,
  • CPNI (Customer Proprietary Network Information) 2007,
  • GDPR (General Data Protection Regulation)  in European Union 2018,
  • California Consumer Privacy Act (CCPA) 2019,
  • Personal Data Protection Bill (PDP) – India 2018 and
  • also specifications against Robocalls and SPIT ( SPAM over Internet Telephony) among others

Multinational companies will predominantly be regulated by the supervisory authority where they have their “main establishment” or headquarter. However, the issue concerning GDPR is that it not only applies to any organisation operating within the EU, but also to any organisations outside of the EU which offer goods or services to customers or businesses in the EU.

Key Principles of GDPR are

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality (security)
  • Accountability

GDPR consists of 7 projects (DPO, Impact assessment, Portability, Notification of violations, Consent, Profiling, Certification and Lead authority) that will strengthen the control of personal data throughout the European Union.


stakeholders of data protection regulation are
Data Subject – an individual, a resident of the European Union, whose personal data are to be protected

Data Controller – an institution, business or a person processing the personal data e.g. e-commerce website.

Data Protection Officer – a person appointed by the Data Controller responsible for overseeing data protection practices.

Data Processor – a subject (company, institution) processing a data on behalf of the controller. It can be an online CRM app or company storing data in the cloud.

Data Authority – a public institution monitoring implementation of the regulations in the specific EU member country.

Extra-Territorial Scope

Any VoIP service provider may feel that since they are not based out of EU such as officially headquartered in the Asia Pacific or US region they may not be legally binding to GDPR. However, GDPR expands the territorial and material scope of EU data protection law.  It applies to both controllers and processors established in the EU, and those outside the EU, who offer goods or services to or monitor EU data subject.

VoIP service providers as Data Processors

A processor is a “person, public authority, agency or other body which processes personal data on behalf of the controller”.
Most VoIP service providers are multinational in nature with services offered directly or indirectly to all regions. The GDPR imposes direct statutory obligations on data processors, which means they will be subject to direct enforcement by supervisory authorities, fines, and compensation claims by data subjects. However, a processor’s liability will be limited to the extent that it has not complied with it’s statutory and contractual obligations.

Data minimization – It is now a good practise to store and process as less user’s personal data as necessary to render our services effectively. Also to maintain data for only a stipulated time ( approx 90 days of CDR for call details and logs )

Record Keeping, Accountability and governance

To show compliance with GDPR, a service provider maintain detailed records of processing activities. Also, they must implement technological and organisational measures to ensure, and be able to demonstrate, that processing is performed in accordance with the GDPR. Some ways to apply these are :

  • Contracts: putting written contracts in place with organisations that process personal data on your behalf
  • maintaining documentation of your processing activities
  • Organisational policies focus on Data protection by design and default – two-factor auth, strong passwords to guard against brute-force, encryption, focus on security in architecture
  • Risk analysis and impact assessments: for uses of personal data that are likely to result in a high risk to individuals’ interests
  • Audit by Data protection officer
  • Clear Codes of conduct
  • Certifications

As for a VOIP landscape thankfully every call or message session is followed by a CDR ( Calld Detail Record ) or MDR ( Message Detail Record).

Additionally, assign a unique signature to every data-access client the VoIP system and log every read/write operation carried out on data stores whether persistent datastores or system caches.

Privacy Notices to Subjects

User profile data such as :

  • Basic identity information, name, address and ID numbers
  • Web data such as location, IP address, cookie data and RFID tags
  • Health and genetic data
  • Bio-metric data
  • Racial or ethnic data
  • Political opinions
  • Sexual orientation

is protected strictly under GDPR rules

A service provider should provide indepth information to data subjects when collecting their personal data, to ensure fairness and transparency. They must provide the information in an easily accessible form, using clear and plain language.


The GDPR introduces a higher bar for relying on consent , requiring clear affirmative action. Silence, pre ticked boxes or inactivity will not be sufficient to constitute consent. Data subjects can withdraw their consent at any time, and it must be easy for them to do so.

Lawful basis for processing Data now include

In Article 6 of the GDPR , there are six available lawful bases for processing.

(a) Consent: the individual has given clear consent for you to process their personal data for a specific purpose.

(b) Contract: the processing is necessary for a contract you have with the individual, or because they have asked you to take specific steps before entering into a contract.

(c) Legal obligation: the processing is necessary for you to comply with the law (not including contractual obligations).

(d) Vital interests: the processing is necessary to protect someone’s life.

(e) Public task: the processing is necessary for you to perform a task in the public interest or for your official functions, and the task or function has a clear basis in law.

(f) Legitimate interests: the processing is necessary for your legitimate interests or the legitimate interests of a third party, unless there is a good reason to protect the individual’s personal data which overrides those legitimate interests.

File such as PCAPS , Recordings and transcripts of calls hold sensitive information from end users , these should be encryoted and inaccssible to even the dev teams within the org without explicit consent of end user .

Individuals’ Rights

The GDPR provides individuals with new and enhanced rights to Data subjects who will have more control over the processing of their personal data. A data subject access request can only be refused if it is manifestly unfounded or excessive, in particular because of its repetitive character.

Rights of Data Subjets include

  • Right of Access
  • Right to Rectification
  • Right to Be Forgotten
  • Right to Restriction of Processing
  • Right to Data Portability
  • Right to Object
  • Right to Object to Automated Decisionmaking

For a VoIP service provider if a user opts for redaction then none of his calls or messages should be traced in logs . Also replace distinguishable end user identifier such as phone number and sip uri with *** charecters

Provide option for “Account Deletion” and purge account – If a user wished to close his/her account , his/her detaisl should be deleted form the sustem except for the bare bones detaisl which are otherwise required for legal , taxation and accounting requirnments

Breach Notification

A controller is a “person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of processing of personal data”,

A controller will have a mandatory obligation to notify his supervisory authority of a data breach within 72 hours unless the breach is unlikely to result in a risk to the rights of data subjects. Will also have to notify affected data subjects where the breach is likely to result in a “high risk” to their rights. A processor, however, will only be obliged to report data breaches to controllers

International Data Transfers

Data transfers to countries outside the EEA(European Economic Area) continue to be prohibited unless that country ensures an adequate level of protection. The GDPR retains existing transfer mechanisms and provides for additional mechanisms, including approved codes of conduct and certification schemes.

The GDPR prohibits any non-EU court, tribunal or regulator from ordering the disclosure of personal data from EU companies unless it requests such disclosure under an international agreement, such as a mutual legal assistance treaty.

One of the biggest challenges for a service provider is the identification & categorization of GDPR impacted data sets in disparate locations across the enterprise. A dev team must flag tables, attributes and other data objects that are categorically covered under GDPR regulations and then ensure that they are not transferred to a server outside of EU.

In the present age of Virtual shared server instance, cloud computing and VoIP protocol it is operational a very tough task for a communication service provider to ensure that data is not transferred outside of EU such as a VoIP call from origination in US and destination in EU will require information exchanges via SDP, vcard , RTP stream via media proxies etc.


The GDPR provides supervisory authorities with wide-ranging powers to enforce compliance, including the power to impose significant fines. You will face fines of up to €20m or 4% of your total worldwide annual turnover of the preceding financial year. In addition, data subjects can sue you for pecuniary or non-pecuniary damages (i.e. distress). Supervisory authorities will have a discretion as to whether to impose a fine and the level of that fine.

Data Protection officer (DPO)

Under the terms of GDPR, an organisation must appoint a Data Protection Officer (DPO) if it carries out large-scale processing of special categories of data, carries out large scale monitoring of individuals such as behaviour tracking or is a public authority.

Reference :

Media Architecture, RTP topologies

With the sudden onset of Covid-19 and building trend of working-from-home , the demand for building scalable conferncing solution and virtual meeting room has skyrocketed . Here is my advice if you are building a auto- scalable conferencing solution

This article is about media server setup to provide mid to high scale conferencing solution over SIP to various endpoints including SIP softphones , PBXs , Carrier/PSTN and WebRTC.

Point to Point

Endpoints communicating over unicast. RTP and RTCP tarffic is private between sender and reciver even if the endpoints contains multiple SSRC’s in RTP session.

Advantages of P2p Disadvantages of p2p
(+) Facilitates private communication between the parties (-) Only limitaion to number of stream between the partcipants are the physical limiations such as bandwidth, num of available ports

Point to Point via Middlebox

Same as above but with a middle-box involved. Middle Box type are :


Mostly used interoperability for non-interoperable endpoints such as transcoding the codecs or transport convertion. This does not use an SSRC of its own and keeps the SSRC for an RTP stream across the translation.

Subtypes of Multibox :

Transport/Relay Anchoring

Roles like NAT traversal by pinning the media path to a public address domain relay or TURN server

Middleboxes for auditing or privacy control of participant’s IP

Other SBC ( Session Border Gateways) like characteristics are also part of this topology setup

Transport translator

interconnecting networks like multicast to unicast

media packetization to allow other media to connect to the session like non-RTP protocols

Media translator

Modifies the media inside of RTP streams commonly known as transcoding.

It can do up to full encoding/decoding of RTP streams. In many cases it can also act on behalf of non-RTP supported endpoints, receiving and responding to feedback reports ad performing FEC ( forward error corrected )

Back-To-Back RTP Session

Mostly like middlebox like translator but establishes separate legs RTP session with the endpoints, bridging the two sessions.

Takes complete responsibility of forwarding the correct RTP payload and maintain the relation between the SSRC and CNAMEs

Advantages of Back-To-Back RTP SessionDisadvantages of Back-To-Back RTP Session
(+) B2BUA / media bridge take responsibility tpo relay and manages congestion(-) It can be subjected to MIM attack or have a backdoor to eavesdrop on conversations

Point to Point using Multicast

Any-Source Multicast (ASM)

traffic from any particpant sent to the multicat group address reaches all other partcipants

Source-Specific Multicast (SSM)

Selective Sender stream to the multicast group which streams it to the recibers

Point to Multipoint using Mesh

many unicast RTP streams making a mesh

Point to Multipoint + Translator

Some more variants of this topology are Point to Multipoint with Mixer

Media Mixing Mixer

receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through

static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.

Media Switching Mixer

RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

The Mixer can reduce bitrate or switch between sources like active speakers.

SFU ( Selective Forwarding Unit)

Middlebox can select which of the potential sources ( SSRC) transmitting media will be sent to each of the endpoints. This transmission is set up as an independent RTP Session.

Extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.

Advantges of SFUDisadvatages of SFU
(+) Low lanetncy and low jitter buffer requirnment by avoiding re enconding
(+) saves on encoding decoding CPU utilization at server
(-) unable to manage network and control bitrate
(-) creates higher load on receiver when compared with MCU

On a high level, one can safely assume that given the current average internet bandwidth, for count of peers between 3-6 mesh architectures make sense however any number above it requires centralized media architecture.

Among the centralized media architectures, SFU makes sense for atmost 6-15 people in a conference however is the number of participants exceed that it may need to switch to MCU mode.


Encode in multiple variation and let SFU decide which endpoint should receive which stream type

Advantages of SFU +SimulcastDisadvantages of SFU +Simulcast
(+) Simulcast can ensure endpoints receive media stream depending on their requirnment/bandwidth/diaply(-) Uplink bandwidth reuirnment is high
(-) CPU intensive for sender for encoding many variations of outgoing stream

SVC ( scalable Video Coding)

Encodes in multiple layers based on various modalities such as

  • Signal to noise ration
  • temporal
  • Spatial
Advantages of SFU +SimulcastDisadvantages of SFU +Simulcast
(+) Simulcast can ensure endpoints receive media stream depending on their requirnment/bandwidth/diaply(-) Uplink bandwidth reuirnment is high
(-) CPU intensive for sender for encoding many variations of outgoing stream

Hybrid Topologies

There are various topologies for multi-endpoint conferences. Hybrid topologies include forward video while mixing audio or auto-switching between the configuration as load increases or decreases or by a paid premium of free plan

Hybrid model of forwarding and mixed streamings

Some endpoints receive forwarded streams while others receive mixed/composited streams.

Serverless models

Centralized topology in which one endpoint serves as an MCU or SFU.

Used by Jitsi and Skype

Point to Multipoint Using Video-Switching MCUs

Much like MCU but unlike MCU can switch the bitrate and resolution stream based on the active speaker, host or presenter, floor control like characteristics.

This setup can embed the characteristics of translator, selector and can even do congestion control based on RTCP

To handle a multipoint conference scenario it acts as a translator forwarding the selected RTP stream under its own SSRC, with the appropriate CSRC values and modifies the RTCP RRs it forwards between the domains

Cascaded SFUs

SFU chained reduces latency while also enabling scalability however takes a toll on server network as well as endpoint resources

Transport Protocols

Before getting into an in-depth discussion of all possible types of Media Architectures in VoIP systems, let us learn about TCP vs UDP.

TCP is a reliable connection-oriented protocol that sends REQ and receives ACK to establish a connection between communicating parties. It sequentially ends packets which can be resent individually when the receiver recognizes out of order packets. It is thus used for session creation due to its errors correction and congestion control features.

Once a session is established it automatically shifts to RTP over UDP. UDP even though not as reliable, not guarantying non-duplication and delivery error correction is used due to its tunnelling methods where packets of other protocols are encapsulated inside of UDP packet. However to provide E2E security other methods for Auth and encryption are used.

Audio PCAP storage and Privacy constraints for Media Servers

A Call session produces various traces for offtime monitoring and analysis which can include

CDR ( Call Detail Records ) – to , from numbers , ring time , answer time , duration etc

Signalling PCAPS – collected usually from SIP application server containing the SIP requests, SDP and responses. It shows the call flow sequences for example, who sent the INVITE and who send the BYE or CANCEL. How many times the call was updated or paused/resumed etc .

Media Stats – jitter , buffer , RTT , MOS for all legs and avg values

Audio PCAPS – this is the recording of the RTP stream and RTCP packets between the parties and requires explicit consent from the customer or user . The VoIP companies complying with GDPR cannot record Audio stream for calls and preserve for any purpose like audit , call quality debugging or an inspection by themselves.

Throwing more light on Audio PCAPS storage, assuming the user provides explicit permission to do so , here is the approach for carrying out the recording and storage operations.

Firther more , strict accesscontrol , encryption and annonymisation of the media packets is necessary to obfuscate details of the call session.

References :

To learn about the difference between Media Server tologies

  • centralized vs decentralised,
  • SFU vs MCU ,
  • multicast vs unicast ,

Read – SIP conferecning and Media Bridge

SIP conferencing and Media Bridges

SIP is the most popular signalling protocol in VOIP ecosystem. It is most suited to a caller-callee scenario , yet however supporting scalable conferences on VOIP is a market demand. It is desired that SIP must for multimedia stream but also provide conference control for building communication and collaboration apps for new and customisable solutions.

To read more about buildinga scalable VoIP Server Side architecture and

  • Clustering the Servers with common cache for High availiability and prompt failure recovery
  • Multitier archietcture ie seprartion between Data/session and Application Server /Engine layer
  • Micro service based architecture ie diff between proxies like Load balancer, SBC, Backend services , OSS/BSS etc
  • Containerization and Autoscalling

Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

VoIP/ OTT / Telecom Solution startup’s strategy for building a scalable flexible SIP platform

Scalable and Flexible platform. Let’s go in-depth to discuss how can one go about achieving scalability in SIP platforms. ulti geography Scaled via Universal Router, Cluster SIP telephony Server for High Availability, Multi-tier cluster architecture, Role Abstraction / Micro-Service based architecture, uted Event management and Event Driven architecture , Containerization, autoscaling , security , policies…