WebRTC , SIP , IMS, VoLTE , SaaS , SBC , REST , Cloud , IOT , media Streams
Author: altanai
Specialized in CPaaS, carrier-grade WebRTC-SIP telecom platforms for Unified communication-collaboration, signaling gateways, SBC, soft turrets, IoT-surveillance and telecom integrations.
Ardent contributor to Open Source software, avid freelancer, innovator and technical writer (telecom.altanai.com ).
Inventor of "RamuDroid" an IOT Road-Cleaning robot
Author of book "WebRTC Integrator's Guide" published by Packt
Bandwidth depends on network strength and is affected by the other users on the network. Under heterogeneous network conditions, bandwidth estimation is a critical step in improving call quality and end-user experience.
An unreliable or fluctuating network will cause some packets to be delivered on time and some to be delayed more than others, causing them to arrive in bursts. A jitter buffer is an effective method of jitter management which ensures a steady delivery of packets even when the peers transmit at fluctuating rates.
A jitter buffer is a buffer that consumes packets as soon as they arrive and keeps them until the frame can be fully reconstructed. Once all packets of a frame have been buffered (in any order), it emits the frame for decoding, which the player can then play back to the user. Note that several RTP packets can have the same timestamp if they are part of the same video frame.
(+) dynamically manages unordered packets and reconstructs a frame after accumulating all of its packets
(-) can introduce latency for packets that arrive early
(-) needs active resizing by means of feedback
for high-speed, good networks the jitter buffer can be small
for congested and disruptive networks it is better to keep a longer buffer, which also adds some latency
(-) the buffer has limited capacity, so a packet can expire if it is not received within the duration “jitterBufferDelay”.
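The reassembly behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular WebRTC stack implements it: the `RtpPacket` fields, the `first` flag (assumed to come from the codec payload descriptor) and the `JitterBuffer` class are hypothetical names, and sequence-number wraparound, resizing and packet expiry are deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RtpPacket:
    seq: int         # RTP sequence number
    timestamp: int   # RTP timestamp, shared by all packets of one frame
    first: bool      # hypothetical flag: first packet of the frame
    marker: bool     # assumed to be set on the last packet of the frame
    payload: bytes


class JitterBuffer:
    """Groups packets by timestamp and emits a frame once it is complete."""

    def __init__(self):
        self._frames = defaultdict(dict)  # timestamp -> {seq: packet}

    def push(self, pkt):
        """Store a packet; return the frame bytes if complete, else None."""
        packets = self._frames[pkt.timestamp]
        packets[pkt.seq] = pkt
        seqs = sorted(packets)
        complete = (
            packets[seqs[0]].first                      # start of frame seen
            and packets[seqs[-1]].marker                # end of frame seen
            and seqs[-1] - seqs[0] + 1 == len(seqs)     # no gaps in between
        )
        if not complete:
            return None
        del self._frames[pkt.timestamp]
        return b"".join(packets[s].payload for s in seqs)
```

Packets pushed out of order are held until the gap is filled, at which point the payloads are joined in sequence order and handed to the decoder.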
Reduced resolution, frame rate and bit rate are effective for congestion control; however, they are not suited to high-definition video conferencing such as gaming, telehealth or the broadcast of a concert, as they may hinder the user experience.
Using I-frames, P-frames and B-frames efficiently in the codec, combined with predictive machine learning models, can make packet loss unnoticeable to the human eye. The marker (M) bit in the RTP header typically marks the last packet of a video frame.
Partial frames given to the decoder cannot be processed, so a PLI message is sent to the sender. When the sender receives the PLI message, it produces a new I-frame to help the receiver decode the frames.
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
FIR (Full Intra Request): requests a full key frame from the sender when a new member enters the session.
PLI (Picture Loss Indication): requests a full key frame from the sender when partial frames were given to the decoder but it was unable to decode them. Causes of a PLI request could be a decoder crash or heavy loss.
Congestion is created when a network path has reached its maximum limits which could be due to
failures(switches, routers, cables, fibres ..)
over subscription and operating at peak bandwidth.
broadcast storms
Inapt BGP routing and congestion detection
BGP is responsible for finding the shortest routable path for a packet
The direct consequences of congestion for any network transport can be
High Latency
Connection Timeouts
Low throughput
Packet loss
Queueing delay
With respect to WebRTC streams too, if a network is congested, buffers will overflow and packets will be dropped. Due to excessive dropping of packets, both transmission time and jitter increase. To overcome this, adaptive buffering is used as jitter increases or decreases.
A congestion notifier and detection algorithm can analyze the RTCP metrics for possible congestion in the network route and suggest options to overcome it. This is part of the adaptive bitrate and bandwidth estimation process.
Rate limiting the sent information is one way to overcome congestion, even though it can lead to bad call quality at the receiver’s end and is atypical for realtime communication systems.
Bandwidth estimation and congestion control are often paired as one operational unit. Primarily, packet loss and inter-packet arrival times drive the bandwidth estimation and enable GCC (Google Congestion Control) to flag congestion.
On the receiver side, TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB (Receiver Estimated Maximum Bitrate) exchange the bandwidth estimates.
On the sender side, TWCC (Transport-wide Congestion Control) can be used.
Other congestion control algorithms
QUIC Loss Detection and Congestion Control RFC 9002
Coupled Congestion Control for RTP Media RFC 8699
NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
Self-Clocked Rate Adaptation for Multimedia RMCAT WG
SCReAM – Mobile optimised congestion control algorithm by Ericsson
A high-definition video stream requires low or no packet loss and fast recovery from any loss that does occur. RTP intrinsically has no means of recovering from packet loss. Instead, low-bit-rate redundancy can be added to the packets themselves to make up for any loss. Retransmission of lost packets can be built as a feature over RTP using the sequence numbers in the RTP header.
Geographical distance can add significant delay to transmission time. Transmission time is an important metric in call quality analysis; however, calculating it as the difference between the sending timestamp and the receiving timestamp requires perfectly synchronized system clocks, which is unreliable.
Latency accumulates across capturing user media, encoding, transmission, network delays, buffering, decoding and playback. There are many factors involved in latency management, such as queuing delays, media path, CPU utilization etc.
Optimize Compute resource
mobile agents have less computing power
Cameras with features such as autofocus or other adjustments will take more time to capture
network should be of suited bandwidth and strength
Reduce information to be encoded and sent
Subject focus and blurring the background
Filtering noise at source
Voice Activity Detection (VAD)
send extra data in FEC only if there is voice activity detected in the packet
Synchronizing clocks in distributed systems is a tough task, and it is mostly handled either by using NTP or by other means of synchronization.
WebRTC uses Stream Control Transmission Protocol (SCTP) over a DTLS connection for its data channels, as an alternative to TCP and UDP.
Features :
multihoming : one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths
Multistreaming : transmits several independent streams of chunks in parallel
SCTP has retransmission similar to TCP and partial reliability like UDP.
Heartbeat to keep the connection alive, with exponential backoff if a packet hasn't arrived.
Validation and acknowledgment mechanisms protect against flooding attack
SCTP frames data as datagrams and not as a byte stream
(+) SCTP enables multiplexing in WebRTC
(+) It has flow control and congestion avoidance support
The end-to-end encryption model of WebRTC is a good defence against MITM (man-in-the-middle) attacks; however, it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and realtime communication platforms in this article WebRTC App and webpage Security.
Traditionally, 2 separate ports for RTP and RTCP were used in SIP / RTP based realtime communication systems. Thus demultiplexing of the traffic of these data streams is performed at the transport layer.
With rtcp-mux, NAT traversal is simplified as only a single port is used for media and control messages.
(+) easier to manage security by gathering ICE candidates for a single port only instead of 2
(+) increases the system's capacity for media sessions using the same number of ports
(+) further simplified using BUNDLE, as all media sessions and their control messages flow on the same port.
WebRTC has rtcp-mux capabilities thus simplifying the ICE candidate pairing
Echo is the sound of your own voice reverberating. If the amplitude of such a sound is high and intervals exceed 25 ms, it becomes disruptive to the conversation. Its types can be acoustic or hybrid. Echo cancellers need to eliminate the echo while still preserving call quality and not disrupting tones such as DTMF.
Usually the background or reflected noise which is an undesired voiceband energy transfers from the speaker to the microphone and into the communication network. Mostly found in a hands-free set or speakerphone. In a multiparty call scenario, it could also occur due to unmatched volume levels, challenging network conditions on one party, background noise, double talk or even proximity between user and microphone
In a public telephone system, local loop wiring is done using two-wire connections carrying bidirectional voice signals. In PBX, a two-to-four wire conversion is done using a hybrid circuit which does not perform perfect impedance matches resulting in a Hybrid echo.
An efficient echo canceller should cancel out the entire echo tail while not leading to any packet loss. It needs to be adaptive to changing IP network bandwidth and algorithm should function equally well in conference scenarios where there may be more than one echo sources. Benchmarking tools like MOS (Mean Opinion scores ) are used to gauge the results. Often voice quality enhancement technologies are also integrated into AEC modules, such as :
VoIP manages call setup and teardown using the IP protocol. The APIs can be used to provide public or internal endpoints to create and manage calls and conferences, add on services like recording and transcription, or even do auth and heartbeat. This article lists some external programmable Call Control APIs, internal APIs for billing and health, as well as rate limiting.
get CDR (filtered per call or according to a specific date or account)
bulk export of CDR
Internal API gateways
API Rate Limiter
Noisy neighbour is when one of the clients monopolizes the bandwidth, using most of the I/O, CPU or other resources, which can negatively affect the performance for other users. Throttling is a good way to solve this problem by limiting how much each client can consume.
Auto scaling: horizontal or vertical scaling can counter incoming traffic. (-) takes time to scale out, thus cannot solve the noisy neighbour problem immediately
Load balancer: the LB can limit the number of simultaneous requests; it can reject them or send them to a queue for later processing. (-) the LB’s behaviour is indiscriminate (it cannot distinguish between the cost of different operations) (-) the LB cannot ensure uniform distribution of operations among all servers
Rate limiter: can intelligently understand the cost of each operation and perform throttling.
A rate limiter should be low-latency, accurate and scalable.
Rate limiter inside the service process: (+) faster, no IPC (+) resistant to interprocess call failures (-) service memory needs to allocate space for rate limiters
Rate limiter as its own process outside, as a daemon: (+) programming-language-agnostic daemon (+) uses its own memory space, more predictable. Widely used for auto discovery of service hosts
Token based Rate Limiting
provides admission control
Token bucket filter
define a user's quota in terms of average rate and burst capacity
Hierarchical Token Bucket (HTB)
uses the deficit round-robin algorithm for fair queuing
Fair Queuing
give paying users a bandwidth fraction of 25%
priority queuing
decide 1 packet/ms for free or reduce rate user
distributes that sender’s bandwidth among the other senders
CBQ (Class Based Queuing)
Shaping is performed using link idle time calculations based on the timing of dequeue events and underlying link bandwidth. Input classes that tried to send too much were restricted, unless the node was permitted to “borrow” bandwidth from a sibling.
Modular QoS command-Line interface (MQC) Shaping
Implement traffic shaping for a specific type of traffic using a traffic policy
When the rate of packets matching the specified traffic classifier exceeds the rate limit, the device buffers the excess packets.
When there are sufficient tokens in the token bucket, the device forwards the buffered packets at an even rate.
When the buffer queue is full, the device discards the buffered packets.
Throttling
delay the packet until the bucket is ready / shaping
drop the packet / Policing
mark the packet as non-compliant
Failure management on Rate Limiter
Node crash : just fewer requests throttled
Leaky bucket
tokens can go negative
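To make the token bucket idea concrete (a quota expressed as an average rate plus a burst capacity), here is a minimal single-process sketch. The `TokenBucket` class and its parameter names are hypothetical and only for illustration. It implements the policing variant, which rejects a request when no token is available, rather than shaping, which would delay it.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.burst = burst            # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start with a full bucket
        self.clock = clock            # injectable clock, handy for testing
        self.last = clock()

    def allow(self, cost=1.0):
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # policing: the caller drops or rejects the request
```

The `cost` argument lets expensive operations consume more tokens than cheap ones, which is one way a rate limiter can account for the cost of each operation.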
System Design for API gateway
Important points for design API gateway
Serialize data in a compact binary format
allocate a buffer in memory, build a frequency-count hash table, and flush it once full or on a timer to calculate counters
aggregation on API gateway on the fly
Frontend Service
Partitioned Service
Backend Service
Lightweight web service: stateless, request validation, auth / authorization, TLS (SSL) termination, server-side encryption, caching, rate limiting (throttling), request deduplication
Caching layer between frontend and backend
Replication, leader selection + quorum
Distributed messaging system( fast and slow paths) for API
A distributed messaging system such as Apache Kafka or AWS Kinesis internally splits messages across several partitions, where each partition can be placed on a single shard on a separate machine in a clustered system.
used by applications in cloud ADN (Application delivery network)
used by network address translators (NATs) DNS load balancing
The LB pings each server for health status and greylists servers that are unhealthy. It rechecks after a while, and if a server is healthy (responds with pong) it can resume sending traffic to it.
The LB should also be distributed across different data centres in a primary-secondary setup for HA.
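The ping and greylist cycle can be sketched as below. This is a simplified illustration under assumptions: `HealthChecker` is a hypothetical class, and `ping` is a caller-supplied function returning True when a server answers (for a SIP load balancer this might be an OPTIONS probe).

```python
class HealthChecker:
    """Tracks which servers may receive traffic based on periodic pings."""

    def __init__(self, servers, ping):
        self.ping = ping              # callable: server -> bool (pong received?)
        self.healthy = set(servers)   # servers eligible for traffic
        self.greylist = set()         # servers temporarily taken out of rotation

    def check_all(self):
        """Run one round of health checks over every known server."""
        for server in list(self.healthy | self.greylist):
            if self.ping(server):
                # Recovered or still fine: (re)admit to the rotation.
                self.greylist.discard(server)
                self.healthy.add(server)
            else:
                # No pong: stop sending traffic until a later check succeeds.
                self.healthy.discard(server)
                self.greylist.add(server)
```

Calling `check_all()` on a timer gives the recheck-after-a-while behaviour; only servers in `healthy` should be offered to the balancing algorithm.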
Networking protocol
TCP load balancer: can forward the packet without inspecting its content. (+) fast, can handle millions of requests per second
HTTP load balancer: terminates the connection and looks inside the request to make a load-balancing decision, for example by using a cookie or a header.
SIP-based LB such as Kamailio / Opensips: domain specific to VoIP. (+) handles SIP routing based on SIP headers and prevents flooding attacks and other malicious or malformed packets from reaching the application server
Load balancing algorithms
Weighted Scheduling Algorithm
Round Robin Algorithm
Least Connection First Scheduling
Least response time algorithms
Hash-based algorithm (send requests based on a hashed value, such as the IP address or the request URL)
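Three of the algorithms listed above can be sketched as small selector classes. The class names and the `pick` / `release` interface are hypothetical, chosen only to show how the strategies differ. Weighted scheduling and least response time are omitted for brevity.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self, key=None):
        return next(self._rotation)


class LeastConnections:
    """Picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, key=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1      # call when the connection closes


class HashBased:
    """Maps the same key (e.g. client IP or URL) to the same server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, key):
        digest = hashlib.sha256(key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

Note that this plain modulo hashing remaps most keys when the server list changes; consistent hashing is the usual remedy, but it is outside the scope of this sketch.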
Loadbalancer
Reverse Proxy
forward proxy server allows multiple clients to route traffic to an external server
accepts client requests for a server and also returns the server’s response to the client, i.e. routes traffic on behalf of multiple servers.
Balances load and incoming traffic endpoint
public facing endpoint for outgoing traffic additional level of abstraction and security, compression
used in SBC (session border controllers) and gateways
client side service discovery uses broadcasting or heartbeat mechanism to keep track of active servers and deactivates unresponsive or failed servers. Some approaches for Service Discovery
Mesh
(-) exponentially increasing network traffic
Gossip
Distributed cache
Coordination service with Service
(-) requires a coordination service for leader selection
Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint.
Partition strategy can be based on various ways such as :-
Name based partition
geographic partition
names’ hashed value based on identifier
(-) can lead to hot partitions (high density in areas of frequently accessed identifiers)
(-) high-density spots, for example all messages with a null key go to the same partition
(-) does not scale
event time based hash
(+) data is spread evenly over time
To create a well-distributed partitioning we could split a hot partition into 2 partitions or dedicate partitions to frequently accessed items. An effective partitioning key considers
Cardinality : total number of unique keys for a use case. High cardinality leads to better distribution.
high-cardinality keys : names, email addresses, URLs, since they have high variation
low-cardinality keys : boolean flags such as gender M/F
Selectivity : number of messages with each key. High selectivity leads to hotspots, and hence low selectivity is better for even distribution.
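The effect of key cardinality on distribution can be checked with a small experiment. The helper names `partition_for` and `distribution`, and the choice of MD5 as the hash, are assumptions made for this sketch; real brokers use their own partitioners.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions):
    """Map a key to a partition index using a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def distribution(keys, partitions):
    """Count how many messages land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


# High cardinality: many distinct keys spread over all partitions.
emails = [f"user{i}@example.com" for i in range(10_000)]
high = distribution(emails, 8)

# Low cardinality: a boolean-like flag can reach at most two partitions,
# so the remaining partitions stay idle while two become hot.
flags = ["M" if i % 2 else "F" for i in range(10_000)]
low = distribution(flags, 8)
```

Comparing `len(high)` with `len(low)` shows how many partitions actually receive traffic in each case.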
Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management used in DevOps. I have mentioned this in detail on the article on VoIP and DevOps below.
For a VoIP system catering to many clients across the globe, or accessing multiple carriers meant for different countries based on prefix matching, there should be a local PoP in the most used regions. Typically these regions include the US (east and west coasts), Europe (Germany or London), and Asia Pacific (Mumbai, Hong Kong and Australia).
Minimal latency and the lowest amount of traffic via public internet
Creating multiple PoPs and enabling private traffic via VPN between them ensures that we use the backbone of our cloud provider (such as AWS) or datacentre instead of traversing the public internet, which is slower and less secure.
Hopping over the private interface between the cloud servers, and maintaining a private connection and keepalive between them, helps optimize the traffic flow while keeping the RTT and latency low.
A high-availability (HA) architecture implies dependability, usually via the existence of redundant application servers for backup: a primary and a standby. These applications are configured so that if the primary fails, the other can take over its operations without significant loss of data or impact to business operations.
Downtime / SLA of 5 9’s in aggregate failures
Four 9’s of availability on each service component gives a downtime of about 53 minutes per service each year. However, when 10 such components are considered in aggregate, this amounts to (99.99%)^10 ≈ 99.9% availability, which is a downtime of 8-10 hours each year.
Thus, aggregate failure should be taken into consideration while designing reliable systems.
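The arithmetic behind these figures can be reproduced with a short calculation. It assumes, as the numbers above imply, a chain of 10 components that must all be up for the service to work, each with 99.99% availability. The function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes


def downtime_minutes(availability):
    """Expected yearly downtime for a given availability (0..1)."""
    return (1 - availability) * MINUTES_PER_YEAR


def serial_availability(availability, components):
    """Availability of a chain in which every component must be up."""
    return availability ** components


single = downtime_minutes(0.9999)              # about 53 minutes per year
chain = serial_availability(0.9999, 10)        # about 0.999, i.e. three 9's
chain_hours = downtime_minutes(chain) / 60     # between 8 and 10 hours per year
```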
HA for Proxy / Load balancer (LB)
A LB is the first point of contact for outbound calls and usually does not save the dialogue information into memory or database but still contain the transaction information in memory. In case the LB crashes and has to restart, it should
have a quick uptime
be able to handle in dialogue requests
handle new incoming dialogue requests in a stateless manner
verify auth/authorization details from requests even after restart
HA for Call Control app server
App server is where all the business logic for call flow management resides and it maintains the dialog information in memory.
Issues with in-memory call states : If the VM or server hosting the call control app server is down or disconnected, then live calls are affected; this, in turn, causes revenue loss, primarily since the state variable holding the call duration would not be passed on to the CDR / billing service upon termination of the call. For long-distance, multi-telco-endpoint calls running for hours this could be a significant loss.
Standby app server configuration and shared memory : If the primary app server crashes, the standby app server should be ready to take its place and read the dialog states from the shared memory.
Live load-balanced secondary app server + external cache for state variables : a cluster of master-slave caches like Redis is a good way of maintaining the dialogue state and reading from it once the app server recovers from a failed state, or when a secondary server figures out it has a missing variable in local memory.
Media Server HA
Assume the Kamailio-RTPengine duo as app server and media server. These components can reside in the same or different VMs. In case of a media server crash, while restoring the restarted RTPengine or assigning a secondary backup RTPengine, it should load the state of all live calls without dropping any and causing loss of revenue. This is achieved by
external cache such as Redis ,
quick switchover from primary to secondary/fallback media server and
floating IPs for media servers that ensure call continuity in spite of a failure on the active media server.
Architecturally it looks the same as fig above on HA for the SIP app server.
Attacks and security compromises pose a very significant threat to a VoIP platform.
MITM attacks
Man-in-the-middle attacks can be countered by
End-to-end encryption of media using SRTP and of signalling using TLS
A strong SIP auth mechanism using challenges and credentials, where the password is composed of mixed alphanumeric characters and is at least 12 characters long
Authorization / whitelisting based on IP which adheres to CIDR notation
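A CIDR-based whitelist check can be sketched with Python's standard `ipaddress` module. The networks below are documentation-only example ranges, and `is_whitelisted` is a hypothetical helper, not part of any SIP server.

```python
import ipaddress

# Example allow-list in CIDR notation (documentation address ranges).
ALLOWED = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_whitelisted(source_ip, allowed=ALLOWED):
    """Return True if the source address falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed)
```

In practice such a check would run before authentication, so that requests from unknown networks are rejected cheaply.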
DDOS attacks
DDOS renders a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.
DDoS uses multiple network hosts to flood a target host with a large amount of network traffic. It can be created by sending falsified SIP requests to other parties such that numerous transactions originating in the backwards direction reach the target server, creating congestion.
Can be countered by
detect flooding and queueing in traffic and use Fail2ban to block
challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required)
Raise event notification alerts to designated developers for any anomalous behaviour. It could be a call-based or SMS-based alert, depending on the severity of the situation.
Sources for alert manager
Build failed ( code crashes, Jenkins error)
Deployment failed ( from Kubernetes , codechef, docker ..)
configuration errors ( setting VPN etc )
Server logs
Server health
homer alerts ( SIP calls responses 4xx,5xx,6xx)
PCAP alerts ( Malformed SIP SDP ..)
Internal Smoke test ( auto testing procedure done routinely to check live systems )
Support tickets from customer complaints ( treat these as high priority since they are directly impacting customers)
A QA framework basically validates the services and call flows on a staging environment before pushing changes to production. Any architectural changes should especially be validated thoroughly on the staging QA framework before making the cut. The qualities of an efficient QA platform are :
Generic nature : the QA framework should be adaptable to different environments such as dev, staging, prod
Containerized : it should be easy to spin up the QA env to do large-scale or small-scale testing, and hence it should be dockerized
CI/CD integration and automation : integrate the test cases tightly with git post-push hooks and pull request creation.
Keep as few external dependencies as possible; for example, a telecom carrier can be simulated by using a PBX like freeswitch or Asterisk
Asynchronous run : test cases should be able to run asynchronously, such as a separate sipp xml script for each use case
Sample Testcases for VoIP
Authentication before establishing a session
Balance and account check before establishing a session, like whitelisting, blacklisting, restricted permission in a particular geography
Transport security and adaptability checks : TLS, UDP, TCP
codec support validation
DTMF and detection
Cross checking CDR values with actual call initiator and terminator party
Individual events (like every click or every call metric): (+) fast writes (+) can customize / recalculate data from raw events (-) slow reads (-) costlier for large-scale implementations (many events). Suitable for realtime / on-the-fly data with a low expected data delay (minutes).
Aggregate data (clicks per minute, outgoing calls per minute): (+) faster reads (+) data is ready for decision making / statistics (-) can only query the data as it was aggregated (no raw data) (-) requires a data aggregation pipeline (-) hard to fix errors. Suitable for batch processing in the background, where a delay from minutes to hours is acceptable.
Push vs Pull Architecture
Push : A processing server manages the state of variables in memory and pushes them to the data store.
(-) a crashed processing server means all data is lost
Pull : A temporary data structure such as a queue manages the stream of data, and the processing service pulls from it to process before pushing to the data store.
(+) a crashed server has no effect on the data held temporarily in the queue, and a new server can simply pick up where the previous processing server left off.
(+) can use checkpointing
Popular DB storage technologies
SQL: structured and strict schema; relational data with joins. (+) faster lookup by index. Used for account information and transactions.
NoSQL: semi-structured data; dynamic or flexible schema. (-) data-intensive workload (+) high throughput for IOPS (input/output operations per second). Best suited for rapid ingest of clickstream and log data, leaderboard or scoring data, and metadata/lookup tables. Examples: DynamoDB (document-oriented database from Amazon), MongoDB (document-oriented database).
A NoSQL database can be of type
Quorum
Document
Key value
Graph
Cassandra is a wide-column store that supports asynchronous masterless replication
HBase, also a quorum-based DB, has master-based replication
MongoDB is a document-oriented DB that uses leader-based replication
SQL scaling patterns include:
Federation/ federated database system : transparently maps multiple autonomous database systems into a single virtual/federated database.
(-) slow, since it accesses multiple data stores to get the value
Sharding / horizontal partition
Denormalization : Even though normalization is more memory efficient, denormalization can enhance read performance by adding redundant precomputed data to the DB or by grouping related data.
Normalizing data reduces data warehouse disk space by reducing data duplication and dimension cardinality. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.
SQL Tuning : “iterative process of improving SQL statement performance to meet specific, measurable, and achievable goals”
Distributed event management, monitoring and working on incoming realtime data instead of a stored database is the preferred way to produce realtime analysis and updates. The multiple ways to handle incoming data are
Batch processing – has lags to produce results, not time critical
Data stream – realtime response
Message Queues – ensures timely sequence and order
Buffering: add events to a buffer that can be read. (+) can handle each event
Batching: add events to a batch and send it when the batch is full. (+) cost effective (+) ensures throughput (-) if some events in a batch fail, should the whole batch fail? (-) not suited for real-time processing
S3 like objects storage + Hadoop Mapreduce for processing
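The batching approach (collect events and send them once the batch is full) can be sketched as follows. `Batcher` and its `sink` callback are hypothetical names. A real pipeline would also flush on a timer and decide how to handle partial failures within a batch.

```python
class Batcher:
    """Accumulates events and forwards them to `sink` in fixed-size batches."""

    def __init__(self, size, sink):
        self.size = size      # number of events per batch
        self.sink = sink      # callable that receives a list of events
        self._batch = []

    def add(self, event):
        """Queue an event; send the batch as soon as it is full."""
        self._batch.append(event)
        if len(self._batch) >= self.size:
            self.flush()

    def flush(self):
        """Send whatever is queued, e.g. on shutdown or on a timer."""
        if self._batch:
            self.sink(self._batch)
            self._batch = []
```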
Timeout
Connection timeout : use latency percentiles to calculate this
Request timeout
Retries
exponential backoff : increase waiting time each try
jitter : adds randomness to retry intervals to spread out the load.
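Exponential backoff with jitter can be sketched as a generator of wait times. The function name, default values and the choice of "full jitter" (a random wait between zero and the current ceiling) are assumptions made for illustration.

```python
import random


def backoff_delays(base=0.5, factor=2.0, cap=30.0, retries=5, rng=random.random):
    """Yield one wait time (seconds) per retry attempt.

    The ceiling grows exponentially (base * factor**attempt) up to `cap`,
    and each wait is a random fraction of that ceiling, so that many
    clients retrying at once do not hit the server in lockstep.
    """
    for attempt in range(retries):
        ceiling = min(cap, base * factor ** attempt)
        yield rng() * ceiling
```

A caller would typically sleep for each yielded delay between attempts and give up once the generator is exhausted.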
Grouping events into object storage and Message Brokers
slower than stream processing but faster than batch processing.
In an event-driven architecture, a producer component performs an action which creates an event that a consumer/listener subscribes to and consumes.
(+) time sensitive
(+) Asynchronous
(+) Decoupled
(+) Easy scaling and Elasticity
(+) Heterogeneous
(+) continuous
Expanding the stream pipeline
Event Streams decouple the source and sink applications. The event source and event sinks (such as webhooks) can asynchronously communicate with each other through events.
Options for stream processing architectures
Apache Kafka
Apache Spark
Amazon kinesis
Google Cloud Data Flow
Spring Cloud Data Flow
Here is a post from earlier which discusses: Scalable and Flexible SIP platform building, Multi geography Scaled via Universal Router, Cluster SIP telephony Server for High Availability, Failure Recovery, Multi-tier cluster architecture, Role Abstraction / Micro-Service based architecture, Load Balancer / Message Dispatcher, Back end Dynamic Routing and REST API services, Containerization and Auto Deployment, Auto scaling Cloud Servers using containerized images.
Lambda Architecture : stream processing on top of map reduce and a stream processing engine. Events are sent to the batch system and the stream processing system in parallel. The results are stitched together at query time.
Apache Kafka is used as source which is a framework implementation of a software bus using stream-processing. “.. high-throughput, low-latency platform for handling real-time data feeds”.
Apache Spark : Data partitioning and in memory aggregation.
Isolates the cache from the service; cache and service do not share memory and CPU; can scale independently; can be used by many microservices; flexibility in choosing hardware
does not require separate hardware; low operational and hardware cost; scales together with the service
EEP duplicates an IP datagram, encapsulates it and sends it for remote realtime monitoring, for SIP-specific alerts and notifications. HEP is popular among many SIP servers including Freeswitch, Opensips, Kamailio and RTP engine, as an external module.
intended for passive duplicated for remote collection
can be used for audit storage and analysis
does not alter the orignal datagram or headers
HOMER is a packet and event capture system popular for VoIP/RTC monitoring, based on HEP/EEP (Extensible Encapsulation Protocol)
SIP Server Integration
Integrating Homer and the Homer Encapsulation Protocol (HEP) with a SIP server brings SIP/SDP payload retention with precise timestamping, better monitoring and detection of anomalies in call traffic and events, correlation of sessions, logs and reports, and also the power to bring charts and statistics for SIP and RTP/RTCP packets etc. We read about sipcapture and sip trace modules in project sipcapture_siptrace_hep.
Kamailio and Opensips HEP integrations are structurally similar. In Kamailio, the SIPCAPTURE [2] module enables support for:
● Monitoring/mirroring port
● IPIP encapsulation (ETHHDR+IPHDR+IPHDR+UDPHDR)
● HEP encapsulation protocol mode (HEP v1, v2, v3)
Setting db_mode: synchronisation of dialog information from memory to an underlying database has the following options
0 (NO_DB): the memory content is not flushed into DB
1 (REALTIME): any dialog information changes will be reflected into the database immediately
2 (DELAYED): the dialog information changes will be flushed into DB periodically, based on a timer routine
3 (SHUTDOWN): the dialog information will be flushed into DB only at shutdown, with no runtime updates
note :
use the same hash_size while using diff kamailio to restore dialogs
database table for dialogue
install mysql
define root (with db create permissions) and user (with database read/write permissions) in kamctlrc
vi /usr/local/etc/kamailio/kamctlrc
Dialogue table schema *
name | type | size | default | null | key | extra attributes | description
id | unsigned int | 10 | - | no | primary | autoincrement | unique ID
hash_entry | unsigned int | 10 | - | no | - | - | Number of the hash entry in the dialog hash table
hash_id | unsigned int | 10 | - | no | - | - | The ID on the hash entry
callid | string | 255 | - | no | - | - | Call-ID of the dialog
from_uri | string | 128 | - | no | - | - | URI of the FROM header (as per INVITE)
from_tag | string | 64 | - | no | - | - | Identifies a dialog, which is the combination of the Call-ID along with two tags, one from each participant in the dialog.
to_uri | string | 128 | - | no | - | - | URI of the TO header (as per INVITE)
to_tag | string | 64 | - | no | - | - | Identifies a dialog, which is the combination of the Call-ID along with two tags, one from each participant in the dialog.
caller_cseq | string | 20 | - | no | - | - | Last Cseq number on the caller side.
callee_cseq | string | 20 | - | no | - | - | Last Cseq number on the callee side.
caller_route_set | string | 512 | - | yes | - | - | Route set on the caller side.
callee_route_set | string | 512 | - | yes | - | - | Route set on the callee side.
caller_contact | string | 128 | - | no | - | - | Caller's contact uri.
callee_contact | string | 128 | - | no | - | - | Callee's contact uri.
caller_sock | string | 64 | - | no | - | - | Local socket used to communicate with caller
callee_sock | string | 64 | - | no | - | - | Local socket used to communicate with callee
state | unsigned int | 10 | - | no | - | - | The state of the dialog.
start_time | unsigned int | 10 | - | no | - | - | The timestamp (unix time) when the dialog was confirmed.
timeout | unsigned int | 10 | 0 | no | - | - | The timestamp (unix time) when the dialog will expire.
sflags | unsigned int | 10 | 0 | no | - | - | The flags to set for dialog and accessible from config file.
iflags | unsigned int | 10 | 0 | no | - | - | The internal flags for dialog.
toroute_name | string | 32 | - | yes | - | - | The name of route to be executed at dialog timeout.
req_uri | string | 128 | - | no | - | - | The URI of initial request in dialog
xdata | string | 512 | - | yes | - | - | Extra data associated to the dialog (e.g., serialized profiles).
Siptrace module
The SIPtrace module offers the possibility to store incoming and outgoing SIP messages in a database and/or duplicate them to a capturing server (using HEP, the Homer Encapsulation Protocol, or plain SIP mode).
Integrate it with the request route to start duplicating the SIP messages:
sip_trace();
setflag(22);
trace_mode
1 - uses core events triggered when receiving or sending SIP traffic to mirror traffic to a SIP capture server using HEP
0 - no automatic mirroring of SIP traffic via HEP
duplicate_uri
Address, in the form of a SIP URI, where a duplicate of the traced message is sent. It always uses UDP.
New OSS capture-agent framework with capture suitable for SIP, XMPP and more. With internal method filtering, encryption and authentication this does look very promising; however, since I have personally not tried it yet, I will leave this space TBD for the future.
Multi-protocol HEP server and switch in NodeJS. A stand-alone HEP capture server designed for HOMER7, capable of emitting indexed datasets and tagged timeseries to multiple backends.
PCAP monitoring -> Homer Server -> Notification and Fraud Prevention
A realtime monitoring and alerting setup from Homer can best safeguard against VoIP-specific attacks and suspicious activity through early warning. Attacks such as DDoS, SIP SQL injection, parser attacks, remote manipulation, hijacking as well as resource enumeration are common for a cloud telephony provider.
Additionally, Homer provides session quality using variables that include [1]
SD = Session Defects [SUM(500,503,504)]
ISA = Ineffective Session Attempts [SUM(408,500,503)]
AHR = Average HOP Requests
ASR = Answer Seizure Ratio [('200' / (INVITES - AUTH - SUM(3XX))) * 100]
NER = Network Efficiency Ratio [(('200' + SUM('486','487','603')) / (INVITES - AUTH - SUM(30x))) * 100]
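The formulas above can be sketched in a few lines of Python. This is only an illustration on made-up counters, not how Homer computes them internally; the function name session_kpis, the sample numbers and the reading of AUTH as the number of INVITEs that were only authentication challenges are my assumptions.

```python
def session_kpis(responses, invites, auth):
    """Return SD, ISA, ASR and NER for one reporting window.

    responses: dict mapping SIP response code -> count
    invites:   number of INVITE requests seen
    auth:      INVITEs that only triggered an auth challenge (assumption)
    """
    def total(*codes):
        return sum(responses.get(code, 0) for code in codes)

    # SUM(3XX): every redirect response in the window
    redirects = sum(n for code, n in responses.items() if 300 <= code <= 399)
    attempts = invites - auth - redirects
    answered = responses.get(200, 0)

    kpis = {"SD": total(500, 503, 504), "ISA": total(408, 500, 503)}
    if attempts > 0:
        kpis["ASR"] = 100.0 * answered / attempts
        # NER also counts user-side outcomes (busy, cancelled, declined)
        kpis["NER"] = 100.0 * (answered + total(486, 487, 603)) / attempts
    else:
        kpis["ASR"] = kpis["NER"] = 0.0
    return kpis


# Hypothetical counters for one window
sample = {200: 60, 302: 5, 408: 3, 486: 10, 487: 5, 500: 2, 503: 4, 504: 1, 603: 5}
print(session_kpis(sample, invites=120, auth=15))
```

A large gap between NER and ASR suggests that calls fail because users are busy or decline, rather than because of the network.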
HOMER Web Interface or Custom Dashboard
Some more visualizations for inter-team communication, such as with the NOC team, can include
Data centres are the concentrated processing units of the Internet, which drives the technological innovation of our generation and has become the backbone of our global economy. Data centres not only process, store and carry textual data; a vast amount of their computing is for multimedia content, which ranges from social media to video streaming and VoIP calls. In this article let us analyze the energy efficiency, carbon footprint and scope for improvement of a VoIP-related data centre which hosts SIP and related RTC signalling and media servers and processes CDRs and/or media files for playback or recording.
Just like in a regular IT data centre, storage, computing power and network capacity define the usage of the server. Uninterrupted electricity is also of paramount importance, as any blackout could drop ongoing calls and lead to loss of revenue for the service provider, not to forget the loss caused to the parties engaged in the call.
Increasing Power consumption by telecom Sector over the years
Typical VoIP setup: Whether on a cloud infrastructure provider or in a hosted data centre, approximately 7 servers are required even for an SME (small to medium enterprise) communication and VoIP system:
2 signalling servers, primary and standby, for HA,
2 media servers for MCU or media bridges, IVR playback etc.,
1 for CDRs, logs or call analytics, stats and other supplementary operations,
1 for the dev or engineering team,
1 edge server, which could be an API server, a gateway or a load balancer.
Sample VoIP system
VoIP solutions are more energy expensive, unless aggressive power saving schemes are in place
Comparison of energy efficiency in PSTN and VoIP systems [14]
While PSTN and other hybrid scenarios relied on audio-only communication, great pains were taken to make the embedded systems involved energy efficient, which is not really the case with all-digital, software-based VoIP.
Power Consumption
Mobile phone: A typical smartphone with a 4,000 mAh (4 Ah) battery that gets 1 full cycle of usage a day. Daily consumption = 4 Ah * 3.7 V = 14.8 Wh.
Laptop: With a 14–15″ screen, a laptop can draw 60 watts of power in active use, depending on the model. Running 8 hours a day, that is 60 * 8 = 480 Wh (0.480 kWh) of energy consumed in a day.
Desktop PC: Running at 50-60 Hz frequency, a desktop can draw up to 200 W of power in active use. For 8 hours, the energy usage is 200 * 8 = 1600 Wh (1.6 kWh) a day.
Server: Even though servers are virtual to the request maker, they cater to the request on the other end of the Internet.
Server | Purpose | Server CPU consumption | Clients | Client CPU consumption
Application | Hosts an application, which can be run through a web browser or customized client software. | medium | Any network device with access. | low
Computing | Makes available CPU and memory to the client. This type of server might be a supercomputer or mainframe. | high | Any networked computer that requires more CPU power and RAM to complete an activity. | medium
Database | Maintains and provides access to any database. | low | Any form of software that requires access to structured data. | low
File | Makes available shared files and folders across a network. | medium | Any client that needs access to shared resources. | low
Game | Provisions a multiplayer game environment. | high | Personal computers, tablets, smartphones, or game consoles. | high
Mail | Hosts your email and makes it available across the network. | medium | User of email applications. | low
Media | Enables media streaming of digital video or audio over a network. | high | Web and mobile applications. | high
Print | Shares printers over a network. | low | Any device that needs to print. | low
Web | Hosts webpages either on the internet or on private internal networks. | medium | Any device with a browser. | medium
CPU consumption of various server types and their clients
A server typically draws about 850 W, i.e. 850 Wh (0.850 kWh) of energy per hour, and since servers are usually up 24*7 that totals to
0.850 * 24 = 20.4 kWh a day [2].
VoIP system (7 VMs): For a setup of 7 VMs (which could be on the same PM), counting each as one server, the total energy consumed in a day is
20.4 * 7 = 142.8 kWh.
Data centre: The data centre building consists of the infrastructure to support the servers, disks and networking equipment it contains. However, for simplicity, I will only use the consumption of servers and ignore the cooling units, networking, backup battery charging, generators, lighting, fire suppression, maintenance etc.
A high-tier DC can have 100 megawatts of capacity, with each 52U rack using 25 kW of power. 100,000 kW / 25 kW = 4,000 racks, and 4,000 * 52 (U) = 208,000 1U servers. This number scales down depending on how much energy each server uses and on idle servers.
Total energy: 100,000 kW * 24 hours = 2,400,000 kWh per day
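The arithmetic of this section can be reproduced with a small Python sketch. The power draws and usage hours are the ones assumed in this article; the helper name daily_kwh is only illustrative.

```python
def daily_kwh(power_watts, hours_per_day, units=1):
    """Energy in kWh used per day by `units` devices drawing `power_watts`."""
    return power_watts * hours_per_day * units / 1000.0


daily_energy = {
    # 4 Ah battery at 3.7 V, one full charge cycle a day
    "mobile phone": 4.0 * 3.7 / 1000.0,
    "laptop": daily_kwh(60, 8),
    "desktop PC": daily_kwh(200, 8),
    "server": daily_kwh(850, 24),
    "VoIP system (7 VMs)": daily_kwh(850, 24, units=7),
    # 100 MW facility, servers only
    "data centre": daily_kwh(100_000_000, 24),
}

racks = 100_000 // 25        # 100,000 kW capacity / 25 kW per rack
servers_1u = racks * 52      # 52U racks filled with 1U servers

for device, kwh in daily_energy.items():
    print(f"{device}: {kwh:,.4f} kWh per day")
print(f"{racks:,} racks, {servers_1u:,} 1U servers")
```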
Carbon Footprint
Carbon footprint in the context of this article refers to the amount of greenhouse gas (consisting mainly of CO2) caused by electricity consumption. It is computed as the total electricity consumed multiplied by the carbon emission equivalent per unit of electricity, expressed in kg CO2 per kWh.
In doing this calculation I have assumed 0.233 kg CO2 per kWh, which could be less or more depending on the generation profile of the electricity provider as well as the heat produced by the machine.
Laptop: Aside from the production, which could be 61.4 kg (135.5 lbs) of CO2, a 60 W laptop will produce 0.480 * 0.233 = 0.112 kg CO2 eq per day.
Desktop PC: Aside from production cost and heating, the GHG and CO2 eq emission from running a desktop for a day (8 hours) is 1.6 * 0.233 = 0.3728 kg CO2 per day.
Server: 20.4 * 0.233 = 4.7532 kg CO2 per day.
VoIP system (7 VMs): Again ignoring the GHG emission of associated components, 142.8 * 0.233 = 33.2724 kg CO2 per day. It is to be noted that DCs (data centres) use the term PUE (Power Usage Effectiveness) to showcase their energy efficiency, and energy efficiency certification uses the same in ratings.
Data centre: The electrical carbon footprint (an approximate calculation not counting the cooling, infra maintenance, lighting and possibly idle servers in the data centre) is 2,400,000 * 0.233 = 559,200 kg CO2 per day.
It is to be noted that a common figure should not be extrapolated like this to derive carbon emission. The emission depends on the fuel mix of the electricity generation as well as the life cycle assessment (LCA) of carbon equivalent emission. Countries with heavy reliance on renewables have a lower CO2 footprint per kWh, such as Sweden at ~0.013 kg CO2 per kWh, while others may be higher, such as Estonia at 0.819 kg CO2 per kWh [1].
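To see how much the emission factor matters, here is a small Python sketch that applies the three factors mentioned above to the daily energy figures of this article. It is a rough illustration only; the dictionary and function names are mine.

```python
# Daily energy figures from this article, in kWh per day
DAILY_KWH = {
    "laptop": 0.48,
    "desktop PC": 1.6,
    "server": 20.4,
    "VoIP system (7 VMs)": 142.8,
    "data centre": 2_400_000,
}

# Emission factors in kg CO2 per kWh, as quoted in this article
FACTORS = {"article default": 0.233, "Sweden": 0.013, "Estonia": 0.819}


def daily_co2_kg(energy_kwh, factor):
    """kg CO2 per day = energy used per day (kWh) * emission factor (kg CO2/kWh)."""
    return energy_kwh * factor


for device, kwh in DAILY_KWH.items():
    row = ", ".join(
        f"{region}: {daily_co2_kg(kwh, f):,.4f}" for region, f in FACTORS.items()
    )
    print(f"{device} (kg CO2 per day) -> {row}")
```

With these factors the same server emits roughly 60 times more CO2 per day on the Estonian mix than on the Swedish one, which is why a single global factor is only a first approximation.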
Flatten the Curve from Tech and Internet usage
Rack servers tend to be the main perpetrators of wasted energy and represent the largest portion of the IT energy load in a typical data centre.
A decade ago, small enterprise IT facilities were quick to create data centres for hosting applications for hospitals, banks and insurance companies. While some of these are likely to have been upgraded to shared server instances running on IaaS providers, most of them are still serving traffic or stay there for lack of effort to upgrade.
Advancements in p2p technologies such as dApps, the Bitcoin network, p2p WebRTC streaming and more edge-computed ML continue to disrupt the existing trend, and are most likely to result in a manyfold increase in consumption.
According to the Cambridge Center for Alternative Finance (CCAF), Bitcoin currently consumes around 110 Terawatt Hours per year — 0.55% of global electricity production
Harvard Business Review [12]
“the emissions generated by watching 30 minutes of Netflix (1.6 kg of CO2) is the same as driving almost four miles.”
EnergyInnovation [13]
Cloud Computing and Energy efficiency
Cloud computing (SaaS, PaaS, IaaS and also CPaaS) minimizes power consumption and consequently IT costs via virtualization, clustering and dynamic configuration.
Cloud infrastructure vendors such as Amazon, Google and Microsoft, with their adoption of energy-efficient computing and credible transparency, have alleviated some of the stress that would have been caused if onsite self-hosted data centres were still used as often in the mainstream as a decade ago.
Even as cloud providers give on-demand access to shared resources in large-scale distributed computing, the ease of getting on board has in turn created a surge in cloud-hosted online applications, and consequently high power consumption, more operating costs and higher CO2 emissions.
Components of energy Consumption in Data Centre
As shown, CPU, memory and storage incur 45% of the costs and consume 26% of the total energy, whereas power distribution and cooling incur 25% of the costs but consume more than 50% of the total energy.
Energy forecast for data centres
As reported by Nature [3], widely cited forecasts suggest that the total electricity demand of ICT (information and communication technology) will accelerate, and while consumer devices such as smart TVs, laptops and mobiles are becoming more energy efficient, data centres and network devices will demand bigger portions. As reported in 2018, 200 TWh (terawatt hours) of energy was being consumed by data centres. Although there are no figures for telecom or specifically IP cloud telephony, the enormous multimedia data flowing in every session is enough to assume the figure must be huge.
Energy efficiency in data centres has also been the subject of many papers and studies. Many of the tech advancements and measures have so far been able to keep the growth in energy requirements of the tech sector linear or flat.
Past and projected growth rate of total US data center energy use from 2000 until 2020. It also illustrates how much faster data center energy use would grow if the industry, hypothetically, did not make any further efficiency improvements after 2010. (Source: US Department of Energy, Lawrence Berkeley National Laboratory)
Some noteworthy innovations made in Data centre for energy efficiency include –
Energy Star efficiency requirements
Average server utilization
Server power scaling at low utilization
Average power draw of hard disk drives
Average power draw of network ports
Average infrastructure efficiency (i.e., PUE)
PUE = Total Facility power / IT equipment power
Standard 2.0, Good 1.4 , Better 1.1
A low PUE indicates greater efficiency, since a larger share of the power is then used by the IT gear. Ideally, 1 would be the perfect score, where all power is used only by the IT gear.
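As a quick illustration, PUE and the rating bands above can be computed as follows. The band labels come from this article; the "poor" label for anything above 2.0 and the sample power readings are my assumptions.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def pue_rating(value):
    """Map a PUE value onto the bands: Better 1.1, Good 1.4, Standard 2.0."""
    if value <= 1.1:
        return "better"
    if value <= 1.4:
        return "good"
    if value <= 2.0:
        return "standard"
    return "poor"  # label assumed for anything above the standard band


# Hypothetical facility: 1,400 kW drawn in total, 1,000 kW reaching the IT gear
value = pue(1400, 1000)
print(value, pue_rating(value))
```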
2. Optimizing the cooling system, which takes a lot of focus, is not touched upon here but can be understood in great detail from many sources, including one on how Google uses AI for cooling its data centres [6].
3. Throttle-down drive, a device that reduces energy consumption on idle processors, so that when a server is running at its typical 20% utilization it is not drawing full power.
Energy efficiency is vital not only to productivity and performance but also to a carbon-neutral tech sector and economy. There is ample scope for designing energy-efficient applications and platforms. Some approaches are described below:
Energy Efficiency in VoIP Architecture and design
Low energy consumption not only lowers operating cost but also helps the environment by reducing carbon emission.
1. Server Virtualization
Consolidating multiple independent servers onto a single underlying physical server helps retain the logical separation while containing energy costs and maximizing utilization. VMs (virtual machines) are instances of virtualized portions of the same server, and each can be independently accessed using its own IP and network settings.
To reduce electricity usage in our labs and data centers, we use smart power distribution units to monitor our lab equipment. We increase server utilization by using virtual machines. Our Cisco Customer Experience labs use a check-in, check-out system of automation pods to allow lab employees to set up configurations virtually and then release equipment when they are finished with it.
Cisco 2020 Environment Technical Review [20]
Models to place VMs on a PM (physical machine) have been proposed by Dong et al [8], Huang [9] and Tian et al [10].
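To give a feel for VM placement, below is a minimal Python sketch of first-fit decreasing, a generic bin-packing heuristic. It is not the algorithm from any of the cited papers. The VM names mirror the 7-server sample setup, while the CPU shares and the single-resource model are assumptions made purely for illustration.

```python
def place_vms(vm_loads, pm_capacity):
    """First-fit decreasing: pack VMs onto as few physical machines as possible.

    vm_loads:    dict of VM name -> peak CPU share (percent of one PM)
    pm_capacity: usable CPU share of one PM (percent)
    """
    hosts = []  # each host: {"free": remaining capacity, "vms": [names]}
    # Place the largest VMs first, each on the first host that still has room
    for name, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > pm_capacity:
            raise ValueError(f"{name} does not fit on a single PM")
        for host in hosts:
            if host["free"] >= load:
                host["vms"].append(name)
                host["free"] -= load
                break
        else:
            # No existing host has room, so power on another PM
            hosts.append({"free": pm_capacity - load, "vms": [name]})
    return hosts


# Hypothetical peak CPU shares for the 7-VM sample setup
vms = {
    "signalling-primary": 30, "signalling-standby": 10,
    "media-1": 60, "media-2": 60,
    "cdr-analytics": 20, "dev": 15, "edge": 25,
}
for i, host in enumerate(place_vms(vms, pm_capacity=100)):
    print(f"PM {i}: {host['vms']} (free {host['free']}%)")
```

The sketch only optimizes for the number of powered-on machines. A real placement would also need anti-affinity rules, for example keeping the primary and standby signalling servers on different PMs, which a plain first-fit does not guarantee.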
2. Decommissioning old / outdated servers
While this is the most obvious way to increase efficiency, it is also the toughest, since a legacy application or a small portion of it may be running on a server that the service provider is not keen on updating, or for which updates do not exist, so that it is past end of life yet somehow still in use. It is important to identify such components. Check whether an old GlassFish or BEA WebLogic SIP servlet server needs updating and/or migration!
3. Plan HA (high availability) efficiently
Redundant servers take only partial loads, if any at all, so that they can be activated in full swing when failover happens on another server. With quick load-up times and forward-looking monitoring, analyzers can monitor logs for upcoming failures or predictable downtime, and infra scripts can bring up pre-designed containers in seconds if not minutes. It isn't wise to create more than 1 standby server which does no essential work but consumes as much power.
4. Consolidate individual applications on a server
Map the maximum predictable load and deduce the percentage consumption at that load. In view of these figures it is best to consolidate application servers to run on a single server. A distributed microservice-based architecture can also support consolidation by running each major application in its own dockerized container. Consolidation ensures that
All data can be stored and accessed centrally, which reduces the likelihood of data duplication.
While a server is drawing full power, it is also showing correspondingly high utilization.
There is a single point to prevent intrusion, provide security and fix vulnerabilities against malware (ransomware, viruses, spyware, trojans).
5. Reduce redundancy
While it is common practice to store multiple copies of data such as CDRs (call detail records) and to archive historical logs for later auditing, it is not the most energy-efficient way, since it ends up wasting storage space. It is in fact a better approach to keep only the critical parts and discard the rest, and definitely to implement background tasks that compress the older and less referenced logs.
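A background compression task of this kind could look roughly like the Python sketch below, which gzips log files that have not been modified for a given number of days. The ".log" suffix, the 30-day threshold and the function name are assumptions; a production setup would more likely rely on logrotate or the archival policy of the storage backend.

```python
import gzip
import os
import shutil
import time


def compress_old_logs(directory, max_age_days=30, suffix=".log"):
    """Gzip every `suffix` file in `directory` older than `max_age_days`.

    Returns the list of compressed file paths that were created.
    """
    cutoff = time.time() - max_age_days * 86400
    created = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (name.endswith(suffix) and os.path.isfile(path)):
            continue
        if os.path.getmtime(path) > cutoff:
            continue  # still recent, keep it uncompressed for quick access
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)  # drop the original only after the copy has succeeded
        created.append(path + ".gz")
    return created
```

It could be scheduled from cron or a similar job runner, for example as compress_old_logs("/var/log/cdr", max_age_days=30), where the directory path is hypothetical.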
6. Power management
Powering down idle servers or putting unused servers to sleep is an effective way to reduce operating power, but it is often ignored by the IT department for fear of slower performance and of failure in call continuity in case a server does go down. However, power management leads to potential energy savings and should be weighed accordingly.
7. Common storage such as Network Attached Storage
Power consumption is roughly linear in the number of storage modules used. Storage redundancy needs to be right-sized to avoid rapid consumption of the available storage space, of the CPU cycles needed to reference and index it, and of the associated power [7].
Maximizing storage capacity utilization by drawing from a common pool of shared storage on a need basis also allows for flexibility.
It is sensible to take older data offline, thereby reducing clutter on the production system and making the remaining data quickly retrievable.
8. Sharing other IT resources
Sharing Central Processing Units (CPU), disk drives and memory optimizes electrical usage. Short-term load shifting, combined with throttling resources up and down as demand dictates, improves long-term hardware energy efficiency [7].
Hardware-based approaches such as Energy Star rating, air conditioning, placement of server racks, air flow, cabling etc. have not been touched upon in this article; they can be read in the Energy Star report [5].
9. DMZ / Perimeter network
The perimeter network (also known as DMZ, demilitarized zone, and screened subnet) is a zone where resources and services accessible from outside the organization are available. It is often used as a barrier between the internal secure green zone within the company and outside partners / suppliers, such as external organization gateways. It typically hosts:
Load balancers
API gateways
SBC ( Session Border controllers)
Media Gateways
Ways to cut down on CPU consumption in DMZ machines
1. Scrutinize incoming traffic only, trust outgoing traffic.
2. Use hardware / network firewalls to monitor and block instead of software-defined ones. A hardware firewall can be a standalone physical device or form part of another device on your network. Physical devices like routers, for example, already have a built-in firewall.
Other types of firewalls
Application-layer firewalls can be a physical appliance, or software-based, like a plug-in or a filter. These types of firewalls target your applications. For example, they could affect how requests for HTTP connections are inspected across each of your applications.
Packet filtering firewalls scrutinize each data packet as it travels through your network. Based on rules you configure, they decide whether to block the specific packet or not. For example firewalls can block SSH/RDP for remote management.
Circuit-level firewalls check whether TCP and UDP connections across your network are valid before data is exchanged. For example, this type of firewall might first check whether the source and destination addresses, the user, the time, and date meet certain defined rules.
Proxy server firewalls secure the traffic into and out of a network by monitoring, filtering, and caching data requests to and from the network.
Energy Efficiency in VoIP Applications and algorithms
In theory, energy-efficient algorithms take less processing power, run fewer CPU cycles and consume less memory. For experiments with WebRTC and SIP VoIP systems, CPU performance can be a reliable factor to consider for carbon emissions. Here is a list of approaches to include energy as one of the parameters in programming for RTC applications.
1. Take advantage of multi-core applications
Multi-core processor chips allow simultaneous processing of multiple tasks, which leads to higher efficiency. A shared power source and shared cooling lead to better efficiency. It is the same logic that applies to consolidating one power supply for a rack instead of an individual power supply for each server on the rack.
2. Reduce buffering
Input/output buffers pile up computed packets or blocks which will come into use in the near future but may be discarded altogether in the event of a skip or shutdown. For example, in the case of video on demand (VoD), a buffered video of 1 hour is not of much use if the viewer decides to cancel the video session after 10 minutes.
3. Optimize memory access algorithms
4. Network energy Management to vary as per demand
The newer generations of network equipment pack more throughput per unit of power. There are active energy management measures that can also be applied to reduce energy usage as network demand varies. In a telecommunication system, a tradeoff between power consumption and network performance is almost always made.
1. Quickly switch the speed of the network to match the amount of data that is currently transmitted. A demand-following streaming session will maintain the QoS and avoid imbalance while also reducing power consumption.
2. Avoid sudden bursts and peaks and/or align them with energy availability.
Metrics
computational performance (i.e., computations/second per server),
electrical efficiency of computations (i.e., computations per kWh),
storage capacity (i.e., TB per drive), and
port speeds (i.e., Gb per port)
5. Task scheduling algorithms
Some recently researched frameworks and models take CO2 emission into perspective while allocating resources according to a queuing model. The most efficient ones bring down not only the carbon footprint but also the high operating cost [11].
Scheduling and monitoring techniques have been applied to achieve a cost-effective and power-aware cloud environment by reducing resource exploitation.
Instead of operating many servers at low CPU utilization at the edge, on the client's end, combine the processing power onto fewer servers that operate at higher utilization.
Modern machine learning programs are computationally intensive, and their integration in VoIP systems for tagging, sentiment analysis and voice quality analysis is increasingly adding strain to the already heavy processing of media servers in transcoding and multiplexing.
Media server using an SFU (Selective Forwarding Unit) to transmit media streams
As an example, in a 5-party call an SFU client sends one upstream but receives 4 downstreams, which reduces the load on the server but increases it on the clients.
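The stream counts behind this tradeoff can be sketched as below. The comparison with an MCU (one mixed stream per client) is added here as an assumption for contrast; the function only counts streams and says nothing about codecs or bitrates.

```python
def stream_counts(participants, topology="sfu"):
    """Count media streams per client and on the server for one conference."""
    n = participants
    if n < 2:
        raise ValueError("a conference needs at least 2 participants")
    if topology == "sfu":
        # The SFU forwards every other participant's stream untouched
        return {"client_up": 1, "client_down": n - 1,
                "server_in": n, "server_out": n * (n - 1)}
    if topology == "mcu":
        # The MCU decodes and mixes, so each client gets one composite stream
        return {"client_up": 1, "client_down": 1,
                "server_in": n, "server_out": n}
    raise ValueError(f"unknown topology: {topology}")


for topology in ("sfu", "mcu"):
    print(topology, stream_counts(5, topology))
```

The SFU sends more streams out but does no decoding or mixing, so its CPU cost per stream is low, while each client has to decode n - 1 streams. The MCU does the opposite, which is why it is the heavier option on the server side.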
7. Distributing workload based on server performance
Aggregating tasks and running them as serverless, asynchronous jobs instead of standalone processes is a very efficient way to cut down wastage from idle running. Additionally, categorizing workloads based on server performance can also reduce power consumption by using idle servers efficiently. Thermal-aware workload distribution also helps reduce power consumption and consequently the electricity consumed in cooling.
8. Reduce re-authentication and challenge-response mechanisms when they can be avoided.
There exist multiple modes to authenticate and authorize user and application access to server content
Over the network
password based auth,
third party based auth (OAuth),
2 factor authentication (phone/SMS based),
multi factor auth (SMS / email / other media),
token auth (custom USB device / smart card),
biometric auth (physical human characteristics / scanners),
transactional auth (location, hour of day, browser / machine type)
Computer recognition authentication
CAPTCHA
Single sign-on
Authentication protocols
Kerberos - Key Distribution Center (KDC) using a Ticket Granting Server (TGS)
TLS/SSL
A call flow involves AAA while creating the session and may require occasional re-authentication to reaffirm that the user is the intended one. Re-authenticating too often increases power consumption and can be countered by caching and timeout mechanisms.
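A caching and timeout mechanism for re-authentication could be sketched as follows. The class and function names are hypothetical, and verify stands in for whatever expensive check (database lookup, token introspection, challenge-response) the platform really performs. The sketch assumes the session was fully authenticated once and only skips repeated checks within the time-to-live.

```python
import time


class AuthCache:
    """Remembers which users were verified recently (in-memory, per process)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires = {}  # user -> time at which the cached result expires

    def is_fresh(self, user):
        expiry = self._expires.get(user)
        return expiry is not None and self.clock() < expiry

    def remember(self, user):
        self._expires[user] = self.clock() + self.ttl


def reauthenticate(user, credentials, cache, verify):
    """Run the expensive `verify` only when the cached result has timed out."""
    if cache.is_fresh(user):
        return True
    if verify(user, credentials):
        cache.remember(user)
        return True
    return False
```

The time-to-live is the tuning knob: a longer value saves more CPU cycles but widens the window in which a revoked user is still accepted, so it should stay short for sensitive operations.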
Point of presence and handover using Carbon footprint in different demographics
1. Take the carbon emission of the data centre into consideration before engaging a server in the call path from the load balancer or gateway.
2. Use a point of presence (PoP) for the server according to the carbon emission factor of its geography.
US states carbon emission rate from electricity generation (2018 report). Source: [16]
UK greenhouse gas reporting. Source: [17]
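Carbon-aware PoP selection could be sketched like this: among the PoPs that meet a latency budget, pick the one with the lowest emission factor. The PoP names, round-trip times and the latency budget are hypothetical; the emission factors reuse the figures quoted earlier in this article.

```python
def pick_pop(pops, max_rtt_ms):
    """Choose the lowest-carbon PoP among those within the latency budget.

    pops: list of dicts with "name", "rtt_ms" and "kg_co2_per_kwh" keys
    Falls back to the lowest-latency PoP when none meets the budget.
    """
    if not pops:
        raise ValueError("no PoP available")
    eligible = [p for p in pops if p["rtt_ms"] <= max_rtt_ms]
    if not eligible:
        return min(pops, key=lambda p: p["rtt_ms"])
    # Lowest emission factor first, lower latency as the tie-breaker
    return min(eligible, key=lambda p: (p["kg_co2_per_kwh"], p["rtt_ms"]))


# Hypothetical PoPs; emission factors taken from the figures quoted above
pops = [
    {"name": "pop-estonia", "rtt_ms": 20, "kg_co2_per_kwh": 0.819},
    {"name": "pop-default", "rtt_ms": 35, "kg_co2_per_kwh": 0.233},
    {"name": "pop-sweden", "rtt_ms": 60, "kg_co2_per_kwh": 0.013},
]
print(pick_pop(pops, max_rtt_ms=80)["name"])
print(pick_pop(pops, max_rtt_ms=40)["name"])
```

Keeping the latency budget as a hard constraint means call quality is never traded away for a greener PoP; carbon only decides between candidates that are already acceptable.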
Energy Efficiency in WebRTC browser applications and native applications
For video conferencing over the browser, WebRTC has emerged as the default standard. The efficiency of such WebRTC browser-based video conferencing web applications can be enhanced in the following ways:
1. Use VoIP push notifications to avoid persistent connections
2. Voice activity detection (mute the spectators), and join with video true, audio false for attendees
Energy efficiency in VoIP phones
If all eligible VoIP phones sold in the United States were ENERGY STAR certified, the energy cost savings would grow to more than $65 million each year and 1.2 billion pounds of annual greenhouse gas emissions would be prevented, equivalent to the emissions from more than 119,000 vehicles.
Energy Star [15]
Low-energy-consuming embedded hardware on most phones keeps the average consumption low. An analog phone can consume between 0.07 W and 9.27 W of power, while a VoIP phone can consume 0.1 W to 3.5 W of standby power.
Off-mode power is often less than standby power, since the phone is in a low-power mode during idle hours such as the night. According to Energy Star, the sound transmission mechanism also plays a key role, and hybrid phones consume more power.
Power allowance for each of the below features of the device:
1.0 watt for Gigabit Ethernet
0.2 watt for Energy Efficient Ethernet (802.3az) compliant Gigabit Ethernet
Additional proxy incentive(W) for the ability to maintain network presence while in a low power mode and intelligently wake when needed
0.3 watt for base capability
0.5 watt for remote wake
Government bodies and groups to track Energy efficiency of Telecom and IP telephony
Alliance for Telecommunications Industry Solutions (ATIS)
Telecommunications Energy Efficiency Ratio (TEER)
measurement method covers all power conversion and power distribution from the front end of the system to the data wire plug, including application-specific integrated circuits (ASICs).
European Telecommunications Standards Institute (ETSI)
International Telecommunication Union (ITU)
U.S. Department of Energy (DOE), Environmental Protection Agency (EPA)
The purpose of this article is to raise awareness about the carbon footprint of everything from application programs and architecture design techniques to data centres, and of their cumulative performance. It gives a direction to stakeholders (customers, programmers, architects, managers, …) to choose the less carbon-emitting approach whenever possible, since every bit counts to help the environment.
[4] Center of Expertise for Energy Efficiency in Data Centers at the US Department of Energy’s Lawrence Berkeley National Laboratory in Berkeley, California. https://datacenters.lbl.gov/
[8] Yin K, Wang S, Wang G, Cai Z, Chen Y. Optimizing deployment of VMs in cloud computing environment. In: Proceedings of the 3rd international conference on computer science and network technology. IEEE; 2013. p. 703–06.
[9] Huang W, Li X, Qian Z. An energy efficient virtual machine placement algorithm with balanced resource utilization. In: Proceedings of the seventh IEEE international conference on innovative mobile and internet services in ubiquitous computing; 2013. p. 313–19.
[10] W. Tian, C.S. Yeo, R. Xue, Y. Zhong. Power-aware scheduling of real-time virtual machines in cloud data centers considering fixed processing intervals. Proc IEEE, 1 (2012), pp. 269-273
[11] H. Chen, X. Zhu, H. Guo, J. Zhu, X. Qin, J. Wu. Towards energy-efficient scheduling for real-time tasks under uncertain Cloud computing environment. J Syst Softw, 99 (2015), pp. 20-35
[14] F. Bota, F. Khuhawar, M. Mellia and M. Meo, “Comparison of energy efficiency in PSTN and VoIP systems,” 2012 Third International Conference on Future Systems: Where Energy, Computing and Communication Meet (e-Energy), 2012, pp. 1-4, doi: 10.1145/2208828.2208834. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.720.446&rep=rep1&type=pdf
Programming in SIP servers enables the IP telephony provider to add complex control that is difficult to realise with simple dialplan XML and IVR menus. Such control is best handled by a program that is compiled with the telecom application server and invoked by SIP requests or responses in the session. This may include
using policy control or dynamic input to control call routing or blacklisting
transcription for voicemail
media file playback with dynamic text to speech, and so on.
Programming engines commonly supported by FreeSWITCH, OpenSIPS, Kamailio and Asterisk include Python, Java, C++ and JavaScript. OpenSIPS and Kamailio also include XML-RPC, HTTP API and WebSockets as additional means of adding call control logic to the telephony server.
Kamailio modules
OpenSIPS modules
FreeSWITCH modules
Lua (https://www.lua.org) is a small, powerful and lightweight scripting language, mostly used for embedded and gaming use cases. Among many programming engines supported by FreeSWITCH and Kamailio, Lua is very handy to add business logic to call control by integrating with the telecom server.
Among the multiple choices, Lua is the preferred language for scripting in a SIP server because it
Does not require recompilation
Saves the effort of restarting the FreeSWITCH server while loading an updated script
which in turn avoids the service disruption for the time the server would have taken to shut down and restart
Can be sync or async
lua: runs in the current thread and waits for script completion
luarun: runs in a separate thread and returns immediately
Freeswitch Lua Integration
To load the program
<action application="lua" data="mainprog.lua"/>
1. In the program, we could get status and print to console log
local api = freeswitch.API()
local status = api:execute("status")
freeswitch.consoleLog(status)
2. We could also check if the session is active and play a file into the call
if session:ready() then
session:streamFile("silence_stream://100000")
end
3.Program to answer call , play file and hangup using session class methods
-- Answer call, play a prompt, hang up
session:answer()
-- Create a string with path and filename of a sound file
pathsep = '/'
-- Windows users do this instead: pathsep = '\\'
prompt = "ivr" .. pathsep .. "ivr-welcome_to_freeswitch.wav"
-- Play the prompt
freeswitch.consoleLog("WARNING", "About to play '" .. prompt .. "'\n")
session:streamFile(prompt)
-- Hangup
session:hangup()
freeswitch.consoleLog("WARNING","After hangup")
output
[INFO] mod_dialplan_xml.c:637 Processing altanai <altanai>->5000 in context public
EXECUTE sofia/internal/altanai@x.x.x.x lua(/etc/freeswitch/dialplan/lua_session_answer_prompt_hangup.lua)
...
[DEBUG] switch_channel.c:3781 (sofia/internal/altanai@x.x.x.x) Callstate Change EARLY -> ACTIVE
[WARNING] switch_cpp.cpp:1376 About to play 'ivr/ivr-welcome_to_freeswitch.wav
...
[DEBUG] switch_ivr_play_say.c:1942 done playing file /usr/share/freeswitch/sounds/en/us/callie/ivr/ivr-welcome_to_freeswitch.wav
...
[DEBUG] switch_cpp.cpp:731 CoreSession::hangup
[NOTICE] switch_cpp.cpp:733 Hangup sofia/internal/altanai@x.x.x.x [CS_EXECUTE] [NORMAL_CLEARING]
[WARNING] switch_cpp.cpp:1376 After hangup
Other methods:
Initiate new session: session:originate()
Record audio: session:recordFile()
5. Fire and consume events
freeswitch.Event() and freeswitch.EventConsumer() can be used to fire new events and consume events respectively. For instance, to fire a callback function on hangup, use session:setHangupHook().
Anywhere anytime Telemedicine communication tool accessible on any device. The solution provides a low eight signalling server which drops out as soon as call is connected thus ensuring absolutely private calls without relaying or involving any central server in any call related data or media . This ensure doctor patient details are not processed , stored or recorded by our servers.
The solution enables doctors / nurses / medical practitioners and patients to do
High definition Audio/video calls
End to end encrypted p2p chats
Integration with HMS ( hospital management system ) to fetch history of the patients
Screen sharing to show reports without transferring them as files
Include more concerned people or doctors using the mesh-based peer-to-peer conferencing feature.
Confidentiality and Privacy
For the privacy and security of certain health information, only HIPAA (Health Insurance Portability and Accountability Act of 1996) compliant video-conferencing tools can be used for telemedicine in the US.
Telemedicine scenario Callflow
Call flow for attended call transfer and 2-way conference in a telemedicine scenario between a patient, hospital attendant, doctor and nurse
This post is about making performance enhancements to a WebRTC app so that it can be used in areas which require sensitive data to be communicated, cannot afford downtime, need fast response and low RTT, and must be secure enough to withstand hacks and attacks.
As a communication agent becomes a single HTML page driven client, a lot of authentication, heartbeat sync, web workers and signalling event-driven flow management resides on the same page, along with the actual CPU consumption for audio-video resources and media stream processing. This in turn can make the webpage heavy and many a time could result in a crash due to the page being "unresponsive".
Here are some of my best to-dos for making sure the WebRTC communication client page runs efficiently.
The Cumulative Layout Shift (CLS) metric measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.
For a good user interaction experience, the DOM elements should display as little movement as possible so that the page appears stable. In the opposite case, on a flickering page (maybe due to a notification DOM element dynamically pushing the other layout elements), it is difficult to interact precisely with page elements such as buttons.
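The summation described above can be sketched in a few lines. This is only an illustration, assuming each layout shift is scored as impact fraction multiplied by distance fraction and that shifts following recent user input are treated as expected; the `LayoutShift` class and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LayoutShift:
    impact_fraction: float    # share of the viewport affected by the shift (0..1)
    distance_fraction: float  # largest move distance relative to the viewport (0..1)
    had_recent_input: bool    # shifts right after user input count as expected


def shift_score(shift: LayoutShift) -> float:
    # Assumed per-shift score: impact fraction times distance fraction.
    return shift.impact_fraction * shift.distance_fraction


def cumulative_layout_shift(shifts: Iterable[LayoutShift]) -> float:
    # Sum the scores of unexpected shifts only.
    return sum(shift_score(s) for s in shifts if not s.had_recent_input)


# Hypothetical sample: two unexpected shifts and one caused by a user click.
shifts = [
    LayoutShift(0.5, 0.2, False),   # e.g. a notification pushing content down
    LayoutShift(0.25, 0.4, False),
    LayoutShift(0.8, 0.5, True),    # user-triggered, ignored
]
cls = cumulative_layout_shift(shifts)
```

A lower total means a more stable page, so a notification that reserves its space up front contributes nothing to the sum.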
The main thread is where a browser runs all the JavaScript in your page, as well as performing layout, reflows and garbage collection. Therefore long JS processes can block the thread and make the page unresponsive.
Unoptimized JS code takes longer to execute and impacts network, parse-compile and memory cost.
If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.
Some effective tips for speeding up JS execution include
Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.
While adding cookies we must ensure that if SameSite=None is set, the cookie is also marked Secure.
If you set SameSite to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser’s URL bar.
Set-Cookie: promo_shown=1; SameSite=Strict
You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.
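On the server side, the SameSite rules above can be enforced when the header is built. The following is a minimal sketch using Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8 or later); the helper name `build_set_cookie` is hypothetical and not part of any framework.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, same_site="Lax", secure=False):
    """Return a Set-Cookie header line, refusing SameSite=None without Secure."""
    if same_site not in ("Strict", "Lax", "None"):
        raise ValueError("SameSite must be Strict, Lax or None")
    if same_site == "None" and not secure:
        raise ValueError("SameSite=None cookies must also be Secure")
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["samesite"] = same_site
    if secure:
        cookie[name]["secure"] = True
    return cookie.output()


strict_header = build_set_cookie("promo_shown", "1", same_site="Strict")
cross_site_header = build_set_cookie("widget_session", "abc", same_site="None", secure=True)
```

Rejecting the insecure combination at build time keeps a misconfigured cookie from silently being dropped by the browser later.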
Key Performance Indicators (KPIs) are used to evaluate the performance of a website. It is critical that a WebRTC web page be lightweight to accommodate the signalling control stack JavaScript libs used for offer-answer handling and for communicating with the signaller over open sockets or a long-polling mechanism.
The Lighthouse tab in Chrome developer tools shows relevant areas of improvement on the webpage across Performance, Accessibility, Best Practices, Search Engine Optimization and Progressive Web App.
Page attributes under Chrome developer tools depict the page load and rendering time for every element, including scripts and markup. Specifically it has
Time to Title
Time to render
Time to interact
Networking attributes are to be configured based on DNS mapping and host provider. These can be evaluated based on Chrome developer tool reports.
Other page interaction criteria include the frames, their interaction and the timings for the same.
In the attached screenshot, see the loading tasks, which basically depict the delay by DOM elements under transitions owing to user interaction. This ideally should be minimal to keep the page responsive.
The above functions (old and new) estimate the memory usage of the entire web page.
These calls can be used to correlate new JS code with its impact on memory and subsequently find out whether there are any memory leaks. These memory metrics can also be used for A/B testing.
Loading assets over a CDN, minifying scripts and reducing the overall weight of the page are good ways to keep the page light and active and prevent Chrome tab crashes.
The non-critical components can then be loaded asynchronously.
Lazy loading should be used for large files like JS payloads which are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.
In the course of evolution of RAN (Radio Access Network) technologies, 5G succeeds 4G (2010), which came after 3G (2000), 2.5G, 2G (1990) and 1G/PSTN (1980) respectively. Among the most striking features of 5G are:
IP based protocols
ability to connect 100x more devices ( IOT favourable )
speed up to 10 Gbit/s
high peak bit rate
high data volume per unit area
virtually zero latency, hence fast response time
5G + IMS can accommodate the rapid growth of rich multimedia applications like OTT streaming of HD content, gaming, augmented reality and so on, while enabling devices connected to the Internet of Things to onboard the telecommunication backbone with high system spectral efficiency and ubiquitous connectivity.
In fact, 5G saw the maximum investment in 2020 in revamping infrastructure compared to other technologies such as IoT or even cloud. This could be partly due to the rise in high-speed communication for streaming and remote communication, owing to the steep rise in remote learning and working-from-home scenarios.
mid- band spectrum (2.5–10 GHz) – a combination of good coverage and very high bitrates,
high band-spectrum (10–100 GHz) – the bandwidths needed for the highest bitrates (up to 20 Gb/s) and lowest latencies
Workplan for 5G standardisation and release
The Workplan started in 2014 and is ongoing as of now (2018).
image source : 3GPP “Getting ready for 5G”
3GPP is the standards-defining body for telecom and has previously specified almost all RAN technologies like GSM, GPRS, W-CDMA, UMTS, EDGE, HSPA and LTE.
SDN separates the virtualized network infrastructure from its logical architecture, which automates configuration for routing, security etc.
It also helps in the management of infrastructure for scaling and availability.
Software-defined Networking (SDN) and Network Functions Virtualization (NFV) are advancing the deployment of 5G systems. The separation of user and control plane makes the system very modular, thereby widening its applicability to various traffic types.
Network Slicing allows mobile operators to partition a single network into multiple virtual networks. This allows a network operator to use one physical network to cater to many kinds of service networks with varying use cases around bandwidth, network latency, processing, resiliency and business requirements.
Dynamic Network Slicing allows the network resources like radio networks, wire access, core, transport and edge networks to be divided into multiple logical networks to meet the requirements of diverse use cases. [2]
Horizontal Slicing (Infrastructure Sharing)
Vertical Slicing (QoS Slicing)
The virtual infrastructure is shared between different tenants for control and operations (think IaaS).
Virtualization and slicing allow us to create Service Based Architectures (SBA). This allows control plane and user plane separation (CUPS). It also allows separation between the access and core network.
The modular function design allows concurrent access to services as well as decoupling of stateless processors from the stateful backend (database).
[2] J. Zhou, W. Zhao and S. Chen, “Dynamic Network Slice Scaling Assisted by Prediction in 5G Network,” in IEEE Access, vol. 8, pp. 133700-133712, 2020, doi: 10.1109/ACCESS.2020.3010623.
GDPR, Europe’s digital privacy legislation in force since 2018, replaces the 1995 EU Data Protection Directive. It is a set of rules designed to give EU citizens more control over their personal data and to strengthen privacy rights. It aims to simplify the regulatory environment for business and citizens.
GDPR (General Data Protection Regulation) in European Union 2018,
California Consumer Privacy Act (CCPA) 2019,
Personal Data Protection Bill (PDP) – India 2018 and
also specifications against Robocalls and SPIT ( SPAM over Internet Telephony) among others
Multinational companies will predominantly be regulated by the supervisory authority where they have their “main establishment” or headquarter. However, the issue concerning GDPR is that it not only applies to any organisation operating within the EU, but also to any organisations outside of the EU which offer goods or services to customers or businesses in the EU.
Key Principles of GDPR are
Lawfulness, fairness and transparency
Purpose limitation
Data minimisation
Accuracy
Storage limitation
Integrity and confidentiality (security)
Accountability
GDPR consists of 7 projects (DPO, Impact assessment, Portability, Notification of violations, Consent, Profiling, Certification and Lead authority) that will strengthen the control of personal data throughout the European Union.
Stakeholders
Stakeholders of the data protection regulation are:
Data Subject – an individual, a resident of the European Union, whose personal data are to be protected
Data Controller – an institution, business or a person processing the personal data e.g. e-commerce website.
Data Protection Officer – a person appointed by the Data Controller responsible for overseeing data protection practices.
Data Processor – a subject (company, institution) processing data on behalf of the controller. It can be an online CRM app or a company storing data in the cloud.
Data Authority – a public institution monitoring implementation of the regulations in the specific EU member country.
Extra-Territorial Scope
Any VoIP service provider may feel that since it is not based in the EU, for example being officially headquartered in the Asia Pacific or US region, it is not legally bound by GDPR. However, GDPR expands the territorial and material scope of EU data protection law. It applies to both controllers and processors established in the EU, and those outside the EU who offer goods or services to, or monitor, EU data subjects.
VoIP service providers as Data Processors
A processor is a “person, public authority, agency or other body which processes personal data on behalf of the controller”. Most VoIP service providers are multinational in nature with services offered directly or indirectly to all regions. The GDPR imposes direct statutory obligations on data processors, which means they will be subject to direct enforcement by supervisory authorities, fines, and compensation claims by data subjects. However, a processor’s liability will be limited to the extent that it has not complied with its statutory and contractual obligations.
Data minimization – It is now good practice to store and process only as much of a user’s personal data as is necessary to render our services effectively, and to maintain data for only a stipulated time (approx. 90 days of CDRs for call details and logs).
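The retention window mentioned above could be enforced with a periodic purge job. Below is a minimal sketch, assuming CDRs are held as dictionaries with a timezone-aware `ended_at` field; the record layout, the function name and the fixed 90-day constant are illustrative assumptions, not a prescribed schema.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # assumed retention window for CDRs and logs


def purge_expired(records, now=None):
    """Return only the records that are still inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return [r for r in records if r["ended_at"] >= cutoff]


# Hypothetical CDRs: one recent call and one older than 90 days.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
cdrs = [
    {"call_id": "a1", "ended_at": now - timedelta(days=10)},
    {"call_id": "b2", "ended_at": now - timedelta(days=120)},
]
kept = purge_expired(cdrs, now=now)
```

In a real deployment the same cutoff would typically be applied as a delete query on the datastore rather than on an in-memory list.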
Record Keeping, Accountability and governance
To show compliance with GDPR, a service provider must maintain detailed records of processing activities. Also, they must implement technological and organisational measures to ensure, and be able to demonstrate, that processing is performed in accordance with the GDPR. Some ways to apply these are:
Contracts: putting written contracts in place with organisations that process personal data on your behalf
maintaining documentation of your processing activities
Organisational policies focus on Data protection by design and default – two-factor auth, strong passwords to guard against brute-force, encryption, focus on security in architecture
Risk analysis and impact assessments: for uses of personal data that are likely to result in a high risk to individuals’ interests
Audit by Data protection officer
Clear Codes of conduct
Certifications
In a VoIP landscape, thankfully, every call or message session is followed by a CDR (Call Detail Record) or MDR (Message Detail Record).
Additionally, assign a unique signature to every data-access client in the VoIP system and log every read/write operation carried out on data stores, whether persistent datastores or system caches.
Privacy Notices to Subjects
User profile data such as :
Basic identity information, name, address and ID numbers
Web data such as location, IP address, cookie data and RFID tags
Health and genetic data
Bio-metric data
Racial or ethnic data
Political opinions
Sexual orientation
is protected strictly under GDPR rules
A service provider should provide in-depth information to data subjects when collecting their personal data, to ensure fairness and transparency. They must provide the information in an easily accessible form, using clear and plain language.
Consent
The GDPR introduces a higher bar for relying on consent , requiring clear affirmative action. Silence, pre ticked boxes or inactivity will not be sufficient to constitute consent. Data subjects can withdraw their consent at any time, and it must be easy for them to do so.
Lawful bases for processing data now include
In Article 6 of the GDPR , there are six available lawful bases for processing.
(a) Consent: the individual has given clear consent for you to process their personal data for a specific purpose.
(b) Contract: the processing is necessary for a contract you have with the individual, or because they have asked you to take specific steps before entering into a contract.
(c) Legal obligation: the processing is necessary for you to comply with the law (not including contractual obligations).
(d) Vital interests: the processing is necessary to protect someone’s life.
(e) Public task: the processing is necessary for you to perform a task in the public interest or for your official functions, and the task or function has a clear basis in law.
(f) Legitimate interests: the processing is necessary for your legitimate interests or the legitimate interests of a third party, unless there is a good reason to protect the individual’s personal data which overrides those legitimate interests.
Files such as PCAPs, recordings and transcripts of calls hold sensitive information from end users; these should be encrypted and inaccessible even to the dev teams within the org without the explicit consent of the end user.
Individuals’ Rights
The GDPR provides data subjects with new and enhanced rights, giving them more control over the processing of their personal data. A data subject access request can only be refused if it is manifestly unfounded or excessive, in particular because of its repetitive character.
Rights of Data Subjects include
Right of Access
Right to Rectification
Right to Be Forgotten
Right to Restriction of Processing
Right to Data Portability
Right to Object
Right to Object to Automated Decision-making
For a VoIP service provider, if a user opts for redaction then none of their calls or messages should be traceable in logs. Also replace distinguishable end-user identifiers such as phone number and SIP URI with *** characters.
Provide an option for “Account Deletion” and purge the account – if a user wishes to close their account, their details should be deleted from the system except for the bare-bones details which are otherwise required for legal, taxation and accounting requirements.
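The redaction of identifiers in logs described above can be sketched as a filter applied before a line is written. The two regular expressions below are deliberately naive assumptions (any `sip:`/`sips:` URI and any run of 7 to 15 digits with an optional leading `+`); a production system would need patterns tuned to its own log formats.

```python
import re

# Naive, illustrative patterns; not a complete SIP URI or E.164 grammar.
SIP_URI = re.compile(r"sips?:[^\s;>@]+@[^\s;>]+", re.IGNORECASE)
PHONE = re.compile(r"\+?\d{7,15}")


def redact(line: str) -> str:
    """Mask SIP URIs first, then any remaining phone-number-like digit runs."""
    line = SIP_URI.sub("***", line)
    return PHONE.sub("***", line)


redacted = redact("INVITE sip:alice@example.com from +14155550100")
```

Masking the SIP URI before the phone pattern avoids leaving a half-redacted URI when the user part is itself a number.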
Breach Notification
A controller is a “person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of processing of personal data”.
A controller will have a mandatory obligation to notify its supervisory authority of a data breach within 72 hours unless the breach is unlikely to result in a risk to the rights of data subjects. It will also have to notify affected data subjects where the breach is likely to result in a “high risk” to their rights. A processor, however, will only be obliged to report data breaches to controllers.
International Data Transfers
Data transfers to countries outside the EEA(European Economic Area) continue to be prohibited unless that country ensures an adequate level of protection. The GDPR retains existing transfer mechanisms and provides for additional mechanisms, including approved codes of conduct and certification schemes.
The GDPR prohibits any non-EU court, tribunal or regulator from ordering the disclosure of personal data from EU companies unless it requests such disclosure under an international agreement, such as a mutual legal assistance treaty.
One of the biggest challenges for a service provider is the identification & categorization of GDPR impacted data sets in disparate locations across the enterprise. A dev team must flag tables, attributes and other data objects that are categorically covered under GDPR regulations and then ensure that they are not transferred to a server outside of EU.
In the present age of virtual shared server instances, cloud computing and VoIP protocols, it is operationally a very tough task for a communication service provider to ensure that data is not transferred outside of the EU. For example, a VoIP call originating in the US with its destination in the EU will require information exchanges via SDP, vCard, RTP streams via media proxies etc.
Sanctions
The GDPR provides supervisory authorities with wide-ranging powers to enforce compliance, including the power to impose significant fines. You will face fines of up to €20m or 4% of your total worldwide annual turnover of the preceding financial year, whichever is higher. In addition, data subjects can sue you for pecuniary or non-pecuniary damages (i.e. distress). Supervisory authorities will have discretion as to whether to impose a fine and the level of that fine.
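As a quick worked example of the upper bound on fines, the cap can be computed as the larger of the two figures. This is a sketch only: the function name is hypothetical, the turnover values are made up, and the actual fine is at the discretion of the supervisory authority.

```python
FIXED_CAP_EUR = 20_000_000  # the EUR 20m ceiling
TURNOVER_PERCENT = 4        # 4% of worldwide annual turnover


def max_gdpr_fine(annual_turnover_eur: int) -> int:
    """Upper bound of the fine: the higher of the fixed cap and 4% of turnover."""
    return max(FIXED_CAP_EUR, annual_turnover_eur * TURNOVER_PERCENT // 100)


small_provider_cap = max_gdpr_fine(100_000_000)    # 4% is 4m, so the 20m cap applies
large_provider_cap = max_gdpr_fine(1_000_000_000)  # 4% is 40m, which exceeds 20m
```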
Data Protection officer (DPO)
Under the terms of GDPR, an organisation must appoint a Data Protection Officer (DPO) if it carries out large-scale processing of special categories of data, carries out large scale monitoring of individuals such as behaviour tracking or is a public authority.
With the sudden onset of Covid-19 and the building trend of working from home, the demand for building scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution.
This article is about media server setup to provide mid to high scale conferencing solution over SIP to various endpoints including SIP softphones , PBXs , Carrier/PSTN and WebRTC.
Endpoints communicate over unicast. RTP and RTCP traffic is private between sender and receiver even if the endpoints contain multiple SSRCs in the RTP session.
Advantages of P2P
(+) Facilitates private communication between the parties
Disadvantages of P2P
(-) The only limits on the number of streams between the participants are physical limitations such as bandwidth and the number of available ports
Same as above but with a middlebox involved. Middlebox types are:
Translator
Mostly used for interoperability between otherwise non-interoperable endpoints, such as transcoding the codecs or transport conversion. This does not use an SSRC of its own and keeps the SSRC for an RTP stream across the translation.
Subtypes of middlebox:
Transport/Relay Anchoring
Roles like NAT traversal by pinning the media path to a public address domain relay or TURN server
Middleboxes for auditing or privacy control of participant’s IP
Other SBC (Session Border Controller) like characteristics are also part of this topology setup
Transport translator
interconnecting networks like multicast to unicast
media packetization to allow other media to connect to the session like non-RTP protocols
Media translator
Modifies the media inside of RTP streams commonly known as transcoding.
It can do up to full encoding/decoding of RTP streams. In many cases it can also act on behalf of non-RTP supported endpoints, receiving and responding to feedback reports and performing FEC (forward error correction).
Back-To-Back RTP Session
Much like a middlebox translator, but establishes separate RTP session legs with the endpoints, bridging the two sessions.
Takes complete responsibility for forwarding the correct RTP payload and maintaining the relation between the SSRCs and CNAMEs
Advantages of Back-To-Back RTP Session
(+) B2BUA / media bridge takes responsibility to relay and manage congestion
Disadvantages of Back-To-Back RTP Session
(-) It can be subjected to a MITM (man-in-the-middle) attack or have a backdoor to eavesdrop on conversations
Some more variants of this topology are Point to Multipoint with Mixer
Media Mixing Mixer
receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through
static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.
Media Switching Mixer
RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.
The Mixer can reduce bitrate or switch between sources like active speakers.
Middlebox can select which of the potential sources ( SSRC) transmitting media will be sent to each of the endpoints. This transmission is set up as an independent RTP Session.
Extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.
Advantages of SFU
(+) Low latency and low jitter buffer requirement by avoiding re-encoding
(+) saves on encoding/decoding CPU utilization at the server
Disadvantages of SFU
(-) unable to manage network and control bitrate
(-) creates higher load on the receiver when compared with MCU
On a high level, one can safely assume that given the current average internet bandwidth, for count of peers between 3-6 mesh architectures make sense however any number above it requires centralized media architecture.
Among the centralized media architectures, SFU makes sense for at most 6-15 people in a conference; however, if the number of participants exceeds that, it may need to switch to MCU mode.
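The rule of thumb above can be written down as a small selection function. This is a sketch under the thresholds stated in the text (mesh up to 6 peers, SFU up to 15, MCU beyond that); the function names are hypothetical and real deployments would also weigh bandwidth and CPU, not just headcount.

```python
def choose_topology(participants: int) -> str:
    """Pick a media topology from the participant count alone."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    if participants == 2:
        return "p2p"
    if participants <= 6:
        return "mesh"
    if participants <= 15:
        return "sfu"
    return "mcu"


def uplinks_per_endpoint(topology: str, participants: int) -> int:
    """Streams each endpoint must send: one per remote peer in a mesh, one to the server otherwise."""
    if topology in ("p2p", "mesh"):
        return participants - 1
    return 1


small_call = choose_topology(4)
town_hall = choose_topology(30)
```

The uplink count shows why mesh stops scaling: a 6-party mesh already asks every endpoint to encode and send 5 copies of its media.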
There are various topologies for multi-endpoint conferences. Hybrid topologies include forwarding video while mixing audio, or auto-switching between the configurations as load increases or decreases or according to a paid premium or free plan.
Hybrid model of forwarding and mixed streamings
Some endpoints receive forwarded streams while others receive mixed/composited streams.
Serverless models
Centralized topology in which one endpoint serves as an MCU or SFU.
Used by Jitsi and Skype
Point to Multipoint Using Video-Switching MCUs
Much like MCU but unlike MCU can switch the bitrate and resolution stream based on the active speaker, host or presenter, floor control like characteristics.
This setup can embed the characteristics of translator, selector and can even do congestion control based on RTCP
To handle a multipoint conference scenario it acts as a translator forwarding the selected RTP stream under its own SSRC, with the appropriate CSRC values and modifies the RTCP RRs it forwards between the domains
Cascaded SFUs
Chaining SFUs reduces latency while also enabling scalability; however, it takes a toll on server network as well as endpoint resources.
Before getting into an in-depth discussion of all possible types of Media Architectures in VoIP systems, let us learn about TCP vs UDP.
TCP is a reliable connection-oriented protocol that sends SYN and receives ACK to establish a connection between communicating parties. It sends packets sequentially, and they can be resent individually when the receiver recognizes out-of-order packets. It is thus used for session creation due to its error correction and congestion control features.
Once a session is established, media automatically shifts to RTP over UDP. UDP, even though not as reliable and not guaranteeing non-duplication, delivery or error correction, is used due to its tunnelling methods, where packets of other protocols are encapsulated inside a UDP packet. However, to provide E2E security, other methods for auth and encryption are used.
A Call session produces various traces for offtime monitoring and analysis which can include
CDR ( Call Detail Records ) – to , from numbers , ring time , answer time , duration etc
Signalling PCAPs – usually collected from the SIP application server, containing the SIP requests, SDP and responses. They show the call flow sequences, for example who sent the INVITE and who sent the BYE or CANCEL, how many times the call was updated or paused/resumed etc.
Media Stats – jitter , buffer , RTT , MOS for all legs and avg values
Audio PCAPS – this is the recording of the RTP stream and RTCP packets between the parties and requires explicit consent from the customer or user . The VoIP companies complying with GDPR cannot record Audio stream for calls and preserve for any purpose like audit , call quality debugging or an inspection by themselves.
Throwing more light on Audio PCAPS storage, assuming the user provides explicit permission to do so , here is the approach for carrying out the recording and storage operations.
Furthermore, strict access control, encryption and anonymisation of the media packets are necessary to obfuscate details of the call session.
SIP is the most popular signalling protocol in the VoIP ecosystem. It is most suited to a caller-callee scenario, yet supporting scalable conferences on VoIP is a market demand. It is desired that SIP not only set up multimedia streams but also provide conference control for building communication and collaboration apps for new and customisable solutions.
To read more about building a scalable VoIP server-side architecture and
Clustering the servers with a common cache for high availability and prompt failure recovery
Multi-tier architecture, i.e. separation between Data/session and Application Server/Engine layer
Microservice-based architecture, i.e. differentiation between proxies like Load balancer, SBC, Backend services, OSS/BSS etc
Scalable and Flexible platform. Let’s go in-depth to discuss how one can go about achieving scalability in SIP platforms: multi-geography scaling via Universal Router, clustered SIP telephony servers for High Availability, multi-tier cluster architecture, role abstraction / microservice-based architecture, distributed event management and event-driven architecture, containerization, autoscaling, security, policies and market differentiators, ticketing and issue tracking.
Codecs signify the media stream’s compression and decompression. For peers to have a successful exchange of media, they need to agree upon a common set of codecs for the session. The lists of codecs are sent to each other as part of the offer and answer, or SDP in SIP.
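Finding the common set of codecs from two SDP bodies can be sketched as follows. This is a simplified illustration that only reads `a=rtpmap` lines and keeps the offerer's preference order; the SDP fragments are made-up samples and the helper names are hypothetical, and real negotiation also considers `fmtp` parameters, directions and payload type mapping.

```python
import re

# Matches lines such as "a=rtpmap:111 opus/48000/2".
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)", re.MULTILINE)


def codecs_in_sdp(sdp: str):
    """Return (codec name, clock rate) pairs in the order they appear."""
    return [(name.upper(), int(rate)) for _pt, name, rate in RTPMAP.findall(sdp)]


def common_codecs(offer_sdp: str, answerer_sdp: str):
    """Codecs present on both sides, in the offerer's preference order."""
    supported = set(codecs_in_sdp(answerer_sdp))
    return [c for c in codecs_in_sdp(offer_sdp) if c in supported]


# Hypothetical offer from a browser and capabilities of a legacy SIP endpoint.
offer = "\n".join([
    "m=audio 9 UDP/TLS/RTP/SAVPF 111 0 8",
    "a=rtpmap:111 opus/48000/2",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
])
legacy = "\n".join([
    "m=audio 4000 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
])
shared = common_codecs(offer, legacy)
```

An empty result is the case where a transcoding gateway is needed between the two endpoints.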
WebRTC provides containerless bare MediaStreamTrack objects. Codecs for these tracks are not mandated by WebRTC, yet the codecs are specified by two separate RFCs:
RFC 7874 WebRTC Audio Codec and Processing Requirements specifies at least the Opus codec as well as G.711’s PCMA and PCMU formats.
RFC 7742 WebRTC Video Processing and Codec Requirements specifies support for VP8 and H.264’s Constrained Baseline profile for video.
In WebRTC, video is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to discuss audio/video codec processing requirements only.
WebRTC is free and open source and its working bodies promote royalty-free codecs too. The working groups RTCWEB and IETF make sure that non-royalty-bearing codecs are mandatory while other codecs can be optional in WebRTC non-browsers.
WebRTC browsers MUST implement the VP8 video codec as described in RFC 6386 and H.264 Constrained Baseline as described in RFC 7742.
Most of the codecs below follow a lossy DCT (discrete cosine transform) based algorithm for encoding. A sample SDP from an offer in Chrome browser v80 for Linux includes these profiles:
AVC’s Constrained Baseline (CBP ) profile compliant with WebRTC.
proprietary, patented codec, maintained by MPEG / ITU
Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3. Constrained Baseline is a subset of the Main profile, suited to low delay and low complexity, and to lower-processing devices like mobile video.
Multiview Video Coding – can have multiple views of the same scene, such as stereoscopic video.
Other profiles, which are not supported, are Baseline (BP), Extended (XP), Main (MP), High (HiP), Progressive High (ProHiP), High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive
supported containers are 3GP, MP4, WebM
Parameter settings:
packetization-mode
max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )
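The parameter settings listed above travel in the SDP `a=fmtp` line. The sketch below parses such a line and splits `profile-level-id` into its three bytes (profile_idc, constraint flags, level_idc). The sample line is illustrative rather than copied from a real capture, the helper names are hypothetical, and the check for Constrained Baseline assumes profile_idc 66 with the constraint_set1 bit set.

```python
def parse_fmtp(line: str) -> dict:
    """Split 'a=fmtp:<pt> key=value;key=value' into a dict of parameters."""
    _prefix, params = line.split(" ", 1)
    return dict(p.strip().split("=", 1) for p in params.split(";") if "=" in p)


def decode_profile_level_id(hex_id: str):
    """Return (profile_idc, constraint_flags, level_idc) from the 6 hex digits."""
    profile_idc = int(hex_id[0:2], 16)
    constraint_flags = int(hex_id[2:4], 16)
    level_idc = int(hex_id[4:6], 16)
    return profile_idc, constraint_flags, level_idc


def is_constrained_baseline(profile_idc: int, constraint_flags: int) -> bool:
    # Assumption: Baseline (66) with constraint_set1_flag (0x40) set.
    return profile_idc == 66 and bool(constraint_flags & 0x40)


# Illustrative fmtp line for an H.264 payload type.
fmtp = "a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f"
params = parse_fmtp(fmtp)
profile_idc, flags, level_idc = decode_profile_level_id(params["profile-level-id"])
```

Here level_idc is ten times the level number, so a value of 31 reads as Level 3.1.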
Already used for video conferencing on PSTN (Public Switched Telephone Networks), RTSP, and SIP (IP-based videoconferencing) systems.
suited for low bandwidth networks
(-) not compatible with WebRTC
but many media gateways include realtime transcoding between H.263-based SIP systems and VP8-based WebRTC ones to enable video communication between them
H.265 / HEVC
proprietary format and is covered by a number of patents. Licensing is managed by MPEG LA .
Container – Mp4
Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints
With the rise of the Internet of Things, many endpoints, especially IP cameras connected to Raspberry Pi-like SoCs (systems on chip), want to stream directly to the browser within their own private network or even on a public network using TURN / STUN.
The figure below shows how such a call flow is possible between an IP camera (such as a baby cam) and a parent monitoring it over a WebRTC-supported mobile phone browser. The process includes streaming the content from the IoT device as an RTSP stream and using realtime transcoding between H.264 and VP8.
Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints
Opus is a lossy audio compression format developed by the Internet Engineering Task Force (IETF) targeting a broad range of interactive real-time applications over the Internet, from speech to music, and supports multiple compression algorithms
Constant and variable bitrate encoding – 6 kbit/s to 510 kbit/s
frame sizes – 2.5 ms to 60 ms
sampling rates – 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced).
container- Ogg, WebM, MPEG-TS, MP4
As an open format standardized through RFC 6716, a reference implementation is provided under the 3-clause BSD license. All known software patents which cover Opus are licensed under royalty-free terms.
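The bitrate and frame size ranges listed above translate directly into encoded payload sizes per RTP packet. The arithmetic is sketched below; the function name is hypothetical and the figures ignore RTP, UDP and IP header overhead.

```python
def payload_bytes(bitrate_bps: int, frame_ms: float) -> float:
    """Encoded audio bytes carried per frame at a constant bitrate."""
    return bitrate_bps * frame_ms / 1000 / 8


# Extremes of the Opus bitrate range at a typical 20 ms frame size.
lowest = payload_bytes(6_000, 20)     # 6 kbit/s
highest = payload_bytes(510_000, 20)  # 510 kbit/s
typical_voice = payload_bytes(32_000, 20)
```

At low bitrates the fixed per-packet headers dominate, which is one reason longer frames such as 60 ms are attractive on constrained links.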
(+ ) flexible, suited for speech ( by SILK) and music ( CELT)
(+) support for mono and stereo
(+) built-in FEC (Forward Error Correction), thus resilient to packet loss
(+) compression adjustability for unpredictable networks
(-) Highly CPU intensive ( unsuitable for embedded devices like rpi)
(-) processing and memory intensive
For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, it is w3C recommends that Opus be offered before PCMA/PCMU.
AAC (Advanced Audio Coding)
Part of the MPEG-4 standard (MPEG-4 Part 3, Audio). Lossy compression, with a number of profiles suiting different use cases, from high-quality surround sound to low-fidelity audio for speech-only use.
supported containers – MP4, ADTS, 3GP
G.711 (PCMA and PCMU)
G.711 is an ITU standard (1972) for audio compression. It is primarily used in telephony.
ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding. vital to interface with the standard telecom network and carriers. G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU
It is the required standard in many voice-based systems and technologies, for example in H.320 and H.323 specifications.
Fixed 64 kbit/s bit rate
supports 3GP container formats
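The µ-law companding behind PCMU and the fixed 64 kbit/s rate can be sketched as below. This is a minimal illustration that assumes the continuous µ-law curve with µ = 255 rather than the segmented 8-bit encoding that real G.711 implementations use; the function names are made up for this example.

```javascript
const MU = 255; // µ value used by G.711 µ-law

// Compress a sample in the range [-1, 1] with the continuous µ-law curve.
// Quiet samples get proportionally more of the output range than loud ones.
function muLawCompress(x) {
  return Math.sign(x) * Math.log1p(MU * Math.abs(x)) / Math.log1p(MU);
}

// Inverse of muLawCompress
function muLawExpand(y) {
  return Math.sign(y) * (Math.pow(1 + MU, Math.abs(y)) - 1) / MU;
}

// G.711 samples at 8 kHz and stores 8 bits per sample
const g711BitRate = 8000 * 8; // bits per second

console.log(muLawCompress(0.01), muLawCompress(0.5));
console.log(g711BitRate); // 64000
```

The same arithmetic explains why the bit rate is fixed: every sample costs 8 bits regardless of the audio content.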
G.722
ITU standard (1988), encoded using Adaptive Differential Pulse Code Modulation (ADPCM), which is suited for voice compression
7 kHz wideband audio codec
Bitrate 48, 56 and 64 kbit/s.
containers used 3GP, AMR-WB
G.722 improves speech quality thanks to a wider speech bandwidth of 50–7000 Hz, compared to 300–3400 Hz for G.711.
Comfort noise (CN)
Artificial background noise used to fill gaps in a transmission instead of pure silence. It prevents jarring silences and RTP timeouts.
Should be used for streams encoded with G.711 or any other supported codec that does not provide its own CN. Use of Discontinuous Transmission (DTX) / CN by senders is optional
Internet Low Bitrate Codec (iLBC)
An open-source narrowband speech codec for VoIP and streaming audio.
8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames.
Defined by IETF RFCs 3951 and 3952.
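The two iLBC bitrates follow directly from the encoded frame sizes. A small sketch of the arithmetic, assuming the frame sizes given in RFC 3951 (304 bits per 20 ms frame, 400 bits per 30 ms frame); the helper name is made up for this example.

```javascript
// Bits per millisecond is numerically the same as kbit/s
function bitRateKbps(bitsPerFrame, frameMs) {
  return bitsPerFrame / frameMs;
}

const ilbc20 = bitRateKbps(304, 20); // 20 ms mode
const ilbc30 = bitRateKbps(400, 30); // 30 ms mode

console.log(ilbc20.toFixed(2), ilbc30.toFixed(2)); // 15.20 13.33
```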
Internet Speech Audio Codec (iSAC)
iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. It is designed for voice transmissions which are encapsulated within an RTP stream.
16 kHz or 32 kHz sampling frequency
adaptive and variable bit rate of 12 to 52 kbps.
Speex
patent-free audio compression format designed for speech and also a free software speech codec that is used in VoIP applications and podcasts. May be obsolete, with Opus as its official successor.
AMR-WB (Adaptive Multi-Rate Wideband) is a patented wideband speech coding standard that provides improved speech quality. This codec is generally available on mobile phones.
wider speech bandwidth of 50–7000 Hz.
data rate is between 6-12 kbit/s, and the
DTMF and ‘audio/telephone-event’ media type
endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.
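To make the 'audio/telephone-event' idea concrete, here is a rough sketch of the 4-byte event payload described in RFC 4733: one byte for the event code, one byte holding the end (E) bit, a reserved bit and a 6-bit volume, then a 16-bit duration in RTP timestamp units. The function names and default values are assumptions for illustration only, and the RTP header that would carry this payload is left out.

```javascript
// Event codes 0-9 are the digits, 10 is '*', 11 is '#', 12-15 are A-D
const DTMF_EVENTS = '0123456789*#ABCD';

function encodeTelephoneEvent(digit, { end = false, volume = 10, duration = 160 } = {}) {
  const event = DTMF_EVENTS.indexOf(digit);
  if (event < 0) throw new RangeError(`not a DTMF digit: ${digit}`);
  const payload = Buffer.alloc(4);
  payload.writeUInt8(event, 0);
  // E bit in the top position, reserved bit left at 0, 6-bit volume
  payload.writeUInt8((end ? 0x80 : 0) | (volume & 0x3f), 1);
  payload.writeUInt16BE(duration, 2); // duration in RTP timestamp units
  return payload;
}

function decodeTelephoneEvent(payload) {
  const flags = payload.readUInt8(1);
  return {
    digit: DTMF_EVENTS[payload.readUInt8(0)],
    end: (flags & 0x80) !== 0,
    volume: flags & 0x3f,
    duration: payload.readUInt16BE(2),
  };
}

const packet = encodeTelephoneEvent('5', { end: true, duration: 800 });
console.log(packet.toString('hex')); // 058a0320
console.log(decodeTelephoneEvent(packet));
```

Because the digit travels as a named event rather than as audio, it survives lossy codecs that would distort in-band tones.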
Major standards bodies including 3GPP, ITU-T, and ETSI have all adopted SIP as the core signalling protocol for services such as LTE, VoIP, conferencing, Video on Demand (VoD), IPTV (Internet Television), presence, and Instant Messaging (IM). With the continuous evolution of SIP as the de facto VoIP protocol, we need to understand the risk mitigation practices around it.
Malicious registrations on a registrar by a third party who modifies the From header field of a SIP request.
Example implementation: the attacker de-registers all existing contacts for a URI. The attacker can also register their own device as the contact address, thereby directing all requests for the affected user to themselves.
The attacker impersonates the remote server. The user’s request can then be intercepted by another party or forwarded to insecure locations.
Solution: confidentiality, integrity, and authentication of proxy servers
Proxy/redirect servers and registrars SHOULD possess a site certificate issued by a CA, which can be validated by the UA
If users are relying on SIP message bodies to communicate any of
session encryption keys for a media session
MIME bodies
SDP
encapsulated telephony signals
then attackers on a proxy server can modify the session key, or act as a man-in-the-middle and eavesdrop.
Example implementation: the attacker can point RTP media streams to a wiretapping device, or change the Subject header field so that the request appears to users as spam.
Solution: end-to-end encryption over TLS plus Digest authentication
Mid-session threats like tearing down a session
Request forging: the attacker learns the parameters of the session, such as the To and From tags, and can then alter ongoing session parameters or even bring the session down.
Example implementation: the attacker inserts a BYE into an ongoing session, thereby tearing it down, or inserts a re-INVITE and redirects the stream to a wiretapping device.
Solution: authentication on every request, signing and encrypting of MIME bodies, and transfer of credentials with S/MIME
DoS: rendering a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces. DDoS: using multiple network hosts to flood a target host with a large amount of network traffic.
It can be created by sending falsified SIP requests to other parties, such that numerous transactions originating in the backwards direction arrive at the target server and create congestion. Some examples of DoS attack implementations:
The attacker creates a falsified source IP address and a corresponding Via header field that identify a targeted host as the originator of the request, then sends this to a large number of SIP network elements. This generates a DoS aimed at the target.
The attacker uses falsified Route header field values in a request that identify the target host, and then sends such messages to forking proxies that amplify the messaging sent to the target.
Flooding with REGISTER requests can deplete the available memory and disk resources of a registrar by registering huge numbers of bindings.
Flooding a stateful proxy server forces it to spend the computation associated with processing each SIP transaction.
Solution: detect flooding in traffic with pike and use ipban to block the source. Challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required), forgoing the normal response retransmission algorithm, and thus behave statelessly towards unauthenticated requests.
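The detect-and-ban idea can be sketched as a per-source sliding window counter. This is only an illustration of the concept, not how pike or Fail2Ban work internally; the class name, thresholds and return values are assumptions made up for this example.

```javascript
class FloodDetector {
  constructor({ maxRequests = 20, windowMs = 2000, banMs = 60000 } = {}) {
    this.maxRequests = maxRequests; // allowed requests per window
    this.windowMs = windowMs;       // sliding window length
    this.banMs = banMs;             // how long an offender stays blocked
    this.hits = new Map();          // source IP -> recent request timestamps
    this.banned = new Map();        // source IP -> ban expiry timestamp
  }

  // Call once per incoming SIP request; returns 'ok' or 'banned'
  check(ip, now = Date.now()) {
    const banExpiry = this.banned.get(ip);
    if (banExpiry !== undefined) {
      if (now < banExpiry) return 'banned';
      this.banned.delete(ip); // ban has expired
    }
    // Keep only the timestamps that are still inside the window
    const recent = (this.hits.get(ip) || []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.hits.set(ip, recent);
    if (recent.length > this.maxRequests) {
      this.banned.set(ip, now + this.banMs);
      this.hits.delete(ip);
      return 'banned';
    }
    return 'ok';
  }
}

const detector = new FloodDetector({ maxRequests: 3, windowMs: 1000, banMs: 5000 });
console.log([0, 100, 200, 300].map((t) => detector.check('203.0.113.7', t)));
// [ 'ok', 'ok', 'ok', 'banned' ]
```

A proxy would run such a check before creating any transaction state, so that a banned source costs as little as possible.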
Security mechanisms
Full encryption vs hop-by-hop encryption
SIP messages cannot be encrypted end-to-end in their entirety, since message fields such as the Request-URI, Route, and Via need to be visible to proxies in most network architectures so that SIP requests are routed correctly. Proxy servers also need to update the message with Via headers.
Thus SIP uses low-level security along with hop-by-hop encryption and authentication headers to verify the identity of proxy servers.
Transport and Network Layer Security
IPsec – used where set of hosts or administrative domains have an existing trust relationship with one another.
TLS – used where hop-by-hop security is required between hosts with no pre-existing trust association.
SIPS URI Scheme
Used as an address-of-record for a particular user, signifies that each hop over which the request is forwarded, must be secured with TLS
HTTP Authentication
Reuses HTTP Digest authentication via the 401 and 407 response codes, which carry the authentication challenge. Provides replay protection and one-way authentication.
S/MIME
allows SIP UAs to encrypt MIME bodies within SIP, securing these bodies end-to-end without affecting message headers. provides end-to-end confidentiality and integrity for message bodies
nonce-count
provides replay protection
SIP over TLS
SIP messages can be secured using TLS. There is also TLS for Datagrams called DTLS.
Security of SIP signalling is different from the security of protocols used in concert with SIP, like RTP and RTCP, which will be covered in later topics of this article.
TLS operation consists of two phases: handshake phase and bulk data encryption phase
Handshake phase
Prepare algorithm to be used during TLS session
Server Authentication
server sends its certificate to the client, which then verifies the certificate using a certificate authority’s (CA’s) public key.
Client Authentication
Server sends an additional CertificateRequest message to request the client’s certificate. The client responds with
Certificate message containing the client certificate with the client public key and
CertificateVerify message containing a digest signature of the handshake messages signed by the client’s private key
Server authenticates the client using the client’s public key, since only a client holding the correct private key can sign the message.
Prepare the shared secret for bulk data encryption
The client generates a pre_master_secret and encrypts it using the server’s public key obtained from the server’s certificate. The server decrypts the pre_master_secret using its own private key. Both the server and client then compute a master_secret they share based on the same pre_master_secret. The master_secret is further used to generate the shared symmetric keys for bulk data encryption and message authentication.
Public key cryptographic operations such as RSA are much more expensive than shared key cryptography. This is why TLS uses public key cryptography to establish the shared secret key in the handshake phase, and then uses symmetric key cryptography with the negotiated shared secret as the data encryption key.
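The split between an expensive public key step and cheap symmetric encryption can be illustrated with Node's crypto module. This is a conceptual sketch, not the TLS protocol: the key derivation below is a plain HMAC-SHA256 standing in for the real TLS PRF, there are no certificates or handshake messages, and all variable names and the sample message are assumptions for this example.

```javascript
const crypto = require('crypto');

// Server side: an RSA key pair. In TLS the public half travels in the certificate.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Client side: a random pre-master secret, encrypted with the server's public key
const preMasterSecret = crypto.randomBytes(48);
const encryptedPreMaster = crypto.publicEncrypt(publicKey, preMasterSecret);

// Server side: only the private key holder can recover the pre-master secret
const recoveredPreMaster = crypto.privateDecrypt(privateKey, encryptedPreMaster);

// Both sides derive the same symmetric key from the shared secret and the
// random values exchanged in the hello messages (simplified stand-in for the PRF)
const clientRandom = crypto.randomBytes(32);
const serverRandom = crypto.randomBytes(32);
function deriveKey(secret) {
  return crypto.createHmac('sha256', secret)
    .update(Buffer.concat([clientRandom, serverRandom]))
    .digest(); // 32 bytes, used here as an AES-256 key
}
const clientKey = deriveKey(preMasterSecret);
const serverKey = deriveKey(recoveredPreMaster);

// Bulk data phase: symmetric AES-256-GCM with the derived key
const message = 'REGISTER sip:example.com SIP/2.0';
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv('aes-256-gcm', clientKey, iv);
const ciphertext = Buffer.concat([cipher.update(message, 'utf8'), cipher.final()]);
const authTag = cipher.getAuthTag();

const decipher = crypto.createDecipheriv('aes-256-gcm', serverKey, iv);
decipher.setAuthTag(authTag);
const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');

console.log(clientKey.equals(serverKey), plaintext === message);
```

The RSA operations run once per session, while every message after that only pays for the symmetric cipher.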
Stateless proxy servers do not maintain state information about the SIP session and therefore tend to be more scalable. However, many standard application functionalities, such as authentication, authorization, accounting, and call forking require the proxy server to operate in a stateful mode by keeping different levels of session state information.
Steps :
The SIP proxy server enforces proxy authentication with 407 Proxy Authentication Required challenge.
UAC provides credentials that verify its claimed identity (e.g., based on the MD5 digest algorithm) and retransmits the request with them in the Proxy-Authorization header
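The credentials in the second step are never the password itself but a hash over it and the server's challenge. A minimal sketch of the Digest response calculation with qop=auth, as defined for HTTP Digest (RFC 2617) and reused by SIP; the function name, user, realm, nonce and password below are made-up sample values.

```javascript
const crypto = require('crypto');

const md5 = (text) => crypto.createHash('md5').update(text).digest('hex');

// Computes the "response" parameter the UAC places in its
// Authorization / Proxy-Authorization header
function digestResponse({ username, realm, password, method, uri, nonce, nc, cnonce, qop }) {
  const ha1 = md5(`${username}:${realm}:${password}`); // who is asking
  const ha2 = md5(`${method}:${uri}`);                 // what is being asked
  // nonce comes from the 401/407 challenge; nc counts how often it has been used
  return md5(`${ha1}:${nonce}:${nc}:${cnonce}:${qop}:${ha2}`);
}

const response = digestResponse({
  username: 'alice',
  realm: 'example.com',
  password: 'secret',
  method: 'REGISTER',
  uri: 'sip:example.com',
  nonce: 'f84f1cec41e6cbe5aea9c8e88d359',
  nc: '00000001',
  cnonce: '0a4f113b',
  qop: 'auth',
});
console.log(response); // 32 hex characters
```

Since the nonce-count is part of the hash, a captured response cannot simply be replayed with the same nonce, which is the replay protection mentioned earlier.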
Security of RTP
confidentiality protection of the RTP session and integrity protection of the RTP/RTCP packets requires source authentication of all the packets to ensure no man-in-the-middle (MITM) attack is taking place.
end to end media encryption – SRTP ( Secure RTP )
Encodes the voice into encrypted IP packets and transports them over the internet from the transmitter to the receiver
References
The Impact of TLS on SIP Server Performance – Charles Shen, Erich Nahum, Henning Schulzrinne, Charles Wright; Department of Computer Science, Columbia University, and IBM T.J. Watson Research Center
I have written about VoIP and security in these blogs before
For security around web browser-based calling via webrtc, the articles below discuss security practices in general
WebRTC Security, which describes the browser threat model, access to local resources, Same Origin Policy (SOP) and Cross-Origin Resource Sharing (CORS), as well as location sharing, ICE, TURN, and threats to privacy with screen sharing, long-term microphone and camera access, and probable mid-call attacks.
Generic security of web applications built around the hosting platform of WebRTC. Includes concepts like identity management, browser security (cross-site security and clickjacking), authentication of devices and applications, media encryption, and regex checking.
Secure Communication – https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/ which discusses the key management protocols used for establishing end-to-end encryption in VoIP media streams. It describes sanity checks, ACL lists with permissions, hiding topology details, countering floods using pike and Fail2Ban, as well as traffic monitoring and detection.
HTTP (Hypertext Transfer Protocol) is an application layer protocol that sits atop the transport layer (TCP) and the network layer (IP).
HTTP/1.1
HTTP/1.1 was released in 1997. HTTP/1.0 allowed only one request at a time per connection. HTTP/1.1 still allows only one outstanding response at a time on a TCP connection, but added request pipelining to achieve some concurrency.
HTTP/2
HTTP/2 was released in 2015. It aims at reducing latency while delivering heavy graphics, videos and other media components on web pages, especially on mobile sites. It also optimizes for server push and service workers.
Features
Header compression (HPACK)
Reuse of a single TCP connection
All frames (e.g. HEADERS, DATA, etc) are sent over single TCP connection
Binary framing layer
Prioritization
Flow control
Server push
Request → Stream
Streams multiplexed
Streams prioritized
(+) low latency / improves end-user perceived latency
(+) retain semantics of HTTP1.1
FRAMES
A key difference between HTTP/1.1 and HTTP/2 is that the former transmits requests and responses in plaintext, whereas the latter encapsulates them in a binary format, providing more features and scope for optimization. Thus at the protocol level, it is all about frames of bytes that are part of a stream.
HTTP messages are decomposed into one or more frames
HEADERS for meta-data (9-byte, length prefixed)
DATA for payload
RST_STREAM to cancel
…..
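Every frame type listed above starts with the same 9-byte header defined in RFC 7540: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. A small sketch of encoding and decoding that header with Node Buffers; the function names are made up for this example, and payload handling is left out.

```javascript
// Frame type codes in the order assigned by RFC 7540 (DATA = 0x0, HEADERS = 0x1, ...)
const FRAME_TYPES = ['DATA', 'HEADERS', 'PRIORITY', 'RST_STREAM', 'SETTINGS',
  'PUSH_PROMISE', 'PING', 'GOAWAY', 'WINDOW_UPDATE', 'CONTINUATION'];

function encodeFrameHeader({ length, type, flags = 0, streamId }) {
  const header = Buffer.alloc(9);
  header.writeUIntBE(length, 0, 3);                 // 24-bit payload length
  header.writeUInt8(FRAME_TYPES.indexOf(type), 3);  // 8-bit frame type
  header.writeUInt8(flags, 4);                      // 8-bit flags
  header.writeUInt32BE(streamId & 0x7fffffff, 5);   // reserved bit cleared + 31-bit stream id
  return header;
}

function decodeFrameHeader(header) {
  return {
    length: header.readUIntBE(0, 3),
    type: FRAME_TYPES[header.readUInt8(3)],
    flags: header.readUInt8(4),
    streamId: header.readUInt32BE(5) & 0x7fffffff,  // ignore the reserved bit
  };
}

// A HEADERS frame with END_HEADERS (0x4) set, 16 bytes of payload, on stream 1
const header = encodeFrameHeader({ length: 16, type: 'HEADERS', flags: 0x4, streamId: 1 });
console.log(header.toString('hex')); // 000010010400000001
console.log(decodeFrameHeader(header));
```

The stream identifier in each header is what lets frames from different requests be interleaved on one connection and reassembled at the other end.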
“enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients.”
It is important to know that browsers only implement HTTP/2 over HTTPS, so a TLS connection is a must, for which we need certificates and keys (either self-signed using openssl, or signed by a public CA like GoDaddy, VeriSign or Let's Encrypt).
Data Flow
DATA frames are subject to per-stream and connection flow control
Flow control allows the client to pause stream delivery, and resume it later
Compatibility Layer between HTTP1.1 and HTTP2.0 in node
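Node.js > 9 provides http2 as a native module. Example of using http2 with the compatibility layer:
const http2 = require('http2');
const fs = require('fs');
// `file` is assumed to be a static file server created elsewhere,
// e.g. an instance of node-static's Server
const options = {
  key: fs.readFileSync('ssl/key'),   // key contents, not just the path
  cert: fs.readFileSync('ssl/cert')  // certificate contents
};
// The compatibility API passes HTTP/1.1-style req and res objects
const server = http2.createSecureServer(options, (req, res) => {
  req.addListener('end', function () {
    file.serve(req, res);
  }).resume();
});
server.listen(8084);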
This works as a drop-in replacement for an existing http/https server such as:
const https = require('https');
const app = https.createServer(options, function (request, response) {
  request.addListener('end', function () {
    file.serve(request, response);
  }).resume();
});
app.listen(8084);
Socket.io/ Websocket over HTTP2
The WebSocket Protocol uses the HTTP/1.1 Upgrade mechanism to transition a TCP connection from HTTP into a WebSocket connection
Due to its multiplexing nature, HTTP/2 does not allow connection-wide header fields or status codes, such as the Upgrade and Connection request-header fields or the 101 (Switching Protocols) response code. These are all required for opening handshake.
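To see what the HTTP/1.1 opening handshake involves, the sketch below computes the Sec-WebSocket-Accept value that a server returns in its 101 (Switching Protocols) response, following RFC 6455: the client's Sec-WebSocket-Key is concatenated with a fixed GUID, hashed with SHA-1 and base64 encoded. The function name is made up for this example, and the sample key is the one used in the RFC.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function websocketAccept(secWebSocketKey) {
  return crypto.createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// Response lines a server sends after receiving "Upgrade: websocket"
const key = 'dGhlIHNhbXBsZSBub25jZQ==';
const responseLines = [
  'HTTP/1.1 101 Switching Protocols',
  'Upgrade: websocket',
  'Connection: Upgrade',
  `Sec-WebSocket-Accept: ${websocketAccept(key)}`,
];
console.log(responseLines.join('\r\n'));
// last line: Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Every line of this response relies on the Upgrade and Connection headers or the 101 status code, which is exactly what HTTP/2 does not carry.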
Ideally the code should have looked like this with the backward compatibility layer, but continue reading for the update:
var app = http2
  .createSecureServer(options, (req, res) => {
    req.addListener('end', function () {
      file.serve(req, res);
    }).resume();
  })
  .listen(properties.http2Port);
var io = require('socket.io').listen(app);
io.origins('*:*');
io.on('connection', onConnection); // event handler for new connections
Error during WebSocket handshake: Unexpected response code: 403
Update May 2020: I tried using the http2 server with WebSocket as mentioned above. However, after many hours of working on WSS over the HTTP/2 secure server, I consistently kept facing ECONNRESET issues after a couple of seconds, which would crash the server.
Client: 403, server: ECONNRESET
Therefore, leaving the web server to serve HTML content, I reverted the signalling back to HTTPS/1.1, given that the reasons for sticking with WSS are low latency and the existing work that was already put in.
I am reading further and exploring the HTTP CONNECT method for setting up the WS handshake, and will update this section in future if it works.
Streams
A “stream” is an independent, bidirectional sequence of frames exchanged between the client and server within an HTTP/2 connection. A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.
Core http2 module provides new core API (Http2Stream), accessed via a “stream” listener:
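A minimal sketch of that core API, assuming a cleartext (h2c) server on the loopback interface so that the example needs no certificates; a browser-facing server would use http2.createSecureServer with a key and cert instead. The path and response text are made up for this example.

```javascript
const http2 = require('http2');

const server = http2.createServer();

// Each incoming request arrives as an Http2Stream together with its header block
server.on('stream', (stream, headers) => {
  stream.respond({ ':status': 200, 'content-type': 'text/plain' });
  stream.end(`hello ${headers[':path']}`);
});

// Resolves with the response body once one request has made the round trip
const reply = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {   // port 0 picks any free port
    const client = http2.connect(`http://127.0.0.1:${server.address().port}`);
    client.on('error', reject);
    const req = client.request({ ':path': '/demo' });
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      client.close();
      server.close();
      resolve(body);
    });
    req.end();
  });
});

reply.then((body) => console.log(body)); // hello /demo
```

Unlike the compatibility layer, the handler here works with the stream and HTTP/2 pseudo-headers such as ':path' and ':status' directly.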
With the new binary framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required.
Server Push
Bundles multiple assets and resources into a single HTTP/2 connection and lets the server proactively push resources to the client’s cache.
The server issues PUSH_PROMISE and the client validates whether it needs the resource or not. If the client accepts it, the resource then loads like a regular GET call.
The PUSH_PROMISE frame includes a header block that contains a complete set of request header fields that the server attributes to the request.
After sending the PUSH_PROMISE frame, the server can begin delivering the pushed response as a response on a server-initiated stream that uses the promised stream identifier.
The client receives a PUSH_PROMISE frame and can either accept the pushed response or, if it does not wish to receive it, send a RST_STREAM frame, using either the CANCEL or REFUSED_STREAM code and referencing the pushed stream’s identifier.
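The push exchange described above can be sketched with the core http2 API: the server calls pushStream() on the stream that carries the original request, and the client receives the promised stream through its session's 'stream' event. This is an illustrative sketch over cleartext HTTP/2 on the loopback interface; the paths and contents are made up, and a client that did not want the resource could close the pushed stream instead of reading it.

```javascript
const http2 = require('http2');

const server = http2.createServer();
server.on('stream', (stream, headers) => {
  stream.respond({ ':status': 200, 'content-type': 'text/html' });
  if (headers[':path'] === '/index.html' && stream.pushAllowed) {
    // Sends PUSH_PROMISE carrying the request headers the server attributes to the push
    stream.pushStream({ ':path': '/style.css' }, (err, pushStream) => {
      if (err) throw err;
      pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
      pushStream.end('body { margin: 0; }');
    });
  }
  stream.end('<link rel="stylesheet" href="/style.css">');
});

// Resolves with the promised path and the pushed body
const pushed = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const client = http2.connect(`http://127.0.0.1:${server.address().port}`);
    client.on('error', reject);
    // Fired when the server-initiated (pushed) stream arrives
    client.on('stream', (pushedStream, requestHeaders) => {
      let body = '';
      pushedStream.setEncoding('utf8');
      pushedStream.on('data', (chunk) => { body += chunk; });
      pushedStream.on('end', () => {
        client.close();
        server.close();
        resolve({ path: requestHeaders[':path'], body });
      });
    });
    const req = client.request({ ':path': '/index.html' });
    req.resume(); // drain the main response
    req.end();
  });
});

pushed.then((result) => console.log(result));
```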
Push Stream Support
-tbd
respondWithFile() and respondWithFD() APIs can send raw file data that bypasses the Streams API.
Related technologies
MIME
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets.
Email messages + MIME : transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).
MIME in HTTP in WWW : servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated. Browsers typically contain GIF and JPEG image viewers.
MIME header fields
MIME version
MIME-Version: 1.0
Content Type
Content-Type: text/plain
multipart/mixed , text/html, image/jpeg, audio/mp3, video/mp4, and application/msword
content disposition
Content-Disposition: attachment; filename=genome.jpeg;
modification-date="Wed, 12 Feb 1997 16:29:51 -0500";
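Header fields such as Content-Type and Content-Disposition share the same shape: a main value followed by semicolon-separated parameters, some of them quoted. A small parsing sketch, assuming well-formed input and ignoring escaped quotes and encoded parameter values; the function name is made up for this example.

```javascript
function parseHeaderValue(value) {
  const semicolon = value.indexOf(';');
  const type = (semicolon === -1 ? value : value.slice(0, semicolon)).trim().toLowerCase();
  const params = {};
  // name=value pairs, where the value is either "quoted" or a bare token
  const paramPattern = /;\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))/g;
  let match;
  while ((match = paramPattern.exec(value)) !== null) {
    params[match[1].toLowerCase()] = match[2] !== undefined ? match[2] : match[3];
  }
  return { type, params };
}

const disposition = parseHeaderValue(
  'attachment; filename=genome.jpeg; modification-date="Wed, 12 Feb 1997 16:29:51 -0500";'
);
console.log(disposition.type);            // attachment
console.log(disposition.params.filename); // genome.jpeg

console.log(parseHeaderValue('text/plain; charset=utf-8'));
```

Quoted values matter here because the modification date itself contains commas and spaces that would otherwise break a naive split.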