High availiability and Scalibility in VoIP platforms

Load Balancers
- MPLS
Service-discovery
- Keepalive, unregistering unhealthy nodes
Replication
- Data Store Replication
Quick Response / Low latency
Scalability
- autoscalling
- Partitioining
Multiple PoPs (point of presence)
- Minimal Latency and lowest amount of tarffic via public internet
High availiability (HA)
- 5 9’s in aggregate failures
- HA for Load balancer (LB)
- HA for Call Control app server
- Media Server HA
Security against malicious attacks
- MITM
- DDOS
Identifying outages , logs and pcap analysis + alerting
Bottlenecks
- Performance testing
- Robust QA for potential bottlenecks before going live
Distributed Data Store
- message-queues-vs-batch-processing
Distributed-event-management-and-event-driven-architecture-using-streams
Distributed cache for call control Servers
Circuits – fail fast , wait for circuit to recover before calling again

Load Balancers

Load Balancer(LB) is the initial point of interaction between the client application and the core system. It is pivotal in the distribution of the load across multiple servers and ensuring the client is connected to the nearest VoIP/SIP application server to minimize latency. However, the load balancers are also susceptible to security breaches and DOS attacks as they have a public-facing interface. This section lists the protocols, types and algorithms used popularly in Load balancers of VoIP systems.

software LB	Layer 4 / hardware LB
Nginx Amazon ELB ( eleastic load balanecr)	F5 BIG-IP load balancer CISCO system catalyst Barracuda load balancer NetScaler
used by applications in cloud ADN (Application delivery network)	used by network address translators (NATs) DNS load balancing

examples and roles of software and hardware based load balancers

Load Balancers(LB) ping each server for health status and greylists servers that are unhealthy( respond late) as they may be overloaded or experiencing congestion. The LB monitors it rechecks after a while and if a server is healthy ( ie if a server responds with responds with status update) it can resume sending traffic to it. LB should also be distributed to different data centres in primary-secondary setup for HA.

Networking protocol

TCP Loadbalancer	HTTP load balancers	SIP based LB as Kamailio/ Opensips
can forwrad the packet without inspecting the content of the packet.	terminate the connection and look inside the request to make a load balcing decsiion for exmaple by using a cookie or a header.	domain specific to VoIP
(+) fast, can handle million of req per second		(+) handle SIP routing based on SIP headers and prevent flooding atacks and other malicious malformed packets from reaching application server

Load balancing algorithms

Weighted Scheduling Algorithm
Round Robin Algorithm
Least Connection First Scheduling
Lest response time algorithms
Hash based algorithm ( send req based on hashed value such as suing IP address of request URL)

Loadbalancer	Reverse Proxy
forward proxy server allows multiple clients to route traffic to an external server	accepts clients requestd for server and also returns the server’s response to the client ie routes traffic on behalf of multiple servers.
Balances load and incoming traffic endpoint	public facing endpoint for outgoing traffic additional level of abstraction and security, compression
	used in SBC (session border controllers) and gateways

Service Discovery

Client-side or even backend service discovery uses a broadcasting or heartbeat mechanism to keep track of active servers and deactivates unresponsive or failed servers. This process of maintaining active servers helps in faster connection time. Some approaches to Service Discovery

Mesh
1. (-) exponentially incresing network traffic
Gossip
Distributed cache
Coordination service with Service
- (-) requesres coordination service for leader selection
- (-) needs consensus
- (-) RAFT and pbFT for mnaging failures
Random leader selection
- (+) quicker
- (-) may not gurantee one leader
- (-)split brain problem

Keepalive, unregistering unhealthy nodes

Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint.

Replication

Usuallay there is a tradeoff between liveness and safety.

Single leader replication
- (-) vulnerable to loss of data is leader goes down before replication completes
- used to in sql
multileader replication and
leaderless replication
- (-) increases latencies
- (-) quorem based on majority , cannot function is majority node are not down
- used in cassandra

Data Store Replication

For Relatonal Dataabase

For NoSQL databse replication and HA

Quick Response / Low latency

Message format

Textual Message format	Binary Message Format
human readbale like json xml	diff to comprehend , need shared schema between sender and receiver to serilaize and deserialze ,
names for every field adds to size	no field name or only tags , reduces message size

Gateways for faster routing and caching to services

gateways are single entry point to route user requests to backend services .

Separate hot storage from cold storage

hot storage is frquently accessed data which must be near to server

cold storage is less frequently accessed data such as archives

object storage
slow access

Scalability

To make a system :-

scalable : use partitioing
reliable : use replication and checkpointing to not loose data in failures
fast : use in -memory usage

According to CAP theorem Consistency and Availiability are difficult to achieve together and there has to be a tradeof acc to requirnments.

Partitioining

Partition strategy can be based on various ways such as :-

Name based partition
geographic partition
names’ hashed value based on identifier
- (-) can lead to hot partitions ( high density in areas of freq accessible identioers )
- (-) high density spots for example all messages with a null key to go to the same partition
- (-) doesnt scale
event time based hash
- (+) data is spread evenly over time

To create a well distributed partition we could spread hot partition into 2 partitions or dedicate partitions for freq accessible items. An effective partitioning keys uses

Cardinality : total num of unique keys for a usecase. High cardinality leads to better distribution.
- high cardinatility keys : names , email address , url since they have high variatioln
- low cardinatlity keys : boolean flags such as gender M/F
Selectivity : number of message with each key. High selectivity leads to hotspots and hence low selectivity is better for even distribution.

Autoscalling

Scale Out not Up !

Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management used in DevOps. I have mentioned this in detail on the article on VoIP and DevOps below.

VoIP system DevOps, operations and Infrastructure management, Automation

Multiple PoPs (point of presence)

for a VOIP system catering to many clients accross the globe or accessing multiple carriers meant for different counteries based on Prefix matching , there should be alocal PoP in most used regions . typically these regions include – US east – west coasts, UK – germany of London , Asia Pacific – Mumbai ,Hong Kong and Australia.

Minimal Latency and lowest amount of tarffic via public internet

Creating multiple POPs and enabling private traffic via VPN in between them ensures that we use the backbone of our cloud provider such as AWS or datacentre instead of traversing via public internet which is slower and more insecure. Hopping on a private interface between the cloud server and maintaining a private connection and keepalive between them helps optimize the traffic flow while keeping the RTT and latency low.

HA ( High-availability )

Some factors affecting Dependability are

Eventual Consistency
MultiRegion failover
Disaster Recovery

A high-availability (HA) architecture implies Dependability.Usually via existence of redundant applications servers for backups: a primary and a standby. These applications are configured so that if primary fails, the other can take over its operations without significant loss of data or impact to business operations.

Downtime / SLA of 5 9’s in aggregate failures

4 9’s of availiability on each service components gives a downtime of 53 mins per service each year. However in aggregate failure this could amlount to (99.99)¹⁰ = 99.9 downtime which is 8-10 hours each year.

Thus, aggregate failure should be taken into consideration while designing reliable systems.

HA for Proxy / Load balancer (LB)

A LB is the first point of contact for outbound calls and usually does not save the dialogue information into memory or database but still contain the transaction information in memory. In case the LB crashes and has to restart, it should

have a quick uptime
be able to handle in dialogue requests
handle new incoming dialogue requests in a stateless manner
verify auth/authorization details from requests even after restart

HA for Call Control app server

App server is where all the business logic for call flow management resides and it maintains the dialog information in memory.

Issues with in-memory call states : If the VM or server hosting the call control app server is down or disconnected, then live calls are affected, this, in turn, causes revenue loss. Primarily since the state variable holding the call duration would be able to pass onto the CDR/ billing service upon the termination of the call. For long-distance, multi telco endpoint calls running hours this could be a significant loss.

Standby app server configuration and shared memory : If the primary app server crashes the standby app server should be ready to take its place and reads the dialog states from the shared memory.
Live load balanced secondary app server + external cache for state varaibles : External cache for state variables: a cluster of master-slave caches like Redis is a good way of maintaining the dialogue state and reading from it once the app server recovers from a failed state or when a secondary server figures it has a missing variable in local memory.

Media Server HA

Assuming the kamailio-RTPengine duo as App server and Media Server. These components can reside in same or different VMs. Incase of media server crash, during the process of restoring restarted RTpengine or assigning a secondary backup RTpengine , it should load the state of all live calls without dropping any and causing loss of revenue . This is achived by

external cache such as Redis ,
quick switchover from primary to secondary/fallback media server and
floating IPs for media servers that ensures call continuity inspite of failure on active media server.

Architecturally it looks the same as fig above on HA for the SIP app server.

Security against malicious attacks

Attacks and security compromisation pose a very signficant threat to a VoIP platform.

MITM attacks

Man in midddle attacks can be counetred by

End to end encryption of media using SRTP and signals using TLS
Strong SIP auth mechanism using challenges and creds where password is composed of mixed alphanumeric charecters and atleast 12 digits long
Authorization / whitelisting based on IP which adheres to CIDR notation

DDOS attacks

DDOS renders a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.

dDOS – multiple network hosts to flood a target host with a large amount of network traffic. Can be created by sending falsified sip requests to other parties such that numerous transactions originating in the backwards direction comes to the target server created congestion.

Can be counetred by

detect flooding and q in traffic and use Fail2ban to block
challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required)

Read about SIP security practices in deatils https://telecom.altanai.com/2020/04/12/sip-security/

Attacks on SIP Networks

Other important factors leading to security

Keystores and certificate expiry tracker
priveligges and roles
Test cases and code coverage
Reviewers approval before code merge
Window for QA setup and testing , to give go ahead before deployment

Identifying outages and Alerting

Raise Event notification alerts to designated developers for any anolous behavior. It could be call based or SMS basef alert based on the sevirity of the situtaion .

Logging and Alerting for a VoIP CPaaS platform .
Raise Event notification alerts to designated developers for any anolous behavior. It could be call based or SMS basef alert based on the sevirity of the situtaion.

Sources for alert manager

Build failed ( code crashes, Jenkins error)
Deployment failed ( from Kubernetes , codechef, docker ..)
configuration errors ( setting VPN etc )
Server logs
Server health
homer alerts ( SIP calls responses 4xx,5xx,6xx)
PCAP alerts ( Malformed SIP SDP ..)
Internal Smoke test ( auto testing procedure done routinely to check live systems )
Support tickets from customer complaints ( treat these as high priority since they are directly impacting customers)

Bottlenecks

The test bed and QA framework play a very crticial role in final product’s credibility and quality.

Performance Testing

Stress Testing : take to breaking
Load Testing : 2x to 3x testing
Soak Testing : typical network load to long time ( identify leaks )

Robust QA framework( stress and monkey testing) to identify potential bottlenecks before going live

A QA framework basically validates the services and callflows on staging envrionment before pushing changes to production. Any architectural changes should especially be validated throughly on staginng QA framework befire making the cut. The qualities of an efficient QA platform are :

Genric nature – QA framework should be adatable to different envrionments such as dev , staging , prod

Containerized – it should be easy to spn the QA env to do large scale or small scale testing and hence it should be dockerized

CICD Integration and Automation – integrate the testcases tightly with gt post push and pull request creation . Minimal Latency and lowest amount of tarffic via public internet

Keep as less external dependecies as possible for exmaple a telecom carrier can be simulated by using an PBX like freeswitch or asterix

Asynchronous Run – Test cases should be able to run asynchronously. Such as seprate sipp xml script for reach usecase

Sample Testcases for VoIP

Authentication before establish a session
Balance and account check before establishing a session like whitelisting , blacklisting , restricted permission in a particular geography
Transport security and adaptibility checks , TLS , UDP , TCP
codec support validation
DTMF and detection
Cross checking CDR values with actual call initiator and terminator party
cross checking call uuid and stats
Validating for media and related timeouts

QA frameworks tools – Robot framework

traffic monitor – VOIP monitor

customer simulator – sipP scripts

network traffic analyser – wireshark

pcap collevcter – tcpdump , sngrep

Distributed Data Store

A Distributed Database Design could have many components. It could work on static datastore like

SQL DB where schema is important
- MySQL
- postgress
- Spanner – Globally-distributed database from Google
NoSQL DB for to store records in json
- Cassandra – Distributed column-oriented database
Cache for low latency retrivals
- Memcached – Distributed memory caching system
- Redis – Distributed memory caching system with persistence and value types
Data lakes for heavy sized data
- AWS s3 object storage
- blob storage
File System
- Google File System (GFS) – Distributed file system
- Hadoop File System (HDFS) – Open source Distributed file system

or work on realtime data streams

Batch processing ( Hadoop Mapreduce)
Stream processing ( Kafka + spark)
- Kafka – Pub/sub message queue
Cloud native stream processing ( kinesis)

Each component has its own pros and cons. The choice depends on requirnments and scope for system behaviour like

users/customer usuage and expectation ,
Scale ( read and write )
Performnace
Cost

Users/customers	Scale ( read / write)	Performance	Cost
Who uses the system ? How the system will be used?	Read / writes per second ? Size of data per request ? cps ( calls or click per second) ?	write to read delay ? p99 latency for read querries ?	should design minimize the cost of development ? should design mikn ize the cost of mantainance ?
	spikes in traffic	eventual consistency ( prefer quick stale data ) as compared to no data at all
	redundancy for failure management

Some fundamental constrains while design distributed data structure :-

p99 latency : 99% of the requests should be faster than given latency. In other words only 1% of the requests are allowed to be slower.

Request latency:
    min: 0.1
    max: 7.2
    median: 0.2
    p95: 0.5
    p99: 1.3

Inidiviual Events vs Aggregate Data

Inidividual Events ( like every click or every call metric)	Aggregate Data ( clicks per minute, outgoing calls per minute)
(+) fast write (+) can customize/ recalculate data from raw	(+) faster reads (+) data is fready for decision making / statistics
(-) slow reads (-) costlier for large scale implementations ( many events )	(-) can only query in the data as was aggregates ( no raw ) (-) requires data aggregation pipeline (-) hard to fix errors
suitable for realtime / data on fly low expected data delay ( minutes )	suitable for batch processing in background where delay is acceptable from mintes to hours

Push vs Pull Architecture

Push : A processing server manages state of varaible in memory and pushes them to data store.

(-) crashed processingserver means all data is lost

Pull : A temporary data strcyture such as a queue manages the stream of data and processing service pull from it to process before pusging to data stoore.

(+) a crashed server has to effect on temporarily queue held data and new server can simply take on where previous processing server left.
(+) can use checkpointing

Popular DB storage technologies

SQL	NoSQL
Structured and Strict schema Relational data with joins	Semi-structured data Dynamic or flexible schema
(+) faster lookup by index	(-) data intensive workload (+) high throughput for IOPS (Input/output operations per second )
used for Account information transactions	best suitable for Rapid ingest of clickstream and log data Leaderboard or scoring data Metadata/lookup tables
	DynamoDB – Document-oriented database from Amazon MongoDB – Document-oriented database

A NoSQL databse can be of type

Quorem
Document
Key value
Graph

Cassandra is wide column supports asyn master less replication

Hinge base also a quorem based db also has master based preplication

MongoDB documente orientd DB used leacder based replication

SQL scaling patterns include:

Federation/ federated database system : transparently maps multiple autonomous database systems into a single virtual/federated database.
- (-) slow since it access multiple data storages to get the value
Sharding / horizontal partition
Denormalization : Even though normalization is more memory efficient denormalization can enhance read performance by additing redundant pre computed data in db or grouping related data.
- Normalizing data reduces data warehouse disk space by reducing data duplication and dimension cardinality. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.

SQL Tuning : “iterative process of improving SQL statement performance to meet specific, measurable, and achievable goals”

Influx DB : to store time series data

AWS Redshift

Apache Hadoop

Redis

Embeed Data : RocksDB

Message Queues(Buffering) vs Batch Processing

Distributed event management, monitoring and working on incoming realtime data instead of stored Database is the preferred way to churn realtime analysis and updates. The multiple ways to handle incoming data are

Batch processing – has lags to produce results, not time crtical
Data stream – realtime response
Message Queues – ensures timely sequence and order

Buffering	Batching
Add events to buffer that can be read	Add events to batch and send when batch is full
(+) can handle each event	(+) cost effective (+) ensures throughput (-) if some events in batch fail should whole batch fail ? (-) not suited for real time processing
	S3 like objects storage + Hadoop Mapreduce for processing

Timeout

Connection timeout : use latency percentiles to calculate this
Request timeout

Retries

exponential backoff : increase waiting time each try
jitter : adds rabdomness to retry intervals to spread out the load.

Grouping events into object storage and Message Brokers

slower than stream processing but faster than batch processing.

Distributed Event management and Event Driven architecture using streams

In event driven archietcture a produce components performs and action which creates an event thata consumer/listener would subscribes to consume.

(+) time sensitive
(+)Asynch
(+) Decoupled
(+) Easy scaling and Elasticity
(+) Heterogeneous
(+) contginious

Expanding the stream pipeline

Event Streams decouple the source and sink applications. The event source and event sinks (such as webhooks) can asynchronously communicate with each other through events.

Options for stream processing architectures

Apache Kafka
Apache Spark
Amazon kinesis
Google Cloud Data Flow
Spring Cloud Data Flow

Here is a post from earlier which discusses – Scalable and Flexible SIP platform building, Multi geography Scaled via Universal Router, Cluster SIP telephony Server for High Availability, Failure Recovery, Multi-tier cluster architecture, Role Abstraction / Micro-Service based architecture, Load Balancer / Message Dispatcher, Back end Dynamic Routing and REST API services, Containerization and Auto Deployment, Auto scaling Cloud Servers using containerized images.

VoIP/ OTT / Telecom Solution startup’s strategy for building a scalable flexible SIP platform

Lambda Architecture

Stream processing on top of map reduce and stream processing engine. In lambda architecture we can send events to batch system and stream processing system in parallel. The results are stiched together at query time.

Lambda Archietcture : stream processing on top of map reduce and stream processing engine. Send events to batch system and stream processing system in parallel. The results are stiched together at query time.

Apache Kafka is used as source which is a framework implementation of a software bus using stream-processing. “.. high-throughput, low-latency platform for handling real-time data feeds”.

Apache Spark : Data partitioning and in memory aggregation.

Distributed cache for call control Servers

Dedicated Cache Cluster	Co located cache
Isolates cache fro service Cache and service do not share memory and CPU can scale independently can be used by many microservices flexibility in choosing hardware	doesnt require seprate hardware low operational and hardware cost scales together with the service

Choosing cache host

Mod function
- (-) behaves differently when a new client is added or one is removed , unsuitable for prod
Consistent hashing ( chord)
- maps each value to a point on circle

Isolated thread pool in circuits and ensure full recovery before calling the service again.

(+) Circuit breaker event causes the entire circuit to repair itself before attempting operations.

References :

[1] http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html
[2] http://engineering.hackerearth.com/2013/10/07/scaling-database-with-django-and-haproxy/
[3] Mastering Chaos – A Netflix Guide to Microservices , QCon London
[4] https://dzone.com/refcardz/event-stream-processing-essentials
[5] p99 latency https://stackoverflow.com/questions/12808934/what-is-p99-latency
[6] Oracle site on partitioing- https://docs.oracle.com/en-us/iaas/Content/Streaming/Concepts/partitioningastream.htm
[7] Mastering Chaos – A Netflix Guide to Microservices https://www.youtube.com/watch?v=CZ3wIuvmHeM&ab_channel=InfoQ
[8] Netflix cloud architecture https://netflixtechblog.com/tagged/cloud-architecture

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

	Boris Ivanov on Asterisk – installation…
	Paras Kumar on Hosted IP-PBX and SBC
	altanai on Hosted IP-PBX and SBC
	Debra Olsen on Streaming / broadcasting Live…
	Things to know about… on WebRTC
	Hugo K on FreeSwitch SIP and Media …
	Bert H on Evolution of voice Commun…

High availiability and Scalibility in VoIP platforms

Load Balancers

Service Discovery

Keepalive, unregistering unhealthy nodes

Replication

Data Store Replication

Quick Response / Low latency

Message format

Gateways for faster routing and caching to services

Separate hot storage from cold storage

Scalability

Partitioining

Autoscalling

Multiple PoPs (point of presence)

Minimal Latency and lowest amount of tarffic via public internet

HA ( High-availability )

Downtime / SLA of 5 9’s in aggregate failures

HA for Proxy / Load balancer (LB)

HA for Call Control app server

Media Server HA

Security against malicious attacks

Identifying outages and Alerting

Bottlenecks

Performance Testing

Robust QA framework( stress and monkey testing) to identify potential bottlenecks before going live

Distributed Data Store

Popular DB storage technologies

Message Queues(Buffering) vs Batch Processing

Distributed Event management and Event Driven architecture using streams

Lambda Architecture

Distributed cache for call control Servers

Choosing cache host

Cache Replacement

Consistency and High Availiability in Cache setup

Circuits – fail fast, wait for circuit to recover before using again

Leave a comment Cancel reply

Data Store Replication

Message format

Gateways for faster routing and caching to services

Separate hot storage from cold storage

Minimal Latency and lowest amount of tarffic via public internet

Downtime / SLA of 5 9’s in aggregate failures

HA for Proxy / Load balancer (LB)

HA for Call Control app server

Media Server HA

Performance Testing

Popular DB storage technologies

Share this:

Related

Leave a comment Cancel reply