High Availability and Scalability in VoIP Platforms


Load Balancers

A Load Balancer (LB) is the initial point of interaction between the client application and the core system. It is pivotal in distributing load across multiple servers and in ensuring the client is connected to the nearest VoIP/SIP application server to minimize latency. However, load balancers are also susceptible to security breaches and DoS attacks, since they expose a public-facing interface. This section lists the protocols, types and algorithms popularly used in load balancers of VoIP systems.

Software LB:
  • Nginx
  • Amazon ELB (Elastic Load Balancer)
  • DNS load balancing
  • used by applications in the cloud

Layer 4 / hardware LB:
  • F5 BIG-IP load balancer
  • Cisco Systems Catalyst
  • Barracuda load balancer
  • NetScaler
  • used in ADNs (Application Delivery Networks) and by network address translators (NATs)

Examples and roles of software- and hardware-based load balancers.

Load Balancers (LB) ping each server for health status and greylist servers that are unhealthy (i.e., respond late), as they may be overloaded or experiencing congestion. The LB rechecks a greylisted server after a while and, if the server is healthy again (i.e., it responds with a status update), resumes sending traffic to it. LBs should also be distributed across different data centres in a primary-secondary setup for HA.
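
A minimal sketch of this check-greylist-recheck loop (the server addresses, /health endpoint and timing constants are illustrative assumptions, not from any particular LB):

```python
# Health-check sketch: probe each server, greylist late/unresponsive ones,
# and re-probe greylisted servers so recovered ones resume receiving traffic.
import time
import urllib.request

SERVERS = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # hypothetical app servers
TIMEOUT_S = 2         # a reply slower than this counts as unhealthy
RECHECK_AFTER_S = 30  # greylisted servers are re-probed after this interval

healthy, greylist = set(SERVERS), {}  # greylist maps server -> time it was greylisted

def probe(server: str) -> bool:
    """Return True if the server answers its health endpoint in time."""
    try:
        with urllib.request.urlopen(f"{server}/health", timeout=TIMEOUT_S) as resp:
            return resp.status == 200
    except Exception:
        return False

def health_check_pass() -> None:
    for server in list(healthy):
        if not probe(server):            # late or failed reply -> greylist it
            healthy.discard(server)
            greylist[server] = time.time()
    for server, since in list(greylist.items()):
        if time.time() - since >= RECHECK_AFTER_S and probe(server):
            del greylist[server]         # recovered -> resume sending traffic
            healthy.add(server)
```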

Networking protocol

TCP load balancer:
  • can forward the packet without inspecting its content
  • (+) fast, can handle millions of requests per second

HTTP load balancer:
  • terminates the connection and looks inside the request to make a load-balancing decision, for example by using a cookie or a header

SIP-based LB such as Kamailio / OpenSIPS:
  • domain specific to VoIP
  • (+) handles SIP routing based on SIP headers and prevents flooding attacks and other malicious, malformed packets from reaching the application server

Load balancing algorithms

  • Weighted Scheduling Algorithm
  • Round Robin Algorithm
  • Least Connection First Scheduling
  • Least Response Time Algorithm
  • Hash-based algorithm (send requests based on a hashed value, such as the IP address or request URL)
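
As a sketch, here is what two of these could look like, weighted round robin and hash-based stickiness (the server names and weights are illustrative assumptions):

```python
import hashlib
from itertools import cycle

# Weighted round robin: a server with weight 3 appears 3x in the rotation.
weights = {"sip1.example.com": 3, "sip2.example.com": 1}
rotation = cycle([s for s, w in weights.items() for _ in range(w)])
next_server = lambda: next(rotation)

# Hash-based: the same caller IP (or Request-URI) always lands on the same
# server, which keeps in-dialog SIP requests sticky to one node.
servers = sorted(weights)
def pick_by_hash(key: str) -> str:
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```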
Load balancer:
  • balances load; the endpoint for incoming traffic
  • (a forward proxy, by contrast, allows multiple clients to route traffic to an external server)

Reverse proxy:
  • accepts client requests on behalf of servers and returns the server's response to the client, i.e., routes traffic on behalf of multiple servers
  • public-facing endpoint for outgoing traffic
  • provides an additional level of abstraction and security, and compression
  • used in SBCs (session border controllers) and gateways

Service Discovery

Client-side or even backend service discovery uses a broadcasting or heartbeat mechanism to keep track of active servers and deactivates unresponsive or failed ones. Maintaining this list of active servers helps achieve faster connection times. Some approaches to service discovery:

  1. Mesh
    • (-) exponentially increasing network traffic
  2. Gossip
  3. Distributed cache
  4. Coordination service with leader election
    • (-) requires a coordination service for leader election
    • (-) needs consensus
    • (-) Raft and pBFT for managing failures
  5. Random leader selection
    • (+) quicker
    • (-) may not guarantee a single leader
    • (-) split-brain problem

Keepalive, unregistering unhealthy nodes

Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint.
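
For illustration, a service could register itself in Consul with an HTTP health check via Consul's agent API; a minimal sketch, assuming a local Consul agent on port 8500 and made-up service details:

```python
import requests

service = {
    "Name": "sip-app-server",
    "Address": "10.0.0.5",   # illustrative address and ports
    "Port": 5060,
    "Check": {
        # Consul polls this endpoint; failures mark the node unhealthy so it
        # drops out of discovery results (i.e., is unregistered from rotation).
        "HTTP": "http://10.0.0.5:8080/health",
        "Interval": "10s",
        "DeregisterCriticalServiceAfter": "1m",
    },
}
requests.put("http://localhost:8500/v1/agent/service/register", json=service)
```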

Replication

Usually there is a tradeoff between liveness and safety.

  1. Single-leader replication
    • (-) vulnerable to data loss if the leader goes down before replication completes
    • used in SQL databases
  2. Multi-leader replication
  3. Leaderless replication
    • (-) increases latencies
    • (-) quorum based on majority; cannot function if a majority of nodes are down
    • used in Cassandra

Data Store Replication

For relational database replication and HA

For NoSQL database replication and HA

Quick Response / Low latency

Message format

Textual message format:
  • human readable, e.g. JSON, XML
  • names for every field add to the message size

Binary message format:
  • difficult to comprehend; needs a shared schema between sender and receiver to serialize and deserialize
  • no field names, or only tags, which reduces message size
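
The size difference is easy to demonstrate with a toy example, using Python's struct module as a stand-in for a real binary format such as Protocol Buffers (the event fields are made up):

```python
import json
import struct

event = {"call_id": 12345, "duration_ms": 61250, "status": 200}

# Textual: field names travel with every message.
textual = json.dumps(event).encode()

# Binary: field order/types are a schema shared out of band; only values travel.
binary = struct.pack("!IIH", 12345, 61250, 200)

print(len(textual), len(binary))  # e.g. ~55 bytes vs 10 bytes
```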

Gateways for faster routing and caching to services

Gateways are a single entry point that routes user requests to backend services.

Separate hot storage from cold storage

Hot storage holds frequently accessed data, which must be kept near the server.

Cold storage holds less frequently accessed data, such as archives:

  • object storage
  • slow access

Scalability

To make a system:

  • scalable: use partitioning
  • reliable: use replication and checkpointing so data is not lost in failures
  • fast: use in-memory storage

According to the CAP theorem, Consistency and Availability are difficult to achieve together, and there has to be a trade-off according to requirements.

Partitioning

A partition strategy can be based on various keys, such as:

  • name-based partition
  • geographic partition
  • hashed value of a name/identifier
    • (-) can lead to hot partitions (high density in areas of frequently accessed identifiers)
    • (-) high-density spots, for example all messages with a null key going to the same partition
    • (-) doesn't scale
  • event-time-based hash
    • (+) data is spread evenly over time

To create a well-distributed partitioning scheme, we could spread a hot partition into 2 partitions or dedicate partitions to frequently accessed items. An effective partitioning key has:

  • Cardinality: the total number of unique keys for a use case. High cardinality leads to better distribution.
    • high-cardinality keys: names, email addresses, URLs, since they have high variation
    • low-cardinality keys: boolean flags such as gender M/F
  • Selectivity: the number of messages with each key. High selectivity leads to hotspots, hence low selectivity is better for even distribution.
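
A small sketch illustrating both properties (the keys and partition count are made up):

```python
# Hashing a high-cardinality key spreads load evenly; a low-cardinality key
# (e.g., a boolean-ish flag) funnels everything into very few partitions.
import hashlib
from collections import Counter

NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

# High cardinality: per-user email -> even spread across all 8 partitions.
emails = [f"user{i}@example.com" for i in range(10_000)]
print(Counter(partition_for(e) for e in emails))

# Low cardinality: only 2 distinct keys -> at most 2 partitions ever used.
print(Counter(partition_for(flag) for flag in ["M", "F"] * 5_000))
```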

Autoscaling

Scale out, not up!

Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management, widely used in DevOps. I have covered this in detail in the article on VoIP and DevOps below.

Multiple PoPs (points of presence)

For a VoIP system catering to many clients across the globe, or accessing multiple carriers meant for different countries based on prefix matching, there should be a local PoP in the most-used regions. Typically these regions include the US East and West coasts, Europe (UK/London, Germany), Asia Pacific (Mumbai, Hong Kong) and Australia.

Minimal latency and the lowest amount of traffic via the public internet

Creating multiple PoPs and enabling private traffic via VPN between them ensures that we use the backbone of our cloud provider, such as AWS, or our datacentre, instead of traversing the public internet, which is slower and less secure. Hopping on a private interface between the cloud servers, and maintaining a private connection and keepalive between them, helps optimize the traffic flow while keeping the RTT and latency low.

HA (High Availability)

Some factors affecting Dependability are

  • Eventual Consistency
  • Multi-region failover
  • Disaster Recovery

A high-availability (HA) architecture implies dependability, usually via redundant application servers for backup: a primary and a standby. These are configured so that if the primary fails, the standby can take over its operations without significant loss of data or impact to business operations.

Downtime / SLA of 5 9’s in aggregate failures

Four 9's of availability (99.99%) on each service component gives a downtime of about 53 minutes per service each year. However, with 10 such components in series, the aggregate availability drops to (0.9999)^10 ≈ 99.9%, a downtime of 8-10 hours each year.
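
A quick back-of-the-envelope check of that arithmetic:

```python
# Availability compounds multiplicatively across components in series.
per_component = 0.9999                  # four 9's per component
aggregate = per_component ** 10         # 10 components in series
minutes_per_year = 365 * 24 * 60

print(f"aggregate availability: {aggregate:.4%}")                                  # ~99.90%
print(f"downtime per component: {(1 - per_component) * minutes_per_year:.0f} min/yr")  # ~53
print(f"aggregate downtime:     {(1 - aggregate) * minutes_per_year / 60:.1f} h/yr")   # ~8.8
```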

Thus, aggregate failure should be taken into consideration while designing reliable systems.

HA for Proxy / Load balancer (LB)

An LB is the first point of contact for outbound calls and usually does not save dialog information to memory or a database, but it still holds transaction information in memory. In case the LB crashes and has to restart, it should:

  • have a quick uptime
  • be able to handle in-dialog requests
  • handle new incoming dialog requests in a stateless manner
  • verify auth/authorization details from requests even after restart

HA for Call Control app server

The app server is where all the business logic for call-flow management resides, and it maintains the dialog information in memory.

Issues with in-memory call state: if the VM or server hosting the call control app server goes down or is disconnected, live calls are affected, which in turn causes revenue loss, primarily because the state variable holding the call duration would not be passed on to the CDR/billing service upon termination of the call. For long-distance, multi-telco-endpoint calls running for hours, this could be a significant loss.

  • Standby app server configuration and shared memory: if the primary app server crashes, the standby app server should be ready to take its place and read the dialog states from the shared memory.
  • Live load-balanced secondary app server + external cache for state variables: a cluster of master-slave caches like Redis is a good way of maintaining the dialog state and reading from it once the app server recovers from a failed state, or when a secondary server finds it has a missing variable in local memory.
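
A sketch of that write-through pattern using the redis-py client (the host name, key layout and stored fields are illustrative assumptions; a real app server would persist full dialog/transaction state):

```python
import redis

r = redis.Redis(host="redis-master.internal", port=6379)

def save_dialog(call_id: str, state: dict) -> None:
    # Write-through on every state change so a standby can recover live calls.
    r.hset(f"dialog:{call_id}", mapping=state)
    r.expire(f"dialog:{call_id}", 24 * 3600)  # safety TTL for abandoned dialogs

def recover_dialog(call_id: str) -> dict:
    # Called by the standby (or restarted primary) when a variable is missing
    # from local memory.
    raw = r.hgetall(f"dialog:{call_id}")
    return {k.decode(): v.decode() for k, v in raw.items()}

save_dialog("a84b4c76e66710", {"state": "confirmed", "start_ts": "1700000000"})
```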

Media Server HA

Assume the Kamailio-RTPEngine duo as app server and media server. These components can reside in the same or in different VMs. In case of a media server crash, the process of restoring the restarted RTPEngine, or of assigning a secondary backup RTPEngine, should load the state of all live calls without dropping any and causing loss of revenue. This is achieved by:

  • an external cache such as Redis,
  • quick switchover from the primary to a secondary/fallback media server, and
  • floating IPs for media servers, which ensure call continuity in spite of failure on the active media server.

Architecturally it looks the same as the figure above on HA for the SIP app server.

Security against malicious attacks

Attacks and security compromises pose a very significant threat to a VoIP platform.

MITM attacks

Man-in-the-middle attacks can be countered by:

  • end-to-end encryption of media using SRTP and of signalling using TLS
  • a strong SIP auth mechanism using challenges and credentials, where the password is composed of mixed alphanumeric characters and is at least 12 characters long
  • authorization / whitelisting based on IP, adhering to CIDR notation

DDoS attacks

A DoS attack renders a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.

A DDoS attack uses multiple network hosts to flood a target host with a large amount of network traffic. It can be created by sending falsified SIP requests to other parties such that numerous transactions originating in the backwards direction converge on the target server, creating congestion.

It can be countered by:

  • detecting flooding and queue build-up in traffic, and using Fail2ban to block offenders
  • challenging questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required)
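
As a sketch, flood detection can be as simple as a sliding window of request timestamps per source IP (the thresholds here are arbitrary; in production this role is typically played by tools such as Kamailio's pike module feeding Fail2ban):

```python
import time
from collections import defaultdict, deque

WINDOW_S, MAX_REQS = 1.0, 50   # more than 50 requests/second from one IP = flood
recent = defaultdict(deque)    # source IP -> timestamps of recent requests

def is_flooding(src_ip: str) -> bool:
    now = time.time()
    q = recent[src_ip]
    q.append(now)
    while q and now - q[0] > WINDOW_S:   # drop timestamps outside the window
        q.popleft()
    return len(q) > MAX_REQS             # True -> log/block (e.g., via Fail2ban)
```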

Read about SIP security practices in detail at https://telecom.altanai.com/2020/04/12/sip-security/

Other important factors leading to security

  • Keystores and a certificate-expiry tracker
  • Privileges and roles
  • Test cases and code coverage
  • Reviewer approval before code merge
  • A window for QA setup and testing, to give the go-ahead before deployment

Identifying outages and Alerting

Raise event notification alerts to designated developers for any anomalous behaviour. It could be a call-based or SMS-based alert, depending on the severity of the situation.

Logging and alerting for a VoIP CPaaS platform.

Sources for alert manager

  • Build failed (code crashes, Jenkins errors)
  • Deployment failed (from Kubernetes, codechef, Docker, ...)
  • Configuration errors (setting up VPN etc.)
  • Server logs
  • Server health
  • Homer alerts (SIP call responses 4xx, 5xx, 6xx)
  • PCAP alerts (malformed SIP SDP, ...)
  • Internal smoke tests (auto testing procedure run routinely to check live systems)
  • Support tickets from customer complaints (treat these as high priority since they directly impact customers)

Bottlenecks

The test bed and QA framework play a very critical role in the final product's credibility and quality.

Performance Testing

  • Stress testing: push the system to its breaking point
  • Load testing: test at 2x to 3x the expected load
  • Soak testing: run a typical network load for a long time (to identify leaks)

Robust QA framework (stress and monkey testing) to identify potential bottlenecks before going live

A QA framework basically validates the services and call flows in a staging environment before changes are pushed to production. Architectural changes especially should be validated thoroughly on the staging QA framework before making the cut. The qualities of an efficient QA platform are:

Generic nature – the QA framework should be adaptable to different environments such as dev, staging and prod.

Containerized – it should be easy to spin up the QA environment for large- or small-scale testing, hence it should be dockerized.

CI/CD integration and automation – integrate the test cases tightly with git post-push and pull-request creation.

Minimal external dependencies – keep as few external dependencies as possible; for example, a telecom carrier can be simulated using a PBX like FreeSWITCH or Asterisk.

Asynchronous runs – test cases should be able to run asynchronously, e.g. a separate SIPp XML script for each use case.

Sample Testcases for VoIP

  • Authentication before establishing a session
  • Balance and account checks before establishing a session, like whitelisting, blacklisting, restricted permission in a particular geography
  • Transport security and adaptability checks: TLS, UDP, TCP
  • Codec support validation
  • DTMF and its detection
  • Cross-checking CDR values against the actual call initiator and terminator parties
  • Cross-checking call UUIDs and stats
  • Validating media and related timeouts

  • QA framework tools – Robot Framework
  • Traffic monitor – VoIPmonitor
  • Customer simulator – SIPp scripts
  • Network traffic analyser – Wireshark
  • PCAP collectors – tcpdump, sngrep

Distributed Data Store

A distributed database design could have many components. It could work on a static datastore like:

  • SQL DB, where schema is important
    • MySQL
    • PostgreSQL
    • Spanner – globally-distributed database from Google
  • NoSQL DB, to store records as JSON
    • Cassandra – distributed column-oriented database
  • Cache, for low-latency retrievals
    • Memcached – distributed memory caching system
    • Redis – distributed memory caching system with persistence and value types
  • Data lakes, for large data
    • AWS S3 object storage
    • blob storage
  • File system
    • Google File System (GFS) – distributed file system
    • Hadoop Distributed File System (HDFS) – open-source distributed file system

or work on realtime data streams

  • Batch processing (Hadoop MapReduce)
  • Stream processing (Kafka + Spark)
    • Kafka – pub/sub message queue
  • Cloud-native stream processing (Kinesis)

Each component has its own pros and cons. The choice depends on the requirements and scope of system behaviour, like:

  • users/customer usage and expectations
  • scale (read and write)
  • performance
  • cost
Users/customers:
  • Who uses the system?
  • How will the system be used?

Scale (read/write):
  • Reads/writes per second?
  • Size of data per request?
  • CPS (calls or clicks per second)?
  • Spikes in traffic?

Performance:
  • Write-to-read delay?
  • p99 latency for read queries?
  • Eventual consistency (prefer quick stale data over no data at all)?

Cost:
  • Should the design minimize the cost of development?
  • Should the design minimize the cost of maintenance?
  • Redundancy for failure management?

Some fundamental constraints while designing a distributed data store:

p99 latency: 99% of requests should be faster than the given latency. In other words, only 1% of requests are allowed to be slower.

Request latency:
    min: 0.1
    max: 7.2
    median: 0.2
    p95: 0.5
    p99: 1.3
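
Such a summary can be computed from raw latency samples, for example (the sample values here are illustrative, not the ones in the table above):

```python
import statistics

samples = [0.1, 0.2, 0.2, 0.3, 0.5, 0.5, 0.9, 1.3, 2.0, 7.2]  # latencies

cuts = statistics.quantiles(samples, n=100)  # 99 cut points between percentiles

print("min:", min(samples), "max:", max(samples))
print("median:", statistics.median(samples))
print("p95:", cuts[94])   # 95th percentile
print("p99:", cuts[98])   # 99% of requests are faster than this
```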

Individual Events vs Aggregate Data

Individual events (like every click or every call metric):
  • (+) fast writes
  • (+) can customize/recalculate data from the raw events
  • (-) slow reads
  • (-) costlier for large-scale implementations (many events)
  • suitable for realtime / data on the fly, with low expected data delay (minutes)

Aggregate data (clicks per minute, outgoing calls per minute):
  • (+) faster reads
  • (+) data is ready for decision making / statistics
  • (-) can only query the data as it was aggregated (no raw events)
  • (-) requires a data aggregation pipeline
  • (-) hard to fix errors
  • suitable for batch processing in the background, where a delay from minutes to hours is acceptable

Push vs Pull Architecture

Push: a processing server manages the state of variables in memory and pushes them to the data store.

  • (-) a crashed processing server means all data is lost

Pull: a temporary data structure such as a queue manages the stream of data, and the processing service pulls from it to process before pushing to the data store.

  • (+) a crashed server has no effect on the data temporarily held in the queue, and a new server can simply take over where the previous processing server left off
  • (+) can use checkpointing
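
A sketch of the pull model with checkpointing (the queue, store and checkpoint interfaces are illustrative stand-ins rather than a specific library):

```python
# Pull model: the queue (not the worker) holds in-flight data, and the worker
# checkpoints its offset only after a durable write, so a crashed worker loses
# nothing and its replacement resumes where it left off.
def run_worker(queue, store, checkpoint):
    offset = checkpoint.load()                     # resume from the last checkpoint
    while True:
        batch = queue.read(offset, max_items=100)  # pull instead of being pushed to
        if not batch:
            continue
        store.write(batch)                         # persist/process first ...
        offset += len(batch)
        checkpoint.save(offset)                    # ... then advance the checkpoint
```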
SQL:
  • structured and strict schema
  • relational data with joins
  • (+) faster lookup by index
  • (-) data-intensive workloads
  • used for account information, transactions

NoSQL:
  • semi-structured data
  • dynamic or flexible schema
  • (+) high throughput for IOPS (input/output operations per second)
  • best suited for rapid ingest of clickstream and log data, leaderboard or scoring data, and metadata/lookup tables
  • examples: DynamoDB (document-oriented database from Amazon), MongoDB (document-oriented database)

A NoSQL database can be of type:

  • Column (wide column)
  • Document
  • Key-value
  • Graph

Cassandra is a wide-column store and supports asynchronous masterless replication.

HBase is also a quorum-based DB and has master-based replication.

MongoDB is a document-oriented DB and uses leader-based replication.

SQL scaling patterns include:

  • Federation / federated database system: transparently maps multiple autonomous database systems into a single virtual/federated database.
    • (-) slow, since it accesses multiple data stores to get a value
  • Sharding / horizontal partitioning
  • Denormalization: even though normalization is more memory efficient, denormalization can enhance read performance by adding redundant precomputed data to the DB or grouping related data.
    • Normalizing data reduces data warehouse disk space by reducing data duplication and dimension cardinality. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.
  • SQL tuning: the "iterative process of improving SQL statement performance to meet specific, measurable, and achievable goals"

  • InfluxDB – to store time-series data
  • AWS Redshift
  • Apache Hadoop
  • Redis
  • Embedded data – RocksDB

Message Queues (Buffering) vs Batch Processing

Distributed event management, monitoring, and working on incoming realtime data instead of a stored database is the preferred way to churn out realtime analysis and updates. The multiple ways to handle incoming data are:

  1. Batch processing – lags in producing results, not time critical
  2. Data streams – realtime response
  3. Message queues – ensure timely sequence and ordering
Buffering:
  • add events to a buffer that can be read
  • (+) can handle each event

Batching:
  • add events to a batch and send when the batch is full
  • (+) cost effective
  • (+) ensures throughput
  • (-) if some events in a batch fail, should the whole batch fail?
  • (-) not suited for real-time processing
  • e.g. S3-like object storage + Hadoop MapReduce for processing

Timeout

  • Connection timeout : use latency percentiles to calculate this
  • Request timeout

Retries

  • exponential backoff: increase the waiting time after each try
  • jitter: adds randomness to retry intervals to spread out the load
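
A common way to combine the two is exponential backoff with "full jitter"; a sketch (the constants are arbitrary):

```python
import random
import time

def call_with_retries(fn, max_attempts=5, base_s=0.5, cap_s=30.0):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            backoff = min(cap_s, base_s * (2 ** attempt))  # 0.5, 1, 2, 4, ...
            time.sleep(random.uniform(0, backoff))         # full jitter: many
                                                           # clients won't retry
                                                           # in lockstep
```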

Grouping events into object storage and Message Brokers

This is slower than stream processing but faster than batch processing.

Distributed Event management and Event Driven architecture using streams

In event-driven architecture, a producer component performs an action which creates an event that a consumer/listener subscribes to and consumes.

  • (+) time sensitive
  • (+) asynchronous
  • (+) decoupled
  • (+) easy scaling and elasticity
  • (+) heterogeneous
  • (+) continuous

Expanding the stream pipeline

Event Streams decouple the source and sink applications. The event source and event sinks (such as webhooks) can asynchronously communicate with each other through events.

Options for stream processing architectures

  • Apache Kafka
  • Apache Spark
  • Amazon Kinesis
  • Google Cloud Data Flow
  • Spring Cloud Data Flow

Here is a post from earlier which discusses – Scalable and Flexible SIP platform building, Multi geography Scaled via Universal Router, Cluster SIP telephony Server for High Availability, Failure Recovery, Multi-tier cluster architecture, Role Abstraction / Micro-Service based architecture, Load Balancer / Message Dispatcher, Back end Dynamic Routing and REST API services, Containerization and Auto Deployment, Auto scaling Cloud Servers using containerized images.

Lambda Architecture

Lambda architecture layers stream processing on top of MapReduce and a stream-processing engine: events are sent to the batch system and the stream-processing system in parallel, and the results are stitched together at query time.

Apache Kafka is used as the source; it is a framework implementation of a software bus using stream processing, a ".. high-throughput, low-latency platform for handling real-time data feeds".

Apache Spark: data partitioning and in-memory aggregation.

Distributed cache for call control Servers

Dedicated cache cluster:
  • isolates the cache from the service
  • cache and service do not share memory and CPU
  • can scale independently
  • can be used by many microservices
  • flexibility in choosing hardware

Co-located cache:
  • doesn't require separate hardware
  • low operational and hardware cost
  • scales together with the service

Choosing a cache host

  • Mod function
    • (-) behaves differently when a host is added or removed (most keys remap), unsuitable for prod
  • Consistent hashing (Chord)
    • maps each value to a point on a circle
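
A toy consistent-hash ring (the host names are placeholders; real implementations also add virtual nodes per host for smoother balancing):

```python
# A key goes to the first host clockwise from its hash point, so adding or
# removing a host only remaps the keys in one arc, unlike the mod function,
# which reshuffles almost everything.
import bisect
import hashlib

def _point(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, hosts):
        self.ring = sorted((_point(h), h) for h in hosts)

    def host_for(self, key: str):
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _point(key)) % len(self.ring)  # wrap the circle
        return self.ring[i][1]

ring = HashRing(["cache1", "cache2", "cache3"])
print(ring.host_for("call:a84b4c76"))  # stable even as other hosts join/leave
```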

Cache Replacement

Least Recently Used Cache Replacement
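
A minimal LRU sketch using Python's OrderedDict:

```python
# Reads refresh recency; inserts beyond capacity evict the least recently used.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity, self.data = capacity, OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the LRU entry
```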

Consistency and High Availability in Cache Setup

Read replicas live in a different data centre for disaster recovery.

Strong consistency using master-slave replication.

Circuits – fail fast, and wait for the circuit to recover before using it again

Design patterns for a circuit-based setup gracefully handle exceptions using fallbacks.

Circuit breaker: stops the client from repeatedly trying to execute an operation, by calculating an error threshold.

Use isolated thread pools in circuits, and ensure full recovery before calling the service again.

(+) A circuit-breaker event causes the entire circuit to repair itself before attempting operations.
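
A compact sketch of the pattern (the thresholds and timings are arbitrary; production systems use libraries such as Hystrix or resilience4j):

```python
# After too many consecutive failures the circuit opens and calls fail fast to
# the fallback; after a cool-down it half-opens and one success closes it again.
import time

class CircuitBreaker:
    def __init__(self, error_threshold=5, recovery_s=30.0):
        self.error_threshold, self.recovery_s = error_threshold, recovery_s
        self.failures, self.opened_at = 0, None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.recovery_s:
                return fallback()            # fail fast while the circuit is open
            self.opened_at = None            # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.error_threshold:
                self.opened_at = time.time() # trip the circuit
            return fallback()
        self.failures = 0                    # success closes the circuit
        return result
```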
