VoIP manages Call setup and teardown using IP protocol. The APIs can be used to provide public or internal endpoinst to create mnage calls , conference addon services like recording , tgranscription or even do auth and heartbeat. This article lists some external programmable Call Control APIs, internal APIs for biling , health as well as Rate limitting.
get CDR ( filtered per cal or acc to specific date or account)
bulk export of CDR
Internal API gateways
API Rate Limiter
Noisy neighbour is when one of the clients monoplizes the bandwidth using most of the i/o or cpu or other resources which can negatively affect the performance for other users . Throttling is a good way to solve this problem by limit.
Auto scaling
Load balancer
Rate Limiter
horizotal or vertical scalling can countger incoming traffic
LB can limit number of simultaneous requests. It can reject or send to queue for later operation
Can intelligently understand the cost of each operation and perform throttling.
(-) takes time to scale out thus cannot solve noisy neighbour problem immediately
(-) but the LB’s behaviour is indiscriminate ( cannot distinguish between the cost of diff operations) (-) LB cannot ensure uniform distribution of distribution of operations among all servers.
A rate limiter should have low latency, accurate and scalable.
RateLimiter inside the serviceprocess
Rate Limiter as its own process outside as a daemon
(+) faster , no IPC (+) reisstnt to interprocess call failures
(+) programming langiage agnostic daemon (+) uses its own memory space, more predictable
(-) service meory needs to allocate space for rate limiters
widely used for auto discovery of service host
Token based Rate Limiting
provides admission contro
Token bucket filter
define a users quota in terms average rate and burst capacity
Hierarchical Token Bucket ( HTB)
uses the deficit round-robin algorithm for fair queuing
Fair Queing
give paying users a bandwidth fraction of 25%
priority queuing
decide 1 packet/ms for free or reduce rate user
distributes that sender’s bandwidth among the other senders
CBQ ( Class Based Queing)
Shaping is performed using link idle time calculations based on the timing of dequeue events and underlying link bandwidth. Input classes that tried to send too much were restricted, unless the node was permitted to “borrow” bandwidth from a sibling.
Modular QoS command-Line interface (MQC) Shaping
mplement traffic shaping for a specific type of traffic using a traffic policy
When the rate of packets matching the specified traffic classifier exceeds the rate limit, the device buffers the excess packets.
When there are sufficient tokens in the token bucket, the device forwards the buffered packets at an even rate.
When the buffer queue is full, the device discards the buffered packets.
Throttling
delay the packet until the bucket is ready / shaping
drop the packet / Policing
mark the packet as non-compliant
Failure management on Rate Limiter
Node Crash : just less requests trolled
Leaky bucket
tokens can go into -ve
System Design for API gateway
Important points for design API gateway
Serialize data in company binary format
allocate buffer in memory and build frequency count hash table and flash once full or based on time to calculate counters
aggregation on API gateway on the fly
Frontend Service
Partitioned Service
Backend Service
Lightweight web service Stateless Request Validation Auth / Authorization TLS(SSL ) termination Server sode encryption Caching Rate Limiting(throttling) Request deduplication
Caching layer between frontend and backend
Replication Leader Selection + Quorem
Distributed messaging system( fast and slow paths) for API
A distributed messahing system such as Apache kafka or AWs kinesis, internally splits a msg accross serveral partitions where each parition can be placed on a single shard in a seprate machine on a clustered system.
Load Balancer(LB) is the initial point of interaction between the client application and the core system. It is pivotal in the distribution of the load across multiple servers and ensuring the client is connected to the nearest VoIP/SIP application server to minimize latency. However, the load balancers are also susceptible to security breaches and DOS attacks as they have a public-facing interface. This section lists the protocols, types and algorithms used popularly in Load balancers of VoIP systems.
used by applications in cloud ADN (Application delivery network)
used by network address translators (NATs) DNS load balancing
examples and roles of software and hardware based load balancers
Load Balancers(LB) ping each server for health status and greylists servers that are unhealthy( respond late) as they may be overloaded or experiencing congestion. The LB monitors it rechecks after a while and if a server is healthy ( ie if a server responds with responds with status update) it can resume sending traffic to it. LB should also be distributed to different data centres in primary-secondary setup for HA.
Networking protocol
TCP Loadbalancer
HTTP load balancers
SIP based LB as Kamailio/ Opensips
can forwrad the packet without inspecting the content of the packet.
terminate the connection and look inside the request to make a load balcing decsiion for exmaple by using a cookie or a header.
domain specific to VoIP
(+) fast, can handle million of req per second
(+) handle SIP routing based on SIP headers and prevent flooding atacks and other malicious malformed packets from reaching application server
Load balancing algorithms
Weighted Scheduling Algorithm
Round Robin Algorithm
Least Connection First Scheduling
Lest response time algorithms
Hash based algorithm ( send req based on hashed value such as suing IP address of request URL)
Loadbalancer
Reverse Proxy
forward proxy server allows multiple clients to route traffic to an external server
accepts clients requestd for server and also returns the server’s response to the client ie routes traffic on behalf of multiple servers.
Balances load and incoming traffic endpoint
public facing endpoint for outgoing traffic additional level of abstraction and security, compression
used in SBC (session border controllers) and gateways
Client-side or even backend service discovery uses a broadcasting or heartbeat mechanism to keep track of active servers and deactivates unresponsive or failed servers. This process of maintaining active servers helps in faster connection time. Some approaches to Service Discovery
Mesh
(-) exponentially incresing network traffic
Gossip
Distributed cache
Coordination service with Service
(-) requesres coordination service for leader selection
Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint.
Partition strategy can be based on various ways such as :-
Name based partition
geographic partition
names’ hashed value based on identifier
(-) can lead to hot partitions ( high density in areas of freq accessible identioers )
(-) high density spots for example all messages with a null key to go to the same partition
(-) doesnt scale
event time based hash
(+) data is spread evenly over time
To create a well distributed partition we could spread hot partition into 2 partitions or dedicate partitions for freq accessible items. An effective partitioning keys uses
Cardinality : total num of unique keys for a usecase. High cardinality leads to better distribution.
high cardinatility keys : names , email address , url since they have high variatioln
low cardinatlity keys : boolean flags such as gender M/F
Selectivity : number of message with each key. High selectivity leads to hotspots and hence low selectivity is better for even distribution.
Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management used in DevOps. I have mentioned this in detail on the article on VoIP and DevOps below.
for a VOIP system catering to many clients accross the globe or accessing multiple carriers meant for different counteries based on Prefix matching , there should be alocal PoP in most used regions . typically these regions include – US east – west coasts, UK – germany of London , Asia Pacific – Mumbai ,Hong Kong and Australia.
Minimal Latency and lowest amount of tarffic via public internet
Creating multiple POPs and enabling private traffic via VPN in between them ensures that we use the backbone of our cloud provider such as AWS or datacentre instead of traversing via public internet which is slower and more insecure. Hopping on a private interface between the cloud server and maintaining a private connection and keepalive between them helps optimize the traffic flow while keeping the RTT and latency low.
A high-availability (HA) architecture implies Dependability.Usually via existence of redundant applications servers for backups: a primary and a standby. These applications are configured so that if primary fails, the other can take over its operations without significant loss of data or impact to business operations.
Downtime / SLA of 5 9’s in aggregate failures
4 9’s of availiability on each service components gives a downtime of 53 mins per service each year. However in aggregate failure this could amlount to (99.99)10 = 99.9 downtime which is 8-10 hours each year.
Thus, aggregate failure should be taken into consideration while designing reliable systems.
HA for Proxy / Load balancer (LB)
A LB is the first point of contact for outbound calls and usually does not save the dialogue information into memory or database but still contain the transaction information in memory. In case the LB crashes and has to restart, it should
have a quick uptime
be able to handle in dialogue requests
handle new incoming dialogue requests in a stateless manner
verify auth/authorization details from requests even after restart
HA for Call Control app server
App server is where all the business logic for call flow management resides and it maintains the dialog information in memory.
Issues with in-memory call states : If the VM or server hosting the call control app server is down or disconnected, then live calls are affected, this, in turn, causes revenue loss. Primarily since the state variable holding the call duration would be able to pass onto the CDR/ billing service upon the termination of the call. For long-distance, multi telco endpoint calls running hours this could be a significant loss.
Standby app server configurationand shared memory : If the primary app server crashes the standby app server should be ready to take its place and reads the dialog states from the shared memory.
Live load balanced secondary app server + external cache for state varaibles : External cache for state variables: a cluster of master-slave caches like Redis is a good way of maintaining the dialogue state and reading from it once the app server recovers from a failed state or when a secondary server figures it has a missing variable in local memory.
Media Server HA
Assuming the kamailio-RTPengine duo as App server and Media Server. These components can reside in same or different VMs. Incase of media server crash, during the process of restoring restarted RTpengine or assigning a secondary backup RTpengine , it should load the state of all live calls without dropping any and causing loss of revenue . This is achived by
external cache such as Redis ,
quick switchover from primary to secondary/fallback media server and
floating IPs for media servers that ensures call continuity inspite of failure on active media server.
Architecturally it looks the same as fig above on HA for the SIP app server.
Attacks and security compromisation pose a very signficant threat to a VoIP platform.
MITM attacks
Man in midddle attacks can be counetred by
End to end encryption of media using SRTP and signals using TLS
Strong SIP auth mechanism using challenges and creds where password is composed of mixed alphanumeric charecters and atleast 12 digits long
Authorization / whitelisting based on IP which adheres to CIDR notation
DDOS attacks
DDOS renders a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.
dDOS – multiple network hosts to flood a target host with a large amount of network traffic. Can be created by sending falsified sip requests to other parties such that numerous transactions originating in the backwards direction comes to the target server created congestion.
Can be counetred by
detect flooding and q in traffic and use Fail2ban to block
challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required)
Raise Event notification alerts to designated developers for any anolous behavior. It could be call based or SMS basef alert based on the sevirity of the situtaion .
Sources for alert manager
Build failed ( code crashes, Jenkins error)
Deployment failed ( from Kubernetes , codechef, docker ..)
configuration errors ( setting VPN etc )
Server logs
Server health
homer alerts ( SIP calls responses 4xx,5xx,6xx)
PCAP alerts ( Malformed SIP SDP ..)
Internal Smoke test ( auto testing procedure done routinely to check live systems )
Support tickets from customer complaints ( treat these as high priority since they are directly impacting customers)
A QA framework basically validates the services and callflows on staging envrionment before pushing changes to production. Any architectural changes should especially be validated throughly on staginng QA framework befire making the cut. The qualities of an efficient QA platform are :
Genric nature – QA framework should be adatable to different envrionments such as dev , staging , prod
Containerized – it should be easy to spn the QA env to do large scale or small scale testing and hence it should be dockerized
CICD Integration and Automation – integrate the testcases tightly with gt post push and pull request creation . Minimal Latency and lowest amount of tarffic via public internet
Keep as less external dependecies as possible for exmaple a telecom carrier can be simulated by using an PBX like freeswitch or asterix
AsynchronousRun – Test cases should be able to run asynchronously. Such as seprate sipp xml script for reach usecase
Sample Testcases for VoIP
Authentication before establish a session
Balance and account check before establishing a session like whitelisting , blacklisting , restricted permission in a particular geography
Transport security and adaptibility checks , TLS , UDP , TCP
codec support validation
DTMF and detection
Cross checking CDR values with actual call initiator and terminator party
Inidividual Events ( like every click or every call metric)
Aggregate Data ( clicks per minute, outgoing calls per minute)
(+) fast write (+) can customize/ recalculate data from raw
(+) faster reads (+) data is fready for decision making / statistics
(-) slow reads (-) costlier for large scale implementations ( many events )
(-) can only query in the data as was aggregates ( no raw ) (-) requires data aggregation pipeline (-) hard to fix errors
suitable for realtime / data on fly low expected data delay ( minutes )
suitable for batch processing in background where delay is acceptable from mintes to hours
Push vs Pull Architecture
Push : A processing server manages state of varaible in memory and pushes them to data store.
(-) crashed processingserver means all data is lost
Pull : A temporary data strcyture such as a queue manages the stream of data and processing service pull from it to process before pusging to data stoore.
(+) a crashed server has to effect on temporarily queue held data and new server can simply take on where previous processing server left.
(+) can use checkpointing
Popular DB storage technologies
SQL
NoSQL
Structured and Strict schema Relational data with joins
Semi-structured data Dynamic or flexible schema
(+) faster lookup by index
(-) data intensive workload (+) high throughput for IOPS (Input/output operations per second )
used for Account information transactions
best suitable for Rapid ingest of clickstream and log data Leaderboard or scoring data Metadata/lookup tables
DynamoDB – Document-oriented database from Amazon MongoDB – Document-oriented database
A NoSQL databse can be of type
Quorem
Document
Key value
Graph
Cassandra is wide column supports asyn master less replication
Hinge base also a quorem based db also has master based preplication
MongoDB documente orientd DB used leacder based replication
SQL scaling patterns include:
Federation/ federated database system : transparently maps multiple autonomous database systems into a single virtual/federated database.
(-) slow since it access multiple data storages to get the value
Sharding / horizontal partition
Denormalization : Even though normalization is more memory efficient denormalization can enhance read performance by additing redundant pre computed data in db or grouping related data.
Normalizing data reduces data warehouse disk space by reducing data duplication and dimension cardinality. In its full definition, normalization is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.
SQL Tuning : “iterative process of improving SQL statement performance to meet specific, measurable, and achievable goals”
Distributed event management, monitoring and working on incoming realtime data instead of stored Database is the preferred way to churn realtime analysis and updates. The multiple ways to handle incoming data are
Batch processing – has lags to produce results, not time crtical
Data stream – realtime response
Message Queues – ensures timely sequence and order
Buffering
Batching
Add events to buffer that can be read
Add events to batch and send when batch is full
(+) can handle each event
(+) cost effective (+) ensures throughput (-) if some events in batch fail should whole batch fail ? (-) not suited for real time processing
S3 like objects storage + Hadoop Mapreduce for processing
Timeout
Connection timeout : use latency percentiles to calculate this
Request timeout
Retries
exponential backoff : increase waiting time each try
jitter : adds rabdomness to retry intervals to spread out the load.
Grouping events into object storage and Message Brokers
slower than stream processing but faster than batch processing.
In event driven archietcture a produce components performs and action which creates an event thata consumer/listener would subscribes to consume.
(+) time sensitive
(+)Asynch
(+) Decoupled
(+) Easy scaling and Elasticity
(+) Heterogeneous
(+) contginious
Expanding the stream pipeline
Event Streams decouple the source and sink applications. The event source and event sinks (such as webhooks) can asynchronously communicate with each other through events.
Options for stream processing architectures
Apache Kafka
Apache Spark
Amazon kinesis
Google Cloud Data Flow
Spring Cloud Data Flow
Here is a post from earlier which discusses – Scalable and Flexible SIP platform building, Multi geography Scaled via Universal Router, Cluster SIP telephony Server for High Availability, Failure Recovery, Multi-tier cluster architecture, Role Abstraction / Micro-Service based architecture, Load Balancer / Message Dispatcher, Back end Dynamic Routing and REST API services, Containerization and Auto Deployment, Auto scaling CloudServersusing containerized images.
Stream processing on top of map reduce and stream processing engine. In lambda architecture we can send events to batch system and stream processing system in parallel. The results are stiched together at query time.
Lambda Archietcture : stream processing on top of map reduce and stream processing engine. Send events to batch system and stream processing system in parallel. The results are stiched together at query time.
Apache Kafka is used as source which is a framework implementation of a software bus using stream-processing. “.. high-throughput, low-latency platform for handling real-time data feeds”.
Apache Spark : Data partitioning and in memory aggregation.
Isolates cache fro service Cache and service do not share memory and CPU can scale independently can be used by many microservices flexibility in choosing hardware
doesnt require seprate hardware low operational and hardware cost scales together with the service