VoIP / OTT / Telecom Solution startup’s strategy for building a scalable, flexible SIP platform

I have been contemplating what makes a developer successful at building solutions and services for a Telecom Application Server. The trend has shown many variations, from pure IN programs like VPN and prepaid billing logic, to SIP servlets for call parking and call completion, and from SIP servlets to JAIN SLEE open standard-based communication.

Scalable and Flexible SIP platform building

This section has been updated in 2020

A cloud communication provider acts as a service provider between SMEs (Small and Medium Enterprises) and large-scale telco carriers. An important concern for a cloud provider is to build a scalable and flexible platform. Let’s go in-depth and discuss how one can go about achieving scalability in SIP platforms.

Multi geography Scaled via Universal Router

A typical semi multi-geography scaled distributed VoIP system, based on read replicas or data sharding, is controlled by a router that distributes traffic to the various regions by matching the destination number prefix.
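The prefix-based routing decision described above can be sketched as a longest-prefix match. This is a minimal illustration only: the prefix table, the region names and the function names are hypothetical, and a real router would load its table from configuration or a database.

```python
from typing import Dict, Optional

# Hypothetical prefix-to-region table (illustrative values only).
PREFIX_TO_REGION: Dict[str, str] = {
    "1": "us-east",
    "44": "eu-west",
    "61": "ap-southeast",
    "6129": "ap-sydney",
}


def normalize(number: str) -> str:
    """Keep digits only, so '+44 20 7946 0000' becomes '442079460000'."""
    return "".join(ch for ch in number if ch.isdigit())


def pick_region(destination: str,
                table: Dict[str, str] = PREFIX_TO_REGION,
                default: Optional[str] = None) -> Optional[str]:
    """Return the region of the longest matching prefix, else the default."""
    digits = normalize(destination)
    # Try the longest candidate prefix first, then shorten one digit at a time.
    for length in range(len(digits), 0, -1):
        region = table.get(digits[:length])
        if region is not None:
            return region
    return default
```

Longest-prefix matching lets a more specific entry (a city or carrier range) override a country-level entry without reordering the table.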

Cluster SIP telephony Server for High Availability

Clusters of SIP servers are great at providing high availability and resilience; however, they also add latency and management overhead.

Considerations for a clustered SIP application server setup

  • Memory requirements to store the state for a given session, and the increasing overhead of having more than two replicas within a partition.
  • Co-hosted virtual machines add resource contention and delay call establishment due to multi-node traversal.
  • In case of node failures or a planned reboot after an upgrade, traffic redirection requires draining existing calls from the SIP server before bringing it down. This setup ensures that
    • no new calls are channelled to this server
    • the server waits for existing calls to end before rebooting.
  • Fail fast and recover: The system should be reliable enough that a node failure does not propagate and become the root cause of an entire system failure due to corrupted data.
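The draining behaviour in the list above can be sketched as a small state machine. This is a simplified in-process model, and the class and method names are hypothetical; in practice the load balancer would also need to be told that the node is draining.

```python
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"
    DRAINING = "draining"
    STOPPED = "stopped"


class SipNode:
    """Hypothetical model of one SIP server in the cluster."""

    def __init__(self, name):
        self.name = name
        self.state = NodeState.ACTIVE
        self.active_calls = set()

    def accepts_new_calls(self):
        return self.state is NodeState.ACTIVE

    def start_call(self, call_id):
        # New calls are refused once draining has begun.
        if not self.accepts_new_calls():
            return False
        self.active_calls.add(call_id)
        return True

    def end_call(self, call_id):
        self.active_calls.discard(call_id)
        self._maybe_stop()

    def drain(self):
        # Stop taking new calls; existing calls are left to finish.
        if self.state is NodeState.ACTIVE:
            self.state = NodeState.DRAINING
        self._maybe_stop()

    def _maybe_stop(self):
        # The node is only safe to reboot once the last call has ended.
        if self.state is NodeState.DRAINING and not self.active_calls:
            self.state = NodeState.STOPPED
```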

Failure Recovery

A clustered SIP platform is quickly recoverable with containerized applications. Clear separation between the stateless engine layer and the session management or data layer is critical to enable auto-reboot of failed nodes in the engine layer.

It should be noted that, unlike HTTP-based platforms, dialogue and transaction state variables are critical to SIP platforms, for example, call duration for a CDR entry. Therefore, for a mid-call failure and auto-reboot, the state variables should be replicated on an external cache so that their values persist for correct billing.
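As a rough sketch of this idea, the snippet below keeps the answer time of a call in a store that lives outside the engine, so a second engine instance can still produce the CDR duration after the first one fails. The `InMemoryCache` class is only a stand-in for a real networked cache, and all names here are hypothetical.

```python
import json


class InMemoryCache:
    """Stand-in for an external cache; a real setup would use a networked store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Engine:
    """Stateless engine: all call state is written to the shared cache."""

    def __init__(self, cache):
        self.cache = cache

    def on_answer(self, call_id, answered_at):
        # Persist the state needed for billing as soon as the call is answered.
        self.cache.set(call_id, json.dumps({"answered_at": answered_at}))

    def on_hangup(self, call_id, ended_at):
        raw = self.cache.get(call_id)
        if raw is None:
            return None  # unknown call, nothing to bill
        state = json.loads(raw)
        return {"call_id": call_id,
                "duration_sec": ended_at - state["answered_at"]}
```

For example, if one `Engine` records the answer and crashes, a freshly started `Engine` sharing the same cache can still compute the duration on hangup.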

Multi-tier cluster architecture

Symmetrical Multi-Processing (SMP) architectures have

  • a stateless “Engine Tier” that processes all traffic and
  • a “Data Tier” to which all transaction and session state is distributed.

A very good example of this is the Oracle Communications Converged Application Server Cluster (OCCAS), which is composed of 3 tiers:

  1. Message dispatcher
  2. Stateless communication engine
  3. Datastore, which is an in-memory session store for the dialogues and ongoing transactions

An advantage of having stateless servers is that if the application server crashes or reboots, the session state is not lost, as a new server can pick up the session information from an external session store.

Role Abstraction / Micro-Service based architecture

The components for a well-performing highly scalable SIP architecture are abstracted in their role and responsibilities. We can have categories like

Load Balancer / Message Dispatcher

The LB routes traffic among active and ready servers based on an algorithm (round robin, hashing, priority-based scheduling, weight-based scheduling).
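Two of these algorithms, hashing and weight-based scheduling, can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, hashing on the SIP Call-ID is an assumption about which key to use, and the caller is expected to pass in only servers that are currently active and ready.

```python
import hashlib
from itertools import cycle


def pick_by_hash(call_id, servers):
    """Map a Call-ID to one server so all messages of a call land on the same node."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def weighted_round_robin(weights):
    """Return an endless iterator of server names, repeated per integer weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)
```

Hashing keeps a dialog pinned to one server, while weights let larger nodes take a proportionally larger share of new calls.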

Backend Dynamic Routing and REST API services 

Services that the application server calls during call flow execution, for tasks such as looking up the IP address associated with the caller or the screened numbers associated with the destination. These can be exposed over, for example, XML Remote Procedure Call (XML-RPC) or the AVAPI Service in Kamailio.

OSS/BSS layer 

This layer is responsible for jobs in relation to operations and billing and should take place in an independent system without affecting the session call flow or causing a high RTT. Some top features possible with defining this layer well are

  • POS CRM, Order Management, Loyalty, Feedback, Ticketing
  • Post Paid Billing, Inter-carrier Billing
  • BPM and EAI
  • Provisioning & Mediation
  • Number Management
  • Inventory
  • ERP, SCM
  • Commissions
  • Directory Enquiry
  • Payments & Collections
  • BI ( Business Intelligence)
  • Fraud and RAS
  • Pre-Paid Billing
  • Document Management
  • EBPP, Self Care

There are other components in a typical VoIP microservices architecture, such as a heartbeat service, backend accounting service, security check service, REST API service, dynamic routing service, event notification service etc., which should be decoupled from each other, leading to a highly parallel programming approach.

Containerization and Auto deployment

To improve flexibility w.r.t. infrastructure binding, all server components including edge components, proxies, engines and media servers must be containerized, in the form of images such as Docker images, for easy deployment via infrastructure tools like Kubernetes, Terraform or Chef cookbooks, and be efficiently controlled with an identity management tool and a CI/CD (Continuous Integration and Delivery) tool like Travis or Jenkins.

Autoscaling Cloud Servers using containerized images

Autoscaled servers are provided by the majority of cloud infrastructure providers, such as AWS (Amazon Web Services) and Google Cloud Platform, which scale capacity based on traffic in real time; this is also called elasticity. Any VoIP developer would notice patterns in voice traffic, such as lower traffic during holidays and night hours, when servers can be freed, and peaks during the day, when server capacity needs to scale up.

Additionally, traffic may spike when the setup is under a DDoS attack, which is not uncommon for a SIP server; the server then needs to identify and block malicious source points and prevent unnecessary upscaling. There are 2 approaches to scaling:

  • Scale UP / Vertical scaling: Reusing the existing server and upgrading its performance to match the load requirements.
  • Scale OUT / Horizontal scaling: Increasing the number of servers and adding their IPs to the load balancer to manage traffic.

It should be noted that scaling up or down should be carried out incrementally to have better control over the resource-to-requirement ratio.
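Incremental horizontal scaling can be sketched as a decision function that moves the server count by one step at a time. The thresholds, limits and the choice of CPU utilization as the signal are all assumptions for illustration; a real autoscaler might use concurrent calls or calls per second instead.

```python
def next_server_count(current, cpu_utilization,
                      min_servers=2, max_servers=20,
                      scale_out_above=0.75, scale_in_below=0.30, step=1):
    """Return the next server count, changing by at most `step` per decision."""
    if cpu_utilization > scale_out_above:
        target = current + step
    elif cpu_utilization < scale_in_below:
        target = current - step
    else:
        target = current
    # Never go below the floor needed for availability or above the budget cap.
    return max(min_servers, min(max_servers, target))
```

Keeping the step small means each decision can be observed before the next one is taken, which is the control over the resource-to-requirement ratio mentioned above.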
Hardware resource map for Clustered Application server , Media Server Database cluster , LB , monitoring server

Security should be a priority

It is crucial for any voice traffic / media services provider to have state-of-the-art security for its content without breaching data privacy norms.

SIP security topics such as authentication and authorization, server impersonation, tampering with message bodies, mid-session threats like tearing down a session, Denial of Service and amplification, full encryption vs hop-by-hop encryption, transport and network layer security, HTTP authentication, SIP URI, nonce and SIP over TLS flows can be read at https://telecom.altanai.com/2020/04/12/sip-security/
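One simple Denial of Service mitigation, identifying a source that floods the server and blocking it, can be sketched as a sliding-window counter per source IP. This is a minimal illustration with hypothetical names and limits; it is not how any particular SIP server implements flood detection, and a production setup would also expire blocks and handle spoofed sources.

```python
from collections import defaultdict, deque


class SourceRateLimiter:
    """Block a source IP once it exceeds max_requests within window_sec."""

    def __init__(self, max_requests, window_sec):
        self.max_requests = max_requests
        self.window_sec = window_sec
        self._hits = defaultdict(deque)  # source IP -> recent request timestamps
        self.blocked = set()

    def allow(self, source_ip, now):
        if source_ip in self.blocked:
            return False
        hits = self._hits[source_ip]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_sec:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self.blocked.add(source_ip)
            return False
        return True
```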

While scaling out the infrastructure to extend the PoP (point of presence) across different geographies, define zones such as

  • red zone: public-facing servers like load balancers
  • DMZ (demilitarized zone): servers interfacing between the private and public networks
  • green zone: private and secure internal servers which communicate over private IPs and should be unreachable from outside.

To further increase the efficiency of communication and transmission between green zone servers, set up a private VPC (Virtual Private Cloud) between them.

Open standards

To establish itself as a dependable real-time communication provider, the product must follow standardised RFCs and stacks such as SIP RFC 3261 and the W3C draft for WebRTC peer connection. It is also good practice to stay updated with all recommendations by ITU and IANA and keep up with their implementation. For example: STIR/SHAKEN https://telecom.altanai.com/2020/01/08/cli-ncli-and-stir-shaken/

Data Privacy

Adhere to privacy and protection standards like GDPR, COPPA, HIPAA and CCPA. More details on VoIP certificates, compliances and security at https://telecom.altanai.com/2020/01/20/certificates-compliances-and-security-in-voip/

Product Innovation and Market Differentiator

Innovation + Experiment + Out of Box Thinking

Many Communication service providers offer Voice over IP and related unified communication and collaboration platforms. A new VoIP provider needs to envision enhancements and innovations that meet the growing user expectation.

  • Easy to follow technical documentation and help and quick response to any technical question about platform posted on QnA sites (StackOverflow, Quora .. ), tech forums ( Google groups, slack channels .. ) even Twitter handles to address issues.

Data Visualization Tools – Show overall call quality insights, call flows, stats, probable issues, fixes, spending/saving on user groups, duration, negative-positive margins, healthy/unhealthy calls, spams etc. 

Graphical Event Timelines – time-based events such as call setup, termination, codec negotiation and call redirection events

Drag and Drop Call Flow designer – As call routing logic becomes more complicated, it can be easily composed from a large set of known and pre-defined operations (parking, routing, voicemail, forking, redirecting etc.), added as UI blocks attached to a call flow chain, so that calls are channelled as predefined by this call flow logic. This gives customers plenty of customizability and design flexibility to design their calls.

Pricing Model

Encourage users to use the services either for free or for a minimal price

Besides increasing the onboarding count and developing an international presence, this also helps gain good word of mouth and pays off in the long term.

  • Discounts and onboarding bonuses encourage users to try out services without signing up for long-term contracts. The value could range from a $5-15 one-time onboarding bonus to use on services such as a DID number purchase, outgoing telco calls or any other service add-on.
  • No or minimal onboarding cost
  • Toll-free minutes: 50-1000 minutes per month.

Competitive Pricing: some of the entries below show approximate pricing figures for various services (note that these may be outdated and should not be used as references as is).

  • Pay as you go pricing: rate per minute (USD) plan, for example (from Google Voice)

Australia – Mobile ~$0.02
Portugal – Mobile ~$0.15
Switzerland – Mobile – Orange ~$0.11
United Kingdom – Mobile – Orange ~$0.02
United Kingdom – Mobile – Vodafone ~$0.01

Outbound calls to PSTN ~$0.015 per min (depending on telco and destination)
Incoming Voice Calls on a Local Number ~$0.0060 per min
Incoming Voice Calls on a Toll-Free Number ~$0.020 per min
VoIP Calls (In-App WebRTC & SIP) ~$0.003 per min

  • Addon Services

Call Recordings ~$0.0025 per min
Voicemail Detection, Call Analyzers, Call Transcription ~$0.015 – $0.050 per min ( depends on external API cost or inhouse R&D effort)
Automatic Speech Recognition ~$0.02 per 15 sec

  • Trunk Calls ( heavy volume customers )

Inbound Voice Calls ~$0.0025/min
Outbound Voice Calls ~$0.0065/min
Tollfree Inbound Voice Calls ~$0.0135/min ( toll free numbers usually charge more than local numbers)
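To show how such per-minute rates turn into a charge, here is a small worked sketch using the approximate pay-as-you-go figures listed above. Rounding every call up to a whole minute is an assumption (providers differ on billing increments), and the dictionary keys and function name are hypothetical.

```python
from decimal import Decimal

# Approximate per-minute rates in USD, taken from the figures listed above.
RATES_PER_MIN = {
    "pstn_outbound": Decimal("0.015"),
    "local_inbound": Decimal("0.0060"),
    "tollfree_inbound": Decimal("0.020"),
    "voip": Decimal("0.003"),
}
RECORDING_PER_MIN = Decimal("0.0025")  # call recording add-on


def call_cost(kind, duration_sec, recorded=False):
    """Charge for one call, billed per started minute (assumed increment)."""
    minutes = -(-duration_sec // 60)  # integer ceiling division
    rate = RATES_PER_MIN[kind]
    if recorded:
        rate += RECORDING_PER_MIN
    return rate * minutes
```

For instance, a 125 second outbound PSTN call is billed as 3 minutes, which comes to $0.045 at these rates. `Decimal` is used so that small per-minute amounts do not pick up binary floating point error.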

“Pay as you go” Pricing model

Services which should be offered on a non-chargeable basis:

  • Round the clock technical support
  • Compensation for Downtime
  • CDRs per account
  • IP to IP calls
  • Security Certificates in TLS and SRTP calls
  • Authentication and Authorization

Services that can be charged are

  • Value added services – Live Weather updates , horoscope update ..
  • Carrier Integration – trunk , PRI
  • Toll Free Numbers – DID numbers
  • Virtual Private Network (VPN) : An Intelligent Network (IN) service, which offers the functions of a private telephone network. The basic idea behind this service is that business customers are offered the benefits of a (physical) private network, but spared from owning and maintaining it
  • Access Screening(ASC): An IN service, which gives the operators the possibility to screen (allow/barring) the incoming traffic and decide the call routing, especially when the subscribers choose an alternate route/carrier/access network (also called Equal Access) for long distance calls on a call by call basis or pre-selected.
  • Number Portability(NP) : An IN service allows subscribers to retain their subscriber number while changing their service provider, location, equipment or type of subscribed telephony service. Both geographic numbers and non-geographic numbers are supported by the NP service.

Flexibility for inter-working

Interworking among services from legacy IN solutions and IMS/IT allows operators to extend their basic offering with added services via low-cost software and increases the ARPU.

Next Gen 911

911-like emergency services are moving from traditional TDM networks to IP networks. However, this poses some challenges, such as detecting the caller's geolocation and routing the call to his/her nearest servicing station or Public Safety Answering Point (PSAP).
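The "nearest PSAP" part of this challenge can be sketched with a great-circle distance calculation. This is a deliberate simplification: it assumes the caller's coordinates are already known, the PSAP names and coordinates are hypothetical, and real emergency routing is typically decided by jurisdiction boundaries rather than pure distance.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_psap(caller, psaps):
    """caller: (lat, lon); psaps: name -> (lat, lon). Returns the closest name."""
    return min(psaps, key=lambda name: haversine_km(*caller, *psaps[name]))
```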

Backward compatibility with existing legacy networks

PSTN-SIP gateways interface between the SIP platform and the SS7 signalling platform, and also convert the RTP stream to the analog waveforms required by PSTN endpoints.

Internetworking with IMS

IMS is an IP telephony service architecture developed by the 3rd Generation Partnership Project (3GPP), the global cellular network standards organization that also standardized Third Generation (3G) services and Long Term Evolution (LTE) services.

More about IMS ( IP multimedia System )

Develop on Interactive and popular frameworks like WebRTC

Agile Development and Service Oriented Architecture (SOA) are proven methods of delivering quality, up-to-date products and releases which can cater to evolving market demands. In short, “Be future ready while protecting the existing investments”.

Make a WebRTC solution that offers a plug in free, device agnostic, network agnostic web based communication tool along with the server side implementation.


Read More about WebRTC Communication as a platform Service – https://telecom.altanai.com/2019/07/04/webrtc-cpaas-communication-platform-as-a-service/

External Integrations

  • Enterprise communication agents integration – consider integration with Microsoft 365, Google Workspace, Skype for Business, Slack, WebEx
  • CRM integration – Salesforce, Zendesk
  • Business-specific integration
    • Canvas for e-learning
    • telehealth platform for doctor consultation
  • A2P (application to person) messaging

Integration of the services with social media/networking enables new monetizing benefits for CSPs, especially in terms of advertising, gaining popularity, inviting new customers etc.


Enterprises seek to reach their customers with trusted telecom mediums such as phone calls/SMS. Telcos play an instrumental role in increasing the customer’s trust for an enterprise by means of updates over call and SMS in addition to emails and postal mail. The medium of VoIP services offers value addition in their present product/service delivery model for any firm whether it be an e-commerce firm or banking.

VoIP providers should develop an SDK-rich, dev-friendly arrangement that can facilitate onboarding SMEs (small-medium enterprises) through self-guided tutorials and quick setups.

Support developer base to aggregate, use open-standard services/technologies and tie them with other communication technologies suited to their business use-case using the VoIP platform as the medium of communication between web/mobile app endpoints and telecom endpoints.

Operational Efficiencies

Log aggregation and Analytics.
PagerDuty Alerts
Daily and Weekly backups and VM snapshots.
Automated sanity Tests
Centralized alert management, monitoring and admin dashboards .
Deployment automation / CICD
Tools and workflows for diagnostics, software upgrades, OS patches etc.
Customer support portal , provisioning Web Application

Read about VoIP system DevOps, operations and Infrastructure management, Automation


QoS: Media stats can help us collect the call quality metrics which determine the overall user experience. Some frequently encountered issues include

High Packet Loss – 250 ms of audio duration lost in 5 sec – broken audio
High Jitter – jitter >= 30 ms in 5 sec – robotic audio
Low Audio Level – audio level < -80 dB – inaudible
High RTT – RTT > 300 ms in 5 sec – lags
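The thresholds above can be applied to each 5 second window of media stats roughly as follows. The dictionary keys and issue labels are hypothetical, and treating "250 ms of audio lost" as "250 ms or more" is an assumption about the intended boundary.

```python
def detect_issues(window_stats):
    """Flag quality issues for one 5 second window of media stats.

    window_stats keys (all optional): audio_lost_ms, jitter_ms,
    audio_level_db, rtt_ms.
    """
    issues = []
    if window_stats.get("audio_lost_ms", 0) >= 250:   # broken audio
        issues.append("high_packet_loss")
    if window_stats.get("jitter_ms", 0) >= 30:        # robotic audio
        issues.append("high_jitter")
    if window_stats.get("audio_level_db", 0) < -80:   # inaudible
        issues.append("low_audio_level")
    if window_stats.get("rtt_ms", 0) > 300:           # lags
        issues.append("high_rtt")
    return issues
```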

Pro-active Issue Tracking via Call Metadata Analysis

Call details, even during the setup phase, continuation or re-INVITE/UPDATE phase, can suggest the probable outcomes based on previous results, such as bad call quality from certain geographic areas due to their known network or firewall issues, or high packet loss from certain handset device types. We can deduce well in advance what call quality stats will be generated from such calls.

Constraints which can be identified from the call setup details themselves include:

  • geography and number – from which originating location to which destination the call was made
  • SIP devices – device-related details, version of the device (browser version etc.)
  • Chronological aspects of call – initiation, ring start, pick up and end time.
  • call direction – inbound (coming from the carrier towards our VoIP platform) or outbound (call directed to the carrier from our VoIP platform)
  • Network type – network issues and quality score across network types

Constraints which can be identified during an ongoing call include:

  • Participants and their local time – ongoing RTCP from legs; the probability of long conferences is low in off hours
  • Call events – DTMF, XML, API calls, quality issues

Minor issues identified in an ongoing call's RTCP packets, such as increasing jitter or packet loss, can extrapolate to humanly perceivable bad audio quality after a while. Thus any suspected issues should be identified as early as they can be traced, and corrective action should be put in place.
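One simple way to extrapolate such a trend is to fit a straight line through the recent readings and estimate when it would cross a threshold. This is only a sketch under the assumption that readings arrive at a fixed interval and grow roughly linearly; the function name and defaults are hypothetical.

```python
def seconds_until_threshold(samples, threshold, interval_sec=5.0):
    """Estimate seconds from the latest sample until the trend hits threshold.

    samples: equally spaced readings, oldest first (e.g. jitter in ms).
    Returns 0.0 if already at or above threshold, None if not rising
    or if there are too few samples.
    """
    n = len(samples)
    if n < 2:
        return None
    if samples[-1] >= threshold:
        return 0.0
    # Least-squares line through the points (index, sample).
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    slope = num / den
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_index = (threshold - intercept) / slope
    return max(0.0, (crossing_index - (n - 1)) * interval_sec)
```

With jitter readings of 10, 15 and 20 ms taken every 5 seconds, the fitted line reaches a 30 ms threshold about 10 seconds after the last reading, which leaves time for corrective action.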

Predicting Low Audio / Call quality

A predictive engine can forecast bad call quality, such as 408 timeouts, high RTT, low audio level, audio lag, one-way audio, MOS < 2.5 out of 5 etc.

The predictive engine can use targeted notifications pointing towards specific issues that can come up in a call in real time and assign a technical rep to oversee or manually intervene.
This can include scenarios such as an agent warning a customer that his bad audio quality is due to him using an outdated SIP device with slow codecs, and suggesting an upgrade to lightweight codecs as per his bandwidth. This spares the customer a bad user experience and can happen without the customer reporting the issues himself with feedback, RTP stats, PCAPs etc. It saves a lot of trouble and effort in call debugging.

Media Processing

CSPs are looking into long-term growth and profitability from new online services such as media streaming services. A new VoIP provider could develop use cases exceeding the existing use case of media stream rendering to create a differentiator, such as

  • Streaming
  • Conference bridges/mixers
  • Recording and playback
  • IPTV and VOD ( Video On Demand)
  • Voicemails, IVR, DTMF
  • TTS (text to speech)
  • realtime transcription / captioning
  • Speech recognition etc

Some more services that a new VoIP provider should consider

  • Feedback gathering and User satisfaction surveys
  • Quick issues detection and detailed RCA

References :

SIP: https://telecom.altanai.com/2013/07/13/sip-session-initiaion-protocol/

What is OTT – https://telecom.altanai.com/2014/10/24/developing-a-ott-over-the-top-communication-application/

WebRTC Business benefits to OTT and telecom carrier – https://telecom.altanai.com/2013/08/02/webrtc-business-benefits/