VoIP / OTT / Telecom Solution startup’s strategy for Building a scalable, flexible SIP platform

I have been contemplating what makes a developer successful at building solutions and services for a Telecom Application Server. The trend has shown many variations, from pure IN programs like VPN and prepaid billing logic, to SIP servlets for call parking and call completion, and from SIP servlets to JAIN SLEE open-standard-based communication.

Read about Introduction to SIP : https://telecom.altanai.com/2013/07/13/sip-session-initiaion-protocol/

Scalable and Flexible SIP platform building

This section has been updated in 2020

The most important thing for an OTT provider, which acts as a service provider between SMEs (Small and Medium Enterprises) and large-scale telco carriers, is to build a scalable and flexible platform. Let's go in depth on how one can achieve scalability in SIP platforms.

Multi geography Scaled via Universal Router

A typical semi multi-geography scaled, read-replica-based or data-sharding-based distributed VoIP system is controlled by a router that distributes the traffic to various regions based on destination number prefix matching.
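The prefix-based distribution described above can be sketched as a longest-prefix match over the destination number. This is only a minimal illustration: the prefix table, the region names and the `pick_region` helper are assumptions made for the example, not part of any particular router product.

```python
from typing import Dict, Optional

# Hypothetical prefix-to-region table; prefixes and region names are illustrative only.
PREFIX_TO_REGION: Dict[str, str] = {
    "1": "us-east",
    "1415": "us-west",
    "44": "eu-west",
    "91": "ap-south",
}


def normalize(number: str) -> str:
    """Keep only the digits of a dialed number (drops '+', spaces, dashes)."""
    return "".join(ch for ch in number if ch.isdigit())


def pick_region(number: str,
                table: Dict[str, str] = PREFIX_TO_REGION,
                default: Optional[str] = None) -> Optional[str]:
    """Return the region of the longest prefix matching the destination number."""
    digits = normalize(number)
    # Try the longest candidate prefix first so "1415" wins over "1".
    for length in range(len(digits), 0, -1):
        region = table.get(digits[:length])
        if region is not None:
            return region
    return default


if __name__ == "__main__":
    for dest in ("+1 415 555 0100", "+1 212 555 0100", "+44 20 7946 0000"):
        print(dest, "->", pick_region(dest, default="fallback-region"))
```

Trying the longest prefix first lets a more specific route (an area code) override a broader one (a country code) without any ordering constraints on the table itself.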

Cluster SIP telephony Server for High Availability

Clusters of SIP servers are great at providing high availability and resilience; however, they also add latency and management issues.

Considerations for a cluster

  • Memory requirements to store the state for a given session, and the increasing overhead of having more than two replicas within a partition.
  • Co-hosted virtual machines add resource contention and delay call establishment due to multi-node traversal.
  • Additionally, in case of node failures or reboots, traffic redirection needs careful planning and can add complications in the network.
  • The system should be reliable enough to not let a single node failure propagate and become the root cause of an entire system failure due to corrupted data.

Failure Recovery

A clustered SIP platform is quickly recoverable with containerized applications.

Clear separation between the stateless engine layer and the session management or data layer is critical to enable auto reboot of failed nodes in the engine layer.

It should be noted that, unlike HTTP-based platforms, dialog and transaction state variables are critical to SIP platforms, for example call duration for a CDR entry. Therefore, for a mid-call failure and auto reboot, this state must be recoverable from the data layer.

Multi-tier cluster architecture

Symmetrical Multi-Processing (SMP) architectures have

  • stateless “Engine Tier” processes all traffic and
  • distributes all transaction and session state to a “Data Tier.”

A very good example of this is Oracle Communications Converged Application Server Cluster (OCCAS), which is composed of 3 tiers:

A message dispatcher, a stateless communication engine, and lastly a datastore, which is an in-memory session store for the dialogs and ongoing transactions.

An advantage of having stateless servers is that if the application server crashes or reboots, the session state is not lost, as a new server can pick up the session information from the external session store.
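The split between a stateless engine tier and a data tier can be illustrated with a small sketch. It is not how OCCAS is implemented: `SessionStore`, `Engine` and `DialogState` are hypothetical names, and a plain dictionary stands in for an external in-memory store. The point it shows is that a second engine node can finish a call, and compute its duration for the CDR, after the first node is gone.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DialogState:
    call_id: str
    answered_at: float
    state: str = "confirmed"
    ended_at: Optional[float] = None


class SessionStore:
    """Stand-in for the data tier (an external in-memory store in a real setup)."""

    def __init__(self) -> None:
        self._dialogs: Dict[str, DialogState] = {}

    def save(self, dialog: DialogState) -> None:
        self._dialogs[dialog.call_id] = dialog

    def load(self, call_id: str) -> Optional[DialogState]:
        return self._dialogs.get(call_id)


class Engine:
    """Engine node that keeps no dialog state of its own between messages."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def on_answer(self, call_id: str, now: float) -> None:
        self.store.save(DialogState(call_id=call_id, answered_at=now))

    def on_bye(self, call_id: str, now: float) -> Optional[float]:
        """Terminate the dialog and return the call duration, if the dialog is known."""
        dialog = self.store.load(call_id)
        if dialog is None:
            return None
        dialog.state = "terminated"
        dialog.ended_at = now
        self.store.save(dialog)
        return now - dialog.answered_at


store = SessionStore()
engine_a = Engine("engine-a", store)
engine_a.on_answer("call-1@example.invalid", now=100.0)
del engine_a  # simulate the first engine node crashing mid-call
engine_b = Engine("engine-b", store)
duration = engine_b.on_bye("call-1@example.invalid", now=160.0)
print(duration)
```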

Role Abstraction / Micro-Service based architecture

The components of a well-performing, highly scalable SIP architecture are abstracted in their roles and responsibilities. We can have categories like

Load Balancer / Message Dispatcher

Routes traffic based on an algorithm (round robin, hashing, priority-based scheduling, weight-based scheduling) among active and ready servers.
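Two of the dispatch algorithms named above, weight-based round robin and hashing, can be sketched as follows. The `Server` type, the addresses and the choice of hashing on the SIP Call-ID are assumptions for illustration; a production dispatcher would also handle health checks and servers joining or leaving the pool.

```python
import hashlib
import itertools
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Server:
    address: str
    weight: int = 1
    active: bool = True


def weighted_round_robin(servers: List[Server]) -> Iterator[Server]:
    """Cycle through active servers, repeating each one `weight` times per round."""
    pool = [s for s in servers if s.active for _ in range(s.weight)]
    if not pool:
        raise ValueError("no active servers")
    return itertools.cycle(pool)


def pick_by_call_id(call_id: str, servers: List[Server]) -> Server:
    """Hash the Call-ID so every message of one dialog lands on the same server."""
    active = [s for s in servers if s.active]
    if not active:
        raise ValueError("no active servers")
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    return active[int.from_bytes(digest[:8], "big") % len(active)]


servers = [
    Server("10.0.0.1", weight=2),
    Server("10.0.0.2"),
    Server("10.0.0.3", active=False),  # not ready, never selected
]
rr = weighted_round_robin(servers)
first_six = [next(rr).address for _ in range(6)]
print(first_six)
```

Hashing keeps a dialog on one server, which matters for stateful processing, while round robin spreads new calls evenly; the two are often combined.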

Backend Dynamic Routing and REST API services

Services which the application server calls during its call flow execution, which may include tasks like looking up the IP address associated with the caller or the screened numbers associated with the destination, exposed via interfaces such as XML Remote Procedure Call (XML-RPC) or AVAPI Service in Kamailio.

OSS/BSS layer

This layer is responsible for jobs related to operations and billing, and should run in an independent system without affecting the session call flow or causing a high RTT.

  • POS CRM, Order Management, Loyalty, feedback, ticketing
  • Post Paid Billing, Inter-carrier Billing
  • BPM and EAI
  • Provisioning & Mediation
  • Number Management
  • Inventory
  • ERP, SCM
  • Commissions
  • Directory Enquiry
  • Payments & Collections
  • BI
  • Fraud and RAS
  • Pre-Paid Billing
  • Document Management
  • EBPP, Self Care

There are other components in a typical VoIP microservices architecture, such as a heartbeat service, backend accounting service, security check service, REST API service, dynamic routing service, event notification service etc., which should be decoupled from each other, leading to a highly parallel programming approach.

Distributed Event management and Event Driven architecture

Distributed event management and monitoring, working on data streams instead of a stored database.

Distributed messaging using data streaming instead of static stored database data.

Containerization

To improve flexibility with respect to infrastructure binding, all server components, including edge components, proxies, engines and media servers, must be containerized in the form of images such as Docker images for easy deployment via infrastructure tools like Kubernetes, Terraform or Chef cookbooks, and be efficiently controlled with an identity management tool and a CI/CD (Continuous Integration and Delivery) tool like Travis or Jenkins.

Autoscaling Cloud Servers

Autoscaled servers are provided by the majority of cloud infrastructure providers, such as AWS (Amazon Web Services) and Google Cloud Platform, which scale the capacity based on traffic in real time, also called elasticity. Any VoIP developer would notice patterns in voice traffic, such as lower traffic during holidays and night hours, when servers can be freed, whereas traffic peaks during the day, when server capacity needs to scale up.

Additionally, traffic may spike when the setup is under a DDoS attack, not an uncommon thing for a SIP server. The server then needs to identify and block malicious source points and prevent unnecessary upscaling.
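One simple way to identify and block a flooding source is a per-source sliding-window counter, sketched below. The class name, the limits and the policy of blocking permanently on the first violation are assumptions for illustration; real deployments usually push this to the edge (firewall or SIP proxy) and expire blocks after some time.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class SourceRateLimiter:
    """Blocks a source IP once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def allow(self, source_ip: str, now: float) -> bool:
        if source_ip in self.blocked:
            return False
        timestamps = self._seen[source_ip]
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        timestamps.append(now)
        if len(timestamps) > self.max_requests:
            self.blocked.add(source_ip)
            return False
        return True


limiter = SourceRateLimiter(max_requests=3, window_seconds=1.0)
flood = [limiter.allow("203.0.113.9", t) for t in (0.0, 0.1, 0.2, 0.3)]
steady = [limiter.allow("198.51.100.7", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(flood, steady)
```

Requests rejected here never reach the engine tier, so they do not count towards the load that drives autoscaling.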

There are 2 approaches to scaling

Scale UP / Vertical Scaling

Reusing the existing server and upgrading its performance to match the load requirements.

Scale OUT / Horizontal scaling

Increasing the number of servers and adding their IPs to the load balancer to manage traffic.

It should be noted that scaling up or down should be carried out incrementally to have better control over the resource-to-requirement ratio.
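Incremental scaling can be sketched as a function that moves the server count towards the needed capacity by a bounded step on each evaluation. The parameter names, the 70% target utilization and the step size of 2 are illustrative assumptions, not values from any cloud provider's autoscaler.

```python
import math


def desired_servers(current: int,
                    active_calls: int,
                    calls_per_server: int,
                    target_utilization: float = 0.7,
                    max_step: int = 2,
                    min_servers: int = 1) -> int:
    """Return the next server count, changing by at most max_step per evaluation."""
    needed = max(
        min_servers,
        math.ceil(active_calls / (calls_per_server * target_utilization)),
    )
    if needed > current:
        return min(needed, current + max_step)
    if needed < current:
        return max(needed, current - max_step)
    return current


if __name__ == "__main__":
    # 6500 calls at 1000 calls/server and 70% utilization need 10 servers,
    # but from 4 servers a single evaluation only goes up to 6.
    print(desired_servers(current=4, active_calls=6500, calls_per_server=1000))
```

Calling the function on every evaluation interval converges on the needed capacity while avoiding large swings caused by a short burst.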

Other points that make for a successful startup in the logic-building domain of the telecom core network.

Security

It is crucial for any voice traffic / media services provider to have state-of-the-art security for the content without violating data privacy norms.

SIP security practices and threats such as authentication, authorization, impersonating a server, tampering with message bodies, mid-session threats like tearing down sessions, Denial of Service and amplification, full encryption vs hop-by-hop encryption, Transport and Network Layer Security, HTTP Authentication, SIP URI, nonce and SIP over TLS flows can be read about at https://telecom.altanai.com/2020/04/12/sip-security/

While scaling out the infrastructure to extend the PoP (point of presence) across different geographies, define zones such as

  • red zone : public-facing servers like load balancers
  • DMZ (demilitarized zone) : interfacing servers between the private and public networks
  • green zone : private and secure internal servers which communicate over private IPs and should be unreachable from outside.

To further increase the efficiency of communication and transmission between green zone servers, set up a private VPC (Virtual Private Cloud) between them.

Follow Open standards and Data Privacy

To establish itself as a dependable real-time communication provider, the product must follow standardised RFCs and stacks such as SIP RFC 3261 and the W3C draft for WebRTC peer connection etc. It is also a good practice to stay updated with all recommendations by ITU and IANA and keep up with their implementation. For example: STIR/SHAKEN – https://telecom.altanai.com/2020/01/08/cli-ncli-and-stir-shaken/

Adhere to privacy and protection standards like GDPR, COPPA, HIPAA and CCPA. More details on VoIP certificates, compliances and security at https://telecom.altanai.com/2020/01/20/certificates-compliances-and-security-in-voip/

Product Innovation and Market Differentiator

In a crowded market of many SIP service providers and platforms, the product should envision multiple network technologies and provide the ability to build on new, innovative, cutting-edge technologies in the market. It should deliver a platform to launch newer services like WebRTC and RCS.

Innovation + Experiment + Out of the Box Thinking

As a market differentiator, the following tools are advised

Easy-to-follow technical documentation and help, and quick responses to any technical question about the platform posted on Q&A sites (Stack Overflow, Quora ..), tech forums (Google Groups, Slack channels ..) or elsewhere (Facebook, Twitter ..).

Data Visualization Tools – Show overall call quality insights, call flows, stats, probable issues, fixes, graphs, spending, savings, duration, negative and positive margins, healthy and unhealthy calls, spam etc.

Graphical Event Timelines – time-based events such as call setup, termination, codec negotiation, call redirection events

Drag and Drop Call Flow designer – Call routing logic becomes more complicated with a large set of known and pre-defined operations (parking, routing, voicemail, forking, redirecting etc.). The call routing can be easily composed from these preset operations as UI blocks attached to a call flow chain, which results in calls being channeled as predefined by this call flow logic. This gives customers plenty of customizability and design flexibility to design their calls.

Competitive Pricing with Low or No Servicing cost

Cut down the spiraling cost of developing new technology platforms by improving the usage of data rather than voice, integrating new features like file sharing and MSRP messaging. Use an evolutionary architecture to reduce effort and cost through high re-use of the NGN platform and services.

Use Opensource Products

Introduce a uniform service experience across different platforms, which helps CSPs to reduce time cycles and costs for handling enhancement requests, and to reduce the annual OPEX appreciably.

“Pay as you go ” Pricing model

Services which should be offered on a non-chargeable basis:

  • Round the clock technical support
  • Compensation for Downtime
  • CDRs per account
  • IP to IP calls
  • Security Certificates in TLS and SRTP calls
  • Authentication and authorization secure practices

Services that can be charged for are value-added services:

Carrier Integration – trunk , PRI

Toll Free Numbers – DID numbers

Virtual Private Network (VPN) : An Intelligent Network (IN) service, which offers the functions of a private telephone network. The basic idea behind this service is that business customers are offered the benefits of a (physical) private network, but spared from owning and maintaining it

Access Screening (ASC) : An IN service which gives operators the possibility to screen (allow/bar) the incoming traffic and decide the call routing, especially when subscribers choose an alternate route/carrier/access network (also called Equal Access) for long distance calls, on a call-by-call basis or pre-selected.

Number Portability (NP) : An IN service which allows subscribers to retain their subscriber number while changing their service provider, location, equipment or type of subscribed telephony service. Both geographic numbers and non-geographic numbers are supported by the NP service.

Flexibility for inter-working

Interworking among the services from legacy IN solutions and IMS / IT. This allows operators to extend their basic offering with added services via low-cost software and increases the ARPU from subscribers.

Next Gen 911

911-like emergency services are moving from traditional TDM networks to IP networks. However, this poses some challenges, such as detecting the caller's geolocation and routing the call to his/her nearest servicing station or Public Safety Answering Point (PSAP).
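The "nearest PSAP" idea can be illustrated with a great-circle distance lookup. This is a deliberately simplified sketch: the PSAP names and coordinates are made up, and production emergency routing typically resolves the caller's location against service boundaries rather than straight-line distance.

```python
import math
from typing import Dict, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical PSAP locations, for illustration only.
PSAPS: Dict[str, Coordinate] = {
    "psap-san-francisco": (37.7749, -122.4194),
    "psap-oakland": (37.8044, -122.2712),
    "psap-san-jose": (37.3382, -121.8863),
}


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_psap(caller: Coordinate, psaps: Dict[str, Coordinate] = PSAPS) -> str:
    """Return the name of the PSAP closest to the caller's location."""
    return min(psaps, key=lambda name: haversine_km(caller, psaps[name]))


if __name__ == "__main__":
    print(nearest_psap((37.78, -122.41)))
```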

Backward compatibility with existing legacy networks

PSTN-SIP gateways interface between the SIP platform and the SS7 signalling platform, and also convert the RTP stream to the analog waveforms required by PSTN endpoints.

Internetworking with IMS

IMS is an IP telephony service architecture developed by the 3rd Generation Partnership Project (3GPP), the global cellular network standards organization that also standardized Third Generation (3G) services and Long Term Evolution (LTE) services.

More about IMS (IP Multimedia Subsystem)

Develop on Interactive and popular frameworks like WebRTC

Agile development and Service Oriented Architecture (SOA) are proven methods of delivering quality, up-to-date products and releases which can cater to evolving market demands. In short, “Be future ready while protecting the existing investments”.

Make a WebRTC solution that offers a plug-in-free, device-agnostic, network-agnostic web-based communication tool along with the server-side implementation.


Read More about WebRTC Communication as a platform Service – https://telecom.altanai.com/2019/07/04/webrtc-cpaas-communication-platform-as-a-service/

Operational Efficiencies

Log aggregation and Analytics.
PagerDuty Alerts
Daily and Weekly backups and VM snapshots.
Automated sanity Tests
Centralized alert management, monitoring and admin dashboards .
Deployment automation / CICD
Tools and workflows for diagnostics, software upgrades, OS patches etc.
Customer support portal , provisioning Web Application

Read about VoIP system DevOps, operations and Infrastructure management, Automation

Feedback and Proactive Issue Tracking

Media stats can help us collect the call quality metrics which determine the overall user experience. Some frequently encountered issues include

Issue | Cause | Observance
High Packet Loss | 250 ms of audio duration lost in 5 sec | broken audio
High Jitter | jitter >= 30 ms in 5 sec | robotic audio
Low Audio Level | audio level < -80 dB | inaudible
High RTT | RTT > 300 ms in 5 sec | lags
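The thresholds listed above can be applied to per-window media stats with a small check like the one below. The `WindowStats` field names are assumptions about how the stats are collected, and treating exactly 250 ms of lost audio as a hit is also an assumption, since the table only gives the figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Media stats aggregated over one 5-second window (hypothetical field names)."""
    lost_audio_ms: float
    jitter_ms: float
    audio_level_db: float
    rtt_ms: float


def detect_issues(stats: WindowStats) -> List[str]:
    """Return the issues from the table that apply to this window."""
    issues: List[str] = []
    if stats.lost_audio_ms >= 250:      # assumed inclusive threshold
        issues.append("High Packet Loss: broken audio")
    if stats.jitter_ms >= 30:
        issues.append("High Jitter: robotic audio")
    if stats.audio_level_db < -80:
        issues.append("Low Audio Level: inaudible")
    if stats.rtt_ms > 300:
        issues.append("High RTT: lags")
    return issues


if __name__ == "__main__":
    sample = WindowStats(lost_audio_ms=300, jitter_ms=12, audio_level_db=-35, rtt_ms=340)
    print(detect_issues(sample))
```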

Pro-active Call Analysis

Call details, even during the setup phase, continuation or re-INVITE/UPDATE phase, can suggest the probable outcomes based on previous results, such as bad call quality from certain geographic areas due to their known network or firewall issues, or high packet loss from certain handset device types. We can deduce well in advance what call quality stats will be generated from such calls.

Constraints which can be identified from the call setup details themselves include:

  • geography and number – the originating location the call was made from and its destination
  • SIP devices – device-related details, version of the device (browser version etc.)
  • Chronological aspects of the call – initiation, ring start, pick up and end time.
  • call direction – inbound (coming from the carrier towards our VoIP platform) or outbound (call directed to the carrier from our VoIP platform)
  • Network type – network issues and quality score across network types

Constraints which can be identified during an ongoing call itself include:

  • Participants and their local time – ongoing RTCP from Legs, probability of long Conferences is low in off hours
  • Call events – DTMF, XML, API calls , quality issues

Minor issues identified in an ongoing call's RTCP packets, such as increasing jitter or packet loss, can extrapolate to humanly perceivable bad audio quality after a while. Thus any suspected issue should be identified as early as it can be traced, and corrective action should be put in place.
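As a rough sketch of such extrapolation, a least-squares trend over recent jitter samples can estimate how many more reports remain before a threshold is crossed. The 30 ms default and the function names are assumptions for illustration; a real predictive engine would use richer features than a straight-line fit.

```python
from typing import Optional, Sequence


def slope(samples: Sequence[float]) -> float:
    """Least-squares slope of the samples against their index (change per report)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def reports_until_threshold(samples: Sequence[float],
                            threshold: float = 30.0) -> Optional[float]:
    """Estimate reports left before the metric reaches the threshold.

    Returns 0.0 if already at or above it, None if there is no upward trend.
    """
    if not samples:
        return None
    if samples[-1] >= threshold:
        return 0.0
    trend = slope(samples)
    if trend <= 0:
        return None
    return (threshold - samples[-1]) / trend


if __name__ == "__main__":
    jitter_ms = [10, 12, 14, 16, 18]  # jitter from consecutive RTCP reports
    print(reports_until_threshold(jitter_ms))
```

With jitter rising by 2 ms per report from 18 ms, the estimate leaves a few report intervals in which to alert or re-route before the audio degrades audibly.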

Predicting Low Audio / Call quality

A predictive engine can forecast bad call quality such as 408 timeouts, high RTT, low audio level, audio lag, one-way audio, MOS < 2.5 out of 5 etc.

The predictive engine can use targeted notifications pointing towards specific issues that can come up in a call in real time and assign a technical rep to oversee or manually intervene.
This can include scenarios such as an agent warning a customer that his bad audio quality is due to him using an outdated SIP device with slow codecs, and suggesting an upgrade to lightweight codecs as per his bandwidth. This spares the customer a bad user experience and can happen without the customer reporting the issues himself with feedback, RTP stats, PCAPs etc. It saves a lot of trouble and effort in call debugging.

Social Media Platform Integration such as Skype for Business , Slack , WebEx

Integration of the services with social media/networking enables new monetizing benefits for CSPs, especially in terms of advertising, gaining popularity, inviting new customers etc.


Enterprises are looking to use telco enablement in their present landscape to reach customers who were impossible to reach before. Telcos not only play an instrumental role in increasing the customer base, which results in increased enterprise revenue, but also add value to the present product/service delivery model. Hence it is high time for developers to aggregate and use open-standard services / technologies (GSMA, SIP, WebRTC) and develop high-end solutions for the telecom domain.

Efficient Media Management – Media Streaming, conferencing, Recording and playback

CSPs are looking for long-term growth and profitability from new online media streaming services. Build use cases around IPTV and VOD (Video On Demand), as well as voicemail, IVR, DTMF, TTS (text to speech), speech recognition etc.


References :

What is OTT – https://telecom.altanai.com/2014/10/24/developing-a-ott-over-the-top-communication-application/

WebRTC Business benefits to OTT and telecom carrier – https://telecom.altanai.com/2013/08/02/webrtc-business-benefits/

jitter Wikipedia – https://en.wikipedia.org/wiki/Jitter

What when how – http://what-when-how.com/voip-protocols/acceptability-of-a-phone-call-with-echo-and-delay-voip-protocols/