A CPaaS (Communications Platform as a Service) is a cloud-based B2B communications platform that provides real-time communication capabilities. It should integrate easily with the customer's external environment or application, without the customer having to build backend infrastructure or interfaces. Traditionally, building a communication platform was a costly affair because of IP-protected protocols, licensed codecs, and the need to maintain a signalling protocol stack and network interfaces. Cisco, FaceTime, and Skype were the only OTT (over-the-top) players taking away from the telcos' call revenue. However, with the advent of standardised, open-source protocols and codecs, plenty of CPaaS providers have crowded the market, creating more supply than there is demand. A customer wanting to quickly integrate real-time communications into their platform has many options to choose from. This article provides an insight into how CPaaS solutions are architected and programmed.
SIP and WebRTC are often closely knit together as the protocol and media-plane technologies used to build communication platforms such as CPaaS, UCC, B2B call agents, call centre applications and so on. This integration is expected to continue to evolve and improve in order to meet users' growing demand for high-quality, low-latency communication.
Sample CPaaS architecture built on open-source technologies
Overall architecture of a real-time communication ecosystem with media management, CDRs, processing pipelines and real-time analytics.
There are several assessment technologies that can be used for measuring the quality of WebRTC (Web Real-Time Communications) calls, including:
Mean Opinion Score (MOS): A standardized method for measuring the quality of voice and video calls, based on human perception.
Packet loss and jitter: Measures the amount of packet loss and variation in packet arrival times, which can impact the quality of a call.
Round-trip time (RTT): Measures the time it takes for a packet to travel from the sender to the receiver and back, which can affect the delay in a call.
Bitrate: Denotes the amount of data that is transmitted during a call, which can impact the quality of the audio and video.
Codecs: The codecs chosen can impact the quality and bandwidth requirements of the call.
Network conditions
Quality of Service (QoS): Measures the quality of the network connection and the ability of the network to support real-time communications.
WebRTC-specific metrics: such as video resolution, frames per second, audio level, and so on.
PESQ (Perceptual Evaluation of Speech Quality): Predicts subjective opinion scores for degraded audio, for example audio affected by warping or variable delays.
PSNR (Peak Signal-to-Noise Ratio)
These technologies can be used in combination to provide a comprehensive assessment of the quality of a WebRTC call and to identify any issues that may be impacting the call quality.
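As a rough illustration of how RTT, jitter and packet loss can be combined into a MOS estimate, here is a minimal Python sketch. The R-factor to MOS mapping follows the ITU-T G.107 E-model formula; the `estimate_r_factor` function is only a commonly used simplification (its constants and the "half the RTT" one-way delay are assumptions, not the full E-model), and both function names are hypothetical.

```python
def r_factor_to_mos(r: float) -> float:
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)
    return max(1.0, mos)  # the polynomial dips slightly below 1 for very small R


def estimate_r_factor(rtt_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rough heuristic, NOT the full E-model: penalise delay, jitter and loss."""
    # Assumptions: one-way delay is half the RTT, jitter counts double,
    # and 10 ms is added for codec/processing delay.
    effective_latency = rtt_ms / 2.0 + 2.0 * jitter_ms + 10.0
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0
    r -= 2.5 * loss_pct  # assumed penalty of 2.5 R points per percent of loss
    return max(0.0, min(100.0, r))


if __name__ == "__main__":
    for rtt, jitter, loss in [(40, 5, 0.0), (400, 40, 5.0)]:
        r = estimate_r_factor(rtt, jitter, loss)
        print(f"rtt={rtt}ms jitter={jitter}ms loss={loss}% -> MOS ~ {r_factor_to_mos(r):.2f}")
```

In a WebRTC client the inputs would typically come from the periodic statistics reports, and the estimate is best treated as a trend indicator rather than a substitute for subjective MOS testing.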
Call server and media server that can be interacted with via a UA (user agent)
Communication clients such as SIP phones, WebRTC clients, and SDKs (software development kits) or libraries for desktop, embedded and/or mobile platforms.
APIs that can trigger automated calls and perform preprogrammed routing.
Rich documentation and samples to build various apps such as call centre solutions, interactive auto-attendants using IVR, DTMF, conference solutions etc.
Some CPaaS providers also add features such as transcription, transcoding, recording and playback to gain an edge over other CPaaS providers.
Cost
Self-hosted: (-) more expensive to set up and maintain, as it requires purchasing hardware and paying ongoing maintenance costs; (+) no monthly recurring fees to cloud vendors.
Cloud: (+) pay as you go.
Scalability
Self-hosted: (-) maintenance of racks and servers; (-) requires planning for high availability and geographical deployment for redundancy.
Cloud: (+) no stress on resource management such as cooling, rack space, wiring etc.; (+) easy to set up.
Reliability
Self-hosted: (-) limited to a single location and can be affected by local issues such as power outages.
Cloud: (+) cloud providers typically have multiple data centres and will automatically route traffic; (-) outages in the cloud provider's data centre could lead to service disruption.
Control and Security
Self-hosted: (+) more control over security and access.
Cloud: (-) not on premises; security can be provisioned but is not under your control.
Cloud-based infrastructure
Cloud services such as Amazon Web Services, Google Cloud, Microsoft Azure, IBM Cloud and DigitalOcean are great resources for hosting the multiple parts of a CPaaS system, such as gateways, media servers, SIP application servers, and other servers for microservices including accounting, profile management, REST services etc. Virtual machines (VMs) running in a larger physical remote data centre are often an ideal choice for VoIP and cloud communication providers.
Self-hosted / on-premises servers / private cloud
Maintaining your own data centre provides the flexibility to extend and/or develop tightly controlled use cases. It is often a requirement for secure communication platforms pertaining to government or banking communications, such as turret phones.
Some approaches set up the servers with OpenStack to manage an SDN (software-defined network). Other approaches use VMware to virtualize servers and then run Docker containers managed via Kubernetes to dynamically spawn server instances as load scales up or down.
I have come across many small startups trying to build CPaaS solutions from scratch, only to realise after weeks of trying to build an MVP that they are stuck with firewall, NAT, media quality or interoperability issues. Since there are so many solutions already on the market, it is best to use them as an underlying layer instead and build application services such as call centre or CRM services on top, using custom wrappers.
Tech insights and experiences
Companies that have been catering to the telco and communication domain build robust solutions based on industry best practices, which beat a novice solution built in a fortnight any day.
Keeping up with emerging trends
Market trends such as new codecs, rich communication services, multi-tenancy, contextual communication, NLP and other ML-based enhancements are provided by the CPaaS company, which typically tries to abstract the implementation details away from its SDK users or clients.
Auto Scaling, High Availability
A firm specializing in CPaaS solutions has already thought through clustering and autoscaling to meet peak traffic requirements, as well as backup/replication on standby servers that are activated in case of failure.
CAPEX and OPEX
Using a CPaaS saves on human resources, infrastructure, and time to market. It saves tremendously on the underlying IT infrastructure and often provides flexible pricing models.
Call Rates are very critical for billing and charging the users. Any updates from the customer or carriers or individuals need to propagate automatically and quickly to avoid discrepancies and negative margins.
CDR ( Call Detail Record ) processing pipeline
CDRs need to be processed sequentially and incrementally on a record-by-record basis or over sliding time windows. CDR can also be used for a wide variety of analytics including correlations, aggregations, filtering, and sampling.
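To make the record-by-record, sliding-window idea concrete, here is a minimal Python sketch. The `Cdr` fields and the `sliding_totals` function are hypothetical names chosen for illustration, and the CDRs are assumed to arrive already ordered by call end time.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Cdr:
    call_id: str
    end_ts: float       # epoch seconds when the call ended
    duration_s: float   # billable duration in seconds


def sliding_totals(cdrs: Iterable[Cdr], window_s: float) -> Iterator[Tuple[str, int, float]]:
    """For each CDR (ordered by end_ts) yield (call_id, calls_in_window, seconds_in_window)."""
    window: Deque[Cdr] = deque()
    total = 0.0
    for cdr in cdrs:
        window.append(cdr)
        total += cdr.duration_s
        # Evict records that fell out of the window ending at this CDR.
        while window and window[0].end_ts <= cdr.end_ts - window_s:
            total -= window.popleft().duration_s
        yield cdr.call_id, len(window), total


if __name__ == "__main__":
    stream = [Cdr("a", 0, 30), Cdr("b", 30, 60), Cdr("c", 70, 10)]
    for row in sliding_totals(stream, window_s=60):
        print(row)
```

Because the function is a generator, each record is processed incrementally as it arrives, which mirrors how a streaming pipeline would consume CDRs from a queue.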
Updating rate sheet ( charges per call or per second )
The following setup is ideal for accepting new rate sheet values via a web UI console or POST API and propagating them quickly to the main DB via a queuing system such as SQS. Serverless operations, for example AWS Lambda, can be used via a trigger-based system for any updates. This ensures that any new input rates are updated in real time, while fallback values are maintained in separate storage such as an S3 bucket.
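A trigger-based update step of this kind could look roughly like the Python sketch below. It assumes the queue delivers an SQS-style event (a `Records` list whose items carry a JSON `body`); the message fields `prefix` and `rate_per_second` are hypothetical, and the two dictionaries are in-memory stand-ins for the main DB and the fallback storage so that the sketch runs without any cloud account.

```python
import json
from typing import Dict

RATES: Dict[str, float] = {}      # stand-in for the main rates DB
FALLBACK: Dict[str, float] = {}   # stand-in for the fallback copy (e.g. an S3 object)


def handler(event: dict, context: object = None) -> dict:
    """Apply every rate update found in an SQS-style event."""
    updated = 0
    for record in event.get("Records", []):
        msg = json.loads(record["body"])
        prefix = str(msg["prefix"])
        rate = float(msg["rate_per_second"])
        if rate < 0:
            continue  # skip obviously invalid rates instead of billing with them
        if prefix in RATES:
            FALLBACK[prefix] = RATES[prefix]  # keep the previous value as fallback
        RATES[prefix] = rate
        updated += 1
    return {"updated": updated}


if __name__ == "__main__":
    body = json.dumps({"prefix": "1479", "rate_per_second": 0.002})
    print(handler({"Records": [{"body": body}]}), RATES)
```

Keeping the previous value before overwriting is one simple way to have a fallback rate available if a faulty rate sheet is uploaded.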
In current VoIP scenarios, a call may pass through various telco providers, ISPs and cloud telephony service providers, where each system maintains its own call records and billing. In my opinion this is duplication and lacks a single source of truth. A decentralized, reliable and consistent data store built on blockchain could potentially maintain the call records, making them immutable and indisputable. Some more details on the concept are in the article below.
SBCs (Session Border Controllers) are essentially gateways that provide interconnectivity between the enterprise's hosted IP-PBX and outside-world endpoints such as telco service providers, PSTN/TDM, SIP trunking providers or even third-party OTT provider apps like Skype for Business. If you have a hosted IP-PBX or PBX in your data centre or on premises and you need controlled but heavy outbound traffic, it is a good idea to integrate a resilient and efficient SBC to provide seamless interconnectivity.
For an enterprise such as a trading floor or warehouse with multiple phone types (softphones, hardphones, turrets etc.) distributed across various geographies and zones, a device-agnostic architectural setup is essential. The essentials for setting up such a system are listed below. Note that supplementary services such as data services, logging and licensing are important but are kept out of scope to keep the focus on functional aspects.
An enterprise application usually is structured in tiers or layers
Client tier – the network clients communicating with the central Java programs. Runs on client machines.
Web tier – stateful communication between the client and business tiers. Runs on the server machine.
Business tier – handles the logic of the application. The business tier uses the Enterprise JavaBeans (EJB) container, which manages the execution of the beans.
Data tier – encompasses DB drivers. Runs on separate machines for database storage.
Event services for Line status notifications
Provides line status notifications across the enterprise, for inter-zone and softphone-to-hardphone scenarios.
Routing services
Routes calls within the enterprise and hardphone sites. Read more about resource zones later in the article.
A consolidated set of all the services and components that make up the VoIP platform besides the media handlers. It includes SIP adapters, bridge managers, call processing frameworks, API frameworks, health checks etc.
Call processing framework (CPF)
Signalling and call routing logic, mostly for SIP and trunks. Manages identities such as call line information, called party information, line status etc. in shared memory.
Multiple shared Lines and their statuses
In cases where multiple calls need to be processed from a single user agent device such as a softphone or hardphone (a common scenario for a turret phone), the design involves assigning it multiple SIP URIs, each of which establishes a line. When a caller is on a call with a callee, the line is said to be BUSY; otherwise it is said to be IDLE. The transition of a shared SIP line from IDLE to BUSY is transmitted via SIP PUBLISH to the other UAs holding the same SIP line. Similarly, any other event such as a transfer is propagated to the others via SIP UPDATE.
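The IDLE/BUSY bookkeeping for a shared line can be sketched in Python as below. This is only a toy model under stated assumptions: the `SharedLine` class and its callbacks are hypothetical, and the callback stands in for the actual SIP signalling that would carry the state change to the other user agents.

```python
from enum import Enum
from typing import Callable, Dict


class LineState(Enum):
    IDLE = "IDLE"
    BUSY = "BUSY"


Listener = Callable[[str, LineState], None]


class SharedLine:
    """One SIP URI (line) shared by several user agents."""

    def __init__(self, sip_uri: str) -> None:
        self.sip_uri = sip_uri
        self.state = LineState.IDLE
        self._listeners: Dict[str, Listener] = {}

    def attach(self, ua_id: str, listener: Listener) -> None:
        self._listeners[ua_id] = listener

    def set_state(self, new_state: LineState, changed_by: str) -> bool:
        """Change state and notify every other UA; returns False if nothing changed."""
        if new_state is self.state:
            return False
        self.state = new_state
        for ua_id, listener in self._listeners.items():
            if ua_id != changed_by:  # the originator already knows
                listener(self.sip_uri, new_state)
        return True


if __name__ == "__main__":
    line = SharedLine("sip:line1@example.com")
    line.attach("turret-1", lambda uri, st: print("turret-1 sees", uri, st.value))
    line.attach("softphone-2", lambda uri, st: print("softphone-2 sees", uri, st.value))
    line.set_state(LineState.BUSY, changed_by="turret-1")
```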
Clustering Call control managers
Call Communication Managers (CCMs) from various zones should be able to cooperate on call and session management and on advanced features such as routing from a guest zone to the home zone, call transfer, refer, barge etc. Designing a clustered setup also provides elasticity, failover and high availability. A clustered, HA-compliant framework such as Oracle Communication Application Server, which is suited to enterprise-level deployments, can be used.
Call Replication and distributed memory management
A node stores two types of data: active sessions and passive sessions. The active sessions are used by the node and stored in cache. The passive sessions are replicas of the other nodes' active sessions and are stored on persistent storage.
When dealing with many SIP endpoints, referred to from here on as resources, it is best to assign the resources to their respective zones. A resource's status can then be updated only by its active resource zone, while it can be read by any resource zone.
Incoming request Zone vs Active Resource Zone
For an incoming request such as an INVITE, check whether the zone on which the request arrived is the resource's active resource zone. If the active resource zone is the same zone on which the INVITE came in, the call is handled by that zone. If the active resource zone is a different zone, the call needs to be forwarded to the active resource zone.
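The zone check described above reduces to a small decision function. The following Python sketch is illustrative only; the function name, the action strings and the lookup dictionary are assumptions, and how an unknown resource is treated is a design choice the article does not specify.

```python
from typing import Dict, Optional, Tuple


def route_invite(resource: str, incoming_zone: str,
                 active_zone_of: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Decide whether an INVITE is handled locally or forwarded to another zone."""
    active_zone = active_zone_of.get(resource)
    if active_zone is None:
        # Assumption: unknown resources are rejected rather than guessed at.
        return ("REJECT", None)
    if active_zone == incoming_zone:
        return ("HANDLE_LOCALLY", incoming_zone)
    return ("FORWARD", active_zone)


if __name__ == "__main__":
    zones = {"sip:line1@example.com": "zone-ny", "sip:line2@example.com": "zone-ldn"}
    print(route_invite("sip:line1@example.com", "zone-ny", zones))
    print(route_invite("sip:line2@example.com", "zone-ny", zones))
```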
Bridges for Local Media connections
Although call signalling is handled only by a resource's active resource zone, media bridges can still be created in the resource's local zone.
Local MM (media manager) bridges are used to auto-answer an incoming SIP line call and create a trunk, especially for hardphones which do not support provisional responses.
Interzone proxy handler
Proxies call control messages between active and non-active resource zones, primarily mapping the SIP messages, with all custom headers, between the communication device interfaces.
Dial trunk using multiple dedicated SIP lines and connect via media bridge
To save on call routing/connection time and to support the ability to add as many users to a call at runtime as needed, a dedicated media bridge is established for every call.
An activated SIP line is auto-answered by the MM, which creates a trunk and waits for the other endpoint to join the bridge. The flow is as follows:
As an INVITE arrives for an IDLE SIP line, it is connected to a trunk and auto-answered by a local MM bridge.
Since the call is already answered, when the caller dials the callee's number, collect the DTMF digits over RTP using RFC 2833 DTMF events.
Run an inter-digit timer for digit collection and detect the end of dialing on timeout.
The dialed trunk connection is made and the call is added to the media bridge.
When provisional responses are received on the trunk connection, generate in-band call progress tones (ringing, proceeding etc.) via the MM.
When the line answers, the progress tones are stopped and the called party is bridged to the calling party via the media bridge.
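The inter-digit timer step of this flow can be sketched in Python as follows. The sketch works on already-decoded DTMF events given as `(timestamp, digit)` pairs; the function name and the 3-second default are assumptions, and decoding the events from RTP is out of scope here.

```python
from typing import Iterable, List, Optional, Tuple


def collect_digits(events: Iterable[Tuple[float, str]],
                   inter_digit_timeout_s: float = 3.0) -> str:
    """Collect dialed digits until the gap between two digits exceeds the timeout."""
    digits: List[str] = []
    last_ts: Optional[float] = None
    for ts, digit in events:
        if last_ts is not None and ts - last_ts > inter_digit_timeout_s:
            break  # the timer expired before this digit, so dialing had already ended
        digits.append(digit)
        last_ts = ts
    return "".join(digits)


if __name__ == "__main__":
    dtmf = [(0.0, "1"), (1.0, "2"), (1.5, "3"), (10.0, "4")]
    print(collect_digits(dtmf))
```

In a live system the timeout would be driven by a real timer rather than by the arrival of a later digit; replaying timestamps simply keeps the sketch deterministic.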
Call diversion involves forwarding calls from one zone to another. Joined parties get the call UPDATE status and the forward response.
Call barge is the process of joining an ongoing call. The barge event is usually propagated to joined parties via SIP INFO. Private lines do not allow barge-in and are reserved exclusively for a few users.
Interconnectivity provided by an SBC ( Session Border Controller)
Hold-Resume and Music on Hold in a multi-line environment
While a regular P2P call involves a simple re-INVITE-based hold and resume with varying SDP, the scenario is slightly more involved for hold and resume on a bridged trunk connection, as explained below.
As the calls are made on a bridge, a hold signal involves a re-INVITE with held SDP to the media manager (MM). If the hold response on the trunk is 200 OK, the hold status is sent to the other call interfaces connected on the trunk. If the hold is denied, a 403 is sent back to the hold initiator.
Music on hold is a one-way RTP stream, mostly from the media server.
For bridged scenarios, separate music-on-hold bridges are kept on the media managers. When a UA has to be held, it is removed from the original bridge and placed on the music-on-hold bridge. To resume, it is placed back into the original bridge from the music-on-hold bridge.
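The bridge bookkeeping for hold and resume can be modelled with a few sets. The Python sketch below is a toy model, not a media server API: the `MediaManager` class and its methods are hypothetical, and a `False` return from `hold` only stands in for the case where the hold is denied.

```python
from typing import Dict, Set


class MediaManager:
    """Toy model of call bridges plus one music-on-hold (MoH) bridge."""

    def __init__(self) -> None:
        self.bridges: Dict[str, Set[str]] = {}
        self.moh_bridge: Set[str] = set()
        self._held_from: Dict[str, str] = {}  # UA -> bridge it was held from

    def join(self, bridge_id: str, ua: str) -> None:
        self.bridges.setdefault(bridge_id, set()).add(ua)

    def hold(self, bridge_id: str, ua: str) -> bool:
        members = self.bridges.get(bridge_id, set())
        if ua not in members:
            return False  # nothing to hold; corresponds to a denied hold
        members.remove(ua)
        self.moh_bridge.add(ua)
        self._held_from[ua] = bridge_id
        return True

    def resume(self, ua: str) -> bool:
        bridge_id = self._held_from.pop(ua, None)
        if bridge_id is None:
            return False  # the UA was not on hold
        self.moh_bridge.discard(ua)
        self.bridges.setdefault(bridge_id, set()).add(ua)
        return True


if __name__ == "__main__":
    mm = MediaManager()
    mm.join("call-1", "alice")
    mm.join("call-1", "bob")
    mm.hold("call-1", "bob")
    print(mm.bridges, mm.moh_bridge)
    mm.resume("bob")
    print(mm.bridges, mm.moh_bridge)
```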
Conference
When a user initiates a conference, the conference feature can execute on the zone where the user is logged on, irrespective of the zones from which the other conference attendees join. The call processing framework of the originator's zone completes the SDP exchange to establish a two-way speech path among all the parties.
In cases where there are multiple connections from a zone, a local MM conference bridge can be created for them, which connects back to the originator's MM conference bridge. This two-part conference bridge is transparent to the SIP lines and users.
For provisioning inputs and settings, set up a diagnostics, administration and configuration platform which can process APIs for data services, licences and alarms, or perform remote device control, for example using SNMP.
bridging multiple interfaces in different networks even between the IPv4 and IPv6 networks
auto NAT discovery and STUN
protocol conversion such as TLS to UDP etc
Flood detection and IP filtering
For SIP-specific functionality, an SBC provides:
SIP validation, involving checks on syntax and message contents as well as consistency checks.
Stateful, call-aware tracing and monitoring, checking the validity and health of all SIP messages.
Topology hiding
Traffic filtering
Codec filtering , reordering , media pinning, transcoding, or call recording
Data replication brings High Availability (HA) with hot backups or even Active-Active solutions.
Traffic sharing and routing roles of SBC can include
IP-based and Digest-based authentication
limiting traffic by number of concurrent calls or calling rate.
Dialplan and/or Custom routing
Dispatching/Load-balancing to a backend cluster of servers
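Limiting traffic by concurrent calls and by calling rate, as listed above, can be illustrated with a small admission-control sketch in Python. The `CallLimiter` class and its parameters are hypothetical; real SBCs expose this as configuration rather than code, so treat it as an explanation of the idea only.

```python
from collections import deque
from typing import Deque


class CallLimiter:
    """Admit a call only if both the concurrent-call and calls-per-second limits allow it."""

    def __init__(self, max_concurrent: int, max_cps: int) -> None:
        self.max_concurrent = max_concurrent
        self.max_cps = max_cps
        self.active = 0
        self._recent: Deque[float] = deque()  # admission times within the last second

    def try_admit(self, now: float) -> bool:
        # Drop admission timestamps older than one second.
        while self._recent and self._recent[0] <= now - 1.0:
            self._recent.popleft()
        if self.active >= self.max_concurrent or len(self._recent) >= self.max_cps:
            return False  # a SIP proxy would typically answer with a rejection here
        self.active += 1
        self._recent.append(now)
        return True

    def release(self) -> None:
        """Call when a session ends (for example on BYE)."""
        self.active = max(0, self.active - 1)


if __name__ == "__main__":
    limiter = CallLimiter(max_concurrent=2, max_cps=2)
    print([limiter.try_admit(t) for t in (0.0, 0.1, 0.2)])
```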
SBCs can be physical hardware boxes or software-based applications. As the name suggests, their purpose is to control the session at the border between the enterprise and the external service provider. They can be used for various roles such as:
SIP to PSTN – SIP is an IP protocol whereas PSTN is TDM-based; achieving interoperability between the two is also a key responsibility of an SBC.
SIP trunking – SBCs provide secure SIP connectivity to connect calls to SIP trunks, which provide bulk calling functionality at flat pricing.
Support for various fixed or mobile endpoints – SBCs ensure they are RFC compliant and can extend SIP to any kind of telecom endpoint such as PSTN, GSM, fax, Skype, SIP phones, IP phones etc.
NAT (Network Address Translation) – meets the packet routing challenges across a firewall or during private-public address mapping. A combination of DHCP servers and a NAT provider comes in very handy to reroute or perform hole punching so that signalling and media packets are not dropped and reach the required endpoint. More about NAT here – NAT traversal using STUN and TURN.
Load balancing – reverse proxies and load balancers are a widely adopted industry practice to mask the inner IPs of the VoIP platform and to route traffic appropriately between control and media servers.
Security, QoS and regulatory compliance – since SBCs typically have to support a large array of clients, they adhere to regulatory and industry-accepted standards, which also involves security features such as AAA and TLS/SSL, and other means of quality assurance such as logging, fault detection and DDoS prevention. In many cases an SBC can also encrypt/decrypt RTP streams for probing, tapping or lawful inspection.
There are 2 ways to integrate IP calls to telecom provider endpoints such as GSM or LTE phones.
PRI lines
SIP trunks
Additional SBC features
In addition to the above, it is good if an SBC provides extra features such as forking, emergency number dialing (911) or Active Directory integration. Real-time analysis and monitoring of calls and metrics are also expected from an SBC, since SBCs reside at the edge of the network and are more vulnerable to threats. Examples include Dialogic SBCs and gateways and AudioCodes Mediant SBCs.
With the shift from on-premises PBXs to cloud-based VM or microservice architectures, SBC vendors are adopting a larger umbrella of services, including automation scripts for checks, reporting tools/consoles, developer-friendly APIs to manage sessions via the SBC, and even WebRTC gateways to connect browser endpoints.
A basic enterprise VoIP/SIP solution is illustrated in Figure. The key element is a soft switch (SIP PBX) which might be implemented as a combination of several SIP entities, such as SIP registrar, proxy server, redirect server, forking server, Back-To-Back User Agent (B2BUA) etc. SIP clients can be SIP hard-phones or soft-phones on PCs, PDAs etc. A PSTN gateway links the enterprise SIP PBX to the public PSTN. Enterprise applications, media servers, presence servers, and the VoIP/SIP PBX are interconnected through a company intranet.
VoIP System with IMS : With IMS, applications will be able to establish sessions across different access networks, with guaranteed QoS, flexible charging & AAA support. Call control, user’s database and services, which are the typical functions of softswitch, are controlled by separate units in IMS. CSCF (Call Session Control Function) handles session establishment, modification and release of IP multimedia sessions using the SIP/SDP protocol suite. Services features are separated from call control and handled by application servers. Subscriber’s database function is separated from service logic function and handled by HSS using open subscriber directory interface.
Link registration using subscribe-notify can be handled via Enterprise App server in PBX.
Forking proxy setup of PBX: The enterprise's SIP PBX can work as a forking proxy during call setup to redirect the calls.
Other use cases can involve presence sharing between different enterprise PBXs, with both domains interconnecting their presence servers.
Any VoIP-dependent system which deals with bulk voice/video traffic from external endpoints is a usage scenario. A few are listed below:
provision of pre-defined enterprise based SIP URI.
Contact Call centres
Remote work / offsite monitoring
CRM solution for sales/marketing
Connecting WebRTC click-to-dial from a webpage to enterprise representatives
Connecting enterprise UCC clients to PSTN endpoints
There are many more features and use cases for an IP-PBX solution in an enterprise. The features of modern IP-PBX systems are a big add-on to the internal secure telecom channel within a company and across its various offices.
There has been a significant shift towards replacing hardware PBX systems with software-based IP PBXs such as FreeSWITCH, Asterisk or other commercial-grade SIP servers, which integrate seamlessly with other business software such as CRM and task force management systems. In recent times cloud telephony providers, particularly CPaaS platforms, have revolutionized the IP telecommunication landscape with lightweight and feature-rich communication agents (web, native platform) and services such as programmable APIs to control call logic, recording, IVR announcements, call parking, automatic queueing and so on.
Elevated Call failure SIP 503 or Call timeout SIP 408
cron service or processed alerts
DB connections / connection pool process alerts
port check, unexpected result alert
cron zombie process checks alerts
Bulk calls checks
Process control supervisor or pm2 checks
Health and load on the reverse proxy, load balancer as Nginx alerts
VPN checks
SSL cert expiry checks
Health of Task scheduling services such as RabbitMQ, Celery Distributed Task Queue
Cluster status
Status of critical application server
Programming or Syntax error in the production environment
Distributed memory caching – Redis, Memcached
SMS service using SMSC on Kannel
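As an illustration of the first alert in this list (elevated SIP 503 or 408 responses), here is a minimal Python sketch. The threshold of 5%, the minimum sample size and the function names are assumptions chosen for the example, not values recommended by any particular monitoring tool.

```python
from typing import Iterable, Sequence


def failure_ratio(status_codes: Iterable[int], watched: Sequence[int] = (408, 503)) -> float:
    """Share of final SIP responses in the window that are watched failure codes."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    return sum(1 for code in codes if code in watched) / len(codes)


def should_alert(status_codes: Iterable[int], threshold: float = 0.05,
                 min_calls: int = 20) -> bool:
    """Alert only when enough calls were seen and the failure share exceeds the threshold."""
    codes = list(status_codes)
    return len(codes) >= min_calls and failure_ratio(codes) > threshold


if __name__ == "__main__":
    window = [200] * 18 + [503, 408]
    print(failure_ratio(window), should_alert(window))
```

The minimum sample size avoids paging on-call staff when a handful of failed test calls dominate a quiet period.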
Overview of VoIP platform DevOPS tools
This article focuses on the various tools required to operate and maintain a growing, large-scale VoIP platform, which are mostly classified under the following roles:
PCAP collection
CI/CD on a Jenkins pipeline
Configuration management using Chef cookbooks
Virtualization and containerization using Docker
Infrastructure management using Terraform / Kubernetes
Packet Capture (PCAP) is an API that captures live network packets. Besides tracking, audit and RTC visualizers, PCAP is widely used for debugging faults such as during production alarm on high failure occurrences.
Example use case: A production alert on a 503 SIP response, or a log entry from a gateway, is not as helpful as PCAP tracking of the call's session ID across the various endpoints in and out of the network to determine the point of failure. Debugging involves:
Pre-specified SIP / RTP and related protocols capture
Docker containers can be used instead of virtual machines such as VirtualBox to isolate applications and to be OS and platform independent.
Makes distributed development possible and automates deployment.
unpause Unpause all processes within one or more containers
update Update configuration of one or more containers
wait Block until one or more containers stop, then print their exit codes
See all images
> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
sipcapture/homer-cron latest fb2243f90cde 3 hours ago 476MB
sipcapture/homer-kamailio latest f159d46a22f3 3 hours ago 338MB
sipcapture/heplify latest 9f5280306809 21 hours ago 9.61MB
<none> <none> edaa5c708b3a
See all stats
> docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
f42c71741107 homer-cron 0.00% 52KiB / 994.6MiB 0.01% 2.3kB / 0B 602MB / 0B 0
0111765091ae mysql 0.04% 452.2MiB / 994.6MiB 45.46% 1.35kB / 0B 2.06GB / 49.2kB 22
Run a command from within a docker instance
docker exec -it <container-id> bash
First see all processes
docker ps
select a process and enter its bash
docker exec -it 0472a5127fff bash
To edit or update a file inside docker, either install vim every time you log in to a fresh docker container, like
apt-get update
apt-get install vim
or add this to the Dockerfile
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "vim"]
See if ngrep is installed; if not, install it and run ngrep to get SIP logs inside that docker container
apt update
apt install ngrep
ngrep -p "14795778704" -W byline -d any port 5060
docker volume – Volumes are used for persisting data generated by and used by Docker containers. Docker volumes have advantages over bind mounts: they are easier to back up or migrate, are managed through the Docker API, can be safely shared among multiple containers etc.
docker stack – Lets you manage a cluster of docker containers through docker swarm; a stack can be defined via a docker-compose.yml file.
docker service
create Create a new service
inspect Display detailed information on one or more services
logs Fetch the logs of a service or task
ls List services
ps List the tasks of one or more services
rm Remove one or more services
rollback Revert changes to a service’s configuration
scale Scale one or multiple replicated services
update Update a service
Run docker containers
sample run command
docker run -it -d --name opensips -e ENV=dev imagename:2.2
-it attaches an interactive tty in the container.
-e sets environment variables
-d runs it in background and prints container id
Remove docker entities
To remove all stopped containers, all unused networks and all unused images (with -a, not just dangling ones):
docker system prune -a
To run the default prune and additionally remove unused volumes
docker system prune --volumes
To remove all stopped containers
docker container prune
Sometimes docker images keep piling up along with stopped containers, such as
REPOSITORY TAG IMAGE ID CREATED SIZE
d1dcfe2438ae 15 minutes ago 753MB
2d353828889b 16 hours ago 910MB
...
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0dd6698a7517 2d353828889b "/entrypoint.sh" 13 minutes ago Exited (137) 13 minutes ago hardcore_wozniak
To remove such images and their containers, first stop and remove the containers
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
Terraform is used for building, changing and versioning infrastructure. Infrastructure as Code – it can manage anything from a single application to entire datacentres via configuration files, from which it creates an execution plan. It can manage low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc. Resource Graph – builds a graph of all your resources.
tfenv can be used to manage terraform versions
> brew unlink terraform
tfenv install 0.11.14
tfenv list
Terraform configuration is used for declaring resources and descriptions of infrastructure; the associated files have a .tf or .tf.json file extension. A group of resources can be gathered into a module. A Terraform configuration consists of a root module, where evaluation begins, along with a tree of child modules created when one module calls another.
Example: launch a single AWS EC2 instance, file server1.tf
Kubernetes is a container orchestration platform that automates the deployment, scaling, and management of containerized applications. It can deploy to a cluster of computers, automating the distribution and scheduling as well.
Service discovery and load balancing – gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.
Automatic bin packing – Automatically places containers based on their resource requirements and other constraints, while not sacrificing availability. Mix critical and best-effort workloads in order to drive up utilization and save even more resources.
Storage orchestration – Automatically mount the storage system of your choice, whether from local storage, a public cloud provider such as GCP or AWS, or a network storage system such as NFS, iSCSI, Gluster, Ceph, Cinder, or Flocker.
Self-healing – Restarts containers that fail, replaces and reschedules containers when nodes die, kills containers that don’t respond to your user-defined health check, and doesn’t advertise them to clients until they are ready to serve.
Automated rollouts and rollbacks – progressively rolls out changes to your application or its configuration, while monitoring application health to ensure it doesn’t kill all your instances at the same time.
Secret and configuration management – Deploy and update secrets and application configuration without rebuilding your image and without exposing secrets in your stack configuration.
Batch execution– manage batch and CI workloads, replacing containers that fail, if desired.
Horizontal scaling – Scale application up and down with a simple command, with a UI, or automatically based on CPU usage.
Starting Kubernetes…minikube version: v1.3.0
commit: 43969594266d77b555a207b0f3e9b3fa1dc92b1f
minikube v1.3.0 on Ubuntu 18.04
Running on localhost (CPUs=2, Memory=2461MB, Disk=47990MB) …
OS release is Ubuntu 18.04.2 LTS
Preparing Kubernetes v1.15.0 on Docker 18.09.5 …
kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
Pulling images …
Launching Kubernetes …
Done! kubectl is now configured to use "minikube"
dashboard was successfully enabled
Kubernetes Started
Basic Commands
start Starts a local kubernetes cluster
status Gets the status of a local kubernetes cluster
stop Stops a running local kubernetes cluster
delete Deletes a local kubernetes cluster
dashboard Access the kubernetes dashboard running within the minikube cluster
Images Commands:
docker-env Sets up docker env variables; similar to ‘$(docker-machine env)’
cache Add or delete an image from the local cache.
Configuration and Management Commands:
addons Modify minikube’s kubernetes addons
config Modify minikube config
profile Profile gets or sets the current minikube profile
update-context Verify the IP address of the running cluster in kubeconfig.
Networking and Connectivity Commands:
service Gets the kubernetes URL(s) for the specified service in your local cluster
tunnel tunnel makes services of type LoadBalancer accessible on localhost
Advanced Commands:
mount Mounts the specified directory into minikube
ssh Log into or run a command on a machine with SSH; similar to ‘docker-machine ssh’
kubectl Run kubectl
Troubleshooting Commands:
ssh-key Retrieve the ssh identity key path of the specified cluster
ip Retrieves the IP address of the running cluster
logs Gets the logs of the running instance, used for debugging minikube, not user code.
update-check Print current and latest version number
kubectl
controls the Kubernetes cluster manager.
Basic Commands (Beginner):
create Create a resource from a file or from stdin.
expose Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
run Run a particular image on the cluster
set Set specific features on objects
explain Documentation of resources
get Display one or many resources
edit Edit a resource on the server
delete Delete resources by filenames, stdin, resources and names, or by resources and label selector
Deploy Commands:
rollout Manage the rollout of a resource
scale Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
autoscale Auto-scale a Deployment, ReplicaSet, or ReplicationController
Cluster Management Commands:
certificate Modify certificate resources.
cluster-info Display cluster info
top Display Resource (CPU/Memory/Storage) usage.
cordon Mark node as unschedulable
uncordon Mark node as schedulable
drain Drain node in preparation for maintenance
taint Update the taints on one or more nodes
Troubleshooting and Debugging Commands:
describe Show details of a specific resource or group of resources
logs Print the logs for a container in a pod
attach Attach to a running container
exec Execute a command in a container
port-forward Forward one or more local ports to a pod
proxy Run a proxy to the Kubernetes API server
cp Copy files and directories to and from containers.
auth Inspect authorization
Advanced Commands:
diff Diff live version against would-be applied version
apply Apply a configuration to a resource by filename or stdin
patch Update field(s) of a resource using strategic merge patch
replace Replace a resource by filename or stdin
wait Experimental: Wait for a specific condition on one or many resources.
convert Convert config files between different API versions
kustomize Build a kustomization target from a directory or a remote url.
Settings Commands:
label Update the labels on a resource
annotate Update the annotations on a resource
completion Output shell completion code for the specified shell (bash or zsh)
Other Commands:
api-resources Print the supported API resources on the server
api-versions Print the supported API versions on the server, in the form of “group/version”
config Modify kubeconfig files
plugin Provides utilities for interacting with plugins.
version Print the client and server version information
DevOps monitoring tools nagios
Manage Docker configs
create Create a config from a file or STDIN
inspect Display detailed information on one or more configs
ls List configs
rm Remove one or more configs
Manage containers
attach Attach local standard input, output, and error streams to a running container
commit Create a new image from a container’s changes
cp Copy files/folders between a container and the local filesystem
create Create a new container
diff Inspect changes to files or directories on a container’s filesystem
exec Run a command in a running container
export Export a container’s filesystem as a tar archive
inspect Display detailed information on one or more containers
kill Kill one or more running containers
logs Fetch the logs of a container
ls List containers
pause Pause all processes within one or more containers
port List port mappings or a specific mapping for the container
prune Remove all stopped containers
rename Rename a container
restart Restart one or more containers
rm Remove one or more containers
run Run a command in a new container
start Start one or more stopped containers
stats Display a live stream of container(s) resource usage statistics
stop Stop one or more running containers
top Display the running processes of a container
unpause Unpause all processes within one or more containers
update Update configuration of one or more containers
wait Block until one or more containers stop, then print their exit codes
Alternatives: Sensu multi-cloud monitoring or Raygun.
Aggregate logs into Logstash and provide search and filtering via Elasticsearch and Kibana. These can also trigger alerts or notifications on specific keyword matches in logs, such as WARNING, ERROR or call_failed. Some common alert scenarios include:
SBC and proxy gateway failures – check the state of the VM instances
DNS caching alerts – Domain Name System (DNS) caching, Dynamic Host Configuration Protocol (DHCP) server, router advertisement and network boot alerts from a service such as dnsmasq
Disk usage alert – set up alerts for 80% usage and trigger an alarm to either prune manually or create automatic, timely archive backups. Check the percentage of disk usage:
df -h
Mostly it is the log files or the pcap recordings that need to be archived to external storage.
Use logrotate – it can rotate, compress, and mail system logs
Run logrotate against its config file (-v for verbose, -f to force a rotation) – logrotate -vf /etc/logrotate.conf
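The 80% disk usage alert described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names and the default threshold argument are assumptions, and the percentage is computed as used/total, which can differ slightly from the Use% column of df because df excludes reserved blocks.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the used space of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def should_alert(percent_used, threshold=80.0):
    """True when usage has reached the alert threshold (80% by default)."""
    return percent_used >= threshold


percent = disk_usage_percent("/")
if should_alert(percent):
    print(f"ALERT: disk usage at {percent:.1f}%, prune or archive logs and pcaps")
else:
    print(f"disk usage at {percent:.1f}%")
```

A check like this could be run from cron and wired to whatever alerting channel is already in place.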
Elevated call failures (SIP 503) or call timeouts (SIP 408) – a high frequency of failed calls indicates an internal issue and must be followed up by smoke testing the entire system to identify any probable cause, such as undetected frequent crashes of an individual component or blacklisting by a destination endpoint
sudo tail -f sip.log | grep 503
or
sudo tail -f sip.log | grep WARNING
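Beyond grepping by hand, the share of failed calls can be computed from the log. The sketch below assumes that sip.log contains raw SIP status lines such as "SIP/2.0 503 Service Unavailable"; the timestamps, function names and the choice of 408 and 503 as failure codes are illustrative assumptions.

```python
import re
from collections import Counter

# Matches the status code of a SIP response status line, e.g. "SIP/2.0 503".
STATUS_LINE = re.compile(r"SIP/2\.0 (\d{3})")


def count_sip_responses(lines):
    """Count SIP response codes found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def failure_ratio(counts, failure_codes=("408", "503")):
    """Fraction of all counted responses that carry one of the failure codes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[code] for code in failure_codes) / total


sample = [
    "2024-01-01 10:00:00 SIP/2.0 200 OK",
    "2024-01-01 10:00:01 SIP/2.0 503 Service Unavailable",
    "2024-01-01 10:00:02 SIP/2.0 408 Request Timeout",
    "2024-01-01 10:00:03 SIP/2.0 200 OK",
]
counts = count_sip_responses(sample)
ratio = failure_ratio(counts)
print(f"failed responses: {ratio:.0%}")
```

An alert can then fire when the ratio stays above a chosen limit for a few consecutive intervals, rather than on a single failed call.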
Cron service or process alerts – list the running processes
ps axf
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
3 ? I< 0:00 \_ [rcu_gp]
4 ? I< 0:00 \_ [rcu_par_gp]
5 ? I 0:00 \_ [kworker/0:0-eve]
6 ? I< 0:00 \_ [kworker/0:0H-kb]
7 ? I 0:00 \_ [kworker/0:1-eve]
8 ? I 0:00 \_ [kworker/u4:0-nv]
9 ? I< 0:00 \_ [mm_percpu_wq]
10 ? S 0:00 \_ [ksoftirqd/0]
11 ? I 0:00 \_ [rcu_sched]
12 ? S 0:00 \_ [migration/0]
13 ? S 0:00 \_ [cpuhp/0]
14 ? S 0:00 \_ [cpuhp/1]
15 ? S 0:00 \_ [migration/1]
16 ? S 0:00 \_ [ksoftirqd/1]
17 ? I 0:00 \_ [kworker/1:0-eve]
18 ? I< 0:00 \_ [kworker/1:0H-kb]
or check the cron status
service cron status
● cron.service - Regular background program processing daemon
Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2016-06-26 03:00:37 UTC; 1min 17s ago
Docs: man:cron(8)
Main PID: 845 (cron)
Tasks: 1 (limit: 4383)
CGroup: /system.slice/cron.service
└─845 /usr/sbin/cron -f
Jun 26 03:00:37 ip-172-31-45-21 systemd[1]: Started Regular background program processing daemon.
Jun 26 03:00:37 ip-172-31-45-21 cron[845]: (CRON) INFO (pidfile fd = 3)
Jun 26 03:00:37 ip-172-31-45-21 cron[845]: (CRON) INFO (Running @reboot jobs)
Restart or start the cron service if required.
DB connections / connection pool process – keep listening for any alerts on DB connection failures, or even warnings, as these can be caused by too many read operations, such as in a DDoS attack, and can escalate very quickly
Cron zombie process checks – a zombie (defunct) process is one that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the “terminated state”. List the zombie processes and free up their process-table entries; since a zombie has already exited, the signal usually has to go to its parent process so that the zombie gets reaped.
kill -9 <PID1>
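Listing the zombies can be automated by reading the STAT column of ps. The sketch below assumes the Linux procps invocation `ps -eo pid,ppid,stat,comm`; the helper names and the sample rows are made up for illustration. It reports the parent PID as well, since that is normally the process that needs attention.

```python
import subprocess


def find_zombies(ps_output):
    """Return (pid, ppid, command) for every row whose STAT starts with 'Z'."""
    zombies = []
    for row in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = row.split(None, 3)  # pid, ppid, stat, rest of the line
        if len(fields) == 4 and fields[2].startswith("Z"):
            zombies.append((int(fields[0]), int(fields[1]), fields[3]))
    return zombies


def list_zombies():
    """Run ps on the local machine and return its zombie processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_zombies(result.stdout)


# Illustrative sample, not real output from the servers discussed here.
sample = """\
  PID  PPID STAT COMMAND
    1     0 Ss   systemd
  845     1 Ss   cron
 2101   845 Z    backup.sh <defunct>
"""
zombies = find_zombies(sample)
for pid, ppid, command in zombies:
    print(f"zombie {pid} ({command}), parent {ppid}")
```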
Bulk call checks – consult the ongoing-call console commands of the application server, such as those for FreeSWITCH
In case of a DDoS or other malicious attack, identify the attacker IP and block it
iptables -I INPUT -s y.y.y.y -j DROP
fail2ban can also be used
apt-get update && apt-get install fail2ban
Additionally, check how many dispatchers are responding on the outbound gateway
opensipsctl dispatcher dump
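To decide which IP to block, a quick count of source addresses in the logs helps. The sketch below is a rough heuristic that assumes each log line mentions the source IPv4 address first; the log format, the function name and the `min_hits` limit are hypothetical. It only prints the iptables command so that a human can review it before running it.

```python
import re
from collections import Counter

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def top_talkers(lines, min_hits=100):
    """Return (ip, hits) pairs, busiest first, for IPs seen at least min_hits times.

    Only the first IPv4 address on each line is counted.
    """
    hits = Counter()
    for line in lines:
        match = IPV4.search(line)
        if match:
            hits[match.group(0)] += 1
    return [(ip, n) for ip, n in hits.most_common() if n >= min_hits]


# Illustrative sample using documentation-only address ranges.
sample = [
    "INVITE from 203.0.113.7:5060",
    "INVITE from 203.0.113.7:5060",
    "REGISTER from 198.51.100.2:5060",
    "INVITE from 203.0.113.7:5060",
]
suspects = top_talkers(sample, min_hits=3)
for ip, n in suspects:
    print(f"{ip} sent {n} requests; candidate rule: iptables -I INPUT -s {ip} -j DROP")
```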
Process control supervisor or pm2 checks – supervisor is a Linux Process Control System that allows its users to monitor and control a number of processes
ps axf | grep supervisor
for pm2
> pm2 status
[PM2] Spawning PM2 daemon with pm2_home=/Users/altanai/.pm2
[PM2] PM2 Successfully daemonized
┌─────┬───────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name │ namespace │ version │ mode │ pid │ uptime │ ↺ │ status │ cpu │ mem │ user │ watching │
Use htop to check memory and CPU
Health and load of the reverse proxy or load balancer, such as Nginx – perform a direct curl request to the host to check whether Nginx responds with a non-4xx/5xx response
curl -v <public-fqdn-of-server>
In case of an error response, start (or restart) Nginx
/etc/init.d/nginx start
In case of config updates, reload the Nginx config
nginx -s reload
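The curl check can also be scripted so that it runs on a schedule. This is a minimal sketch using only the Python standard library; the function names and the 5 second timeout are assumptions, and any URL passed in is a placeholder for the public FQDN of the server.

```python
import urllib.error
import urllib.request


def is_healthy_status(status):
    """True for any HTTP status that is not a 4xx or 5xx."""
    return 100 <= status < 400


def check_endpoint(url, timeout=5.0):
    """Return True if the server answers `url` with a non-4xx/5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status code is in err.code.
        return is_healthy_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout and similar.
        return False
```

Calling `check_endpoint("https://<public-fqdn-of-server>/")` from a scheduler and restarting Nginx only after a few consecutive failures avoids reacting to a single slow response.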
For an HTTP/SSL proxy daemon such as Tinyproxy, which is used for fast responses, set MinSpareServers, MaxSpareServers, MaxClients, MaxRequestsPerChild etc. appropriately
VPN checks – restart firewalls or IPsec in case of issues
/etc/init.d/ipsec restart
Additionally, check the ssh service
ps axf | grep sshd
Restart sshd if required
SSL cert expiry checks – to keep operations running securely and prevent an abrupt termination, it is good practice to run regular certificate expiry checks for SSL certs, especially on secure HTTP endpoints such as APIs and web servers, and also on SIP application servers for TLS. If any expiry is due in fewer than 10 days, trigger an alert to renew the certs
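A certificate expiry check along these lines can be sketched with Python's ssl module. The function names are assumptions, and the 10 day threshold comes from the text above. Note that with the default verifying context an already expired certificate makes the handshake fail, so `fetch_not_after` would raise instead of returning a date; that failure is itself worth alerting on.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter date.

    `not_after` uses the certificate date format, e.g. "Jan  6 00:00:00 2030 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def needs_renewal(not_after, threshold_days=10, now=None):
    """True when the certificate expires in fewer than `threshold_days` days."""
    return days_until_expiry(not_after, now) < threshold_days


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect over TLS and return the notAfter field of the peer certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For a SIP application server the same check would point at its TLS port (commonly 5061) instead of 443.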
Health of task scheduling services such as RabbitMQ or the Celery distributed task queue – remote debugging of these can be set up via pdb, which supports setting (conditional) breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, and evaluation of arbitrary Python code in the context of any stack frame.
It can also be set up using the client libraries provided by these queue services themselves.
Cluster status – set up an efficient health check service which monitors the cluster status for high availability, for example a JSON object depicting the status of cluster shards
fscli > show status
UP 0 years, 0 days, 0 hours, 58 minutes, 33 seconds, 15 milliseconds, 58 microseconds
FreeSWITCH (Version 1.6.20 git 987c9b9 2018-01-23 21:49:09Z 64bit) is ready
3 session(s) since startup
0 session(s) - peak 1, last 5min 1
0 session(s) per Sec out of max 30, peak 1, last 5min 1
1000 session(s) max
min idle cpu 0.00/80.83
Current Stack Size/Max 240K/8192K
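A health check service can turn the show status text into a JSON object for dashboards or alerts. The sketch below parses the output shown above with regular expressions; the JSON key names are my own choice and not something FreeSWITCH defines.

```python
import json
import re


def parse_show_status(text):
    """Extract a few health fields from FreeSWITCH `show status` output."""
    status = {"ready": " is ready" in text}
    match = re.search(r"(\d+) session\(s\) since startup", text)
    if match:
        status["sessions_since_startup"] = int(match.group(1))
    match = re.search(r"(\d+) session\(s\) - peak (\d+)", text)
    if match:
        status["current_sessions"] = int(match.group(1))
        status["peak_sessions"] = int(match.group(2))
    match = re.search(r"(\d+) session\(s\) max", text)
    if match:
        status["max_sessions"] = int(match.group(1))
    return status


sample = """\
UP 0 years, 0 days, 0 hours, 58 minutes, 33 seconds, 15 milliseconds, 58 microseconds
FreeSWITCH (Version 1.6.20 git 987c9b9 2018-01-23 21:49:09Z 64bit) is ready
3 session(s) since startup
0 session(s) - peak 1, last 5min 1
0 session(s) per Sec out of max 30, peak 1, last 5min 1
1000 session(s) max
min idle cpu 0.00/80.83
Current Stack Size/Max 240K/8192K
"""
status = parse_show_status(sample)
print(json.dumps(status, indent=2))
```

Comparing `current_sessions` against `max_sessions` gives a simple load signal per node.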
Programming or syntax errors in the production environment – mostly arising due to incomplete QA/testing before pushing new changes to production. These should trigger alerts for dev teams and be met with hot patches.
Many application development frameworks have built-in libraries for debugging, exception handling and reporting, such as
backend service in Django
API service in Go
Distributed memory caching – Redis, Memcached: the Redis INFO command shows the master-slave configuration for all the instances as well as their memory and CPU status.
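The text returned by Redis INFO is a list of `key:value` lines grouped under `# Section` headers, so it is easy to parse for monitoring. The sketch below works on a captured string and does not connect to Redis; the sample values are made up, while the field names (role, connected_slaves, used_memory, used_memory_human) are standard INFO fields.

```python
def parse_redis_info(text):
    """Parse the text output of the Redis INFO command into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


# Illustrative sample values, not taken from a real instance.
sample = """\
# Replication
role:master
connected_slaves:1

# Memory
used_memory:1048576
used_memory_human:1.00M
"""
info = parse_redis_info(sample)
print(info["role"], info["connected_slaves"], info["used_memory_human"])
```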
Market trends are not in favour of telecom service providers, given the increasing use of OTT (Over The Top) applications like WhatsApp, Facebook Messenger, Google Hangouts, Skype, Viber, etc. OTT applications are often blamed for taking a share of voice traffic revenue by using IP calls where the telco could have charged based on its rate plan of call seconds. This intensifies especially for long-distance or international calls, where customers can use OTT providers instead of expensive telco rate plans.
What is an OTT?
An Over The Top (OTT) application is one which provides communication services over the Internet. It therefore bypasses the communication billing system set up by a Telecom Operator, resulting in no gain, or even a loss of revenue, for the Telecom Operator who is providing the Internet service to the user in the first place.
Hence we see that OTTs are a major source of concern for Telecom Operators, whose traditional and obviously expensive (when compared to the OTTs' free service) billing models are facing disruption.
Telecom Regulatory bodies around the world
The telecom regulatory authorities in some countries are, for example:
Canadian Radio-television and Telecommunications Commission (CRTC) – Canada
Ministry of Information Industry (MII) – China
Autorité de Régulation des Communications Électroniques et des Postes (ARCEP) – France
Bundesnetzagentur (BNA) – Germany
Telecom Regulatory Authority of India (TRAI) – India
Ministry for Communications and Informatization of the Russian Federation (Minsvyaz) – Russia
Infocomm Development Authority of Singapore (IDA) – Singapore
Independent Communications Authority of South Africa (ICASA) – South Africa
Federal Communications Commission (FCC), National Association of Regulatory Utility Commissioners (NARUC, regulators of individual states), CTIA – The Wireless Association (CTIA) – USA
Such telecom regulatory bodies get to decide whether to enforce differential pricing for end consumers using OTT services, so that telecom service providers can benefit, or to keep the Internet fair and open by passing Net Neutrality laws, bills and amendments.
What is Net Neutrality?
The fundamental principle of Net Neutrality is that Telecom Operators should not block, slow down or charge consumers extra for using other services as their means of communication. It holds that it is wrong to charge users above the regular data rates for using VoIP apps and other internet-based communication services.
The following countries have adopted principles of Net Neutrality by passing bills or making laws.
Chile – Chile’s General Law of Telecommunications, “No [ISP] can block, interfere with, discriminate, hinder, nor restrict the right of any Internet user of using, send, receive, or offer any content, application, or legitimate service through the Internet, as well as any activity or legitimate use conducted through the Internet.”
Brazil – the “Internet Bill of Rights” makes equal access to the internet mandatory in Brazil.
Netherlands – Even the European Union has adopted the Netherlands’ Net Neutrality amendment, which reads “traffic should be treated equally, without discrimination, restriction or interference, independent of the sender, receiver, type, content, device, service or application.”
USA – Citizens took to the ‘We the People’ platform to ‘Restore Net Neutrality By Directing the Federal Communications Commission (FCC) to Classify Internet Providers as Common Carriers’, which would not allow them to throttle speed through paid prioritization, discriminate in pricing or block broadband access to legal content. The above facts are from this tech.firstpost.com article.
In spite of the fact that I support Net Neutrality with all my heart, as a telecom engineer I understand the investment made by Telecom Operators in providing an efficient communication network to their subscribers (Access, Network and Application layers). Therefore I do have my sympathies with the Telcos, and to level out the wide-ranging conflict between Telcos and ISPs (Internet Service Providers), I pen down the following points, which reflect the Telecom Operators' problems and also highlight the solutions that can be adopted to counteract the OTT threat.
Depleting revenue for Telco
Messaging – OTT messaging cost operators $13.9 billion, or 9% of message revenue in 2013
Voice – Voice services under threat from VOIP services like Skype, Viber
OTT apps – Voice and message apps have been the operator’s biggest headache. It is time operators launched their own OTT services
Data Traffic – Utilization is yet to reach its peak, and it will face challenges from WiFi access
Critical Pain areas – Erosion of Operator’s revenue from voice and (especially) messaging
Telco’s OTT Application
At this stage, a telecom service provider / operator must enter the apps market and bring forth a messenger which is more powerful, interactive and awesome than an OTT application. Fortunately, the operator can always couple this application with its backend telecom infrastructure to provide an edge in performance and functionality.
Roadblocks while developing an OTT application for a Telecom Service Provider:
Investment in Data Network is not being utilized due to lack of service
Reuse of Existing business Logic and extending the service reach across devices and networks is tough
Operator already has full fledged network Infrastructure in Place
Desire for minimum CAPEX while investing in new technologies
Competing with OTT players and opening new revenue streams is a challenge
Next we look at ways of solving these problems and integrating them into a solution.
OTT Application for Telecom Service provider
Introduce new services to benefit from investment on Data Plans and Bandwidth
Expose REST APIs to enable 3rd-party integration with the existing network infrastructure
Partner with individual OTT players to make new services that do not compete on core competencies like billing etc.
Use protocols like SIP that reduce CAPEX and allow a quicker go-to-market
Go for enriched services that lead to a better user experience
This write-up outlines the process of creating an OTT application for a Telecom Service Provider. Components of the application include a cloud address book, video chat, location sharing, contact synchronization, a REST-based thin client, OS and device agnosticism etc., as shown in the figure below:
telco’s OTT app
The application is designed to be closely knit with the Operator’s own infrastructure, hence crucial entities like the Network Address Book and Location Service are synced and fetched from the backend network.
OTT application Feature Overview
Smart Address Book
Automatic: Get contacts from Gmail, Facebook
Fast search by first name, last name, frequently dialed number
Roadmap: View calendar events
Personal: Get image from Gmail and display in contacts list
Geo Location
Share own location during chatting
Get map for calculating the distance between two chat users
Roadmap : Trigger device (say Switch on/off AC before reaching home) from a threshold distance away from home location
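The distance feature and the threshold trigger from the roadmap can both be built on the haversine great-circle formula. This is a minimal sketch and not the application's actual implementation; the function names, the mean Earth radius of 6371 km and the coordinates in the example are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(user, home, threshold_km):
    """True when `user` is within threshold_km of `home`; both are (lat, lon) tuples."""
    return haversine_km(user[0], user[1], home[0], home[1]) <= threshold_km


# One degree of longitude along the equator is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

The same `within_threshold` check could decide when to fire the device trigger as the user approaches the home location.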
Messaging
Ad-hoc Chat
Session Based Chat
Voice Input for texting
Presence information of contacts
RoadMap: Legacy message integration
Telephony
Voice call to mobile
Voice call to PSTN
Video call to other @imAll user
Share images during voice call to other
Device agnostic
Compatible with iOS, Windows
Can run as native app on iPad
Can run as browser client on Windows
RoadMap: native app for Android, Windows Phone, BlackBerry 10
Roadmap
features of Unified Communications ( UC)
To upgrade the application and provide enhanced and enriched service support, I propose the following roadmap.
From plain vanilla voice and video calling (supported by every other OTT application), our application should progress towards legacy telecom support, which includes PSTN, GSM, ISDN etc. This requires the backbone of a telecom network and a good setup for media codec conversion to suit the various legacy media codecs.
Road Map from Traditional to New age services
Voice and video calling
Legacy services support like MMS and SMS
Integration with 3rd party Vendors
Give new enriched services like Multilingual support , file transfer , screen-sharing etc
Provide the facility to integrate web plugins for web calling
To keep the interest of customers, it is essential that the application interoperates with other popular OTT services like Skype and Gtalk. For example, a caller should be able to make a call from Skype / Gtalk to our application. Multilingual capabilities and support for a larger protocol spectrum will just act like icing on the cake.
How does it benefit the Operator?
Saves on development cost and time
Device Agnostic OTT Applications
Simplified Service deployment
Saves licensing cost per client
Reuses existing Messaging and Address Book service logic.
Open New Revenue Streams for operator
No separate SIP stack required for the client
Faster Time to Market
Update: At the time of writing this post I did not anticipate the wave of change that would bring focus on subjects like “net neutrality”, “Save the internet” and “free internet”. However, most telco providers have by now either joined the bandwagon by providing SIP trunk endpoints for cloud telephony providers (e.g. Twilio, Google Calls) or have made their own IP call applications for B2B customers.
With the fast pace of telecom evolution, both on the access network front (i.e. GSM, UMTS, 3G, 4G, LTE, VoLTE) and on the core network side (i.e. application servers, registrars, proxies, gateways, media servers etc.), a CSP (communication service provider) is trying hard to keep up with user expectations. The user expects a plethora of services, reduced cost and high-speed bandwidth. As if this were not enough, a CSP also faces competition from OTT (Over The Top) players who provide communication and messaging for FREE.
You can read about how OTT players are disrupting the revenue streams of traditional telecom operators, and how telcos can develop their own OTT app, integrated with their backend systems, to answer that challenge, here – OTT ( Over the Top ) Communication applications
The following points outline the major business challenges faced by telecom operators today .
Technology Evolution Challenges
Increased data speeds and an ever-increasing hunger for data overwhelm the existing network infrastructure.
Ensure uniform service experience across the network technologies to check the customer churn.
Access / Radio Technology independent delivery of services.
Enhance reuse of existing investments.
Multiple Service Platform Challenges
A typical network consists of multiple service platforms, increasing network complexity and integration challenges manyfold.
Heterogeneous multiple SDP solutions are typically deployed to cater to multiple types of networks / standards / variants
Service islands make the introduction of seamless services a challenging task for the CSP
Transport Upgrade and Convergence of Wireless Wireline
Retain investments in copper wire systems while migrating towards next generation Fiber Optic systems.
Severe competition among wire-line and wireless operators to provide latest services to retain subscriber base.
Fixed Mobile Convergence leading to a diminishing gap among the revenue shares of various operators in the space, and leading to losses for wire-line only players.