WebRTC CPaaS ( Communication Platform as a Service )

A CPasS ( communication platform as a service ) is cloud based communication platform alo B2B cloud communications platform that provides real time communication capabilities. This should be easily integrable with any given external environment or application of the customer, without him worrying about building backend infrastructure or interfaces .

Traditionally , with IP protected protocols , licensed codecs maintaining a signalling protocol stack , network interfaces building communication platform was a costly affair. Cisco , Facetime , Skype were the only OTT ( over the top) players taking away from telco’s call revenue .

However with the advent of standardize , open source protocol and codecs plenty of CPaaS providers have crowded the market making more supply than there is demand. A customer wanting to quickly integrate real time communications on his platform has many options to choose from. This article provides an insight to how CPaaS solution are architectured and programmed

Sample CPass Architecture build on open source technologies

I have written an article before on Steps for building and deploying WebRTC solution , which includes standalone , cloud hosted and TURN based NAT handler systems .

A typical CPaaS solution provides

  • Call server + Media Server that can be interacted with via UA
  • Comm clients like sipphones , webrtc client , SDK ( software development kits ) or libraries for desktop , embedded and/or mobile platforms .
  • APIs that can trigger automated calls and perform preprogrammed routing.
  • Rich documentation and samples to build various apps such as call centre solutions , interactive auto-attendant using IVR , DTMF , conference solutions etc .
  • Some CPaaS providers also add features like transcribing ,transcoding , recording , playback etc to provide edge over other CPaaS providers

Datacentre vs Cloud server

-tbd

Advantages of using a CPaaS vs building your own RTC platform

Tech insights and experiences

companies who have been catering to telco and communication domain make robust solutions based on industry best practices which beats novice solution build in a fortnight anyday

keeping up with emerging trends

Market trends like new codecs , rich communication services , multi tenancy, contextual communication , NLP , other ML based enhancements are provided by CPaaS company

Auto Scaling , High Availability

A firm specializing in CPaaS solution has already thought of clustering and autoscaling to meet peak traffic requirements and backup/replication on standby servers to activate incase of failure

CAPEX and OPEX

using a Cpaas saves on human resources, infrastructure, and time to market. It saves tremendously on underlying IT infrastructure and many a times provides flexible pricing models

In a nutshell I have come across so many small size startups trying to build CPaaS solution from scratch but only realising it after weeks of trying to build a MVP that they are stuck with firewall, NAT, media quality or interoperability issues . Since there are so many solution already out in the market it is best to instead use them as underlying layer and build applications services using it such as callcentre or CRM services etc .

VoIP system DevOps, operations and Infrastructure management, Automation

This article is focussed around various toold and technologies that are required to operate and mantain a growing or large scale VoIP Platform .

Read about VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform which includes :

  • Scalable and Flexible SIP platform building
  • Cluster SIP telephony Server for High Availability
  • Failure Recovery
  • Multi-tier cluster architecture
  • Role Abstraction / Micro-Service based architecture
  • Distributed Event management and Event-Driven architecture
  • Containerization
  • Autoscaling Cloud Servers
  • Open standards and Data Privacy
  • Flexibility for inter-working – NextGen911 , IMS , PSTN
  • security and Operational Efficiencies

Read more about SIP VoIP system Architecture which includes

  • Infrastructure Requirements
  • Integral Components of a VOIP SIP-based architecture
  • RTP ( Real-Time Transport Protocol ) / RtCP
  • SIP gateways, registrar, proxy, redirect, application
  • Developing SIP-based applications – basic call routing, media management
  • SIP platform Development – NAt and DNS , Cross-platform and integration to External Telecommunication provider landscape , Databases

The contenst of this article are

  • PCAP Collection
  • CICD over Jenkins
  • Configuration management using chef cookbooks
  • virtualization and containerization using Docker
  • Infrastructure management using terraform / Kubernetes
  • Logs Analysis and Alarming
Overview of VoIP platform DevOPS tools

PCAP Collection

Data to be captured from Pcap,

  1. DTMF – Both in-band and out of band DTMF for every call, along with the time stamp.
  2. Codec negotiations –  Extracting codecs from PCAP lets us 
    1. Validate later whether there were codec changes without prior SIP message,
    2. If the call has been hung up with 488 error code then it was due to which  codec 
  3. SIP errors – track deviations from standard SIP messaging. 
    1. Identify known erroneous SIP messaging scenarios such as for MITM or replay attacks
  4. RTCP Media stats – extract Jitter, Loss, RTT with RTCP reports for both the incoming and outgoing stream.
  5. Identify Media or ACK Timeouts 
    1. Check whether a party has not sent any media packet for > 60 s (media time out threshold duration)
    2. When a call is hung up due to ACK time out.
  6. Audio stream – After GDPR, take explicit permission from users before storing audio streams

Continuous Integration and Delivery Automation using Jenkins

CICD provides continous delivery hub , distribute work across multiple machines, helping drive builds, tests and deployments across multiple platforms .

Jenkins jobs is a self-contained Java-based program extensible using plugins.

Jenkins pieline– orchestrates and automates building project in Jenkins

Configuration management using chef cookbooks

Alternatives like puppet and Ansible, which are also a cross-platform configuration management platform

Compute virtualization and containerization using Docker

Docker containers can be used instead of virtual machines such as VirtualBox , to isolates applications and be OS and platform independent
Makes distributed development possible and automates the deployment possible

  • stop Stop one or more running containers
  • top Display the running processes of a container
> docker top 4417600169e8
UID PID PPID C STIME TTY TIME CMD
root 9913 9888 0 08:50 ? 00:00:00 bash /point.sh
root 10083 9913 0 08:50 ? 00:00:01 /usr/sbin/worker
root 10092 10083 0 08:50 ? 00:00:02 /micro-service
  • unpause Unpause all processes within one or more containers
  • update Update configuration of one or more containers
  • wait Block until one or more containers stop, then print their exit codes

see all iamges

> docker images
REPOSITORY                  TAG                 IMAGE ID            CREATED             SIZE
sipcapture/homer-cron       latest              fb2243f90cde        3 hours ago         476MB
sipcapture/homer-kamailio   latest              f159d46a22f3        3 hours ago         338MB
sipcapture/heplify          latest              9f5280306809        21 hours ago        9.61MB
<none>                      <none>              edaa5c708b3a        

See all stats

>  docker stats
CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
f42c71741107        homer-cron          0.00%               52KiB / 994.6MiB      0.01%               2.3kB / 0B          602MB / 0B          0
0111765091ae        mysql               0.04%               452.2MiB / 994.6MiB   45.46%              1.35kB / 0B         2.06GB / 49.2kB     22

Run command from within a docker instnace

docker exec -it  bash

First see all processes

docker ps

select a process and enter its bash

docker exec -it 0472a5127fff bash

to edit or update a file inside docker either install vim everytime u login in resh docker conainer like

apt-get update
apt-get install vim

or add this to dockerfile

RUN [“apt-get”, “update”]
RUN [“apt-get”, “install”, “-y”, “vim”]

see if ngrep is install , if not then install and run ngrep to get sip logs isnode that docker container

apt update
apt install ngrep
ngrep -p "14795778704" -W byline -d any port 5060

docker volume – Volumes are used for persisting data generated by and used by Docker containers.
docker volumes have advantages over blind mounts such as easier to backup or migrate , managed by docker APIs, can be safely shared among multiple containers etc

docker stack – Lets to manager a cluster of docker containers thorugh docker swarm can be defined via docker-compose.yml file

docker service

  • create Create a new service
  • inspect Display detailed information on one or more services
  • logs Fetch the logs of a service or task
  • ls List services
  • ps List the tasks of one or more services
  • rm Remove one or more services
  • rollback Revert changes to a service’s configuration
  • scale Scale one or multiple replicated services
  • update Update a service

Run docker containers

sample run command

docker run -it -d --name opensips -e ENV=dev imagename:2.2

-it flags attaches to an interactive tty in the container.
-e gives envrionment variables
-d runs it in background and prints container id

Remove docker entities

To remove all stopped containers, all dangling images, and all unused networks:

docker system prune -a

To remove all unused volumes

docker system prune --volumes

To remove all stopped containers

docker container prune
sometimes docker images keep piling with stopped congainer such as 

REPOSITORY                                                             TAG                 IMAGE ID            CREATED             SIZE                                                                              d1dcfe2438ae        15 minutes ago      753MB                                                                           2d353828889b        16 hours ago        910MB                                                          ...
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                        PORTS               NAMES

0dd6698a7517        2d353828889b        "/entrypoint.sh"         13 minutes ago      Exited (137) 13 minutes ago                       hardcore_wozniak

to remove such images and their conainer , first stop and remove confainers

docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)

then remove all dangling images

docker rmi  $(docker images -aq --filter dangling=true)

Infrastructure management using terraform

Terraform is used for building, changing and versioning infrastructure.
Infra as Code – can run single application to datacentres via configuration files which create execution plan.
It can manage low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc.
Resource Graph – builds a graph of all your resources

tfenv can be used to manage terraform versions

> brew unlink terraform
tfenv install 0.11.14
tfenv list 

Terraform configuration language

used for declaring resources and descriptions of infrastructure
.tf or .tf.json file extension
Group of resources can be gathered into a module
Terraform configuration consists of a root module, where evaluation begins, along with a tree of child modules created when one module calls another.

Exmaple : launch a single AWS EC2 instance , fle server1.tf

provider "aws" {
  profile    = "default"
  region     = "us-east-1"
}

resource "aws_instance" "server1" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}

note : AMI IDs are region specific.
profile attribute here refers to the AWS Config File in ~/.aws/credentials

Terraform command line interface (CLI)

engine for evaluating and applying Terraform configurations.
uses plugins called providers that each define and manage a set of resource types

Command Usage: terraform [-version] [-help] [args]

  • apply Builds or changes infrastructure
  • console Interactive console for Terraform interpolations
  • destroy Destroy Terraform-managed infrastructure
  • env Workspace management
  • fmt Rewrites config files to canonical format
  • get Download and install modules for the configuration
  • graph Create a visual graph of Terraform resources
  • import Import existing infrastructure into Terraform
  • init Initialize a Terraform working directory
  • output Read an output from a state file
  • plan Generate and show an execution plan
  • providers Prints a tree of the providers used in the configuration
  • refresh Update local state file against real resources
  • show Inspect Terraform state or plan
  • taint Manually mark a resource for recreation
  • untaint Manually unmark a resource as tainted
  • validate Validates the Terraform files
  • version Prints the Terraform version
  • workspace Workspace management
  • 0.12upgrade Rewrites pre-0.12 module source code for v0.12
  • debug Debug output management (experimental)
  • force-unlock Manually unlock the terraform state
  • push Obsolete command for Terraform Enterprise legacy (v1)
  • state Advanced state management

terraform init
initialize a working directory containing Terraform configuration files.

terraform validate
checks that verify whether a configuration is internally-consistent, regardless of any provided variables or existing state.

Kubernetes

container orchestration platform , automating deployment, scaling, and management of containerized applications. Can deploy to cluster of computers, automating the distribution and scheduling as well

Service discovery and load balancing – gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.

Automatic bin packing – Automatically places containers based on their resource requirements and other constraints, while not sacrificing availability. Mix critical and best-effort workloads in order to drive up utilization and save even more resources.

Storage orchestration – Automatically mount the storage system of your choice, whether from local storage, a public cloud provider such as GCP or AWS, or a network storage system such as NFS, iSCSI, Gluster, Ceph, Cinder, or Flocker.

Self-healing – Restarts containers that fail, replaces and reschedules containers when nodes die, kills containers that don’t respond to your user-defined health check, and doesn’t advertise them to clients until they are ready to serve.

Automated rollouts and rollbacks – progressively rolls out changes to your application or its configuration, while monitoring application health to ensure it doesn’t kill all your instances at the same time.

Secret and configuration management – Deploy and update secrets and application configuration without rebuilding your image and without exposing secrets in your stack configuration.

Batch execution– manage batch and CI workloads, replacing containers that fail, if desired.

Horizontal scaling – Scale application up and down with a simple command, with a UI, or automatically based on CPU usage.

create minikube cluster and deploy pods

prerequisities : docker , curl , redis , others

install minikube

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
chmod +x minikube
install minikube /usr/local/bin

Install kubectl

snap install kubectl --classic
ln -s /snap/bin/kubectl /usr/local/bin

Setup Minikube

minikube start --vm-driver=none
minikube addons enable registry-creds
kubectl -n kube-system create secret generic registry-creds-ecr
kubectl -n kube-system create secret generic registry-creds-gcr
kubectl -n kube-system create secret generic registry-creds-dpr
minikube addons configure registry-creds
Starting Kubernetes…minikube version: v1.3.0
 commit: 43969594266d77b555a207b0f3e9b3fa1dc92b1f
 minikube v1.3.0 on Ubuntu 18.04
 Running on localhost (CPUs=2, Memory=2461MB, Disk=47990MB) …
 OS release is Ubuntu 18.04.2 LTS
 Preparing Kubernetes v1.15.0 on Docker 18.09.5 …
 kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
 Pulling images …
 Launching Kubernetes …
 Done! kubectl is now configured to use "minikube"
 dashboard was successfully enabled
 Kubernetes Started 

Basic Commands

  • start Starts a local kubernetes cluster
  • status Gets the status of a local kubernetes cluster
  • stop Stops a running local kubernetes cluster
  • delete Deletes a local kubernetes cluster
  • dashboard Access the kubernetes dashboard running within the minikube cluster

Images Commands:

  • docker-env Sets up docker env variables; similar to ‘$(docker-machine env)’
  • cache Add or delete an image from the local cache.

Configuration and Management Commands:

  • addons Modify minikube’s kubernetes addons
  • config Modify minikube config
  • profile Profile gets or sets the current minikube profile
  • update-context Verify the IP address of the running cluster in kubeconfig.

Networking and Connectivity Commands:

  • service Gets the kubernetes URL(s) for the specified service in your local cluster
  • tunnel tunnel makes services of type LoadBalancer accessible on localhost

Advanced Commands:

  • mount Mounts the specified directory into minikube
  • ssh Log into or run a command on a machine with SSH; similar to ‘docker-machine ssh’
  • kubectl Run kubectl

Troubleshooting Commands:

  • ssh-key Retrieve the ssh identity key path of the specified cluster
  • ip Retrieves the IP address of the running cluster
  • logs Gets the logs of the running instance, used for debugging minikube, not user code.
  • update-check Print current and latest version number

kubectl

controls the Kubernetes cluster manager.

Basic Commands (Beginner):

  • create Create a resource from a file or from stdin.
  • expose Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
  • run Run a particular image on the cluster
  • set Set specific features on objects
  • explain Documentation of resources
  • get Display one or many resources
  • edit Edit a resource on the server
  • delete Delete resources by filenames, stdin, resources and names, or by resources and label selector

Deploy Commands:

  • rollout Manage the rollout of a resource
  • scale Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
  • autoscale Auto-scale a Deployment, ReplicaSet, or ReplicationController

Cluster Management Commands:

  • certificate Modify certificate resources.
  • cluster-info Display cluster info
  • top Display Resource (CPU/Memory/Storage) usage.
  • cordon Mark node as unschedulable
  • uncordon Mark node as schedulable
  • drain Drain node in preparation for maintenance
  • taint Update the taints on one or more nodes

Troubleshooting and Debugging Commands:

  • describe Show details of a specific resource or group of resources
  • logs Print the logs for a container in a pod
  • attach Attach to a running container
  • exec Execute a command in a container
  • port-forward Forward one or more local ports to a pod
  • proxy Run a proxy to the Kubernetes API server
  • cp Copy files and directories to and from containers.
  • auth Inspect authorization

Advanced Commands:

  • diff Diff live version against would-be applied version
  • apply Apply a configuration to a resource by filename or stdin
  • patch Update field(s) of a resource using strategic merge patch
  • replace Replace a resource by filename or stdin
  • wait Experimental: Wait for a specific condition on one or many resources.
  • convert Convert config files between different API versions
  • kustomize Build a kustomization target from a directory or a remote url.

Settings Commands:

  • label Update the labels on a resource
  • annotate Update the annotations on a resource
  • completion Output shell completion code for the specified shell (bash or zsh)

Other Commands:

  • api-resources Print the supported API resources on the server
  • api-versions Print the supported API versions on the server, in the form of “group/version”
  • config Modify kubeconfig files
  • plugin Provides utilities for interacting with plugins.
  • version Print the client and server version information

DevOps monitoring tools nagios

Manage Docker configs

  • create Create a config from a file or STDIN
  • inspect Display detailed information on one or more configs
  • ls List configs
  • rm Remove one or more configs

Manage containers

  • attach Attach local standard input, output, and error streams to a running container
  • commit Create a new image from a container’s changes
  • cp Copy files/folders between a container and the local filesystem
  • create Create a new container
  • diff Inspect changes to files or directories on a container’s filesystem
  • exec Run a command in a running container
  • export Export a container’s filesystem as a tar archive
  • inspect Display detailed information on one or more containers
  • kill Kill one or more running containers
  • logs Fetch the logs of a container
  • ls List containers
  • pause Pause all processes within one or more containers
  • port List port mappings or a specific mapping for the container
  • prune Remove all stopped containers
  • rename Rename a container
  • restart Restart one or more containers
  • rm Remove one or more containers
  • run Run a command in a new container
  • start Start one or more stopped containers
  • stats Display a live stream of container(s) resource usage statistics
  • stop Stop one or more running containers
  • top Display the running processes of a container
  • unpause Unpause all processes within one or more containers
  • update Update configuration of one or more containers
  • wait Block until one or more containers stop, then print their exit codes

Alternatives, Senu multi-cloud monitoring or Raygun

Monitoring, debugging, logs analysis and alarms

Aggregate logs into logstash and provide search and filtering via Elastic Search and Kibana. Can also trigger alerts or notifications on specific keyword searches in logs such as WARNING or ERRRO or call_failed.

Some common alert scenarios include :

SBC and proxy gateways failures – check states of VM instance

DNS caching alerts – Domain Name System (DNS) caching, a Dynamic Host Configuration Protocol (DHCP) server, router advertisement and network boot alerts from service such as dnsmasq

Disk usage alert – setup alerts for 80% usage and trigger an alarm to either manually prune or create automatic timely archive backups.
check the percentage of DISK USAGE

df -h

Mostly it is either the logs file or pcap recorder which need to be archieved in external storage.

Use logrotate – it can rotates, compresses, and mails system logs

config file for logrorate – logrotate -vf /etc/logrotate.conf

/var/log/messages {
    rotate 5
    weekly
    postrotate
        /usr/bin/killall -HUP syslogd
    endscript
}

Elevated Call failure SIP 503 or Call timeout SIP 408 – high frequency of failed calls indicate an internal issue and must be followed up by smoke testing the entire system to identify any probable issue such as undetected frequent crashes of any individual component or any blacklisting by a destination endpoint etc

sudo tail -f sip.log | grep 503

or

sudo tail -f sip.log | grep WARNING

cron service or processed alerts

 ps axf
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        I<     0:00  \_ [rcu_gp]
    4 ?        I<     0:00  \_ [rcu_par_gp]
    5 ?        I      0:00  \_ [kworker/0:0-eve]
    6 ?        I<     0:00  \_ [kworker/0:0H-kb]
    7 ?        I      0:00  \_ [kworker/0:1-eve]
    8 ?        I      0:00  \_ [kworker/u4:0-nv]
    9 ?        I<     0:00  \_ [mm_percpu_wq]
   10 ?        S      0:00  \_ [ksoftirqd/0]
   11 ?        I      0:00  \_ [rcu_sched]
   12 ?        S      0:00  \_ [migration/0]
   13 ?        S      0:00  \_ [cpuhp/0]
   14 ?        S      0:00  \_ [cpuhp/1]
   15 ?        S      0:00  \_ [migration/1]
   16 ?        S      0:00  \_ [ksoftirqd/1]
   17 ?        I      0:00  \_ [kworker/1:0-eve]
   18 ?        I<     0:00  \_ [kworker/1:0H-kb]

or checks cron status

service cron status
● cron.service - Regular background program processing daemon
   Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2016-06-26 03:00:37 UTC; 1min 17s ago
     Docs: man:cron(8)
 Main PID: 845 (cron)
    Tasks: 1 (limit: 4383)
   CGroup: /system.slice/cron.service
           └─845 /usr/sbin/cron -f

Jun 26 03:00:37 ip-172-31-45-21 systemd[1]: Started Regular background program processing daemon.
Jun 26 03:00:37 ip-172-31-45-21 cron[845]: (CRON) INFO (pidfile fd = 3)
Jun 26 03:00:37 ip-172-31-45-21 cron[845]: (CRON) INFO (Running @reboot jobs)

restart or start cron service if required

DB connections / connection pool process – keep listening for any alerts on DB connections failure or even warnings as this can be due to too many read operations such as in DDOS and can escalate very quickly

netstat -nltp  | grep db 
tcp        0      0 0.0.0.0:5433            0.0.0.0:*               LISTEN      5792/db-server * 

Routine deepstatus checks is a good practice too

Port checks – Regular checks if servers are lsietning on ports such as 5060 for SIP

netstat -nltp | grep 5060
tcp        0      0 x.x.x.x:5060       0.0.0.0:*               LISTEN      8970/kamailio  

cron zombie process checks – zombie process or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the “Terminated state”. List xombie process and kill them with pid to free up .

kill -9 <PID1>

bulk calls checks – consult ongoing call cmd commands for application server such as
For Freeswitch use

fs_ctl> show channels 

For kamailio use kamcmd

kamcmd dlg.list

For asterisk watch or show cmmand

watch -n 1 "sudo asterisk -vvvvvrx 'core show channels' | grep call"

Incase of DDOS or other macious attacker IP identification block the IP

iptables -I INPUT -s y.y.y.y -j DROP   

Can also use fail2ban

>apt-get update && apt-get install fail2ban

Additionally check how many dispatchers are responding on outbound gateway

opensipsctl dispatcher dump

Process control supervisor or pm2 checks – supervisor is a Linux Process Control System that allows its users to monitor and control a number of processes

ps axf | grep supervisor

for pm2

> pm2 status
[PM2] Spawning PM2 daemon with pm2_home=/Users/altanai/.pm2
[PM2] PM2 Successfully daemonized
┌─────┬───────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name │ namespace │ version │ mode │ pid │ uptime │ ↺ │ status │ cpu │ mem │ user │ watching │

htop to check memeory and CPU

Health and load on the reverse proxy, load balancer as Nginx – perform a direct curl request to host to check if Nginx responds with a non 4xx / 5xx response or not

curl -v <public-fqdn-of-server> 

Incase of error response , restart

/etc/init.d/nginx start

Incase of updates restart ngnix config

nginx -s reload

For HTTP/SSL proxy daemon such as tiny proxy which are used for fast resposne , set the MinSpareServers, MaxSpareServers , MaxClients , MaxRequestsPerChild etc appropriately

VPN checks – restart fireealls or IPsec incase of ssues

/etc/init.d/ipsec restart

Additionally also check ssh service

ps axf | grep sshd

restart sshd if required

SSL cert expiry checks – to keep the operations running securely and prevent and abrupt termination it is a good practise to run regular certificate expiry checks for SSL certs especially on secure HTTP endpoint like APIs , web server and also on SIP applications servers for TLS. If any expiry is due in < 10 days to trigger an alert to renew the certs

Health of Task scheduling services such as RabbitMQ, Celery Distributed Task Queue – remote debugging of these can be set up via pdb which supports setting (conditional) breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, and evaluation of arbitrary Python code in the context of any stack frame.

import pdb; pdb.set_trace()
python3 -m pdb myscript.py

It can also be set up via using the client libraries provided by these Queue services themselves

cluster status – setup an efficient health check service which monitors the cluster status for High Availability. Learn more about ensuring HA –
JSON object depicting the status of cluster shards

{
  "cluster_name" : "ABC-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 14,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 200,
  "active_shards" : 300,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0
}

Status of Crticial Server Server

fscli > show status
UP 0 years, 0 days, 0 hours, 58 minutes, 33 seconds, 15 milliseconds, 58 microseconds
FreeSWITCH (Version 1.6.20 git 987c9b9 2018-01-23 21:49:09Z 64bit) is ready
3 session(s) since startup
0 session(s) - peak 1, last 5min 1
0 session(s) per Sec out of max 30, peak 1, last 5min 1
1000 session(s) max
min idle cpu 0.00/80.83
Current Stack Size/Max 240K/8192K

Programming or Syntax error in the production environment – mostly arising due to incomplete QA/testing before pushing new changes to production. Should trigger alerts for dev teams and meet with hot patches.

Many programing application development frameworks have inbuild libs for debugging , exceotion handling and reporting such as

  • backend service in Django
  • API service in Go

Distributed memory caching – redis , memcahe

Redis info

>redis-cli info
# Server
redis_version:6.0.4
redis_git_dirty:0
redis_mode:standalone
os:Darwin 18.7.0 x86_64
arch_bits:64
multiplexing_api:kqueue
atomicvar_api:atomic-builtin
gcc_version:4.2.1
tcp_port:6379

# Clients
connected_clients:1
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:1065648
used_memory_human:1.02M
number_of_cached_scripts:0
maxmemory:0
allocator_frag_bytes:1123680
allocator_rss_ratio:1.00
rss_overhead_bytes:37888
mem_fragmentation_ratio:2.16
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:0
module_fork_last_cow_size:0

# Stats
total_connections_received:1
total_commands_processed:0
..

# Replication
role:master
connected_slaves:0
..

# CPU
used_cpu_sys:0.011198
used_cpu_sys_children:0.000000

# Modules

# Cluster
cluster_enabled:0

SMS service using smsc on Kannel From the kannel servers, you should see the PANIC error (most of the time Assertion error crashing kannel):

grep PANIC /var/log/kannel/bearerbox.log

IF you are going to restart , Flush redis cache

sudo redis-cli FLUSHALL
sudo redis-cli SAVE

restart kannel

sudo /etc/init.d/kannel restart

If the carriers are throttling the SMS request , verify “ERROR” responses using

sudo grep -i "throttling" bearerbox.log

Alternatives include AWS logs services ,

  • Scalyr logging
  • Sensu monitoring for multi-cloud monitoring using event pipeline

Ref :