Certificates , compliances and Security in VoIP

This article describes various Certificates and compliances, Bill and Acts on data privacy, Security and prevention of Robocalls as adopted by countries around the world pertaining to Interconnected VoIP providers, telecommunications services, wireless telephone companies etc

Compliance certificates by Industry types

HIPAA (Health Insurance Portability and Accountability Act)

Deals with privacy and security of personal medical records and electronic health care transaction

Applicability  : If voip company handles medical information

Includes : 

  • Not allowed Voice mail transcription
  • Should have End-to-End Encryption
  • Restrict  using unsecured WiFi networks to prevent Snooping
  • User security , strong password rules  and mandatory monthly change
  • Secure Firmware on VoIP phones
  • Maintaining Call and Access Logs

SOX( Sarbanes Oxley Act of 2002)

Also known as SOX, SarbOX or Public Company Accounting Reform and Investor Protection Act

Applicability : if managing the communications operations of a regulated, publicly traded company 

Includes : 

  • Retain records which include financial and other sensitive data
  • ways employees are provided or denied access to records or data based on their roles and responsibilities
  • do information audit by a trusted third party. 
  • Retention and deletion of files such as audio files like voicemails, text messages, video clips, declared paper records, storage, and logs of communications activities
  • Physical and digital security controls around cloud-based VoIP applications and the networks

Privacy Related Compliance certificates

COPPA (Children’s Online Privacy Protection Act ) of 1998 

prohibits deceptive marketing to children under the age of 13, or collecting personal information without disclosure to their parents. 

any information is to be passed on to a third party, must be easy for the child’s guardian to review and/or protect

2011 amendment  requires that the data collected was erased after a period of time,

2014 FTC issued guidelines that apps and app stores require “verifiable parental consent.”

CPNI (Customer Proprietary Network Information) 2007

CPNI (Customer Proprietary Network Information) in united states is the information that communication providers  acquire about their subscribers. This Individually identifiable information that is created by a customer’s relationship with a provider, such as data about the frequency, duration, and timing of calls, the information on a customer’s bill, and call identifying information. This processing information is governed strictly by FCC and certification should be renewed on an annual basis

Provider can pass along that information to marketers to sell other services, as long as the customer is notified

In 2007, the FCC explicitly extended the application of the Commission’s CPNI rules of the Telecommunications Act of 1996 to providers of interconnected VoIP service.


Communications Assistance for Law Enforcement Act (CALEA) conduct electronic surveillance by imposing specific obligations on “telecommunications carriers” for assisting law enforcement, including delivering call interception and call identification functionality to the government with a minimum of interference to customer service and privacy.

Read more about CALEA and its roles in VoIP here Regulatory and Legal Considerations with WebRTC development

GDPR (General Data Protection Regulation)  in European Union 2018

Supersedes the 1995 Data Protection Directive

Establishes requirements of organizations that process data, defines the rights of individuals to manage their data, and outlines penalties for those who violate these rights.

No personal data may be processed unless this processing is done under one of six lawful bases specified by the regulation (consent, contract, public task, vital interest, legitimate interest or legal requirement). When the processing is based on consent the data subject has the right to revoke it at any time.

Controllers must notify Supervising Authorities (SA)s of a personal data breach within 72 hours of learning of the breach.

California Consumer Privacy Act (CCPA) 2019

consumer rights relating to the access to, deletion of, and sharing of personal information that is collected by businesses. 

Allows consumers to know whether their personal data is sold or disclosed , to whom .

Allows opt-out right for sales of personal information

Right to deletion – to request a business to delete any personal information about a consumer collected from that consumer

Personal Data Protection Bill (PDP) – India 2018

This bill introduces various private and sensitive protection frameworks  like restriction on retention of personal data, Right to correction and erasure (such as right to be forgotten) , Prohibition and transparency of processing of personal data. It also classifies data fiduciaries  including certain social media intermediaries. 

The Bill amends the Information Technology Act, 2000 to delete the provisions related to compensation payable by companies for failure to protect personal data.

Other data privacy acts similar to GDPR 

  • South Korea’s Personal Information Protection Act  2011
  • Brazil’s Lei Geral de Proteçao de Dados (LGPD)  2020
  • Privacy Amendment (Notifiable Data Breaches) to Australia’s Privacy Act 2018
  • Japan’s Act on Protection of Personal Information 2017
  • Thailand Personal Data Protection Act (PDPA) 2020

Features offered by VOIP companies for Data privacy 

  • Access Control & Logging
  • Auto Data Redaction / Account Deletion policy 
  • SIEM (Security information and event management) alerts 
  • Information security , Encrypted Storage For Recordings & Transcripts
  • Disclosing all third party services that are involved in data processing too
  • Role Based Access Control and 2 Factor Authentication
  • Data Security Audits and appointing  data protection officer to oversee GDPR compliance

Against Robocalls and SPIT ( SPAM over Internet Telephony)

 2009 Truth in Caller ID Act 

Telephone Consumer Protection Act of 1991

Implementation of Do not call registry against use of robocalls, automatic dialers, and other methods of communication

Do-Not-Call Implementation Act of 2003

if a business has an established relationship with a customer, it can continue to call them for up to 18 months. If a consumer calls the company, say, to ask for information about the product or service, the company has three months to get back to him.

if the customer asks to not receive calls, the company must stop calling, or be subject to fines.

Exemptions – Calls from a not-for-profit B organisation , informational messages as flight cancellations , Calls from sales and debt collectors etc

Personal Data Privacy and Security Act 2009

Implemented to curb  identity theft and computer hacking. Sensitive personal identifiable information includes : victim’s name, social security number, home address, fingerprint/biometrics data, date of birth, and bank account numbers.

Any company that is breached must notify the affected individuals by mail, telephone, or email, and the message must include information on the company and how to get in touch with credit reporting agencies

If the breach involves government or national security , company must also contact the Secret Service within fourteen days 

TRACED Act (Telephone Robocall Abuse Criminal Enforcement and Deterrence) 2019

Canadian Radio-television and Telecommunications Commission (CRTC) 2018 -32

A solution mechanism has already been standardised and active in adoption called STIR / SHAKEN ( Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs) described in another article here.

Emergency services 

FCC E911 E911 / VoIP E911 rules

Unlike traditional telephone connections, which are tied to a physical location, VOIP’s packet switched technology allows a particular number to be anywhere making it more difficult for it to reach localised services like emergency numbers of Public Safety Answering Points (PSAPs) . Thus FCC regulations as well as the New and Emerging Technologies 911 Improvement Act of 2008 (NET 911 Act), interconnected VoIP providers are required to provide 911 and E911 service. 

Ref : 


To understand the need for implementing an identification verification technique in Internet protocol based network to network communication system , we need to evaluate the existing problem plaguing the VoIP setup .

What is Call ID spoofing ? 

Vulnerability of existing interconnection phone system which is used by robo-callers to mask their identity or to make it appear the call is from a legitimate source, usually originates from voice-over-IP (VOIP) systems.

In this context understand the Caller Line identification CLI/ NCLI techniques used by VoIP and OTT( over the top) providers today.

CLI (Caller Line Identification)

If call goes out on a CLI route ( White Route ) the received party will likely see your callerID information

  • Lawful – Termination is legal on the remote end ie abiding country’s telco infrastructure and stable
  • Expensive – usually with direct or via leased line (TDM) interconnections with the tier-1 carriers.

Non-CLI (Non-Caller Line Identification)

The Caller ID is not visible at the call
If call goes out on a Non-CLI route (Grey Route) goes out on a non-CLI routes they will see either a blocked call or some generic number.

  • Unlawful – questionable legality or maybe violating some providers AUP(Acceptable Use Policy ) on the remote end.
  • Cheaper – low quality , usually via VoIP-GSM gateways

Example include robocalls , tele-marketting / spam etc which are unwilling to share their Caller Id for call receiver, to not be blocked or cancelled.

To overcome the problem of non-verifiable spam , robocalls a suite of protocols and procedures are proposed that can combat caller ID spoofing on VOIP and connected public telephone networks.


Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs

Used by robocallers to mask their identity or to make it appear the call is from a legitimate source
usually orignates from voice-over-IP (VOIP) systems


Standards developed by the Internet Engineering Task Force (IETF) 

For telecommunication service providers implement  certificate management system to create and manage the public and private keys, digital certificates used to sign and verify Caller ID details. 

Adds information to the SIP headers that allow the endpoints along the system to positively identify the origin of the data , such as JSON web tokens encrypted with the provider’s private key, encoded using Base64,

There are three levels of verification, or “attestation”

  • A : Full Attestation
    indicates that the provider recognizes the entire phone number as being registered with the originating subscriber.
  • B : Partial Attestation
    call originated with a known customer but the entire number cannot be verified,
  • C : Gateway Attestation
    call can only be verified as coming from a known gateway

How can the Public Key Infrastructure be used ? 

In an interconnection network , each telephone service provider will obtain its digital certificate from a certificate authority (CA)  that is trusted by other telephone service providers. Calling party signs the SIP Header  caller ID as legitimate . The called party verifies that the calling number is authentic


Originating service provider’s encrypted SIP Identity Header includes the following data:

  1. Attestation level
  2. Date and Time
  3. Calling and Called Numbers
  4. Orig ID for analytics and/or traceback purposes among others
  5. Location of certificate repository
  6. Signature
  7. Encryption algorithm

FCC has also assigned the role of a Secure Telephone Identity Policy Administrator (STI-PA) which oversees that CAs do not provide certificate to spoofing robocallers and enforce the framework for STIR /SHAKEN .

Sample Identity header in SIP requst

INVITE sip:bob@biloxi.example.org SIP/2.0
Via: SIP/2.0/TLS pc33.atlanta.example.com;branch=z9hG4bKnashds8
To: Bob
From: Alice ;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Max-Forwards: 70
Date: Thu, 21 Feb 2002 13:02:03 GMT
Identity-Info: https://atlanta.example.com/atlanta.cer;alg=rsa-sha1
Content-Type: application/sdp
Content-Length: 147

o=UserA 2890844526 2890844526 IN IP4 pc33.atlanta.example.com
s=Session SDP
c=IN IP4 pc33.atlanta.example.com
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000


STIR is based on the SIP protocol and is designed to work with calls being routed through a VOIP network. Since traditional endpoints like POTS and SS7 networks also should be covered under this call authenticity framework , SHAKEN was developed to manage call via IP-to-telephone gateways .

Developed by the Alliance of Telecommunications Industry Solutions (ATIS)

Working Steps  :

  1. When a call is initiated, a SIP INVITE is received by the originating service provider.
  2. Originating service provider verifies the call source and number to determine how to confirm validity.
    1. Full Attestation (A) — The service provider authenticates the calling party AND confirms they are authorized to use this number. An example would be a registered subscriber.
    2. Partial Attestation (B) — The service provider verifies the call origination but cannot confirm that the call source is authorized to use the calling number. An example would be a calling number from behind an enterprise PBX.
    3. Gateway Attestation (C) — The service provider authenticates the call’s origin but cannot verify the source. An example would be a call received from an international gateway.
  3. Create a SIP Identity header that contains information on the calling number, called number, attestation level, and call origination, along with the certificate thus caller ID “signed” as legitimate
  4. SIP INVITE with the SIP Identity header with the certificate is sent to the destination service provider.
  5. Destination service provider verifies the identity of the header and certificate.

Diagrammatic depiction of flow of how Telecom carriers to digitally validates authenticity before receiving or handoff through their network



Video Codecs – H264 , H265 , AV1

Article discusses the popularly adopted current standards for video codecs( compression / decompression) namely MPEG2, H264, H265 and AV1


MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
generic coding of moving pictures and associated audio information
combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

better than MPEG 1

evolved out of the shortcomings of MPEG-1 such as audio compression system limited to two channels (stereo) , No standardized support for interlaced video with poor compression , Only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited for higher resolution video.


  • over-the-air digital television broadcasting and in the DVD-Video standard.
  • TV stations, TV receivers, DVD players, and other equipment
  • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
  • XDCAM – professional file-based video recording format.
  • DVB – Application-specific restrictions on MPEG-2 video in the DVB standard:


Advanced Video Coding (AVC), or H.264 or aka MPEG-4 AVC or ITU-T H.264 / MPEG-4 Part 10 ‘Advanced Video Coding’ (AVC)
introduced in 2004

Better than MPEG2

40-50% bit rate reduction compared to MPEG-2

Support Up to 4K (4,096×2,304) and 59.94 fps
21 profiles ; 17 levels

Compression Model

Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find the ones that are redundant within the subsequent frames ie not changed such as background sections in video. These areas are replaced with a short information, referencing the original pixels(intraframe motion prediction) using mathematical function and direction of motion

Hybrid spatial-temporal prediction model
Flexible partition of Macro Block(MB), sub MB for motion estimation
Intra Prediction (extrapolate already decoded neighbouring pixels for prediction)
Introduced multi-view extension
9 directional modes for intra prediction
Macro Blocks structure with maximum size of 16×16
Entropy coding is CABAC(Context-adaptive binary arithmetic coding) and CAVLC(Context-adaptive variable-length coding )


  • most deployed video compression standard
  • Delivers high definition video images over direct-broadcast satellite-based television services,
  • Digital storage media and Blu-Ray disc formats,
  • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems and DVB
  • Mobile video, media players, video chat


High Efficiency Video Coding (HEVC), or H.265 or MPEG-H HEVC
video compression standard designed to substantially improve coding efficiency
stream high-quality videos in congested network environments or bandwidth constrained mobile networks
Jan 2013
product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

better than H264

overcome shortage of bandwidth, spectrum, storage
bandwidth savings of approx. 45% over H.264 encoded content

resolutions up to 8192×4320, including 8K UHD
Supports up to 300 fps
3 approved profiles, draft for additional 5 ; 13 levels
Whereas macroblocks can span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving it the ability to compress information more efficiently.

multiview encoding – stereoscopic video coding standard for video compression that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It also packs a large amount of inter-view statistical dependencies.

Compression Model

Enhanced Hybrid spatial-temporal prediction model
CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures

Motion Estimation – Intra prediction with more nodes, asymmetric partitions in Inter Prediction)
Individual rectangular regions that divide the image are independent

Paralleling processing computing – decoding process can be split up across multiple parallel process threads, taking advantage multi-core processors.

Wavefront Parallel Processing (WPP)- sort of decision tree that grants a more productive and effectual compression.
33 directional nodes – DC intra prediction , planar prediction. , Adaptive Motion Vector Prediction
Entropy coding is only CABAC


  • cater to growing HD content for multi platform delivery
  • differentiated and premium 4K content

reduced bitrate enables broadcasters and OTT vendors to bundle more channels / content on existing delivery mediums
also provide greater video quality experience at same bitrate

Using ffmpeg for H265 encoding

I took a h264 file (640×480) , duration 30 seconds of size 39,08,744 bytes (3.9 MB on disk) and converted using ffnpeg

After conversion it was a HEVC (Parameter Sets in Bitstream) , MPEG-4 movie – 621 KB only !!! without any loss of clarity.

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4                                              ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers   built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)   configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr   libavutil      56. 22.100 / 56. 22.100   libavcodec     58. 35.100 / 58. 35.100   libavformat    58. 20.100 / 58. 20.100   libavdevice    58.  5.100 / 58.  5.100   libavfilter     7. 40.101 /  7. 40.101   libavresample   4.  0.  0 /  4.  0.  0   libswscale      5.  3.100 /  5.  3.100   libswresample   3.  3.100 /  3.  3.100   libpostproc    55.  3.100 / 55.  3.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':   Metadata:     major_brand     : isom     minor_version   : 1     compatible_brands: isomavc1     creation_time   : 2019-06-23T04:58:13.000000Z   Duration: 00:00:29.84, start: 0.000000, bitrate: 1047 kb/s     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 1046 kb/s, 25 fps, 25 tbr, 25k tbn, 50k tbc (default)     Metadata:       creation_time   : 2019-06-23T04:58:13.000000Z       handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1 Codec AVOption b (set bitrate (in bits/s)) specified for output file #0 (output.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream. Stream mapping:   Stream #0:0 -> #0:0 (h264 (native) -> hevc (libx265)) Press [q] to stop, [?] for help x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9 x265 [info]: build info [Mac OS X][clang 10.0.1][64 bit] 8bit+10bit+12bit x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 x265 [info]: Main profile, Level-3 (Main tier) x265 [info]: Thread pool created using 4 threads x265 [info]: Slices                              : 1 x265 [info]: frame threads / pool features       : 2 / wpp(8 rows) x265 [warning]: Source height < 720p; disabling lookahead-slices x265 [info]: Coding QT: max CU size, min CU size : 64 / 8 x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3 x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00 x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2 x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0 x265 [info]: References / ref-limit  cu / depth  : 3 / off / on x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1 x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60 x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra x265 [info]: tools: strong-intra-smoothing deblock sao Output #0, mp4, to 'output.mp4':   Metadata:     major_brand     : isom     minor_version   : 1     compatible_brands: isomavc1     encoder         : Lavf58.20.100     Stream #0:0(und): Video: hevc (libx265) (hev1 / 0x31766568), yuv420p, 640x480, q=2-31, 25 fps, 12800 tbn, 25 tbc (default)     Metadata:       creation_time   : 2019-06-23T04:58:13.000000Z       handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1       encoder         : Lavc58.35.100 libx265 frame=  746 fps= 64 q=-0.0 Lsize=     606kB time=00:00:29.72 bitrate= 167.2kbits/s speed=2.56x     video:594kB audio:0kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 2.018159% x265 [info]: frame I:      3, Avg QP:27.18  kb/s: 1884.53  x265 [info]: frame P:    179, Avg QP:27.32  kb/s: 523.32   x265 [info]: frame B:    564, Avg QP:35.17  kb/s: 38.69    x265 [info]: Weighted P-Frames: Y:5.6% UV:5.0% x265 [info]: consecutive B-frames: 1.6% 3.8% 9.3% 53.3% 31.9%  encoded 746 frames in 11.60s (64.31 fps), 162.40 kb/s, Avg QP:33.25

if you get error like

Unknown encoder 'libx265'

then reinstall ffmpeg with h265 support


Realtime High quality video encoder
product of product of the Alliance for Open Media (AOM)
Contained by Matroska , WebM , ISOBMFF , RTP (WebRTC)

better than H265

AV1 is royalty free and overcomes the patent complexities around H265/HVEC


  • Video transmission over internet , voip , multi conference
  • Virtual / Augmented reality
  • self driving cars streaming
  • intended for use in HTML5 web video and WebRTC together with the Opus audio format

Audio and Acoustic Signal Processing

Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions and Audio Signal Processing focuses on the computational methods for intentionally altering auditory signals or sounds, in order to achieve a particular goal.

Application of audio Signal processing in general

  • storage
  • data compression
  • music information retrieval
  • speech processing ( emotion recognition/sentiment analysis , NLP)
  • localization
  • acoustic detection
  • Transmission / Broadcasting – enhance their fidelity or optimize for bandwidth or latency.
  • noise cancellation
  • acoustic fingerprinting
  • sound recognition ( speaker Identification , biometric speech verification , voice commands )
  • synthesis – electronic generation of audio signals. Speech synthesisers can generate human like speech.
  • enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.)

Effects for audio streams processing

  • delay or echo
    To simulate reverberation effect, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be of order 35 milliseconds or above.
    Implemented using tape delays or bucket-brigade devices.
  • flanger
    delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms).
    signal would fall out-of-phase with its partner, producing a phasing comb filter effect and then speed up until it was back in phase with the master
  • phaser
    signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
  • chorus
    delayed version of the signal is added to the original signal. above 5 ms to be audible. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
  • equalization
    frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters.
    overdrive effects such as the use of a fuzz box can be used to produce distorted sounds, such as for imitating robotic voices or to simulate distorted radiotelephone traffic
  • pitch shift
    shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
  • time stretching
    changing the speed of an audio signal without affecting its pitch.
  • resonators
    emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
  • modulation
    change the frequency or amplitude of a carrier signal in relation to a predefined signal.
  • compression
    reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
  • 3D audio effects
    place sounds outside the stereo basis
  • reverse echo
    swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
  • wave field synthesis
    spatial audio rendering technique for the creation of virtual acoustic environments

ASP application in Telephony and mobile phones, by ITU (International Telegraph Union)

  • Acoustic echo control
    aims to eliminate the acoustic feedback, which is particularly problematic in the speakerphone use-case during bidirectional voice
  • Noise control
    microphone doesn’t only pick up the desired speech signal, but often also unwanted background noise. Noise control tries to minimize those unwanted signals . Multi-microphone AASP, has enabled the suppression of directional interferers.
  • Gain control
    how loud a speech signal should be when leaving a telephony transmitter as well as when it is being played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively during operation in real-time.
  • Linear filtering
    ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
  • Speech coding: from analog POTS based call to G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder is a big leap in terms of call capacity. other speech coders with varying tradeoffs between compression ratio, speech quality, and computational complexity have been also made available. AASP provides higher quality wideband speech (approximately 150 Hz to 7 kHz).

ASP applications in music playback

AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs, such as listening to music, watching videos, and gaming

  • Post-processing
    techniques as equalization and filtering allow the user to adjust the timbre of the audio such as bass boost and parametric equalization. Other techniques like adding reverberation, pitch shift, time stretching etc
  • Audio (de)coding: audio contianers like mp3 and AAC define how music is distributed, stored, and consumed also in Online music streaming services

ASP for virtual assitants

Virtual Assistance include a variety of servies from Apple’s Siri, Microsoft’s Cortana , Google’s Now , Alexa etc. ASP is used in

  • Speech enhancement
    multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
  • Speech recognition (speech-to-text): this draws ideas from multiple disciplinary fields including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major contribution to recognition accuracy improvement in speech recognition by AASP.
  • Speech synthesis (text-to-speech): this technology has come a very long way from its very robotic sounding introduction in the 1930s to making synthesized speech sound more and more natural.

Other areas of ASP

  • Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).

Ref :
wikipedia – https://en.wikipedia.org/wiki/Audio_signal_processing
IEEE – https://signalprocessingsociety.org/publications-resources/blog/audio-and-acoustic-signal-processing%E2%80%99s-major-impact-smartphones

Webrtc handshake

Interfaces of webrtc and tracks to stream addition

Process to perform webrtc handshake

1.Setup Client side for the caller
PeerConnectionFactory to generate PeerConnections
PeerConnection for every connection to remote peer
MediaStream audio and video from client device

2.caller creates SDP offer for the callee

3.Callee process the offer

4.Callee generates an SDP answer for the caller

5.Caller receives and prcesses the answer from callee

6.Proceed to Add stream
7. Proceed to do ICE for NAT

Webrtc call setup and incoming call callflow between remote peer , peerconnection actory , peerconnection and application

setup a call
receive a call

Interactive Connectivity Establishment (ICE) for NAT traversal

Protocols using offer/answer are difficult to operate through Network Address Translators (NATs) since flow of media packets require IP addresses and ports of media sources and sinks within their messages. Also realtime media emphasises on reduced latency and decreased packet loss .

an extension to the offer/answer model, and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks.
Checks done by STUN and TURN
also allows for address selection for multihomed and dual-stack hosts

ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.

Gathering Candidate Addresses

An agent identifies all CANDIDATE whic is a transport address. Types:

  • HOST CANDIDATE – directly from a local interface which could be Wifi, Virtual Private Network (VPN) or Mobile IP (MIP)
    if an agent is multihomed ( private and public networks) , it obtains a candidate from each IP address and includes all candidates in its offer.
  • STUN or TURN to obtain additional candidates. Types
    1.translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
    2.addresses on TURN servers (RELAYED CANDIDATES)

Mapping Server Reflexive address
Agent sends the TURN Allocate request from IP address and port X:x,
NAT will create a binding X1′:x1′, mapping this server reflexive candidate to the host candidate X:x ( BASE).
Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate.
Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent.

Allocate Request and response fom TURN – Informing the agent of this relayed candidate

only STUN based Binding
agent sends a STUN Binding request to its STUN server which will get server reflexive candidate and send back Binding response.

STUN Binding request for connectivity checks on CANDIDATE PAIRS

The candidates are carried in attributes in the SDP offer . The remote peer also follows this process and gather and send lits own sorted list of candidates. Hence CANDIDATE PAIRS from both sides are formed.

PEER REFLEXIVE CANDIDATES – connectivity checks can produce aditional candidates espceialy around symmetric NAT

Since the same address is used for STUN. and media ( RTP/RTCP) Demultiplexing based on packet contents helps to identify which one is which.

TRIGGERED CHECKS – accelerates the process of finding a valid candidate
ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.

ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first

Checks ensure mainting frozen candidates and pairs with some foundation for media stream

Each candidate pair in the check list has a foundation and a state. States for candidates pairs
1.Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair onthe check list.
2. In-Progress: A check has been sent for this pair, but the transaction is in progress.
3. Succeeded: A check for this pair was already done and produced a successful result.
4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.
5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.

Example of ICE gather state

icegatheringstatechange – gathering

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 58122 typ host generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (srflx)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:4081163164 1 udp 1686052607 37542 typ srflx raddr rport 58122 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:345893049 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2130406062 1 udp 41886207 27190 typ relay raddr rport 37542 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3052096874 1 udp 25108479 28049 typ relay raddr rport 37543 generation 0 ufrag vzpn network-id 1 network-cost 10

icegatheringstatechange – complete

Exmaple Candidate Checking

iceconnectionstatechange : checking

setRemoteDescription L type: answer, sdp: v=0

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4

m=video 9 UDP/TLS/RTP/SAVPF 98 100 96 97 99 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4
a=rtcp:9 IN IP4

addIceCandidate (host)
sdpMid: , sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 56060 typ host generation 0 ufrag ydvf network-id 1 network-cost 10

iceconnectionstatechange : connected

Candidate Nomination for Media Path

selectig low-latency media paths can use various techniques such as actual round-trip time (RTT) measurement
controlling agent gets to nominate which candidate pairs will get used for media amongst the ones that are valid. Ways
regular nomination and aggressive nomination


Ref :

http://w3c.github.io/webrtc-pc/ WebRTC 1.0: Real-time Communication Between Browsers – W3C Editor’s Draft 31 August 2019
RFC 5245 Inter

Websockets as VOIP signal transport medium

Web resources are usually build on request/response paradigm such as HTTP , SIP messages . This means that server responds only when a client requests it to. This made web intercations very slow and unsuited for VOIP signalling
Long Poll involved repeated polling checks to load new server resources by itself instead of client made explicit request
AJAX and multipart XHR tried to patch the problem by selective reloading however they still required that client perform the mapping for an incomig reply to map to correct request.
However due to overhead latency involved with HTTP transaction and its working mode to open new TCP connetion for every request and reponse and add HTTP headers, none of them were suited to realtime operations

Websocket is the current (2017) most idelistic solution to perform realtime sigalling suited to VOIP requirnments due to its nature os establish a socket .

Websocket Protocol

Enables two-way communication between a client running untrusted code in a controlled environment to a remote host that has opted-in to communications from that code.

protocol consists of an opening handshake followed by basic message framing, layered over TCP.
handshake is interpreted by HTTP servers as an Upgrade request.

Secure websocket example :

Request URL: wss://site.com:8084/socket.io/?transport=websocket&sid=hh3Dib_aBWgqyO1IAAEL
Request Method: GET
Status Code: 101 Switching Protocols

Response Headers
Connection: Upgrade
Sec-WebSocket-Accept: UVhTdFOWfywGyQTKDRZyGuhkfls=
Sec-WebSocket-Extensions: permessage-deflate
Upgrade: websocket

Request Headers
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: no-cache
Connection: Upgrade
Host: site.com:8085
Origin: https://site.com:8084
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: 06FNaHge8GLGVuPFxV2fAQ==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36

Query String parameters
transport: websocket
sid: hh3Dib_aBWgqyO1IAAEL

Working with websockets

A new websocket can be opned with ws or wss and it can have sub protocols like in example .

var wsconnection = new WebSocket('wss://voipsever.com', ['soap', 'xmpp']);

It can be attached with event handlers

wsconnection.onopen = function () {
wsconnection.onerror = function (error) {
  console.log('WebSocket Error ' + error);
wsconnection.onmessage = function (e) {
  console.log('message received : ' + e.data);

Send Data on websocket

message string


Blob or ArrayBuffer object to send binary data
Ex : Sending canvas ImageData as ArrayBuffer

var img = canvas_context.getImageData(0, 0, 400, 320);
var binary = new Uint8Array(img.data.length);
for (var i = 0; i < img.data.length; i++) {
  binary[i] = img.data[i];

Ex : sending file as Blob

var file = document.querySelector('input[type="file"]').files[0];

Closing the connection

if (socket.readyState === WebSocket.OPEN) {

Registry for Close codes for WS
1000 Normal Closure [IESG_HYBI] [RFC6455]
1001 Going Away [IESG_HYBI] [RFC6455]
1002 Protocol error [IESG_HYBI] [RFC6455]
1003 Unsupported Data [IESG_HYBI] [RFC6455]
1004 Reserved [IESG_HYBI] [RFC6455]
1005 No Status Rcvd [IESG_HYBI] [RFC6455]
1006 Abnormal Closure [IESG_HYBI] [RFC6455]
1007 Invalid frame payload data [IESG_HYBI] [RFC6455]
1008 Policy Violation [IESG_HYBI] [RFC6455]
1009 Message Too Big [IESG_HYBI] [RFC6455]
1010 Mandatory Ext. [IESG_HYBI] [RFC6455]
1011 Internal Error [IESG_HYBI] [RFC6455][RFC Errata 3227]
1012 Service Restart [Alexey_Melnikov] [http://www.ietf.org/mail-archive/web/hybi/current/msg09670.html]
1013 Try Again Later [Alexey_Melnikov] [http://www.ietf.org/mail-archive/web/hybi/current/msg09670.html]
1014 The server was acting as a gateway or proxy and received an invalid response from the upstream server. This is similar to 502 HTTP Status Code. [Alexey_Melnikov] [https://www.ietf.org/mail-archive/web/hybi/current/msg10748.html]
1015 TLS handshake [IESG_HYBI] [RFC6455]
1016-3999 Unassigned
4000-4999 Reserved for Private Use [RFC6455]

WebSocket Subprotocol Name Registry

  • MBWS.huawei.com MBWS
  • MBLWS.huawei.com MBLWS
  • soap soap
  • wamp WAMP (“The WebSocket Application Messaging Protocol”)
  • v10.stomp Name: STOMP 1.0 specification
  • v11.stomp Name: STOMP 1.1 specification
  • v12.stomp Name: STOMP 1.2 specification
  • ocpp1.2 OCPP 1.2 open charge alliance
  • ocpp1.5 OCPP 1.5 open charge alliance
  • ocpp1.6 OCPP 1.6 open charge alliance
  • ocpp2.0 OCPP 2.0 open charge alliance
  • ocpp2.0.1 OCPP 2.0.1
  • rfb RFB [RFC6143]
  • sip WebSocket Transport for SIP (Session Initiation Protocol) [RFC7118]
  • notificationchannel-netapi-rest.openmobilealliance.org OMA RESTful Network API for Notification Channel
  • wpcp Web Process Control Protocol (WPCP)
  • amqp Advanced Message Queuing Protocol (AMQP) 1.0+
  • mqtt mqtt [MQTT Version 5.0]
  • jsflow jsFlow pubsub/queue protocol
  • rwpcp Reverse Web Process Control Protocol (RWPCP)
  • xmpp WebSocket Transport for the Extensible Messaging and Presence Protocol (XMPP) [RFC7395]
  • ship SHIP – Smart Home IP SHIP (Smart Home IP) is a an IP based approach to plug and play home automation and smart energy / energy efficiency, which can easily be extended to additional domains such as Ambient Assisted Living (AAL). SHIP can be used solely on the customer premises or can be integrated into a cloud based solution.
  • mielecloudconnect Miele Cloud Connect Protocol This protocol is used to securely connect household or professional appliances to an internet service portal via a public communication network in order to enable remote services.
  • v10.pcp.sap.com Push Channel Protocol
  • msrp WebSocket Transport for MSRP (Message Session Relay Protocol) [RFC7977]
  • v1.saltyrtc.org
  • TLCP-2.0.0.lightstreamer.com TLCP (Text Lightstreamer Client Protocol)
  • bfcp WebSocket Transport for BFCP (Binary Floor Control Protocol)
  • sldp.softvelum.com Softvelum Low Delay Protocol SLDP is a low latency live streaming protocol for delivering media from servers to MSE-based browsers and WebSocket-enabled applications.
  • opcua+uacp OPC UA Connection Protocol
  • opcua+uajson OPC UA JSON Encoding
  • v1.swindon-lattice+json Swindon Web Server Protocol (JSON encoding)
  • v1.usp USP (Broadband Forum User Services Platform)
  • mles-websocket mles-websocket
  • coap Constrained Application Protocol (CoAP) [RFC8323]
  • TLCP-2.1.0.lightstreamer.com TLCP (Text Lightstreamer Client Protocol)
  • sqlnet.oracle.com sqlnet This protocol is used for communication between Oracle database client and database server, and its usage as subprotocol of websocket is primarly geared towards cloud deployments. sqlnet supports bi-directional data transfer and is full duplex in nature.
  • oneM2M.R2.0.json oneM2M R2.0 JSON
  • oneM2M.R2.0.xml oneM2M R2.0 XML
  • oneM2M.R2.0.cbor oneM2M R2.0 CBOR
  • transit Transit
  • 2016.serverpush.dash.mpeg.org MPEG-DASH-ServerPush-23009-6-2017
  • 2018.mmt.mpeg.org MPEG-MMT-23008-1-2018
  • webrtc.softvelum.com Softvelum WebSocket signaling protocol WebRTC live streaming requires WebSocket-based signaling protocol for every specific implementation. Softvelum products will use this subprotocol for signaling

websocket libraries

C++: libwebsockets
Erlang: Shirasu.ws
Java: Jetty
Node.JS: ws
PHP: Ratchet, phpws

Ref :
RFC 6455 – The websocket protocol
Websocket Protocol Registeries : http://www.iana.org/assignments/websocket/websocket.xml
IANA websocket -https://www.iana.org/assignments/websocket/websocket.xhtml

Wifi 6

Wi‑Fi is a trademark of the Wi-Fi Alliance
family of radio technologies commonly used for wireless local area networking (WLAN) of devices.

Current and older Wifi standards

standards operate on varying frequencies, deliver different bandwidth, and support different numbers of channels.

802.11a – transmits at 5 GHz frequency band of the radio spectrum with 54 megabits of data per second.
uses orthogonal frequency-division multiplexing (OFDM) which splits that radio signal into several sub-signals before they reach a receiver to reduces interference.

802.11b – transmits at 2.4 GHz with speed of 11 megabits of data per second,
uses complementary code keying (CCK) modulation to improve speeds.

802.11g – transmits at 2.4 GHz but faster upto 54 megabits of data per second.
uses OFDM coding

802.11n – speeds 140 megabits per second
backward compatible with a, b and g.
can transmit up to four streams of data, each at a maximum of 150 megabits per second, but most routers only allow for two or three streams.

802.11ac – n on the 2.4 GHz band and ac on the 5 GHz band.
backward compatible with 802.11n and thus others
450 megabits per second on a single stream
allows for transmission on multiple spatial streams upo 8
called 5G WiFi because of its frequency band
Very High Throughput (VHT)

Wifi 6

Wi-Fi CERTIFIED 6 networks enable lower battery consumption in devices, making it a solid choice for any environment, including smart home and Internet of Things (IoT) uses.

Wifi Compoents

  • wireless access point (AP) allows wireless devices to connect to the wireless network.
    takes the bandwidth coming from a router and stretches it so that many devices can go on the network from farther distances away.
    give useful data about the devices on the network, provide proactive security, and serve many other practical purposes.
  • Wireless routers are hardware devices that Internet service providers use to connect you to their cable or xDSL Internet network.
    combines the networking functions of a wireless access point and a router.
  • mobile hotspot – feature on smartphones with both tethered and untethered connections
    share your wireless network connection with other devices

Wifi performance

Wi-Fi operational range depends on factors such as the frequency band, radio power output, receiver sensitivity, antenna gain and antenna type as well as the modulation techniquea and propagation charestristics of the signal

Transmitter power
Compared to cell phones and similar technology, Wi-Fi transmitters are low power devices. In general, the maximum amount of power that a Wi-Fi device can transmit is limited by local regulations, such as FCC Part 15 in the US. Equivalent isotropically radiated power (EIRP) in the European Union is limited to 20 dBm (100 mW).

An access point compliant with either 802.11b or 802.11g, using the stock omnidirectional antenna might have a range of 100 m


  • Wired Equivalent Privacy WEP
    client connects to a WEP-protected network, the WEP key is added to some data to create an “initialization vector”/ IV
  • WiFi Protected Access version 2 (WPA2)
    successor to WEP and WPA
    uses either TKIP or Advanced Encryption Standard (AES) encryption
  • WiFi Protected Setup (WPS)
    ties a hard-coded PIN to the router for setup is vulnerabile for exploitation by hackers
  • WPA3™
    Use the latest security methods , higher grader security protocls
    Disallow outdated legacy protocols
    Require use of Protected Management Frames (PMF)
    increased protections from password guessing attempts
    better password protection through Simultaneous Authentication of Equals (SAE), which replaces Pre-shared Key (PSK) in WPA2-Personal.
  • WPA3-Enterprise
    192-bit minimum-strength security protocols and cryptographic tools
    Authenticated encryption: 256-bit Galois/Counter Mode Protocol (GCMP-256)
    Key derivation and confirmation: 384-bit Hashed Message Authentication Mode (HMAC) with Secure Hash Algorithm (HMAC-SHA384)
    Key establishment and authentication: Elliptic Curve Diffie-Hellman (ECDH) exchange and Elliptic Curve Digital Signature Algorithm (ECDSA) using a 384-bit elliptic curve
    Robust management frame protection: 256-bit Broadcast/Multicast Integrity Protocol Galois Message Authentication Code (BIP-GMAC-256)

Ref :

Long Term Evolution (LTE), VOLTE and VOWifi

Both radio and core network evolution
all-IP packet-switched architecture
standardised by 3GPP
lower CAPEX ans OPEX involved

Evolved from Universal Mobile Telecommunication System (UMTS), which in turn evolved from the Global
Also aligned with 4G (fourth-generation mobile)

LTE is backward compatible with GSM/EDGE/UMTS/CDMA/WCDMA systems on existing 2G and 3G spectrum , even hand-over and roaming to existing mobile networks.

Motivation for evolution

Wireless/cellular technology standards are constantly evolving for better efficiency and performance.
LTE evolved as a result of rapid increase of mobile data usage. Applications such as voice over IP (VOIP), streaming multimedia, videoconferencing , cellular modemetc.
It provides packet-switched traffic with seamless mobility and higher qos than predecessors.
Also high data rate, throughput, low latency and packet optimized radioaccess technology on flexible bandwidth deployments.


Peak Data Rate
uplink – 75Mbps(20MHz bandwidth)
downlink – 150 Mbps(UE Category 4, 2×2 MIMO, 20MHz bandwidth) , 300 Mbps(UE category 5, 4×4 MIMO, 20MHz bandwidth)

carrier bandwidth can range from 1.4 MHz up to 20 MHz. Ultimately bandwidth used by carrier depends on frequency band and the amount of spectrum available with a network operator
Mobility 350 km/h

Multiple Access Schemes
uplink: SC-FDMA (Single Carrier Frequency Division Multiple Access) 50Mbps+ (20MHz spectrum)
downlink: OFDM (Orthogonal Frequency Division Multiple Access) 100Mbps+ (20MHz spectrum)

Multi-Antenna Technology , Multi-user collaborative MIMO for Uplink and TxAA, spatial multiplexing, CDD ,max 4×4 array for downlink

Coverage 5 – 100km with slight degradation after 30km

LTE architecture supports hard QoS and guaranteed bit rate (GBR) for radio bearers.


All interfaces between network nodes are IP based
Duplexing – Time Division Duplex (TDD) , Frequency Division Duplex (FDD) and half duples FD
(MIMO) Multiple Input Multiple Output transmissions – LTE devices have to support this. Allows the base station to transmit several data streams over the same carrier simultaneously.
Modulation Schemes QPSK, 16QAM, 64QAM(optional)

LTE Architecture

Primarily composed of

  1. User Equipment (UE)
  2. Evolved UMTS Terrestrial Radio Access Network (E-UTRAN).
  3. Evolved Packet Core (EPC).


LTE devices capable of CAT6 speeds (Category 6 )
Increased peak data rate, downlink 3 Gbps, Uplink 1.5 Gbps ( 1 Gbps = 1000 Mbps)
spectral efficiency from 16bps/Hz in R8 to 30 bps/Hz in R10
Carrier Aggregation (CA)
enhanced use of multi-antenna techniques
support for Relay Nodes (RN)

3GPP on LTE – https://www.3gpp.org/technologies/keywords-acronyms/98-lte
ETSI on LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment (UE) radio transmission and reception – https://www.etsi.org/deliver/etsi_ts/136100_136199/136101/10.03.00_60/ts_136101v100300p.pdf

MIMO ( multiple-input and multiple-output )

SISO – Single Input Single Output
SIMO – Single Input Multiple output
MISO – Multiple Input Single Output
MIMO – Multiple Input multiple Output

Multiplying the capacity of a radio link using multiple transmission and receiving antennas to exploit multipath propagation.
Key technology for achieving a vast increase of wireless communication capacity over a finite electromagnetic spectrum.

Antenna configuration – implies antenna spatial diversity by useing arrays of multiple antennas on one or both ends of a wireless communication link
boost channel capacity.
combats multipath fading
enhance signal to noise ratio,
create multiple communication paths

Applies to wifi
IEEE 802.11n (Wi-Fi), IEEE 802.11ac (Wi-Fi)
as well as cellular networks
HSPA+ (3G)
WiMAX (4G)
Long Term Evolution (4G LTE)
power-line communication for 3-wire installations as part of ITU G.hn standard and HomePlug AV2 specification

Large capacity increases over given bandwidth and S/N resources
Greater throughputs on bands below 6 GHz,

multi-user MU-MIMO

simultaneous independent data links to multiple users over a common time-frequency resource

massive MIMO

enable the expansion of the useful spectrum to microwave and millimeter wave bands within the framework of 5G cellular communication.

microdiversity MIMO

MIMO modes (60m)

Diversity – Alamouti algorithm
Beam forming – create and aim the antenna pattern electronically
Spatial multiplex – use of precoding and shaping to unravel the multipath signals

challenges faced by mobile equipment vendors implementing MIMO in small portable devices.


3main categories: precoding, spatial multiplexing (SM), and diversity coding.


multi-stream beamforming ( signal is emitted from each of the transmit antennas with appropriate phase and gain weighting such that the signal power is maximized at the receiver input ) , increases reception and reduce multipath fading

In line-of-sight propagation, beamforming results in a well-defined directional pattern. However, conventional beams are not a good analogy in cellular networks, which are mainly characterized by multipath propagation. When the receiver has multiple antennas, the transmit beamforming cannot simultaneously maximize the signal level at all of the receive antennas, and precoding with multiple streams is often beneficial. Note that precoding requires knowledge of channel state information (CSI) at the transmitter and the receiver.

Spatial multiplexing

High-rate signal is split into multiple lower-rate streams and each stream is transmitted from a different transmit antenna in the same frequency channel. If these signals arrive at the receiver antenna array with sufficiently different spatial signatures and the receiver has accurate CSI, it can separate these streams into (almost) parallel channels.

increasing channel capacity at higher signal-to-noise ratios (SNR).

Diversity coding

when there is no channel knowledge at the transmitter , a single stream is transmitted. The signal is emitted from each of the transmit antennas with full or near orthogonal coding. Diversity coding exploits the independent fading in the multiple antenna links to enhance signal diversity.

Ref :

NLP ( Natural Language Processing ) in VoIP

NLP ( Natural Language Processing ) can be defined as the automatic manipulation of natural languages ( text or audio) using computer algorithms and softwares. As such NLP has great potential in cognitive and artificial intelligence , but also with increasing human to machine interaction and enhancement in Machine learning ,NLP is set to revolutionize the Voice over IP space.

Note : although not obvious but some people confuse Natural language procession with Neurolinguistic pressing which is a science in Psychology.

NLP evolves from linguistics which itself is a study of language along with its semantics , phonetics and gramer. Every language has rules and NLP uses mathematical formulation to understand it. Discrete mathematical formalisms will be discussed later in this article.

Inputs for NLP is usually though conversation, speech, correspondence, reading, print, written composition, dictation, publishing, translation, lip reading, signing etc .

Rule based vs Statistical NLP – In contrast to rule based engines which work on hard preset values using maybe a decision tree , statistical models work in a more probabilistic fashion which produces more reliable results even in unfamiliar scenarios.

Linear classifier vs Convolutional Neural Nets– CNNs are powerful supervised deep learning technique. As opposed to a linear classifier whose decision boundary on feature space is linear function , CNN increases model complexity by adding more layers . tbd-

NLP tasks


Grammer induction , lemmitization , morphological segmentation , part of speech tagging , parsing , sentence breaking , stemming , word segmentation , terminology extraction


lexical , distributional , machine translation , Named entity recognition ( NER) , natural language understanding and generation, relationship establishment , sentimental analysis , work sense disambiguation , OCR( optical Character recognition) , recognizing textual entailment


speech recognition , specch segmentation , text to speech , dialogues


automatic summarizations , conference resolution , discourse analysis

Key techniques

Out of above its worthy to point out few key techniques

Parts of speech (POS )

A primary tasks in NLP is to extract tokens and sentences, identify parts of speech ( like nouns , verbs , adjectives ) and create parse trees.

POS tagging is the process of marking up a word in a corpus to a corresponding part of a speech tag . By tagging, algorithm builds lemmatizers which are used to reduce a word to its root form.

POS methods significantly differs from Bag-of-words(BOW) methods which disregards semantic relation relationship and only takes into account words and their frequencies. Whereas POS takes context and definition into consideration.

POS tagging techniques include lexical , rule based , probablistic and deep learning methods.

Named entity recognition (NER)

Given a stream of text, determine which items in the text map to proper names, such as people or places, and their types such as person, location, Organization. Example for raw test as below using Spacy.io

“Hello ! My name is Atanai and I work on Solution design and architecture, developed many custom WebRTC and SIP based solutions such as telecom applications, media stream inetgration into IOT,Unified communication-collaboration ,signalling gateways ,SBC etc. I passed out from Anna university with Betch degree in 2011 and currenlty stay in Bangalore India.”

Analysis of NER is

Noun phrases: ['My name', 'Atanai', 'I', 'Solution design', 'architecture', 'many custom', 'WebRTC and SIP based solutions', 'telecom applications', 'media stream integration', 'IOT', 'Unified communication-collaboration', 'signalling gateways', 'I', 'Anna university', 'Betch degree', 'currently stay', 'Bangalore India'] 
Verbs: ['be', 'work', 'develop', 'base', 'signal', 'pass'] 
Atanai PERSON 
Betch NORP 
2011 DATE 
Bangalore India LOC

Sentiment Analysis

Understand the overall opinion, feeling, or attitude expressed in given media ( speech , text or video) .

NLP in action

NLP application layout

Steps to obtain insights and relevant information from an unclassified document , raw tex file or speech to text content such as recording from VOIP meeting

step 1 : upload a document which could be an invoice , order , feedback , complaint or any other unstructured raw text

Step 2 : Collect the data from the document

  • use OCR (optical character recognition) for hand written or signed components
  • perform search , index , duplication detection etc
  • can use MNIST database as
  • phrase matching and vocabulary
  • Can use translation APIs to trans late from other languages

Step 3 : Collect meaning-full data

  • perform Part of Speech (POS) tagging and chunking process
  • topic discovery and modelling
  • tokenizations and text classification , obtain domain specific entities from the document
  • can use standard model language to collect relevant frequently used words
  • NER ( Named Entity recognition ) to validate names , places and locations
  • can extract out time and date from mentioned entities
  • build relationship graphs

step 4 : extract sentiments using a trained model

  • utilize Regular Expressions for pattern searching
  • sentiment analysis

General Applications:

Application of NLP find its way into many domains

1.VOIP platforms ,media servers and automatic summarization of conference / meetings like “Minutes of Meetings” to highlight the key takeaways from a VOIP session

2. Automatic essay assessment and scripting in education setting alike.

3. Image annotation using metadata describing digital images for categorizations and easy retrieval based on keywords.

4. Spam filtering

5. Building automatic assistants and chatbots with Speech Recognition and using auto suggest with sentence completion ( Siri , Alexa , google voice etc )

6. Social Media Analytics , to track sentiments about topic , figure out influencers such as for movie or restaurant reviews .

NLP in VOIP system

To know more about sound waves go here which describes fundamental characteristics of analog waves . To know more about analog wave modulation go here , this describes how waves are modulated such as frequency , phase , amplitude etc to hold information for propagation . click here to know more about digital wave modulation such as amplitude , frequency , phase shift keying etc . This section build on top of audio streams captured or live .

Based on NLP and trained models on extracted features ,an unknown audio wave can be classified and possibly identified.

Replacing auto attendants with IVR


Ref :

Tools ref:

WebRTC CPaaS ( Communication Platform as a Service )

A CPasS ( communication platform as a service ) is cloud based communication platform that provides real time communication capabilities. This should be easily integrable with any given external environment or application of the customer, without him worrying about building backend infrastructure or interfaces .

Traditionally , with IP protected protocols , licensed codecs maintaining a signalling protocol stack , network interfaces building communication platform was a costly affair. Cisco , Facetime , Skype were the only OTT ( over the top) players taking away from telco’s call revenue .

However with the advent of standardize , open source protocol and codecs plenty of CPaaS providers have crowded the market making more supply than there is demand. A customer wanting to quickly integrate real time communications on his platform has many options to choose from. This article provides an insight to how CPaaS solution are architectured and programmed

Sample CPass Architecture build on open source technologies

I have written an article before on Steps for building and deploying WebRTC solution , which includes standalone , cloud hosted and TURN based NAT handler systems .

A typical CPaaS solution provides

  • Call server + Media Server that can be interacted with via UA
  • Comm clients like sipphones , webrtc client , SDK ( software development kits ) or libraries for desktop , embedded and/or mobile platforms .
  • APIs that can trigger automated calls and perform preprogrammed routing.
  • Rich documentation and samples to build various apps such as call centre solutions , interactive auto-attendant using IVR , DTMF , conference solutions etc .
  • Some CPaaS providers also add features like transcribing ,transcoding , recording , playback etc to provide edge over other CPaaS providers

Advantages of using a CPaaS vs building your own RTC platform

Tech insights and experiences

companies who have been catering to telco and communication domain make robust solutions based on industry best practices which beats novice solution build in a fortnight anyday

keeping up with emerging trends

Market trends like new codecs , rich communication services , multi tenancy, contextual communication , NLP , other ML based enhancements are provided by CPaaS company

Auto Scaling , High Availability

A firm specializing in CPaaS solution has already thought of clustering and autoscaling to meet peak traffic requirements and backup/replication on standby servers to activate incase of failure


using a Cpaas saves on human resources, infrastructure, and time to market. It saves tremendously on underlying IT infrastructure and many a times provides flexible pricing models

In a nutshell I have come across so many small size startups trying to build CPaaS solution from scratch but only realising it after weeks of trying to build a MVP that they are stuck with firewall, NAT, media quality or interoperability issues . Since there are so many solution already out in the market it is best to instead use them as underlying layer and build applications services using it such as callcentre or CRM services etc .


SIP stack written in C. Available under GPL

pjsip dev guide architecture diagram

PJSip user agent

local_info+tag, local_contact,

pj_status_t pjsip_ua_init(endpt, param);
pj_status_t pjsip_ua_destroy(void);
pjsip_module* pjsip_ua_instance(void);
pjsip_endpoint* pjsip_ua_get_endpt(ua);

PJSip dialog


state, session_counter, initial_cseq, local_cseq, remote_cseq, route_set,

local_info+tag, local_contact, remote_info+tag, remote_contact, next_set


pj_status_t pjsip_dlg_create_uac(ua, local_uri, contact, …);
pj_status_t pjsip_dlg_create_uas(ua, rdata, contact, &dlg);
pj_status_t pjsip_dlg_fork(old_dlg,rdata,&dlg);
pj_status_t pjsip_dlg_set_route_set(dlg, route_set);
pj_status_t pjsip_dlg_inc_session(dlg);
pj_status_t pjsip_dlg_dec_session(dlg);
pj_status_t pjsip_dlg_add_usage(dlg, mod);
pj_status_t pjsip_dlg_create_request(dlg,method,cseq,&tdata);
pj_status_t pjsip_dlg_send_request(dlg,tdata,&tsx);
pj_status_t pjsip_dlg_create_response(dlg,rdata,code,txt,&tdata);
pj_status_t pjsip_dlg_modify_response(dlg,tdata,code,txt);
pj_status_t pjsip_dlg_send_response(dlg,tsx,tdata);
pj_status_t pjsip_dlg_send_msg(dlg,tdata);
pjsip_dialog* pjsip_tsx_get_dlg(tsx);
pjsip_dialog* pjsip_rdata_get_dlg(rdata);

PJsip module

Attributes: name, id, priority, …

pj_bool_t on_rx_request(rdata);
pj_bool_t on_rx_response(rdata);
void on_tsx_state(tsx,event);

SDP state Offer/ Answer transition

SDP negotiator structure

SDP negotiator class diagram PJSIP dev guide


Pre requisities :

GNU make (other make will not work),
GNU binutils for the target, and
GNU gcc for the target.
In addition, the following libraries are optional, but they will be used if they are present:

ALSA header files/libraries (optional) if ALSA support is wanted.
OpenSSL header files/libraries (optional) if TLS support is wanted.

Video support

Video4Linux2 (v4l2) development library
​ffmpeg development library for video codecs: H.264 (together with libx264) and H263P/H263-1998.
Simple DirectMedia Layer SDL
libyuv for format conversion and video manipulation / Ffmpeg
OpenH264 / VideoToolbox (only for Mac) / ffmpeg

get tar and untar sourcecode

wget https://www.pjsip.org/release/2.9/pjproject-2.9.tar.bz2
tar -xvzf pjproject-2.9.tar.bz2

configure to make ‘build.mak’, and ‘os-auto.mak’

./configure --enable-shared --disable-static --enable-memalign-hack --enable-gpl --enable-libx264

It runs checks , pay attention to traces such as


checking opus/opus.h usability... yes
checking opus/opus.h presence... yes
checking for opus/opus.h... yes
checking for opus_repacketizer_get_size in -lopus... yes
OPUS library found, OPUS support enabled

For more features customization `configure’ configures pjproject 2.x to adapt to many kinds of systems.

Usage: ./aconfigure [OPTION]… [VAR=VALUE]…

To assign environment variables (e.g., CC, CFLAGS…), specify them as
VAR=VALUE. See below for descriptions of some of the useful variables.

Defaults for the options are specified in brackets.

-h, –help display this help and exit
–help=short display options specific to this package
–help=recursive display the short help of all the included packages
-V, –version display version information and exit
-q, –quiet, –silent do not print checking ...' messages --cache-file=FILE cache test results in FILE [disabled] -C, --config-cache alias for–cache-file=config.cache’
-n, –no-create do not create output files
–srcdir=DIR find the sources in DIR [configure dir or `..’]

Installation directories:
–prefix=PREFIX install architecture-independent files in PREFIX
–exec-prefix=EPREFIX install architecture-dependent files in EPREFIX

By default, make install' will install all the files in /usr/local/bin’, /usr/local/lib' etc. You can specify an installation prefix other than/usr/local’ using --prefix', for instance–prefix=$HOME’.

For better control, use the options below.

Fine tuning of the installation directories:
–bindir=DIR user executables [EPREFIX/bin]
–sbindir=DIR system admin executables [EPREFIX/sbin]
–libexecdir=DIR program executables [EPREFIX/libexec]
–sysconfdir=DIR read-only single-machine data [PREFIX/etc]
–sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com]
–localstatedir=DIR modifiable single-machine data [PREFIX/var]
–libdir=DIR object code libraries [EPREFIX/lib]
–includedir=DIR C header files [PREFIX/include]
–oldincludedir=DIR C header files for non-gcc [/usr/include]
–datarootdir=DIR read-only arch.-independent data root [PREFIX/share]
–datadir=DIR read-only architecture-independent data [DATAROOTDIR]
–infodir=DIR info documentation [DATAROOTDIR/info]
–localedir=DIR locale-dependent data [DATAROOTDIR/locale]
–mandir=DIR man documentation [DATAROOTDIR/man]
–docdir=DIR documentation root [DATAROOTDIR/doc/pjproject]
–htmldir=DIR html documentation [DOCDIR]
–dvidir=DIR dvi documentation [DOCDIR]
–pdfdir=DIR pdf documentation [DOCDIR]
–psdir=DIR ps documentation [DOCDIR]

System types:
–build=BUILD configure for building on BUILD [guessed]
–host=HOST cross-compile to build programs to run on HOST [BUILD]
–target=TARGET configure for building compilers for TARGET [HOST]

Optional Features:
–disable-option-checking ignore unrecognized –enable/–with options
–disable-FEATURE do not include FEATURE (same as –enable-FEATURE=no)
–enable-FEATURE[=ARG] include FEATURE [ARG=yes]
Disable floating point where possible
–enable-epoll Use /dev/epoll ioqueue on Linux (experimental)
–enable-shared Build shared libraries
–disable-resample Disable resampling implementations
–disable-sound Exclude sound (i.e. use null sound)
–disable-video Disable video feature
–enable-ext-sound PJMEDIA will not provide any sound device backend
–disable-small-filter Exclude small filter in resampling
–disable-large-filter Exclude large filter in resampling
–disable-speex-aec Exclude Speex Acoustic Echo Canceller/AEC
–disable-g711-codec Exclude G.711 codecs from the build
–disable-l16-codec Exclude Linear/L16 codec family from the build
–disable-gsm-codec Exclude GSM codec in the build
–disable-g722-codec Exclude G.722 codec in the build
–disable-g7221-codec Exclude G.7221 codec in the build
–disable-speex-codec Exclude Speex codecs in the build
–disable-ilbc-codec Exclude iLBC codec in the build
–enable-libsamplerate Link with libsamplerate when available.
–enable-resample-dll Build libresample as shared library
–disable-sdl Disable SDL (default: not disabled)
–disable-ffmpeg Disable ffmpeg (default: not disabled)
–disable-v4l2 Disable Video4Linux2 (default: not disabled)
–disable-openh264 Disable OpenH264 (default: not disabled)
–enable-ipp Enable Intel IPP support. Specify the Intel IPP
package and samples location using IPPROOT and
IPPSAMPLES env var or with –with-ipp and
–with-ipp-samples options
–disable-darwin-ssl Exclude Darwin SSL (default: autodetect)
–disable-ssl Exclude SSL support the build (default: autodetect)

–disable-opencore-amr Exclude OpenCORE AMR support from the build
(default: autodetect)

–disable-silk Exclude SILK support from the build (default:

–disable-opus Exclude OPUS support from the build (default:

–disable-bcg729 Disable bcg729 (default: not disabled)
–disable-libyuv Exclude libyuv in the build
–disable-libwebrtc Exclude libwebrtc in the build

Optional Packages:
–with-PACKAGE[=ARG] use PACKAGE [ARG=yes]
–without-PACKAGE do not use PACKAGE (same as –with-PACKAGE=no)
–with-external-speex Use external Speex development files, not the one in
“third_party” directory. When this option is set,
make sure that Speex is accessible to use (hint: use
CFLAGS and LDFLAGS env var to set the include/lib
–with-external-gsm Use external GSM codec library, not the one in
“third_party” directory. When this option is set,
make sure that the GSM include/lib files are
accessible to use (hint: use CFLAGS and LDFLAGS env
var to set the include/lib paths)
–with-external-srtp Use external SRTP development files, not the one in
“third_party” directory. When this option is set,
make sure that SRTP is accessible to use (hint: use
CFLAGS and LDFLAGS env var to set the include/lib
–with-external-yuv Use external libyuv development files, not the one
in “third_party” directory. When this option is set,
make sure that libyuv is accessible to use (hint:
use CFLAGS and LDFLAGS env var to set the
include/lib paths)
–with-external-webrtc Use external webrtc development files, not the one
in “third_party” directory. When this option is set,
make sure that webrtc is accessible to use (hint:
use CFLAGS and LDFLAGS env var to set the
include/lib paths)
–with-external-pa Use external PortAudio development files. When this
option is set, make sure that PortAudio is
accessible to use (hint: use CFLAGS and LDFLAGS env
var to set the include/lib paths)
–with-sdl=DIR Specify alternate libSDL prefix
–with-ffmpeg=DIR Specify alternate FFMPEG prefix
–with-openh264=DIR Specify alternate OpenH264 prefix
–with-ipp=DIR Specify the Intel IPP location
–with-ipp-samples=DIR Specify the Intel IPP samples location
–with-ipp-arch=ARCH Specify the Intel IPP ARCH suffix, e.g. “64” or
“em64t. Default is blank for IA32”
–with-ssl=DIR Specify alternate SSL library prefix. This option
will try to find OpenSSL first, then if not found,
GnuTLS. To skip OpenSSL finding, use –with-gnutls
option instead.
–with-gnutls=DIR Specify alternate GnuTLS prefix
This option is obsolete and replaced by
–with-opencore-amr=DIR Specify alternate libopencore-amr prefix
Specify alternate libvo-amrwbenc prefix
–with-silk=DIR Specify alternate SILK prefix
–with-opus=DIR Specify alternate OPUS prefix
–with-bcg729=DIR Specify alternate bcg729 prefix

Some influential environment variables:
CC C compiler command
CFLAGS C compiler flags
LDFLAGS linker flags, e.g. -L if you have libraries in a
nonstandard directory
LIBS libraries to pass to the linker, e.g. -l
CPPFLAGS (Objective) C/C++ preprocessor flags, e.g. -I if
you have headers in a nonstandard directory
CXX C++ compiler command
CXXFLAGS C++ compiler flags
CPP C preprocessor

Make and install

make dep
make install


create library instance and initiate with default config and logging

lib = pj.Lib()
lib.init(log_cfg = pj.LogConfig(level=LOG_LEVEL, callback=log_cb))

create UDP transport listening on any available port

transport = lib.create_transport(pj.TransportType.UDP,

create sipuri and local account

my_sip_uri = "sip:" + transport.info().host + ":" + str(transport.info().port)
acc = lib.create_account_for_transport(transport, cb=MyAccountCallback())

Function to make call

def make_call(uri):
        print "Making call to", uri
        return acc.make_call(uri, cb=MyCallCallback())
    except pj.Error, e:
        print "Exception: " + str(e)
        return None

Ask user input for destination URI and Call

print "Enter destination URI to call: ",
input = sys.stdin.readline().rstrip("\r\n")
if input == "":
lck = lib.auto_lock()
current_call = make_call(input)
del lck

shutdown the library

transport = None
 acc = None
 lib = None


➜  ~ python simplecall.py sip:altanai@

09:58:09.571        os_core_unix.c !pjlib 2.9 for POSIX initialized
09:58:09.573        sip_endpoint.c  .Creating endpoint instance…
09:58:09.574        pjlib  .select() I/O Queue created (0x7fcfe00590d8)
09:58:09.574        sip_endpoint.c  .Module "mod-msg-print" registered
09:58:09.574        sip_transport.c  .Transport manager created.
09:58:09.575        pjsua_core.c  .PJSUA state changed: NULL --> CREATED
09:58:10.073        pjsua_core.c  .pjsua version 2.9 for Darwin-18.7/x86_64 initialized
Call is  CALLING last code = 0 ()

Debug Help

Isssue1 Undefined symbols for architecture x86_64:
“_pjmedia_codec_opencore_amrnb_deinit”, referenced from:
_amr_encode_decode in mips_test.o
_create_stream_amr in mips_test.o
“_pjmedia_codec_opencore_amrnb_init”, referenced from:
_amr_encode_decode in mips_test.o
_create_stream_amr in mips_test.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [../bin/pjmedia-test-x86_64-apple-darwin18.7.0] Error 1
make[1]: *** [pjmedia-test-x86_64-apple-darwin18.7.0] Error 2
make: *** [all] Error 1
*Solution* Mac often throws this linker error. Follow

./configure --enable-shared --disable-static --enable-memalign-hack
make dep
make install
cd pjproject-2.9/pjsip-apps/src/python/
python setup.py install

Hosted IP-PBX and its SBC

SBC ( Session Borde Controllers ) are basically gateways that provide interconnectivity between the hosted IP-PBX of the enterprise to the outside world endpoints such as telco service provider, PSTN/ TDM , SIP trunking providers or even third party OTT provider apps like skype for business etc.

If you have a hosted IPPBX or PBX in your data-centre or on premise and you need controlled but heavy outflowing traffic, it is a good idea to integrate a resilient and efficient SBC to provide seamless interconnectivity.

Hosted PBX

For an enterprises such as an Trading floor or warehouse with multiple phone types , softphones , hardphones , turrets etc distributed across various geographies and zones a device agnostic architectural setup is prime . Listing the essentials for setting up such a system. Note supplementary services are data-services , logging , licensing etc are important but kept out of scope to keep focus on functional aspects .

An enterprise application usually is structured in tiers or layers

  • Client tier – the networks clients communication to the central java programs . Runs on client machines
  • web tier – state full communication between client and business tier . Runs in server machine.
  • business tier- handles the logic of the application. The business tier uses the Enterprise Java Bean (EJB) container, which manages the execution of the beans
  • data tier – encompasses DB drivers . Runs on separate machines for database storage

Event services for Line status notifications

providers lines status notification across enterprise for inter zone and softphone to hardphone .

Routing services

routing calls within enterprise and hardphone sites read more about resource zones later in the article

Call Control Manager (CCM)

consolidated set of all service and component that make up the VOIP platform besides media handlers . It includes SIP adapters , bridge managers , call processing frameworks , API frameworks , healthchecks etc .

Call processing framework ( CPF)

signalling and call routing logic , mostly in SIP and trunks . Manages identities such as Call Line information , Called Party Information , line status etc in shared memory.

Multiple shared Lines and their statuses

Incases where there is a need to process multiple calls from a single User agent device such as a softphone or hardphone ( common scenario for a turret phone) , the design involves assigning it multiple sip uris and each sip uri will establish a line.

When caller calls callee , the line is said to be BUSY , otherwise said to be IDLE. Transition of a shared sip line from IDLE to BUSY is transmitted to others via SIP PUBLISH as other UAs holding the same sip

Similarly any other event like transfer is propagated to other via SIP UPDATE

Clustering Call control managers (CCM)

A Call Communication manager (CCM) from various zones should be able to cowork on call and session management and advanced features such as routing from home guest zone to home zone , call transfer , refer , barge etc. Designing a clustered setup will also provide elasticity , fail-over and high availability. Can use clustered , HA compliant framework such as Oracle Communication Application Server , suited for enterprise level deployments.

Call Replication and distributed memory management

A node will store two types of data: active sessions and passive sessions. The active sessions are used by the node and stored in cache. The passive sessions are the replicas from the other nodes’ active sessions. The passives sessions are stored on a persistent storage.

Controlling Line Calls using AOR and Resource Zones

When dealing with many SIP endpoints , now referred to as resource, it is best to assign the resources to their respective zones. Thus a resource’s status updates will be only updated by its active resource zone while can be read by any resource zone.

Incoming request Zone vs Active Resource Zone

For an Incoming request such a INVITE , check whether the zone sending the request is its active resource zone or not .If the Active Resource Zone is the same zone on which the INVITE came in, then the call is handled by that zone. If the Active Resource Zone is a different zone, then the call needs to be forwarded to the Active Resource Zone.

Bridges for Local Media connections

Although call signalling is handled by a resources active resource zone only, we can still create media bridges in local zone of the resource .

Local MM bridges are used to auto answer an incoming sip line call and create trunk , especially from hardphones which do not support provisional responses.

Interzone proxy Handler

proxies call control messages between active and non active resource zones. Primarily mapping the sip messages with all custom headers inbetween the communication device interfaces.

Dial Trunk using multiple dedicated sip lines and connect via Media Bridge

To save up on call routing /connection time and to support te ability to add as many users on call at runtime , a dedicated media bridge is established for every call.

  • A sip line activated is auto-answered by MM , creates a trunk and waits for other endpoint to join the bridge. The flow is as follows :
  • As INVITE arrives for an IDLE sip line , it is connected to a trunk and auto answered by a local MM bridge .
  • Since the call is already answered , when caller dials number for callee , collect the DTMF digits over RTP using RFC 2833 DTMF events.
  • Run inter-digit timer for digit collection and detect end of dialing on timeout.
  • The dialed trunk connection is made and call is added to media bridge
  • When provisional responses are received on the trunk connection, generate in-band call progress tones (ringing, proceeding etc) via the MM
  • When the line answers, the progress tones have to be stopped and the called party gets bridged to the calling party via the media bridge.

Call Diversion involves forwarding calls from zone to another zone. joinjed parties get call UPDATE status and forward response .

Call barge is the processing of joining an ongoing call . The barge event is usually propagated to joined parities via SIP INFO. Private lines do not allow barge in and are exclusively reserved for only few users.

Interconnectivity provided by an SBC ( Session Border Controller)

Hold-Resume and Music on Hold in multi-line evironment

While a regular p2p call involves simple reinvite based hold and resume with varrying SDP, the scenario is slightly more detailed for hold resume on bridged trunk connection , as explained below.

As the calls made are on bridge , a hold signal involves a RE-INIVITE with held-SDP to media manager (MM). If hold status on trunk is 200 OK the hold status will be sent to other call interfaces connected on the trunk. Else if hold is denied ,403 is sent back to hold-initiates.

Music on hold is an one way RTP mostly from media server.

For a bridged scenarios , separate Music on hold bridges are kept on Media Managers. When an UA has to hold , it is removed from original bridge and place on music on hold bridge . To be unhold/ resume it is placed back into the orignal bridge from music on hold bridge .


user initiates conference, the conference feature can execute on the zone where the user was logged on, irrespective of zones where the other conference attendees join from . The Call processing framework of originators zone completes the SDP exchange to establish two-way speech path among all the parties.

Incases there are multiple connections from a zone , a local MM conference bridge can be created for them which would connect back to originators MM conf bridge . this two part conf bridge will be transparent to the sip line sand users .

For provisioning inputs and settings setup a Diagnostics , Administration and Configuration platform which can process APIs for data services , licences , alarms or do remote device control such as using SNMP

Session Border Controllers (SBC)

At network level SBC operations include

  • bridging multiple interfaces in different networks even between the IPv4 and IPv6 networks
  • auto NAT discovery and STUN
  • protocol conversion such as TLS to UDP etc
  • Flood detection and IP filtering

For SIP specific functionalities , SBC does

  • SIP validation involving checks on syntax and message contents also consistency checks are performed.
  • stateful and call aware. tracing, monitoring and checking for validitya and health of all the SIP messages
  • Topology hiding
  • Traffic filtering
  • Codec filtering , reordering , media pinning, transcoding, or call recording
  • Data replication brings High Availability (HA) with hot backups or even Active-Active solutions.

Traffic sharing and routing roles of SBC can include

  • IP-based and Digest-based authentication
  • limiting traffic by number of concurrent calls or calling rate.
  • Dialplan and/or Custom routing
  • Dispatching/Load-balancing to a backend cluster of servers

SBC’s can be physical hardware boxes or software based applications, as the name suggests their purpose is to control the session at border between the enterprise and external service provider.

SIP to PSTN – SIP is an IP protocol whereas PSTN is a TDM one , achieving interoperability is also the KRA of an SBC

SIP trunking – SBC provide a secure sip connectivity to connect calls to sip trunks which provide bulk calls functionality at a flat pricing.

support for various fixed or mobile endpoints – SBC ensure they are RFC compliant and can extend SIP to any kind of telecom endpoint like PSTN , GSM, fax , Skype , sipphone , IP phones etc.

NAT / Network address translator – To meet the packet routing challenges across a firewall or even during private -public mapping. A combo of DHCP servers and NAT provider comes very handy to reroute or perform hole punching such that signalling and media packets are not dropped and meet the required endpoint. More about NAT here – NAT traversal using STUN and TURN.

Load balancing – Reverse proxies and Load balancers is a much adopted industry practise to mask the inner IPs of the VoIP platform and also route traffic appropriately between control and media server .

Security , QoS and Regulatory compliance – since SBCs are required to typically support a large array of clients they adhere to regulatory and industry accepted standards ,which also involves security features like AAA, TLS/SSL and other means for quality of assurance like logging and fault detection, preventing DDoS etc . In many cases SBC can also encrypt / decrypt RTP streams for probing , tapping or lawful inspection .

Terminating at carriers , PSTN and IP gateways

Additional SBC features

Inaddition to above it is good to have if an SBC provides extra features like forking , emergency number dialing ( 911 ) or active directory integration . Real Time Analysis and monitoring of call and metrics are also expected from a SBC since they reside on edge of the network and are more vulnerable to threats . For example Dialogic Mediant SBC’s and gateways , Audio Codes SBCs

With the shift from on premise PBXs to cloud based VM or microservice architecture , SBC vendors adopt a lager umbrella of services also including automation scripts for checks , reporting tools / consoles , developer friendly APIs to manage sessions via SBC and even WebRTC gateways to connect browser endpoints .

Usage Scenarios

Any VOIP dependant system which deals with bulksome voice / video traffic from external endpoints is a usages scenarios. Listing few

  • Contact Call centres
  • Remote work / offsite monitoring
  • CRM solution for sales/marketing
  • Connecting webrtc click to dial from webpage to enterprise representatives
  • connecting enterprise UCC clients to PSTN endpoints

There are many more.

RealTime Transport protocol (RTP) & RTP control protocol (RTCP )

In a VOIP system, where SIP is a signaling protocol , a SIP proxy never participates in the media flow, thus it is media agnostic.

SDP packets describing a session with codecs , open ports , media formats etc are embedded in a SIP request such as invite .
Post a SDP Offer/Answer flow , RTP and RTCP esnsure that mediastream flow between the endpoints .

RTP is the provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services.

RTCP is the control protocl which provides monitoring of the data delivery, qos in a manner scalable to large multicast networks, and to provide minimal control and identification functionality.

RTP (Real-time Transport Protocol)

A protocol framework
supports use of RTP-level translators and mixers.
independent of the underlying transport and network layers.
does not address resource reservation
does not guarantee quality-of-service for real-time services.
services like payload type identification, sequence numbering, timestamping and delivery monitoring.

RTP Packet via Wireshark
RTP Packet

The sequence numbers included in RTP allow the receiver to reconstruct the sender’s packet sequence,

Usage :
Multimedia Multi particpant conferences
Storage of continuous data
Interactive distributed simulation
active badge, control and measurement applications

UDP provides best-effort delivery of datagrams for point-to-point as well as for multicast communications.

SRTP (Secure Real-time Transport Protocol)

Provides confidentiality, message authentication, and replay protection for both unicast and multicast RTP and RTCP streams.
Security layer which resides between the RTP/RTCP application layer and the transport layer

SRTP Packet

Cryptographic context includes includes

  • session key used directly in encryption/message authentication
  • master key securely exchanged random bit string used to derive session keys
  • other working session parameters ( master key lifetime, master key identifier and length, FEC parameters, etc)
    it must be maintained by both the sender and receiver of these streams.

Salting keys” are used to protect against pre-computation and time-memory tradeoff attacks.

To learn more about SRTP specifically visit : https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/

RTP Session

In an RTP session, each particpant maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP.

Real-Time Transport Protocol
    [Stream setup by SDP (frame 554)]
        [Setup frame: 554]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 .... = Extension: False
    .... 0000 = Contributing source identifiers count: 0
    0... .... = Marker: False
    Payload type: ITU-T G.711 PCMU (0)
    Sequence number: 39644
    [Extended sequence number: 39644]
    Timestamp: 2256601824
    Synchronization Source identifier: 0x78006c62 (2013293666)
    Payload: 7efefefe7efefe7e7efefe7e7efefe7e7efefe7e7efefe7e...

Synchronization source (SSRC)

32-bit numeric SSRC identifier for source of a stream of RTP packets.
All packets from a synchronisation source form part of the same timing and sequence number space, so a receiver groups packets by synchronisation source for playback.

Binding of the SSRC identifiers is provided through RTCP.
If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

Contributing source (CSRC)

A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer.
The mixer inserts a list of the SSRC identifiers of the sources , called CSRC list, that contributed to the generation of a particular packet into the RTP header of that packet.

An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).

RTSP (Real-Time Streaming Protocol)

network control protocol
TCP to maintain an end-to-end connection
control real-time streaming media applications such as live audio and HD video streaming
establishes a media session between RTSP end-points ( can be RTSP media servers too) and initiates RTP streams to deliver the audio and video payload from the RTSP media servers to the clients.

RTCP (Real-Time Transport Control Protocol )

Real-time Transport Control Protocol (RTCP) defined in RFC 3550 and is used to send control packets and feedback on QoS to participants in a call along with RTP which sends actual media packets

  • Periodic transmission of control packet
  • Monitor data deliver on large multicast networks
  • Underlying protocol must provide multiplexing of the data and control packets
  • Provide feedback on the quality of the data distribution , congestion control, fault diagnosis , control of adaptive encoding
  • Observer for number of participants to rate of sending packets for scaling up
  • convey minimal session control information

RTCP often uses the next consecutive port as RTP
Example screenshot below 20720 for RTP

And 20721 fo RTCP

Types of RTCP packet

  1. SR: Sender report, for transmission and reception statistics from
    participants that are active senders
  2. RR: Receiver report, for reception statistics from participants
    that are not active senders and in combination with SR for
    active senders reporting on more than 31 sources
  3. SDES: Source description items, including CNAME
  4. BYE: Indicates end of participation
  5. APP: Application-specific functions

The SR: Sender Report RTCP Packet

Sender Report RTCP PAcket

Expanced Sender Report RTCP Packet

SR Report in RTCP

Explanation for some attributes

  • fraction lost: 8 bits size , fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent
  • cumulative number of packets lost: 24 bits size , total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception.
  • interarrival jitter: 32 bits , estimate of the statistical variance of the RTP data packet interarrival time, measured in timestamp unit

RR: Receiver Report RTCP Packet

SDES: Source Description RTCP Packet

SDES items can contain

CNAME: Canonical End-Point Identifier SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| CNAME=1 | length | user and domain name …

NAME: User Name SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| NAME=2 | length | common name of source …

EMAIL: Electronic Mail Address SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| EMAIL=3 | length | email address of source …

PHONE: Phone Number SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| PHONE=4 | length | phone number of source …

LOC: Geographic User Location SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| LOC=5 | length | geographic location of site …

TOOL: Application or Tool Name SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| TOOL=6 | length |name/version of source appl. …

NOTE: Notice/Status SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| NOTE=7 | length | note about the source …

PRIV: Private Extensions SDES Item

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | PRIV=8 | length | prefix length |prefix string... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... | value string ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

BYE: Goodbye RTCP Packet

APP: Application-Defined RTCP Packet

Intended for experimental use

Instance of RTCP sender and receiver reports on transmission and reception statistics

Real-time Transport Control Protocol (Receiver Report)
    [Stream setup by SDP (frame 4)]
        [Setup frame: 4]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 0001 = Reception report count: 1
    Packet type: Receiver Report (201)
    Length: 7 (32 bytes)
    Sender SSRC: 0x796dd0d6 (2037240022)
    Source 1
        Identifier: 0x00000000 (0)
        SSRC contents
            Fraction lost: 0 / 256
            Cumulative number of packets lost: 1
        Extended highest sequence number received: 6534
            Sequence number cycles count: 0
            Highest sequence number received: 6534
        Interarrival jitter: 0
        Last SR timestamp: 0 (0x00000000)
        Delay since last SR timestamp: 0 (0 milliseconds)
Real-time Transport Control Protocol (Source description)
    [Stream setup by SDP (frame 4)]
        [Setup frame: 4]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 0001 = Source count: 1
    Packet type: Source description (202)
    Length: 6 (28 bytes)
    Chunk 1, SSRC/CSRC 0x796DD0D6
        Identifier: 0x796dd0d6 (2037240022)
        SDES items
            Type: CNAME (user and domain) (1)
            Length: 8
            Text: 796dd0d6
            Type: NOTE (note about source) (7)
            Length: 5
            Text: telecomorg
            Type: END (0)

Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)

RTP provides continuous feedback about the overall reception quality from all receivers — thereby allowing the sender(s) in the mid-term to adapt their coding scheme and transmission behaviour to the observed network quality of service (QoS). And also perform

RTP makes no provision for timely feedback that would allow a sender to repair the media stream immediately: through retransmissions, retroactive Forward Error Correction (FEC) control, or media-specific mechanisms for some video codecs, such as reference picture selection.

Components of RTCP based feedback

  • Status reports contained in sender report (SR)/received report (RR) packet transmitted at regular intervals . Can also contain SDES
  • FB ( Feedback ) messages . Indicate loss or reception of particular pieces of a media stream

Types of RTCP Feedback packet

Minimal compound RTCP feedback packet

minimize the size of the RTCP packet transmitted to convey feedback
maximize the frequency at which feedback can be provided
MUST contain only the mandatory information :

  • encryption prefix if necessary,
  • exactly one RR or SR,
  • exactly one SDES with only the CNAME item present, and
  • FB message(s)

Full compound RTCP feedback packet

MAY contain any additional number of RTCP packet

RTCP operation modes

  1. Immediate Feedback mode
  2. Early RTCP mode
  3. Regular RTCP Mode

The Application specific feedback threshold is a function of a number of parameters including (but not necessarily limited to):

  • type of feedback used (e.g., ACK vs. NACK),
  • bandwidth,
  • packet rate,
  • packet loss
  • probability and distribution,
  • media type,
  • codec, and
  • (worst case or observed) frequency of events to report (e.g., frame received, packet lost).

To read on SRTP session with RTP/SAVP and crypto attributes , read https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/

Conference streaming


client encodes the same audio/video stream twice in different resolutions and bitrates and sending these to a router who then decides who receives which of the streams.

Multicast Audio Conference

Assume obtaining a multicast group address and pair of ports. One port is used for audio data, and the other is used for control (RTCP) packets.
The audio conferencing application used by each conference participant sends audio data in small chunks of ms duration.
Each chunk of audio data is preceded by an RTP header; RTP header and data are in turn contained in a UDP packet.

The RTP header indicates what type of audio encoding (such as PCM, ADPCM or LPC) is contained in each packet so that senders can change the encoding during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or react to indications of network congestion.

Every packet networks, occasionally loses and reorders packets and delays them by variable amounts of time. Thus RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source.
The sequence number can also be used by the receiver to estimate how many packets are being lost.

For QoS, each instance of the audio application in the conference periodically multicasts a reception report plus the name of its user on the RTCP(control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings. In addition to the user name, other identifying information may also be included subject to control bandwidth limits.

A site sends the RTCP BYE packet when it leaves the conference.

Audio and Video Conference

Audio and video media are transmitted as separate RTP sessions, separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same distinguished (canonical) name in the RTCP packets for both so that the sessions can be associated.

Synchronized playback of a source’s audio and video is achieved using timing information carried in the RTCP packets

Layered Encodings

In conflicting bandwidth requirements of heterogeneous receivers, Multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion.
Rate-adaptation should be done by a layered encoding with a layered transmission system.

In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Receivers can then adapt to network heterogeneity and control their reception bandwidth by joining only the appropriate subset of the multicast groups.

Mixers , Translators and Monitors


An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet.

Example of Mixer for hi-speed to low-speed packet stream conversion . In conference cases where few participants are connected through a low-speed link where other have hi-speed link, instead of forcing lower-bandwidth, reduced-quality audio encoding for all, an RTP-level relay called a mixer may be placed near the low-bandwidth area.
This mixer resynchronises incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream across the low-speed links.

All data packets originating from a mixer will be identified as having the mixer as their synchronization source.
The RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers.


An intermediate system that forwards RTP packets with their synchronization source identifier intact.

Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

Tranasltor for Firewall Limiting IP packet pass

Some of the intended participants in the audio conference may be connected with high bandwidth links but might not be directly reachable via IP multicast, for reasons such as being behind an application-level firewall that will not let any IP packets pass. For these sites, mixing may not be necessary, in which case another type of RTP-level relay called a translator may be used.

Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through asecure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site’s internal network.

Other cases :

video mixers can scales the images of individual people in separate video streams and composites them into one video stream to simulate a group scene.

Translator usage when connection of a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II, packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.


An application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis and long-term statistics.

Layered Encodings

In conflicting bandwidth requirements of heterogeneous receivers, Multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion.
Rate-adaptation should be done by a layered encoding with a layered transmission system.

In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Receivers can then adapt to network heterogeneity and control their reception bandwidth by joining only the appropriate subset of the multicast groups.

RTP Session

In an RTP session, each particpant maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP.

Real-Time Transport Protocol
    [Stream setup by SDP (frame 554)]
        [Setup frame: 554]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 .... = Extension: False
    .... 0000 = Contributing source identifiers count: 0
    0... .... = Marker: False
    Payload type: ITU-T G.711 PCMU (0)
    Sequence number: 39644
    [Extended sequence number: 39644]
    Timestamp: 2256601824
    Synchronization Source identifier: 0x78006c62 (2013293666)
    Payload: 7efefefe7efefe7e7efefe7e7efefe7e7efefe7e7efefe7e...

Synchronization source (SSRC)

32-bit numeric SSRC identifier for source of a stream of RTP packets.
All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback.

the binding of the SSRC identifiers is provided through RTCP.
If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

Contributing source (CSRC)

A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer.
The mixer inserts a list of the SSRC identifiers of the sources , called CSRC list, that contributed to the generation of a particular packet into the RTP header of that packet. An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).

Multiplexing RTP Sessions

In RTP, multiplexing is provided by the destination transport address (network address and port number) which is different for each RTP session ( seprate for audio and video ). This helps in cases where there is chaneg in encodings , change of clockrates , detection of packet loss suffered and RTCP reporting .
Moreover RTP mixer would not be able to combine interleaved streams of incompatible media into one stream.

Interleaving packets with different RTP media types but using the same SSRC would introduce several problems.
But multiplexing multiple related sources of the same medium in one RTP session using different SSRC values is the norm for multicast sessions.

REMB ( Receiver Estimated Maximum Bitrate)

RTCP message used to provide bandwidth estimation in order to avoid creating congestion in the network.
support for this message is negotiated in the Offer/Answer SDP Exchange.

contains total estimated available bitrate on the path to the receiving side of this RTP session (in mantissa + exponent format).
used by sender to configure the maximum bitrate of the video encoding.

also notify the available bandwidth in the network and by media servers to limit the amount of bitrate the sender is allowed to send.

In Chrome it is deprecated in favor of the new sender side bandwidth estimation based on RTCP Transport Feedback messages.

Session Description Protocol (SDP) Capability Negotiation

negotiate use of one out of several possible transport protocols. The offerer uses the expected least-common-denominator (plain RTP) as the actual configuration, and the alternative transport protocols as the potential configurations.

m=audio 53456 RTP/AVP 0 18

plain RTP (RTP/AVP)
RTP with RTCP-based feedback (RTP/AVPF)
Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)

Adaptive bitrate control

Adapt the audio and video codec bitrates to the available bandwidth, and hence optimize audio & video quality
For video, since resolution is chosen at the start only , encoder use bitrate and frame-rate attributes only during runtime to adapt

RTCP packet called TMMBR (Temporary Maximum Media Stream Bit Rate Request) is sent to the remote client


Kamailio DNS and NAT

DNS sub-system in Kamailio

To resolve hostname into ips it can do either of below

  • use libresolv and a combination of the locally configured DNS server /etc/hosts and the local Network Information Service (NIS/YP a.s.o)
  • or cache the query results and first look into internal cache

DNS failover – if destination resolves to multiple addresses tm can try all of them until it finds one to which it can successfully send the packet or it exhausts all of them , with internal DNS cache. Also used when the destination host doesn’t send any reply to a forwarded invite within the SIP timeout interval (tm fr_timer parameter).

DNS load balancing – SRV based load balancing with weight value in the DNS SRV record.


  1. Only the locally configured DNS server (usually in /etc/resolv.conf) is
    used for the requests (/etc/hosts and the local Network Information Service are ignored).
    Workaround: disable the DNS cache (use_dns_cache=off or compile without -DUSE_DNS_CACHE).
  2. The DNS cache uses extra memory
    Workaround: disable the DNS cache.
  3. The DNS failover introduces a very small performance penalty
    Workaround: disable the DNS failover (use_dns_failover=off).
  4. The DNS failover increases the memory usage (the internal structures
    used to represent the transaction are bigger when the DNS failover support is compiled).
    Workaround: compile without DNS failover support (DUSE_DNS_FAILOVER).Turning it off from the config file is not enough in this case (the extra
    memory will still be used).

NAT ( Network Address Translation)

Network address translation replaces the IP address within packets with a different IP address which internet endpoints can relate with
Enables multiple hosts in a private subnet with their pwn private address ( 10.x.x.x or 192.x.x.x etc ) to share single public IP address interface, to access the Internet.

NAT ( Network Address Translation)

NAT is bidirectional- If the private ip:port got translated to public ip:port on the inside interface while entering outside internet, on arriving from outside interface it will get translated from public ip:port to private ip:port

For a SBC ( Session border controller ) or where the kamailio server is directly customer facing , where you dont have a private line or VPN to clients, then it is often encountered with NATed endpoints. Read more about NAT traversal using STUN and TURN here

Why is Nat important in SIP?

These characteristics of SIP design and operation flows demonstrate why NAT solutions are so important ,

  • RFC 3261 for SIP presumed end-to-end reachability and does not specify much around ANT issues .
  • No NLRI (Network Layer Reachability Information) translation layer exists, such as DNS or ARP
  • SIP is designed to used RTP which uses dynamically allocated ports to stream media.
    It is comparable to FTP which creates ephemeral connections on unpredictable dynamic ports to send multiplexed data and “metadata”, instead of protocol like HTTP where all data is sent on same connection.
  • UDP (default transport for SIP) is connection less and session tracking requires these be mapped onto a statelful flow, rigorous keepalives and other such techniques like using TCP instead have their own tradeoffs
  • since sip packets put network and transport information right on sip header they are limited by the rateability and awareness of their network interface thereby prevent other endpoint from reaching its ip or port

Types of NAT solutions

Client-side NAT traversal – clients are responsible for identifying their WAN NLRI and adding ip and port to navigate them in outside world

Server-side NAT traversal – SIP server should discover the client’s WAN addressing while clients continue to work transparently behind NAT. Requires that DIP server look at the source and destination ip and port of actual packets instead of relying on the encapsulated sip headers and SDP body.

ALG (Application Layer Gateways) – mostly applied at router itself. wodk by susbtitung public IP/port information inplace of provate and vice versa for return packets . Limitataions – they dont provide a fullproof fix example they may fix Via but not the Contact address or SDP body or RTP ports

NAT behaviours

Cone NAT

Local client performs an outbound connection to a remote UA and a dynamic rule is created for the destination IP tuple, allowing the remote machine to connect back. Further subdivied into:
– Full Cone NAT
– Restricted Cone NAT
– Port-Restricted Cone NAT

Symmetric NAT

Local client allows inbound connections from a specific source IP address and port, also NAT assigns a new random source port for each destination IP tuple

NAT behaviours

Cone NAT

Local client performs an outbound connection to a remote UA and a dynamic rule is created for the destination IP tuple, allowing the remote machine to connect back. Further subdivied into:
– Full Cone NAT
– Restricted Cone NAT – all requests from the same internal IP address and port are mapped to the same external IP address and port.
– Port-Restricted Cone NAT

Symmetric NAT

Local client allows inbound connections from a specific source IP address and port, also NAT assigns a new random source port for each destination IP tuple


NAT not only applies to sip signalling packets but also to RTP. Even SIP packets are abel to transverse accross private -public network interfaces to the right place across a NAT’d connection, that doesn’t solve two-way media.
RTP performs RTP latching where client listens for at least one RTP frame arriving at the destination port it advertised, and harvests the source IP and port from that packet and uses that for the return RTP path. RTP latching works out of the box for puclin RTP endpoints but not for ones behind NAT.

It is thus recommended to use an intermediate RTP relay such as RTPengine on kamailio. It is controlled via a UDP control socket by kamailio as an external process. More on installation and descrition of RTP engine on kamailio is covered here. When RTPengine control module receives RTP offer /answer from akmailio , it opens a pair of RTP/RTCP ports to receive traffic and substitues in SDP. Doing so for both ends makes RTP engine come in media stream packets of both directions

Fixing NAT

when the client is behind NAT, following needs to be taken careof to provide smooth operation

  1. Ensuring Tranactional replies are sent to correct source address ( maybe using ;rport param and forcerport() method ) instead of just relying on via header transport protocol and port.
if (client_nat_test("3")){

also Change Media ip address to public IP

if(nat_uac_test("8") && search("Content-type: application/sdp")) {
  1. Any far-end NAT traversal solution ( TURN server) if employed should stay i path of entire Dialog not just for initial INVITE transaction which many times results in ACK being dropped. This can be achived by adding Record-Route header of rr module to the initial INVITE request itself
  2. set the advertised address of the public-facing inetrface to the Public NAT IP using “listen” parameter
  3. Ensure contact URI is NAT processed by using NATHelper modules which rewrites the domain portion of the Contact URI to contain the source IP and port of the request or reply. add_contact_alias([ip_addr, port, proto]) in NAThelper module which adds “;alias=ip~port~transport” parameter to the contact URI containing either received ip, port, and transport protocol or those given as parameters , so
    is turned into:
  4. implement RTP proxy which performs NAT for streams such as rtpengine module

NAT Traversal Module

Provides far-end NAT traversal to kamailio’s SIP signalling .
Its role is

  • detect user agents behind NAT
  • manipulate SIP headers so that user agents can continue working behind NAT transparently
  • keepalives to UA behind NAT to preserve their visibility in network


  • even detect UAs behind multiple cascaded NAT boxes, complex distributed env with multiple proxies
  • handle env where incoming and outgoing paths are diff for SIP messages
  • handle cases when routing path may even change between consecutive dialogs
  • can work for other than registered UA’s also


  • built for IPv4 NAT handling not adapted to support IPv6 session keepalives.

Why use keepalive when Registrations are already there for NATing ?

  1. NAT binding works for registered users who want incoming calls. However for cases like outgoing calls or for presence subscription notifications, failings registration implies inability to receive further in-dialog messages after the NAT binding expires. This artificial binding for registrations makes system unreliable and volatile as it doesnot guarantee the delivery of in-dialog messages for outgoing calls without registration renewal. Therefore keepalive are adopted which also works for unregistered users.
  2. Minimizes the traffic as only border proxies send keepalives which send keepalives statelessly, instead of having to relay messages generated by the registrars.
  3. Also for situations when DNS resolves diff proxies for outgoing or incoming path traditional register based keepalives fail to associate or dissociate correct routes.

How keepalives work for NATing ?

This mechanism works by sending a SIP request to a user agent behind NAT to make that user agent send back a reply. The purpose is to have packets sent from inside the NAT to the proxy often enough to prevent the NAT box from timing out the connection.

Module sends Keeplaives to preserve their visibility only in :

  • Registration – for user agent that have registered to for incoming calls, triggering keepalive for a REGISTER request.
  • Subscription – for presence agents that have subscribed to some events for receiving back notifications with SUBSCRIBE request.
  • Dialogs – for user agents that have initiated an outgoing call for receiving further in-dialog messages.
    When all the conditions to keepalive a NAT endpoint will disappear, that endpoint will be removed from the list with the NAT endpoints that need to be kept alive.

function nat_keepalive() :

  • the function needs to be called on proxy directly interacting with UA behind NAT.
  • call only once for the requests (REGISTER, SUBSCRIBE or outgoing INVITEs) that triggers the need for network visibility.
  • call before the request gets either a stateless reply or it is relayed with t_relay()
  • for outgoing INVITE , it triggers dialog tracing for that dialog and will use the dialog callbacks to detect changes in the dialog state.

Dependencies – sl , tm and dialog module


keepalive_interval – time interval between sending a keepalive message to all the endpoints that need being kept alive. A negative value or zero will disable the keepalive functionality.

modparam("nat_traversal", "keepalive_interval", 30) // 30 seconds keeplaive inetrval

keepalive_method – SIP method to use to send keepalive messages.usual ones are NOTIFY and OPTIONS. Default value is “NOTIFY”.

modparam("nat_traversal", "keepalive_method", "OPTIONS")

keepalive_from – SIP URI to use in the From header of the keepalive requests. default sip:keepalive@proxy_ip,with IP address of the outgoing interface

modparam("nat_traversal", "keepalive_from", "sip:keepalive@altanai.com")

keepalive_extra_headers – extra headers that should be added to the keepalive messages. Header must also include the CRLF (\r\n) line separator. Multiple headers can be specified by concatenating with \r\n separator.

modparam("nat_traversal", "keepalive_extra_headers", "User-Agent: Kamailio\r\nX-MyHeader: some_value\r\n")

keepalive_state_file – filename where information about the NAT endpoints and the conditions for which they are being kept alive is saved . It is used when Kamailio starts to restore its internal state and continue to send keepalive messages to the NAT endpoints that have not expired in the meantime. Also used at kamailio restart as it avoids losing keepalive state information about the NAT endpoints.

modparam("nat_traversal", "keepalive_state_file", "/var/run/kamailio/keepalive_state")


client_nat_test – Check if the client is behind NAT. Tests to be performed gievn by int can be :
1 – tests if client has a private IP address or one from shared address space in the Contact field of the SIP message.
2 – tests if client has contacted Kamailio from an address that is different from the one in the Via field.
4 – tests if client has a private IP address or one from shared address space in the top Via field of the SIP message.

For example calling client_nat_test(“3”) will perform test 1 and test 2 and return true if at least one succeeds, otherwise false.

fix_contact() – replace the IP and port in the Contact header with the IP and port the SIP message was received from. Usually called after a succesfull call to client_nat_test(type)

if (client_nat_test("3")) {

nat_keepalive() – Triggers keepalive functionality for the source address of the request. When called it only sets some internal flags, which will trigger later the addition of the endpoint to the keepalive list if a positive reply is generated/received (for REGISTER and SUBSCRIBE) or when the dialog is started/replied (for INVITEs). For this reason, it can be called early or late in the script. The only condition is to call it before replying to the request or before sending it to another proxy. If the request needs to be sent to another proxy, t_relay() must be used to be able to intercept replies via TM or dialog callbacks.

If stateless forwarding is used, the keepalive functionality will not work. Also for outgoing INVITEs, record_route() should also be used to make sure the proxy that keeps the caller endpoint alive stays in the path.

if ((method=="REGISTER" || method=="SUBSCRIBE" ||
    (method=="INVITE" && !has_totag())) && client_nat_test("3"))

Pseudo Variables


  • keepalive_endpoints – total number of NAT endpoints that are being kept alive.
  • registered_endpoints – NAT endpoints kept alive for registrations
  • subscribed_endpoints – NAT endpoints kept alive for subscriptions.
  • dialog_endpoints – Indicates how many of the NAT endpoints are kept alive for taking part in an INVITE dialog.

NATHelper Module

NAT traversal and reuse of TCP connections
Helps symmetric UAs who are not able to determine their public address.

NAT pinging types

UDP packet – 4 bytes (zero filled) UDP packets are sent to the contact address.
pros : low bandwitdh traffic, easy to generate by Kamailio;
cons : unidirectional traffic through NAT (inbound – from outside to inside); As many NATs do update the bind timeout only on outbound traffic, the bind may expire and closed.

SIP request – a stateless SIP request is sent to the UDP contact address.
pros : bidirectional traffic through NAT, since each PING request from Kamailio (inbound traffic) will force the SIP client to generate a SIP reply (outbound traffic) – the NAT bind will be surely kept open.
cons : higher bandwitdh traffic, more expensive (as time) to generate by Kamailio;

Dependencies – usrloc


force_socket – Socket to be used when sending NAT pings for UDP communication.

modparam("nathelper", "force_socket", "")

natping_processes – How many timer processes should be created by the module for the exclusive task of sending the NAT pings.
received_avp – AVP) used to store the URI containing the received IP, port, and protocol by fix_nated_register


fix_nated_contact() – rewrites the “Contact” header field with request’s source address:port pair
fix_nated_sdp() – adds the active direction indication to SDP and updates ource ip address information too
add_rcv_param() – add a received parameter to the “Contact” header fields or the Contact URI.
fix_nated_register() exports the request’s source address:port into an AVP to be used during save()
nat_uac_test()- check if client’s request originated behind a nat
add_contact_alias() – Adds an “;alias=ip~port~transport” parameter to the contact URI
handle_ruri_alias() – Checks if the Request URI has an “alias” parameter and if so, removes it and sets the “$du” based on its value.

Pseudo Variables

$rr_count – Number of Record Routes in received SIP request or reply.
$rr_top_count – If topmost Record Route in received SIP request or reply is a double Record Route, value of $rr_top_count is 2.

RPC Commands


Ref :