what is WebRTC ?

webrtc draft
 

WebRTC 1.0: Real-time Communication Between Browsers – W3C Candidate Recommendation 13 December 2019 https://www.w3.org/TR/webrtc/

Read more in the layers of webrtc  and their functionalities here :  WebRTC layers

webrtc_development_logowebrtcdevelopment
Open Source WebRTC SDK and its implementation steps https://github.com/altanai/webrtc

What is WebRTC ?

WebRTC (Web Real-Time Communication) is an API definition drafted by the World Wide Web Consortium (W3C) that supports browser-to-browser applications for voice calling, video chat, and P2P file sharing without the need of either internal or external plugins.

  • Enables browser to browser media streaming over secure RTP profile
  • Standardization , on a API level at the W3C and at the protocol level at the IETF.
  • Enables web browsers with Real-Time Communications (RTC) capabilities
  • written in c++ and javascript
  • BSDD style license
  • free, open project avaiable in all major borwsers 

As of the 2019 update the W3C defines it as

a set of ECMAScript APIs in WebIDL to allow media to be sent to and received from another browser or device implementing the appropriate set of real-time protocols. The specification being developed in conjunction with a protocol specification developed by the IETF RTCWEB group and an API specification to get access to local media devices.

 The following is the browser side stack for webrtc media .  

WebRTC media stack Solution Architecture
WebRTC Media Stack

Open and Free Codecs

Codecs signifies the media stream’s compession and decompression. For peers to have suceesfull excchange of media, they need a common set of codecs to agree upon for the session . The list codecs are sent  between each other as part of offeer and answer or SDP in SIP.

WebRTC uses bare MediaStreamTrack objects for each track being shared from one peer to another. Codecs associated in those tracks is not mandated by webrtc soecification.

For video as per RFC 7742 WebRTC Video Processing and Codec Requirements , the manadatory codesc to be supported by webrtc clients are : VP8 and H.264‘s Constrained Baseline profile

For Audio as per RFC 7874 WebRTC Audio Codec and Processing Requirements ,browser must support  Opus codec as well as G.711‘s PCMA and PCMU formats.

Video Resolution handling

Unless the SDP specifically signals otherwise, the web browser receiving a WebRTC video stream must be able to handle video at at least 20 FPS at a minimum resolution of 320 pixels wide by 240 pixels tall.

In the best scenarios ( avaible bandwidth and media devices ) VP8 had no upper mark set on resolution of vdieo stream hence the stream can even go asfar as  maximum resolution of 16384×16384 pixels.

Independant of Signalling 

Webrtc does not specify any signalling / telecommunication protocl and it is upto the adoptor to perform ofeer/answer exchaneg in any way deemed fit for the usecase . For ex maple for a web only application on may use only plain websockets, whereas for a teelcom endpoints compatible app one should SIP as the protocol . 

Read more about WebRTC handshakes :

NAT-traversal technologies such as ICE, STUN, and TURN

Have written in detail about TURN based WebRTC flow diagrams .

https://telecom.altanai.com/2015/03/11/nat-traversal-using-stun-and-turn/. The post describe ICE  (Interactive Connectivity Establishment )  framework which is  mandatory by WebRTC standards.  It is find network interfaces and ports in Offer / Answer Model to exchange network based information with participating communication clients. ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relay NAT (TURN) 

NAT and TURN Relay

Learn about hosting / integrating different TURN servers for WebRTC

TURN server for WebRTC – RFC5766-TURN-Server , Coturn , Xirsys – https://telecom.altanai.com/2015/03/28/turn-server-for-webrtc-rfc5766-turn-server-coturn-xirsys/

Why is WebRTC importatnt ?

Significantly better video qualityWebRTC video quality is noticeably better than Flash.
Up to 6x faster connection timesUsing JavaScript WebSockets, also an HTML5 standard, improves session connection times and accelerates delivery of other OpenTok events.
Reduced audio/video latencyWebRTC offers significant improvements in latency through WebRTC, enabling more natural and effortless conversations.
Freedom from FlashWith WebRTC and JavaScript WebSockets, you no longer need to rely on Flash for browser-based RTC.
Native HTML5 elementsCustomize the look and feel and work with video like you would any other element on a web page with the new video tag in HTML5.

The major players behind conception and advancement of WebRTC standards and libraries are  :

IETF , W3C , Java community , GSMA .   The idea is to develop a Light -weight browser based call console , to make SIP calls from Web page .This was successfully achieved using fundamental technologies as Javascript , html5 , web-sockts  and TCP /UDP , open source sip server.It is good to note that there is no extra extension, plugin or gateway required , such as flash support  .Also it bears cross platform support ,  including Mozilla , chrome so on .

 Peer to peer Communication

 WebRTC forms a p2p communication channel between all the peers . that means as the participant count grows  , it converts to  a mesh networking topology with incoming and outgoing stream towards direction of each of its peers .

Two party call p2p

Peer to peer calling

two party call
p2p call

Multiparty Call and mesh network

Mesh based arrangement .

Multiparty party call
Mesh based webrtc video confeerncing

 In special case of broadcasting or  large number of viewers ( without outgoing media stream ) it is recommended to setup a Media Control Unit ( MCU) which will replay the incoming stream to large number of users without putting traffic load on the clients from where the stream is actually originating .   Important note :     1.It should be notes that these diagrams do not depict the ICE and NAT traversal and have been simplifies for better understanding. In real world scenarios there is almost all the time a STUN and TURN server involved .  

More on TURN Servers is given here : NAT traversal using STUN and TURN

2.Also the webrtc mandates the use of secure origin ( https ) on the webpage which invoke getusermedia to capture user media devices like audio , video and location .

Browser Adoption

As of March 2020 , webrtc is supported on following client’s browsers

  • Desktop PC
    Microsoft Edge 12+[25]
    Google Chrome 28+
    Mozilla Firefox 22+[26]
    Safari 11+[27]
    Opera 18+[28]
    Vivaldi 1.9+
  • Android
    Google Chrome 28+ (enabled by default since 29)
    Mozilla Firefox 24+[29]
    Opera Mobile 12+
  • Chrome OS
  • Firefox OS
  • BlackBerry 10
  • iOS
    MobileSafari/WebKit (iOS 11+)
  • Tizen 3.0

Furthermore , read about the Steps for building and deploying WebRTC solution – https://telecom.altanai.com/2014/12/04/steps-for-building-and-deploying-webrtc-solution/

TURN based media Relay

WebRTC APIs

Javascript functions  to access and process the browser media stack

getUserMedia

acquires the audio and video media (e.g., by accessing a device’s camera and microphone)

Properties

ondevicechange

Methods

enumerateDevices()
getDisplayMedia()
getSupportedConstraints()
getUserMedia()

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
.then(function(stream) {
  var video = document.querySelector('video');
  // Older browsers may not have srcObject
  if ("srcObject" in video) {
    video.srcObject = stream;
  } else {
    // Avoid using this in new browsers, as it is going away.
    video.src = window.URL.createObjectURL(stream);
  }
  video.onloadedmetadata = function(e) {
    video.play();
  };
})
.catch(function(err) {
  console.log(err.name + ": " + err.message);
});

DOMException Error on getusermedia

Rejections of the returned promise are made by passing a DOMException error object to the promise’s failure handler. Possible errors are:

AbortError
Although the user and operating system both granted access to the hardware device, problem occurred which prevented the device from being used.

NotAllowedError
One or more of the requested source devices cannot be used at this time. This will happen if the browsing context is insecure( http instead of https) or if the user has specified that the current browsing instance /sessionis not permitted access to the device or has denied all access to user media devices globally.

NotFoundError
No media tracks of the type specified were found that satisfy the given constraints.

NotReadableError
Although the user granted permission to use the matching devices, a hardware error occurred at the operating system, browser, or Web page level which prevented access to the device.

OverconstrainedError
no candidate devices which met the criteria requested. string value is the name of a constraint which was not meet, and a message property containing a human-readable string explaining the problem.

exmaple conatraints :

var constraints = { video: { facingMode: (front? "user" : "environment") } };

SecurityError
User media support is disabled on the Document on which getUserMedia() was called.

TypeError
The list of constraints specified is empty, or has all constraints set to false.

RTCPeerConnection

enables audio and video communication between peers. It performs signal processing, codec handling, peer-to-peer communication, security, and bandwidth management.

Properties

canTrickleIceCandidates
connectionState
currentLocalDescription
currentRemoteDescription
getDefaultIceServers()
iceConnectionState
iceGatheringState
localDescription
onaddstream
onconnectionstatechange
ondatachannel
onicecandidate
oniceconnectionstatechange
onicegatheringstatechange
onidentityresult
onnegotiationneeded
onremovestream
onsignalingstatechange
ontrack
peerIdentity
pendingLocalDescription
pendingRemoteDescription
remoteDescription
sctp
signalingState

Methods

addIceCandidate()
addStream()
addTrack()
close()
createAnswer()
createDataChannel()
createOffer()
generateCertificate()
getConfiguration()
getIdentityAssertion()
getReceivers()
getSenders()
getStats()
getStreamById()
getTransceivers()
removeStream()
removeTrack()
restartIce()
setConfiguration()
setIdentityProvider()
setLocalDescription()
setRemoteDescription()

 signalling state transitions diagram , source W3C

RTC Signalling states

stable
There is no offer/answer exchange in progress. This is also the initial state, in which case the local and remote descriptions are empty.

have-local-offer
Local description, of type “offer”, has been successfully applied.

have-remote-offer
Remote description, of type “offer”, has been successfully applied.

have-local-pranswer
Remote description of type “offer” has been successfully applied and a local description of type “pranswer” has been successfully applied.

have-remote-pranswer
Local description of type “offer” has been successfully applied and a remote description of type “pranswer” has been successfully applied.
closed The RTCPeerConnection has been closed; its [[IsClosed]] slot is true.

RTCSDPType

offer
SDP offer.

pranswer
An RTCSdpType of pranswer indicates that a description MUST be treated as an [SDP] answer, but not a final answer.

answer
treated as an [SDP] final answer, and the offer-answer exchange MUST be considered complete. A description used as an SDP answer may be applied as a response to an SDP offer or as an update to a previously sent SDP pranswer.

rollback
canceling the current SDP negotiation and moving the SDP [SDP] offer back to what it was in the previous stable state.

RTCPeerConfiguration

Defines a set of parameters to configure how the peer-to-peer communication established via RTCPeerConnection

iceServers of type sequence
array of objects describing servers available to be used by ICE, such as STUN and TURN servers.

iceTransportPolicy of type RTCIceTransportPolicy.

bundle policy affects which media tracks are negotiated if the remote endpoint is not bundle-aware, and what ICE candidates are gathered. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport.

  • relay
    ICE Agent uses only media relay candidates such as candidates passing through a TURN server.
  • all
    The ICE Agent can use any type of candidate when this value is specified.

bundlePolicy of type RTCBundlePolicy.
media-bundling policy to use when gathering ICE candidates.
Types :

  • balanced
    Gather ICE candidates for each media type in use (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
  • max-compat
    Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
  • max-bundle
    Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track.

rtcpMuxPolicy of type RTCRtcpMuxPolicy.
rtcp-mux policy to use when gathering ICE candidates.

certificates of type sequence
A set of certificates that the RTCPeerConnection uses to authenticate.

iceCandidatePoolSize of type octet, defaulting to 0
Size of the prefetched ICE pool as defined in [JSEP]

RTCDataChannel

allows bidirectional communication of arbitrary data between peers. It uses the same API as WebSockets and has very low latency.

getStats

allows the web application to retrieve a set of statistics about WebRTC sessions. These statistics data are being described in a separate W3C document

Peer to Peer DTMF

-tbd

Call Setup betweeb WebRTC Endpoints

updates in W3C 13 Dec , 2019

Over the years since its adoption many of the associated tech were depricated from the Webrtc based platforms and enviornments , some of which are: OAuth as a credential method for ICE servers
Negotiated RTCRtcpMuxPolicy (previously marked at risk)
voiceActivityDetection
RTCCertificate.getSupportedAlgorithms()
RTCRtpEncodingParameters: ptime, maxFrameRate, codecPayloadType, dtx, degradationPreference
RTCRtpDecodingParameters: encodings
RTCDatachannel.priority

Some of the newly added featufres include:

restartIce() method added to RTCPeerConnection
Introduced the concept of “perfect negotiation”, with an example to solve signaling races.
Implicit rollback in setRemoteDescription to solve races.
Implicit offer/answer creation in setLocalDescription to solve races.

References :

WebRTC 1.0: Real-time Communication Between Browsers – W3C Candidate Recommendation 13 December 2019https://www.w3.org/TR/webrtc/

SIP VoIP system Architecture

SIP solutioning and architectures  is a subsequent article after SIP introduction, which can be found here.

A VOIP Solution is designed to accommodate the signalling and media both along with integration leads to various external endpoints such as various SIP phones ( desktop, softphones , webRTC ) ,  telecom carriers  , different voip network providers  , enterprise applications  ( Skype , Microsoft Lync  ), Trunks etc .

A sufficiently capable SIP platform should consist of following features :

  • audio calls ( optionally video )
  • media services such as conferencing, voicemail, and IVR,
  • messaging as IM and presence based on SIMPLE,
  • programmable services through standardized APIs and development of new modules
  • near-end and far-end NAT traversal for signalling and media flows
  • interconnectivity with other IP multimedia systems, VoLTE ( optional interconnection with other types of communications networks as GSM or PSTN/ISDN)
  • registry , location and lookup service
  • Backend support like Redis, MySQL, PostgreSQL, Oracle, Radius, LDAP, Diameter
  • serial and parallel forking
  • support for Voip signalling protocols (SIP, H,323, SCCP, MGCP, IAX) and telephony signalling protocols ( ISDN/SS7, FXS/FXO, Sigtran ) either internally via pluggable modules or externally via gateways

Performnace factors :

  • High availability using redundant servers in standby
  • Load balancing
  • IPv4 and IPv6 network layer support
  • TCP , UDP , SCTP transport layer protocol support
  • DNS lookups and hop by hop connectvity

Security considerations :

  • authentication, authorization, and accounting (AAA)
  • Digest authentication and credentials fetched from backend
  • Media Encryption
  • TLS and SRTP support
  • Topology hidding to prevent disclosing IP form internal components in via and route headers
  • Firewalls , blacklist, filters , peak detectors to prevent Dos and Ddos attacks

The article only outlines SIP system architecture  from 3 viewpoints :

  • from Infrastructure standpoint
  • from core voice engineering perspective
  • and accompanying external components required to run and system

Infrastructure Requirements

  • Data Centers with BCP ( Business Continuity Planning ) and DR ( Disaster Recovery )
  • Servers and Clusters for faster and parallel calculating
  • Virtualization
    VMs to make a distributed computing environment with HA ( high availability ) and DRS ( Distributed Resource Scheduling )
  • Storage
    SAN with built-in redundancy for the resiliency of data.
    WORM compliant NAS for storing voice archives over a retention period.
  • Racks, power supplies, battery backups, cages etc.
  • Networking
    DMZs ( Demilitarized Zones)  which are interfacing areas between internal servers in the green zone and outside network
    VLANs for segregation between tenants.
    Connectivity through the public Internet as well as through VPN or dedicated optical fibre network for security.
  • Firewall configuration
  • Load Balancer ( Layer 7 )
  • Reverse Proxies for the security of internal IPs and port
  • Security controls In compliance with ISO/IEC 27000 family – Information security management systems
  • PKI Infrastructure to manage digital certificates
  • Key management with HSM ( hardware security module )
  • truster CA ( Certificate Authority ) to issue publicly signed certificate for TLS ( Https, wss etc)
  • OWASP ( Open Web Application Security Project )  rules compliance

Integral Components of a VOIP SIP based architecture

  • Call Controller
  • Media Manager
  • Recording
  • Softclients
  • logs and PCAP archives
  • CDR generators
  • Session Borer Controllers ( SBCs)

Types of SIP servers are listed below . It is important to understand the roles a SIP server can be moulded to take up which in turn defines its placement in overall voip communication platform such as stateless proxy servers on the border , application and B2BUA server at the core etc

SIP Gateways:

sip entities
SIP platform components

A SIP gateway is an application that interfaces a SIP network to a network utilising another signalling protocol. In terms of the SIP protocol, a gateway is just a special type of user agent, where the user agent acts on behalf of another protocol rather than a human. A gateway terminates the signalling path and can also terminate the media path .

sip gaeways

To PSTN for telephony inter-working
To H.323 for IP Telephony inter-working
Client – originates message
Server – responds to or forwards message

Logical SIP entities are:

  • User Agent Client (UAC): Initiates SIP requests  ….
  • User Agent Server (UAS): Returns SIP responses ….
  • Network Servers ….

Registrar Server

A registrar server accepts SIP REGISTER requests; all other requests receive a 501 Not Implemented response. The contact information from the request is then made available to other SIP servers within the same administrative domain, such as proxies and redirect servers. In a registration request, the To header field contains the name of the resource being registered, and the Contact header fields contain the contact or device URIs.

regsitrar server

Proxy Server

A SIP proxy server receives a SIP request from a user agent or another proxy and acts on behalf of the user agent in forwarding or responding to the request. Just as a router forwards IP packets at the IP layer, a SIP proxy forwards SIP messages at the application layer.

Typically proxy server ( inbound or outbound) have no media capabilities and ignore the SDP . They are mostly bypassed once dialog is established but can add a record-route .
A proxy server usually also has access to a database or a location service to aid it in processing the request (determining the next hop).

proxy server

 1. Stateless Proxy Server
A proxy server can be either stateless or stateful. A stateless proxy server processes each SIP request or response based solely on the message contents. Once the message has been parsed, processed, and forwarded or responded to, no information (such as dialog information) about the message is stored. A stateless proxy never retransmits a message, and does not use any SIP timers

2. Stateful Proxy Server
A stateful proxy server keeps track of requests and responses received in the past, and uses that information in processing future requests and responses. For example, a stateful proxy server starts a timer when a request is forwarded. If no response to the request is received within the timer period, the proxy will retransmit the request, relieving the user agent of this task.

  3 . Forking Proxy Server
A proxy server that receives an INVITE request, then forwards it to a number of locations at the same time, or forks the request. This forking proxy server keeps track of each of the outstanding requests and the response. This is useful if the location service or database lookup returns multiple possible locations for the called party that need to be tried.

Redirect Server

A redirect server is a type of SIP server that responds to, but does not forward, requests. Like a proxy server, a redirect server uses a database or location service to lookup a user. The location information, however, is sent back to the caller in a redirection class response (3xx), which, after the ACK, concludes the transaction. Contact header in response indicates where request should be tried .

redirect server

Application Server

The heart of all call routing setup. It loads and executes scripts for call handling at runtime and maintains transaction states and dialogs for all ongoing calls . Usually the one to rewrite SIP packets adding media relay servers, NAT . Also connects external services like Accounting , CDR , stats to calls .

Developing SIP based applications

Basic SIP methods

SIP defines basic methods such as INVITE, ACK and BYE which can pretty much handle simple call routing with some more advanced processoes too like call forwarding/redirection, call hold with optional Music on hold, call parking, forking, barge etc.

Extending SIP headers

Newer SIP headers defined by more updated SIP RFC’s contina INFO, PRACK, PUBLISH, SUBSCRIBY, NOTIFY, MESSAGE, REFER, UPDATE. But more methods or headers can be added to baseline SIP packets for customization specific to a particular service provider. In case where a unrecognized SIP header is found on a SIP proxy which it either does not suppirt or doesnt understand, it will simply forward it to the specified endpoint.

Call routing Scripts

Interfaces for programming SIP call routing include :
– Call Processing Language—SIP CPL,
– Common Gateway Interface—SIP CGI,
– SIP Servlets,
– Java API for Integrated Networks—JAIN APIs etc .

Some known SIP stacks :

SailFin – SIP servlet container uses GlassFish open source enterprise Application Server platform (GPLv2), obsolete since merger from Sun Java to Oracle.

Mobicents – supports both JSLEE 1.1 and SIP Servlets 1.1 (GPLv2)

Cipango – extension of SIP Servlets to the Jetty HTTP Servlet engine thus compliant with both SIP Servlets 1.1 and HTTP Servlets 2.5 standards.

WeSIP – SIP and HTTP ( J2EE) converged application server build on OpenSER SIP platform

Additionally SIP stacks are supported on almost all popular SIP programming lanaguges which can be imported as lib as used for building call routing scripts to be mounted on SIP servers or endpoints such as :

PJSIP in C

JSSIP Javascript

Sofia in kamailio , Freswitch

Some popular SIP server also have proprietary scripting language such as
Asterisk Gateway Interface (AGI) , application interface for extending the dialplan with your functionality in the language you choose – PHP, Perl, C, Java, Unix Shell and others

Adding Media Management

Media processing is usually provided by media servers in accordance to the SIP signalling. Bridges, call recording, Voicemail, audio conferencing, and interactive voice response (IVR) are commomly used.

Read more about Media Architecture here

RFC 6230 Media Control Channel Framework decribes framework and protocol for application deployment where the application programming logic and media processing are distributed

Any one such service could be a combination of many smaller services within such as Voicemail is a combitional of prompt playback, runtime controls, Dual-Tone Multi-Frequency (DTMF) collection, and media recording. RFC 6231 Interactive Voice Response (IVR) Control Package for the Media Control Channel Framework.

RTP ( Real Time Transport Protocol )

RTP handles realtime multimedia transport between end to end network components . RFC 3550 .

Image result for RTP packet structure

Packet structure of RTP     

RTP Header contain timestamp , name of media source , codec type and sequence number .

Image result for RTP header structure

RTCP

– tbd

DTMF( Dual tone Multi Frequency )

delivery options:

  • Inband –  With Inband digits are passed along just like the rest of your voice as normal audio tones with no special coding or markers using the same codec as your voice does and are generated by your phone.
  • Outband  – Incoming stream delivers DTMF signals out-of-audio using either SIP-INFO or RFC-2833 mechanism, independently of codecs – in this case, the DTMF signals are sent separately from the actual audio stream.

TTS ( Text to Speech )

 Alexa Text-to-Speech (TTS) + Amazon Polly

Ivona – multiple language text to speech converter with ssml scripts such as below

      <speak>
          <p>
              <s><prosody rate="slow">IVONA</prosody> means highest quality speech
              synthesis in various languages.</s>
              <s>It offers both male and female radio quality voices <break/> at a
              sampling rate of 22 kHz <break/> which makes the IVONA voices a
              perfect tool for professional use or individual needs.</s>
          </p>
      </speak>

check ivona status

service ivona-tts-http status
 tail -f /var/log/tts.log

Collecting and Processing PCAPS

  • VoIP monitor – network packet sniffer with commercial frontend for SIP RTP RTCP SKINNY(SCCP) MGCP WebRTC VoIP protocols

it uses a passive network sniffer (like tcpdump or wireshark) to analyse packets in realtime and transforms all SIP calls with associated RTP streams into database CDR record which is sent over the TCP to MySQL server (remote or local). If enabled saving SIP / RTP packets the sniffer stores each VoIP call into separate files in native pcap format (to local storage).

voip monitor
  • sngrep
  • tcpdump
  • custom made pcap capture and uploader

SIP platform Development

A sufficiently capable SIP platform shoudl consist of following features :

  • audio calls ( optionally video )
  • media services such as conferencing, voicemail, and IVR,
  • messaging as IM and presence based on SIMPLE,
  • programmable services through standardized APIs and development of new modules
  • near-end and far-end NAT traversal for signalling and media flows
  • interconnectivity with other IP multimedia systems, VoLTE ( optional interconnection with other types of communications networks as GSM or PSTN/ISDN)
  • registry , location and lookup service
  • serial and parallel forking

Performance factors :

  • High availability using redundant servers in standby
  • Load balancing
  • IPv4 and IPv6 support

Security considerations :

  • digest authentication and credentials fetched from backend
  • Media Encryption
  • TLS and SRTP support
  • Topology hiding to prevent disclosng IP form internal components in via and route headers
  • Firewalls , blacklist, filters , peak detectors to prevent Dos and Ddos attacks

Add NAT and DNS components

To adapt SIP to modern IP networks with inter network traversal ICE, far and near-end NAT traversal solutions are used. Network Address traversal is crtical to traffic flow between private public network and from behind firewalls and policy controlled networks
One can use any of the VOVIDA-based STUN server, mySTUN , TurnServer, reStund , CoTURN , NATH (PJSIP NAT Helper), ReTURN, or ice4j

Near-end NAT traversal

STUN (session traversal utilities for NAT) – UA itself detect presence of a NAT and learn the public IP address and port assigned using Nating. Then it replaces device local private IP address with it in the SIP and SDP headers. Implemented via STUN, TURN, and ICE.
limitations are that STUN doesnt work for symmetric NAT (single connection has a different mapping with a different/randomly generated port) and also with situations when there are multiple addresses of a end point.

TURN (traversal using relay around NAT) or STUN relay – UA learns the public IP address of the TURN server and asks it to relay incoming packets. Limitatiosn since it handled all incoming and outgong traffic , it must scale to meet traffic requirments and should not become the bottle neck junction or single point of failure.

ICE (interactive connectivity establishment) – UA gathers “candidates of communication” with priorities offered by the remote party. After this client pairs local candidates with received peer candidates and performs offer-answer negotiating by trying connectivity of all pairs, therefore maximising success. The types of candidates :
– host candidate who represents clients’ IP addresses,
– server reflexive candidate for the address that has been resolved from STUN
– and a relayed candidate for the address which has been allocated from a TURN relay by the client.

Far-end NAT traversal

UA is not concerned about NAT at all and communicated using its local IP port. The border controller implies a NAT handling components such as an application layer gateway (ALG) or universal plug and play (UPnP) etc which resolves the private and public network address mapping by act as a back to back user agent (B2BUA).
Far end NAT can also be enabled by deploying a public SIP server which performs media relay (RTP Proxy/Media proxy).

Limitations of this approach
– security risks as they are operating in the public network
– enabling reverse traffic from UAS to UAC behind NAT.

A keep-alive mechanism is used to keep NAT translations of communications between SIP endpoint and its serving SIP servers opened , so that this NAT translation can be reused for routing. It contains client-to-server “ping” keep-alive and corresponding server-to-client “pong” messages. The 2 keep-alive mechanisms: a CRLF keep-alive and a STUN keep-alive message exchange.

The 3 types of SIP URIs,

  • address of record (AOR)
  • fully qualified domain name (FQDN)
  • globally routable user agent (UA) URI
    SIP uniform resource identifiers (URIs) are identified based on DNS resolution since the URI after @ symbol contains hostname , port and protocl for the next hop.

Adding record route headers for locating the correct SIP server for a SIP message can be done by :
– DNS service record (DNS SRV)
– naming authority pointer (NAPTR) DNS resource record

Steps for SIP endpoints locating SIP server

  1. From SIP packet get the NAPTR record to get the protocl to be used
  2. Inspect SRV record to fetch port to use
  3. Inspect A/AAA record to get IPv4 or IPv6 addresses
    ref : RFC 3263 – Locating SIP Servers
    Can use BIND9 server for DNS resolution supports NAPTR/SRV, ENUM, DNSSEC, multidomains, and private trees or public trees.

Cross platform and integration to External Telecommunication provider landscape

connection to IMS such as openIMS
support for Voip signalling protocols (SIP, H,323, SCCP, MGCP, IAX) and telephony signalling protocls ( ISDN/SS7, FXS/FXO, Sigtran ) either internally via pluggable modules or externally via gateways

Database Integration

Need backend , cache , databse integration to npt only store routing rules with temporary varaible values but also account details , call records details, access control lists etc. Should therefore extend integartion with text based db, redis, MySQL, PostrgeSQL, OpenLDAP, and OpenRadius.

The obvious starting milestone before making a full scale carrier grade, SIP based VoIP system is to start by building a PBX for intra enterprise communication. There are readily available solutions to make a IP telephony PBX kamailio , freeswitch , asterisk , Elastix , SipXecs

Call Rate and Accounting

Generally data streams proecssing are used for crtical and voluminious service usage like for
– metering/billing
– server activity,
– website clicks,
– geo-location of devices, people, and physical goods

Call Rates are very crticial for billing and charging the calls . Any updates from the customer or carriers or individuals need to propagate automatically and quickly to avoid discrpencies and neagtive margins. CDRs need to be processed sequentially and incrementally on a record-by-record basis or over sliding time windows, and used for a wide variety of analytics including correlations, aggregations, filtering, and sampling.

To acheieve this the follow setup is ideal to use the new input rate sheet values via web UI console or POST API and propagate it quickly to main DB via AWS SQS which is a queing service and AWS lamda which is a serverless trigger based system . This ensures that any new input rates are updates in realtime and maintin fallback values in s3 bucket too

CDR Processing and Billing

CDR store call detail records along with proof of call with tiemstamps , orignation , destination , duaration , rate etc. At the end of month or any other term , the aggregated CDR are cumulatively processed to generate the bill for a user . This heavy data stream needs to be accurately processed and this can be achiveed by using datapipelines like AWS kinesis or Kafka eventstore .

The prime requirnment for the system is to handle enormous amount of call records data in relatime , cater to a number of producers and consumers .

For security the data is obfuscated into blob using base 64 encoding

AWS kinesis – Kinesis Data Streams is sued for for rapid and continuous data intake and aggregation. The type of data used can include IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data

Pros of data streams

This system can handle high volume of data in realtime and produce call uuid specfic reults which can be consumed by consumers waiting for the processed results

Cons of data streams

If not consumed with a pre-specified time duration the processed results expire and are irretrivable . Self implement publisher to store teh processed reults from kisesis stream to data stores like Redis / RDBMS or other storge locations like s3 , dynamo DB. If pieline crashes during operation , data is lost

Data stream should have low latency igesting contnous data from producer and presenting data to consumer .

It should support data sharding ie number of call records grouped and uses a partition Key ( string MD5 hash) to determine which shard the record goes to. 


There are other external components to setup a VOIP solution apart from Core voice Servers and gateways like the ones listed below, I will try to either add a detailed overall architecture diagram here or write about them in an seprate article . Keep watching this space for updates

  • Payment Gateways
  • Billing and Invoice
  • Fraud Prevention
  • Contacts Integration
  • Call Analytics
  • API services
  • Admin Module
  • Number Management ( DIDs ) and porting
  • Call Tracking
  • Single Sign On and User Account Management with Oauth and SAML
  • Dashboards and Reporting
  • Alert Management
  • Continuous Deployment
  • Automated Validation
  • Queue System
  • External cache

Read about VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform which includes :

  • Scalable and Flexible SIP platform building
  • Cluster SIP telephony Server for High Availability
  • Failure Recovery
  • Multi-tier cluster architecture
  • Role Abstraction / Micro-Service based architecture
  • Distributed Event management and Event-Driven architecture
  • Containerization
  • Autoscaling Cloud Servers
  • Open standards and Data Privacy
  • Flexibility for inter-working – NextGen911 , IMS , PSTN
  • security and Operational Efficiencies

References :

AWS kinesis –https://docs.aws.amazon.com/streams/latest/dev/introduction.html

AWazon docs streaming data – https://aws.amazon.com/streaming-data/

VOIP monitor Archietcture – https://www.voipmonitor.org/doc/Architecture

TTS Ivona – http://developer.ivona.com/en/ttsresources/ssml/ssml.html