SIP and SDP Messages Explained

SIP is a widely adopted application layer protocol used in VoIP calls and confernecing applciations and in IMS architeture or pure packet switched networks .

More on SIP , its packet structure , transaction and dialogs , loose and strict record routing , location service , near and far end nating , and commonly used SIP Call flows like Redirection , forking , click to Dial – https://telecom.altanai.com/2013/07/13/sip-session-initiaion-protocol/(opens in a new tab)

SIP Request and Repsosnes

Traditional SIP headers for Call setup are INVITE, ACK and teardown are CANCEL or BYE , however with more adoption newer methods specific to services were added such as :

MESSAGE Methods for Instant Message based services
SUBSCRIBE, NOTIFY standardised by Event notification extension RFC 3856
PUBLISH to push presence information to the network

Outlining the SIP Requests and Responses in tables below,

Request Message

Request Message

Description

REGISTERA Client use this message to register an address with a SIP server
INVITEA User or Service use this message to let another user/service participate in a session. The body of this message would include a description of the session to which the callee is being invited.
ACKThis is used only for INVITE indicating that the client has received a final response to an INVITE request
CANCELThis is used to cancel a pending request
BYEA User Agent Client use this message to terminate the call
OPTIONSThis is used to query a server about its capabilities

Response Message

Code

Category

Description

1xxProvisionalThe request has been received and processing is continuing
2xxSuccessAn ACK, to indicate that the action was successfully received, understood, and accepted.
3xxRedirectionFurther action is required to process this request
4xxClient ErrorThe request contains bad syntax and cannot be fulfilled at this server
5xxServer ErrorThe server failed to fulfill an apparently valid request
6xxGlobal FailureThe request cannot be fulfilled at any server

SIP headers

Display names

From originators sipuri

CSeq or Command Sequence contains an integer and a method name. The CSeq number is incremented for each new request within a dialog and is a traditional sequence number.

Contact – SIP URI that represents a direct route to the originator usually composed of a username at a fully qualified domain name (FQDN) , also IP addresses are permitted. The Contact header field tells other elements where to send future requests.

Max-Forwards -to limit the number of hops a request can make on the way to its destination. It consists of an integer that is decremented by one at each hop.

Content

Content-Type – description of the message body.

Content-Type: application/h.323 
Content-Type: message/sip   
Content-Type: application/sdp

Content-Type: multipart/signed;
        protocol="application/pkcs7-signature";
        micalg=sha1; boundary=boundary42

Content-Type: application/pkcs7-signature; name=smime.p7s

Content Encoding

Content-Encoding: text/plain

Content Language

Content-Language: en

Content-Length – an octet (byte) count of the message body.

Content-Disposition

describes how the message body or, for multipart messages, a message body part is to be interpreted by the UAC or UAS. It extends the MIME Content-Type

Disposition Types :

  • “session” – body part describes a session, for either calls or early (pre-call) media
  • “render” – body part should be displayed or otherwise rendered to the user.
  • “icon” – body part contains an image suitable as an iconic representation of the caller or callee
  • “alert” – body part contains information, such as an audio clip

Accept

Accept – acceptable formats like application/sdp or currency/dollars

Header field where proxy ACK BYE CAN INV OPT REG

Accept R - o - o m* o
Accept 2xx - - - o m* o
Accept 415 - c - c c c

An empty Accept header field means that no formats are acceptable.

Accept-Encoding

Accept-Encoding R - o - o o o
Accept-Encoding 2xx - - - o m* o
Accept-Encoding 415 - c - c c c

Accept-Language : languages for reason phrases, session descriptions, or status responses carried as message bodies in the response.

Accept-Language: da, en-gb;q=0.8, en;q=0.7
Accept-Language R - o - o o o
Accept-Language 2xx - - - o m* o
Accept-Language 415 - c - c c c

Tag globally unique and cryptographically random with at least 32 bits of randomness. identify a dialog, which is the combination of the Call-ID along with two tags ( from To and FROM headers )

Call-Id uniquely identify a session

contact – sip url alternative for direct routing

Encryption

Expires – when msg content is no longer valid

Mandatory SIP headers

INVITE sip:altanai@domain.comSIP/2.0
Via: SIP/2.0/UDP host.domain.com:5060
From: Bob
To: Altanai
Call-ID: 163784@host.domain.com
CSeq: 1 INVITE

Informational headers

Call-Info additional information for example, through a web page. The “card” parameter provides a business card, for example, in vCard [36] or LDIF [37] formats. Additional tokens can be registered using IANA

Call-Info: http://wwww.example.com/alice/photo.jpg ;purpose=icon,http://www.example.com/alice/ ;purpose=info

Contact
Contact: “Mr. Watson” ;q=0.7; expires=3600,
“Mr. Watson” watson@bell-telephone.com ;q=0.1 m: ;expires=60

Priority indicates the urgency of the request as perceived by the client.
can have the values “non-urgent”, “normal”, “urgent”, and “emergency”, but additional values can be defined elsewhere

Subject: A tornado is heading our way!
Priority: emergency

or

Subject: Weekend plans
Priority: non-urgent

Subject summary or indicates the nature of call

Subject: Need more boxes
s: Tech Support

Supported enumerates all the extensions supported. can contain list of option tags, described

Supported: 100rel
k: 100rel

Unsupported features not supported

Unsupported: foo

User-Agent information about the UAC originating the request.

User-Agent: Softphone Beta1.5

Organization conveys the name of the organization to which the SIP element issuing the request or response belongs.

Organization: AltanaiTelecom Co.

Warning additional information about the status of a response.
List of warn-code

  • 300 Incompatible network protocol:
  • 301 Incompatible network address formats:
  • 302 Incompatible transport protocol:
  • 303 Incompatible bandwidth units:
  • 304 Media type not available:
  • 305 Incompatible media format:
  • 306 Attribute not understood:
  • 307 Session description parameter not understood:
  • 330 Multicast not available:
  • 331 Unicast not available:
  • 370 Insufficient bandwidth:
  • 399 Miscellaneous warning:
  • 1xx and 2xx have been taken by HTTP/1.1.

Warning: 307 isi.edu “Session parameter ‘foo’ not understood”
Warning: 301 isi.edu “Incompatible network address type ‘E.164′”

Authetication and Authorization related headers

Authentication-Info mutual authentication with HTTP Digest. A UAS MAY include this header field in a 2xx response to a request that was successfully authenticated using digest based on the Authorization header field.

Authentication-Info: nextnonce=”47364c23432d2e131a5fb210812c”

Authorization authentication credentials of a UA

Authorization: Digest username=”Alice”, realm=”atlanta.com”, nonce=”84a4cc6f3082121f32b42a2187831a9e”, response=”7587245234b3434cc3412213e5f113a5432″

Proxy-Authenticate contains an authentication challenge.

Proxy-Authenticate: Digest realm=”atlanta.com”,domain=”sip:ss1.carrier.com”, qop=”auth”,
nonce=”f84f1cec41e6cbe5aea9c8e88d359″,opaque=””, stale=FALSE, algorithm=MD5

Timers

exponential back-off on re-transmissions 

Session Expire Header Feild

limit the time period over which a stateful proxy must maintain state information.
options

  • User agents must tear down the call after the expiration of the timer , or
  • aller can send re-INVITEs to refresh the timer, enabling a “keep alive” mechanism for SIP.

SDP (Session Description Protocol)

SIP can bear many kinds of MIME attachments , one such is SDP. It is a standard for protocol definition for exchange of media , metadata and other transport realted attributes between the particpants before establishing a VoIP call.

SDP session description is entirely textual using the ISO 10646 character set in UTF-8 encoding and described by application/SDP media type.

It should be noted that SDP itself does not incorporate a transport protocol and can be used with difference protocls like Session announcement proctols (SAP) , SIP , HTTP , Electronic MAIl MIME extension, RTSP etc.

In case of SIP SDP is encapsulated inside of SIP packet and use offer/answer model to convey information about media stream in multimedia session.

SDP body contains 2 parts : session based section starting with v= line and media bsesction starting with m= line
Media and Transport Information can contain type of media like video, audio , transport protocol like RTP/UDP/IP, H.320 and format of the media such as H.261 video, MPEG video, etc.

Session Description in SDP

protocol version ( v= ) protocol version mostly version 0

originator and session identifier ( o= )

o= < username > <session-id> <session-version> <net-type> <addr-type> <unicast address>
o=- 6476888576284874344 2 IN IP4 127.0.0.1

session name ( s=) and session information ( i= ) session name is textual and can contain empty space or even s=- but must not be empty. Session infomration is optional textual information about the session

URI of description ( u = )

Email Address and Phone Number (“e=” and “p=”)

Both are optional free text string SHOULD be in the ISO-10646 character set with UTF-8 encoding

Nothe that if given the Phone numbers SHOULD follow international public telecommunication number specification ( ITU-T Recommendation E.164) and be preceded by a “+”. Spaces and hyphens may be used to split up a phone field to aid readability if desired.

e=Jane Doe j.doe@example.com
p=+1 617 555-6011

Connection Data ( c= ) connection information — not required if included in all media in which media specific connecion data override overall session connection data

c= <net-type> <addr-type> <connection-address>

c=IN IP4 172.31.90.251

If the session is multicast, the connection address will be an IP multicast group address . TTL shoudl be present in IPv4 multicast address .
If connection is unicast the address contains the unicast IP address of the expected data source or data relay or data sink .

Bandwidth ( b= ) interpreted as kilobits per second by default

b= <bwtype> : <bandwidth>

Encryption Keys ( k= ) Only is SDP is exchanged in secure and trusted channel, keys va be excahnged on this SDP field . Although this process is not recomended,

k= clear:< encryption key >
k= base64:< encoded encryption key >
k= uri:< URI to obtain key >
k= prompt

Attributes ( a= )

extends the SDP with values like flags

a=inactive , a=sendonly , a=sendrecv , a=recvonly

Mapping the Encoder Spec from

a=rtpmap: < payload type > < encoding name >/ < clock rate > [/ ]

a=rtpmap:96 opus/48000/2
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:9 G722/8000
a=rtpmap:101 telephone-event/48000
a=rtpmap:97 telephone-event/8000

Conferenec Type like “broadcast”, “meeting”, “moderated”, “test”,

a=type: < conf type>

Orientation portrait or landscape for whiteboard session

a=orient:  <orientation>

ICE candidates

a=ice-pwd:86701d63e2d96ec42268679a
a=ice-ufrag:948a1316
a=rtcp-12133xr:rcvr-rtt=all:10000 stat-summary=loss,dup,jitt,TTL voip-metrics

Frame per second for video

a=framerate:

Quality between 0 – 10 ( 10 best still image , 5 default , 0 wrst )

a= quality: < quality >

Format specific Parameters

a=fmtp: <format> <parameters>
a=rtpmap:114 AMR-WB/16000/1
a=fmtp:114 mode-change-capability=2;max-red=220

a=rtpmap:113 AMR-WB/16000/1
a=fmtp:113 octet-align=1;mode-change-capability=2;max-red=220

a=rtpmap:102 AMR/8000/1
a=fmtp:102 mode-change-capability=2;max-red=220

a=rtpmap:115 AMR/8000/1
a=fmtp:115 octet-align=1;mode-change-capability=2;max-red=220

a=rtpmap:105 telephone-event/16000
a=fmtp:105 0-15

a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15

Time Description in SDP

Timing (t =)
time the session is active)

t=<start-time> <stop-time>

If the <stop-time> is set to zero, then the session is not bounded, though it will not become active until after the < start -time>.
If the <start-time> is also zero, the session is regarded as permanent.

t=0 0

Repeat Times ( r= )

zero or more repeat times for scheduling a session

r= <repeat interval> <active duration> <offsets from start-time>

time zone adjustments ( z = )

z= <adjustment time> <offset> <adjustment time> <offset> ….

useful for scejduling session during transation to daylightv saving to standard time and vice versa

Media Description in SDP

For RTP, the default is that only the even-numbered ports are used for data with the corresponding one-higher odd ports used for the RTCP belonging to the RTP session

m= <media> <port> <proto> <fmt> …

m=audio 20098 RTP/AVP 0 101

will stream RTP on 20098 and RTCP on 20099

For multiple transport ports pairs of RTP , RTCP stream are specified

m= <media> <port>/ <number of ports> <proto> <fmt> …

m=audio 20098/2 RTP/AVP 0 101
will stream one pair on RTP 20098 , RTCP 20099 and RTP 20100 , RTCP 20101

If non-contiguous ports are required, they must be signalled using a separate attribute like example, “a=rtcp:”

Additioan SDP features : In addition to normal unicast sessions , SDP can also convery multicast group address for media on IP multicast session. Private (encryption of SDP ) or public session are not treated differently by SDP and they are entorely a function of implementing mechanism like SIP or SAP. Optiopnal SDP params include URI , Categorisation “a=cat:” , Internationalisation etc

Example 1 : Typical Audio call SIP INVITE showing SIP headers in blue and SDP in green below

INVITEnbspsip:01150259917040@x.x.x.x SIP/2.0
 Via: SIP/2.0/UDP x.x.x.x:5060branch=z9hG4bK400fc6e6
 From: "123456789" ltsip:123456789@x.x.x.xgttag=as42e2ecf6
 To: ltsip:01150259917040@x.x.x.x.4gt
 Contact: ltsip:123456789@x.x.x.x4gt
 Call-ID: 2485823e63b290b47c042f20764d990a@x.x.x.x.x
 CSeq: 102 INVITE
 User-Agent:nbspMatrixSwitch
 Date: Thu, 22 Dec 2005 18:38:28 GMT
 Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER
 Content-Type: application/sdp
 Content-Length: 268

 v=0
 o=root 14040 14040 IN IP4 x.x.x.x
 s=session
 c=IN IP4 x.x.x.x
 t=0 0
 m=audio 26784 RTP/AVP 0 8 18 101
 a=rtpmap:0 PCMU/8000
 a=rtpmap:8 PCMA/8000
 a=rtpmap:18 G729/8000
 a=rtpmap:101 telephone-event/8000
 a=fmtp:101 0-16
 a=fmtp:18nbspannexb=no - - - -
 c=* (connection information - optional if included at session-level)
 b=* (bandwidth information)
 a=* (zero or more media attribute lines)

The above SDP shows 4 supported media codecs on audio stream which are 0 PCMU , 8 PCMA , 18 G729 and finally 101 used for telephone events . It also shows RTP/AVP as RTP profile and does not contain any m=cideo line which shows that this endpoint does not want a video call , only an audio one.

Example 2 : Video Vall SIP invite from Linphone

SIP URI Params

Internet Assigned Number Authority (IANA) Universal Resource Identifier (URI) Parameter Registry defines URI params that can be sued along with SIP scheme

sip:user:password@host:port;uri-parameters?headers

comp param

signalling compression of SIP messages

sip:alice@atlanta.com;comp=sigcomp
Via: SIP/2.0/UDP server1.foo.com:5060;branch=z9hG4bK87a7;comp=sigcomp

The aobve exmaple indicates that the request has to be compressed using SigComp

transport-param

SIP can use any network transport protocol. Parameter names are defined for UDP (RFC 768), TCP (RFC 761), and SCTP (RFC 2960).
For a SIPS URI, the transport parameter MUST indicate a reliable transport.

“transport=”  ( “udp” / “tcp” / “sctp” / “tls” / “ws” / other-transport )

sip:alice:secretword@atlanta.com;transport=tcp

maddr paarm

The server address ( detsiantion address , port , transport ) to be contacted for this user, overriding any address derived from the host field.

Although discouraged , maddr URI param has been used as a simple form of loose source routing. It allows a URI to specify a proxy that must be traversed en-route to the destination.

user-param

“user=”  ( “phone”  “ip”  “dialstring”  other-user )

sip:1-212-555-1212:1234@gateway.com;user=phone

sip:123;phone-context=atlanta.example.com@example.com;user=dialstring

method-param

“method=” Method

sip:atlanta.com;method=REGISTER?to=alice%40atlanta.com

annc-parameters (announcement)

ANNC-URL
sip‑ind  annc‑ind  “@”  hostport  annc‑parameters  uri‑parameters

sip:annc@ms.example.net; \
; play=file://fs.example.net//clips/my-intro.dvi; \
; content-type=video/mpeg%3bencode%d3314M-25/625-50

sip-ind - “sip:” / “sips:”

annc-ind - “annc”

annc-parameters
“;”  play‑param
[ “;”  delay‑param ]
[ “;”  duration‑param ]
[ “;”  repeat‑param ]
[ “;”  locale‑param ]
[ “;”  variable‑params ]
[ “;”  extension‑params ]

play-param – “play=”  prompt‑url

prompt-url – “/provisioned/”  announcement‑id

announcement-id = 1*( ALPHA / DIGIT )

content-param “content‑type=”  MIME‑type

VoiceXML Media Services

dialog-param
“voicexml=”  vxml-url ;  vxml-url follows the URI syntax

method-param – “method=”  ( “get” / “post” )

postbody-param- “postbody=”  token

ccxml-param – “ccxml=”  json‑value

aai-param- “aai=”  json‑value

json-value – false / null / true / object / array / number / string

sip:dialog@mediaserver.example.com; \
voicexml=http://appserver.example.com/promptcollect.vxml; \
maxage=3600;maxstale=0

dialog-params (prompt and collect)

DIALOG-URL = sip-ind  dialog-ind  “@”  hostport  dialog‑parameters

ttl-param (time-to-live)

ttl parameter determines the time-to-live value of the UDP multicast packet and MUST only be used if maddr is a multicast address and the transport protocol is UDP.

sip:alice@atlanta.com;maddr=239.255.255.1;ttl=15

cause param

“cause” EQUAL Status-Code
; 404 Unknown/Not available
; 486 User busy
; 408 No reply
; 302 Unconditional
; 487 Deflection during alerting
; 480 Deflection immediate response
; 503 Mobile subscriber not reachable
; 380 Service number translation   RFC 8119 – Section 2

sip:voicemail@example.com;target=bob%40example.com;cause=486

SIP Responses

1xx—Provisional Responses

response that tells to its recipient that the associated request was received but result of the processing is not known yet which could be if the processing hasnt finished immediately. The sender must stop retransmitting the request upon reception of a provisional response.

100 Trying
180 Ringing : Triigers a local ringing at callers device
181 Call is Being Forwarded : Used before tranefering to another UA such as during forking or tranfer to voice mail Server

182 Queued

183 Session in Progress : conveys information . Headers field or SDP body has mor details about the call. Used in announcements and IVR + DTMF too by being followed by “Early media”.

199 Early Dialog Terminated

2xx—Successful Responses

final responses express result of the processing of the associated request and they terminate the transactions.

200 OK
202 Accepted
204 No Notification

3xx—Redirection Responses

Redirection response gives information about the user’s new location or an alternative service that the caller should try for the call. Used for cases when the server cant satisfy the call and wants the caller to try elsewhere . After this the caller is suppose to resend the request to the new location.

300 Multiple Choices
301 Moved Permanently
302 Moved Temporarily
305 Use Proxy
380 Alternative Service

4xx—Client Failure Responses

negative final responses indicating that the request couldn’t be processed  due to callers fault , for reasons such as t contains bad syntax or cannot be fulfilled at that server.

400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Conditional Request Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type
416 Unsupported URI Scheme
417 Unknown Resource-Priority
420 Bad Extension
421 Extension Required
422 Session Interval Too Small
423 Interval Too Brief
424 Bad Location Information
428 Use Identity Header
429 Provide Referrer Identity
430 Flow Failed
433 Anonymity Disallowed
436 Bad Identity-Info
437 Unsupported Certificate
438 Invalid Identity Header
439 First Hop Lacks Outbound Support
470 Consent Needed
480 Temporarily Unavailable
481 Call/Transaction Does Not Exist
482 Loop Detected.
483 Too Many Hops
484 Address Incomplete
485 Ambiguous
486 Busy Here
487 Request Terminated
488 Not Acceptable Here
489 Bad Event
491 Request Pending
493 Undecipherable
494 Security Agreement Required

5xx—Server Failure Responses

negative responses but indicating that fault is at server’s side for cases such as server cant or doesnt want to respond the the request.

500 Server Internal Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Server Time-out
505 Version Not Supported
513 Message Too Large
580 Precondition Failure

6xx—Global Failure Responses

request cannot be fulfilled at any server with definitive information

600 Busy Everywhere
603 Decline
604 Does Not Exist Anywhere
606 Not Acceptable

Mandatory SIP headers in SIP respone

SIP/2.0 200 OK
Via: SIP/2.0/UDP host.domain.com:5060
From: Bob<sip:bob@domain.com>
To: Altanai<sip:altanai@domain.com>
Call-ID: 163784@host.domain.com
CSeq: 1 INVITE

Via, From, To, Call-ID , and  CSeq   are copied exactly from request

You can read more about SIP based Architecture here :SIP based architecture

Re-INVITE and Target-Refresh Request Handling

An INVITE request sent within an existing dialog is known as a re-INVITE. A re-Invite has an offer-answer exchange and can be used to do the following

  • change the session and/or dialog params
  • change the port to which media should be sent.
  • change the connection address or media type.
  • Hold/Release and SUSPEND/RESUME rtp streams (connection address is zero).
  • FAX (T.38 and Bypass).

Re-INVITE with SDP useCases

1.UAS rejects all changes in params in re-INVITE

Situtaion where UAC establishes audio only call
SDP1: m=audio 30000 RTP/AVP 0

but later wants to upgrade to video as well SDP:

m=audio 30000 RTP/AVP 0
m=video 30002 RTP/AVP 31

UAS configured to reject video streams, can reject this with a 4XX error and get ACK .
No changes to session are made

2. UAS receives re-INVITE for param but wants to accept few and reject others, it sends back SDP with acceptable changes with 200 OK

For instance UAC moves to high bandwidth access point and wants to update IP of media stream . It also wanst to add video stream

initial SDP

 m=audio 30000 RTP/AVP 0
c=IN IP4 192.0.2.1

new SDP in reINVITE

 m=audio 30000 RTP/AVP 0
c=IN IP4 192.0.2.2
m=video 30002 RTP/AVP 31
c=IN IP4 192.0.2.2

UAS returns a 200 (OK) response to accept IP but sets the port of the video stream to zero in its SDP to show rejected of video stream.

m=audio 31000 RTP/AVP 0
c=IN IP4 192.0.2.5
m=video 0 RTP/AVP 31

another example is when UAC wwants to add another audio codec and also add video stream to session

orignal SDP

m=audio 30000 RTP/AVP 0
c=IN IP4 192.0.2.1

re-invite SDP

 m=audio 30000 RTP/AVP 0 3
c=IN IP4 192.0.2.1
m=video 30002 RTP/AVP 31
c=IN IP4 192.0.2.1

again the UAS will optionally accept the some param canges like audio code but set video to null IP address

m=audio 31000 RTP/AVP 0 3
c=IN IP4 192.0.2.5
m=video 31002 RTP/AVP 31
c=IN IP4 0.0.0.0 

3. UAS receives re-INVITE but waits for user intervention

UAS receives re-INVITE to add video , but instead of rejecting , it prompts user to permit.

So UAS provides a null IPaddress instead of setting the stream to ‘inactive’ because inactive streams still need to exchange RTP Control Protocol (RTCP) traffic

 m=audio 31000 RTP/AVP 0
c=IN IP4 192.0.2.5
m=video 31002 RTP/AVP 31
c=IN IP4 0.0.0.0

Later if user rejects the addition of the video stream. Consequently, the UAS sends an UPDATE request (6) setting the port of the video stream to zero in its offer.

 m=audio 31000 RTP/AVP 0
c=IN IP4 192.0.2.5
m=video 0 RTP/AVP 31
c=IN IP4 0.0.0.0

References:

SIP VoIP system architecture basics


A VOIP/CPaaS solution is designed to accommodate the signalling and media both along with integration leads to various external endpoints such as various SIP phones ( desktop, softphones, webRTC ), telecom carriers, different VoIP networks providers, enterprise applications ( Skype, Microsoft Lync ), Trunks etc.

A sufficiently capable SIP platform should have

  1. Audio calls ( optionally video ) service using SIP gateways
  2. Media services (such as recording , conferencing, voicemail, and IVR )
  3. Messaging and presence ( could be using SIP SIMPLE, SMS , messahing service from third parties)
  4. Developing SIP based applications : Programmable services through standardized APIs and development of new modules
  5. NAT and DNS near-end and far-end NAT traversal for signalling and media flows
  6. Telemetry for Sessions , Registry, Location and lookup service
  7. CDR Processing and Billing : Backend for CDR and accounts ( can use Redis, Kafka , MySQL, PostgreSQL, Oracle, Radius, LDAP, Diameter)
  8. Serial and parallel forking, load balancing , proxying
  9. Cross platform and integration to External Telecommunication provider landscape
    • Interconnectivity with other IP multimedia systems, VoLTE ( optional interconnection with other types of communications networks as GSM or PSTN/ISDN).
    • support for VoIP signalling protocols (SIP, H,323, SCCP, MGCP, IAX) and telephony signalling protocols ( ISDN/SS7, FXS/FXO, Sigtran ) either internally via pluggable modules or externally via gateways .
Performnace factors :Security considerations :
High availability using redundant servers in standby
Load balancing
IPv4 and IPv6 network layer support
TCP , UDP , SCTP transport layer protocol support
DNS lookups and hop by hop connectvity
authentication, authorization, and accounting (AAA)
Digest authentication and credentials fetched from backend
Media Encryption
TLS and SRTP support
Topology hidding to prevent disclosing IP form internal components in via and route headers
Firewalls , blacklist, filters , peak detectors to prevent Dos and Ddos attacks

The article only outlines SIP system architecture  from 3 viewpoints :

  1. Infrastructure standpoint
  2. Vore voice engineering perspective
  3. External components required to run and system

Infrastructure Requirements

  • Data Centers with BCP ( Business Continuity Planning ) and DR ( Disaster Recovery )
  • Servers and Clusters for faster and parallel calculating
  • Virtualization
    VMs to make a distributed computing environment with HA ( high availability ) and DRS ( Distributed Resource Scheduling )
  • Storage
    SAN with built-in redundancy for the resiliency of data.
    WORM compliant NAS for storing voice archives over a retention period.
  • Racks, power supplies, battery backups, cages etc.
  • Networking
    DMZs ( Demilitarized Zones)  which are interfacing areas between internal servers in the green zone and outside network
    VLANs for segregation between tenants.
    Connectivity through the public Internet as well as through VPN or dedicated optical fibre network for security.
  • Firewall configuration
  • Load Balancer ( Layer 7 )
  • Reverse Proxies for the security of internal IPs and port
  • Security controls In compliance with ISO/IEC 27000 family – Information security management systems
  • PKI Infrastructure to manage digital certificates
  • Key management with HSM ( hardware security module )
  • truster CA ( Certificate Authority ) to issue publicly signed certificate for TLS ( Https, wss etc)
  • OWASP ( Open Web Application Security Project )  rules compliance

Integral Components of a VOIP SIP based architecture

  • Call Controller
  • Media Manager
  • Recording
  • Softclients
  • logs and PCAP archives
  • CDR generators
  • Session Borer Controllers ( SBCs)

A SIP server can be moulded to take up any role based on the libraries and programs that run on it such as gateway server, call manager, load balancer etc. This in turn defines its placement in overall VoIP communication architecture. For example
– stateless proxy servers are placed on the border,
– application and B2BUA server at the core

sip entities
SIP platform components

SIP Gateways

A SIP gateway is an application that interfaces a SIP network to a network utilising another signalling protocol. In terms of the SIP protocol, a gateway is just a special type of user agent, where the user agent acts on behalf of another protocol rather than a human. A gateway terminates the signalling path and can also terminate the media path .

sip gaeways
To PSTN for telephony inter-working
To H.323 for IP Telephony inter-working
Client – originates message
Server – responds to or forwards message

Logical SIP entities are:

  • User Agent Client (UAC): Initiates SIP requests  ….
  • User Agent Server (UAS): Returns SIP responses ….
  • Network Servers ….

Registrar Server

A registrar server accepts SIP REGISTER requests; all other requests receive a 501 Not Implemented response. The contact information from the request is then made available to other SIP servers within the same administrative domain, such as proxies and redirect servers. In a registration request, the To header field contains the name of the resource being registered, and the Contact header fields contain the contact or device URIs.

regsitrar server

Proxy Server

A SIP proxy server receives a SIP request from a user agent or another proxy and acts on behalf of the user agent in forwarding or responding to the request. Just as a router forwards IP packets at the IP layer, a SIP proxy forwards SIP messages at the application layer.

Typically proxy server ( inbound or outbound) have no media capabilities and ignore the SDP . They are mostly bypassed once dialog is established but can add a record-route .
A proxy server usually also has access to a database or a location service to aid it in processing the request (determining the next hop).

proxy server

 1. Stateless Proxy Server
A proxy server can be either stateless or stateful. A stateless proxy server processes each SIP request or response based solely on the message contents. Once the message has been parsed, processed, and forwarded or responded to, no information (such as dialog information) about the message is stored. A stateless proxy never retransmits a message, and does not use any SIP timers

2. Stateful Proxy Server
A stateful proxy server keeps track of requests and responses received in the past, and uses that information in processing future requests and responses. For example, a stateful proxy server starts a timer when a request is forwarded. If no response to the request is received within the timer period, the proxy will retransmit the request, relieving the user agent of this task.

  3 . Forking Proxy Server
A proxy server that receives an INVITE request, then forwards it to a number of locations at the same time, or forks the request. This forking proxy server keeps track of each of the outstanding requests and the response. This is useful if the location service or database lookup returns multiple possible locations for the called party that need to be tried.

Redirect Server

A redirect server is a type of SIP server that responds to, but does not forward, requests. Like a proxy server, a redirect server uses a database or location service to lookup a user. The location information, however, is sent back to the caller in a redirection class response (3xx), which, after the ACK, concludes the transaction. Contact header in response indicates where request should be tried .

redirect server

Application Server

The heart of all call routing setup. It loads and executes scripts for call handling at runtime and maintains transaction states and dialogs for all ongoing calls . Usually the one to rewrite SIP packets adding media relay servers, NAT . Also connects external services like Accounting , CDR , stats to calls .

Adding Media Management

Media processing is usually provided by media servers in accordance to the SIP signalling. Bridges, call recording, Voicemail, audio conferencing, and interactive voice response (IVR) are commomly used. Read more about Media Architecture here

RFC 6230 Media Control Channel Framework decribes framework and protocol for application deployment where the application programming logic and media processing are distributed.

Any one such service could be a combination of many smaller services within such as Voicemail is a combitional of prompt playback, runtime controls, Dual-Tone Multi-Frequency (DTMF) collection, and media recording. RFC 6231 Interactive Voice Response (IVR) Control Package for the Media Control Channel Framework.

DTMF( Dual tone Multi Frequency )

delivery options:

  • Inband –  With Inband digits are passed along just like the rest of your voice as normal audio tones with no special coding or markers using the same codec as your voice does and are generated by your phone.
  • Outband  – Incoming stream delivers DTMF signals out-of-audio using either SIP-INFO or RFC-2833 mechanism, independently of codecs – in this case, the DTMF signals are sent separately from the actual audio stream.

TTS ( Text to Speech )

 Alexa Text-to-Speech (TTS) + Amazon Polly

Ivona – multiple language text to speech converter with ssml scripts such as below

      <speak>
          <p>
              <s><prosody rate="slow">IVONA</prosody> means highest quality speech
              synthesis in various languages.</s>
              <s>It offers both male and female radio quality voices <break/> at a
              sampling rate of 22 kHz <break/> which makes the IVONA voices a
              perfect tool for professional use or individual needs.</s>
          </p>
      </speak>

check ivona status

service ivona-tts-http status
 tail -f /var/log/tts.log

Developing SIP based applications

Basic SIP methods

SIP defines basic methods such as INVITE, ACK and BYE which can pretty much handle simple call routing with some more advanced processoes too like call forwarding/redirection, call hold with optional Music on hold, call parking, forking, barge etc.

Extending SIP headers

Newer SIP headers defined by more updated SIP RFC’s contina INFO, PRACK, PUBLISH, SUBSCRIBY, NOTIFY, MESSAGE, REFER, UPDATE. But more methods or headers can be added to baseline SIP packets for customization specific to a particular service provider. In case where a unrecognized SIP header is found on a SIP proxy which it either does not suppirt or doesnt understand, it will simply forward it to the specified endpoint.

Call routing Scripts

Interfaces for programming SIP call routing include :
– Call Processing Language—SIP CPL,
– Common Gateway Interface—SIP CGI,
– SIP Servlets,
– Java API for Integrated Networks—JAIN APIs etc .

Some known SIP stacks :

SailFin – SIP servlet container uses GlassFish open source enterprise Application Server platform (GPLv2), obsolete since merger from Sun Java to Oracle.

Mobicents – supports both JSLEE 1.1 and SIP Servlets 1.1 (GPLv2)

Cipango – extension of SIP Servlets to the Jetty HTTP Servlet engine thus compliant with both SIP Servlets 1.1 and HTTP Servlets 2.5 standards.

WeSIP – SIP and HTTP ( J2EE) converged application server build on OpenSER SIP platform

Additionally SIP stacks are supported on almost all popular SIP programming lanaguges which can be imported as lib and used for building call routing scripts to be mounted on SIP servers or endpoints such as :

PJSIP in C

JSSIP Javascript

Sofia in kamailio , Freswitch

Some popular SIP server also have proprietary scripting language such as –
Asterisk Gateway Interface (AGI) , application interface for extending the dialplan with your functionality in the language you choose – PHP, Perl, C, Java, Unix Shell and others

SIP platform Development

  • audio calls ( optionally video )
  • media services such as conferencing, voicemail, and IVR,
  • messaging as IM and presence based on SIMPLE,
  • programmable services through standardized APIs and development of new modules
  • near-end and far-end NAT traversal for signalling and media flows
  • interconnectivity with other IP multimedia systems, VoLTE ( optional interconnection with other types of communications networks as GSM or PSTN/ISDN)
  • Registry, location and lookup service
  • Serial and parallel forking

A sufficiently capable SIP platform shoudl consist of following features :

Performance factors :

  • High availability using redundant servers in standby
  • Load balancing
  • IPv4 and IPv6 support

Security considerations :

  • digest authentication and credentials fetched from backend
  • Media Encryption
  • TLS and SRTP support
  • Topology hiding to prevent disclosng IP form internal components in via and route headers
  • Firewalls , blacklist, filters , peak detectors to prevent Dos and Ddos attacks .

Collecting and Processing PCAPS

  • VoIP monitor – network packet sniffer with commercial frontend for SIP RTP RTCP SKINNY(SCCP) MGCP WebRTC VoIP protocols

it uses a passive network sniffer (like tcpdump or wireshark) to analyse packets in realtime and transforms all SIP calls with associated RTP streams into database CDR record which is sent over the TCP to MySQL server (remote or local). If enabled saving SIP / RTP packets the sniffer stores each VoIP call into separate files in native pcap format (to local storage).

voip monitor
  • sngrep
  • tcpdump
  • custom made pcap capture and uploader

NAT and DNS

To adapt SIP to modern IP networks with inter network traversal ICE, far and near-end NAT traversal solutions are used. Network Address traversal is crtical to traffic flow between private public network and from behind firewalls and policy controlled networks
One can use any of the VOVIDA-based STUN server, mySTUN , TurnServer, reStund , CoTURN , NATH (PJSIP NAT Helper), ReTURN, or ice4j

Near-end NAT traversal

STUN (session traversal utilities for NAT) – UA itself detect presence of a NAT and learn the public IP address and port assigned using Nating. Then it replaces device local private IP address with it in the SIP and SDP headers. Implemented via STUN, TURN, and ICE.
limitations are that STUN doesnt work for symmetric NAT (single connection has a different mapping with a different/randomly generated port) and also with situations when there are multiple addresses of a end point.

TURN (traversal using relay around NAT) or STUN relay – UA learns the public IP address of the TURN server and asks it to relay incoming packets. Limitatiosn since it handled all incoming and outgong traffic, it must scale to meet traffic requirments and should not become the bottle neck junction or single point of failure.

ICE (interactive connectivity establishment) – UA gathers “candidates of communication” with priorities offered by the remote party. After this client pairs local candidates with received peer candidates and performs offer-answer negotiating by trying connectivity of all pairs, therefore maximising success. The types of candidates :
– host candidate who represents clients’ IP addresses,
– server reflexive candidate for the address that has been resolved from STUN
– and a relayed candidate for the address which has been allocated from a TURN relay by the client.

Far-end NAT traversal

UA is not concerned about NAT at all and communicated using its local IP port. The border controller implies a NAT handling components such as an application layer gateway (ALG) or universal plug and play (UPnP) etc which resolves the private and public network address mapping by act as a back to back user agent (B2BUA).
Far end NAT can also be enabled by deploying a public SIP server which performs media relay (RTP Proxy/Media proxy).

Limitations of this approach
(-) security risks as they are operating in the public network
(-) enabling reverse traffic from UAS to UAC behind NAT.

A keep-alive mechanism is used to keep NAT translations of communications between SIP endpoint and its serving SIP servers opened , so that this NAT translation can be reused for routing. It contains client-to-server “ping” keep-alive and corresponding server-to-client “pong” messages. The 2 keep-alive mechanisms: a CRLF keep-alive and a STUN keep-alive message exchange.

The 3 types of SIP URIs,

  • address of record (AOR)
  • fully qualified domain name (FQDN)
  • globally routable user agent (UA) URI
    SIP uniform resource identifiers (URIs) are identified based on DNS resolution since the URI after @ symbol contains hostname , port and protocl for the next hop.

Adding record route headers for locating the correct SIP server for a SIP message can be done by :
– DNS service record (DNS SRV)
– naming authority pointer (NAPTR) DNS resource record

Steps for SIP endpoints locating SIP server

  1. From SIP packet get the NAPTR record to get the protocl to be used
  2. Inspect SRV record to fetch port to use
  3. Inspect A/AAA record to get IPv4 or IPv6 addresses
    ref : RFC 3263 – Locating SIP Servers
    Can use BIND9 server for DNS resolution supports NAPTR/SRV, ENUM, DNSSEC, multidomains, and private trees or public trees.

CDR Processing and Billing

CDR store call detail records along with proof of call with tiemstamps, orignation, destination, duaration, rate etc. At the end of month or any other term, the aggregated CDR are cumulatively processed to generate the bill for a user. This heavy data stream needs to be accurately processed and this can be achived by using data-pipelines like AWS kinesis or Kafka eventstore.

The prime requirnment for the system is to handle enormous amount of call records data in relatime , cater to a number of producers and consumers.

For security the data is obfuscated into blob using base 64 encoding.

For good consistency only a single shard should be rsponsible to process one user account’s bill.

Data Streams for billing service

AWS Kinesis – Kinesis Data Streams is sued for for rapid and continuous data intake and aggregation. The type of data used can include IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. It supports data sharding (ie number of call records grouped) and uses a partition Key ( string MD5 hash) to determine which shard the record goes to. 

(+) This system can handle high volume of data in realtime and produce call uuid specfic reults which can be consumed by consumers waiting for the processed results

(-) If not consumed with a pre-specified time duration the processed results expire and are irretrivable . Self implement publisher to store teh processed reults from kisesis stream to data stores like Redis / RDBMS or other storge locations like s3 , dynamo DB. If pieline crashes during operation , data is lost

(-) Data stream should have low latency igesting contnous data from producer and presenting data to consumer.

Call Rate and Accounting

Generally data streams proecssing are used for crtical and voluminious service usage like for
– metering/billing
– server activity,
– website clicks,
– geo-location of devices, people, and physical goods

Call Rates are very crticial for billing and charging the calls . Any updates from the customer or carriers or individuals need to propagate automatically and quickly to avoid discrpencies and neagtive margins. CDRs need to be processed sequentially and incrementally on a record-by-record basis or over sliding time windows, and used for a wide variety of analytics including correlations, aggregations, filtering, and sampling.

To acheieve this the follow setup is ideal to use the new input rate sheet values via web UI console or POST API and propagate it quickly to main DB via AWS SQS which is a queing service and AWS lamda which is a serverless trigger based system . This ensures that any new input rates are updates in realtime and maintin fallback values in s3 bucket too

Call Rate and Accounting using task pipes , lambda serverless and qiueing service. Uses s3 buckets , AWS lambda, AWS SQS and AWS RDS.
Call Rate and Accounting using task pipes , lambda serverless and qiueing service

Cross platform and integration to External Telecommunication provider landscape

It is an advantage to plan for ahead for connection with IMS such as openIMS, support for Voip signalling protocols (SIP, H,323, SCCP, MGCP, IAX) and telephony signalling protocls ( ISDN/SS7, FXS/FXO, Sigtran ) either internally via pluggable modules or externally via gateways or for SIP trunking integration via OTT providers/ cloud telephony.

Adhere to Standard

The obvious starting milestone before making a full-scale carrier-grade, SIP-based VoIP system is to start by building a PBX for intra-enterprise communication. There are readily available solutions to make an IP telephony PBX Kamailio, FreeSWITCH, asterisk, Elastix, SipXecs. It is important to use the standard protocol and widely acceptable media formats and codecs to ensure interoperability and reduce compute and delay involved in protocol or media transcoding.

Database Integration

Need backend , cache , databse integration to npt only store routing rules with temporary varaible values but also aNeed backend, cache, database integration to not only store routing rules with temporary variable values but also account details, call records details, access control lists etc. Should therefore extend integration with text-based DB, Redis, MySQL, PostgreSQL, OpenLDAP, and OpenRadius.

Consistency of Call Records and duplicated charging records at various endpoints

In current Voip scenarios a call may be passing thorugh various telco providers , ISP and cloud telephony serviIn current VoIP scenarios, a call may be passing through various telco providers, ISP and cloud telephony service providers where each system maintains its own call records and billing. This in my opinion is duplication and can be avoided by sharing a consistent data store possible in the blockchain. This is an experimental idea that I have further explored in this article


There are other external components to setup a VOIP solution apart from Core voice Servers and gateways like the ones listed below, I will try to either add a detailed overall architecture diagram here or write about them in an seprate article. Keep watching this space for updates

  • Payment Gateways
  • Billing and Invoice
  • Fraud Prevention
  • Contacts Integration
  • Call Analytics
  • API services
  • Admin Module
  • Number Management ( DIDs ) and porting
  • Call Tracking
  • Single Sign On and User Account Management with Oauth and SAML
  • Dashboards and Reporting
  • Alert Management
  • Continuous Deployment
  • Automated Validation
  • Queue System
  • External cache

References :

SIP solutioning and architectures is a subsequent article after SIP introduction, which can be found here.

Read about VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform which includes :

  • Scalable and Flexible SIP platform building
  • Cluster SIP telephony Server for High Availability
  • Failure Recovery
  • Multi-tier cluster architecture
  • Role Abstraction / Micro-Service based architecture
  • Distributed Event management and Event-Driven architecture
  • Containerization
  • Autoscaling Cloud Servers
  • Open standards and Data Privacy
  • Flexibility for inter-working – NextGen911 , IMS , PSTN
  • security and Operational Efficiencies