WebRTC Audio/Video Codecs

Codecs signifies the media stream’s compession and decompression. For peers to have suceesfull excchange of media, they need a common set of codecs to agree upon for the session . The list codecs are sent  between each other as part of offeer and answer or SDP in SIP.

As WebRTC provides containerless bare mediastreamgtrackobjects. Codecs for these tracks is not mandated by webRTC . Yet the codecs are specified by two seprate RFCs

RFC 7878 WebRTC Audio Codec and Processing Requirements specifies least the Opus codec as well as G.711’s PCMA and PCMU formats.

RFC 7742 WebRTC Video Processing and Codec Requirnments specifies support for  VP8 and H.264’s Constrained Baseline profile for video .

In WebRTC video is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to dicuss Audio/Video Codecs processing requirnments only.

Quick links : If you are new to WebRTC read : Introduction to WebRTC is at https://telecom.altanai.com/2013/08/02/what-is-webrtc/ and Layers of WebRTC at https://telecom.altanai.com/2013/07/31/webrtc/

WebRTC Media Stack

Media Stream Trcaks in WebRTC

The MediaStreamTrack interface typically represents a stream of data of audio or video and a MediaStream may contain zero or more MediaStreamTrack objects.

The objects RTCRtpSender and RTCRtpReceiver can be used by the application to get more fine grained control over the transmission and reception of MediaStreamTracks.

Media Flow in VoIP system
Media Flow in WebRTC Call

Video

Video Capture insync with hardware’s capabilities

WebRTC compatible browsers are required to support Whie-balance , light level , autofocus from video source

Video Capture Resolution

Minimum WebRTC video attributes unless specified in SDP ( Session Description protocl ) is minimum 20 FPS and resolution 320 x 240 pixels. 

Also supports mid stream resilution changes such as in screen source fromdesktop sharinig .

SDP attributes for resolution, frame rate, and bitrate

SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

Sender must send limiting the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.

Dynamic FPS control based on actual hardware encoding :

video source capture to adjust frame rate accroding to low bandwidth , poor light conditions and harware supported rate rather than force a higher FPS .

Stream Orientation

support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing with peer

Codecs

WebRTC is free and opensource and its woring bodies promote royality free codecs too. The working groups RTCWEB and IETF make the sure of the fact that non-royality beraning codec are mandatory while other codecs can be optional in WebRTC non browsers .

WebRTC Browsers MUST implement the VP8 video codec as described in
RFC6386] and H.264 Constrained Baseline as described in [H264].

RFC 7442 WebRTC Video Codec and Processing Requirements

most of the codesc below follow Lossy DCT(discrete cosine transform (DCT) based algorithm for encoding.

Sample SDP from offer in Chrome browser v80 for Linux incliudes these profile :

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123
a=rtcp-mux
a=rtcp-rsize

a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 profile-id=0
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100

a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:122 rtx/90000
a=fmtp:122 apt=102

a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:121 rtx/90000
a=fmtp:121 apt=127

a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125

a=rtpmap:108 H264/90000
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 red/90000
a=rtpmap:120 rtx/90000
a=fmtp:120 apt=124

VP8

Developed by on2 and then acquired and opensource by google . Now free of royality fees.

Supported conatiner – 3GP, Ogg, WebM

No limit on frame rate or data rate and provides maximum resolution of 16384×16384 pixels.

libvpx encoder library.

VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes.
encode and decode pixels with an implied 1:1 (square) aspect ratio.

supported simulcast

VP9

Video Processor 9 (VP9) is the successor to the older VP8 and comparable to HEVC as they both have simillar bit rates .

Open and free of royalties and any other licensing requirements
Its supported Containers are – MP4, Ogg, WebM

H264/AVC constrained

AVC’s Constrained Baseline (CBP ) profile compliant with WebRTC

Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3 . Contrained baseline is a submet of the main profile , suited to low dealy , low complexity. suited to lower processing device like mobile videos

Multiview Video Coding – can have multiple views of the same scene ,such as stereoscopic video.

Other profiles , which are not supporedt are Baseline (BP) , Extended (XP), Main (MP) , High (HiP) , Progressive High (ProHiP) , High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive

Its supported containers are 3GP, MP4, WebM

Parameter settings:

  • packetization-mode
  • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
  • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
  • Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )

It is a propertiary , patented codec , mianted by MPEG / ITU

AV1 (AOMedia Video 1)

open format designed by the Alliance for Open Media
royality free
especially designed for internet video HTML element and WebRTC
higher data compression rates than VP9 and H.265/HEVC

offers 3 profiles in increasing support for color depths and chroma subsampling.
main,
high, and
professional

supports HDR
supports Varible Frame Rate

Supported container are ISOBMFF, MPEG-TS, MP4, WebM

Stats for Video based media stream track

timestamp 04/05/2020, 14:25:59
ssrc 3929649593
isRemote false
mediaType video
kind video
trackId RTCMediaStreamTrack_sender_2
transportId RTCTransport_0_1
codecId RTCCodec_1_Outbound_96
[codec] VP8 (payloadType: 96)
firCount 0
pliCount 9
nackCount 476
qpSum 912936
[qpSum/framesEncoded] 32.86666666666667
mediaSourceId RTCVideoSource_2
packetsSent 333664
[packetsSent/s] 29.021823604499957
retransmittedPacketsSent 0
bytesSent 342640589
[bytesSent/s] 3685.7715977714947
headerBytesSent 8157584
retransmittedBytesSent 0
framesEncoded 52837
[framesEncoded/s] 30.022576142586164
keyFramesEncoded 31
totalEncodeTime 438.752
[totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
totalEncodedBytesTarget 335009905
[totalEncodedBytesTarget/s] 3602.7091371103397
totalPacketSendDelay 20872.8
[totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
qualityLimitationReason bandwidth
qualityLimitationResolutionChanges 20
encoderImplementation libvpx
Graph for Video Track in chrome://webrtc-internals

Other RTP parameters

RTX(regtranmission ) – packet loss recovery technique for real-time applications with relaxed delay bounds.

Non WebRTC supported Video codecs

Need active realtime media transcoding

H.263

Already used for video conferencing on PSTN (Public Switched Telephone Networks), RTSP, and SIP (IP-based videoconferencing) systems.
suited for low bandwidth networks
Although it is not comaptible with WebRTC but many media gateways incldue realtime transcoding existed between H263 based SIP systems and vp8 based webrtc ones to enable video communication between them

H.265 / HEVC

proprietary format and is covered by a number of patents. Licensing is managed by MPEG LA .

Container – Mp4

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints

With the rise of Internet of Things many Endpoints especially IP cameras connected to Raspberry Pi like SOC( system on chiops )n wanted to stream directly to the browser within theor own provate network or even on public network using TURN / STUN.

The figure below shows how such a call flow is possible between an IP cemera ( such as Baby Cam ) and its parent monitoring it over a WebRTC suppported mobile phone browser . The process includes streaming teh content from IOT device on RTSP stream and using realtime trans-coding between H264 and VP8

Interoprabiloity between non WebRT Compatible and WebRTC compatible endpoints

Audio

Audio Level

audio level for speech transmission to avoid users having to manually adjust the playback and to facilitate mixing in conferencing applications.

normalization considering frequencies above 300 Hz, regardless of the sampling rate used.

adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor.

GAIN calculation

  • If the endpoint has control over the entire audio-capture path like a regular phone
    the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
  • If the endpoint does not have control over the entire audio capture like software endpoint
    then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
  • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.

Acoustic Echo Cancellation (AEC)

Endpoints shoudl allow echo control mechsnisms

Codecs

WebRTC endpoints are should implement audio codecs: OPUS and PCMA / PCMU, along with Comforrt Noise and DTMF events.

Trace for audio codecs supported in chrome (Version 80.0.3987.149 (Official Build) (64-bit) on ubuntu)

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000

Opus

stabdardised by IETF

container- Ogg, WebM, MPEG-TS, MP4

supportes multiple comptression algorithms

For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, it is w3C recommenda that Opus be offered before PCMA/PCMU.

AAC (Advanvced Audio Encoding)

part of the MPEG-4 (H.264) standard
supported congainers – MP4, ADTS, 3GP

Lossy compression but has number pf profiles suiting each usecase like high quality surround sound to low-fidelity audio for speech-only use.

G.711 (PCMA and PCMU)

ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding.
vital to interface with the standard teelcom network and carriers

Fixed 64Kbpd bit rate

supports 3GP container formats

G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU

G.722

ncoded using Adaptive Differential Pulse Code Modulation (ADPCM) which is suited for voice compression
conatiners used 3GP, AMR-WB

Comfort noise (CN)

artificial background noise which is used to fill gaps in a transmission instead of using pure silence

avoids jarring or RTP Timeout

for streams encoded with G.711 or any other supported codec that does not provide its own CN.
Use of Discontinuous Transmission (DTX) / CN by senders is optional

Internet Low Bitrate Codec (iLBC)

opensource narrow band codec
designed specifically for streaming voice audio

Internet Speech Audio Codec (iSAC)

designed for voice transmissions which are encapsulated within an RTP stream.

DTMF and ‘audio/telephone-event’ media type

endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.

DTMF events list
| 0 | DTMF digit “0”
| 1 | DTMF digit “1”
| 2 | DTMF digit “2”
| 3 | DTMF digit “3”
| 4 | DTMF digit “4”
| 5 | DTMF digit “5”
| 6 | DTMF digit “6”
| 7 | DTMF digit “7”
| 8 | DTMF digit “8”
| 9 | DTMF digit “9”
| 10 | DTMF digit “*”
| 11 | DTMF digit “#”
| 12 | DTMF digit “A”
| 13 | DTMF digit “B”
| 14 | DTMF digit “C”
| 15 | DTMF digit “D”

Stats for Audio Media track

timestamp 04/05/2020, 14:25:59
ssrc 3005719707
isRemote fals
mediaType audio
kind audio
trackId RTCMediaStreamTrack_sender_1
transportId RTCTransport_0_1
codecId RTCCodec_0_Outbound_111
[codec] opus (payloadType: 111)
mediaSourceId RTCAudioSource_1
packetsSent 88277
[packetsSent/s] 50.03762690431027
retransmittedPacketsSent 0
bytesSent 1977974
[bytesSent/s] 150.11288071293083
headerBytesSent 2118648
retransmittedBytesSent 0
Graphs in chrome://webrtc-internals for Audio

DataChannel

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
b=AS:30
a=ice-ufrag:blj+
a=ice-pwd:Ytdofc24WZYWRAnyNSNhuF4F
a=ice-options:trickle
a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE
a=setup:active
a=mid:2
a=sctp-port:5000
a=max-message-size:262144

Stats for Datachannel

Statistics RTCDataChannel_1
timestamp 04/05/2020, 14:25:59
label sctp
protocol
datachannelid 1
state open
messagesSent 1
[messagesSent/s] 0
bytesSent 228
[bytesSent/s] 0
messagesReceived 1
[messagesReceived/s] 0
bytesReceived 228
[bytesReceived/s] 0

Refrenecs :

SIP Security

Major standards bodies including 3GPP, ITU-T, and ETSI have all adopted SIP as the core signalling protocol for services such as LTE , VoIP, conferencing, Video on Demand (VoD), IPTV (Internet Television), presence, and Instant Messaging (IM) etc. With the continous evolution of SIP as the defacto VoIP protocol , we need to underatdn the risk mitigartion practices around it .

I have written about VoIP and security in these blogs before

For Security around web browser based calling via webrtc i have written

  • Webrtc Security –https://telecom.altanai.com/2015/04/24/webrtc-security/ , which describes browser threat modal , access to local resource , Same Orogin Policy (SOP) and Cross Resource Sharing ( CORS) as well as Location sharing , ICE , TUEN and threats to privacy with screen sharing , microgone camera long term access and probable mid call attacks .
  • Genric secrutity of web Application build around hosting platform of webrtc . https://telecom.altanai.com/2014/10/03/security-for-webrtc-applications/ . Includs concepts like Identity management , browser security – cross site security amd clickjacking , Authetication of devices and applications , Media Encryption and regex checking.

Also Written about VoIP security at protocl level with SRTP /DTLS using TLS https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/ and specifically using avaialble pre added modules on kamailio SIP server https://telecom.altanai.com/2018/02/17/kamailio-security/ . It describes Sanity checks , ACL lists with permissions , hiding topology details , countering Flood using pike and Fail2Ban as well as Traffic monitoring and detection .

In this article we will cover types of attacks on SIP systems

Types of attacks on SIP based systems

Registration Hijacking

malicious registrations on registrar by a third party who modifies From header field of a SIP request.

exmaple implementation :
attacker de-registers all existing contacts for a URI
attacker can also register their own device as the appropriate contact address, thereby directing all requests for the affected user to him

solution – Autheticaion of user

Impersonating a Server

attacker impersonates the remote server
user’s request can now be intercepted by some other party
user’s request may be forwarded to insecure locations

solution –
confidentiality, integrity, and authentication of proxy servers
Proxy/redirect sever, and registrars SHOULD possess a site certificate issued by CA which could be validated by UA

Temparing Message bodies

If users are relying on SIP message bodies to communicate either of

  • session encryption keys for a media session
  • MIME bodies
  • SDP
  • encapsulated telephony signals
    Then the atackers on proxy server can modify the session key or can act as a man-in-the-middle and do eaves droppng

exmaple implementation :
attacker can point RTP media streams to a wiretapping device
can changes Subject header field to appear to users as spam

solution – end to end ecryption over TLS + Digest Authorization

mid-session threats like tearing down session

Request forging
attacker learns the params of the session like To , From tags etc then he can alter ongoing session parameters and even bring it down

example implementation :
attacker inserts a BYE in a ongoing session thereby tearing it down
can insert re INVITE and redierct the stream to wiretaping device

solution – authetication on every request
signing and encrypting of MIME bodies, and transference of credentials with S/MIME

Denial of Service and Amplification

DOS attacks – rendering a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.
dDOS – multiple network hosts to flood a target host with a large amount of network traffic.

Can be created by sending falsified sip requests to other parties such that numerous transactions originating in the backwards direction comes to the target server created congestion.

exmaple implementation :
attackers creates a falsified source IP address and a corresponding Via header field that identify a targeted host as the originator of the request. Then send this to large number of SIP network element . This geneerates DOS aimed at target.

attackers uses falsified Route header field values in a request that identify the target host and then send such messages to forking proxies that will amplify messaging sent to the target.

Flooding with register attacks can deplete available memory and disk resources of a registrar by registering huge numbers of bindings.
Flooding a stateful proxy server causes it to consume computational expense associated with processing a SIP transaction

Solution –
detect flooding and pike in traffic and use ipban to block
challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required), forgoing the normal response retransmission algorithm, and thus behaving statelessly towards unauthenticated requests.

Security mchanisms

Full encryption vs hop by hop encrption

SIP mssages cannot be encrypted end-to-end in their entirety since
message fields such as the Request-URI, Route, and Via need to be visible to proxies in most network architectures
so that SIP requests are routed correctly.
proxy servers need to also update the message with via headers

Thus SIP uses low level security along with hop by hop encrption and auth headers to verify the identity of proxy servers

Transport and Network Layer Security

IPsec – used where set of hosts or administrative domains have an existing trust relationship with one another.

TLS – used where hop-by-hop security is required between hosts with no pre-existing trust association.

SIPS URI Scheme

Used as an address-of-record for a particular user, signifies that each hop over which the request is forwarded, must be secured with TLS

HTTP Authentication

Reuse of the HTTP Digest authentication via 401 and 407 response codes that implement challenge for autehtication
provides replay protection and one-way authentication.

S/MIME

allows SIP UAs to encrypt MIME bodies within SIP, securing these bodies end-to-end without affecting message headers.
provides end-to-end confidentiality and integrity for message bodies

nonce-count

provides replay protection

SIP over TLS

SIP messages can be secured using TLS. There is also TLS for Datagrams called DTLS.

Security of SIP signalling is different from security of protocols used in concert with SIP like RTP , RTCP. and that will be covered in later topics of this article.

TLS operation consists of two phases: handshake phase and bulk data encryption phase

Handshake phase

Prepare algorithm to be used during TLS session

Server Authentication

server sends its certificate to the client, which then verifies the certificate using a certificate authority’s (CA’s) public key.

Client Authentication

Server sends an additional CertificateRequest message to request the client’s certificate. The client responds with

  1. Certificate message containing the client certificate with the client public key and
  2. CertificateVerify message containing a digest signature of the handshake messages signed by clients private key

Server authenticates client by client’s public key , since only client holding correct private key can sign the message.

prepare the shared secret for bulk data encryption

client generate a pre_master_secret, and encrypt it using the server’s public key obtained from the server’s certificate. The server decrypts the pre_master_secret using its own private key.
Both the server and client then compute a master_secret they share based on the same pre_master_secret. The master_secret is further used to generate the shared symmetric keys for bulk data encryption and message authentication

Public key cryptographic operations such as RSA are much more expensive than shared key cryptography. This is why TLS uses public key cryptography to establish the shared secret key in the handshake phase, and then uses symmetric key cryptography with the negotiated shared secret as the data encryption key.

Stateless proxy servers do not maintain state information about the SIP session and therefore tend to be more scalable. However, many standard application functionalities, such as authentication, authorization, accounting, and call forking require the proxy server to operate in a stateful
mode by keeping different levels of session state information.

Steps :

  1. The SIP proxy server enforces proxy authentication with
    407 Proxy Authentication Required challenge.
  2. UAC provides credentials that verify its claimed identity (e.g., based on MD5 [34] digest algorithm) and retransmits in authorization header

Security of RTP

confidentiality protection of the RTP session and integrity protection of the RTP/RTCP packets requires source authentication of all the packets to ensure no man-in-the-middle (MITM) attack is taking place.

end to end media encryption – SRTP ( Secure RTP )

encodes the voice into encrypted IP packages and transport those via the internet from the transmitter  to receive

References

  • The Impact of TLS on SIP Server Performance – Charles Shen† Erich Nahum‡ Henning Schulzrinne† Charles Wright , Department of Computer Science, Columbia University,IBM T.J. Watson Research Center

HTTP/2 – offer/answer signaling for WebRTC call

HTTP ( Hyper Text Transfer Protocol ) is the top application layer protocol atop the Tarnsport layer ( TCP ) and the Network layer ( IP )

HTTP/1.1

release in 1997. Since HTTP/1 allowed only 1 req at a time , HTTP/1.1

Allows one one outstanding connection on a TCP session but allowed request pieplinig to achieve concurency.

HTTP/2

In 2015, HTTP/2 was released which aimed at reducing latency while delivering heavy graphics, videos and other media cpmponents on web page especially on mobile sites .
optimizes server push and service workers

FRAMES

A key differenet between Http/1.1 and HTTP/2 is the fact that former transmites requests and reponses in plaintext whereas the later encapsulates them into binary format , proving more features and scope for optimzation.

Thus at protocol level , it is all about frames of bytes which are part of stream.

“enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients.”

Hypertext Transfer Protocol Version 2 (HTTP/2) draft-ietf-httpbis-http2-latest
https://http2.github.io/http2-spec/

It is important to know that Browsers only implement HTTP/2 under HTTPS, thus TLS connection is must for whichw e need certs ad keys signed by CA ( either self signed using openssl , signed by public CA like godaddy , verisign or letsencrypt)

Compatibility Layer between HTTP1.1 and HTTP2.0 in node

Nodejs >9 provides http2 as native module. Exmaple of using http2 with compatibility layer

const http2 = require('http2');
const options = {
 key: 'ss/key', // path to key
 cert: 'ssl/cert' // path to cert
};

const server = http2.createSecureServer(options, (req, res) => {
    req.addListener('end', function () {
        file.serve(req, res);
    }).resume();
});
server.listen(8084);

in replacement for existing server http/https server

const https = require('https');
app = https.createServer(options, function (request, response) {
    request.addListener('end', function () {
        file.serve(request, response);
    }).resume();
});

app.listen(8084);

Socket.io/ Websocket over HTTP2

The WebSocket Protocol uses the HTTP/1.1 Upgrade mechanism to transition a TCP connection from HTTP into a WebSocket connection

Due to its multiplexing nature, HTTP/2 does not allow connection-wide header fields or status codes, such as the Upgrade and Connection request-header fields or the 101 (Switching Protocols) response code. These are all required for opening handshake.

Ideally the code shouldve looekd like this with backward compatiability layer , but continue reading update ..

var app = http2
    .createSecureServer(options, (req, res) => {
        req.addListener('end', function () {
            file.serve(req, res);
        }).resume();
    })
    .listen(properties.http2Port);

var io = require('socket.io').listen(app);
io.origins('*:*');
io.on('connection', onConnection); // evenet handler onconnection

Error during WebSocket handshake: Unexpected response code: 403

update May 2020 : I tried using the http2 server with websocket like mentioned above ,h owever many many hours of working around WSS over HTTP2 secure server , I consistencly kept faccing the ECONNRESET issues after couple of seconds , which would crash the server

client 403
server ECONNRESET

Therefore leaving the web server to server htmll conetnt I reverted the siganlling back to HTTPs/1.1 given the reasons for sticking with WSS is low latency and existing work that was already put in.

Example Repo : https://github.com/altanai/webrtcdevelopment/tree/htt2.0

Reading Further of exploring HTTP CONNECT methods for setting WS handshake . Will update this section in future if it works .

Streams

A “stream” is an independent, bidirectional sequence of frames exchanged between the client and server within an HTTP/2 connection.
A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.

Core http2 module provides new core API (Http2Stream), accessed via a “stream” listener:

const http2 = require('http2');
const options = {
 key: 'ss/key', // path to key
 cert: 'ssl/cert' // path to cert
};

const server = http2.createSecureServer(options, (stream, headers) => {
    stream.respond({ ':status': 200 });
    stream.end('some text!');
});
server.listen(3000);

Other features

  • stream multiplexing
  • stream Prioritization
  • header compression
  • Flow Control
  • support for trailer

Persistent , one connection per origin.

With the new binary framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required,

Server Push

bundling multiple assets and resources into a single HTTP/2  and lets the srever proactively push resources to client’s cache .

Server issues PUSH_PROMISE , client validates whether it needs the resource of not. If the client matches it then they will load like regular GET call

The PUSH_PROMISE frame includes a header block that contains a complete set of request header fields that the server attributes to the request.

After sending the PUSH_PROMISE frame, the server can begin delivering the pushed response as a response on a server-initiated stream that uses the promised stream identifier.

Client receives a PUSH_PROMISE frame and can either chooses to accept the pushed response or if it does not wish to receive the pushed response from the server it can can send a RST_STREAM frame, using either the CANCEL or REFUSED_STREAM code and referencing the pushed stream’s identifier.

Push Stream Support

-tbd

respondWithFile() and respondWithFD() APIs can send raw file data that bypasses the Streams API.

Related technologies

MIME

Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets.

Email messages + MIME : transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

MIME in HTTP in WWW : servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated. Browsers typically contain GIF and JPEG image viewers.

MIME header fields

MIME version

MIME-Version: 1.0

Content Type

Content-Type: text/plain

multipart/mixed , text/html, image/jpeg, audio/mp3, video/mp4, and application/msword

content disposition

Content-Disposition: attachment; filename=genome.jpeg;
  modification-date="Wed, 12 Feb 1997 16:29:51 -0500";

Content-Transfer-Encoding

References :

Certificates , compliances and Security in VoIP

This article describes various Certificates and compliances, Bill and Acts on data privacy, Security and prevention of Robocalls as adopted by countries around the world pertaining to Interconnected VoIP providers, telecommunications services, wireless telephone companies etc

Compliance certificates by Industry types

HIPAA (Health Insurance Portability and Accountability Act)

Deals with privacy and security of personal medical records and electronic health care transaction

Applicability  : If voip company handles medical information

Includes : 

  • Not allowed Voice mail transcription
  • Should have End-to-End Encryption
  • Restrict  using unsecured WiFi networks to prevent Snooping
  • User security , strong password rules  and mandatory monthly change
  • Secure Firmware on VoIP phones
  • Maintaining Call and Access Logs

SOX( Sarbanes Oxley Act of 2002)

Also known as SOX, SarbOX or Public Company Accounting Reform and Investor Protection Act

Applicability : if managing the communications operations of a regulated, publicly traded company 

Includes : 

  • Retain records which include financial and other sensitive data
  • ways employees are provided or denied access to records or data based on their roles and responsibilities
  • do information audit by a trusted third party. 
  • Retention and deletion of files such as audio files like voicemails, text messages, video clips, declared paper records, storage, and logs of communications activities
  • Physical and digital security controls around cloud-based VoIP applications and the networks

Privacy Related Compliance certificates

COPPA (Children’s Online Privacy Protection Act ) of 1998 

prohibits deceptive marketing to children under the age of 13, or collecting personal information without disclosure to their parents. 

any information is to be passed on to a third party, must be easy for the child’s guardian to review and/or protect

2011 amendment  requires that the data collected was erased after a period of time,

2014 FTC issued guidelines that apps and app stores require “verifiable parental consent.”

CPNI (Customer Proprietary Network Information) 2007

CPNI (Customer Proprietary Network Information) in united states is the information that communication providers  acquire about their subscribers. This Individually identifiable information that is created by a customer’s relationship with a provider, such as data about the frequency, duration, and timing of calls, the information on a customer’s bill, and call identifying information. This processing information is governed strictly by FCC and certification should be renewed on an annual basis

Provider can pass along that information to marketers to sell other services, as long as the customer is notified

In 2007, the FCC explicitly extended the application of the Commission’s CPNI rules of the Telecommunications Act of 1996 to providers of interconnected VoIP service.

CALEA

Communications Assistance for Law Enforcement Act (CALEA) conduct electronic surveillance by imposing specific obligations on “telecommunications carriers” for assisting law enforcement, including delivering call interception and call identification functionality to the government with a minimum of interference to customer service and privacy.

Read more about CALEA and its roles in VoIP here Regulatory and Legal Considerations with WebRTC development

GDPR (General Data Protection Regulation)  in European Union 2018

Supersedes the 1995 Data Protection Directive

Establishes requirements of organizations that process data, defines the rights of individuals to manage their data, and outlines penalties for those who violate these rights.

No personal data may be processed unless this processing is done under one of six lawful bases specified by the regulation (consent, contract, public task, vital interest, legitimate interest or legal requirement). When the processing is based on consent the data subject has the right to revoke it at any time.

Controllers must notify Supervising Authorities (SA)s of a personal data breach within 72 hours of learning of the breach.

California Consumer Privacy Act (CCPA) 2019

consumer rights relating to the access to, deletion of, and sharing of personal information that is collected by businesses. 

Allows consumers to know whether their personal data is sold or disclosed , to whom .

Allows opt-out right for sales of personal information

Right to deletion – to request a business to delete any personal information about a consumer collected from that consumer

Personal Data Protection Bill (PDP) – India 2018

This bill introduces various private and sensitive protection frameworks  like restriction on retention of personal data, Right to correction and erasure (such as right to be forgotten) , Prohibition and transparency of processing of personal data. It also classifies data fiduciaries  including certain social media intermediaries. 

The Bill amends the Information Technology Act, 2000 to delete the provisions related to compensation payable by companies for failure to protect personal data.

Other data privacy acts similar to GDPR 

  • South Korea’s Personal Information Protection Act  2011
  • Brazil’s Lei Geral de Proteçao de Dados (LGPD)  2020
  • Privacy Amendment (Notifiable Data Breaches) to Australia’s Privacy Act 2018
  • Japan’s Act on Protection of Personal Information 2017
  • Thailand Personal Data Protection Act (PDPA) 2020

Features offered by VOIP companies for Data privacy 

  • Access Control & Logging
  • Auto Data Redaction / Account Deletion policy 
  • SIEM (Security information and event management) alerts 
  • Information security , Encrypted Storage For Recordings & Transcripts
  • Disclosing all third party services that are involved in data processing too
  • Role Based Access Control and 2 Factor Authentication
  • Data Security Audits and appointing  data protection officer to oversee GDPR compliance

Against Robocalls and SPIT ( SPAM over Internet Telephony)

 2009 Truth in Caller ID Act 

Telephone Consumer Protection Act of 1991

Implementation of Do not call registry against use of robocalls, automatic dialers, and other methods of communication

Do-Not-Call Implementation Act of 2003

if a business has an established relationship with a customer, it can continue to call them for up to 18 months. If a consumer calls the company, say, to ask for information about the product or service, the company has three months to get back to him.

if the customer asks to not receive calls, the company must stop calling, or be subject to fines.

Exemptions – Calls from a not-for-profit B organisation , informational messages as flight cancellations , Calls from sales and debt collectors etc

Personal Data Privacy and Security Act 2009

Implemented to curb  identity theft and computer hacking. Sensitive personal identifiable information includes : victim’s name, social security number, home address, fingerprint/biometrics data, date of birth, and bank account numbers.

Any company that is breached must notify the affected individuals by mail, telephone, or email, and the message must include information on the company and how to get in touch with credit reporting agencies

If the breach involves government or national security , company must also contact the Secret Service within fourteen days 

TRACED Act (Telephone Robocall Abuse Criminal Enforcement and Deterrence) 2019

Canadian Radio-television and Telecommunications Commission (CRTC) 2018 -32

A solution mechanism has already been standardised and active in adoption called STIR / SHAKEN ( Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs) described in another article here.

Emergency services 

FCC E911 E911 / VoIP E911 rules

Unlike traditional telephone connections, which are tied to a physical location, VOIP’s packet switched technology allows a particular number to be anywhere making it more difficult for it to reach localised services like emergency numbers of Public Safety Answering Points (PSAPs) . Thus FCC regulations as well as the New and Emerging Technologies 911 Improvement Act of 2008 (NET 911 Act), interconnected VoIP providers are required to provide 911 and E911 service. 

Ref : 

CLI/NCLI, Robocalls and STIR/SHAKEN

To understand the need for implementing an identification verification technique in Internet protocol based network to network communication system , we need to evaluate the existing problem plaguing the VoIP setup .

What is Call ID spoofing ? 

Vulnerability of existing interconnection phone system which is used by robo-callers to mask their identity or to make it appear the call is from a legitimate source, usually originates from voice-over-IP (VOIP) systems.

In this context understand the Caller Line identification CLI/ NCLI techniques used by VoIP and OTT( over the top) providers today.

CLI (Caller Line Identification)

If call goes out on a CLI route ( White Route ) the received party will likely see your callerID information

  • Lawful – Termination is legal on the remote end ie abiding country’s telco infrastructure and stable
  • Expensive – usually with direct or via leased line (TDM) interconnections with the tier-1 carriers.

Non-CLI (Non-Caller Line Identification)

The Caller ID is not visible at the call
If call goes out on a Non-CLI route (Grey Route) goes out on a non-CLI routes they will see either a blocked call or some generic number.

  • Unlawful – questionable legality or maybe violating some providers AUP(Acceptable Use Policy ) on the remote end.
  • Cheaper – low quality , usually via VoIP-GSM gateways

Example include robocalls , tele-marketting / spam etc which are unwilling to share their Caller Id for call receiver, to not be blocked or cancelled.

To overcome the problem of non-verifiable spam , robocalls a suite of protocols and procedures are proposed that can combat caller ID spoofing on VOIP and connected public telephone networks.

STIR/SHAKEN

Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs

Used by robocallers to mask their identity or to make it appear the call is from a legitimate source
usually orignates from voice-over-IP (VOIP) systems

STIR

Standards developed by the Internet Engineering Task Force (IETF) 

For telecommunication service providers implement  certificate management system to create and manage the public and private keys, digital certificates used to sign and verify Caller ID details. 

Adds information to the SIP headers that allow the endpoints along the system to positively identify the origin of the data , such as JSON web tokens encrypted with the provider’s private key, encoded using Base64,

There are three levels of verification, or “attestation”

  • A : Full Attestation
    indicates that the provider recognizes the entire phone number as being registered with the originating subscriber.
  • B : Partial Attestation
    call originated with a known customer but the entire number cannot be verified,
  • C : Gateway Attestation
    call can only be verified as coming from a known gateway

How can the Public Key Infrastructure be used ? 

In an interconnection network , each telephone service provider will obtain its digital certificate from a certificate authority (CA)  that is trusted by other telephone service providers. Calling party signs the SIP Header  caller ID as legitimate . The called party verifies that the calling number is authentic

STIR

Originating service provider’s encrypted SIP Identity Header includes the following data:

  1. Attestation level
  2. Date and Time
  3. Calling and Called Numbers
  4. Orig ID for analytics and/or traceback purposes among others
  5. Location of certificate repository
  6. Signature
  7. Encryption algorithm

FCC has also assigned the role of a Secure Telephone Identity Policy Administrator (STI-PA) which oversees that CAs do not provide certificate to spoofing robocallers and enforce the framework for STIR /SHAKEN .

Sample Identity header in SIP requst

INVITE sip:bob@biloxi.example.org SIP/2.0
Via: SIP/2.0/TLS pc33.atlanta.example.com;branch=z9hG4bKnashds8
To: Bob
From: Alice ;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Max-Forwards: 70
Date: Thu, 21 Feb 2002 13:02:03 GMT
Contact:
Identity:
"ZYNBbHC00VMZr2kZt6VmCvPonWJMGvQTBDqghoWeLxJfzB2a1pxAr3VgrB0SsSAaifsRdiOPoQZYOy2wrVghuhcsMbHWUSFxI6p6q5TOQXHMmz6uEo3svJsSH49thyGnFVcnyaZ++yRlBYYQTLqWzJ+KVhPKbfU/pryhVn9Yc6U="
Identity-Info: https://atlanta.example.com/atlanta.cer;alg=rsa-sha1
Content-Type: application/sdp
Content-Length: 147

v=0
o=UserA 2890844526 2890844526 IN IP4 pc33.atlanta.example.com
s=Session SDP
c=IN IP4 pc33.atlanta.example.com
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000

SHAKEN

STIR is based on the SIP protocol and is designed to work with calls being routed through a VOIP network. Since traditional endpoints like POTS and SS7 networks also should be covered under this call authenticity framework , SHAKEN was developed to manage call via IP-to-telephone gateways .

Developed by the Alliance of Telecommunications Industry Solutions (ATIS)

Working Steps  :

  1. When a call is initiated, a SIP INVITE is received by the originating service provider.
  2. Originating service provider verifies the call source and number to determine how to confirm validity.
    1. Full Attestation (A) — The service provider authenticates the calling party AND confirms they are authorized to use this number. An example would be a registered subscriber.
    2. Partial Attestation (B) — The service provider verifies the call origination but cannot confirm that the call source is authorized to use the calling number. An example would be a calling number from behind an enterprise PBX.
    3. Gateway Attestation (C) — The service provider authenticates the call’s origin but cannot verify the source. An example would be a call received from an international gateway.
  3. Create a SIP Identity header that contains information on the calling number, called number, attestation level, and call origination, along with the certificate thus caller ID “signed” as legitimate
  4. SIP INVITE with the SIP Identity header with the certificate is sent to the destination service provider.
  5. Destination service provider verifies the identity of the header and certificate.

Diagrammatic depiction of flow of how Telecom carriers to digitally validates authenticity before receiving or handoff through their network

SHAKEN

References

Video Codecs – H264 , H265 , AV1

Article discusses the popularly adopted current standards for video codecs( compression / decompression) namely MPEG2, H264, H265 and AV1

MPEG 2

MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
generic coding of moving pictures and associated audio information
combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

better than MPEG 1

evolved out of the shortcomings of MPEG-1 such as audio compression system limited to two channels (stereo) , No standardized support for interlaced video with poor compression , Only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited for higher resolution video.

Application

  • over-the-air digital television broadcasting and in the DVD-Video standard.
  • TV stations, TV receivers, DVD players, and other equipment
  • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
  • XDCAM – professional file-based video recording format.
  • DVB – Application-specific restrictions on MPEG-2 video in the DVB standard:

H264

Advanced Video Coding (AVC), or H.264 or aka MPEG-4 AVC or ITU-T H.264 / MPEG-4 Part 10 ‘Advanced Video Coding’ (AVC)
introduced in 2004

Better than MPEG2

40-50% bit rate reduction compared to MPEG-2

Support Up to 4K (4,096×2,304) and 59.94 fps
21 profiles ; 17 levels

Compression Model

Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find the ones that are redundant within the subsequent frames ie not changed such as background sections in video. These areas are replaced with a short information, referencing the original pixels(intraframe motion prediction) using mathematical function and direction of motion

Hybrid spatial-temporal prediction model
Flexible partition of Macro Block(MB), sub MB for motion estimation
Intra Prediction (extrapolate already decoded neighbouring pixels for prediction)
Introduced multi-view extension
9 directional modes for intra prediction
Macro Blocks structure with maximum size of 16×16
Entropy coding is CABAC(Context-adaptive binary arithmetic coding) and CAVLC(Context-adaptive variable-length coding )

Applications

  • most deployed video compression standard
  • Delivers high definition video images over direct-broadcast satellite-based television services,
  • Digital storage media and Blu-Ray disc formats,
  • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems and DVB
  • Mobile video, media players, video chat

H265

High Efficiency Video Coding (HEVC), or H.265 or MPEG-H HEVC
video compression standard designed to substantially improve coding efficiency
stream high-quality videos in congested network environments or bandwidth constrained mobile networks
Jan 2013
product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

better than H264

overcome shortage of bandwidth, spectrum, storage
bandwidth savings of approx. 45% over H.264 encoded content

resolutions up to 8192×4320, including 8K UHD
Supports up to 300 fps
3 approved profiles, draft for additional 5 ; 13 levels
Whereas macroblocks can span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving it the ability to compress information more efficiently.

multiview encoding – stereoscopic video coding standard for video compression that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It also packs a large amount of inter-view statistical dependencies.

Compression Model

Enhanced Hybrid spatial-temporal prediction model
CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures

Motion Estimation – Intra prediction with more nodes, asymmetric partitions in Inter Prediction)
Individual rectangular regions that divide the image are independent

Paralleling processing computing – decoding process can be split up across multiple parallel process threads, taking advantage multi-core processors.

Wavefront Parallel Processing (WPP)- sort of decision tree that grants a more productive and effectual compression.
33 directional nodes – DC intra prediction , planar prediction. , Adaptive Motion Vector Prediction
Entropy coding is only CABAC

Applications

  • cater to growing HD content for multi platform delivery
  • differentiated and premium 4K content

reduced bitrate enables broadcasters and OTT vendors to bundle more channels / content on existing delivery mediums
also provide greater video quality experience at same bitrate

Using ffmpeg for H265 encoding

I took a h264 file (640×480) , duration 30 seconds of size 39,08,744 bytes (3.9 MB on disk) and converted using ffnpeg

After conversion it was a HEVC (Parameter Sets in Bitstream) , MPEG-4 movie – 621 KB only !!! without any loss of clarity.

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4                                              ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers   built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)   configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr   libavutil      56. 22.100 / 56. 22.100   libavcodec     58. 35.100 / 58. 35.100   libavformat    58. 20.100 / 58. 20.100   libavdevice    58.  5.100 / 58.  5.100   libavfilter     7. 40.101 /  7. 40.101   libavresample   4.  0.  0 /  4.  0.  0   libswscale      5.  3.100 /  5.  3.100   libswresample   3.  3.100 /  3.  3.100   libpostproc    55.  3.100 / 55.  3.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':   Metadata:     major_brand     : isom     minor_version   : 1     compatible_brands: isomavc1     creation_time   : 2019-06-23T04:58:13.000000Z   Duration: 00:00:29.84, start: 0.000000, bitrate: 1047 kb/s     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 1046 kb/s, 25 fps, 25 tbr, 25k tbn, 50k tbc (default)     Metadata:       creation_time   : 2019-06-23T04:58:13.000000Z       handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1 Codec AVOption b (set bitrate (in bits/s)) specified for output file #0 (output.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream. Stream mapping:   Stream #0:0 -> #0:0 (h264 (native) -> hevc (libx265)) Press [q] to stop, [?] for help x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9 x265 [info]: build info [Mac OS X][clang 10.0.1][64 bit] 8bit+10bit+12bit x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 x265 [info]: Main profile, Level-3 (Main tier) x265 [info]: Thread pool created using 4 threads x265 [info]: Slices                              : 1 x265 [info]: frame threads / pool features       : 2 / wpp(8 rows) x265 [warning]: Source height < 720p; disabling lookahead-slices x265 [info]: Coding QT: max CU size, min CU size : 64 / 8 x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3 x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00 x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2 x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0 x265 [info]: References / ref-limit  cu / depth  : 3 / off / on x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1 x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60 x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra x265 [info]: tools: strong-intra-smoothing deblock sao Output #0, mp4, to 'output.mp4':   Metadata:     major_brand     : isom     minor_version   : 1     compatible_brands: isomavc1     encoder         : Lavf58.20.100     Stream #0:0(und): Video: hevc (libx265) (hev1 / 0x31766568), yuv420p, 640x480, q=2-31, 25 fps, 12800 tbn, 25 tbc (default)     Metadata:       creation_time   : 2019-06-23T04:58:13.000000Z       handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1       encoder         : Lavc58.35.100 libx265 frame=  746 fps= 64 q=-0.0 Lsize=     606kB time=00:00:29.72 bitrate= 167.2kbits/s speed=2.56x     video:594kB audio:0kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 2.018159% x265 [info]: frame I:      3, Avg QP:27.18  kb/s: 1884.53  x265 [info]: frame P:    179, Avg QP:27.32  kb/s: 523.32   x265 [info]: frame B:    564, Avg QP:35.17  kb/s: 38.69    x265 [info]: Weighted P-Frames: Y:5.6% UV:5.0% x265 [info]: consecutive B-frames: 1.6% 3.8% 9.3% 53.3% 31.9%  encoded 746 frames in 11.60s (64.31 fps), 162.40 kb/s, Avg QP:33.25

if you get error like

Unknown encoder 'libx265'

then reinstall ffmpeg with h265 support

AV1

Realtime High quality video encoder
product of product of the Alliance for Open Media (AOM)
Contained by Matroska , WebM , ISOBMFF , RTP (WebRTC)

better than H265

AV1 is royalty free and overcomes the patent complexities around H265/HVEC

Applications

  • Video transmission over internet , voip , multi conference
  • Virtual / Augmented reality
  • self driving cars streaming
  • intended for use in HTML5 web video and WebRTC together with the Opus audio format

Audio and Acoustic Signal Processing

Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions and Audio Signal Processing focuses on the computational methods for intentionally altering auditory signals or sounds, in order to achieve a particular goal.

Application of audio Signal processing in general

  • storage
  • data compression
  • music information retrieval
  • speech processing ( emotion recognition/sentiment analysis , NLP)
  • localization
  • acoustic detection
  • Transmission / Broadcasting – enhance their fidelity or optimize for bandwidth or latency.
  • noise cancellation
  • acoustic fingerprinting
  • sound recognition ( speaker Identification , biometric speech verification , voice commands )
  • synthesis – electronic generation of audio signals. Speech synthesisers can generate human like speech.
  • enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.)

Effects for audio streams processing

  • delay or echo
    To simulate reverberation effect, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be of order 35 milliseconds or above.
    Implemented using tape delays or bucket-brigade devices.
  • flanger
    delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms).
    signal would fall out-of-phase with its partner, producing a phasing comb filter effect and then speed up until it was back in phase with the master
  • phaser
    signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
  • chorus
    delayed version of the signal is added to the original signal. above 5 ms to be audible. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
  • equalization
    frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters.
    overdrive effects such as the use of a fuzz box can be used to produce distorted sounds, such as for imitating robotic voices or to simulate distorted radiotelephone traffic
  • pitch shift
    shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
  • time stretching
    changing the speed of an audio signal without affecting its pitch.
  • resonators
    emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
  • modulation
    change the frequency or amplitude of a carrier signal in relation to a predefined signal.
  • compression
    reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
  • 3D audio effects
    place sounds outside the stereo basis
  • reverse echo
    swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
  • wave field synthesis
    spatial audio rendering technique for the creation of virtual acoustic environments

ASP application in Telephony and mobile phones, by ITU (International Telegraph Union)

  • Acoustic echo control
    aims to eliminate the acoustic feedback, which is particularly problematic in the speakerphone use-case during bidirectional voice
  • Noise control
    microphone doesn’t only pick up the desired speech signal, but often also unwanted background noise. Noise control tries to minimize those unwanted signals . Multi-microphone AASP, has enabled the suppression of directional interferers.
  • Gain control
    how loud a speech signal should be when leaving a telephony transmitter as well as when it is being played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively during operation in real-time.
  • Linear filtering
    ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
  • Speech coding: from analog POTS based call to G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder is a big leap in terms of call capacity. other speech coders with varying tradeoffs between compression ratio, speech quality, and computational complexity have been also made available. AASP provides higher quality wideband speech (approximately 150 Hz to 7 kHz).

ASP applications in music playback

AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs, such as listening to music, watching videos, and gaming

  • Post-processing
    techniques as equalization and filtering allow the user to adjust the timbre of the audio such as bass boost and parametric equalization. Other techniques like adding reverberation, pitch shift, time stretching etc
  • Audio (de)coding: audio contianers like mp3 and AAC define how music is distributed, stored, and consumed also in Online music streaming services

ASP for virtual assitants

Virtual Assistance include a variety of servies from Apple’s Siri, Microsoft’s Cortana , Google’s Now , Alexa etc. ASP is used in

  • Speech enhancement
    multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
  • Speech recognition (speech-to-text): this draws ideas from multiple disciplinary fields including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major contribution to recognition accuracy improvement in speech recognition by AASP.
  • Speech synthesis (text-to-speech): this technology has come a very long way from its very robotic sounding introduction in the 1930s to making synthesized speech sound more and more natural.

Other areas of ASP

  • Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).

Ref :
wikipedia – https://en.wikipedia.org/wiki/Audio_signal_processing
IEEE – https://signalprocessingsociety.org/publications-resources/blog/audio-and-acoustic-signal-processing%E2%80%99s-major-impact-smartphones

Webrtc handshake

Interfaces of webrtc and tracks to stream addition

Process to perform webrtc handshake

1.Setup Client side for the caller
PeerConnectionFactory to generate PeerConnections
PeerConnection for every connection to remote peer
MediaStream audio and video from client device

2.caller creates SDP offer for the callee

peerConnection.createOffer()

3.Callee process the offer

peerConnection.setRemoteDescription(offer)

4.Callee generates an SDP answer for the caller

peerConnection.createAnswer()

5.Caller receives and prcesses the answer from callee

peerConnection.setRemoteDescription(answer)

6.Proceed to Add stream

7. Proceed to do ICE for NAT

Webrtc call setup and incoming call callflow between remote peer , peerconnection actory , peerconnection and application

setup a call
receive a call

Interactive Connectivity Establishment (ICE) for NAT traversal

Protocols using offer/answer are difficult to operate through Network Address Translators (NATs) since flow of media packets require IP addresses and ports of media sources and sinks within their messages. Also realtime media emphasises on reduced latency and decreased packet loss .

An extension to the offer/answer model, and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks.
Checks done by STUN and TURN
also allows for address selection for multi-homed and dual-stack hosts

ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.

Gathering Candidate Addresses

An agent identifies all CANDIDATE whic is a transport address. Types:

  • HOST CANDIDATE – directly from a local interface which could be Wifi, Virtual Private Network (VPN) or Mobile IP (MIP)
    if an agent is multihomed ( private and public networks) , it obtains a candidate from each IP address and includes all candidates in its offer.
  • STUN or TURN to obtain additional candidates. Types
    1.translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
    2.addresses on TURN servers (RELAYED CANDIDATES)

Mapping Server Reflexive address

Agent sends the TURN Allocate request from IP address and port X:x,
NAT will create a binding X1′:x1′, mapping this server reflexive candidate to the host candidate X:x ( BASE).
Outgoing packets sent from the host candidate will be translated by the NAT to the server reflexive candidate.
Incoming packets sent to the server reflexive candidate will be translated by the NAT to the host candidate and forwarded to the agent.

Allocate Request and response fom TURN – Informing the agent of this relayed candidate

Only STUN based Binding

agent sends a STUN Binding request to its STUN server which will get server reflexive candidate and send back Binding response.

STUN Binding request for connectivity checks on CANDIDATE PAIRS

The candidates are carried in attributes in the SDP offer . The remote peer also follows this process and gather and send lits own sorted list of candidates. Hence CANDIDATE PAIRS from both sides are formed.

PEER REFLEXIVE CANDIDATES – connectivity checks can produce aditional candidates espceialy around symmetric NAT

Since the same address is used for STUN. and media ( RTP/RTCP) Demultiplexing based on packet contents helps to identify which one is which.

Checks
TRIGGERED CHECKS – accelerates the process of finding a valid candidate
ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.

ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first

Checks ensure maintaining frozen candidates and pairs with some foundation for media stream

Each candidate pair in the check list has a foundation and a state. States for candidates pairs

1.Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair onthe check list.

2. In-Progress: A check has been sent for this pair, but the transaction is in progress.

3. Succeeded: A check for this pair was already done and produced a successful result.

4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.

5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.

Example of ICE gather state

icegatheringstatechange – gathering

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 192.168.0.2 58122 typ host generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (srflx)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:4081163164 1 udp 1686052607 106.51.26.168 37542 typ srflx raddr 192.168.0.2 rport 58122 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (host)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:345893049 1 tcp 1518280447 192.168.0.2 9 typ host tcptype active generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2130406062 1 udp 41886207 74.125.39.44 27190 typ relay raddr 106.51.26.168 rport 37542 generation 0 ufrag vzpn network-id 1 network-cost 10

icecandidate (relay)
sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3052096874 1 udp 25108479 172.217.163.158 28049 typ relay raddr 106.51.26.168 rport 37543 generation 0 ufrag vzpn network-id 1 network-cost 10

icegatheringstatechange – complete

Exmaple Candidate Checking

iceconnectionstatechange : checking

setRemoteDescription L type: answer, sdp: v=0

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:ydvf
a=ice-pwd:mb4ousBoT6B0l//ljjD/9Z/M
a=ice-options:trickle

m=video 9 UDP/TLS/RTP/SAVPF 98 100 96 97 99 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:ydvf
a=ice-pwd:mb4ousBoT6B0l//ljjD/9Z/M
a=ice-options:trickle

addIceCandidate (host)
sdpMid: , sdpMLineIndex: 0, candidate: candidate:1511920713 1 udp 2122260223 192.168.0.2 56060 typ host generation 0 ufrag ydvf network-id 1 network-cost 10

iceconnectionstatechange : connected

Candidate Nomination for Media Path

selecting low-latency media paths can use various techniques such as actual round-trip time (RTT) measurement
controlling agent gets to nominate which candidate pairs will get used for media amongst the ones that are valid. Ways
regular nomination and aggressive nomination

tbd

Ref :

http://w3c.github.io/webrtc-pc/ WebRTC 1.0: Real-time Communication Between Browsers – W3C Editor’s Draft 31 August 2019
RFC 5245 Inter

Websockets as VOIP signal transport medium

Web resources are usually build on request/response paradigm such as HTTP , SIP messages . This means that server responds only when a client requests it to. This made web intercations very slow and unsuited for VOIP signalling
Long Poll involved repeated polling checks to load new server resources by itself instead of client made explicit request
AJAX and multipart XHR tried to patch the problem by selective reloading however they still required that client perform the mapping for an incomig reply to map to correct request.
However due to overhead latency involved with HTTP transaction and its working mode to open new TCP connetion for every request and reponse and add HTTP headers, none of them were suited to realtime operations

Websocket is the current (2017) most idelistic solution to perform realtime sigalling suited to VOIP requirnments due to its nature os establish a socket .

Websocket Protocol

Enables two-way communication between a client running untrusted code in a controlled environment to a remote host that has opted-in to communications from that code.

protocol consists of an opening handshake followed by basic message framing, layered over TCP.
handshake is interpreted by HTTP servers as an Upgrade request.

Secure websocket example :

Request URL: wss://site.com:8084/socket.io/?transport=websocket&sid=hh3Dib_aBWgqyO1IAAEL
Request Method: GET
Status Code: 101 Switching Protocols

Response Headers
Connection: Upgrade
Sec-WebSocket-Accept: UVhTdFOWfywGyQTKDRZyGuhkfls=
Sec-WebSocket-Extensions: permessage-deflate
Upgrade: websocket

Request Headers
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: no-cache
Connection: Upgrade
Host: site.com:8085
Origin: https://site.com:8084
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: 06FNaHge8GLGVuPFxV2fAQ==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36

Query String parameters
transport: websocket
sid: hh3Dib_aBWgqyO1IAAEL

Working with websockets

A new websocket can be opned with ws or wss and it can have sub protocols like in example .

var wsconnection = new WebSocket('wss://voipsever.com', ['soap', 'xmpp']);

It can be attached with event handlers

wsconnection.onopen = function () {
  ...
};
wsconnection.onerror = function (error) {
  console.log('WebSocket Error ' + error);
};
wsconnection.onmessage = function (e) {
  console.log('message received : ' + e.data);
};

Send Data on websocket

message string

wsconnection.send('Hi);

Blob or ArrayBuffer object to send binary data
Ex : Sending canvas ImageData as ArrayBuffer

var img = canvas_context.getImageData(0, 0, 400, 320);
var binary = new Uint8Array(img.data.length);
for (var i = 0; i < img.data.length; i++) {
  binary[i] = img.data[i];
}
wsconnection.send(binary.buffer);

Ex : sending file as Blob

var file = document.querySelector('input[type="file"]').files[0];
wsconnection.send(file);

Closing the connection

if (socket.readyState === WebSocket.OPEN) {
    socket.close();
}

Registry for Close codes for WS
1000 Normal Closure [IESG_HYBI] [RFC6455]
1001 Going Away [IESG_HYBI] [RFC6455]
1002 Protocol error [IESG_HYBI] [RFC6455]
1003 Unsupported Data [IESG_HYBI] [RFC6455]
1004 Reserved [IESG_HYBI] [RFC6455]
1005 No Status Rcvd [IESG_HYBI] [RFC6455]
1006 Abnormal Closure [IESG_HYBI] [RFC6455]
1007 Invalid frame payload data [IESG_HYBI] [RFC6455]
1008 Policy Violation [IESG_HYBI] [RFC6455]
1009 Message Too Big [IESG_HYBI] [RFC6455]
1010 Mandatory Ext. [IESG_HYBI] [RFC6455]
1011 Internal Error [IESG_HYBI] [RFC6455][RFC Errata 3227]
1012 Service Restart [Alexey_Melnikov] [http://www.ietf.org/mail-archive/web/hybi/current/msg09670.html]
1013 Try Again Later [Alexey_Melnikov] [http://www.ietf.org/mail-archive/web/hybi/current/msg09670.html]
1014 The server was acting as a gateway or proxy and received an invalid response from the upstream server. This is similar to 502 HTTP Status Code. [Alexey_Melnikov] [https://www.ietf.org/mail-archive/web/hybi/current/msg10748.html]
1015 TLS handshake [IESG_HYBI] [RFC6455]
1016-3999 Unassigned
4000-4999 Reserved for Private Use [RFC6455]

WebSocket Subprotocol Name Registry

  • MBWS.huawei.com MBWS
  • MBLWS.huawei.com MBLWS
  • soap soap
  • wamp WAMP (“The WebSocket Application Messaging Protocol”)
  • v10.stomp Name: STOMP 1.0 specification
  • v11.stomp Name: STOMP 1.1 specification
  • v12.stomp Name: STOMP 1.2 specification
  • ocpp1.2 OCPP 1.2 open charge alliance
  • ocpp1.5 OCPP 1.5 open charge alliance
  • ocpp1.6 OCPP 1.6 open charge alliance
  • ocpp2.0 OCPP 2.0 open charge alliance
  • ocpp2.0.1 OCPP 2.0.1
  • rfb RFB [RFC6143]
  • sip WebSocket Transport for SIP (Session Initiation Protocol) [RFC7118]
  • notificationchannel-netapi-rest.openmobilealliance.org OMA RESTful Network API for Notification Channel
  • wpcp Web Process Control Protocol (WPCP)
  • amqp Advanced Message Queuing Protocol (AMQP) 1.0+
  • mqtt mqtt [MQTT Version 5.0]
  • jsflow jsFlow pubsub/queue protocol
  • rwpcp Reverse Web Process Control Protocol (RWPCP)
  • xmpp WebSocket Transport for the Extensible Messaging and Presence Protocol (XMPP) [RFC7395]
  • ship SHIP – Smart Home IP SHIP (Smart Home IP) is a an IP based approach to plug and play home automation and smart energy / energy efficiency, which can easily be extended to additional domains such as Ambient Assisted Living (AAL). SHIP can be used solely on the customer premises or can be integrated into a cloud based solution.
  • mielecloudconnect Miele Cloud Connect Protocol This protocol is used to securely connect household or professional appliances to an internet service portal via a public communication network in order to enable remote services.
  • v10.pcp.sap.com Push Channel Protocol
  • msrp WebSocket Transport for MSRP (Message Session Relay Protocol) [RFC7977]
  • v1.saltyrtc.org
  • TLCP-2.0.0.lightstreamer.com TLCP (Text Lightstreamer Client Protocol)
  • bfcp WebSocket Transport for BFCP (Binary Floor Control Protocol)
  • sldp.softvelum.com Softvelum Low Delay Protocol SLDP is a low latency live streaming protocol for delivering media from servers to MSE-based browsers and WebSocket-enabled applications.
  • opcua+uacp OPC UA Connection Protocol
  • opcua+uajson OPC UA JSON Encoding
  • v1.swindon-lattice+json Swindon Web Server Protocol (JSON encoding)
  • v1.usp USP (Broadband Forum User Services Platform)
  • mles-websocket mles-websocket
  • coap Constrained Application Protocol (CoAP) [RFC8323]
  • TLCP-2.1.0.lightstreamer.com TLCP (Text Lightstreamer Client Protocol)
  • sqlnet.oracle.com sqlnet This protocol is used for communication between Oracle database client and database server, and its usage as subprotocol of websocket is primarly geared towards cloud deployments. sqlnet supports bi-directional data transfer and is full duplex in nature.
  • oneM2M.R2.0.json oneM2M R2.0 JSON
  • oneM2M.R2.0.xml oneM2M R2.0 XML
  • oneM2M.R2.0.cbor oneM2M R2.0 CBOR
  • transit Transit
  • 2016.serverpush.dash.mpeg.org MPEG-DASH-ServerPush-23009-6-2017
  • 2018.mmt.mpeg.org MPEG-MMT-23008-1-2018
  • CLUE CLUE
  • webrtc.softvelum.com Softvelum WebSocket signaling protocol WebRTC live streaming requires WebSocket-based signaling protocol for every specific implementation. Softvelum products will use this subprotocol for signaling

websocket libraries

C++: libwebsockets
Erlang: Shirasu.ws
Java: Jetty
Node.JS: ws
Ruby:
em-websocket
EventMachine
Faye
Python:
Tornado,
pywebsocket
PHP: Ratchet, phpws
Javascript:
Socket.io
ws
WebSocket-Node
GoLang:
Gorilla
C#:
Fleck

Ref :
RFC 6455 – The websocket protocol
Websocket Protocol Registeries : http://www.iana.org/assignments/websocket/websocket.xml
https://www.html5rocks.com/en/tutorials/websockets/basics/
IANA websocket -https://www.iana.org/assignments/websocket/websocket.xhtml

Wifi 6

Wi‑Fi is a trademark of the Wi-Fi Alliance
family of radio technologies commonly used for wireless local area networking (WLAN) of devices.

Current and older Wifi standards

standards operate on varying frequencies, deliver different bandwidth, and support different numbers of channels.

802.11a – transmits at 5 GHz frequency band of the radio spectrum with 54 megabits of data per second.
uses orthogonal frequency-division multiplexing (OFDM) which splits that radio signal into several sub-signals before they reach a receiver to reduces interference.

802.11b – transmits at 2.4 GHz with speed of 11 megabits of data per second,
uses complementary code keying (CCK) modulation to improve speeds.

802.11g – transmits at 2.4 GHz but faster upto 54 megabits of data per second.
uses OFDM coding

802.11n – speeds 140 megabits per second
backward compatible with a, b and g.
can transmit up to four streams of data, each at a maximum of 150 megabits per second, but most routers only allow for two or three streams.

802.11ac – n on the 2.4 GHz band and ac on the 5 GHz band.
backward compatible with 802.11n and thus others
450 megabits per second on a single stream
allows for transmission on multiple spatial streams upo 8
called 5G WiFi because of its frequency band
Very High Throughput (VHT)

Wifi 6

Wi-Fi CERTIFIED 6 networks enable lower battery consumption in devices, making it a solid choice for any environment, including smart home and Internet of Things (IoT) uses.

Wifi Compoents

  • wireless access point (AP) allows wireless devices to connect to the wireless network.
    takes the bandwidth coming from a router and stretches it so that many devices can go on the network from farther distances away.
    give useful data about the devices on the network, provide proactive security, and serve many other practical purposes.
  • Wireless routers are hardware devices that Internet service providers use to connect you to their cable or xDSL Internet network.
    combines the networking functions of a wireless access point and a router.
  • mobile hotspot – feature on smartphones with both tethered and untethered connections
    share your wireless network connection with other devices

Wifi performance

Wi-Fi operational range depends on factors such as the frequency band, radio power output, receiver sensitivity, antenna gain and antenna type as well as the modulation techniquea and propagation charestristics of the signal

Transmitter power
Compared to cell phones and similar technology, Wi-Fi transmitters are low power devices. In general, the maximum amount of power that a Wi-Fi device can transmit is limited by local regulations, such as FCC Part 15 in the US. Equivalent isotropically radiated power (EIRP) in the European Union is limited to 20 dBm (100 mW).

Antenna
An access point compliant with either 802.11b or 802.11g, using the stock omnidirectional antenna might have a range of 100 m

Security

  • Wired Equivalent Privacy WEP
    client connects to a WEP-protected network, the WEP key is added to some data to create an “initialization vector”/ IV
  • WiFi Protected Access version 2 (WPA2)
    successor to WEP and WPA
    uses either TKIP or Advanced Encryption Standard (AES) encryption
  • WiFi Protected Setup (WPS)
    ties a hard-coded PIN to the router for setup is vulnerabile for exploitation by hackers
  • WPA3™
    Use the latest security methods , higher grader security protocls
    Disallow outdated legacy protocols
    Require use of Protected Management Frames (PMF)
    increased protections from password guessing attempts
    better password protection through Simultaneous Authentication of Equals (SAE), which replaces Pre-shared Key (PSK) in WPA2-Personal.
  • WPA3-Enterprise
    192-bit minimum-strength security protocols and cryptographic tools
    Authenticated encryption: 256-bit Galois/Counter Mode Protocol (GCMP-256)
    Key derivation and confirmation: 384-bit Hashed Message Authentication Mode (HMAC) with Secure Hash Algorithm (HMAC-SHA384)
    Key establishment and authentication: Elliptic Curve Diffie-Hellman (ECDH) exchange and Elliptic Curve Digital Signature Algorithm (ECDSA) using a 384-bit elliptic curve
    Robust management frame protection: 256-bit Broadcast/Multicast Integrity Protocol Galois Message Authentication Code (BIP-GMAC-256)

Ref :
https://www.wi-fi.org/
https://en.wikipedia.org/wiki/Wi-Fi
https://www.wi-fi.org/discover-wi-fi/wi-fi-certified-6

Long Term Evolution (LTE), VOLTE and VOWifi

Both radio and core network evolution
all-IP packet-switched architecture
standardised by 3GPP
lower CAPEX ans OPEX involved

Evolved from Universal Mobile Telecommunication System (UMTS), which in turn evolved from the Global
Also aligned with 4G (fourth-generation mobile)

LTE is backward compatible with GSM/EDGE/UMTS/CDMA/WCDMA systems on existing 2G and 3G spectrum , even hand-over and roaming to existing mobile networks.

Motivation for evolution

Wireless/cellular technology standards are constantly evolving for better efficiency and performance.
LTE evolved as a result of rapid increase of mobile data usage. Applications such as voice over IP (VOIP), streaming multimedia, videoconferencing , cellular modemetc.
It provides packet-switched traffic with seamless mobility and higher qos than predecessors.
Also high data rate, throughput, low latency and packet optimized radioaccess technology on flexible bandwidth deployments.

Performance

Peak Data Rate
uplink – 75Mbps(20MHz bandwidth)
downlink – 150 Mbps(UE Category 4, 2×2 MIMO, 20MHz bandwidth) , 300 Mbps(UE category 5, 4×4 MIMO, 20MHz bandwidth)

carrier bandwidth can range from 1.4 MHz up to 20 MHz. Ultimately bandwidth used by carrier depends on frequency band and the amount of spectrum available with a network operator
Mobility 350 km/h

Multiple Access Schemes
uplink: SC-FDMA (Single Carrier Frequency Division Multiple Access) 50Mbps+ (20MHz spectrum)
downlink: OFDM (Orthogonal Frequency Division Multiple Access) 100Mbps+ (20MHz spectrum)

Multi-Antenna Technology , Multi-user collaborative MIMO for Uplink and TxAA, spatial multiplexing, CDD ,max 4×4 array for downlink

Coverage 5 – 100km with slight degradation after 30km

LTE architecture supports hard QoS and guaranteed bit rate (GBR) for radio bearers.

Technology

All interfaces between network nodes are IP based
Duplexing – Time Division Duplex (TDD) , Frequency Division Duplex (FDD) and half duples FD
(MIMO) Multiple Input Multiple Output transmissions – LTE devices have to support this. Allows the base station to transmit several data streams over the same carrier simultaneously.
Modulation Schemes QPSK, 16QAM, 64QAM(optional)

LTE Architecture

Primarily composed of

  1. User Equipment (UE)
  2. Evolved UMTS Terrestrial Radio Access Network (E-UTRAN).
  3. Evolved Packet Core (EPC).

LTE-Advanced

LTE devices capable of CAT6 speeds (Category 6 )
Increased peak data rate, downlink 3 Gbps, Uplink 1.5 Gbps ( 1 Gbps = 1000 Mbps)
spectral efficiency from 16bps/Hz in R8 to 30 bps/Hz in R10
Carrier Aggregation (CA)
enhanced use of multi-antenna techniques
support for Relay Nodes (RN)

Ref:
3GPP on LTE – https://www.3gpp.org/technologies/keywords-acronyms/98-lte
ETSI on LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment (UE) radio transmission and reception – https://www.etsi.org/deliver/etsi_ts/136100_136199/136101/10.03.00_60/ts_136101v100300p.pdf

MIMO ( multiple-input and multiple-output )

SISO – Single Input Single Output
SIMO – Single Input Multiple output
MISO – Multiple Input Single Output
MIMO – Multiple Input multiple Output

Multiplying the capacity of a radio link using multiple transmission and receiving antennas to exploit multipath propagation.
Key technology for achieving a vast increase of wireless communication capacity over a finite electromagnetic spectrum.

Antenna configuration – implies antenna spatial diversity by useing arrays of multiple antennas on one or both ends of a wireless communication link
boost channel capacity.
combats multipath fading
enhance signal to noise ratio,
create multiple communication paths

Applies to wifi
IEEE 802.11n (Wi-Fi), IEEE 802.11ac (Wi-Fi)
as well as cellular networks
HSPA+ (3G)
WiMAX (4G)
Long Term Evolution (4G LTE)
power-line communication for 3-wire installations as part of ITU G.hn standard and HomePlug AV2 specification

Large capacity increases over given bandwidth and S/N resources
Greater throughputs on bands below 6 GHz,

multi-user MU-MIMO

simultaneous independent data links to multiple users over a common time-frequency resource

massive MIMO

enable the expansion of the useful spectrum to microwave and millimeter wave bands within the framework of 5G cellular communication.

microdiversity MIMO

MIMO modes (60m)

Diversity – Alamouti algorithm
Beam forming – create and aim the antenna pattern electronically
Spatial multiplex – use of precoding and shaping to unravel the multipath signals

challenges faced by mobile equipment vendors implementing MIMO in small portable devices.

Functions

3main categories: precoding, spatial multiplexing (SM), and diversity coding.

Precoding

multi-stream beamforming ( signal is emitted from each of the transmit antennas with appropriate phase and gain weighting such that the signal power is maximized at the receiver input ) , increases reception and reduce multipath fading

In line-of-sight propagation, beamforming results in a well-defined directional pattern. However, conventional beams are not a good analogy in cellular networks, which are mainly characterized by multipath propagation. When the receiver has multiple antennas, the transmit beamforming cannot simultaneously maximize the signal level at all of the receive antennas, and precoding with multiple streams is often beneficial. Note that precoding requires knowledge of channel state information (CSI) at the transmitter and the receiver.

Spatial multiplexing

High-rate signal is split into multiple lower-rate streams and each stream is transmitted from a different transmit antenna in the same frequency channel. If these signals arrive at the receiver antenna array with sufficiently different spatial signatures and the receiver has accurate CSI, it can separate these streams into (almost) parallel channels.

increasing channel capacity at higher signal-to-noise ratios (SNR).

Diversity coding

when there is no channel knowledge at the transmitter , a single stream is transmitted. The signal is emitted from each of the transmit antennas with full or near orthogonal coding. Diversity coding exploits the independent fading in the multiple antenna links to enhance signal diversity.

Ref :
https://www.comsoc.org
https://en.wikipedia.org/wiki/MIMO

NLP ( Natural Language Processing ) in VoIP

NLP ( Natural Language Processing ) can be defined as the automatic manipulation of natural languages ( text or audio) using computer algorithms and softwares. As such NLP has great potential in cognitive and artificial intelligence , but also with increasing human to machine interaction and enhancement in Machine learning ,NLP is set to revolutionize the Voice over IP space.

Note : although not obvious but some people confuse Natural language procession with Neurolinguistic pressing which is a science in Psychology.

NLP evolves from linguistics which itself is a study of language along with its semantics , phonetics and gramer. Every language has rules and NLP uses mathematical formulation to understand it. Discrete mathematical formalisms will be discussed later in this article.

Inputs for NLP is usually though conversation, speech, correspondence, reading, print, written composition, dictation, publishing, translation, lip reading, signing etc .

Rule based vs Statistical NLP – In contrast to rule based engines which work on hard preset values using maybe a decision tree , statistical models work in a more probabilistic fashion which produces more reliable results even in unfamiliar scenarios.

Linear classifier vs Convolutional Neural Nets– CNNs are powerful supervised deep learning technique. As opposed to a linear classifier whose decision boundary on feature space is linear function , CNN increases model complexity by adding more layers . tbd-

NLP tasks

Syntax

Grammer induction , lemmitization , morphological segmentation , part of speech tagging , parsing , sentence breaking , stemming , word segmentation , terminology extraction

Semantics

lexical , distributional , machine translation , Named entity recognition ( NER) , natural language understanding and generation, relationship establishment , sentimental analysis , work sense disambiguation , OCR( optical Character recognition) , recognizing textual entailment

Speech

speech recognition , specch segmentation , text to speech , dialogues

Discourse

automatic summarizations , conference resolution , discourse analysis

Key techniques

Out of above its worthy to point out few key techniques

Parts of speech (POS )

A primary tasks in NLP is to extract tokens and sentences, identify parts of speech ( like nouns , verbs , adjectives ) and create parse trees.

POS tagging is the process of marking up a word in a corpus to a corresponding part of a speech tag . By tagging, algorithm builds lemmatizers which are used to reduce a word to its root form.

POS methods significantly differs from Bag-of-words(BOW) methods which disregards semantic relation relationship and only takes into account words and their frequencies. Whereas POS takes context and definition into consideration.

POS tagging techniques include lexical , rule based , probablistic and deep learning methods.

Named entity recognition (NER)

Given a stream of text, determine which items in the text map to proper names, such as people or places, and their types such as person, location, Organization. Example for raw test as below using Spacy.io

“Hello ! My name is Atanai and I work on Solution design and architecture, developed many custom WebRTC and SIP based solutions such as telecom applications, media stream inetgration into IOT,Unified communication-collaboration ,signalling gateways ,SBC etc. I passed out from Anna university with Betch degree in 2011 and currenlty stay in Bangalore India.”

Analysis of NER is

Noun phrases: ['My name', 'Atanai', 'I', 'Solution design', 'architecture', 'many custom', 'WebRTC and SIP based solutions', 'telecom applications', 'media stream integration', 'IOT', 'Unified communication-collaboration', 'signalling gateways', 'I', 'Anna university', 'Betch degree', 'currently stay', 'Bangalore India'] 
Verbs: ['be', 'work', 'develop', 'base', 'signal', 'pass'] 
Atanai PERSON 
WebRTC PRODUCT 
SIP ORG 
IOT ORG 
Betch NORP 
2011 DATE 
Bangalore India LOC

Sentiment Analysis

Understand the overall opinion, feeling, or attitude expressed in given media ( speech , text or video) .

NLP in action

NLP application layout

Steps to obtain insights and relevant information from an unclassified document , raw tex file or speech to text content such as recording from VOIP meeting

step 1 : upload a document which could be an invoice , order , feedback , complaint or any other unstructured raw text

Step 2 : Collect the data from the document

  • use OCR (optical character recognition) for hand written or signed components
  • perform search , index , duplication detection etc
  • can use MNIST database as
  • phrase matching and vocabulary
  • Can use translation APIs to trans late from other languages

Step 3 : Collect meaning-full data

  • perform Part of Speech (POS) tagging and chunking process
  • topic discovery and modelling
  • tokenizations and text classification , obtain domain specific entities from the document
  • can use standard model language to collect relevant frequently used words
  • NER ( Named Entity recognition ) to validate names , places and locations
  • can extract out time and date from mentioned entities
  • build relationship graphs

step 4 : extract sentiments using a trained model

  • utilize Regular Expressions for pattern searching
  • sentiment analysis

General Applications:

Application of NLP find its way into many domains

1.VOIP platforms ,media servers and automatic summarization of conference / meetings like “Minutes of Meetings” to highlight the key takeaways from a VOIP session

2. Automatic essay assessment and scripting in education setting alike.

3. Image annotation using metadata describing digital images for categorizations and easy retrieval based on keywords.

4. Spam filtering

5. Building automatic assistants and chatbots with Speech Recognition and using auto suggest with sentence completion ( Siri , Alexa , google voice etc )

6. Social Media Analytics , to track sentiments about topic , figure out influencers such as for movie or restaurant reviews .

NLP in VOIP system

To know more about sound waves go here which describes fundamental characteristics of analog waves . To know more about analog wave modulation go here , this describes how waves are modulated such as frequency , phase , amplitude etc to hold information for propagation . click here to know more about digital wave modulation such as amplitude , frequency , phase shift keying etc . This section build on top of audio streams captured or live .

Based on NLP and trained models on extracted features ,an unknown audio wave can be classified and possibly identified.

Replacing auto attendants with IVR

tbd–

Ref :

Tools ref:

WebRTC CPaaS ( Communication Platform as a Service )

A CPasS ( communication platform as a service ) is cloud based communication platform that provides real time communication capabilities. This should be easily integrable with any given external environment or application of the customer, without him worrying about building backend infrastructure or interfaces .

Traditionally , with IP protected protocols , licensed codecs maintaining a signalling protocol stack , network interfaces building communication platform was a costly affair. Cisco , Facetime , Skype were the only OTT ( over the top) players taking away from telco’s call revenue .

However with the advent of standardize , open source protocol and codecs plenty of CPaaS providers have crowded the market making more supply than there is demand. A customer wanting to quickly integrate real time communications on his platform has many options to choose from. This article provides an insight to how CPaaS solution are architectured and programmed

Sample CPass Architecture build on open source technologies

I have written an article before on Steps for building and deploying WebRTC solution , which includes standalone , cloud hosted and TURN based NAT handler systems .

A typical CPaaS solution provides

  • Call server + Media Server that can be interacted with via UA
  • Comm clients like sipphones , webrtc client , SDK ( software development kits ) or libraries for desktop , embedded and/or mobile platforms .
  • APIs that can trigger automated calls and perform preprogrammed routing.
  • Rich documentation and samples to build various apps such as call centre solutions , interactive auto-attendant using IVR , DTMF , conference solutions etc .
  • Some CPaaS providers also add features like transcribing ,transcoding , recording , playback etc to provide edge over other CPaaS providers

Advantages of using a CPaaS vs building your own RTC platform

Tech insights and experiences

companies who have been catering to telco and communication domain make robust solutions based on industry best practices which beats novice solution build in a fortnight anyday

keeping up with emerging trends

Market trends like new codecs , rich communication services , multi tenancy, contextual communication , NLP , other ML based enhancements are provided by CPaaS company

Auto Scaling , High Availability

A firm specializing in CPaaS solution has already thought of clustering and autoscaling to meet peak traffic requirements and backup/replication on standby servers to activate incase of failure

CAPEX and OPEX

using a Cpaas saves on human resources, infrastructure, and time to market. It saves tremendously on underlying IT infrastructure and many a times provides flexible pricing models

In a nutshell I have come across so many small size startups trying to build CPaaS solution from scratch but only realising it after weeks of trying to build a MVP that they are stuck with firewall, NAT, media quality or interoperability issues . Since there are so many solution already out in the market it is best to instead use them as underlying layer and build applications services using it such as callcentre or CRM services etc .

PJSIP

SIP stack written in C. Available under GPL

pjsip dev guide architecture diagram

PJSip user agent

Attributes:
local_info+tag, local_contact,
call_id

Operations:
pj_status_t pjsip_ua_init(endpt, param);
pj_status_t pjsip_ua_destroy(void);
pjsip_module* pjsip_ua_instance(void);
pjsip_endpoint* pjsip_ua_get_endpt(ua);

PJSip dialog

Attributes:

state, session_counter, initial_cseq, local_cseq, remote_cseq, route_set,

local_info+tag, local_contact, remote_info+tag, remote_contact, next_set

Operations:

pj_status_t pjsip_dlg_create_uac(ua, local_uri, contact, …);
pj_status_t pjsip_dlg_create_uas(ua, rdata, contact, &dlg);
pj_status_t pjsip_dlg_fork(old_dlg,rdata,&dlg);
pj_status_t pjsip_dlg_set_route_set(dlg, route_set);
pj_status_t pjsip_dlg_inc_session(dlg);
pj_status_t pjsip_dlg_dec_session(dlg);
pj_status_t pjsip_dlg_add_usage(dlg, mod);
pj_status_t pjsip_dlg_create_request(dlg,method,cseq,&tdata);
pj_status_t pjsip_dlg_send_request(dlg,tdata,&tsx);
pj_status_t pjsip_dlg_create_response(dlg,rdata,code,txt,&tdata);
pj_status_t pjsip_dlg_modify_response(dlg,tdata,code,txt);
pj_status_t pjsip_dlg_send_response(dlg,tsx,tdata);
pj_status_t pjsip_dlg_send_msg(dlg,tdata);
pjsip_dialog* pjsip_tsx_get_dlg(tsx);
pjsip_dialog* pjsip_rdata_get_dlg(rdata);

PJsip module

Attributes: name, id, priority, …

Callbacks:
pj_bool_t on_rx_request(rdata);
pj_bool_t on_rx_response(rdata);
void on_tsx_state(tsx,event);

SDP state Offer/ Answer transition

SDP negotiator structure

SDP negotiator class diagram PJSIP dev guide

Installation

Pre requisities :

GNU make (other make will not work),
GNU binutils for the target, and
GNU gcc for the target.
In addition, the following libraries are optional, but they will be used if they are present:

ALSA header files/libraries (optional) if ALSA support is wanted.
OpenSSL header files/libraries (optional) if TLS support is wanted.

Video support

Video4Linux2 (v4l2) development library
​ffmpeg development library for video codecs: H.264 (together with libx264) and H263P/H263-1998.
Simple DirectMedia Layer SDL
libyuv for format conversion and video manipulation / Ffmpeg
OpenH264 / VideoToolbox (only for Mac) / ffmpeg

get tar and untar sourcecode

wget https://www.pjsip.org/release/2.9/pjproject-2.9.tar.bz2
tar -xvzf pjproject-2.9.tar.bz2

configure to make ‘build.mak’, and ‘os-auto.mak’

./configure --enable-shared --disable-static --enable-memalign-hack --enable-gpl --enable-libx264

It runs checks , pay attention to traces such as

opus

checking opus/opus.h usability... yes
checking opus/opus.h presence... yes
checking for opus/opus.h... yes
checking for opus_repacketizer_get_size in -lopus... yes
OPUS library found, OPUS support enabled

For more features customization `configure’ configures pjproject 2.x to adapt to many kinds of systems.

Usage: ./aconfigure [OPTION]… [VAR=VALUE]…

To assign environment variables (e.g., CC, CFLAGS…), specify them as
VAR=VALUE. See below for descriptions of some of the useful variables.

Defaults for the options are specified in brackets.

Configuration:
-h, –help display this help and exit
–help=short display options specific to this package
–help=recursive display the short help of all the included packages
-V, –version display version information and exit
-q, –quiet, –silent do not print checking ...' messages --cache-file=FILE cache test results in FILE [disabled] -C, --config-cache alias for–cache-file=config.cache’
-n, –no-create do not create output files
–srcdir=DIR find the sources in DIR [configure dir or `..’]

Installation directories:
–prefix=PREFIX install architecture-independent files in PREFIX
[/usr/local]
–exec-prefix=EPREFIX install architecture-dependent files in EPREFIX
[PREFIX]

By default, make install' will install all the files in /usr/local/bin’, /usr/local/lib' etc. You can specify an installation prefix other than/usr/local’ using --prefix', for instance–prefix=$HOME’.

For better control, use the options below.

Fine tuning of the installation directories:
–bindir=DIR user executables [EPREFIX/bin]
–sbindir=DIR system admin executables [EPREFIX/sbin]
–libexecdir=DIR program executables [EPREFIX/libexec]
–sysconfdir=DIR read-only single-machine data [PREFIX/etc]
–sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com]
–localstatedir=DIR modifiable single-machine data [PREFIX/var]
–libdir=DIR object code libraries [EPREFIX/lib]
–includedir=DIR C header files [PREFIX/include]
–oldincludedir=DIR C header files for non-gcc [/usr/include]
–datarootdir=DIR read-only arch.-independent data root [PREFIX/share]
–datadir=DIR read-only architecture-independent data [DATAROOTDIR]
–infodir=DIR info documentation [DATAROOTDIR/info]
–localedir=DIR locale-dependent data [DATAROOTDIR/locale]
–mandir=DIR man documentation [DATAROOTDIR/man]
–docdir=DIR documentation root [DATAROOTDIR/doc/pjproject]
–htmldir=DIR html documentation [DOCDIR]
–dvidir=DIR dvi documentation [DOCDIR]
–pdfdir=DIR pdf documentation [DOCDIR]
–psdir=DIR ps documentation [DOCDIR]

System types:
–build=BUILD configure for building on BUILD [guessed]
–host=HOST cross-compile to build programs to run on HOST [BUILD]
–target=TARGET configure for building compilers for TARGET [HOST]

Optional Features:
–disable-option-checking ignore unrecognized –enable/–with options
–disable-FEATURE do not include FEATURE (same as –enable-FEATURE=no)
–enable-FEATURE[=ARG] include FEATURE [ARG=yes]
–disable-floating-point
Disable floating point where possible
–enable-epoll Use /dev/epoll ioqueue on Linux (experimental)
–enable-shared Build shared libraries
–disable-resample Disable resampling implementations
–disable-sound Exclude sound (i.e. use null sound)
–disable-video Disable video feature
–enable-ext-sound PJMEDIA will not provide any sound device backend
–disable-small-filter Exclude small filter in resampling
–disable-large-filter Exclude large filter in resampling
–disable-speex-aec Exclude Speex Acoustic Echo Canceller/AEC
–disable-g711-codec Exclude G.711 codecs from the build
–disable-l16-codec Exclude Linear/L16 codec family from the build
–disable-gsm-codec Exclude GSM codec in the build
–disable-g722-codec Exclude G.722 codec in the build
–disable-g7221-codec Exclude G.7221 codec in the build
–disable-speex-codec Exclude Speex codecs in the build
–disable-ilbc-codec Exclude iLBC codec in the build
–enable-libsamplerate Link with libsamplerate when available.
–enable-resample-dll Build libresample as shared library
–disable-sdl Disable SDL (default: not disabled)
–disable-ffmpeg Disable ffmpeg (default: not disabled)
–disable-v4l2 Disable Video4Linux2 (default: not disabled)
–disable-openh264 Disable OpenH264 (default: not disabled)
–enable-ipp Enable Intel IPP support. Specify the Intel IPP
package and samples location using IPPROOT and
IPPSAMPLES env var or with –with-ipp and
–with-ipp-samples options
–disable-darwin-ssl Exclude Darwin SSL (default: autodetect)
–disable-ssl Exclude SSL support the build (default: autodetect)

–disable-opencore-amr Exclude OpenCORE AMR support from the build
(default: autodetect)

–disable-silk Exclude SILK support from the build (default:
autodetect)

–disable-opus Exclude OPUS support from the build (default:
autodetect)

–disable-bcg729 Disable bcg729 (default: not disabled)
–disable-libyuv Exclude libyuv in the build
–disable-libwebrtc Exclude libwebrtc in the build

Optional Packages:
–with-PACKAGE[=ARG] use PACKAGE [ARG=yes]
–without-PACKAGE do not use PACKAGE (same as –with-PACKAGE=no)
–with-external-speex Use external Speex development files, not the one in
“third_party” directory. When this option is set,
make sure that Speex is accessible to use (hint: use
CFLAGS and LDFLAGS env var to set the include/lib
paths)
–with-external-gsm Use external GSM codec library, not the one in
“third_party” directory. When this option is set,
make sure that the GSM include/lib files are
accessible to use (hint: use CFLAGS and LDFLAGS env
var to set the include/lib paths)
–with-external-srtp Use external SRTP development files, not the one in
“third_party” directory. When this option is set,
make sure that SRTP is accessible to use (hint: use
CFLAGS and LDFLAGS env var to set the include/lib
paths)
–with-external-yuv Use external libyuv development files, not the one
in “third_party” directory. When this option is set,
make sure that libyuv is accessible to use (hint:
use CFLAGS and LDFLAGS env var to set the
include/lib paths)
–with-external-webrtc Use external webrtc development files, not the one
in “third_party” directory. When this option is set,
make sure that webrtc is accessible to use (hint:
use CFLAGS and LDFLAGS env var to set the
include/lib paths)
–with-external-pa Use external PortAudio development files. When this
option is set, make sure that PortAudio is
accessible to use (hint: use CFLAGS and LDFLAGS env
var to set the include/lib paths)
–with-sdl=DIR Specify alternate libSDL prefix
–with-ffmpeg=DIR Specify alternate FFMPEG prefix
–with-openh264=DIR Specify alternate OpenH264 prefix
–with-ipp=DIR Specify the Intel IPP location
–with-ipp-samples=DIR Specify the Intel IPP samples location
–with-ipp-arch=ARCH Specify the Intel IPP ARCH suffix, e.g. “64” or
“em64t. Default is blank for IA32”
–with-ssl=DIR Specify alternate SSL library prefix. This option
will try to find OpenSSL first, then if not found,
GnuTLS. To skip OpenSSL finding, use –with-gnutls
option instead.
–with-gnutls=DIR Specify alternate GnuTLS prefix
–with-opencore-amrnb=DIR
This option is obsolete and replaced by
–with-opencore-amr=DIR
–with-opencore-amr=DIR Specify alternate libopencore-amr prefix
–with-opencore-amrwbenc=DIR
Specify alternate libvo-amrwbenc prefix
–with-silk=DIR Specify alternate SILK prefix
–with-opus=DIR Specify alternate OPUS prefix
–with-bcg729=DIR Specify alternate bcg729 prefix

Some influential environment variables:
CC C compiler command
CFLAGS C compiler flags
LDFLAGS linker flags, e.g. -L if you have libraries in a
nonstandard directory
LIBS libraries to pass to the linker, e.g. -l
CPPFLAGS (Objective) C/C++ preprocessor flags, e.g. -I if
you have headers in a nonstandard directory
CXX C++ compiler command
CXXFLAGS C++ compiler flags
CPP C preprocessor

Make and install

make dep
make 
make install

Program

create library instance and initiate with default config and logging

lib = pj.Lib()
lib.init(log_cfg = pj.LogConfig(level=LOG_LEVEL, callback=log_cb))

create UDP transport listening on any available port

transport = lib.create_transport(pj.TransportType.UDP,
                                 pj.TransportConfig(0))

create sipuri and local account

my_sip_uri = "sip:" + transport.info().host + ":" + str(transport.info().port)
acc = lib.create_account_for_transport(transport, cb=MyAccountCallback())

Function to make call

def make_call(uri):
    try:
        print "Making call to", uri
        return acc.make_call(uri, cb=MyCallCallback())
    except pj.Error, e:
        print "Exception: " + str(e)
        return None

Ask user input for destination URI and Call

print "Enter destination URI to call: ",
input = sys.stdin.readline().rstrip("\r\n")
if input == "":
    continue
lck = lib.auto_lock()
current_call = make_call(input)
del lck

shutdown the library

transport = None
 acc.delete()
 acc = None
 lib.destroy()
 lib = None

Run

➜  ~ python simplecall.py sip:altanai@127.0.0.1

09:58:09.571        os_core_unix.c !pjlib 2.9 for POSIX initialized
09:58:09.573        sip_endpoint.c  .Creating endpoint instance…
09:58:09.574        pjlib  .select() I/O Queue created (0x7fcfe00590d8)
09:58:09.574        sip_endpoint.c  .Module "mod-msg-print" registered
09:58:09.574        sip_transport.c  .Transport manager created.
09:58:09.575        pjsua_core.c  .PJSUA state changed: NULL --> CREATED
09:58:10.073        pjsua_core.c  .pjsua version 2.9 for Darwin-18.7/x86_64 initialized
Call is  CALLING last code = 0 ()

Debug Help

Isssue1 Undefined symbols for architecture x86_64:
“_pjmedia_codec_opencore_amrnb_deinit”, referenced from:
_amr_encode_decode in mips_test.o
_create_stream_amr in mips_test.o
“_pjmedia_codec_opencore_amrnb_init”, referenced from:
_amr_encode_decode in mips_test.o
_create_stream_amr in mips_test.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [../bin/pjmedia-test-x86_64-apple-darwin18.7.0] Error 1
make[1]: *** [pjmedia-test-x86_64-apple-darwin18.7.0] Error 2
make: *** [all] Error 1
*Solution* Mac often throws this linker error. Follow

./configure --enable-shared --disable-static --enable-memalign-hack
make dep
make install
cd pjproject-2.9/pjsip-apps/src/python/
python setup.py install