EEP (formerly HEP): Extensible Encapsulation Protocol with HOMER

EEP duplicates an IP datagram, encapsulates it, and sends it to a remote collector for real-time monitoring of SIP-specific alerts and notifications. HEP is popular among many SIP servers including FreeSWITCH, OpenSIPS, Kamailio, and RTPengine (as an external module).

  • intended for passive duplication of traffic for remote collection
  • can be used for audit storage and analysis
  • does not alter the original datagram or headers

HOMER is a packet and event capture system popular for VoIP/RTC monitoring, based on HEP/EEP (Extensible Encapsulation Protocol).
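On the wire, HEP v3 frames the duplicated packet as a "HEP3" header followed by typed chunks (IP family, addresses, ports, timestamps, the SIP payload). Below is a minimal Python sketch of that encoding; the chunk type IDs follow the public HEP3 specification as I recall it, and the capture ID and addresses are illustrative, so verify against your collector before relying on it:

```python
import socket
import struct
import time

def hep3_chunk(chunk_type, payload, vendor=0x0000):
    # generic chunk: vendor-id(2) + type-id(2) + length(2, incl. 6-byte header) + payload
    return struct.pack("!HHH", vendor, chunk_type, 6 + len(payload)) + payload

def hep3_packet(src_ip, dst_ip, src_port, dst_port, sip_msg, capture_id=2001):
    now = time.time()
    chunks = b"".join([
        hep3_chunk(0x0001, struct.pack("!B", 2)),                   # IP family: AF_INET
        hep3_chunk(0x0002, struct.pack("!B", 17)),                  # transport protocol: UDP
        hep3_chunk(0x0003, socket.inet_aton(src_ip)),               # source IPv4
        hep3_chunk(0x0004, socket.inet_aton(dst_ip)),               # destination IPv4
        hep3_chunk(0x0007, struct.pack("!H", src_port)),            # source port
        hep3_chunk(0x0008, struct.pack("!H", dst_port)),            # destination port
        hep3_chunk(0x0009, struct.pack("!I", int(now))),            # timestamp (seconds)
        hep3_chunk(0x000a, struct.pack("!I", int(now % 1 * 1e6))),  # timestamp (microseconds)
        hep3_chunk(0x000b, struct.pack("!B", 1)),                   # payload protocol type: SIP
        hep3_chunk(0x000c, struct.pack("!I", capture_id)),          # capture agent ID
        hep3_chunk(0x000f, sip_msg.encode()),                       # the duplicated SIP message
    ])
    # packet header: "HEP3" magic + total length (header included)
    return b"HEP3" + struct.pack("!H", 6 + len(chunks)) + chunks

pkt = hep3_packet("10.0.0.1", "10.0.0.2", 5060, 5060,
                  "OPTIONS sip:ping SIP/2.0\r\n\r\n")
# the resulting datagram would then be sent over UDP to the HOMER capture port (e.g. 9060)
```

The original packet itself is never modified; the copy travels inside the 0x000f payload chunk.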

SIP Server Integration

Integrating HOMER and the HOMER Encapsulation Protocol (HEP) with a SIP server brings SIP/SDP payload retention with precise timestamping, better monitoring and anomaly detection in call traffic, correlation of sessions, logs and reports, as well as charts and statistics for SIP and RTP/RTCP packets. We cover the sipcapture and siptrace modules in the project sipcapture_siptrace_hep.

Kamailio and OpenSIPS HEP integrations are structurally similar. In Kamailio, the SIPCAPTURE [2] module enables support for –

● Monitoring/mirroring port
● HEP encapsulation protocol mode (HEP v1, v2, v3)

Figure: OpenSIPS capturing

Figure: OpenSIPS integration with an external capturing agent via a proxy agent (which can be HOMER)

To achieve that, load and configure the SipCapture module in the routing script.

Snippets from the Kamailio HOMER Docker installation as a collector

git clone
cd homer-docker
docker-compose build
docker-compose up

Output snippets from the screen while the installation takes place

Creating network "homer-docker_default" with the default driver
Creating volume "homer-docker_homer-data-semaphore" with default driver
Creating volume "homer-docker_homer-data-mysql" with default driver
Creating volume "homer-docker_homer-data-dashboard" with default driver
Pulling mysql (mysql:5.6)...
5.6: Pulling from library/mysql
Creating mysql ... done
Creating homer-webapp   ... done
Creating homer-cron      ... done
Creating homer-kamailio  ... done
Creating bootstrap-mysql ... done
Attaching to mysql, homer-webapp, bootstrap-mysql, homer-cron, homer-kamailio
homer-webapp | Homer web app, waiting for MySQL
homer-cron   | Homer cron container, waiting for MySQL
homer-kamailio | Kamailio, waiting for MySQL
bootstrap-mysql | Mysql is now running.
bootstrap-mysql | Beginning initial data load....
bootstrap-mysql | Creating Databases...
bootstrap-mysql | Creating Tables...
homer-kamailio | Kamailio container detected MySQL is running & bootstrapped
homer-kamailio |  0(22) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(22) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | config file ok, exiting...
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module
homer-kamailio |  0(23) WARNING: <core> [core/socket_info.c:1315]: fix_hostname(): could not rev. resolve
homer-kamailio | loading modules under config path: //usr/lib/x86_64-linux-gnu/kamailio/modules/
homer-kamailio | Listening on 
homer-kamailio |              udp:
homer-kamailio | Aliases: 
homer-kamailio | 
homer-kamailio |  0(23) INFO: sipcapture [sipcapture.c:480]: parse_table_names(): INFO: table name:sip_capture
homer-webapp | Homer web app container detected MySQL is running & bootstrapped
homer-webapp | Module php5 already enabled

Capture tools

Dialog module

Storing dialogs in a MySQL DB requires initialising MySQL:

#!define WITH_MYSQL
#!ifdef WITH_MYSQL
loadmodule ""
#!ifdef WITH_MYSQL
# - database URL - used to connect to database server by modules such
#       as: auth_db, acc, usrloc, a.s.o.
#!ifndef DBURL
#!define DBURL "mysql://root:kamailio@localhost/kamailio"
loadmodule ""
# ----- dialog params ------
modparam("dialog", "dlg_flag", 10)
modparam("dialog", "track_cseq_updates", 0)
modparam("dialog", "dlg_match_mode", 2)
modparam("dialog", "timeout_avp", "$avp(i:10)")
modparam("dialog", "enable_stats", 1)
modparam("dialog", "db_url", DBURL)
modparam("dialog", "db_mode", 1)
modparam("dialog", "db_update_period", 120)
modparam("dialog", "table_name", "dialog")

Setting db_mode – synchronisation of dialog information from memory to an underlying database has the following options:
0 – NO_DB – the memory content is not flushed into DB;
1 – REALTIME – any dialog information changes will be reflected into the database immediately.
2 – DELAYED – the dialog information changes will be flushed into DB periodically, based on a timer routine.
3 – SHUTDOWN – the dialog information will be flushed into DB only at shutdown – no runtime updates.

Note:

  • use the same hash_size when using a different Kamailio instance to restore dialogs

Database table for dialog

  1. install MySQL
  2. define root (with DB create permissions) and user (with database read/write permissions) in kamctlrc
vi /usr/local/etc/kamailio/kamctlrc
  • Dialog table schema
| name | type | size | default | null | key | extra attributes | description |
| id | unsigned int | 10 | | no | primary | autoincrement | unique ID |
| hash_entry | unsigned int | 10 | | no | | | Number of the hash entry in the dialog hash table |
| hash_id | unsigned int | 10 | | no | | | The ID on the hash entry |
| callid | string | 255 | | no | | | Call-ID of the dialog |
| from_uri | string | 128 | | no | | | URI of the From header (as per INVITE) |
| from_tag | string | 64 | | no | | | From tag; a dialog is identified by the Call-ID along with two tags, one from each participant |
| to_uri | string | 128 | | no | | | URI of the To header (as per INVITE) |
| to_tag | string | 64 | | no | | | To tag; a dialog is identified by the Call-ID along with two tags, one from each participant |
| caller_cseq | string | 20 | | no | | | Last CSeq number on the caller side |
| callee_cseq | string | 20 | | no | | | Last CSeq number on the callee side |
| caller_route_set | string | 512 | | yes | | | Route set on the caller side |
| callee_route_set | string | 512 | | yes | | | Route set on the callee side |
| caller_contact | string | 128 | | no | | | Caller's contact URI |
| callee_contact | string | 128 | | no | | | Callee's contact URI |
| caller_sock | string | 64 | | no | | | Local socket used to communicate with the caller |
| callee_sock | string | 64 | | no | | | Local socket used to communicate with the callee |
| state | unsigned int | 10 | | no | | | The state of the dialog |
| start_time | unsigned int | 10 | | no | | | The timestamp (Unix time) when the dialog was confirmed |
| timeout | unsigned int | 10 | 0 | no | | | The timestamp (Unix time) when the dialog will expire |
| sflags | unsigned int | 10 | 0 | no | | | The flags set for the dialog, accessible from the config file |
| iflags | unsigned int | 10 | 0 | no | | | The internal flags for the dialog |
| toroute_name | string | 32 | | yes | | | The name of the route to be executed at dialog timeout |
| req_uri | string | 128 | | no | | | The URI of the initial request in the dialog |
| xdata | string | 512 | | yes | | | Extra data associated with the dialog (e.g., serialized profiles) |

Siptrace module

The siptrace module offers the possibility to store incoming and outgoing SIP messages in a database and/or duplicate them to a capturing server (using HEP, the Homer Encapsulation Protocol, or plain SIP mode).

loadmodule ""
modparam("siptrace", "duplicate_uri", "sip:")
modparam("siptrace", "hep_mode_on", 1)
modparam("siptrace", "trace_to_database", 0)
modparam("siptrace", "trace_flag", 22)
modparam("siptrace", "trace_on", 1)

Integrate it with the request route to start duplicating the SIP messages.
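A minimal sketch of what that could look like in kamailio.cfg, assuming the module parameters above (the flag value must match the trace_flag modparam; adapt to your routing logic):

```
request_route {
    # ... normal request processing ...

    # duplicate this message to the capture server and
    # mark the transaction for tracing (matches trace_flag 22)
    sip_trace();
    setflag(22);

    # ... relay the request ...
}
```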


  • trace_mode –
1 – uses core events triggered when receiving or sending SIP traffic to mirror traffic to a SIP capture server using HEP
0 – no automatic mirroring of SIP traffic via HEP


duplicate_uri – the address, in the form of a SIP URI, where a duplicate of the traced message is sent. It always uses UDP.

modparam("siptrace", "duplicate_uri", "sip:")

To check the duplicated messages arriving:

ngrep -W byline -d any port 9060 -q

RPC commands

RPC commands can turn SIP tracing on or off:

kamcmd> siptrace.status on   

and to check

kamcmd> siptrace.status check

Store sip_trace in database

modparam("siptrace", "trace_to_database", 1)
modparam("siptrace", "db_url", DBURL)
modparam("siptrace", "table", "sip_trace")

where the sip_trace table description is:

| Field       | Type             | Null | Key | Default             | Extra          |
| id          | int(10) unsigned | NO   | PRI | NULL                | auto_increment |
| time_stamp  | datetime         | NO   | MUL | 2000-01-01 00:00:01 |                |
| time_us     | int(10) unsigned | NO   |     | 0                   |                |
| callid      | varchar(255)     | NO   | MUL |                     |                |
| traced_user | varchar(128)     | NO   | MUL |                     |                |
| msg         | mediumtext       | NO   |     | NULL                |                |
| method      | varchar(50)      | NO   |     |                     |                |
| status      | varchar(128)     | NO   |     |                     |                |
| fromip      | varchar(50)      | NO   | MUL |                     |                |
| toip        | varchar(50)      | NO   |     |                     |                |
| fromtag     | varchar(64)      | NO   |     |                     |                |
| totag       | varchar(64)      | NO   |     |                     |                |
| direction   | varchar(4)       | NO   |     |                     |                |

Sample database storage for SIP traces:

select * from sip_trace;

| id | time_stamp          | time_us | callid  | traced_user | msg         | method | status | fromip                   | toip                     | fromtag  | totag    | direction |
|  1 | 2019-07-18 09:00:18 |  417484 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | INVITE sip:altanai@sip_addr;transport=udp SIP/2.0
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport
Max-Forwards: 70
Contact: <sip:derek@call_addr:7086;transport=udp>
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Content-Type: application/sdp
Supported: replaces
User-Agent: Bria 3 release 3.5.5 stamp 71243
Content-Length: 214

o=- 1563440415743829 1 IN IP4 local_addr
s=Bria 3 release 3.5.5 stamp 71243
c=IN IP4 local_addr
t=0 0
m=audio 59814 RTP/AVP 9 8 0 101
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv                                                                                                                                                                                      | INVITE |        | udp:caller_addr:27982 | udp:sip_pvt_addr:5060   | de523549 |          | in        |

|  2 | 2019-07-18 09:00:18 |  421675 | MTlhY2VmNDdjN2QxZGM5ZDFhMWRhZThhZDU4YjE0MGM |             | SIP/2.0 100 trying -- your call is important to us
Via: SIP/2.0/UDP local_addr:25584;branch=z9hG4bK-d8754z-1f5a337092a84122-1---d8754z-;rport=27982;received=caller_addr
To: <sip:altanai@sip_addr>
From: <sip:derek@sip_addr>;tag=de523549
Server: kamailio (5.2.3 (x86_64/linux))
Content-Length: 0                                                                                                                                                                                                                                                                                                                                                                                                                                                           | ACK    |        | udp:caller_addr:27982 | udp:local_addr:5060   | de523549 | b2d8ad3f | in       |


Multi-Protocol Go HEP Capture Agent (heplify)

sudo tar -C /usr/local -xzf go1.11.2.linux-amd64.tar.gz

Move the package to /usr/local/go:

mv go 

Either add go bin to ~/.profile

export PATH=$PATH:/usr/local/go/bin

and apply

source ~/.profile

or set GOROOT and GOPATH:

export GOROOT=/usr/local/go
export GOPATH=$HOME/heplify
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

Installation of dependencies:

go get

Clone the heplify repo and run make.



A new OSS capture-agent framework suitable for SIP, XMPP and more. With internal method filtering, encryption and authentication it does look very promising; however, since I have not personally tried it yet, I will leave this space TBD for the future.


Others include sipgrep, HEPipe and nProbe.


HEPop is a multi-protocol HEP server and switch in Node.js: a stand-alone HEP capture server designed for HOMER7, capable of emitting indexed datasets and tagged timeseries to multiple backends.

node hepop.js -c /app/myconfig.js

PCAP monitoring -> Homer Server -> Notification and Fraud Prevention

A real-time monitoring and alerting setup from HOMER can safeguard against VoIP-specific attacks and suspicious activity through early warning. Attacks such as DDoS, SIP SQL injection, parser exploits, remote manipulation and hijacking, as well as resource enumeration, are common for a cloud telephony provider.

Additionally, HOMER provides session quality metrics using variables that include [1]:

SD = Session Defects

ISA = Ineffective Session Attempts

AHR = Average HOP Requests

ASR = Answer Seizure Ratio
[(‘200’ / (INVITES – AUTH – SUM(3XX))) * 100]

NER = Network Efficiency Ratio
[(‘200’ + (‘486′,’487′,’603’) / (INVITES -AUTH-(SUM(30x)) * 100]
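The ASR and NER formulas above can be sketched as plain functions. This is a hedged example: the counter names are illustrative, and exactly which responses are excluded depends on how your collector aggregates them.

```python
# Sketch of the ASR / NER formulas quoted above, computed from
# per-period SIP response counters (counter names are illustrative).
def asr(ok_200, invites, auth_challenges, redirects_3xx):
    """Answer Seizure Ratio: answered calls over effective attempts."""
    attempts = invites - auth_challenges - redirects_3xx
    return ok_200 / attempts * 100

def ner(ok_200, busy_486, cancelled_487, declined_603,
        invites, auth_challenges, redirects_3xx):
    """Network Efficiency Ratio: busy/cancel/decline count as
    network-successful outcomes alongside 200 OK."""
    attempts = invites - auth_challenges - redirects_3xx
    return (ok_200 + busy_486 + cancelled_487 + declined_603) / attempts * 100

# e.g. 1000 INVITEs of which 400 are auth retries and 0 redirects:
# 480 answered gives ASR 80%; adding 60 busy/cancel/decline gives NER 90%
```

A low ASR with a normal NER usually points at user behaviour (unanswered or declined calls) rather than network failure.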

HOMER Web Interface or Custom Dashboard

Some more visualizations for inter-team communication, such as for a NOC team, can include:

HOMER integration with InfluxDB

Install the real-time time-series DB:

sudo dpkg -i influxdb_1.7.7_amd64.deb


 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

2019-07-19T07:03:04.603494Z	info	InfluxDB starting	{"log_id": "0GjGVvbW000", "version": "1.7.7", "branch": "1.7", "commit": "f8fdf652f348fc9980997fe1c972e2b79ddd13b0"}
2019-07-19T07:03:04.603756Z	info	Go runtime	{"log_id": "0GjGVvbW000", "version": "go1.11", "maxprocs": 1}
2019-07-19T07:03:04.707567Z	info	Using data dir	{"log_id": "0GjGVvbW000", "service": "store", "path": "/var/lib/influxdb/data"}

HOMER integration with Grafana


For Kamailio integration follow github instructions on

References:


[2] HEP/EEP –

[3] kamailio sipdump module –


[5] HOMER Big Data –

Energy Efficient VoIP systems

Data centres are the concentrated processing units of the amazing Internet that is driving the technological innovation of our generation and has become the backbone of our global economy. Data centres not only process, store and carry textual data; a vast amount of computing is for multimedia content, which could range from social media to video streaming or VoIP calls. In this article let us analyze the energy efficiency, carbon footprint and scope of improvements for a VoIP-related data centre which hosts SIP and related RTC signalling and media servers and processes CDRs and/or media files for playback or recording.

Just like in a regular IT data centre, storage, computing power and network capacity define the usage of the server. Uninterrupted electricity is also of paramount importance, as any blackout could drop ongoing calls and lead to loss of revenue for the service provider, not to forget the loss caused to the parties engaged in the call.

Increasing power consumption by the telecom sector over the years

Global PPA (power purchase agreement) volumes by sector, 2009-2019. IEA, Paris
source: CDP, The 3% Solution, 2013 [19]

Typical VoIP Setup: whether on a cloud infrastructure provider or in a hosted data centre, approximately 7 servers are required even for an SME (small to medium enterprise) communication and VoIP system:

  • 2 signalling servers, primary and standby, for HA
  • 2 media servers for MCU, media bridges or IVR playback etc.
  • 1 for CDRs, logs, call analytics, stats and other supplementary operations
  • 1 for the dev or engineering team
  • 1 edge server, which could be an API server, a gateway or a load balancer
Sample VoIP system

VoIP solutions are more energy expensive, unless aggressive power saving schemes are in place

Comparison of energy efficiency in PSTN and VoIP systems [14]

While PSTN and other hybrid scenarios relied on audio-only communication, great pains were taken to make the embedded systems involved energy efficient, which is not really the case with all-digital, software-based VoIP.

Power Consumption

Mobile phone: a typical smartphone with a 4,000 mAh (4 Ah) battery gets 1 full charge cycle a day. Daily consumption = 4 Ah * 3.7 V = 14.8 Wh.

Laptop: with a 14-15″ screen, a laptop can draw 60 watts in active use depending on the model. Running 8 hours a day gives 60 * 8 = 480 Wh (0.480 kWh) of energy consumed in a day.

Desktop PC: running on 50-60 Hz mains, a desktop can draw up to 200 W in active use. For 8 hours, energy usage is 200 * 8 = 1,600 Wh (1.6 kWh) a day.

Server: even though servers are invisible to the request maker, they cater to the request at the other end of the internet.

| Server | Purpose | Server CPU consumption | Clients | Client CPU consumption |
| Application | Hosts an application, which can be run through a web browser or customized client software. | medium | Any network device with access. | low |
| Computing | Makes available CPU and memory to the client. This type of server might be a supercomputer or mainframe. | high | Any networked computer that requires more CPU power and RAM to complete an activity. | medium |
| Database | Maintains and provides access to any database. | low | Any form of software that requires access to structured data. | low |
| File | Makes available shared files and folders across a network. | medium | Any client that needs access to shared resources. | low |
| Game | Provisions a multiplayer game environment. | high | Personal computers, tablets, smartphones, or game consoles. | high |
| Mail | Hosts your email and makes it available across the network. | medium | Users of email applications. | low |
| Media | Enables media streaming of digital video or audio over a network. | high | Web and mobile applications. | high |
| Print | Shares printers over a network. | low | Any device that needs to print. | low |
| Web | Hosts webpages either on the internet or on private internal networks. | medium | Any device with a browser. | medium |
CPU consumption of various server types and their clients

A server typically runs at 850 W, consuming 0.850 kWh of energy in an hour, and since servers are usually up 24*7 that totals

0.850 * 24 = 20.4 kWh a day [2].

VoIP System (7 VMs): for a setup of 7 VMs (possibly on the same PM), the total energy consumed in a day is

20.4 * 7 = 142.8 kWh.

Data centre: the data centre building consists of the infrastructure to support the servers, disks and networking equipment it contains. However, for simplicity, I will only use the consumption of the servers and ignore the cooling units, networking, backup battery charging, generators, lighting, fire suppression, maintenance etc.

A high-tier DC can have 100 megawatts of capacity, with each 52U rack using 25 kW of power. 100,000 kW / 25 kW = 4,000 racks * 52 (U) = 208,000 1U servers. This number scales down depending on how much energy each server uses and on idle servers.

Total energy: 100,000 kW * 24 hours = 2,400,000 kWh a day.
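The back-of-the-envelope arithmetic above can be collected in a few lines, with the figures exactly as assumed in the text:

```python
# Daily energy estimates, using the assumptions stated in the article.
HOURS_PER_DAY = 24

laptop_kwh  = 60 / 1000 * 8            # 60 W for 8 h  -> 0.48 kWh/day
desktop_kwh = 200 / 1000 * 8           # 200 W for 8 h -> 1.6 kWh/day
server_kwh  = 0.850 * HOURS_PER_DAY    # 850 W, always on -> 20.4 kWh/day
voip_kwh    = server_kwh * 7           # 7 VMs/servers -> 142.8 kWh/day
dc_kwh      = 100_000 * HOURS_PER_DAY  # 100 MW facility -> 2,400,000 kWh/day

racks      = 100_000 // 25             # 4,000 racks of 25 kW each
servers_1u = racks * 52                # 208,000 1U servers in 52U racks
```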

Carbon Footprint

Carbon footprint, in the context of this article, refers to the amount of greenhouse gas (consisting mostly of CO2) caused by electricity consumption. The unit is the carbon emission equivalent of the electricity consumed, in kg CO2 per kWh.

In doing this calculation I have assumed 0.233 kg CO2 per kWh which could be less or more depending on the generation profile of the electricity provider as well as the heat produced by the machine.

Laptop: aside from production, which could account for 61.4 kg (135.5 lbs) of CO2, a 60 W laptop will produce about 0.112 kg CO2-eq per day.

Desktop PC: aside from production cost and heating, the GHG and CO2-eq emission from running a desktop for a day (8 hours) is 1.6 * 0.233 = 0.3728 kg CO2 per day.

Server: 20.4 * 0.233 = 4.7532 kg CO2 per day.

VoIP System (7 VMs): again ignoring the GHG emissions of associated components, 142.8 * 0.233 = 33.2724 kg CO2 per day. It is to be noted that DCs (data centres) use the term PUE (Power Usage Effectiveness) to showcase their energy efficiency, and energy efficiency certification uses the same in ratings.

Data centre: the electrical carbon footprint (an approximate calculation not counting cooling, infrastructure maintenance, lighting and possibly idle servers in the data centre) is 2,400,000 * 0.233 = 559,200 kg CO2 per day.

It is to be noted that a common figure should not be extrapolated like this to derive carbon emission. The emission depends on the fuel mix of the electricity generation as well as the life cycle assessment (LCA) of carbon-equivalent emission. Countries with heavy reliance on renewables have a lower CO2 footprint per kWh, ~0.013 kg CO2 per kWh in Sweden, while others may be higher, such as 0.819 kg CO2 per kWh in Estonia [1].
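As a sketch, the footprint numbers above are just the daily kWh figures multiplied by a grid emission factor; varying the factor shows how strongly the fuel mix dominates the result:

```python
# Daily CO2 from daily kWh times a grid emission factor (kg CO2/kWh).
# 0.233 is the article's assumed average; Sweden (~0.013) and
# Estonia (~0.819) roughly bracket the range [1].
def daily_co2(kwh_per_day, factor=0.233):
    return kwh_per_day * factor

server_co2 = daily_co2(20.4)         # ~4.75 kg CO2/day
voip_co2   = daily_co2(142.8)        # ~33.27 kg CO2/day for the 7-VM setup
dc_co2     = daily_co2(2_400_000)    # ~559,200 kg CO2/day for the 100 MW DC
sweden_srv = daily_co2(20.4, 0.013)  # the same server on a cleaner grid
```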

Flatten the Curve from Tech and Internet usage

Rack servers tend to be the main perpetrators of wasted energy and represent the largest portion of the IT energy load in a typical data center.

A decade ago, small enterprise IT facilities were quick to create data centres for hosting applications from hospitals, banks and insurance companies. While some of these are likely to have been upgraded to shared server instances running on IaaS providers, many are still serving traffic or remain in place for lack of effort to upgrade.

With advancements in P2P technologies such as dApps, the Bitcoin network, P2P WebRTC streaming and more edge-computed ML continuing to disrupt existing trends, consumption will most likely increase manyfold.

According to the Cambridge Center for Alternative Finance (CCAF), Bitcoin currently consumes around 110 Terawatt Hours per year — 0.55% of global electricity production

Harvard Business Review [12]

“the emissions generated by watching 30 minutes of Netflix (1.6 kg of CO2) is the same as driving almost four miles.” 

EnergyInnovation [13]

Cloud Computing and Energy efficiency

Cloud computing (SaaS, PaaS, IaaS and also CPaaS) minimizes power consumption, and consequently IT costs, via virtualization, clustering and dynamic configuration.

Cloud infrastructure vendors such as Amazon, Google and Microsoft, with their adoption of energy-efficient computing and credible transparency, have alleviated some of the stress that would have been created if on-site self-hosted data centres were still as mainstream as a decade ago.

Even as cloud providers give on-demand access to shared resources in large-scale distributed computing, the ease of getting on board has in turn created a surge in cloud-hosted online applications, and consequently higher power consumption, operating costs and CO2 emissions.

Components of energy consumption in a data centre

As shown, CPU, memory and storage incur 45% of the costs but consume 26% of the total energy, whereas power distribution and cooling cost 25% yet consume >50% of the total energy.

Energy forecast for data centres

As reported by Nature [3], widely cited forecasts suggest that the total electricity demand of ICT (Information and Communication Technology) will accelerate, and while consumer devices such as smart TVs, laptops and mobiles are becoming energy efficient, data centres and network devices will demand bigger portions. As reported in 2018, 200 TWh (terawatt hours) of energy was being consumed by data centres. Although there are no figures for telecom or specifically for IP cloud telephony, the enormous multimedia data flowing in every session is enough to assume the figure must be huge.

Energy efficiency in data centres has also been the subject of many papers and studies. Many tech advancements and measures have so far kept the growth in the tech sector's energy requirements linear/flat.

Past and projected growth rate of total US data center energy use from 2000 until 2020, also illustrating how much faster data center energy use would grow if the industry, hypothetically, did not make any further efficiency improvements after 2010. (Source: US Department of Energy, Lawrence Berkeley National Laboratory)

Some noteworthy innovations made in data centres for energy efficiency include –

  1. Energy Star efficiency requirements
  • Average server utilization
  • Server power scaling at low utilization
  • Average power draw of hard disk drives
  • Average power draw of network ports
  • Average infrastructure efficiency (i.e., PUE)

PUE = Total Facility power / IT equipment power

Standard: 2.0, Good: 1.4, Better: 1.1

A low PUE indicates greater efficiency, since more of the power is then used by the IT gear. Ideally 1 would be the perfect score, where all power is used only by the IT gear.

2. Optimizing the cooling system, which takes a lot of focus, is also not touched upon here, but it can be understood in great detail from many sources, including one on how Google uses AI for cooling its data centres [6].

3. Throttle-down drives: a mechanism that reduces energy consumption on idle processors, so that when a server is running at its typical 20% utilization it is not drawing full power.

Energy efficiency is vital not only to productivity and performance but also to a carbon-neutral tech sector and economy. There is ample scope for designing energy-efficient applications and platforms. Some approaches are described below:

Energy Efficiency in VoIP Architecture and design

Low energy consumption not only lowers operating costs but also helps the environment by reducing carbon emissions.

1. Server Virtualization

Consolidating multiple independent servers onto a single underlying physical server retains the logical separation while containing energy costs and maximizing utilization. VMs (virtual machines) are instances of virtualized portions of the same server and can be independently accessed using their own IP and network settings.

To reduce electricity usage in our labs and data centers, we use smart power distribution units to monitor our lab equipment. We increase server utilization by using virtual machines. Our Cisco Customer Experience labs use a check-in, check-out system of automation pods to allow lab employees to set up configurations virtually and then release equipment when they are finished with it.

Cisco 2020 Environment Technical Review [20]

Models to place VMs on PMs (physical machines) have been proposed by Dong et al. [8], Huang [9], and Tian et al. [10].

2. Decommissioning old/outdated servers

While this is the most obvious way to increase efficiency, it is also the toughest, since a legacy application, or a small portion of one, may be running on a server that service providers are not keen on updating, or updates do not exist and it is past end of life yet somehow still in use. It is important to identify such components. Check if an old GlassFish or BEA WebLogic SIP servlet server needs updating and/or migration!

3. Plan HA (high availability) efficiently

Redundant servers take only partial loads, if any at all, so they can be activated in full swing when failover happens on another server. With quick start-up times and forward-looking monitoring, analyzers can watch logs for upcoming failures or predictable downtime, and infra scripts can bring up pre-designed containers in seconds if not minutes. It is not wise to create more than 1 standby server which does no essential work but consumes as much power.

4. Consolidate individual applications on a server

Map the maximum predictable load and deduce the percentage consumption from it. In view of these figures it is best to consolidate application servers onto a single server. A distributed microservice-based architecture can also support consolidation by running each major application in its own Docker container. Consolidation ensures that

  • All data can be stored and accessed centrally, which reduces the likelihood of data duplication.
  • while a server is drawing full power, it is also showing reliable utilization.
  • there is a single point to prevent intrusion, provide security and fix vulnerabilities against malware (ransomware, viruses, spyware, trojans)

5. Reduce redundancy

While it is common practice to store multiple copies of data such as CDRs (call detail records) and to archive historical logs for later auditing, it is not the most energy-efficient way, since it ends up wasting storage space. It is in fact a better approach to keep only the critical parts, discard the rest, and definitely to run background tasks that compress older and less-referenced logs.

6. Power management

Powering down idle servers or putting unused servers to sleep is an effective way to reduce operating power, but it is often ignored by IT departments for fear of slower performance and loss of call continuity if a server does go down. However, power management leads to real energy savings and should be weighed accordingly.

7. Common storage such as Network Attached Storage

Power consumption is roughly linear in the number of storage modules used. Storage redundancy needs to be right-sized to avoid rapid consumption of available storage space, the CPU cycles needed to reference and index it, and the associated power consumption [7].

The process of maximizing storage capacity utilization by drawing from a common pool of shared storage on a need basis also allows for flexibility.

It is sensible to take such data offline, reducing clutter on the production system, while keeping the existing data quickly retrievable.

8. Sharing other IT resources

Sharing Central Processing Units (CPUs), disk drives, and memory optimizes electrical usage. Short-term load shifting, combined with throttling resources up and down as demand dictates, improves long-term hardware energy efficiency. [7]

Hardware-based approaches such as Energy Star ratings, air conditioning, placement of server racks, air flow, cabling etc. have not been touched upon in this article; they can be read about in the Energy Star report [5].

9. DMZ / Perimeter network

The perimeter network (also known as DMZ, demilitarized zone, or screened subnet) is a zone where resources and services accessible from outside the organization are available. It is often used as a barrier between the secure internal green zone within the company and outside partners/suppliers, such as external organization gateways. Typical DMZ components include:

  • Load balancers
  • API gateways
  • SBC ( Session Border controllers)
  • Media Gateways

Ways to cut down on CPU consumption in DMZ machines

  1. Scrutinize incoming traffic only; trust outgoing traffic.

2. Use hardware/network firewalls to monitor and block instead of software-defined ones. A hardware firewall can be a standalone physical device or form part of another device on your network. Physical devices like routers, for example, already have a built-in firewall.

Other types of firewalls

  • Application-layer firewalls can be a physical appliance, or software-based, like a plug-in or a filter. These types of firewalls target your applications. For example, they could affect how requests for HTTP connections are inspected across each of your applications.
  • Packet filtering firewalls scrutinize each data packet as it travels through your network. Based on rules you configure, they decide whether to block the specific packet or not. For example firewalls can block SSH/RDP for remote management.
  • Circuit-level firewalls check whether TCP and UDP connections across your network are valid before data is exchanged. For example, this type of firewall might first check whether the source and destination addresses, the user, the time, and date meet certain defined rules.
  • Proxy server firewalls secure the traffic into and out of a network by monitoring, filtering, and caching data requests to and from the network.
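The packet-filtering behaviour described above can be sketched as a tiny ordered rule matcher (the rule shape and function names here are hypothetical, not any real firewall's API):

```javascript
// Minimal sketch of packet-filter rule matching (hypothetical rule shape).
// Rules are evaluated in order; first match wins, default policy is "allow".
const rules = [
  { proto: "tcp", dstPort: 22, action: "block" },   // block SSH
  { proto: "tcp", dstPort: 3389, action: "block" }, // block RDP
];

function filterPacket(packet, ruleSet) {
  for (const rule of ruleSet) {
    if (rule.proto === packet.proto && rule.dstPort === packet.dstPort) {
      return rule.action;
    }
  }
  return "allow"; // default policy
}
```

A real packet filter would also match on source/destination addresses and direction, but the first-match-wins evaluation order is the essential idea.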

Energy Efficiency in VoIP Applications and algorithms

In theory, energy-efficient algorithms take less processing power, run fewer CPU cycles and consume less memory. For experiments with WebRTC and SIP VoIP systems, CPU performance can be a reliable factor to consider for carbon emissions. Here is a list of approaches for including energy as one of the parameters when programming RTC applications.

  1. Take advantage of multi-core architectures

Multi-core processor chips allow simultaneous processing of multiple tasks, which leads to higher efficiency. A shared power source and shared cooling also improve efficiency; it is the same logic as consolidating one power supply for a rack instead of an individual power supply for each server on the rack.
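As a minimal illustration of exploiting multiple cores, independent tasks can be partitioned into per-core groups (the names are hypothetical; in Node.js each group could then be handed to a worker thread):

```javascript
// Illustrative sketch: spread independent tasks round-robin across cores.
// In Node.js, each returned group could be dispatched to a worker_threads
// Worker so the cores are kept busy in parallel.
function partitionTasks(tasks, coreCount) {
  const groups = Array.from({ length: coreCount }, () => []);
  tasks.forEach((task, i) => groups[i % coreCount].push(task));
  return groups;
}
```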

2. Reduce Buffering

Input/output buffers pile up computed packets or blocks which may come into use in the near future but may be discarded altogether in the event of a skip or shutdown. For example, in the case of Video on Demand (VoD), a buffered video of 1 hour is of not much use if the viewer decides to cancel the video session after 10 minutes.
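A simple way to limit this waste is to cap how far ahead the buffer may run; a minimal sketch with illustrative names, not a real player API:

```javascript
// Sketch of a bounded prefetch buffer: capping how far ahead we buffer
// limits wasted work (and energy) if the viewer skips or quits early.
class BoundedPrefetchBuffer {
  constructor(maxChunks) {
    this.maxChunks = maxChunks;
    this.chunks = [];
  }
  // Refuses new chunks (returns false) once the cap is reached,
  // signalling the producer to pause prefetching.
  push(chunk) {
    if (this.chunks.length >= this.maxChunks) return false;
    this.chunks.push(chunk);
    return true;
  }
  // Consuming a chunk frees a slot for further prefetch.
  shift() {
    return this.chunks.shift();
  }
}
```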

3. Optimize memory access algorithms

4. Network energy management that varies with demand

The newer generations of network equipment pack more throughput per unit of power. There are active energy management measures that can also be applied to reduce energy usage as network demand varies. In a telecommunication system, a trade-off between power consumption and network performance is almost always made.

  1. Quick switching of the network speed to match the amount of data that is currently transmitted. A demand-following streaming session will maintain QoS and avoid imbalance while also reducing power consumption.

  2. Avoid sudden bursts and peaks and/or align them with energy availability.


Across hardware generations, efficiency improvements show up in:

  • computational performance (i.e., computations/second per server),
  • electrical efficiency of computations (i.e., computations per kWh),
  • storage capacity (i.e., TB per drive), and
  • port speeds (i.e., Gb per port)

5. Task Scheduling algorithms

Some recently researched frameworks and models take CO2 emission into perspective while allocating resources according to a queuing model. The most efficient ones bring down not only the carbon footprint but also the high operating cost [11].

Scheduling and monitoring techniques have been applied to achieve a cost-effective and power-aware cloud environment by reducing resource exploitation.
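One way such a scheduler can factor in CO2 is to defer deferrable batch jobs to the hour with the lowest forecast grid carbon intensity; a hedged sketch where the forecast shape and field names are assumptions:

```javascript
// Carbon-aware scheduling sketch: given an (hypothetical) hourly forecast
// of grid carbon intensity, pick the cleanest hour to run deferrable work.
function bestHour(forecast) {
  // forecast: array of { hour, gCO2PerKWh }
  return forecast.reduce((best, f) =>
    f.gCO2PerKWh < best.gCO2PerKWh ? f : best
  ).hour;
}
```

A production scheduler would combine this with deadlines and queue lengths, but the core decision is the same minimisation.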

6. Centralised operation – RTP topologies (Mesh, MCU and SFU)

Instead of operating many servers at low CPU utilization at the edge, near the client's end, this approach consolidates the processing power onto fewer servers that operate at higher utilization.

Modern machine learning programs are computationally intensive, and their integration into VoIP systems for tagging, sentiment analysis and voice quality analysis increasingly adds strain to the already heavy processing a media server performs for transcoding and multiplexing.

Media server using an SFU (Selective Forwarding Unit) to transmit media streams

As an example, in a five-party call an SFU client sends one upstream but receives 4 downstreams, which reduces the load on the server but increases it on the clients.
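The load trade-off between topologies can be put into rough numbers; this back-of-envelope helper (illustrative only) matches the example above, where an SFU client in a 5-party call sends 1 upstream and receives 4 downstreams:

```javascript
// Back-of-envelope stream counts per topology for an n-party call.
function streamCounts(n, topology) {
  switch (topology) {
    case "mesh": // every client exchanges media with every peer directly
      return { clientUp: n - 1, clientDown: n - 1, serverStreams: 0 };
    case "sfu":  // server receives n upstreams, forwards each to n - 1 peers
      return { clientUp: 1, clientDown: n - 1, serverStreams: n + n * (n - 1) };
    case "mcu":  // server mixes everything into one stream per client
      return { clientUp: 1, clientDown: 1, serverStreams: 2 * n };
    default:
      throw new Error("unknown topology");
  }
}
```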

7. Distributing workload based on server performance

Aggregating tasks and running them as serverless, asynchronous jobs instead of standalone processes is a very efficient way to cut down idle-running wastage. Additionally, categorizing workloads based on server performance can also reduce power consumption by using idle servers efficiently. Thermal-aware workload distribution also helps reduce power consumption and, consequently, the electricity consumed in cooling.

8. Reduce re-authentication and challenge-response mechanisms when they can be avoided.

There exist multiple modes to authenticate and authorize users' and applications' access to server content.

Over the network

  • password-based auth,
  • third-party based auth (OAuth),
  • 2-factor authentication (phone/SMS based),
  • multi-factor auth (SMS / email / other media),
  • token auth (custom USB device / smart card),
  • biometric auth (physical human characteristics / scanners),
  • transactional auth (location, hour of day, browser/machine type)

Computer recognition authentication

  • Single sign-on

Authentication protocols

  • Kerberos – Key Distribution Center (KDC) using a Ticket Granting Server (TGS)

A callflow involves AAA while creating the session and may require occasional re-authentication to reaffirm that the user is the intended one. Doing re-authentication too often increases power consumption; this can be countered by caching and timeout mechanisms.
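The caching-and-timeout counter-measure can be sketched as a small TTL cache of successful authentications (the names and the injected clock are illustrative):

```javascript
// Sketch: remember a successful auth result for a TTL so that every request
// does not trigger a fresh challenge-response round trip.
// `now` is injectable so the cache is easy to test deterministically.
function makeAuthCache(ttlMs, now = Date.now) {
  const cache = new Map(); // user -> expiry timestamp (ms)
  return {
    remember(user) { cache.set(user, now() + ttlMs); },
    // Fresh entries skip re-authentication; stale or unknown users don't.
    isFresh(user) {
      const expiry = cache.get(user);
      return expiry !== undefined && now() < expiry;
    },
  };
}
```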

Point of presence and handover based on carbon footprint in different geographies

  1. Include the carbon emission of the datacentre in consideration before engaging the server in the call path from the load balancer gateway.

  2. Choose the point of presence (PoP) for a server according to the carbon emission factor of its geography.

US states' carbon emission rates from electricity generation (2018 report). Source: [16]
UK greenhouse gas reporting. Source: [17]
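The PoP selection described above can be sketched as picking the candidate with the lowest regional emission factor (the regions and figures below are made up for illustration):

```javascript
// Illustrative: route a new call through the PoP whose regional grid has
// the lowest emission factor (kg CO2 per kWh). Figures are invented.
const pops = [
  { region: "region-a", emissionFactor: 0.45 },
  { region: "region-b", emissionFactor: 0.12 },
  { region: "region-c", emissionFactor: 0.3 },
];

function greenestPop(candidates) {
  return candidates.reduce((best, p) =>
    p.emissionFactor < best.emissionFactor ? p : best
  );
}
```

In practice this would be one weighted input alongside latency and capacity, not the sole routing criterion.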

Energy Efficiency in WebRTC browser applications and native applications

For video conferencing over the browser, WebRTC has emerged as the default standard. The efficiency of such WebRTC browser-based video conferencing web applications can be enhanced in the following ways:

1. Use VoIP Push Notifications to Avoid Persistent Connections

2. Voice activity detection (mute the spectators), and join with video true, audio false for attendees
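One way to apply this is to derive getUserMedia-style constraints from the participant's role, so spectators never capture or upload media at all (the role names are assumptions):

```javascript
// Sketch: role-based media constraints so non-speaking participants
// don't capture (and encode/upload) media they will never send.
function mediaConstraintsFor(role) {
  switch (role) {
    case "speaker":   return { video: true, audio: true };
    case "attendee":  return { video: true, audio: false };  // joins muted
    case "spectator": return { video: false, audio: false }; // receive only
    default:          return { video: false, audio: false };
  }
}
```

In a browser this object would be passed to `navigator.mediaDevices.getUserMedia(...)` for roles that capture media.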

Energy efficiency in VoIP phones

If all eligible VoIP phones sold in the United States were ENERGY STAR certified, the energy cost savings would grow to more than $65 million each year and 1.2 billion pounds of annual greenhouse gas emissions would be prevented, equivalent to the emissions from more than 119,000 vehicles.

Energy Star [15]

Low-energy-consuming embedded hardware on most phones keeps the average consumption low. An analog phone can consume between 0.07 W and 9.27 W, while a VoIP phone can consume 0.1 W to 3.5 W of standby power.

Off-mode power is often less than standby power, since the phone is in a low-power mode during idle hours such as night. According to Energy Star, the sound transmission mechanism also plays a key role, and hybrid phones consume more power.

Power allowance (W) for each of the below features of the device:

  • 1.0 watt for Gigabit Ethernet
  • 0.2 watt for Energy-Efficient Ethernet (IEEE 802.3az) compliant Gigabit Ethernet

Additional proxy incentive (W) for the ability to maintain network presence while in a low-power mode and intelligently wake when needed:

  • 0.3 watt for base capability
  • 0.5 watt for remote wake

Government bodies and groups to track Energy efficiency of Telecom and IP telephony

  • Alliance for Telecommunications Industry Solutions (ATIS) – defines the Telecommunications Energy Efficiency Ratio (TEER); its measurement method covers all power conversion and power distribution from the front end of the system to the data wire plug, including application-specific integrated circuits (ASICs).
  • European Telecommunications Standards Institute (ETSI)
  • International Telecommunication Union (ITU)
  • U.S. Department of Energy (DOE), Environmental Protection Agency (EPA)

External links

Amazon :

Cisco :

3CX :

The purpose of the article is to raise awareness about carbon footprint, from application programs to architecture design techniques to data centres and cumulative performance. It gives stakeholders (customers, programmers, architects, managers, …) a direction to choose the less carbon-emitting approach whenever possible, since every bit counts to help the environment.




[3] nature :

[4] Center of Expertise for Energy Efficiency in Data Centers at the US Department of Energy’s Lawrence Berkeley National Laboratory in Berkeley, California.

[5] Energy Star –



[8] Yin K, Wang S, Wang G, Cai Z, Chen Y. Optimizing deployment of VMs in cloud computing environment. In: Proceedings of the 3rd international conference on computer science and network technology. IEEE; 2013. p. 703–06.

[9] Huang W, Li X, Qian Z. An energy efficient virtual machine placement algorithm with balanced resource utilization. In: Proceedings of the seventh IEEE international conference on innovative mobile and internet services in ubiquitous computing; 2013. p. 313–19.

[10] W. Tian, C.S. Yeo, R. Xue, Y. Zhong, Power-aware scheduling of real-time virtual machines in cloud data centers considering fixed processing intervals. Proc IEEE, 1 (2012), pp. 269-273

[11] H. Chen, X. Zhu, H. Guo, J. Zhu, X. Qin, J. Wu, Towards energy-efficient scheduling for real-time tasks under uncertain cloud computing environment. J Syst Softw, 99 (2015), pp. 20-35



[14] F. Bota, F. Khuhawar, M. Mellia and M. Meo, “Comparison of energy efficiency in PSTN and VoIP systems,” 2012 Third International Conference on Future Systems: Where Energy, Computing and Communication Meet (e-Energy), 2012, pp. 1-4, doi: 10.1145/2208828.2208834.


[16] eGRID summary table 2018 for carbon emission rates in US states:

[17] UK greenhouse gas reporting –



[21] It’s Not Easy Being Green by Peter Xiang Gao, Andrew R. Curtis, Bernard Wong, S. Keshav
Cheriton School of Computer Sc

TeleMedicine and WebRTC

An anywhere, anytime telemedicine communication tool accessible on any device. The solution provides a lightweight signalling server which drops out as soon as the call is connected, thus ensuring absolutely private calls without relaying or involving any central server in any call-related data or media. This ensures doctor-patient details are not processed, stored or recorded by our servers.

The solution enables doctors, nurses, medical practitioners and patients to do:

  • High definition Audio/video calls 
  • End to end encrypted p2p chats 
  • Integration with HMS ( hospital management system ) to fetch history of the patients 
  • Screens sharing to show reports without transferring them as files 
  • Include more concerned people (e.g. consulting doctors) using the mesh-based peer-to-peer conferencing feature.

Confidentiality and Privacy

For the privacy and security of certain health information, only HIPAA (Health Insurance Portability and Accountability Act of 1996) compliant video-conferencing tools can be used for telemedicine in the US.

Telemedicine scenario Callflow

Callflow for attended call transfer and a 2-way conference in a telemedicine scenario between a patient, hospital attendant, doctor and a nurse.

References :

Performance of WebRTC sites and electron apps

This post is about making performance enhancements to a WebRTC app so that it can be used in areas which require sensitive data to be communicated, cannot afford downtime, need fast response and low RTT, and need to be secure enough to withstand hacks and attacks.

Best practices for WebRTC single page applications

As communication clients become single-page driven, a lot of authentication, heartbeat sync, web workers and signalling event-driven flow management resides on the same page, along with the actual CPU consumption for the audio/video resources and media streams. This in turn can make the webpage heavy and many a time results in a crash due to the page being "unresponsive".

Here are some of my best to-dos for making sure a WebRTC communication client page runs efficiently.

Visual stability and CLS (Cumulative Layout Shift)

The CLS metric measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.

To provide a good user interaction experience, the DOM elements should display as little movement as possible so that the page appears stable. In the opposite case, on a flickering page (maybe due to a notification DOM element dynamically pushing the other layout elements), it is difficult to precisely interact with page elements such as buttons.
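In simplified form, a CLS score aggregates layout-shift entries like this (mirroring the entry shape reported by a `layout-shift` PerformanceObserver; note the browser metric has since moved to session-window aggregation, so this is the original, simpler sum):

```javascript
// Simplified CLS aggregation: sum layout-shift entry values, skipping
// shifts that happened right after user input (as the browser does).
function cumulativeLayoutShift(entries) {
  return entries
    .filter((e) => !e.hadRecentInput)
    .reduce((sum, e) => sum + e.value, 0);
}
```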

Minimize main thread work

The main thread is where a browser process runs all the JavaScript in your page, as well as performing layout, reflows, and garbage collection. Therefore, long JavaScript tasks can block the thread and make the page unresponsive.

Deprecation of XMLHttpRequest on the main thread

Reduce JavaScript execution time

Unoptimized JS code takes longer to execute and impacts network, parse/compile and memory cost.

If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.

Some effective tips for speeding up JS execution include:

  • Minifying and compressing code
  • Removing unused code and console.logs
  • Applying caching to save lookup time
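The caching tip, for instance, can be as simple as memoizing a pure but expensive function:

```javascript
// Memoization sketch: cache results of a pure, expensive function so
// repeated calls with the same argument skip recomputation entirely.
function memoize(fn) {
  const cache = new Map();
  return (arg) => {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg);
  };
}
```

This only pays off for pure functions; memoizing anything with side effects or unbounded argument spaces trades correctness or memory for speed.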



Security Concerns

In one of my previous posts I wrote about security threats to a WebRTC solution. It covers the main 4 ways in which WebRTC solution providers and users are vulnerable:

  1. Identity Management ,
  2. Browser Security ,
  3. Authentication and
  4. Media encryption.

WebRTC Security

  • Identity Management
  • Browser Security
  • Authentication
  • Media encryption
  • Browser Threat Model
  • Best practices for WebRTC comm agents
  • ICE/TURN challenges

Cookies – security vs persistent state

Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.

While adding cookies we must ensure that if SameSite=None, the cookie is also marked Secure:

Set-Cookie: widget_session=abc123; SameSite=None; Secure

With SameSite set to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser's URL bar.

Set-Cookie: promo_shown=1; SameSite=Strict

You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.

Performance monitoring

Key Performance Indicators (KPIs) are used to evaluate the performance of a website. It is critical that a WebRTC web page be lightweight, to accommodate the signalling control stack JavaScript libs used for offer/answer handling and for communicating with the signaller over open sockets or a long-polling mechanism.

Lighthouse results

The Lighthouse tab in Chrome developer tools shows relevant areas of improvement on the webpage across Performance, Accessibility, Best Practices, Search Engine Optimization and Progressive Web App.

It also shows individual categories and comments.

Time to render and Page load

Page attributes under Chrome developer tools depict the page load and rendering time for every element, including scripts and markup. Specifically it has:

  • Time to Title
  • Time to render
  • Time to interact

Networking attributes are to be configured based on DNS mapping and the host provider. These can be evaluated based on Chrome developer tool reports.

Task interaction time

Other page interaction criteria include the frames, their interaction and the timings for the same.

In the attached screenshot, see the loading tasks, which basically depict the delay caused by DOM elements under transitions owing to user interaction. This should ideally be minimal, to keep the page responsive.

Page’s total memory



The above functions (old and new) estimate the memory usage of the entire web page.

These calls can be used to correlate new JS code with its impact on memory and subsequently find any memory leaks. These memory metrics can also be used for A/B testing.

DNS lookup Time

Services such as Pingdom or WebPageTest can quickly calculate your website's DNS lookup times.

Load/stress testing can be performed with tools such as LoadStorm and JMeter.

Page weight and PRPL

Loading assets over a CDN, minifying scripts and reducing the overall weight of the page are good ways to keep the page light and active and prevent any Chrome tab crashes.

PRPL expands to Push/preload, Render, Pre-cache, Lazy load:

  • Push (or preload) the most critical resources.
  • Render the initial route as soon as possible.
  • Pre-cache remaining assets.
  • Lazy load other routes and non-critical assets.

Preload is a declarative fetch request that tells the browser to request a resource as soon as possible. Hence it should be used for critical assets.

<link rel="preload" as="script" href="critical.js">

The non-critical components can then be loaded asynchronously.

Lazy loading must be used for large files, like JS payloads, which are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.

Web Workers

Web Workers are a simple means for web content to run scripts in background threads. The Worker interface spawns real OS-level threads.

By acting as a proxy, service workers can fetch assets directly from the cache rather than the server on repeat visits. 

Conversion Rates over analytics Tools

Google Analytics is a good way of deducing conversion and bounce rates.

Codecs weight

Video codecs: VP8 vs VP9

SFU vs MCU or p2p media flow

References :


In the course of the evolution of RAN (Radio Access Network) technologies, 5G succeeds 4G (2010), which came after 3G (2000), 2.5G, 2G (1990) and 1G/PSTN (1980) respectively. Among the most striking features of 5G are:

  • entirely IP based
  • ability to connect 100x more devices (IoT favourable)
  • speeds up to 10 Gbit/s
  • high peak bit rate
  • high data volume per unit area
  • virtually zero latency, hence fast response times

Thus it can accommodate the rapid growth of rich multimedia applications like OTT streaming of HD content, gaming, augmented reality and so on, while enabling devices connected to the Internet of Things to onboard the telecommunication backbone with high system spectral efficiency and ubiquitous connectivity.

In fact, 5G saw maximum investment in 2020 for revamping infrastructure as compared to other technologies such as IoT or even cloud. This could be partly due to the steep rise in high-speed communication for streaming and remote communication, owing to the sharp growth of remote learning and working-from-home scenarios.

img source statista – global-telecom-industry-priority-investment-areas


5G is specified to operate over range 1 GHz to 100 GHz.

  • Low-band spectrum (below 2.5 GHz) – excellent coverage
  • Mid-band spectrum (2.5–10 GHz) – a combination of good coverage and very high bitrates
  • High-band spectrum (10–100 GHz) – the bandwidths needed for the highest bitrates (up to 20 Gb/s) and lowest latencies

Workplan for 5G standardisation and release

The Workplan started in 2014 and is ongoing as of now (2018)

image source : 3GPP “Getting ready for 5G”

3GPP is the standards-defining body for telecom and has previously specified almost all RAN technologies, like GSM, GPRS, W-CDMA, UMTS, EDGE, HSPA and LTE.

Applications of 5G

5G targets three main use cases:

  • enhanced mobile broadband (eMBB),
  • massive machine type communications (mMTC)
  • ultra-reliable low latency communications (URLLC) (also called critical machine type communications (cMTC))
Source: Ericsson whitepaper

General Data Protection Regulation (GDPR) in VoIP

GDPR, Europe’s digital privacy legislation passed in 2018, replaces the 1995 EU Data Protection Directive. It is a set of rules designed to give EU citizens more control over their personal data and strengthen privacy rights. It aims to simplify the regulatory environment for business and citizens.

To read about other certificates, compliances and security in VoIP, see the summary covering:

  • HIPAA (Health Insurance Portability and Accountability Act) ,
  • SOX( Sarbanes Oxley Act of 2002),
  • Privacy Related Compliance certificates like COPPA (Children’s Online Privacy Protection Act ) of 1998,
  • CPNI (Customer Proprietary Network Information) 2007,
  • GDPR (General Data Protection Regulation)  in European Union 2018,
  • California Consumer Privacy Act (CCPA) 2019,
  • Personal Data Protection Bill (PDP) – India 2018 and
  • also specifications against Robocalls and SPIT ( SPAM over Internet Telephony) among others

Multinational companies will predominantly be regulated by the supervisory authority where they have their “main establishment” or headquarter. However, the issue concerning GDPR is that it not only applies to any organisation operating within the EU, but also to any organisations outside of the EU which offer goods or services to customers or businesses in the EU.

Key Principles of GDPR are

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality (security)
  • Accountability

GDPR consists of eight workstreams (DPO, impact assessment, portability, notification of violations, consent, profiling, certification and lead authority) that will strengthen the control of personal data throughout the European Union.


The stakeholders of the data protection regulation are:

Data Subject – an individual, a resident of the European Union, whose personal data are to be protected.

Data Controller – an institution, business or a person processing the personal data e.g. e-commerce website.

Data Protection Officer – a person appointed by the Data Controller responsible for overseeing data protection practices.

Data Processor – a subject (company, institution) processing a data on behalf of the controller. It can be an online CRM app or company storing data in the cloud.

Data Authority – a public institution monitoring implementation of the regulations in the specific EU member country.

Extra-Territorial Scope

Any VoIP service provider may feel that, since they are not based out of the EU (e.g. officially headquartered in the Asia-Pacific or US region), they may not be legally bound by GDPR. However, GDPR expands the territorial and material scope of EU data protection law. It applies to both controllers and processors established in the EU, and those outside the EU who offer goods or services to, or monitor, EU data subjects.

VoIP service providers as Data Processors

A processor is a “person, public authority, agency or other body which processes personal data on behalf of the controller”.
Most VoIP service providers are multinational in nature, with services offered directly or indirectly to all regions. The GDPR imposes direct statutory obligations on data processors, which means they will be subject to direct enforcement by supervisory authorities, fines, and compensation claims by data subjects. However, a processor's liability will be limited to the extent that it has not complied with its statutory and contractual obligations.

Data minimization – It is now good practice to store and process as little of the user's personal data as is necessary to render our services effectively, and to retain data for only a stipulated time (approx. 90 days of CDRs for call details and logs).
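A retention sketch matching the 90-day figure above (the record shape is hypothetical):

```javascript
// Data-minimisation sketch: drop CDR entries older than the retention
// window (90 days here, matching the text). Timestamps are in ms.
const RETENTION_MS = 90 * 24 * 60 * 60 * 1000;

function purgeExpired(cdrs, nowMs) {
  return cdrs.filter((r) => nowMs - r.createdAtMs <= RETENTION_MS);
}
```

A real purge job would also cover backups and caches, since GDPR applies to every copy of the data.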

Record Keeping, Accountability and governance

To show compliance with GDPR, a service provider must maintain detailed records of processing activities. Also, they must implement technological and organisational measures to ensure, and be able to demonstrate, that processing is performed in accordance with the GDPR. Some ways to apply these are:

  • Contracts: putting written contracts in place with organisations that process personal data on your behalf
  • maintaining documentation of your processing activities
  • Organisational policies focus on Data protection by design and default – two-factor auth, strong passwords to guard against brute-force, encryption, focus on security in architecture
  • Risk analysis and impact assessments: for uses of personal data that are likely to result in a high risk to individuals’ interests
  • Audit by Data protection officer
  • Clear Codes of conduct
  • Certifications

As for the VoIP landscape, thankfully every call or message session is followed by a CDR (Call Detail Record) or MDR (Message Detail Record).

Additionally, assign a unique signature to every data-access client in the VoIP system and log every read/write operation carried out on data stores, whether persistent datastores or system caches.

Privacy Notices to Subjects

User profile data such as :

  • Basic identity information, name, address and ID numbers
  • Web data such as location, IP address, cookie data and RFID tags
  • Health and genetic data
  • Bio-metric data
  • Racial or ethnic data
  • Political opinions
  • Sexual orientation

is strictly protected under GDPR rules.

A service provider should provide in-depth information to data subjects when collecting their personal data, to ensure fairness and transparency. They must provide the information in an easily accessible form, using clear and plain language.


The GDPR introduces a higher bar for relying on consent, requiring clear affirmative action. Silence, pre-ticked boxes or inactivity will not be sufficient to constitute consent. Data subjects can withdraw their consent at any time, and it must be easy for them to do so.

Lawful bases for processing data

In Article 6 of the GDPR, there are six available lawful bases for processing:

(a) Consent: the individual has given clear consent for you to process their personal data for a specific purpose.

(b) Contract: the processing is necessary for a contract you have with the individual, or because they have asked you to take specific steps before entering into a contract.

(c) Legal obligation: the processing is necessary for you to comply with the law (not including contractual obligations).

(d) Vital interests: the processing is necessary to protect someone’s life.

(e) Public task: the processing is necessary for you to perform a task in the public interest or for your official functions, and the task or function has a clear basis in law.

(f) Legitimate interests: the processing is necessary for your legitimate interests or the legitimate interests of a third party, unless there is a good reason to protect the individual’s personal data which overrides those legitimate interests.

Files such as PCAPs, recordings and transcripts of calls hold sensitive information from end users; these should be encrypted and inaccessible even to the dev teams within the org without the explicit consent of the end user.

Individuals’ Rights

The GDPR provides individuals with new and enhanced rights to Data subjects who will have more control over the processing of their personal data. A data subject access request can only be refused if it is manifestly unfounded or excessive, in particular because of its repetitive character.

The rights of data subjects include:

  • Right of Access
  • Right to Rectification
  • Right to Be Forgotten
  • Right to Restriction of Processing
  • Right to Data Portability
  • Right to Object
  • Right to Object to Automated Decision-making

For a VoIP service provider, if a user opts for redaction then none of their calls or messages should be traceable in logs. Also replace distinguishable end-user identifiers, such as phone numbers and SIP URIs, with *** characters.
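A redaction helper along these lines might look like the following sketch (the exact masking policy is an assumption):

```javascript
// Log-redaction sketch: mask the user part of a SIP URI, or most digits
// of a phone number, before the identifier ever reaches the logs.
function maskIdentifier(id) {
  const sip = id.match(/^(sips?:)([^@]+)@(.+)$/);
  if (sip) return `${sip[1]}***@${sip[3]}`;
  // Fallback: treat as a phone number, keeping only the last two digits.
  return id.replace(/.(?=..)/g, "*");
}
```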

Provide an option for “Account Deletion” and purge the account – if users wish to close their account, their details should be deleted from the system, except for the bare-bones details which are otherwise required for legal, taxation and accounting requirements.

Breach Notification

A controller is a “person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of processing of personal data”,

A controller will have a mandatory obligation to notify its supervisory authority of a data breach within 72 hours, unless the breach is unlikely to result in a risk to the rights of data subjects. It will also have to notify affected data subjects where the breach is likely to result in a “high risk” to their rights. A processor, however, will only be obliged to report data breaches to controllers.

International Data Transfers

Data transfers to countries outside the EEA(European Economic Area) continue to be prohibited unless that country ensures an adequate level of protection. The GDPR retains existing transfer mechanisms and provides for additional mechanisms, including approved codes of conduct and certification schemes.

The GDPR prohibits any non-EU court, tribunal or regulator from ordering the disclosure of personal data from EU companies unless it requests such disclosure under an international agreement, such as a mutual legal assistance treaty.

One of the biggest challenges for a service provider is the identification & categorization of GDPR impacted data sets in disparate locations across the enterprise. A dev team must flag tables, attributes and other data objects that are categorically covered under GDPR regulations and then ensure that they are not transferred to a server outside of EU.

In the present age of shared virtual server instances, cloud computing and VoIP protocols, it is operationally a very tough task for a communication service provider to ensure that data is not transferred outside of the EU; for example, a VoIP call originating in the US and terminating in the EU will require information exchanges via SDP, vCard, and RTP streams via media proxies etc.


The GDPR provides supervisory authorities with wide-ranging powers to enforce compliance, including the power to impose significant fines. You will face fines of up to €20m or 4% of your total worldwide annual turnover of the preceding financial year. In addition, data subjects can sue you for pecuniary or non-pecuniary damages (i.e. distress). Supervisory authorities will have a discretion as to whether to impose a fine and the level of that fine.

Data Protection officer (DPO)

Under the terms of GDPR, an organisation must appoint a Data Protection Officer (DPO) if it carries out large-scale processing of special categories of data, carries out large scale monitoring of individuals such as behaviour tracking or is a public authority.

Reference :

Media Architecture , RTP topologies

With the sudden onset of Covid-19 and the building trend of working from home, the demand for scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution.

This article is about a media server setup to provide a mid-to-high-scale conferencing solution over SIP to various endpoints, including SIP softphones, PBXs, carrier/PSTN and WebRTC.

Point to Point

Endpoints communicate over unicast.
RTP and RTCP traffic is private between the sender and receiver, even if the endpoints contain multiple SSRCs in the RTP session.

Advantages of P2P:

  • Facilitates private communication between the parties
  • The only limitations to the number of streams between the participants are physical ones, such as bandwidth and the number of available ports

Point to Point via Middlebox

Same as above but with a middle-box involved


    Mostly used for interoperability between non-interoperable endpoints, such as transcoding codecs or transport conversion.
    It does not use an SSRC of its own and keeps the SSRC of an RTP stream intact across the translation.

    Subtypes of middlebox:

    Transport/Relay Anchoring

    Performs roles like NAT traversal by pinning the media path to a public-address-domain relay or TURN server.

    Middleboxes for auditing or privacy control of participant’s IP

    Other SBC ( Session Border Gateways) like characteristics are also part of this topology setup

    Transport translator

    interconnects networks, e.g. multicast to unicast

    handles media packetization to allow other media, such as non-RTP protocols, to connect to the session

    Media translator

    modifies the media inside RTP streams, commonly known as transcoding

    can do up to full encoding/decoding of RTP streams

    in many cases it can also act on behalf of non-RTP-capable endpoints, receiving and responding to feedback reports and performing FEC (Forward Error Correction)

    Back-To-Back RTP Session

    Mostly like a translator middlebox, but it establishes a separate RTP session leg with each endpoint, bridging the two sessions.

    Takes complete responsibility for forwarding the correct RTP payload and maintaining the relation between SSRCs and CNAMEs

    Advantages of Back-To-Back RTP Session
  • The B2BUA / media bridge takes responsibility for relaying media and managing congestion
    Disadvantages of Back-To-Back RTP Session
  • It can be subjected to a MITM attack or carry a backdoor to eavesdrop on conversations

    Point to Multipoint using Multicast

    Any-Source Multicast (ASM)

    traffic from any participant sent to the multicast group address reaches all other participants

    Source-Specific Multicast (SSM)

    a selected sender streams to the multicast group, which distributes the stream to the receivers

    Point to Multipoint using Mesh

    many unicast RTP streams forming a mesh

    Point to Multipoint + Translator

    Some more variants of this topology are Point to Multipoint with Mixer

    Media Mixing Mixer

    receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.

    Media Switching Mixer

    RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

    The Mixer can reduce bitrate or switch between sources like active speakers.

    SFU ( Selective Forwarding Unit)

    Middlebox can select which of the potential sources ( SSRC) transmitting media will be sent to each of the endpoints. This transmission is set up as an independent RTP Session.

    Extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.

    Advantages of SFU
  • Low latency and low jitter-buffer requirement, by avoiding re-encoding
    Disadvantages of SFU
  • Unable to manage the network and control the sender's bitrate

    On a high level, one can safely assume that, given the current average internet bandwidth, a mesh architecture makes sense for 3-6 peers; any number above that requires a centralized media architecture.

    Among the centralized media architectures, an SFU makes sense for at most 6-15 people in a conference; if the number of participants exceeds that, it may need to switch to MCU mode.
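The rule of thumb above can be sketched as a simple selection function. The thresholds are the ones quoted in the text, not hard limits; real systems would also weigh bandwidth, CPU and simulcast support:

```python
def pick_topology(participants: int) -> str:
    """Pick a conferencing topology from participant count.

    Thresholds follow the article's rule of thumb:
    mesh up to ~6 peers, SFU up to ~15, MCU beyond that.
    """
    if participants <= 2:
        return "p2p"          # plain point-to-point call
    if participants <= 6:
        return "mesh"         # full mesh of unicast RTP streams
    if participants <= 15:
        return "sfu"          # selective forwarding, no re-encoding
    return "mcu"              # mixing/compositing media server

print(pick_topology(4))   # mesh
print(pick_topology(12))  # sfu
print(pick_topology(40))  # mcu
```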

    Other Hybrid Topologies

    There are various topologies for multi-endpoint conferences. Hybrid topologies include forwarding video while mixing audio, auto-switching between configurations as load increases or decreases, or switching based on plan (paid premium vs free).

    Hybrid model

    Some endpoints receive forwarded streams while others receive mixed/composited streams.

    Serverless models

    Centralized topology in which one endpoint serves as an MCU or SFU.

    Used by Jitsi and Skype

    Point to Multipoint Using Video-Switching MCUs

    Much like an MCU, but unlike an MCU it can switch the forwarded stream's bitrate and resolution based on the active speaker, host or presenter, and floor-control-like characteristics.

    This setup can embed the characteristics of a translator and selector, and can even do congestion control based on RTCP.

    To handle a multipoint conference scenario, it acts as a translator forwarding the selected RTP stream under its own SSRC, with the appropriate CSRC values, and modifies the RTCP RRs it forwards between the domains.

    Cascaded SFUs

    Chaining SFUs reduces latency while also enabling scalability; however, it takes a toll on server network as well as endpoint resources.

    Transport Protocols

    Before getting into an in-depth discussion of all possible types of Media Architectures in VoIP systems, let us learn about TCP vs UDP

    TCP is a reliable, connection-oriented protocol that establishes a connection between the communicating parties with a SYN / SYN-ACK / ACK handshake. It sends packets sequentially, and individual packets can be resent when the receiver recognizes out-of-order or missing packets. It is thus used for session creation thanks to its error-correction and congestion-control features.

    Once a session is established, media typically flows as RTP over UDP. UDP, even though less reliable (it guarantees neither non-duplication nor delivery with error correction), is used for media due to its low overhead, and packets of other protocols are easily encapsulated (tunnelled) inside UDP packets. To provide end-to-end security, additional methods for authentication and encryption are layered on top.
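To make the RTP-over-UDP layering concrete, a minimal RTP header can be packed with Python's struct module; the 12-byte fixed layout follows RFC 3550 (no CSRC list, no extensions), shown here purely for illustration:

```python
import struct

def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Build a minimal 12-byte RTP header (RFC 3550, no CSRC, no extensions)."""
    version = 2
    first_byte = version << 6            # V=2, P=0, X=0, CC=0
    second_byte = payload_type & 0x7F    # M=0, 7-bit payload type
    return struct.pack("!BBHII", first_byte, second_byte,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

hdr = rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
assert len(hdr) == 12 and hdr[0] == 0x80   # version bits set correctly
# In a real sender this header + encoded audio would go into a UDP datagram
# via socket.sendto(); signalling (SIP) would ride over TCP/TLS separately.
```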

    Audio PCAP storage and Privacy constraints for Media Servers

    A Call session produces various traces for offtime monitoring and analysis which can include

    CDR (Call Detail Records) – to/from numbers, ring time, answer time, duration, etc.

    Signalling PCAPs – collected usually from the SIP application server, containing the SIP requests, SDP and responses. They show the call flow sequence, for example who sent the INVITE and who sent the BYE or CANCEL, and how many times the call was updated or paused/resumed.

    Media stats – jitter, buffer, RTT and MOS for all legs, plus average values

    Audio PCAPs – recordings of the RTP stream and RTCP packets between the parties; these require explicit consent from the customer or user. VoIP companies complying with GDPR cannot record and preserve audio streams for any purpose, be it audit, call-quality debugging or self-inspection, without that consent.

    Throwing more light on audio PCAP storage: assuming the user provides explicit permission, here is an approach for carrying out the recording and storage operations.

    Furthermore, strict access control, encryption and anonymisation of the media packets are necessary to obfuscate the details of the call session.
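As one illustration of anonymising stored call metadata, SIP URIs can be pseudonymised with a keyed hash, so that records stay correlatable without exposing identities. This is a minimal sketch; the key handling and naming scheme are assumptions, not a standard:

```python
import hmac, hashlib

SECRET_KEY = b"rotate-me-regularly"   # hypothetical key; keep in a KMS, not in code

def pseudonymise_uri(sip_uri: str) -> str:
    """Replace a SIP URI with a stable keyed pseudonym for stored traces."""
    digest = hmac.new(SECRET_KEY, sip_uri.encode(), hashlib.sha256).hexdigest()
    return "sip:anon-" + digest[:16] + "@example.invalid"

a = pseudonymise_uri("sip:alice@example.com")
b = pseudonymise_uri("sip:alice@example.com")
assert a == b               # the same caller always maps to the same pseudonym
assert "alice" not in a     # the original identity is not visible in the record
```

Because the hash is keyed, rotating or destroying the key makes old pseudonyms unlinkable, which helps honour deletion requests.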

    References:

    To learn about the differences between media server topologies

    • centralized vs decentralised,
    • SFU vs MCU ,
    • multicast vs unicast ,

    Read – SIP conferencing and Media Bridges

    SIP conferencing and Media Bridges

    SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP must not only set up the multimedia streams but also provide conference control for building communication and collaboration apps for new and customisable solutions.

    To read more about building a scalable VoIP server-side architecture and

    • Clustering the servers with a common cache for high availability and prompt failure recovery
    • Multi-tier architecture, i.e. separation between the data/session layer and the application server/engine layer
    • Microservice-based architecture, i.e. the differences between proxies such as load balancers, SBCs, backend services, OSS/BSS, etc.
    • Containerization and autoscaling

    Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

    VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

    I have been contemplating the points that make a developer successful at building solutions and services for a Telecom Application Server. The trend has shown many variations: from pure IN programs like VPN and prepaid billing logic, to SIP servlets for call parking and call completion, and from SIP servlets to JAIN SLEE open-standard-based communication.

    WebRTC Audio/Video Codecs

    Codecs signify the media stream's compression and decompression. For peers to have a successful exchange of media, they need to agree on a common set of codecs for the session. The list of codecs is exchanged as part of the offer and answer (the SDP in SIP).

    As WebRTC provides containerless bare MediaStreamTrack objects, the codecs for these tracks are not mandated by WebRTC itself. Instead, the codecs are specified by two separate RFCs:

    RFC 7874 WebRTC Audio Codec and Processing Requirements specifies at least the Opus codec as well as G.711's PCMA and PCMU formats.

    RFC 7742 WebRTC Video Processing and Codec Requirements specifies support for VP8 and H.264's Constrained Baseline profile for video.

    In WebRTC, media is protected using Datagram Transport Layer Security (DTLS) / Secure Real-time Transport Protocol (SRTP). In this article we are going to discuss the audio/video codec processing requirements only.

    Quick links: if you are new to WebRTC, read Introduction to WebRTC and Layers of WebRTC.

    WebRTC Media Stack

    Media Stream Tracks in WebRTC

    The MediaStreamTrack interface typically represents a stream of data of audio or video and a MediaStream may contain zero or more MediaStreamTrack objects.

    The objects RTCRtpSender and RTCRtpReceiver can be used by the application to get more fine grained control over the transmission and reception of MediaStreamTracks.

    Media Flow in VoIP system
    Media Flow in WebRTC Call


    Video capture in sync with the hardware's capabilities

    WebRTC-compatible browsers are required to support white balance, light level and autofocus from the video source

    Video Capture Resolution

    The minimum WebRTC video attributes, unless specified otherwise in the SDP (Session Description Protocol), are 20 FPS and a resolution of 320 x 240 pixels.

    It also supports mid-stream resolution changes, such as a screen source from desktop sharing.

    SDP attributes for resolution, frame rate, and bitrate

    SDP allows for codec-independent indication of preferred video resolutions using a=imageattr to indicate the maximum resolution that is acceptable. 

    The sender must limit the encoded resolution to the indicated maximum size, as the receiver may not be capable of handling higher resolutions.
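As a sketch, the maximum resolution advertised in an a=imageattr line can be pulled out with a regex. The attribute syntax here is a simplified subset of RFC 6236, enough to show the idea:

```python
import re

def max_resolution(imageattr_line: str):
    """Extract the max x/y from a (simplified) a=imageattr SDP line."""
    m = re.search(r"x=\[\d+[:-](\d+)\].*y=\[\d+[:-](\d+)\]", imageattr_line)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2))

line = "a=imageattr:97 recv [x=[0-1280],y=[0-720]]"
print(max_resolution(line))  # (1280, 720)
```

A sender would compare its intended encode size against this tuple and downscale before encoding if needed.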

    Dynamic FPS control based on actual hardware encoding :

    the video source capture should adjust the frame rate according to low bandwidth, poor light conditions and the hardware-supported rate, rather than forcing a higher FPS

    Stream Orientation

    support generating the R0 and R1 bits of the Coordination of Video Orientation (CVO) mechanism and sharing them with the peer


    WebRTC is free and open source, and its working bodies promote royalty-free codecs too. The working groups RTCWEB and IETF ensure that non-royalty-bearing codecs are mandatory, while other codecs can be optional in WebRTC non-browsers.

    WebRTC browsers MUST implement the VP8 video codec as described in
    [RFC6386] and H.264 Constrained Baseline as described in [H264].

    RFC 7742 WebRTC Video Processing and Codec Requirements

    Most of the codecs below follow a lossy DCT (discrete cosine transform) based algorithm for encoding.

    A sample SDP offer from the Chrome browser v80 for Linux includes these profiles:

    m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123
    a=rtpmap:96 VP8/90000
    a=rtcp-fb:96 goog-remb
    a=rtcp-fb:96 transport-cc
    a=rtcp-fb:96 ccm fir
    a=rtcp-fb:96 nack
    a=rtcp-fb:96 nack pli
    a=rtpmap:97 rtx/90000
    a=fmtp:97 apt=96
    a=rtpmap:98 VP9/90000
    a=rtcp-fb:98 goog-remb
    a=rtcp-fb:98 transport-cc
    a=rtcp-fb:98 ccm fir
    a=rtcp-fb:98 nack
    a=rtcp-fb:98 nack pli
    a=fmtp:98 profile-id=0
    a=rtpmap:99 rtx/90000
    a=fmtp:99 apt=98
    a=rtpmap:100 VP9/90000
    a=rtcp-fb:100 goog-remb
    a=rtcp-fb:100 transport-cc
    a=rtcp-fb:100 ccm fir
    a=rtcp-fb:100 nack
    a=rtcp-fb:100 nack pli
    a=fmtp:100 profile-id=2
    a=rtpmap:101 rtx/90000
    a=fmtp:101 apt=100
    a=rtpmap:102 H264/90000
    a=rtcp-fb:102 goog-remb
    a=rtcp-fb:102 transport-cc
    a=rtcp-fb:102 ccm fir
    a=rtcp-fb:102 nack
    a=rtcp-fb:102 nack pli
    a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
    a=rtpmap:122 rtx/90000
    a=fmtp:122 apt=102
    a=rtpmap:127 H264/90000
    a=rtcp-fb:127 goog-remb
    a=rtcp-fb:127 transport-cc
    a=rtcp-fb:127 ccm fir
    a=rtcp-fb:127 nack
    a=rtcp-fb:127 nack pli
    a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
    a=rtpmap:121 rtx/90000
    a=fmtp:121 apt=127
    a=rtpmap:125 H264/90000
    a=rtcp-fb:125 goog-remb
    a=rtcp-fb:125 transport-cc
    a=rtcp-fb:125 ccm fir
    a=rtcp-fb:125 nack
    a=rtcp-fb:125 nack pli
    a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
    a=rtpmap:107 rtx/90000
    a=fmtp:107 apt=125
    a=rtpmap:108 H264/90000
    a=rtcp-fb:108 goog-remb
    a=rtcp-fb:108 transport-cc
    a=rtcp-fb:108 ccm fir
    a=rtcp-fb:108 nack
    a=rtcp-fb:108 nack pli
    a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
    a=rtpmap:109 rtx/90000
    a=fmtp:109 apt=108
    a=rtpmap:124 red/90000
    a=rtpmap:120 rtx/90000
    a=fmtp:120 apt=124
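The rtpmap lines in an offer like the one above can be folded into a payload-type to codec map with a few lines of parsing. This is a sketch, not a full SDP parser (fmtp and rtcp-fb lines are ignored):

```python
def parse_rtpmap(sdp: str) -> dict:
    """Map RTP payload type -> codec name/clock from a=rtpmap lines."""
    codecs = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[int(pt)] = encoding        # e.g. "VP8/90000"
    return codecs

sdp = """a=rtpmap:96 VP8/90000
a=rtpmap:97 rtx/90000
a=rtpmap:102 H264/90000"""
print(parse_rtpmap(sdp))  # {96: 'VP8/90000', 97: 'rtx/90000', 102: 'H264/90000'}
```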


    VP8

    Developed by On2, then acquired and open-sourced by Google; now free of royalty fees.

    Supported containers – 3GP, Ogg, WebM

    No limit on frame rate or data rate, and a maximum resolution of 16384x16384 pixels.

    Reference implementation: the libvpx encoder library.

    VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes.
    They encode and decode pixels with an implied 1:1 (square) aspect ratio.

    Simulcast is supported.
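The max-fs attribute counts frame size in macroblocks (16x16 pixel blocks), so an encoder can check a target resolution against a receiver's limit as below. The attribute semantics follow the VP8 RTP payload spec (RFC 7741); treat this as an illustrative check, not a full negotiation:

```python
def frame_size_in_macroblocks(width: int, height: int) -> int:
    """Frame size in 16x16 macroblocks, the unit used by the max-fs attribute."""
    return ((width + 15) // 16) * ((height + 15) // 16)

def fits_receiver_limit(width: int, height: int, max_fs: int) -> bool:
    """True if an encoded resolution respects the receiver's max-fs."""
    return frame_size_in_macroblocks(width, height) <= max_fs

# 640x480 is 40x30 = 1200 macroblocks
print(frame_size_in_macroblocks(640, 480))    # 1200
print(fits_receiver_limit(1920, 1080, 1200))  # False
```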


    VP9

    Video Processor 9 (VP9) is the successor to the older VP8 and comparable to HEVC, as they both achieve similar bit rates.

    Open and free of royalties and any other licensing requirements.
    Its supported containers are – MP4, Ogg, WebM

    H.264/AVC Constrained

    AVC's Constrained Baseline (CBP) profile is the one compliant with WebRTC.

    Constrained Baseline Profile Level 1.2 and H.264 Constrained High Profile Level 1.3. Constrained Baseline is a subset of the Main profile, suited to low delay and low complexity, i.e. to lower-processing devices like mobiles.

    Multiview Video Coding – can carry multiple views of the same scene, such as stereoscopic video.

    Other profiles, which are not supported, are Baseline (BP), Extended (XP), Main (MP), High (HiP), Progressive High (ProHiP), High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive.

    Its supported containers are 3GP, MP4, WebM

    Parameter settings:

    • packetization-mode
    • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
    • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
    • Supplemental Enhancement Information (SEI) “filler payload” and “full frame freeze” messages( used while video switching in MCU streams )

    It is a proprietary, patented codec, maintained by MPEG/ITU.

    AV1 (AOMedia Video 1)

    open format designed by the Alliance for Open Media
    royalty free
    especially designed for internet video HTML element and WebRTC
    higher data compression rates than VP9 and H.265/HEVC

    offers 3 profiles (main, high and professional) in increasing support for colour depths and chroma subsampling.

    supports HDR
    supports Variable Frame Rate

    Supported container are ISOBMFF, MPEG-TS, MP4, WebM

    Stats for Video based media stream track

    timestamp 04/05/2020, 14:25:59
    ssrc 3929649593
    isRemote false
    mediaType video
    kind video
    trackId RTCMediaStreamTrack_sender_2
    transportId RTCTransport_0_1
    codecId RTCCodec_1_Outbound_96
    [codec] VP8 (payloadType: 96)
    firCount 0
    pliCount 9
    nackCount 476
    qpSum 912936
    [qpSum/framesEncoded] 32.86666666666667
    mediaSourceId RTCVideoSource_2
    packetsSent 333664
    [packetsSent/s] 29.021823604499957
    retransmittedPacketsSent 0
    bytesSent 342640589
    [bytesSent/s] 3685.7715977714947
    headerBytesSent 8157584
    retransmittedBytesSent 0
    framesEncoded 52837
    [framesEncoded/s] 30.022576142586164
    keyFramesEncoded 31
    totalEncodeTime 438.752
    [totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
    totalEncodedBytesTarget 335009905
    [totalEncodedBytesTarget/s] 3602.7091371103397
    totalPacketSendDelay 20872.8
    [totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
    qualityLimitationReason bandwidth
    qualityLimitationResolutionChanges 20
    encoderImplementation libvpx
    Graph for Video Track in chrome://webrtc-internals

    Other RTP parameters

    RTX (retransmission) – a packet-loss recovery technique for real-time applications with relaxed delay bounds.

    Non WebRTC supported Video codecs

    Need active realtime media transcoding


    H.263

    Already used for video conferencing on PSTN (Public Switched Telephone Network), RTSP and SIP (IP-based videoconferencing) systems.
    Suited for low-bandwidth networks.
    Although it is not compatible with WebRTC, many media gateways include real-time transcoding between H.263-based SIP systems and VP8-based WebRTC ones to enable video communication between them.

    H.265 / HEVC

    a proprietary format covered by a number of patents. Licensing is managed by MPEG LA.

    Container – Mp4

    Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints

    With the rise of the Internet of Things, many endpoints, especially IP cameras connected to Raspberry Pi like SoCs (systems on chip), want to stream directly to the browser within their own private network, or even over the public network using TURN/STUN.

    The figure below shows how such a call flow is possible between an IP camera (such as a baby cam) and the parent monitoring it over a WebRTC-supported mobile phone browser. The process includes streaming the content from the IoT device over RTSP and real-time transcoding between H.264 and VP8.

    Interoperability between non-WebRTC-compatible and WebRTC-compatible endpoints


    Audio Level

    the audio level for speech transmission should be normalized, to avoid users having to manually adjust playback and to facilitate mixing in conferencing applications

    normalization considers frequencies above 300 Hz, regardless of the sampling rate used

    the level is adapted to avoid clipping, either by lowering the gain to a level below -19 dBm0 or through the use of a compressor

    GAIN calculation

    • If the endpoint has control over the entire audio-capture path like a regular phone
      the gain should be adjusted in such a way that an average speaker would have a level of 2600 (-19 dBm0) for active speech.
    • If the endpoint does not have control over the entire audio capture like software endpoint
      then the endpoint SHOULD use automatic gain control (AGC) to dynamically adjust the level to 2600 (-19 dBm0) +/- 6 dB.
    • For music- or desktop-sharing applications, the level SHOULD NOT be automatically adjusted, and the endpoint SHOULD allow the user to set the gain manually.
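A toy AGC loop following the +/- 6 dB rule above might look like this. It is illustrative only: it uses dBFS as a stand-in for dBm0, and real AGCs work on windowed RMS with attack/release smoothing rather than per-block jumps:

```python
import math

TARGET_DB = -19.0   # target speech level from the text (approx. -19 dBm0)
TOLERANCE_DB = 6.0  # allowed deviation before the gain is adjusted

def level_db(samples, full_scale=32768.0):
    """RMS level of a block of 16-bit samples, in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return -120.0 if rms == 0 else 20.0 * math.log10(rms / full_scale)

def agc_gain(samples):
    """Linear gain that would bring the block back to the target level,
    or 1.0 if it is already within the +/- 6 dB window."""
    db = level_db(samples)
    if abs(db - TARGET_DB) <= TOLERANCE_DB:
        return 1.0
    return 10 ** ((TARGET_DB - db) / 20.0)

quiet = [200] * 160           # a very quiet block: gain > 1 boosts it
print(agc_gain(quiet) > 1.0)  # True
```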

    Acoustic Echo Cancellation (AEC)

    Endpoints should provide echo control mechanisms


    WebRTC endpoints are required to implement the audio codecs Opus and PCMA/PCMU, along with Comfort Noise and DTMF events.

    Trace for audio codecs supported in chrome (Version 80.0.3987.149 (Official Build) (64-bit) on ubuntu)

    m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

    a=rtpmap:111 opus/48000/2
    a=rtcp-fb:111 transport-cc
    a=fmtp:111 minptime=10;useinbandfec=1
    a=rtpmap:103 ISAC/16000
    a=rtpmap:104 ISAC/32000
    a=rtpmap:9 G722/8000
    a=rtpmap:0 PCMU/8000
    a=rtpmap:8 PCMA/8000
    a=rtpmap:106 CN/32000
    a=rtpmap:105 CN/16000
    a=rtpmap:13 CN/8000
    a=rtpmap:110 telephone-event/48000
    a=rtpmap:112 telephone-event/32000
    a=rtpmap:113 telephone-event/16000
    a=rtpmap:126 telephone-event/8000


    Opus

    standardised by the IETF

    containers – Ogg, WebM, MPEG-TS, MP4

    supports multiple compression algorithms

    For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, it is recommended that Opus be offered before PCMA/PCMU.

    AAC (Advanced Audio Coding)

    part of the MPEG-4 standard
    supported containers – MP4, ADTS, 3GP

    Lossy compression, but it has a number of profiles suiting each use case, from high-quality surround sound to low-fidelity audio for speech-only use.

    G.711 (PCMA and PCMU)

    ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding.
    vital for interfacing with the standard telecom network and carriers

    Fixed 64 kbps bit rate (8 kHz sampling x 8 bits per sample)

    supports 3GP container formats

    G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) is known as PCMU
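G.711's µ-law companding can be sketched in a few lines; this follows the classic reference algorithm (bias 0x84, 8 logarithmic segments, inverted output bits) and is shown for illustration rather than production use:

```python
def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit linear PCM sample to a G.711 mu-law byte."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS     # clip, then add bias
    exponent, mask = 7, 0x4000
    while exponent > 0 and not magnitude & mask:  # find the segment
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # invert all bits

print(hex(linear_to_ulaw(0)))   # 0xff
print(hex(linear_to_ulaw(-1)))  # 0x7f
```

At 8000 samples/s and one byte per sample, this is exactly the fixed 64 kbps rate quoted above.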


    Encoded using Adaptive Differential Pulse Code Modulation (ADPCM), which is suited for voice compression
    containers used – 3GP, AMR-WB

    Comfort noise (CN)

    artificial background noise used to fill gaps in a transmission instead of pure silence

    avoids jarring silence and RTP timeouts

    used for streams encoded with G.711 or any other supported codec that does not provide its own CN.
    Use of Discontinuous Transmission (DTX) / CN by senders is optional

    Internet Low Bitrate Codec (iLBC)

    open-source narrowband codec
    designed specifically for streaming voice audio

    Internet Speech Audio Codec (iSAC)

    designed for voice transmissions which are encapsulated within an RTP stream.

    DTMF and ‘audio/telephone-event’ media type

    endpoints may send DTMF events at any time and should suppress in-band dual-tone multi-frequency (DTMF) tones, if any.

    DTMF events list
    | 0 | DTMF digit “0”
    | 1 | DTMF digit “1”
    | 2 | DTMF digit “2”
    | 3 | DTMF digit “3”
    | 4 | DTMF digit “4”
    | 5 | DTMF digit “5”
    | 6 | DTMF digit “6”
    | 7 | DTMF digit “7”
    | 8 | DTMF digit “8”
    | 9 | DTMF digit “9”
    | 10 | DTMF digit “*”
    | 11 | DTMF digit “#”
    | 12 | DTMF digit “A”
    | 13 | DTMF digit “B”
    | 14 | DTMF digit “C”
    | 15 | DTMF digit “D”
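The event table above is small enough to express directly as a lookup, following the RFC 4733 named-event codes:

```python
# DTMF named events (RFC 4733): event codes 0-15 cover the 16 DTMF digits.
DTMF_EVENTS = {str(d): d for d in range(10)}
DTMF_EVENTS.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})

def dtmf_event_code(digit: str) -> int:
    """Event code to place in the telephone-event payload for a digit."""
    return DTMF_EVENTS[digit.upper()]

print(dtmf_event_code("#"))  # 11
print(dtmf_event_code("5"))  # 5
```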

    Stats for Audio Media track

    timestamp 04/05/2020, 14:25:59
    ssrc 3005719707
    isRemote false
    mediaType audio
    kind audio
    trackId RTCMediaStreamTrack_sender_1
    transportId RTCTransport_0_1
    codecId RTCCodec_0_Outbound_111
    [codec] opus (payloadType: 111)
    mediaSourceId RTCAudioSource_1
    packetsSent 88277
    [packetsSent/s] 50.03762690431027
    retransmittedPacketsSent 0
    bytesSent 1977974
    [bytesSent/s] 150.11288071293083
    headerBytesSent 2118648
    retransmittedBytesSent 0
    Graphs in chrome://webrtc-internals for Audio


    m=application 9 UDP/DTLS/SCTP webrtc-datachannel
    c=IN IP4
    a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE

    Stats for Datachannel

    Statistics RTCDataChannel_1
    timestamp 04/05/2020, 14:25:59
    label sctp
    datachannelid 1
    state open
    messagesSent 1
    [messagesSent/s] 0
    bytesSent 228
    [bytesSent/s] 0
    messagesReceived 1
    [messagesReceived/s] 0
    bytesReceived 228
    [bytesReceived/s] 0


    Attacks on SIP Networks

    Major standards bodies including 3GPP, ITU-T and ETSI have all adopted SIP as the core signalling protocol for services such as LTE, VoIP, conferencing, Video on Demand (VoD), IPTV (internet television), presence and Instant Messaging (IM). With the continuous evolution of SIP as the de facto VoIP protocol, we need to understand the risk mitigation practices around it.

    I have written about VoIP and security in these blogs before

    For security around web-browser-based calling via WebRTC, I have written:

    • WebRTC Security, which describes the browser threat model, access to local resources, the Same Origin Policy (SOP) and Cross-Origin Resource Sharing (CORS), as well as location sharing, ICE, TURN, and threats to privacy from screen sharing, long-term microphone/camera access and probable mid-call attacks.
    • Generic security of the web application built around the hosting platform of WebRTC. It includes concepts like identity management, browser security (cross-site security and clickjacking), authentication of devices and applications, media encryption and regex checking.

    I have also written about VoIP security at the protocol level with SRTP/DTLS using TLS, and specifically using the available security modules on the Kamailio SIP server: sanity checks, ACL lists with permissions, hiding topology details, countering floods using pike and Fail2Ban, as well as traffic monitoring and detection.

    In this article we will cover types of attacks on SIP systems

    Types of attacks on SIP based systems

    Registration Hijacking

    malicious registrations on a registrar by a third party who modifies the From header field of a SIP request.

    example implementation:
    the attacker de-registers all existing contacts for a URI
    the attacker can also register their own device as the appropriate contact address, thereby directing all requests for the affected user to them

    solution – authentication of users

    Impersonating a Server

    the attacker impersonates the remote server
    the user's requests can now be intercepted by some other party
    the user's requests may be forwarded to insecure locations

    Solution –

    confidentiality, integrity, and authentication of proxy servers

    Proxy/redirect servers and registrars SHOULD possess a site certificate issued by a CA which can be validated by the UA

    Tampering with Message Bodies

    If users are relying on SIP message bodies to communicate either of

    • session encryption keys for a media session
    • MIME bodies
    • SDP
    • encapsulated telephony signals
      then attackers on the proxy server can modify the session key, or act as a man-in-the-middle and eavesdrop

    example implementation:
    the attacker can point RTP media streams to a wiretapping device
    the attacker can change the Subject header field so messages appear to users as spam

    solution – end-to-end encryption over TLS + Digest authentication

    Mid-session threats like tearing down a session

    Request forging
    if an attacker learns the parameters of the session (To and From tags, etc.), they can alter ongoing session parameters and even bring the session down

    example implementation :
    the attacker inserts a BYE into an ongoing session, thereby tearing it down
    the attacker can insert a re-INVITE and redirect the stream to a wiretapping device

    solution – authentication on every request;
    signing and encrypting of MIME bodies, and transference of credentials with S/MIME

    Denial of Service and Amplification

    DoS attacks – rendering a particular network element unavailable, usually by directing an excessive amount of network traffic at its interfaces.
    DDoS – using multiple network hosts to flood a target host with a large amount of network traffic.

    These can be created by sending falsified SIP requests to other parties such that numerous transactions originating in the backwards direction converge on the target server, creating congestion.

    example implementation:
    the attacker creates a falsified source IP address and a corresponding Via header field that identify a targeted host as the originator of the request, then sends this to a large number of SIP network elements, generating a DoS aimed at the target.

    the attacker uses falsified Route header field values in a request that identify the target host, and then sends such messages to forking proxies that amplify the messaging sent to the target.

    Flooding a registrar with REGISTER requests can deplete its available memory and disk resources by registering huge numbers of bindings.
    Flooding a stateful proxy server causes it to consume the computational expense associated with processing SIP transactions.

    Solution –
    detect floods and spikes in traffic, and use an IP ban to block offenders
    challenge questionable requests with only a single 401 (Unauthorized) or 407 (Proxy Authentication Required), forgoing the normal response retransmission algorithm, and thus behaving statelessly towards unauthenticated requests.
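A pike-style flood detector can be sketched as a per-source sliding-window counter. This is a toy illustration with hypothetical tuning values; Kamailio's pike module actually tracks source IPs in a tree with decaying counters:

```python
import time
from collections import defaultdict, deque

WINDOW_S = 2.0     # sampling window, seconds (hypothetical tuning)
THRESHOLD = 16     # max requests per source per window (hypothetical tuning)

class FloodDetector:
    def __init__(self):
        self.hits = defaultdict(deque)   # source IP -> timestamps of recent requests

    def allow(self, src_ip: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[src_ip]
        while q and now - q[0] > WINDOW_S:   # drop hits outside the window
            q.popleft()
        q.append(now)
        return len(q) <= THRESHOLD

fd = FloodDetector()
results = [fd.allow("203.0.113.7", now=float(i) * 0.01) for i in range(40)]
print(results[0], results[-1])  # True False  (40 hits in 0.4s trips the limit)
```

A caller that returns False would be answered statelessly (or dropped) and its address handed to an IP-ban list.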

    Security mechanisms

    Full encryption vs hop-by-hop encryption

    SIP messages cannot be encrypted end-to-end in their entirety, since
    message fields such as the Request-URI, Route and Via need to be visible to proxies in most network architectures
    so that SIP requests are routed correctly,
    and proxy servers need to update the message with Via headers.

    Thus SIP uses lower-layer security along with hop-by-hop encryption and auth headers to verify the identity of proxy servers.

    Transport and Network Layer Security

    IPsec – used where set of hosts or administrative domains have an existing trust relationship with one another.

    TLS – used where hop-by-hop security is required between hosts with no pre-existing trust association.

    SIPS URI Scheme

    Used as an address-of-record for a particular user, signifies that each hop over which the request is forwarded, must be secured with TLS

    HTTP Authentication

    Reuses HTTP Digest authentication via 401 and 407 response codes that implement a challenge for authentication;
    provides replay protection and one-way authentication.


    S/MIME

    allows SIP UAs to encrypt MIME bodies within SIP, securing these bodies end-to-end without affecting message headers.
    provides end-to-end confidentiality and integrity for message bodies,
    as well as replay protection

    SIP over TLS

    SIP messages can be secured using TLS. There is also TLS for Datagrams called DTLS.

    Security of SIP signalling is different from the security of protocols used in concert with SIP, like RTP and RTCP; that will be covered in later topics of this article.

    TLS operation consists of two phases: handshake phase and bulk data encryption phase

    Handshake phase

    Prepare the algorithms to be used during the TLS session

    Server Authentication

    server sends its certificate to the client, which then verifies the certificate using a certificate authority’s (CA’s) public key.

    Client Authentication

    Server sends an additional CertificateRequest message to request the client’s certificate. The client responds with

    1. Certificate message containing the client certificate with the client public key and
    2. CertificateVerify message containing a digest signature of the handshake messages, signed by the client's private key

    The server authenticates the client using the client's public key, since only a client holding the correct private key can sign the message.

    prepare the shared secret for bulk data encryption

    The client generates a pre_master_secret and encrypts it using the server's public key obtained from the server's certificate. The server decrypts the pre_master_secret using its own private key.
    Both the server and client then compute a master_secret they share based on the same pre_master_secret. The master_secret is further used to generate the shared symmetric keys for bulk data encryption and message authentication

    Public key cryptographic operations such as RSA are much more expensive than shared key cryptography. This is why TLS uses public key cryptography to establish the shared secret key in the handshake phase, and then uses symmetric key cryptography with the negotiated shared secret as the data encryption key.

    Stateless proxy servers do not maintain state information about the SIP session and therefore tend to be more scalable. However, many standard application functionalities, such as authentication, authorization, accounting, and call forking require the proxy server to operate in a stateful
    mode by keeping different levels of session state information.

    Steps:

    1. The SIP proxy server enforces proxy authentication with a
      407 Proxy Authentication Required challenge.
    2. The UAC provides credentials that verify its claimed identity (e.g., based on the MD5 [34] digest algorithm) and retransmits the request with a Proxy-Authorization header.

    Security of RTP

    Confidentiality protection of the RTP session and integrity protection of RTP/RTCP packets require source authentication of all packets, to ensure no man-in-the-middle (MITM) attack is taking place.

    End-to-end media encryption – SRTP (Secure RTP)

    SRTP encrypts the voice payload into secured IP packets and transports them over the Internet from the transmitter to the receiver.


    • The Impact of TLS on SIP Server Performance – Charles Shen, Erich Nahum, Henning Schulzrinne, Charles Wright; Department of Computer Science, Columbia University / IBM T.J. Watson Research Center

    HTTP/2 – offer/answer signaling for WebRTC call

    HTTP (Hyper Text Transfer Protocol) is the application-layer protocol atop the transport layer (TCP) and the network layer (IP).


    HTTP/1.1 was released in 1997. Since HTTP/1.0 allowed only one request at a time, HTTP/1.1 still allowed only one outstanding request per TCP session but introduced request pipelining to achieve concurrency.


    In 2015, HTTP/2 was released, aimed at reducing latency while delivering heavy graphics, videos and other media components on web pages, especially on mobile sites.
    It also optimizes server push and works with service workers.


    A key difference between HTTP/1.1 and HTTP/2 is that the former transmits requests and responses in plaintext, whereas the latter encapsulates them in a binary format, providing more features and scope for optimization.

    Thus, at the protocol level, it is all about frames of bytes which are part of a stream.

    “enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients.”

    Hypertext Transfer Protocol Version 2 (HTTP/2) draft-ietf-httpbis-http2-latest

    It is important to know that browsers only implement HTTP/2 over HTTPS, so a TLS connection is a must, for which we need certs and keys signed by a CA (either self-signed using openssl, or signed by a public CA like GoDaddy, Verisign or Let’s Encrypt).

    Compatibility layer between HTTP/1.1 and HTTP/2 in Node

    Node.js >9 provides http2 as a native module. Example of using http2 with the compatibility layer:

    const http2 = require('http2');
    const fs = require('fs');
    const options = {
        key: fs.readFileSync('ssl/key'),  // path to key
        cert: fs.readFileSync('ssl/cert') // path to cert
    };
    const server = http2.createSecureServer(options, (req, res) => {
        req.addListener('end', function () {
            file.serve(req, res); // `file` is a static file server instance
        });
    });

    Drop-in replacement for an existing http/https server:

    const https = require('https');
    const app = https.createServer(options, function (request, response) {
        request.addListener('end', function () {
            file.serve(request, response); // `file` is a static file server instance
        });
    });
    WebSocket over HTTP/2

    The WebSocket Protocol uses the HTTP/1.1 Upgrade mechanism to transition a TCP connection from HTTP into a WebSocket connection

    Due to its multiplexing nature, HTTP/2 does not allow connection-wide header fields or status codes, such as the Upgrade and Connection request-header fields or the 101 (Switching Protocols) response code, which are all required for the opening handshake.

    Ideally the code should have looked like this with the backward-compatibility layer, but continue reading for the update below.

    var app = http2.createSecureServer(options, (req, res) => {
        req.addListener('end', function () {
            file.serve(req, res);
        });
    });
    var io = require('').listen(app); // module name elided in the original; origins '*:*'
    io.on('connection', onConnection); // event handler onConnection

    Error during WebSocket handshake: Unexpected response code: 403

    Update May 2020: I tried using the http2 server with WebSocket as mentioned above; however, after many hours of working around WSS over an HTTP/2 secure server, I consistently kept facing ECONNRESET issues after a couple of seconds, which would crash the server.

    client 403
    server ECONNRESET

    Therefore, leaving the web server to serve HTML content, I reverted the signalling back to HTTPS/1.1, given that the reasons for sticking with WSS were low latency and the existing work already put in.

    Example Repo :

    Reading further on exploring the HTTP CONNECT method for setting up the WS handshake. Will update this section in the future if it works.


    A “stream” is an independent, bidirectional sequence of frames exchanged between the client and server within an HTTP/2 connection.
    A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.

    The core http2 module provides a new core API (Http2Stream), accessed via a “stream” listener:

    const http2 = require('http2');
    const fs = require('fs');
    const options = {
        key: fs.readFileSync('ssl/key'),  // path to key
        cert: fs.readFileSync('ssl/cert') // path to cert
    };
    const server = http2.createSecureServer(options);
    server.on('stream', (stream, headers) => {
        stream.respond({ ':status': 200 });
        stream.end('some text!');
    });

    Other features

    • Stream multiplexing
    • Stream prioritization
    • Header compression
    • Flow control
    • Support for trailers

    Persistent, one connection per origin

    With the new binary framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required.

    Server Push

    Bundles multiple assets and resources into a single HTTP/2 connection and lets the server proactively push resources into the client’s cache.

    The server issues a PUSH_PROMISE, and the client validates whether it needs the resource or not. If it matches, the resource loads like a regular GET call.

    The PUSH_PROMISE frame includes a header block that contains a complete set of request header fields that the server attributes to the request.

    After sending the PUSH_PROMISE frame, the server can begin delivering the pushed response as a response on a server-initiated stream that uses the promised stream identifier.

    When a client receives a PUSH_PROMISE frame, it can either accept the pushed response or, if it does not wish to receive it, send a RST_STREAM frame using either the CANCEL or REFUSED_STREAM code, referencing the pushed stream’s identifier.

    Push Stream Support


    respondWithFile() and respondWithFD() APIs can send raw file data that bypasses the Streams API.

    Related technologies


    Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets.

    Email messages + MIME : transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

    MIME in HTTP on the WWW: servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated. Browsers typically contain GIF and JPEG image viewers.

    MIME header fields

    MIME version

    MIME-Version: 1.0

    Content Type

    Content-Type: text/plain

    multipart/mixed, text/html, image/jpeg, audio/mp3, video/mp4, and application/msword

    Content Disposition

    Content-Disposition: attachment; filename=genome.jpeg;
      modification-date="Wed, 12 Feb 1997 16:29:51 -0500";
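    As a small illustration, a hypothetical parser (plain JavaScript, names are my own) that splits such a header line into its value and parameters:

```javascript
// Hypothetical helper: parse a simple MIME header line into name, value and parameters
function parseMimeHeader(line) {
  const idx = line.indexOf(':');
  const name = line.slice(0, idx).trim();
  const [value, ...params] = line.slice(idx + 1).split(';');
  const parameters = {};
  for (const p of params) {
    const eq = p.indexOf('=');
    if (eq === -1) continue;
    // strip surrounding quotes from quoted parameter values
    parameters[p.slice(0, eq).trim()] = p.slice(eq + 1).trim().replace(/^"|"$/g, '');
  }
  return { name, value: value.trim(), parameters };
}

const h = parseMimeHeader('Content-Disposition: attachment; filename=genome.jpeg');
console.log(h);
```

    Real-world parsing needs to handle folding, comments and quoted semicolons, but the value/parameter split is the essential shape.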



    Certificates, compliances and Security in VoIP

    This article describes various certificates, compliances, bills and acts on data privacy, security, and prevention of robocalls, as adopted by countries around the world, pertaining to interconnected VoIP providers, telecommunications services, wireless telephone companies, etc.

    Compliance certificates by Industry types

    HIPAA (Health Insurance Portability and Accountability Act)

    Deals with privacy and security of personal medical records and electronic health care transactions

    Applicability: if a VoIP company handles medical information

    Includes : 

    • No voicemail transcription allowed
    • Should have end-to-end encryption
    • Restrict the use of unsecured WiFi networks to prevent snooping
    • User security, strong password rules, and mandatory monthly change
    • Secure firmware on VoIP phones
    • Maintaining call and access logs

    SOX( Sarbanes Oxley Act of 2002)

    Also known as SOX, SarbOX or Public Company Accounting Reform and Investor Protection Act

    Applicability : if managing the communications operations of a regulated, publicly traded company 

    Includes : 

    • Retain records which include financial and other sensitive data
    • Define the ways employees are provided or denied access to records or data based on their roles and responsibilities
    • Have information audits done by a trusted third party
    • Retention and deletion of files such as audio files like voicemails, text messages, video clips, declared paper records, storage, and logs of communications activities
    • Physical and digital security controls around cloud-based VoIP applications and the networks

    Privacy Related Compliance certificates

    COPPA (Children’s Online Privacy Protection Act ) of 1998 

    Prohibits deceptive marketing to children under the age of 13, or collecting their personal information without disclosure to their parents.

    If any information is to be passed on to a third party, it must be easy for the child’s guardian to review and/or protect it.

    A 2011 amendment requires that collected data be erased after a period of time.

    In 2014 the FTC issued guidelines that apps and app stores require “verifiable parental consent.”

    CPNI (Customer Proprietary Network Information) 2007

    CPNI (Customer Proprietary Network Information) in the United States is the information that communication providers acquire about their subscribers: individually identifiable information created by a customer’s relationship with a provider, such as data about the frequency, duration, and timing of calls, the information on a customer’s bill, and call-identifying information. Processing of this information is governed strictly by the FCC, and certification must be renewed on an annual basis.

    A provider can pass along that information to marketers to sell other services, as long as the customer is notified.

    In 2007, the FCC explicitly extended the application of the Commission’s CPNI rules of the Telecommunications Act of 1996 to providers of interconnected VoIP service.


    The Communications Assistance for Law Enforcement Act (CALEA) enables electronic surveillance by imposing specific obligations on “telecommunications carriers” for assisting law enforcement, including delivering call interception and call identification functionality to the government with a minimum of interference to customer service and privacy.

    Read more about CALEA and its roles in VoIP here Regulatory and Legal Considerations with WebRTC development

    GDPR (General Data Protection Regulation)  in European Union 2018

    Supersedes the 1995 Data Protection Directive

    Establishes requirements of organizations that process data, defines the rights of individuals to manage their data, and outlines penalties for those who violate these rights.

    No personal data may be processed unless this processing is done under one of six lawful bases specified by the regulation (consent, contract, public task, vital interest, legitimate interest or legal requirement). When the processing is based on consent the data subject has the right to revoke it at any time.

    Controllers must notify Supervising Authorities (SA)s of a personal data breach within 72 hours of learning of the breach.

    California Consumer Privacy Act (CCPA) 2019

    consumer rights relating to the access to, deletion of, and sharing of personal information that is collected by businesses. 

    Allows consumers to know whether their personal data is sold or disclosed, and to whom.

    Allows opt-out right for sales of personal information

    Right to deletion – to request a business to delete any personal information about a consumer collected from that consumer

    Personal Data Protection Bill (PDP) – India 2018

    This bill introduces various private and sensitive protection frameworks  like restriction on retention of personal data, Right to correction and erasure (such as right to be forgotten) , Prohibition and transparency of processing of personal data. It also classifies data fiduciaries  including certain social media intermediaries. 

    The Bill amends the Information Technology Act, 2000 to delete the provisions related to compensation payable by companies for failure to protect personal data.

    Other data privacy acts similar to GDPR 

    • South Korea’s Personal Information Protection Act  2011
    • Brazil’s Lei Geral de Proteçao de Dados (LGPD)  2020
    • Privacy Amendment (Notifiable Data Breaches) to Australia’s Privacy Act 2018
    • Japan’s Act on Protection of Personal Information 2017
    • Thailand Personal Data Protection Act (PDPA) 2020

    Features offered by VOIP companies for Data privacy 

    • Access Control & Logging
    • Auto Data Redaction / Account Deletion policy 
    • SIEM (Security information and event management) alerts 
    • Information security , Encrypted Storage For Recordings & Transcripts
    • Disclosing all third party services that are involved in data processing too
    • Role Based Access Control and 2 Factor Authentication
    • Data Security Audits and appointing  data protection officer to oversee GDPR compliance

    Against Robocalls and SPIT ( SPAM over Internet Telephony)

    Truth in Caller ID Act of 2009

    Telephone Consumer Protection Act of 1991

    Implementation of Do not call registry against use of robocalls, automatic dialers, and other methods of communication

    Do-Not-Call Implementation Act of 2003

    If a business has an established relationship with a customer, it can continue to call them for up to 18 months. If a consumer calls the company, say, to ask for information about a product or service, the company has three months to get back to them.

    If the customer asks not to receive calls, the company must stop calling, or be subject to fines.

    Exemptions – calls from not-for-profit organisations, informational messages such as flight cancellations, calls from sales and debt collectors, etc.

    Personal Data Privacy and Security Act 2009

    Implemented to curb  identity theft and computer hacking. Sensitive personal identifiable information includes : victim’s name, social security number, home address, fingerprint/biometrics data, date of birth, and bank account numbers.

    Any company that is breached must notify the affected individuals by mail, telephone, or email, and the message must include information on the company and how to get in touch with credit reporting agencies

    If the breach involves government or national security, the company must also contact the Secret Service within fourteen days.

    TRACED Act (Telephone Robocall Abuse Criminal Enforcement and Deterrence) 2019

    Canadian Radio-television and Telecommunications Commission (CRTC) 2018-32

    A solution mechanism has already been standardised and active in adoption called STIR / SHAKEN ( Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs) described in another article here.

    Emergency services 

    FCC E911 / VoIP E911 rules

    Unlike traditional telephone connections, which are tied to a physical location, VoIP’s packet-switched technology allows a particular number to be anywhere, making it more difficult to reach localised services like the emergency numbers of Public Safety Answering Points (PSAPs). Thus, under FCC regulations as well as the New and Emerging Technologies 911 Improvement Act of 2008 (NET 911 Act), interconnected VoIP providers are required to provide 911 and E911 service.


    WebRTC APIs


    • Media Capture and streams
    • Peer to peer Connection
      • RTCPeerConnection, RTCConfiguration, ICE, Offer/Answer, states
    • RTP Media API
      • RTCRtpSender
      • RTCRtpReceiver
      • RTCRtpTransceiver
      • SDP Semantics
    • RTCDTLS Transport
    • RTCIceCandidate
      • ICE gathering
    • RTCIceTransport Interface
    • Peer-to-peer Data API
    • Peer-to-peer DTMF
    • Statistics

    Peer-to-peer connections

    creates p2p communication channel

    RTCConfiguration Dictionary

    dictionary RTCConfiguration {
      sequence<RTCIceServer> iceServers;
      RTCIceTransportPolicy iceTransportPolicy;
      RTCBundlePolicy bundlePolicy;
      RTCRtcpMuxPolicy rtcpMuxPolicy;
      DOMString peerIdentity;
      sequence<RTCCertificate> certificates;
      [EnforceRange] octet iceCandidatePoolSize = 0;
    };

    RTCIceCredentialType Enum

    enum RTCIceCredentialType {
      "password",
      "oauth"
    };

    supports OAuth 2.0 based authentication. The application, acting as the OAuth Client, is responsible for refreshing the credential information and updating the ICE Agent with fresh new credentials before the accessToken expires. The OAuth Client can use the RTCPeerConnection setConfiguration method to periodically refresh the TURN credentials.

    RTCOAuthCredential Dictionary

    Describes the OAuth auth credential information which is used by the STUN/TURN client (inside the ICE Agent) to authenticate against a STUN/TURN server

    dictionary RTCOAuthCredential {
      required DOMString macKey;
      required DOMString accessToken;
    };

    RTCIceServer Dictionary

    Describes the STUN and TURN servers that can be used by the ICE Agent to establish a connection with a peer.

    dictionary RTCIceServer {
      required (DOMString or sequence<DOMString>) urls;
      DOMString username;
      (DOMString or RTCOAuthCredential) credential;
      RTCIceCredentialType credentialType = "password";
    };

    Example :

     [{urls: ''},
      {urls: ['', ''],
       username: 'user',
       credential: 'myPassword',
       credentialType: 'password'},
      {urls: '',
       username: '22BIjxU93h/IgwEb',
       credential: {
         macKey: 'WmtzanB3ZW9peFhtdm42NzUzNG0=',
         accessToken: 'AAwg3kPHWPfvk9bDFL936wYvkoctMADzQ5VhNDgeMR3+ZlZ35byg972fW8QjpEl7bx91YLBPFsIhsxloWcXPhA=='
       },
       credentialType: 'oauth'}]

    RTCIceTransportPolicy Enum

    ICE candidate policy [JSEP] to select candidates for the ICE connectivity checks

    • relay – use only media relay candidates such as candidates passing through a TURN server. It prevents the remote endpoint/unknown caller from learning the user’s IP addresses
    • all – ICE Agent can use any type of candidate when this value is specified.

    RTCBundlePolicy Enum

    • balanced – Gather ICE candidates for each media type (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
    • max-compat – Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
    • max-bundle – Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport.

    If the value of configuration.bundlePolicy is set and its value differs from the connection’s bundle policy, throw an InvalidModificationError.

    RTCRtcpMuxPolicy Enum

    Indicates what ICE candidates are gathered to support non-multiplexed RTCP.

    • negotiate – Gather ICE candidates for both RTP and RTCP candidates. If the remote-endpoint is capable of multiplexing RTCP, multiplex RTCP on the RTP candidates. If it is not, use both the RTP and RTCP candidates separately.
    • require – Gather ICE candidates only for RTP and multiplex RTCP on the RTP candidates. If the remote endpoint is not capable of rtcp-mux, session negotiation will fail.

    If the value of configuration.rtcpMuxPolicy is set and its value differs from the connection’s rtcpMux policy, throw an InvalidModificationError. If the value is “negotiate” and the user agent does not implement non-muxed RTCP, throw a NotSupportedError.
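    Putting these policies together, an illustrative configuration object (the TURN URL and credentials below are made-up placeholders, not recommendations) that a browser application would pass to new RTCPeerConnection(config):

```javascript
// Illustrative RTCConfiguration; in a browser: new RTCPeerConnection(config)
const config = {
  iceServers: [
    // hypothetical TURN server and credentials
    { urls: 'turns:turn.example.org', username: 'user', credential: 'pass' }
  ],
  iceTransportPolicy: 'relay',  // hide the caller's host IPs behind the TURN relay
  bundlePolicy: 'max-bundle',   // one transport for all media tracks and data channels
  rtcpMuxPolicy: 'require'      // RTCP multiplexed on the RTP candidates
};
```

    Choosing 'relay' trades connection setup options for privacy, and 'max-bundle' plus 'require' minimizes the number of transports and ICE candidates gathered.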

    Offer/Answer Options – VoiceActivityDetection

    dictionary RTCOfferAnswerOptions {
      boolean voiceActivityDetection = true;
    };

    capable of detecting “silence”

    dictionary RTCOfferOptions : RTCOfferAnswerOptions {
      boolean iceRestart = false;
    };
    dictionary RTCAnswerOptions : RTCOfferAnswerOptions {};

    An RTCPeerConnection object has a signaling state, a connection state, an ICE gathering state, and an ICE connection state.

    RTCSignalingState Enum

    stable, have-local-offer, have-remote-offer, have-local-pranswer, have-remote-pranswer, closed

    RTCIceGatheringState Enum

    new, gathering, complete

    RTCIceConnectionState Enum

    new, checking, connected, completed, disconnected, failed, closed

    An RTCPeerConnection object has an operations chain which ensures that only one asynchronous operation in the chain executes concurrently.

    Also, an RTCPeerConnection object MUST NOT be garbage collected as long as any event can cause an event handler to be triggered on the object. When the object’s internal [[IsClosed]] slot is true, i.e. closed, no such event handler can be triggered and it is therefore safe to garbage collect the object.

    RTCPeerConnection Interface

    interface RTCPeerConnection : EventTarget {
    Promise<RTCSessionDescriptionInit> createOffer(optional RTCOfferOptions options);
    Promise<RTCSessionDescriptionInit> createAnswer(optional RTCAnswerOptions options);
    Promise<void> setLocalDescription(optional RTCSessionDescriptionInit description);
    readonly attribute RTCSessionDescription? localDescription;
    readonly attribute RTCSessionDescription? currentLocalDescription;
    readonly attribute RTCSessionDescription? pendingLocalDescription;
    Promise<void> setRemoteDescription(optional RTCSessionDescriptionInit description);
    readonly attribute RTCSessionDescription? remoteDescription;
    readonly attribute RTCSessionDescription? currentRemoteDescription;
    readonly attribute RTCSessionDescription? pendingRemoteDescription;
    Promise<void> addIceCandidate(optional RTCIceCandidateInit candidate);
    readonly attribute RTCSignalingState signalingState;
    readonly attribute RTCIceGatheringState iceGatheringState;
    readonly attribute RTCIceConnectionState iceConnectionState;
    readonly attribute RTCPeerConnectionState connectionState;
    readonly attribute boolean? canTrickleIceCandidates;
    void restartIce();
    static sequence<RTCIceServer> getDefaultIceServers();
    RTCConfiguration getConfiguration();
    void setConfiguration(RTCConfiguration configuration);
    void close();
    attribute EventHandler onnegotiationneeded;
    attribute EventHandler onicecandidate;
    attribute EventHandler onicecandidateerror;
    attribute EventHandler onsignalingstatechange;
    attribute EventHandler oniceconnectionstatechange;
    attribute EventHandler onicegatheringstatechange;
    attribute EventHandler onconnectionstatechange;
    };

    createOffer() – generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including

    • descriptions of the local MediaStreamTracks attached to this RTCPeerConnection,
    • codec/RTP/RTCP capabilities
    • ICE agent parameters (usernameFragment, password, local candidates, etc.)
    • DTLS connection
    var pc = new RTCPeerConnection();
    pc.createOffer({
        mandatory: {
            OfferToReceiveAudio: true,
            OfferToReceiveVideo: true
        },
        optional: [{
            VoiceActivityDetection: false
        }]
    }).then(function(offer) {
    	return pc.setLocalDescription(offer);
    })
    .then(function() {
    // Send the offer to the remote through signaling server
    });

    createAnswer() – generates an SDP answer with the supported configuration for the session that is compatible with the parameters in the remote configuration

    var pc = new RTCPeerConnection();
    pc.createAnswer({
      OfferToReceiveAudio: true,
      OfferToReceiveVideo: true
    })
    .then(function(answer) {
      return pc.setLocalDescription(answer);
    })
    .then(function() {
      // Send the answer to the remote through signaling server
    });

    The codec preferences of an m= section’s associated transceiver are the value of that RTCRtpTransceiver’s codec preferences, with the following filtering applied:

    • If direction is “sendrecv”, exclude any codecs not included in the intersection of RTCRtpSender.getCapabilities(kind).codecs and RTCRtpReceiver.getCapabilities(kind).codecs.
    • If direction is “sendonly”, exclude any codecs not included in RTCRtpSender.getCapabilities(kind).codecs.
    • If direction is “recvonly”, exclude any codecs not included in RTCRtpReceiver.getCapabilities(kind).codecs.

    Legacy Interface Extensions

    partial interface RTCPeerConnection {
      Promise<void> createOffer(
          RTCSessionDescriptionCallback successCallback,
          RTCPeerConnectionErrorCallback failureCallback,
          optional RTCOfferOptions options);
      Promise<void> setLocalDescription(
          optional RTCSessionDescriptionInit description,
          VoidFunction successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
      Promise<void> createAnswer(
          RTCSessionDescriptionCallback successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
      Promise<void> setRemoteDescription(
          optional RTCSessionDescriptionInit description,
          VoidFunction successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
      Promise<void> addIceCandidate(
          RTCIceCandidateInit candidate,
          VoidFunction successCallback,
          RTCPeerConnectionErrorCallback failureCallback);
    };

    Session Description Model

    enum RTCSdpType {
      "offer",
      "pranswer",
      "answer",
      "rollback"
    };

    interface RTCSessionDescription {
      readonly attribute RTCSdpType type;
      readonly attribute DOMString sdp;
      [Default] object toJSON();
    };

    dictionary RTCSessionDescriptionInit {
      RTCSdpType type;
      DOMString sdp = "";
    };

    Priority and QoS model (RTCPriorityType), which can be one of: very-low, low, medium, high.


    RTP Media API

    Send and receive MediaStreamTracks over a peer-to-peer connection.
    Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.

    RTCRtpSender objects manage the encoding and transmission of MediaStreamTracks.

    RTCRtpReceiver objects manage the reception and decoding of MediaStreamTracks; each is associated with one track.

    The RTCRtpTransceiver interface describes a permanent pairing of an RTCRtpSender and an RTCRtpReceiver.

    Each transceiver is uniquely identified by its mid (media id) property, taken from the corresponding m-line.

    Transceivers are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via addTrack(), or explicitly when the application uses addTransceiver(). They are also created when a remote description is applied that includes a new media description.

    rtpTransceiver = RTCPeerConnection.addTransceiver(trackOrKind, init);

    trackOrKind should be a MediaStreamTrack or a kind string (“audio” or “video”); otherwise a TypeError is thrown.

    init is optional – it can contain direction, sendEncodings, and streams.

    RTCPeerConnection Interface

    partial interface RTCPeerConnection {
    sequence<RTCRtpSender> getSenders();
    sequence<RTCRtpReceiver> getReceivers();
    sequence<RTCRtpTransceiver> getTransceivers();
    RTCRtpSender addTrack(MediaStreamTrack track, MediaStream... streams);
    void removeTrack(RTCRtpSender sender);
    RTCRtpTransceiver addTransceiver((MediaStreamTrack or DOMString) trackOrKind, optional RTCRtpTransceiverInit init);
    attribute EventHandler ontrack;
    };


    dictionary RTCRtpTransceiverInit {
      RTCRtpTransceiverDirection direction = "sendrecv";
      sequence<MediaStream> streams = [];
      sequence<RTCRtpEncodingParameters> sendEncodings = [];
    };

    RTCRtpTransceiverDirection can be one of: sendrecv, sendonly, recvonly, inactive.


    RTCRtpSender Interface

    Allows an application to control how a given MediaStreamTrack is encoded and transmitted to a remote peer.

    interface RTCRtpSender {
    readonly attribute MediaStreamTrack? track;
    readonly attribute RTCDtlsTransport? transport;
    readonly attribute RTCDtlsTransport? rtcpTransport;
    static RTCRtpCapabilities? getCapabilities(DOMString kind);
    Promise<void> setParameters(RTCRtpSendParameters parameters);
    RTCRtpSendParameters getParameters();
    Promise<void> replaceTrack(MediaStreamTrack? withTrack);
    void setStreams(MediaStream... streams);
    Promise<RTCStatsReport> getStats();
    };

    RTCRtpParameters Dictionary

    dictionary RTCRtpParameters {
      required sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
      required RTCRtcpParameters rtcp;
      required sequence<RTCRtpCodecParameters> codecs;
    };

    RTCRtpSendParameters Dictionary

    dictionary RTCRtpSendParameters : RTCRtpParameters {
      required DOMString transactionId;
      required sequence<RTCRtpEncodingParameters> encodings;
      RTCDegradationPreference degradationPreference = "balanced";
      RTCPriorityType priority = "low";
    };

    RTCRtpReceiveParameters Dictionary

    dictionary RTCRtpReceiveParameters : RTCRtpParameters {
      required sequence<RTCRtpDecodingParameters> encodings;
    };

    RTCRtpCodingParameters Dictionary

    dictionary RTCRtpCodingParameters {
      DOMString rid;
    };

    RTCRtpDecodingParameters Dictionary

    dictionary RTCRtpDecodingParameters : RTCRtpCodingParameters {};

    RTCRtpEncodingParameters Dictionary

    dictionary RTCRtpEncodingParameters : RTCRtpCodingParameters {
      octet codecPayloadType;
      RTCDtxStatus dtx;
      boolean active = true;
      unsigned long ptime;
      unsigned long maxBitrate;
      double maxFramerate;
      double scaleResolutionDownBy;
    };

    RTCDtxStatus Enum

    • disabled – Discontinuous transmission is disabled.
    • enabled – Discontinuous transmission is enabled if negotiated.

    RTCDegradationPreference Enum

    enum RTCDegradationPreference {
      "maintain-framerate",
      "maintain-resolution",
      "balanced"
    };

    RTCRtcpParameters Dictionary

    dictionary RTCRtcpParameters {
      DOMString cname;
      boolean reducedSize;
    };

    RTCRtpHeaderExtensionParameters Dictionary

    dictionary RTCRtpHeaderExtensionParameters {
      required DOMString uri;
      required unsigned short id;
      boolean encrypted = false;
    };

    RTCRtpCodecParameters Dictionary

    dictionary RTCRtpCodecParameters {
      required octet payloadType;
      required DOMString mimeType;
      required unsigned long clockRate;
      unsigned short channels;
      DOMString sdpFmtpLine;
    };

    • payloadType – identifies this codec.
    • mimeType – codec MIME media type/subtype. Valid media types and subtypes are listed in [IANA-RTP-2].
    • clockRate – expressed in Hertz.
    • channels – number of channels (mono=1, stereo=2).
    • sdpFmtpLine – “format specific parameters” field from the “a=fmtp” line in the SDP corresponding to the codec.
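    To relate these fields back to SDP, here is a hypothetical helper (the function name and parsing are my own, plain JavaScript) that maps a codec’s a=rtpmap / a=fmtp lines into the RTCRtpCodecParameters shape:

```javascript
// Hypothetical helper: map SDP a=rtpmap / a=fmtp lines for one codec into
// the shape of an RTCRtpCodecParameters dictionary
function codecParamsFromSdp(rtpmapLine, fmtpLine) {
  const m = rtpmapLine.match(/^a=rtpmap:(\d+) ([^/]+)\/(\d+)(?:\/(\d+))?$/);
  if (!m) throw new Error('not an a=rtpmap line');
  const params = {
    payloadType: Number(m[1]),
    mimeType: 'audio/' + m[2], // assumes an audio m= section
    clockRate: Number(m[3])
  };
  if (m[4]) params.channels = Number(m[4]);
  if (fmtpLine) params.sdpFmtpLine = fmtpLine.replace(/^a=fmtp:\d+ /, '');
  return params;
}

const opus = codecParamsFromSdp(
  'a=rtpmap:111 opus/48000/2',
  'a=fmtp:111 minptime=10;useinbandfec=1'
);
console.log(opus);
```

    The payload type ties the dictionary back to the RTP packets, while sdpFmtpLine carries the codec-specific knobs negotiated in the SDP.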

    RTCRtpCapabilities Dictionary

    dictionary RTCRtpCapabilities {
      required sequence<RTCRtpCodecCapability> codecs;
      required sequence<RTCRtpHeaderExtensionCapability> headerExtensions;
    };

    RTCRtpCodecCapability Dictionary

    dictionary RTCRtpCodecCapability {
      required DOMString mimeType;
      required unsigned long clockRate;
      unsigned short channels;
      DOMString sdpFmtpLine;
    };

    RTCRtpHeaderExtensionCapability Dictionary

    dictionary RTCRtpHeaderExtensionCapability {
      DOMString uri;
    };

    Example JS code to query RTCRtpCapabilities

    const pc = new RTCPeerConnection();
    const transceiver = pc.addTransceiver('audio');
    const capabilities = RTCRtpSender.getCapabilities('audio');
    console.log(capabilities);

    Output :

    codecs: Array(13)
    0: {channels: 2, clockRate: 48000, mimeType: "audio/opus", sdpFmtpLine: "minptime=10;useinbandfec=1"}
    1: {channels: 1, clockRate: 16000, mimeType: "audio/ISAC"}
    2: {channels: 1, clockRate: 32000, mimeType: "audio/ISAC"}
    3: {channels: 1, clockRate: 8000, mimeType: "audio/G722"}
    4: {channels: 1, clockRate: 8000, mimeType: "audio/PCMU"}
    5: {channels: 1, clockRate: 8000, mimeType: "audio/PCMA"}
    6: {channels: 1, clockRate: 32000, mimeType: "audio/CN"}
    7: {channels: 1, clockRate: 16000, mimeType: "audio/CN"}
    8: {channels: 1, clockRate: 8000, mimeType: "audio/CN"}
    9: {channels: 1, clockRate: 48000, mimeType: "audio/telephone-event"}
    10: {channels: 1, clockRate: 32000, mimeType: "audio/telephone-event"}
    11: {channels: 1, clockRate: 16000, mimeType: "audio/telephone-event"}
    12: {channels: 1, clockRate: 8000, mimeType: "audio/telephone-event"}
    length: 13
    headerExtensions: Array(6)
    0: {uri: "urn:ietf:params:rtp-hdrext:ssrc-audio-level"}
    1: {uri: ""}
    2: {uri: ""}
    3: {uri: "urn:ietf:params:rtp-hdrext:sdes:mid"}
    4: {uri: "urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id"}
    5: {uri: "urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id"}
    length: 6

    RTCRtpReceiver Interface

    allows an application to inspect the receipt of a MediaStreamTrack.

    interface RTCRtpReceiver {
      readonly attribute MediaStreamTrack track;
      readonly attribute RTCDtlsTransport? transport;
      readonly attribute RTCDtlsTransport? rtcpTransport;
      static RTCRtpCapabilities? getCapabilities(DOMString kind);
      RTCRtpReceiveParameters getParameters();
      sequence<RTCRtpContributingSource> getContributingSources();
      sequence<RTCRtpSynchronizationSource> getSynchronizationSources();
      Promise<RTCStatsReport> getStats();
    };

    dictionary RTCRtpContributingSource

    dictionary RTCRtpContributingSource {
      required DOMHighResTimeStamp timestamp;
      required unsigned long source;
      double audioLevel;
      required unsigned long rtpTimestamp;
    };

    dictionary RTCRtpSynchronizationSource

    dictionary RTCRtpSynchronizationSource : RTCRtpContributingSource {
      boolean voiceActivityFlag;
    };

    voiceActivityFlag of type boolean – Only present for audio receivers. Whether the last RTP packet, delivered from this source, contains voice activity (true) or not (false).
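
Since these dictionaries are plain data, logic over them can be sketched without a live connection. A hypothetical helper that picks the loudest contributing source from a getContributingSources()-style array (the sample values below are made up):

```javascript
// Hypothetical helper: pick the contributing source with the highest
// audioLevel (0.0..1.0) from a getContributingSources()-style array.
function loudestSource(sources) {
  return sources.reduce(
    (best, s) => ((s.audioLevel ?? 0) > (best.audioLevel ?? 0) ? s : best));
}

// Made-up RTCRtpContributingSource-shaped samples.
const sources = [
  { timestamp: 1000, source: 0x1111, rtpTimestamp: 160, audioLevel: 0.12 },
  { timestamp: 1000, source: 0x2222, rtpTimestamp: 160, audioLevel: 0.87 },
  { timestamp: 1000, source: 0x3333, rtpTimestamp: 160, audioLevel: 0.30 },
];
// loudestSource(sources).source → 0x2222
```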

    RTCRtpTransceiver Interface

    Each SDP media section describes one bidirectional SRTP (Secure Real-time Transport Protocol) stream. RTCRtpTransceiver describes this permanent pairing of an RTCRtpSender and an RTCRtpReceiver, along with some shared state. It is uniquely identified using its mid property.

    Thus it is a combination of an RTCRtpSender and an RTCRtpReceiver that share a common mid. An associated transceiver (one with a mid) is one that is represented in the last applied session description.

    interface RTCRtpTransceiver {
      readonly attribute DOMString? mid;
      [SameObject] readonly attribute RTCRtpSender sender;
      [SameObject] readonly attribute RTCRtpReceiver receiver;
      attribute RTCRtpTransceiverDirection direction;
      readonly attribute RTCRtpTransceiverDirection? currentDirection;
      void stop();
      void setCodecPreferences(sequence<RTCRtpCodecCapability> codecs);
    };

    Method stop() – Irreversibly marks the transceiver as stopping, unless it is already stopped. This immediately causes the transceiver’s sender to stop sending and its receiver to stop receiving. A stopping transceiver causes future calls to createOffer to generate a zero port in the media description for the corresponding transceiver; a stopped transceiver causes future calls to createOffer or createAnswer to do the same.

    Method setCodecPreferences() – overrides the default codec preferences used by the user agent.

    Example setting codec preference for Opus audio

    const peer = new RTCPeerConnection();
    const transceiver = peer.addTransceiver('audio');
    const audiocapabilities = RTCRtpSender.getCapabilities('audio');
    // keep only the Opus entries from the sender capabilities
    const codecs = audiocapabilities.codecs.filter(
      (c) => c.mimeType === 'audio/opus');
    transceiver.setCodecPreferences(codecs);

    Before setting codec preference for OPUS

    m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
    c=IN IP4
    a=rtcp:9 IN IP4

    a=rtpmap:111 opus/48000/2
    a=rtcp-fb:111 transport-cc
    a=fmtp:111 minptime=10;useinbandfec=1
    a=rtpmap:103 ISAC/16000
    a=rtpmap:104 ISAC/32000
    a=rtpmap:9 G722/8000
    a=rtpmap:0 PCMU/8000
    a=rtpmap:8 PCMA/8000
    a=rtpmap:106 CN/32000
    a=rtpmap:105 CN/16000
    a=rtpmap:13 CN/8000
    a=rtpmap:110 telephone-event/48000
    a=rtpmap:112 telephone-event/32000
    a=rtpmap:113 telephone-event/16000
    a=rtpmap:126 telephone-event/8000

    After setting codec preference for OPUS audio

    m=audio 9 UDP/TLS/RTP/SAVPF 111
    c=IN IP4
    a=rtcp:9 IN IP4

    a=msid:hcgvWcGG7WhdzboWk79q39NiO8xkh4ArWhbM f15d77bb-7a6f-4f41-80cd-51a3c40de7b7
    a=rtpmap:111 opus/48000/2
    a=rtcp-fb:111 transport-cc
    a=fmtp:111 minptime=10;useinbandfec=1


    RTCDtlsTransport Interface

    Access to information about the Datagram Transport Layer Security (DTLS) transport over which RTP and RTCP packets are sent and received by RTCRtpSender and RTCRtpReceiver objects, as well as other data such as SCTP packets sent and received by data channels.
    Each RTCDtlsTransport object represents the DTLS transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

    interface RTCDtlsTransport : EventTarget {
      [SameObject] readonly attribute RTCIceTransport iceTransport;
      readonly attribute RTCDtlsTransportState state;
      sequence<ArrayBuffer> getRemoteCertificates();
      attribute EventHandler onstatechange;
      attribute EventHandler onerror;
    };

    RTCDtlsTransportState Enum

    “new”- DTLS has not started negotiating yet.
    “connecting” – DTLS is in the process of negotiating a secure connection and verifying the remote fingerprint.
    “connected”- DTLS has completed negotiation of a secure connection and verified the remote fingerprint.
    “closed” – the transport has been closed intentionally, e.g. by receipt of a close_notify alert or a call to close().
    “failed” – the transport has failed as the result of an error, such as failure to validate the remote fingerprint.

    RTCDtlsFingerprint dictionary

    dictionary RTCDtlsFingerprint {
      DOMString algorithm;
      DOMString value;
    };

    Protocols multiplexed with RTP (e.g. data channel) share its component ID. An ICE candidate for RTP has component-id value 1 when encoded in a candidate-attribute, while an ICE candidate for RTCP has component-id value 2.


    RTCTrackEvent Interface

    The track event uses the RTCTrackEvent interface.

    interface RTCTrackEvent : Event {
      readonly attribute RTCRtpReceiver receiver;
      readonly attribute MediaStreamTrack track;
      [SameObject] readonly attribute FrozenArray<MediaStream> streams;
      readonly attribute RTCRtpTransceiver transceiver;
    };

    dictionary RTCTrackEventInit

    dictionary RTCTrackEventInit : EventInit {
      required RTCRtpReceiver receiver;
      required MediaStreamTrack track;
      sequence<MediaStream> streams = [];
      required RTCRtpTransceiver transceiver;
    };


    This interface describes an Internet Connectivity Establishment (ICE) candidate used to set up the RTCPeerConnection. To facilitate routing of media on a given peer connection, both endpoints exchange several candidates, and then one candidate out of the lot is chosen, which is then used to initiate the connection.

    • candidate – transport address for the candidate that can be used for connectivity checks.
    • component – whether the candidate is an RTP or an RTCP candidate.
    • foundation – unique identifier that is the same for any candidates of the same type; helps optimize ICE performance while prioritizing and correlating candidates that appear on multiple RTCIceTransport objects.
    • ip, port
    • priority
    • protocol – tcp/udp
    • relatedAddress, relatedPort
    • sdpMid – candidate’s media stream identification tag.
    • sdpMLineIndex

    usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).

    Interfaces for Connectivity Establishment

    The RTCIceCandidate interface describes ICE candidates.

    interface RTCIceCandidate {
      readonly attribute DOMString candidate;
      readonly attribute DOMString? sdpMid;
      readonly attribute unsigned short? sdpMLineIndex;
      readonly attribute DOMString? foundation;
      readonly attribute RTCIceComponent? component;
      readonly attribute unsigned long? priority;
      readonly attribute DOMString? address;
      readonly attribute RTCIceProtocol? protocol;
      readonly attribute unsigned short? port;
      readonly attribute RTCIceCandidateType? type;
      readonly attribute RTCIceTcpCandidateType? tcpType;
      readonly attribute DOMString? relatedAddress;
      readonly attribute unsigned short? relatedPort;
      readonly attribute DOMString? usernameFragment;
      RTCIceCandidateInit toJSON();
    };

    RTCIceProtocol can be either tcp or udp

    TCP candidate type, which can be one of:

    • active – An active TCP candidate is one for which the transport will attempt to open an outbound connection but will not receive incoming connection requests.
    • passive – A passive TCP candidate is one for which the transport will receive incoming connection attempts but not attempt a connection.
    • so – An so candidate is one for which the transport will attempt to open a connection simultaneously with its peer.

    UDP candidate type

    • host – actual direct IP address of the remote peer
    • srflx – server reflexive, generated by a STUN/TURN server
    • prflx – peer reflexive; the IP address comes from a symmetric NAT between the two peers, usually as an additional candidate during trickle ICE
    • relay – generated using TURN

    ICE Candidate UDP Host

    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:27784895 1 udp 2122260223 51577 typ host generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:27784895 1 udp 2122260223 51382 typ host generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:27784895 1 udp 2122260223 53600 typ host generation 0 ufrag muSq network-id 1 network-cost 10
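
The priority values in the candidate lines above follow the RFC 5245 recommended formula: priority = 2^24 * (type preference) + 2^8 * (local preference) + (256 - component ID). A sketch that parses one of the sample host candidates (whose address fields were elided above) and reproduces its priority; host candidates use type preference 126, and the local preference 32542 is inferred from this sample's priority value:

```javascript
// Parse the core fields of a candidate-attribute string (simplified;
// the address field is elided in the samples above, so it is skipped).
function parseCandidate(line) {
  const p = line.replace(/^candidate:/, '').split(' ');
  return {
    foundation: p[0],
    component: Number(p[1]),
    protocol: p[2],
    priority: Number(p[3]),
    port: Number(p[4]),   // address elided in the samples above
    type: p[6],           // value after the "typ" keyword
  };
}

// RFC 5245 recommended priority formula.
function icePriority(typePref, localPref, component) {
  return 2 ** 24 * typePref + 2 ** 8 * localPref + (256 - component);
}

const c = parseCandidate(
  'candidate:27784895 1 udp 2122260223 51577 typ host generation 0');
// icePriority(126, 32542, 1) === c.priority → true
```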

    ICE Candidate TCP Host

    Notice TCP host candidates for mid 0, 1 and 2 for video, audio and data media types

    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:1327761999 1 tcp 1518280447 9 typ host tcptype active generation 0 ufrag muSq network-id 1 network-cost 10

    ICE Candidate UDP Srflx

    Notice 3 candidates for 3 streams sdpMid 0,1 and 2

    sdpMid: 2, sdpMLineIndex: 2, candidate: candidate:2163208203 1 udp 1686052607 27177 typ srflx raddr rport 53600 generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 1, sdpMLineIndex: 1, candidate: candidate:2163208203 1 udp 1686052607 27176 typ srflx raddr rport 51382 generation 0 ufrag muSq network-id 1 network-cost 10
    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2163208203 1 udp 1686052607 27175 typ srflx raddr rport 51577 generation 0 ufrag muSq network-id 1 network-cost 10

    ICE Candidate (host)

    sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2880323124 1 udp 2122260223 61622 typ host generation 0 ufrag jsPO network-id 1 network-cost 10




    RTCPeerConnectionIceEvent Interface

    interface RTCPeerConnectionIceEvent : Event {
      readonly attribute RTCIceCandidate? candidate;
      readonly attribute DOMString? url;
    };


    RTCPeerConnectionIceErrorEvent Interface

    interface RTCPeerConnectionIceErrorEvent : Event {
      readonly attribute DOMString hostCandidate;
      readonly attribute DOMString url;
      readonly attribute unsigned short errorCode;
      readonly attribute USVString errorText;
    };

    RTCIceTransport Interface

    Access to information about the ICE transport over which packets are sent and received. Each RTCIceTransport object represents the ICE transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].

    interface RTCIceTransport : EventTarget {
      readonly attribute RTCIceRole role;
      readonly attribute RTCIceComponent component;
      readonly attribute RTCIceTransportState state;
      readonly attribute RTCIceGathererState gatheringState;
      sequence<RTCIceCandidate> getLocalCandidates();
      sequence<RTCIceCandidate> getRemoteCandidates();
      RTCIceCandidatePair? getSelectedCandidatePair();
      RTCIceParameters? getLocalParameters();
      RTCIceParameters? getRemoteParameters();
      attribute EventHandler onstatechange;
      attribute EventHandler ongatheringstatechange;
      attribute EventHandler onselectedcandidatepairchange;
    };

    RTCIceParameters Dictionary

    dictionary RTCIceParameters {
      DOMString usernameFragment;
      DOMString password;
    };

    RTCIceCandidatePair Dictionary

    dictionary RTCIceCandidatePair {
      RTCIceCandidate local;
      RTCIceCandidate remote;
    };

    RTCIceGathererState Enum

    • “new” – the ICE transport was just created and has not started gathering candidates
    • “gathering” – the transport is in the process of gathering candidates
    • “complete” – the transport has finished gathering candidates for now

    RTCIceTransportState Enum

    • “new” – ICE agent is gathering addresses or is waiting to be given remote candidates
    • “checking” – ICE agent is checking candidate pairs but has not yet found a working one
    • “connected” – found a working candidate pair, but still performing connectivity checks to find a better one
    • “completed” – found a working candidate pair and done performing connectivity checks
    • “disconnected” – connectivity checks on previously working pairs are now failing; the transport may recover
    • “failed” – checked all candidate pairs and failed to find a working one
    • “closed” – the transport has shut down and is no longer responding

    RTCIceRole Enum

    “unknown”, // agent whose role is not yet defined
    “controlling”, // controlling agent
    “controlled” // controlled agent

    RTCIceComponent Enum

    “rtp”, // ICE Transport is used for RTP (or RTCP multiplexing)
    “rtcp” // ICE Transport is used for RTCP

    Peer-to-peer Data API


    Peer-to-peer DTMF


    Statistics Model

    The browser maintains a set of statistics for monitored objects, in the form of stats objects.
    A group of related objects may be referenced by a selector (like a MediaStreamTrack that is sent or received by the RTCPeerConnection).

    Statistics API extends the RTCPeerConnection interface

    partial interface RTCPeerConnection {
      Promise<RTCStatsReport> getStats(optional MediaStreamTrack? selector = null);
      attribute EventHandler onstatsended;
    };

    Method getStats()- Gathers stats for the given selector and reports the result asynchronously.

    RTCStatsReport Object

    A map between strings that identify the inspected objects (the id attribute in RTCStats instances) and their corresponding RTCStats-derived dictionaries.

    interface RTCStatsReport {
      readonly maplike<DOMString, object>;
    };

    RTCStats Dictionary

    stats object constructed by inspecting a specific monitored object.

    dictionary RTCStats {
      required DOMHighResTimeStamp timestamp;
      required RTCStatsType type;
      required DOMString id;
    };


    RTCStatsEvent Interface

    [Constructor(DOMString type, RTCStatsEventInit eventInitDict)]
    interface RTCStatsEvent : Event {
      readonly attribute RTCStatsReport report;
    };

    dictionary RTCStatsEventInit

    dictionary RTCStatsEventInit : EventInit {
      required RTCStatsReport report;
    };


    • RTCRTPStreamStats, attributes ssrc, kind, transportId, codecId
    • RTCReceivedRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes packetsReceived, packetsLost, jitter, packetsDiscarded
    • RTCInboundRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes trackId, receiverId, remoteId, framesDecoded, nackCount
    • RTCRemoteInboundRTPStreamStats, all required attributes from its inherited dictionaries, and also attributes localId, bytesReceived, roundTripTime
    • RTCSentRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes packetsSent, bytesSent
    • RTCOutboundRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes trackId, senderId, remoteId, framesEncoded, nackCount
    • RTCRemoteOutboundRTPStreamStats, with all required attributes from its inherited dictionaries, and also attributes localId, remoteTimestamp
    • RTCPeerConnectionStats, with attributes dataChannelsOpened, dataChannelsClosed
    • RTCDataChannelStats, with attributes label, protocol, dataChannelIdentifier, state, messagesSent, bytesSent, messagesReceived, bytesReceived
    • RTCMediaStreamStats, with attributes streamIdentifer, trackIds
    • RTCMediaHandlerStats with attributes trackIdentifier, ended
    • RTCSenderVideoTrackAttachmentStats, with all required attributes from its inherited dictionaries
    • RTCSenderAudioTrackAttachmentStats, with all required attributes from its inherited dictionaries
    • RTCAudioHandlerStats with attribute audioLevel
    • RTCVideoHandlerStats with attributes frameWidth, frameHeight, framesPerSecond
    • RTCVideoSenderStats with attribute framesSent
    • RTCVideoReceiverStats with attributes framesReceived, framesDecoded, framesDropped, partialFramesLost
    • RTCCodecStats, with attributes payloadType, codecType, mimeType, clockRate, channels, sdpFmtpLine
    • RTCTransportStats, with attributes bytesSent, bytesReceived, rtcpTransportStatsId, selectedCandidatePairId, localCertificateId, remoteCertificateId
    • RTCIceCandidatePairStats, with attributes transportId, localCandidateId, remoteCandidateId, state, priority, nominated, bytesSent, bytesReceived, totalRoundTripTime, currentRoundTripTime
    • RTCIceCandidateStats, with attributes address, port, protocol, candidateType, url
    • RTCCertificateStats, with attributes fingerprint, fingerprintAlgorithm, base64Certificate, issuerCertificateId
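
A sketch of how a stats consumer might walk an RTCStatsReport-style map to derive a loss fraction from the inbound-rtp entries; in a browser the map would come from pc.getStats(), here a plain Map with made-up values stands in:

```javascript
// Hypothetical helper: compute the packet loss fraction from the
// inbound-rtp entries of an RTCStatsReport (a maplike of id → stats).
function inboundLossFraction(report) {
  let received = 0, lost = 0;
  for (const stats of report.values()) {
    if (stats.type === 'inbound-rtp') {
      received += stats.packetsReceived;
      lost += stats.packetsLost;
    }
  }
  return lost / (received + lost);
}

// Stand-in for a report from pc.getStats() (ids and values are made up).
const report = new Map([
  ['RTCInboundRTPAudioStream_1', {
    type: 'inbound-rtp', packetsReceived: 950, packetsLost: 50 }],
  ['RTCTransport_0_1', { type: 'transport', bytesReceived: 123456 }],
]);
// inboundLossFraction(report) → 0.05
```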


    CLI/NCLI, Robocalls and STIR/SHAKEN

    To understand the need for implementing an identity-verification technique in Internet-protocol-based network-to-network communication systems, we need to evaluate the existing problems plaguing the VoIP setup.

    What is Caller ID spoofing?

    A vulnerability of the existing interconnected phone system, used by robo-callers to mask their identity or to make it appear the call is from a legitimate source; it usually originates from voice-over-IP (VoIP) systems.

    In this context, consider the Caller Line Identification (CLI/NCLI) techniques used by VoIP and OTT (over-the-top) providers today.

    CLI (Caller Line Identification)

    If a call goes out on a CLI route (White Route), the receiving party will likely see your caller ID information.

    • Lawful – termination is legal on the remote end, i.e. it abides by the country’s telco regulations, and stable.
    • Expensive – usually with direct or leased-line (TDM) interconnections with the tier-1 carriers.

    Non-CLI (Non-Caller Line Identification)

    The caller ID is not visible to the called party.
    If a call goes out on a Non-CLI route (Grey Route), the receiver will see either a blocked call or some generic number.

    • Unlawful – of questionable legality, or maybe violating some provider’s AUP (Acceptable Use Policy) on the remote end.
    • Cheaper – low quality, usually via VoIP-GSM gateways.

    Examples include robocalls, tele-marketing/spam etc., which are unwilling to share their caller ID with the call receiver so as not to be blocked or cancelled.

    To overcome the problem of non-verifiable spam and robocalls, a suite of protocols and procedures has been proposed that can combat caller ID spoofing on VoIP and connected public telephone networks.


    Secure Telephony Identity Revisited / Signature-based Handling of Asserted information using toKENs



    A suite of standards developed by the Internet Engineering Task Force (IETF).

    Telecommunication service providers implement a certificate management system to create and manage the public and private keys and digital certificates used to sign and verify caller ID details.

    STIR adds information to the SIP headers that allows endpoints along the path to positively identify the origin of the call: a JSON Web Token signed with the provider’s private key and encoded using Base64.

    There are three levels of verification, or “attestation”

    • A : Full Attestation
      indicates that the provider recognizes the entire phone number as being registered with the originating subscriber.
    • B : Partial Attestation
      the call originated with a known customer but the entire number cannot be verified.
    • C : Gateway Attestation
      the call can only be verified as coming from a known gateway.

    How can the Public Key Infrastructure be used?

    In an interconnection network, each telephone service provider obtains its digital certificate from a certificate authority (CA) that is trusted by the other telephone service providers. The calling party signs the SIP Identity header, asserting the caller ID as legitimate. The called party verifies that the calling number is authentic.


    The originating service provider’s signed SIP Identity header includes the following data:

    1. Attestation level
    2. Date and Time
    3. Calling and Called Numbers
    4. Orig ID for analytics and/or traceback purposes among others
    5. Location of certificate repository
    6. Signature
    7. Encryption algorithm

    The FCC has also assigned the role of a Secure Telephone Identity Policy Administrator (STI-PA), which oversees that CAs do not provide certificates to spoofing robocallers and enforces the framework for STIR/SHAKEN.

    Sample Identity header in SIP request

    INVITE SIP/2.0
    Via: SIP/2.0/TLS;branch=z9hG4bKnashds8
    To: Bob
    From: Alice ;tag=1928301774
    Call-ID: a84b4c76e66710
    CSeq: 314159 INVITE
    Max-Forwards: 70
    Date: Thu, 21 Feb 2002 13:02:03 GMT
    Content-Type: application/sdp
    Content-Length: 147
    o=UserA 2890844526 2890844526 IN IP4
    s=Session SDP
    c=IN IP4
    t=0 0
    m=audio 49172 RTP/AVP 0
    a=rtpmap:0 PCMU/8000


    STIR is based on the SIP protocol and is designed to work with calls being routed through a VoIP network. Since traditional endpoints like POTS and SS7 networks should also be covered under this call-authenticity framework, SHAKEN was developed to manage calls via IP-to-telephone gateways.

    SHAKEN was developed by the Alliance for Telecommunications Industry Solutions (ATIS).

    Working steps:

    1. When a call is initiated, a SIP INVITE is received by the originating service provider.
    2. Originating service provider verifies the call source and number to determine how to confirm validity.
      1. Full Attestation (A) — The service provider authenticates the calling party AND confirms they are authorized to use this number. An example would be a registered subscriber.
      2. Partial Attestation (B) — The service provider verifies the call origination but cannot confirm that the call source is authorized to use the calling number. An example would be a calling number from behind an enterprise PBX.
      3. Gateway Attestation (C) — The service provider authenticates the call’s origin but cannot verify the source. An example would be a call received from an international gateway.
    3. The originating provider creates a SIP Identity header that contains information on the calling number, called number, attestation level, and call origination, along with the certificate; the caller ID is thus “signed” as legitimate.
    4. The SIP INVITE with the SIP Identity header and certificate is sent to the destination service provider.
    5. The destination service provider verifies the Identity header and certificate.

    Diagrammatic depiction of how telecom carriers digitally validate call authenticity before receiving or handing off calls through their network.



    Video Codecs – H264 , H265 , AV1

    This article discusses the popularly adopted current standards for video codecs (compression/decompression), namely MPEG2, H264, H265 and AV1.

    MPEG 2

    MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
    Generic coding of moving pictures and associated audio information.
    A combination of lossy video compression and lossy audio data compression methods which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

    Better than MPEG 1

    It evolved out of the shortcomings of MPEG-1, such as an audio compression system limited to two channels (stereo), no standardized support for interlaced video with poor compression, and only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited for higher-resolution video.


    • over-the-air digital television broadcasting and in the DVD-Video standard.
    • TV stations, TV receivers, DVD players, and other equipment
    • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
    • XDCAM – professional file-based video recording format.
    • DVB – application-specific restrictions on MPEG-2 video in the DVB standard.


    H264

    Advanced Video Coding (AVC), a.k.a. H.264, MPEG-4 AVC, or ITU-T H.264 / MPEG-4 Part 10 ‘Advanced Video Coding’ (AVC)
    introduced in 2004

    Better than MPEG2

    40-50% bit rate reduction compared to MPEG-2

    Support Up to 4K (4,096×2,304) and 59.94 fps
    21 profiles ; 17 levels

    Compression Model

    Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find areas that are redundant within subsequent frames, i.e. unchanged, such as background sections of the video. These areas are replaced with short information referencing the original pixels (inter-frame motion prediction), using a mathematical function and the direction of motion.
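
The inter-frame idea above can be sketched as block matching: search the previous frame for the offset that minimises the sum of absolute differences (SAD) against the current block. A toy 1-D sketch, not any codec's actual search:

```javascript
// Toy 1-D motion search: find the offset into prevFrame whose window
// best matches curBlock (minimum sum of absolute differences).
function bestMatchOffset(prevFrame, curBlock) {
  let bestOffset = 0, bestSad = Infinity;
  for (let off = 0; off + curBlock.length <= prevFrame.length; off++) {
    let sad = 0;
    for (let i = 0; i < curBlock.length; i++) {
      sad += Math.abs(prevFrame[off + i] - curBlock[i]);
    }
    if (sad < bestSad) { bestSad = sad; bestOffset = off; }
  }
  return bestOffset;
}

const prevFrame = [10, 10, 10, 90, 91, 92, 10, 10];
const curBlock  = [90, 91, 92];   // the block "moved" to offset 3
// bestMatchOffset(prevFrame, curBlock) → 3
```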

    Hybrid spatial-temporal prediction model
    Flexible partition of Macro Block (MB), sub-MB for motion estimation
    Intra prediction (extrapolate already decoded neighbouring pixels for prediction)
    Introduced multi-view extension
    9 directional modes for intra prediction
    Macro block structure with maximum size of 16×16
    Entropy coding is CABAC (Context-Adaptive Binary Arithmetic Coding) and CAVLC (Context-Adaptive Variable-Length Coding)


    • most deployed video compression standard
    • Delivers high definition video images over direct-broadcast satellite-based television services,
    • Digital storage media and Blu-Ray disc formats,
    • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
    • Security and surveillance systems and DVB
    • Mobile video, media players, video chat


    H265

    High Efficiency Video Coding (HEVC), or H.265 or MPEG-H HEVC
    A video compression standard designed to substantially improve coding efficiency and stream high-quality video in congested network environments or bandwidth-constrained mobile networks.
    Released in January 2013; a product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

    better than H264

    overcome shortage of bandwidth, spectrum, storage
    bandwidth savings of approx. 45% over H.264 encoded content

    resolutions up to 8192×4320, including 8K UHD
    Supports up to 300 fps
    3 approved profiles, draft for additional 5 ; 13 levels
    Whereas H.264 macroblocks can span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving HEVC the ability to compress information more efficiently.

    Multiview encoding – a stereoscopic video coding standard that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It also exploits a large amount of inter-view statistical dependencies.

    Compression Model

    Enhanced Hybrid spatial-temporal prediction model
    CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures

    Motion estimation – intra prediction with more modes, asymmetric partitions in inter prediction
    Individual rectangular regions that divide the image are independent

    Parallel processing – the decoding process can be split across multiple parallel threads, taking advantage of multi-core processors.

    Wavefront Parallel Processing (WPP) – rows of CTUs are decoded in parallel, each row reusing decisions from the row above, which grants a more productive and effectual compression.
    33 directional modes for intra prediction – DC intra prediction, planar prediction, Adaptive Motion Vector Prediction
    Entropy coding is only CABAC


    • cater to growing HD content for multi platform delivery
    • differentiated and premium 4K content

    The reduced bitrate enables broadcasters and OTT vendors to bundle more channels/content on existing delivery mediums, and also to provide a greater video-quality experience at the same bitrate.

    Using ffmpeg for H265 encoding

    I took an H.264 file (640×480), duration 30 seconds, of size 3,908,744 bytes (3.9 MB on disk) and converted it using ffmpeg.

    After conversion it was an HEVC (Parameter Sets in Bitstream) MPEG-4 movie of only 621 KB, without any loss of clarity!

    > ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4
    ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers
      built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
      configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
      libavutil      56. 22.100 / 56. 22.100
      libavcodec     58. 35.100 / 58. 35.100
      libavformat    58. 20.100 / 58. 20.100
      libavdevice    58.  5.100 / 58.  5.100
      libavfilter     7. 40.101 /  7. 40.101
      libavresample   4.  0.  0 /  4.  0.  0
      libswscale      5.  3.100 /  5.  3.100
      libswresample   3.  3.100 /  3.  3.100
      libpostproc    55.  3.100 / 55.  3.100
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 1
        compatible_brands: isomavc1
        creation_time   : 2019-06-23T04:58:13.000000Z
      Duration: 00:00:29.84, start: 0.000000, bitrate: 1047 kb/s
        Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 1046 kb/s, 25 fps, 25 tbr, 25k tbn, 50k tbc (default)
        Metadata:
          creation_time   : 2019-06-23T04:58:13.000000Z
          handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
    Codec AVOption b (set bitrate (in bits/s)) specified for output file #0 (output.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream.
    Stream mapping:
      Stream #0:0 -> #0:0 (h264 (native) -> hevc (libx265))
    Press [q] to stop, [?] for help
    x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
    x265 [info]: build info [Mac OS X][clang 10.0.1][64 bit] 8bit+10bit+12bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
    x265 [info]: Main profile, Level-3 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: Slices                              : 1
    x265 [info]: frame threads / pool features       : 2 / wpp(8 rows)
    x265 [warning]: Source height < 720p; disabling lookahead-slices
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
    x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
    x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
    x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
    x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
    x265 [info]: tools: strong-intra-smoothing deblock sao
    Output #0, mp4, to 'output.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 1
        compatible_brands: isomavc1
        encoder         : Lavf58.20.100
      Stream #0:0(und): Video: hevc (libx265) (hev1 / 0x31766568), yuv420p, 640x480, q=2-31, 25 fps, 12800 tbn, 25 tbc (default)
        Metadata:
          creation_time   : 2019-06-23T04:58:13.000000Z
          handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
          encoder         : Lavc58.35.100 libx265
    frame=  746 fps= 64 q=-0.0 Lsize=     606kB time=00:00:29.72 bitrate= 167.2kbits/s speed=2.56x
    video:594kB audio:0kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 2.018159%
    x265 [info]: frame I:      3, Avg QP:27.18  kb/s: 1884.53
    x265 [info]: frame P:    179, Avg QP:27.32  kb/s: 523.32
    x265 [info]: frame B:    564, Avg QP:35.17  kb/s: 38.69
    x265 [info]: Weighted P-Frames: Y:5.6% UV:5.0%
    x265 [info]: consecutive B-frames: 1.6% 3.8% 9.3% 53.3% 31.9%
    encoded 746 frames in 11.60s (64.31 fps), 162.40 kb/s, Avg QP:33.25
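    The reported sizes and bitrates can be cross-checked with a little arithmetic; this sketch takes the byte counts from the numbers above (reading ffmpeg's 606kB as 606 × 1024 bytes):

```python
# Sanity-check of the H.264 -> H.265 conversion numbers reported above
SRC_BYTES = 3_908_744      # source h264 file size
DST_BYTES = 606 * 1024     # ffmpeg reported Lsize = 606kB
DURATION_S = 29.84         # clip duration from the ffmpeg probe

src_kbps = SRC_BYTES * 8 / DURATION_S / 1000
dst_kbps = DST_BYTES * 8 / DURATION_S / 1000

print(f"source ~{src_kbps:.0f} kb/s, output ~{dst_kbps:.0f} kb/s")
print(f"~{SRC_BYTES / DST_BYTES:.1f}x smaller at the same resolution")
```

    The ~1048 kb/s and ~166 kb/s results line up with the 1047 kb/s input and 167.2 kbits/s output bitrates ffmpeg itself reports.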

    If you get an error like

    Unknown encoder 'libx265'

    then reinstall ffmpeg with H.265 (libx265) support.
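    As a sketch of that fix, assuming macOS with Homebrew (matching the build banner above); package managers and configure options differ per platform, so treat the exact commands as illustrative:

```shell
# Check whether the installed ffmpeg already exposes the x265 encoder
ffmpeg -hide_banner -encoders | grep 265

# macOS / Homebrew: reinstall ffmpeg (current formulas include libx265)
brew reinstall ffmpeg

# Building from source instead: enable the GPL'd libx265 explicitly
# ./configure --enable-gpl --enable-libx265 && make && make install
```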


    AV1

    Realtime, high-quality video encoder, a product of the Alliance for Open Media (AOM)
    Can be contained in Matroska, WebM, ISOBMFF and RTP (WebRTC)

    Offers better compression efficiency than H265

    AV1 is royalty free and overcomes the patent complexities around H265/HEVC


    • video transmission over the internet, VoIP, multi-party conferencing
    • virtual / augmented reality
    • streaming for self-driving cars
    • intended for use in HTML5 web video and WebRTC together with the Opus audio format
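    For a quick comparison with the libx265 command shown earlier, an AV1 encode of the same clip could look like this (a sketch assuming an ffmpeg build with libaom; the file name reuses the one above, and CRF 30 is an arbitrary quality choice):

```shell
# Constant-quality AV1 encode: -b:v 0 together with -crf selects CRF mode in libaom
ffmpeg -i pivideo3.mp4 -c:v libaom-av1 -crf 30 -b:v 0 -c:a aac -b:a 128k output_av1.mkv
```

    Matroska is used as the output container here since it is one of the AV1 containers listed above.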

    Audio and Acoustic Signal Processing

    Audio signals are electronic representations of sound waves, longitudinal waves that travel through air as compressions and rarefactions. Audio signal processing focuses on computational methods for intentionally altering such auditory signals or sounds in order to achieve a particular goal.

    Applications of audio signal processing in general

    • storage
    • data compression
    • music information retrieval
    • speech processing (emotion recognition/sentiment analysis, NLP)
    • localization
    • acoustic detection
    • transmission / broadcasting – enhancing fidelity or optimizing for bandwidth or latency
    • noise cancellation
    • acoustic fingerprinting
    • sound recognition (speaker identification, biometric speech verification, voice commands)
    • synthesis – electronic generation of audio signals; speech synthesizers can generate human-like speech
    • enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition)
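    As a tiny illustration of the synthesis item above, here is a pure-Python sketch that generates one second of a 440 Hz sine tone as floating-point samples (the sample rate and amplitude are arbitrary choices for the example):

```python
import math

SAMPLE_RATE = 8000   # samples per second (telephony-grade rate)
AMPLITUDE = 0.5      # linear scale, 1.0 = full scale

def sine_tone(freq_hz, duration_s, rate=SAMPLE_RATE, amp=AMPLITUDE):
    """Generate duration_s seconds of a sine tone as float samples."""
    n = int(rate * duration_s)
    return [amp * math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]

tone = sine_tone(440.0, 1.0)   # one second of A4
print(len(tone), max(tone))
```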

    Effects for audio streams processing

    • delay or echo
      To simulate reverberation effect, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be of order 35 milliseconds or above.
      Implemented using tape delays or bucket-brigade devices.
    • flanger
      a delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms).
      as the delay varies, the delayed signal drifts out of phase with the original, producing a sweeping comb-filter effect, then comes back into phase as the delay returns
    • phaser
      signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
    • chorus
      a delayed version of the signal is added to the original signal; the delay must be above about 5 ms to be audible. Often the delayed signals are slightly pitch-shifted to more realistically convey the effect of multiple voices.
    • equalization
      frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters.
    • overdrive
      effects such as the use of a fuzz box can be used to produce distorted sounds, for example to imitate robotic voices or to simulate distorted radiotelephone traffic
    • pitch shift
      shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
    • time stretching
      changing the speed of an audio signal without affecting its pitch.
    • resonators
      emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
    • modulation
      change the frequency or amplitude of a carrier signal in relation to a predefined signal.
    • compression
      reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
    • 3D audio effects
      place sounds outside the stereo basis
    • reverse echo
      swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
    • wave field synthesis
      spatial audio rendering technique for the creation of virtual acoustic environments
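    The delay/echo effect at the top of this list can be sketched in a few lines of pure Python: a delayed, attenuated copy of the signal is mixed back into the original (the delay and gain values below are illustrative):

```python
def add_echo(samples, delay_samples, gain=0.5):
    """Mix a delayed, attenuated copy of the signal into itself.

    For a perceivable echo at an 8 kHz sample rate, delay_samples
    should correspond to >= 35 ms, i.e. >= 280 samples.
    """
    out = list(samples)
    for i in range(delay_samples, len(samples)):
        out[i] += gain * samples[i - delay_samples]
    return out

# Send an impulse through the echo: the delayed copy appears at half amplitude
signal = [1.0] + [0.0] * 499
echoed = add_echo(signal, delay_samples=280, gain=0.5)
print(echoed[0], echoed[280])   # 1.0 0.5
```

    Chaining several such delays with decreasing gains approximates the reverberation effect described above.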

    ASP applications in telephony and mobile phones, standardized by the ITU (International Telecommunication Union)

    • Acoustic echo control
      aims to eliminate acoustic feedback, which is particularly problematic in the speakerphone use case during bidirectional voice calls
    • Noise control
      the microphone picks up not only the desired speech signal but often also unwanted background noise. Noise control tries to minimize those unwanted signals. Multi-microphone AASP has enabled the suppression of directional interferers.
    • Gain control
      determines how loud a speech signal should be when leaving a telephony transmitter and when it is played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively in real time during operation.
    • Linear filtering
      ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
    • Speech coding: the move from analog POTS-based calls to the G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder was a big leap in terms of call capacity. Other speech coders with varying tradeoffs between compression ratio, speech quality, and computational complexity have since become available. AASP also provides higher-quality wideband speech (approximately 150 Hz to 7 kHz).
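    The G.711 narrowband coder mentioned above relies on mu-law (or A-law) companding before 8-bit quantization. A sketch of the continuous mu-law compression formula follows; note this is the textbook formula, not the full segmented table that G.711 actually standardizes:

```python
import math

MU = 255  # compression parameter used by G.711 mu-law

def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to [-1, 1] with mu-law companding."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1) / mu, y)

# Quiet samples are boosted before uniform 8-bit quantization,
# giving them finer effective resolution than loud samples
print(mu_law_compress(0.01))   # a small input is mapped well up the scale
print(mu_law_expand(mu_law_compress(0.5)))
```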

    ASP applications in music playback

    AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs, such as listening to music, watching videos, and gaming.

    • Post-processing
      techniques such as equalization and filtering allow the user to adjust the timbre of the audio, e.g. bass boost and parametric equalization. Other techniques include adding reverberation, pitch shifting, time stretching, etc.
    • Audio (de)coding: audio coding formats like MP3 and AAC define how music is distributed, stored, and consumed, including in online music streaming services

    ASP for virtual assistants

    Virtual assistants include a variety of services such as Apple’s Siri, Microsoft’s Cortana, Google Now, and Amazon’s Alexa. ASP is used in

    • Speech enhancement
      multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
    • Speech recognition (speech-to-text): this draws ideas from multiple disciplinary fields including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major contribution to recognition accuracy improvement in speech recognition by AASP.
    • Speech synthesis (text-to-speech): this technology has come a very long way from its very robotic sounding introduction in the 1930s to making synthesized speech sound more and more natural.
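    The multi-microphone beamforming mentioned under speech enhancement can be sketched as a delay-and-sum beamformer: each microphone signal is shifted by its known steering delay and the aligned signals are averaged, reinforcing the target direction (integer sample delays are assumed for simplicity):

```python
def delay_and_sum(mic_signals, delays):
    """Align each mic signal by its integer steering delay and average.

    mic_signals: list of equal-length sample lists
    delays: per-mic arrival delay (in samples) of the target source
    """
    n = len(mic_signals[0])
    out = [0.0] * n
    for sig, d in zip(mic_signals, delays):
        for i in range(n):
            j = i + d            # undo the propagation delay
            out[i] += sig[j] if 0 <= j < n else 0.0
    return [v / len(mic_signals) for v in out]

# A pulse reaching mic 0 at t=10 and mic 1 at t=13 (3-sample lag)
mic0 = [0.0] * 50; mic0[10] = 1.0
mic1 = [0.0] * 50; mic1[13] = 1.0
aligned = delay_and_sum([mic0, mic1], delays=[0, 3])
print(aligned[10])   # 1.0: both copies add coherently at t=10
```

    Signals arriving from other directions do not line up after the shifts and so are attenuated by the averaging.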

    Other areas of ASP

    • Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).
