Bandwidth depends on network strength and is affected by the other users on the network. Under heterogeneous network conditions, bandwidth estimation is a critical step to improve call quality and end-user experience.
An unreliable or fluctuating network will deliver some packets on time and delay others more, causing them to arrive in bursts. A jitter buffer is an effective way to manage jitter: it ensures a steady delivery of packets even when the peers transmit at fluctuating rates.
A jitter buffer consumes packets as soon as they arrive and keeps them until the frame can be fully reconstructed. Once all packets of a frame have been received (in any order), the buffer emits the frame for decoding, and the player can play it back to the user. Note that several RTP packets can have the same timestamp if they are part of the same video frame.
(+) dynamically manages unordered packets and reconstructs a frame after accumulating all of its packets
(-) can introduce latency for packets that arrive early
(-) needs active resizing by means of feedback
for high-speed, good networks the jitter buffer can be small
for congested and disruptive networks it is better to keep a longer buffer, which can also add some latency
(-) the buffer has limited capacity, so a packet can expire if it is not received within the duration “jitterBufferDelay”.
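The frame reconstruction described above can be sketched as follows. This is a minimal illustration, not how any particular browser implements its jitter buffer. The packet shape ({seq, timestamp, first, marker, payload}) is an assumption: `first` stands for a start-of-frame flag taken from the codec payload descriptor, and `marker` for the RTP marker bit on the last packet of a frame.

```javascript
// Minimal frame-assembling jitter buffer (illustrative only).
class FrameJitterBuffer {
  constructor(maxFrames = 30) {
    this.maxFrames = maxFrames; // capacity limit; the oldest pending frame expires
    this.frames = new Map();    // RTP timestamp -> Map(sequence number -> packet)
  }

  // Store a packet; return the ordered packets of its frame once complete, else null.
  push(packet) {
    if (!this.frames.has(packet.timestamp)) {
      this.frames.set(packet.timestamp, new Map());
      if (this.frames.size > this.maxFrames) {
        // A Map iterates in insertion order, so the first key is the oldest frame.
        this.frames.delete(this.frames.keys().next().value);
      }
    }
    const frame = this.frames.get(packet.timestamp);
    frame.set(packet.seq, packet);

    const packets = [...frame.values()];
    const first = packets.find(p => p.first);
    const last = packets.find(p => p.marker);
    if (!first || !last) return null;

    // 16-bit arithmetic keeps the count correct across sequence number wrap-around.
    const expected = ((last.seq - first.seq) & 0xffff) + 1;
    if (frame.size !== expected) return null;

    this.frames.delete(packet.timestamp);
    return packets.sort((a, b) =>
      ((a.seq - first.seq) & 0xffff) - ((b.seq - first.seq) & 0xffff));
  }
}
```

A real buffer would also apply a playout delay and time-based expiry; here only the capacity limit is modelled.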
Reducing resolution, frame rate or bit rate is effective for congestion control; however, it is not suited to high-definition video conferencing use cases such as gaming, telehealth or concert broadcasts, as it may hinder the user experience.
Using I-frames, P-frames and B-frames efficiently in the codec, combined with predictive machine learning models, can make packet loss unnoticeable to the human eye. For video, the marker (M) bit in the RTP header typically marks the last packet of a frame.
Partial frames given to the decoder cannot be processed, so a PLI message is sent to the sender. When the sender receives the PLI message, it produces a new I-frame to help the receiver decode the frames.
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
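To check which feedback mechanisms were negotiated for a payload type, the a=rtcp-fb lines above can be collected from the SDP. The sketch below is a small helper written for illustration; the function name rtcpFeedbackByPayloadType is hypothetical and not part of any WebRTC API.

```javascript
// Collect "a=rtcp-fb" entries per payload type from an SDP string.
function rtcpFeedbackByPayloadType(sdp) {
  const result = {};
  for (const line of sdp.split(/\r?\n/)) {
    // Format: a=rtcp-fb:<payload type or *> <feedback type and parameters>
    const match = /^a=rtcp-fb:(\d+|\*) (.+)$/.exec(line.trim());
    if (!match) continue;
    (result[match[1]] = result[match[1]] || []).push(match[2]);
  }
  return result;
}
```

With the SDP shown above, payload type 100 maps to goog-remb, transport-cc, ccm fir, nack and nack pli, while the rtx payload type 101 has no feedback entries.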
FIR (Full Intra Request): requests a full key frame from the sender when a new member enters the session.
PLI (Picture Loss Indication): requests a full key frame from the sender when partial frames were given to the decoder but it was unable to decode them.
Causes of a PLI request could be a decoder crash or heavy loss.
Congestion is created when a network path has reached its maximum capacity, which could be due to
failures (switches, routers, cables, fibres ...)
oversubscription and operating at peak bandwidth
broadcast storms
inapt BGP routing and congestion detection
BGP is responsible for finding the shortest routable path for a packet
The direct consequences of congestion for any network transport can be
High Latency
Connection Timeouts
Low throughput
Packet loss
Queueing delay
With respect to WebRTC streams too, if a network is congested, buffers will overflow and packets will be dropped. Due to excessive dropping of packets, both transmission time and jitter increase. To overcome this, adaptive buffering is used, resizing the buffer as jitter increases or decreases.
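One common input for deciding how much to buffer is the interarrival jitter estimate defined in RFC 3550 (section 6.4.1). The sketch below assumes arrival times and RTP timestamps are expressed in the same clock units; the function name createJitterEstimator is illustrative.

```javascript
// Running interarrival jitter estimate: J = J + (|D| - J) / 16
function createJitterEstimator() {
  let jitter = 0;
  let prev = null; // previous packet's { arrival, rtpTimestamp }
  return function update(arrival, rtpTimestamp) {
    if (prev !== null) {
      // D: difference in relative transit time between consecutive packets
      const d = (arrival - prev.arrival) - (rtpTimestamp - prev.rtpTimestamp);
      jitter += (Math.abs(d) - jitter) / 16;
    }
    prev = { arrival, rtpTimestamp };
    return jitter;
  };
}
```

A rising estimate suggests growing the buffer, and a falling one allows it to shrink; how the estimate maps to an actual buffer size is left to the application.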
A congestion notification and detection algorithm can analyze the RTCP metrics for possible congestion in the network route and suggest options to overcome it. It is part of the adaptive bitrate and bandwidth estimation process.
Rate limiting the sender is one way to overcome congestion, even though it can lead to bad call quality at the receiver’s end and is atypical for realtime communication systems.
Bandwidth estimation and congestion control are often paired as one operational unit. Primarily, packet loss and inter-packet arrival times drive the bandwidth estimation and enable GCC (Google Congestion Control) to flag congestion.
On the receiver side, TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB (Receiver Estimated Maximum Bitrate) exchange the bandwidth estimates.
On the sender side, TWCC (Transport-Wide Congestion Control) can be used.
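As an illustration of how packet loss can drive the estimate, here is a sketch of a loss-based rate update. The thresholds (2% and 10%) and factors follow the loss-based controller as described in the GCC Internet-Draft, to the best of my understanding; treat them as assumptions rather than what a given browser ships.

```javascript
// Loss-based bitrate update: back off on heavy loss, probe upward on low loss.
function lossBasedBitrate(currentBps, lossFraction) {
  if (lossFraction > 0.10) {
    // Heavy loss: reduce in proportion to the observed loss.
    return currentBps * (1 - 0.5 * lossFraction);
  }
  if (lossFraction < 0.02) {
    // Negligible loss: increase by 5% to probe for more bandwidth.
    return currentBps * 1.05;
  }
  return currentBps; // Moderate loss: hold the current rate.
}
```

In practice this estimate is combined with a delay-based estimate, and the lower of the two is used as the target send rate.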
Other congestion control algorithms
QUIC Loss Detection and Congestion Control, RFC 9002
Coupled Congestion Control for RTP Media, RFC 8699
NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
Self-Clocked Rate Adaptation for Multimedia RMCAT WG
SCReAM – Mobile optimised congestion control algorithm by Ericsson
A high-definition video stream requires low or no packet loss and fast recovery if any occurs. RTP intrinsically has no means of recovering from packet loss. Instead, low-bit-rate redundancy can be added to the packets themselves to make up for any loss. Retransmission of lost packets can be built on top of RTP using the sequence numbers in the RTP header.
Geographical distance can add significant delay to transmission time. Transmission time is an important metric in call quality analysis; however, calculating it as the difference between the sending timestamp and the receiving timestamp requires perfectly synchronized system clocks, which is unreliable.
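A way around the clock synchronization problem is to measure round-trip time on a single clock, as RTCP does with sender and receiver reports (RFC 3550, section 6.4.1). The sketch below assumes the three inputs are 32-bit values in units of 1/65536 seconds: the arrival time of the receiver report, its LSR (last SR timestamp) field and its DLSR (delay since last SR) field.

```javascript
// Round-trip time from an RTCP receiver report, measured on the sender's clock only.
function rttSeconds(arrival, lsr, dlsr) {
  const rtt = (arrival - lsr - dlsr) >>> 0; // unsigned 32-bit wrap-around
  return rtt / 65536;
}

// Example values from RFC 3550: arrival 0xb710:8000, LSR 0xb705:2000, DLSR 0x0005:4000
const exampleRtt = rttSeconds(0xb7108000, 0xb7052000, 0x00054000); // 6.125 seconds
```

One-way transmission time is then often approximated as half of the round-trip time, which assumes a symmetric path.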
Latency accumulates across getting user media, encoding, transmission, network delays, buffering, decoding and playback. Many factors are involved in latency management, such as queuing delays, media path and CPU utilization.
Optimize compute resources
mobile agents have less computing power
cameras with features such as autofocus or other adjustments take more time to capture
the network should have suitable bandwidth and strength
Reduce information to be encoded and sent
subject focus and background blurring
filtering noise at the source
Voice Activity Detection (VAD)
send extra data in FEC only if voice activity is detected in the packet
Synchronizing clocks in distributed systems is a tough task, so it is mostly avoided, or handled by using NTP or other means of synchronization.
WebRTC uses the Stream Control Transmission Protocol (SCTP) over a DTLS connection as an alternative to TCP and UDP.
Features :
Multihoming: one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths.
Multistreaming: transmits several independent streams of chunks in parallel.
SCTP offers retransmission similar to TCP and partial reliability similar to UDP.
Heartbeat to keep the connection alive, with exponential backoff if a packet hasn't arrived.
Validation and acknowledgment mechanisms protect against flooding attacks.
SCTP frames data as datagrams and not as a byte stream.
(+) SCTP enables multiplexing in WebRTC
(+) It has flow control and congestion avoidance support
The end-to-end encryption model of WebRTC is a good defence against man-in-the-middle (MITM) attacks; however, it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and realtime communication platforms in this article: WebRTC App and webpage Security.
Traditionally, 2 separate ports for RTP and RTCP were used in SIP/RTP based realtime communication systems. Thus demultiplexing of the traffic of these data streams is performed at the transport layer.
With rtcp-mux, NAT traversal is simplified, as only a single port is used for media and control messages.
(+) easier to manage security by gathering ICE candidates for a single port only instead of 2
(+) increases the system's capacity for media sessions using the same number of ports
(+) further simplified using BUNDLE, as all media sessions and their control messages flow on the same port.
WebRTC has rtcp-mux capabilities, thus simplifying ICE candidate pairing.
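When RTP and RTCP share one port, the receiver has to tell them apart from the packet contents. A minimal sketch, assuming the common rule derived from RFC 5761 that a second byte in the range 192-223 indicates an RTCP packet type; the function name isRtcp is illustrative.

```javascript
// Classify a packet received on an rtcp-mux port as RTCP (true) or RTP (false).
function isRtcp(packet) {
  if (packet.length < 2) return false;
  const version = packet[0] >> 6; // both RTP and RTCP use version 2
  // For RTCP the second byte is the packet type (for example 200 = SR, 201 = RR).
  return version === 2 && packet[1] >= 192 && packet[1] <= 223;
}
```

This is why RTP payload types 64-95 are avoided with rtcp-mux: with the marker bit set, they would fall into the same byte range as RTCP packet types.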
This post is about making performance enhancements to a WebRTC app so that it can be used in areas that require sensitive data to be communicated, cannot afford downtime, need fast response and low RTT, and need to be secure enough to withstand hacks and attacks.
As a communication agent becomes a single HTML page driven client, a lot of authentication, heartbeat sync, web workers and signalling event-driven flow management resides on the same page, along with the actual CPU consumption for audio-video resources and media stream processing. This in turn can make the webpage heavy, and many a time it can result in a crash due to the page being “unresponsive”.
Here are some of my best to-dos for making sure the WebRTC communication client page runs efficiently.
The Cumulative Layout Shift (CLS) metric measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.
To provide a good user interaction experience, the DOM elements should move as little as possible so that the page appears stable. In the opposite case, on a flickering page (maybe due to a notification DOM element dynamically pushing the other layout elements), it is difficult to interact precisely with page elements such as buttons.
The main thread is where a browser runs all the JavaScript in your page, as well as performing layout, reflows and garbage collection. Therefore long JS processes can block the thread and make the page unresponsive.
Unoptimized JS code takes longer to execute and impacts network, parse-compile and memory cost.
If your JavaScript holds on to a lot of references, it can potentially consume a lot of memory. Pages appear janky or slow when they consume a lot of memory. Memory leaks can cause your page to freeze up completely.
Some effective tips for speeding up JS execution include
Cross-site request forgery (CSRF) attacks rely on the fact that cookies are attached to any request to a given origin, no matter who initiates the request.
While adding cookies we must ensure that if SameSite=None, the cookie is also marked Secure.
With SameSite set to Strict, your cookie will only be sent in a first-party context. In user terms, the cookie will only be sent if the site for the cookie matches the site currently shown in the browser’s URL bar.
Set-Cookie: promo_shown=1; SameSite=Strict
You can test this behavior as of Chrome 76 by enabling chrome://flags/#cookies-without-same-site-must-be-secure and from Firefox 69 in about:config by setting network.cookie.sameSite.noneRequiresSecure.
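The SameSite rules above can be enforced wherever the header is built. The helper below is a hypothetical sketch (buildSetCookie is not a library function) that always adds Secure when SameSite=None is requested.

```javascript
// Build a Set-Cookie header value; SameSite=None is only valid together with Secure.
function buildSetCookie(name, value, { sameSite = 'Lax', secure = false } = {}) {
  const parts = [`${name}=${value}`, `SameSite=${sameSite}`];
  if (secure || sameSite === 'None') parts.push('Secure');
  return parts.join('; ');
}
```

For example, buildSetCookie('promo_shown', '1', { sameSite: 'Strict' }) yields the header value shown earlier, and requesting SameSite=None appends Secure automatically.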
Key Performance Indicators (KPIs) are used to evaluate the performance of a website. It is critical that a WebRTC web page be lightweight to accommodate the signalling control stack JavaScript libraries used for offer/answer handling and for communicating with the signaller over open sockets or a long-polling mechanism.
The Lighthouse tab in Chrome Developer Tools shows relevant areas of improvement on the webpage across Performance, Accessibility, Best Practices, Search Engine Optimization and Progressive Web App.
Page attributes under Chrome Developer Tools depict the page load and rendering time for every element, including scripts and markup. Specifically it has
Time to Title
Time to render
Time to interact
Networking attributes are to be configured based on DNS mapping and host provider. These can be evaluated based on Chrome Developer Tools reports.
Other page interaction criteria include the frames, their interaction and the timings for the same.
In the attached screenshot, see the loading tasks, which basically depict the delay caused by DOM elements under transitions owing to user interaction. This should ideally be minimal to keep the page responsive.
The above functions (old and new) estimate the memory usage of the entire web page.
These calls can be used to correlate new JS code with its impact on memory and subsequently find out whether there are any memory leaks. These memory metrics can also be used for A/B testing.
Loading assets over a CDN, minifying scripts and reducing the overall weight of the page are good ways to keep the page light and active and to prevent Chrome tab crashes.
The non-critical components can then be loaded asynchronously.
Lazy loading must be used for large files such as JS payloads, which are costly to load. To send a smaller JavaScript payload that contains only the code needed when a user initially loads your application, split the entire bundle and lazy load chunks on demand.
Describes the OAuth auth credential information which is used by the STUN/TURN client (inside the ICE Agent) to authenticate against a STUN/TURN server
what ICE candidates are gathered to support non-multiplexed RTCP.
negotiate – Gather ICE candidates for both RTP and RTCP candidates. If the remote-endpoint is capable of multiplexing RTCP, multiplex RTCP on the RTP candidates. If it is not, use both the RTP and RTCP candidates separately.
require – Gather ICE candidates only for RTP and multiplex RTCP on the RTP candidates. If the remote endpoint is not capable of rtcp-mux, session negotiation will fail.
If the value of configuration.rtcpMuxPolicy is set and its value differs from the connection’s rtcpMux policy, throw an InvalidModificationError. If the value is “negotiate” and the user agent does not implement non-muxed RTCP, throw a NotSupportedError.
An RTCPeerConnection object has a signaling state, a connection state, an ICE gathering state, and an ICE connection state.
An RTCPeerConnection object has an operations chain which ensures that only one asynchronous operation in the chain executes concurrently.
Also, an RTCPeerConnection object MUST NOT be garbage collected as long as any event can cause an event handler to be triggered on the object. When the object’s [[IsClosed]] internal slot is true, i.e. the connection is closed, no such event handler can be triggered and it is therefore safe to garbage collect the object.
generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including
descriptions of the local MediaStreamTracks attached to this RTCPeerConnection,
codec/RTP/RTCP capabilities
ICE agent (usernameFragment, password, local candidates, etc.)
DTLS connection
const pc = new RTCPeerConnection();
pc.createOffer()
.then(desc => pc.setLocalDescription(desc));
With more attributes
const pc = new RTCPeerConnection();
// Legacy offer options; newer code usually adds transceivers instead.
// The old voice activity detection option is omitted here.
pc.createOffer({
  offerToReceiveAudio: true,
  offerToReceiveVideo: true
}).then(function(offer) {
  return pc.setLocalDescription(offer);
})
.then(function() {
  // Send the offer to the remote through signaling server
})
.catch(handleError); // handleError is an application-defined error callback
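generates an SDP answer with the supported configuration for the session that is compatible with the parameters in the remote configuration
const pc = new RTCPeerConnection();
// remoteOffer (hypothetical name) is the offer received over the signaling channel;
// createAnswer() can only be called once a remote offer has been applied.
pc.setRemoteDescription(remoteOffer)
.then(function() {
  return pc.createAnswer();
})
.then(function(answer) {
  return pc.setLocalDescription(answer);
})
.then(function() {
  // Send the answer to the remote through signaling server
})
.catch(handleError); // handleError is an application-defined error callback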
The codec preferences of an m= section’s associated transceiver are said to be the value of the RTCRtpTransceiver’s [[PreferredCodecs]] slot with the following filtering applied
If direction is “sendrecv”, exclude any codecs not included in the intersection of RTCRtpSender.getCapabilities(kind).codecs and RTCRtpReceiver.getCapabilities(kind).codecs.
If direction is “sendonly”, exclude any codecs not included in RTCRtpSender.getCapabilities(kind).codecs.
If direction is “recvonly”, exclude any codecs not included in RTCRtpReceiver.getCapabilities(kind).codecs.
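The three filtering rules above can be sketched as a pure function. The capability lists are passed in as plain arrays so the example does not depend on a browser; in a page they would come from RTCRtpSender.getCapabilities(kind).codecs and RTCRtpReceiver.getCapabilities(kind).codecs. Matching codecs by MIME type, clock rate and fmtp line is an assumption made for this illustration.

```javascript
// Filter preferred codecs by transceiver direction.
function filterCodecPreferences(direction, preferred, sendCodecs, recvCodecs) {
  const key = c => `${c.mimeType.toLowerCase()}|${c.clockRate}|${c.sdpFmtpLine || ''}`;
  const send = new Set(sendCodecs.map(key));
  const recv = new Set(recvCodecs.map(key));
  return preferred.filter(codec => {
    const k = key(codec);
    if (direction === 'sendrecv') return send.has(k) && recv.has(k);
    if (direction === 'sendonly') return send.has(k);
    if (direction === 'recvonly') return recv.has(k);
    return true; // other directions: no filtering is described above
  });
}
```

The order of the preferred list is kept, so the result can be used directly as a priority-ordered codec list.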
Send and receive MediaStreamTracks over a peer-to-peer connection. Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.
The RTCRtpTransceiver interface describes a permanent pairing of an RTCRtpSender and an RTCRtpReceiver. Each transceiver is uniquely identified using its mid (media ID) property from the corresponding m-line.
They are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via addTrack(), or explicitly when the application uses addTransceiver(). They are also created when a remote description is applied that includes a new media description.
dictionary RTCRtpCodecParameters {
required octet payloadType;
required DOMString mimeType;
required unsigned long clockRate;
unsigned short channels;
DOMString sdpFmtpLine;
};
payloadType – identifies this codec.
mimeType – codec MIME media type/subtype. Valid media types and subtypes are listed in [IANA-RTP-2].
clockRate – expressed in Hertz.
channels – number of channels (mono=1, stereo=2).
sdpFmtpLine – “format specific parameters” field from the “a=fmtp” line in the SDP corresponding to the codec.
voiceActivityFlag of type boolean – Only present for audio receivers. Whether the last RTP packet, delivered from this source, contains voice activity (true) or not (false).
RTCRtpTransceiver Interface
Each SDP media section describes one bidirectional SRTP (“Secure Real Time Protocol”) stream. RTCRtpTransceiver describes this permanent pairing of an RTCRtpSender and an RTCRtpReceiver, along with some shared state. It is uniquely identified using its mid property.
Thus it is a combination of an RTCRtpSender and an RTCRtpReceiver that share a common mid. An associated transceiver (with mid) is one that’s represented in the last applied session description.
Method stop() – Irreversibly marks the transceiver as stopping, unless it is already stopped. This will immediately cause the transceiver’s sender to no longer send, and its receiver to no longer receive. A stopping transceiver will cause future calls to createOffer to generate a zero port in the media description for the corresponding transceiver, and a stopped transceiver will cause future calls to createOffer or createAnswer to generate a zero port in the media description for the corresponding transceiver.
Access to information about the Datagram Transport Layer Security (DTLS) transport over which RTP and RTCP packets are sent and received by RTCRtpSender and RTCRtpReceiver objects, as well as other data such as SCTP packets sent and received by data channels. Each RTCDtlsTransport object represents the DTLS transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].
Protocols multiplexed with RTP (e.g. data channel) share its component ID. This represents the component-id value 1 when encoded in candidate-attribute, while the ICE candidate for RTCP has component-id value 2 when encoded in candidate-attribute.
This interface describes an Interactive Connectivity Establishment (ICE) candidate used to set up an RTCPeerConnection. To facilitate routing of media on a given peer connection, both endpoints exchange several candidates, and then one candidate out of the lot is chosen, which will then be used to initiate the connection.
const pc = new RTCPeerConnection();
pc.addIceCandidate({candidate:''});
candidate – transport address for the candidate that can be used for connectivity checks.
component – candidate is an RTP or an RTCP candidate
foundation – unique identifier that is the same for any candidates of the same type , helps optimize ICE performance while prioritizing and correlating candidates that appear on multiple RTCIceTransport objects.
ip , port
priority
protocol – tcp/udp
relatedAddress , relatedPort
sdpMid – candidate’s media stream identification tag
sdpMLineIndex
usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).
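Most of the fields listed above are encoded in the candidate string itself. A minimal parser sketch, assuming the usual candidate-attribute grammar (foundation, component, transport, priority, address, port, "typ" and type, followed by optional name/value pairs); the function name parseCandidate and the sample addresses are illustrative.

```javascript
// Parse an ICE candidate-attribute string into its fields.
function parseCandidate(line) {
  const parts = line.replace(/^a=/, '').replace(/^candidate:/, '').trim().split(/\s+/);
  const candidate = {
    foundation: parts[0],
    component: parts[1] === '1' ? 'rtp' : 'rtcp',
    protocol: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    ip: parts[4],
    port: Number(parts[5]),
    type: parts[7] // parts[6] is the literal "typ"
  };
  // Remaining tokens are name/value extension pairs.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === 'raddr') candidate.relatedAddress = parts[i + 1];
    else if (parts[i] === 'rport') candidate.relatedPort = Number(parts[i + 1]);
    else if (parts[i] === 'tcptype') candidate.tcpType = parts[i + 1];
    else if (parts[i] === 'ufrag') candidate.usernameFragment = parts[i + 1];
  }
  return candidate;
}
```

In a browser the same values are already exposed as properties of RTCIceCandidate, so a parser like this is mainly useful on a signaling server or in logs.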
RTCIceCredentialType Enum : supports OAuth 2.0 based authentication. The application, acting as the OAuth Client, is responsible for refreshing the credential information and updating the ICE Agent with fresh new credentials before the accessToken expires. The OAuth Client can use the RTCPeerConnection setConfiguration method to periodically refresh the TURN credentials.
ICE candidate policy [JSEP] to select candidates for the ICE connectivity checks
relay – use only media relay candidates such as candidates passing through a TURN server. It prevents the remote endpoint/unknown caller from learning the user’s IP addresses
all – ICE Agent can use any type of candidate when this value is specified.
RTCBundlePolicy Enum
balanced – Gather ICE candidates for each media type (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
max-compat – Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
max-bundle – Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport.
If the value of configuration.bundlePolicy is set and its value differs from the connection’s bundle policy, throw an InvalidModificationError.
Interfaces for Connectivity Establishment
describes ICE candidates
interface RTCIceCandidate {
DOMString candidate;
DOMString sdpMid;
unsigned short sdpMLineIndex;
DOMString foundation;
RTCIceComponent component;
unsigned long priority;
DOMString address;
RTCIceProtocol protocol;
unsigned short port;
RTCIceCandidateType type;
RTCIceTcpCandidateType tcpType;
DOMString relatedAddress;
unsigned short relatedPort;
DOMString usernameFragment;
RTCIceCandidateInit toJSON();
};
RTCIceProtocol can be either tcp or udp
TCP candidate type which can be either of
active – An active TCP candidate is one for which the transport will attempt to open an outbound connection but will not receive incoming connection requests.
passive – A passive TCP candidate is one for which the transport will receive incoming connection attempts but not attempt a connection.
so – An so candidate is one for which the transport will attempt to open a connection simultaneously with its peer.
UDP candidate type
host – actual direct IP address of the remote peer
srflx – server reflexive, generated by a STUN/TURN server
prflx – peer reflexive; the IP address comes from a symmetric NAT between the two peers, usually as an additional candidate during trickle ICE
usernameFragment – randomly-generated username fragment (“ice-ufrag”) which ICE uses for message integrity along with a randomly-generated password (“ice-pwd”).
Access to information about the ICE transport over which packets are sent and received. Each RTCIceTransport object represents the ICE transport layer for the RTP or RTCP component of a specific RTCRtpTransceiver, or a group of RTCRtpTransceivers if such a group has been negotiated via [BUNDLE].
With SCTP, the protocol used by WebRTC data channels, reliable and ordered data delivery is on by default.
Sending large files
Split data channel message in chunks
var CHUNK_LEN = 64000; // 64 Kb
var img = photoContext.getImageData(0, 0, photoContextW, photoContextH),
len = img.data.byteLength,
n = len / CHUNK_LEN | 0;
for (var i = 0; i < n; i++) {
var start = i * CHUNK_LEN, end = (i + 1) * CHUNK_LEN;
dataChannel.send(img.data.subarray(start, end));
}
// last chunk
if (len % CHUNK_LEN) {
dataChannel.send(img.data.subarray(n * CHUNK_LEN));
}
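On the receiving side the chunks have to be put back together. The sketch below shows both halves as pure functions on a Uint8Array so that it runs outside a browser; the names splitIntoChunks and reassemble are illustrative, and it assumes the total length is communicated to the receiver before the chunks (for example as a first message).

```javascript
// Split a byte array into chunks of at most chunkLen bytes.
function splitIntoChunks(bytes, chunkLen = 64000) {
  const chunks = [];
  for (let start = 0; start < bytes.byteLength; start += chunkLen) {
    chunks.push(bytes.subarray(start, start + chunkLen));
  }
  return chunks;
}

// Rebuild the original byte array from chunks received in order.
function reassemble(chunks, totalLength) {
  const out = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // throws a RangeError if the chunks exceed totalLength
    offset += chunk.byteLength;
  }
  if (offset !== totalLength) throw new Error('missing bytes');
  return out;
}
```

Because the data channel is reliable and ordered by default, the receiver can simply append chunks as they arrive until the announced length is reached.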
The browser maintains a set of statistics for monitored objects, in the form of stats objects. A group of related objects may be referenced by a selector (like a MediaStreamTrack that is sent or received by the RTCPeerConnection).
Statistics API extends the RTCPeerConnection interface
This article is aimed at explaining the intricacies and detailed offer/answer flow in the WebRTC handshake and JSEP. You can read the following articles on WebRTC as a prerequisite before reading through this one. WebRTC has APIs, namely PeerConnection, getUserMedia, DataChannel and getStats.
JSEP is used during signalling via the W3C-recommended RTCPeerConnection API interface to set up a multimedia session. The multimedia session description specifies the critical components of setting up a session between local and remote, such as transport ports, protocol and profiles. It also handles the interaction with the ICE state machine.
Prerequisite: set up the client side for the caller
PeerConnectionFactory to generate PeerConnections
PeerConnection for every connection to a remote peer
MediaStream for audio and video from the client device
The side initiating the session creates an offer with the createOffer() API.
When the caller creates a new RTCPeerConnection(), the RTCSignalingState is “stable”, as the remote and local descriptions are empty.
As the caller initiates the call and calls createOffer(), it now has an offer SDP and proceeds to store the offer locally with setLocalDescription(offer); the RTCSignalingState is now “have-local-offer”. After that, the caller sends the offer to the callee over the signalling channel.
Similarly, as the callee receives the offer, it starts with RTCSignalingState “stable” and then proceeds to store the remote offer using setRemoteDescription(offer); its state is now “have-remote-offer”.
The callee generates a provisional answer for the caller and stores it locally; the state transitions to “have-local-pranswer”. The pranswer SDP is sent to the caller over the signalling channel again.
The caller stores the callee’s pranswer SDP and its state updates to “have-remote-pranswer”.
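The state changes walked through above can be captured in a small transition table. This is a simplified sketch of the signaling state machine (rollback is left out), written as a pure function so it can be checked without a browser; the names TRANSITIONS and nextSignalingState are illustrative.

```javascript
// Signaling state transitions keyed by "<side>:<description type>".
const TRANSITIONS = {
  'stable': { 'local:offer': 'have-local-offer', 'remote:offer': 'have-remote-offer' },
  'have-local-offer': { 'local:offer': 'have-local-offer', 'remote:pranswer': 'have-remote-pranswer', 'remote:answer': 'stable' },
  'have-remote-offer': { 'remote:offer': 'have-remote-offer', 'local:pranswer': 'have-local-pranswer', 'local:answer': 'stable' },
  'have-local-pranswer': { 'local:pranswer': 'have-local-pranswer', 'local:answer': 'stable' },
  'have-remote-pranswer': { 'remote:pranswer': 'have-remote-pranswer', 'remote:answer': 'stable' }
};

// side is "local" (setLocalDescription) or "remote" (setRemoteDescription).
function nextSignalingState(state, side, type) {
  const next = (TRANSITIONS[state] || {})[`${side}:${type}`];
  if (!next) throw new Error(`invalid transition from ${state} via ${side}:${type}`);
  return next;
}
```

For the caller, the sequence local offer, remote pranswer, remote answer ends back in “stable”; applying a local answer while “stable” is rejected.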
Media Section: An m= section is generated for each RtpTransceiver that has been added to the PeerConnection. For the initial offer, since no ports are available yet, the dummy port 9 can be used. However, if the section is bundle-only, the port value is set to 0. Later the port value will be set to the port of the default ICE candidate.
The protocol field “UDP/TLS/RTP/SAVPF” is followed by the list of codecs in order of priority.
The “c=” line in the m= section must also be filled with the dummy value IP 0.0.0.0, as no candidates are available yet.
For each media format on the m= line where retransmission is used, an “a=rtpmap” line for “rtx” carries the clock rate of the codec, and an “a=fmtp” line references the payload type of the primary codec. “a=rtcp-fb” specifies RTCP feedback.
When createOffer is called a second (or later) time, or is called after a local description has already been installed, the processing is different due to the gathered ICE candidates. However, the <session-version> is not changed.
Additionally, the m= section is updated if an RtpTransceiver is added or removed.
Each “m=” and “c=” line MUST be filled in with the port, relevant RTP profile, and address of the default candidate for the m= section.
If the m= section is not bundled into another m= section, update “a=rtcp” with the port and address of the RTCP candidate, and add “a=candidate” lines along with “a=end-of-candidates”.
Local answer created by the side receiving the session (callee)
When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer.
Each offered m= section will have an associated RtpTransceiver
The remote destination (callee) can reject an m= section by setting the port in the m= line to 0. It can reject the m= section if none of the offered media formats are supported, the RtpTransceiver is stopped, etc.
For the initial offer the dummy port value of 9 is set, as no ICE candidate is available yet. Similarly, the “c=” line must contain the “dummy” value “IN IP4 0.0.0.0” too.
The <proto> field MUST be set to exactly match the <proto> field for the corresponding m= line in the offer.
If the answer contains any “a=ice-options” attributes where “trickle” is listed as an attribute, update the PeerConnection canTrickle property to be true.
SDP returned from createOffer or createAnswer MUST NOT be changed before passing it to setLocalDescription. After calling setLocalDescription with an offer or answer, the application MAY modify the SDP to reduce its capabilities before sending it to the far side.
Assume we have an MCU at a location and want the video stream to be relayed via a media server.
SDP is used for session description and contains a sequence of lines with key-value pairs. SDP is read line by line and converted to a data structure that contains the deserialized information.
Lines “v=”, “o=”, “b=” and “a=” are processed. The “i=”, “u=”, “e=”, “p=”, “t=”, “r=”, “z=”, and “k=” lines are not used by this specification; they MUST be checked for syntax but their values are not used. Line “c=” is checked for syntax and ICE mismatch detection.
The “a=” attribute could be: “a=group”, “a=ice-lite”, “a=ice-pwd”, “a=ice-options”, “a=fingerprint”, “a=setup”, “a=tls-id”, “a=identity”, “a=extmap”
Media Section Parsing
Line “m=” for media, proto, port and fmt in RTP
Attributes “a=” can be :
“a=rtpmap” or “a=fmtp” : map from an RTP payload type number to a media encoding name that identifies the payload format.
Packetization parameters such as “a=ptime” and “a=maxptime”, which define the length of each RTP packet.
Direction as “a=sendrecv”, “a=recvonly”, “a=sendonly”, “a=inactive”
Muxing as “a=rtcp-mux” , “a=rtcp-mux-only”
RTCP attributes “a=rtcp” , “a=rtcp-rsize”
Line “c=” is checked.
Line “b=” for bandwidth, bwtype
Attributes for “a=” could be “a=ice-ufrag”, “a=ice-pwd”, “a=ice-options”, “a=candidate”, “a=remote-candidate”, “a=end-of-candidates” and “a=fingerprint”
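The line-by-line parsing described above can be sketched as follows. This is a deliberately small illustration that only splits the SDP into session-level lines and media sections; it does not validate syntax or interpret individual attributes, and the returned structure is an assumption made for this example.

```javascript
// Split an SDP string into session-level lines and per-m= media sections.
function parseSdp(sdp) {
  const session = { lines: [], media: [] };
  let current = session; // lines go to the session until the first m= line
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.length < 2 || line[1] !== '=') continue; // skip blank or malformed lines
    const type = line[0];
    const value = line.slice(2);
    if (type === 'm') {
      // m=<media> <port> <proto> <fmt> ...
      const [kind, port, proto, ...fmt] = value.split(' ');
      current = { kind, port: Number(port), proto, fmt, lines: [] };
      session.media.push(current);
    } else {
      current.lines.push({ type, value });
    }
  }
  return session;
}
```

Every line after an m= line belongs to that media section until the next m= line, which is why attributes such as a=rtcp-mux end up attached to a specific section.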
Protocols using offer/answer are difficult to operate through Network Address Translators (NATs), since the flow of media packets requires the IP addresses and ports of media sources and sinks to be carried within their messages. Realtime media also emphasises reduced latency and decreased packet loss.
ICE is an extension to the offer/answer model and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks. The checks are done using STUN and TURN, and ICE also allows for address selection for multi-homed and dual-stack hosts.
ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. Then it systematically tries all possible pairs (in a carefully sorted order) until it finds one or more that work.
The caller and callee perform checks to finalize the protocol and routing needed to establish a peer connection. A number of candidates are proposed until they mutually agree upon one. The peer connection then uses that candidate's details to initiate the connection.
While applying a local description at the media engine level, if an m= section is new, the WebRTC media stack begins gathering candidates for it.
RTCPeerConnection specifies canTrickleIceCandidates. ICE trickling is the process of continuing to send candidates after the initial offer or answer has already been sent to the other peer.
The ICE transport role is responsible for choosing a candidate pair.
The ICE layer sets one peer as the controlling agent and the other as the controlled agent. The controlling agent makes the final decision as to which candidate pair to choose.
An agent identifies all its CANDIDATES, each of which is a transport address. Types:
HOST CANDIDATE – obtained directly from a local interface, which could be Wi-Fi, a Virtual Private Network (VPN) or Mobile IP (MIP). If an agent is multihomed (private and public networks), it obtains a candidate from each IP address and includes all candidates in its offer.
STUN or TURN are used to obtain additional candidates. Types:
translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
The candidates are carried in attributes in the SDP offer. The remote peer also follows this process, gathering and sending its own sorted list of candidates. Hence CANDIDATE PAIRS from both sides are formed.
PEER REFLEXIVE CANDIDATES – connectivity checks can produce additional candidates, especially around symmetric NATs
Since the same address is used for STUN and media (RTP/RTCP), demultiplexing based on packet contents helps to identify which one is which.
Checks : ICE checks are performed in a specific sequence, so that high-priority candidate pairs are checked first.
TRIGGERED CHECKS – accelerates the process of finding a valid candidate
ORDINARY CHECKS – agent works through ordered prioritised check list by sending a STUN request for the next candidate pair on the list periodically.
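The priority ordering used for these checks comes from two formulas in RFC 8445: one for a single candidate and one for a candidate pair. The sketch below uses the type preference values recommended in that RFC; BigInt is used for the pair priority because it can exceed the range of safe JavaScript integers.

```javascript
// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type, localPreference, componentId) {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// pair priority = 2^32 * MIN(G, D) + 2 * MAX(G, D) + (G > D ? 1 : 0)
// G: priority of the controlling agent's candidate, D: the controlled agent's.
function pairPriority(controllingPriority, controlledPriority) {
  const g = BigInt(controllingPriority);
  const d = BigInt(controlledPriority);
  const min = g < d ? g : d;
  const max = g > d ? g : d;
  return (min << 32n) + 2n * max + (g > d ? 1n : 0n);
}
```

Sorting the check list by descending pair priority means host-to-host pairs are tried before pairs that involve reflexive or relayed candidates.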
Checks also maintain frozen candidates and pairs with some foundation for a media stream. Each candidate pair in the check list has a foundation and a state. States for candidate pairs:
1. Waiting: A check has not been performed for this pair, and can be performed as soon as it is the highest-priority Waiting pair on the check list.
2. In-Progress: A check has been sent for this pair, but the transaction is in progress.
3. Succeeded: A check for this pair was already done and produced a successful result.
4. Failed: A check for this pair was already done and failed, either never producing any response or producing an unrecoverable failure response.
5. Frozen: A check for this pair hasn’t been performed, and it can’t yet be performed until some other check succeeds, allowing this pair to unfreeze and move into the Waiting state.
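The states above can be sketched as a tiny scheduler. This is a simplified illustration rather than a full RFC 8445 implementation: the pair objects, their numeric priorities and both function names are assumptions made for the example.

```javascript
// Pick the highest-priority Waiting pair and mark it In-Progress.
function nextOrdinaryCheck(checkList) {
  const waiting = checkList
    .filter((pair) => pair.state === "waiting")
    .sort((a, b) => b.priority - a.priority);
  if (waiting.length === 0) return null;
  waiting[0].state = "in-progress";
  return waiting[0];
}

// Record the result of a check. On success, Frozen pairs with the same
// foundation are unfrozen and move to Waiting.
function onCheckResult(checkList, pair, succeeded) {
  pair.state = succeeded ? "succeeded" : "failed";
  if (!succeeded) return;
  for (const other of checkList) {
    if (other.state === "frozen" && other.foundation === pair.foundation) {
      other.state = "waiting";
    }
  }
}

const checkList = [
  { id: "audio", foundation: "f1", priority: 200, state: "waiting" },
  { id: "video", foundation: "f1", priority: 100, state: "frozen" },
];
const first = nextOrdinaryCheck(checkList);
onCheckResult(checkList, first, true);
console.log(checkList.map((pair) => pair.id + ":" + pair.state));
```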
Selecting low-latency media paths can use various techniques, such as actual round-trip time (RTT) measurement. The controlling agent nominates which of the valid candidate pairs will be used for media. There are two ways to do this: regular nomination and aggressive nomination.
This post describes the requirements for creating a SIP phone application on Android that uses the same codecs as WebRTC (PCMA, PCMU, VP8). In my project demonstrating WebRTC interoperability (presence, audio/video call, messaging) with a native Android client, I had to develop a lightweight Android SIP application customized to match the look and feel of the WebRTC web application. This also allows the added services of the WebRTC client, such as geolocation, visual voicemail, phonebook and call control options, to be set from the Android application as well.
Aim :
Android WebRTC-SIP client development, using the sipml5 stack implemented through web services and native Android programming.
Software Used:
⦁ Eclipse IDE
⦁ Java SE Development Kit 7.0
⦁ Android SDK
Tasks :
⦁ Authorization of a user, based on his/her credentials (Database local to the application).
⦁ Navigation Drawer on the home page which shows a menu giving the user various options like:
    ⦁ View Home Page
    ⦁ View Contact List
    ⦁ View/Edit My Profile
    ⦁ View My Location
    ⦁ Sign Out
⦁ Phonebook sync : Importing contact list of the Android Phone into the application. Editing user profile with values like User Name , Password , Domain.
⦁ Inclusion of a Web View in the application which currently opens the desired webpage(http://sipml5.org/call.htm).
⦁ Geolocation: Showing a marker for the current location of the user in Google Maps. Displaying the address of the user in a Toast message.
⦁ Audio / Video call capability
figure 1 : Login page , figure 2 : Call page , Figure 3 : Menu bar
Future Roadmap:
⦁ Connecting the application to a database which sits on the cloud.
⦁ Based on the entries in the database the user will be able to:
    ⦁ Login to the application.
    ⦁ View or edit his/her details in the My Profile Section.
⦁ Understanding codes of sample applications for making SIP calls from Android OS like:
    ⦁ SipDroid
    ⦁ SipDemo
    ⦁ IMSDroid
⦁ Modifying the existing application to be able to make SIP calls like one of the apps listed above.
Modules :
Development Done:
Development of an authorization page connecting the application to a local database from where values are inserted and retrieved.
Development of navigation drawer where additional options for the application will be displayed making it a user friendly application.
Development Planned:
1.Connectivity to a cloud database.
2. App engine on cloud.
3. Importing contacts from phone address book .
4. Offline storage of profile details and a few call logs.
Many service providers, i.e. telecom operators, had devised their own ways to provide web-based communication even before WebRTC was born. With time, as WebRTC has become stronger, more secure and more resilient to failure, they have come around to migrating their existing systems from closed-box native APIs to open-source WebRTC APIs.
The first figure (given below) depicts a communication platform built on plugins and proprietary APIs using HTTP REST based signalling.
Web Communication Service Architecture over HTTP/ REST API
As the migration took place, the proprietary components were replaced by open-standard entities: plugins were replaced by WebRTC APIs, and HTTP REST based signalling was replaced by SIP (Session Initiation Protocol).
Web Communication Service Architecture over WebRTC SIP
Note that the telecom operator network did not have to be transformed to integrate the WebRTC elements.
WebRTC (Web Real-Time Communication) is an API definition drafted by the World Wide Web Consortium (W3C) that supports browser-to-browser applications for voice calling, video chat, and P2P file sharing without the need for either internal or external plugins.
Enables browser to browser media streaming over secure RTP profile
Standardization, on an API level at the W3C and at the protocol level at the IETF.
Enables web browsers with Real-Time Communications (RTC) capabilities
Written in C++ and JavaScript
BSD style license
free, open project available in all major browsers
VideoEngine is a framework for the video media chain, from camera to the network, and from network to the screen.
VP8 : Video codec from the WebM Project. Designed for low-latency real-time communication.
Video Jitter Buffer: conceal the effects of jitter and packet loss on overall video quality.
Image enhancements : removes video noise
Transport
The Transport / Session layer of the WebRTC stack provides session management for WebRTC media streams.
It consists of a network stack for Secure RTP (SRTP), the secure profile of the Real-time Transport Protocol.
STUN/ICE for NAT (Network Address Translation) traversal across various types of networks.
Session Management which is an abstracted session layer for call setup.
Standardization by IETF and W3C
As of the 2019 update, the W3C defines it as
a set of ECMAScript APIs in WebIDL to allow media to be sent to and received from another browser or device implementing the appropriate set of real-time protocols. The specification is being developed in conjunction with a protocol specification developed by the IETF RTCWEB group and an API specification to get access to local media devices.
W3C contribution to WebRTC standardization
Media Stream Functions : API for connecting processing functions to media devices and network connections, including media manipulation functions.
Audio Stream Functions : An extension of the Media Stream Functions to process audio streams (e.g. automatic gain control, mute functions and echo cancellation).
Video Stream Functions : An extension of the Media Stream Functions to process video streams (e.g. bandwidth limiting, image manipulation or “video mute“).
Functional Component : API to query presence of WebRTC components in an implementation, instantiate them and connect them to media streams.
P2P Connection Functions : API functions to support establishing signalling protocol-agnostic peer-to-peer connections between Web browsers
API specification Availability
WebRTC 1.0: Real-time Communication Between Browsers – Draft 3 June 2013 available
Implementation Library: WebRTC Native APIs
Media Capture and Streams – Draft 16 May 2013
Supported by Chrome, Firefox and Opera on desktop across all operating systems (Linux, Windows, Mac)
Supported by Chrome and Firefox in mobile browsers (Android)
IETF contribution to WebRTC standardization
Communication model
Security model
Firewall and NAT traversal
Media functions
Functionality such as media codecs, security algorithms, etc.,
Media formats
Transport of non media data between clients
Input to W3C for APIs development
Interworking with legacy VoIP equipment
Open and Free Codecs
A codec defines how a media stream is compressed and decompressed. For peers to exchange media successfully, they need to agree on a common set of codecs for the session. The lists of codecs are sent to each other as part of the offer and answer, carried in SDP (as in SIP).
WebRTC uses bare MediaStreamTrack objects for each track being shared from one peer to another. The codecs associated with those tracks are not mandated by the WebRTC API specification.
For video, as per RFC 7742 (WebRTC Video Processing and Codec Requirements), the mandatory codecs to be supported by WebRTC clients are VP8 and H.264's Constrained Baseline profile.
For audio, as per RFC 7874 (WebRTC Audio Codec and Processing Requirements), browsers must support the Opus codec as well as G.711's PCMA and PCMU formats.
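The codec lists exchanged in the offer and answer appear in SDP as a=rtpmap lines. The sketch below pulls them out of an SDP string so the two sides' lists can be compared. The function name listCodecs and the sample SDP fragment are made up for illustration; a real offer contains many more attributes.

```javascript
// Extract the codecs advertised in an SDP blob from its a=rtpmap lines.
// Format of each line: a=rtpmap:<payload type> <name>/<clock rate>[/<channels>]
function listCodecs(sdp) {
  const codecs = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:(\d+) ([^/]+)\/(\d+)(?:\/(\d+))?/.exec(line);
    if (match) {
      codecs.push({
        payloadType: Number(match[1]),
        name: match[2],
        clockRate: Number(match[3]),
        channels: match[4] ? Number(match[4]) : undefined,
      });
    }
  }
  return codecs;
}

// Hypothetical, heavily shortened SDP fragment.
const sampleSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0 8",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:8 PCMA/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(listCodecs(sampleSdp).map((codec) => codec.name));
```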
Unless the SDP specifically signals otherwise, the web browser receiving a WebRTC video stream must be able to handle video at a minimum of 20 FPS and at a minimum resolution of 320 pixels wide by 240 pixels tall.
In the best scenarios (available bandwidth and media devices), VP8 sets no upper limit on the resolution of the video stream, hence the stream can go as far as the maximum resolution of 16384×16384 pixels.
WebRTC does not specify any signalling / telecommunication protocol, and it is up to the adopter to perform the offer/answer exchange in whatever way suits the use case. For example, a web-only application may use plain WebSockets, whereas an app compatible with telecom endpoints should use SIP as the protocol.
NAT-traversal ( ICE, STUN, and TURN)
This post describes the ICE (Interactive Connectivity Establishment) framework, which is mandated by WebRTC standards. It finds network interfaces and ports in the offer/answer model in order to exchange network information with participating communication clients. ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relays around NAT (TURN). I have written in detail about TURN based WebRTC flow diagrams in the post below.
Learn about hosting / integrating different TURN servers for WebRTC in the article on “TURN server for WebRTC – RFC5766-TURN-Server , Coturn , Xirsys “.
WebRTC video quality is noticeably better than Flash.
(+) Up to 6x faster connection times
Using JavaScript WebSockets, also an HTML5 standard, improves session connection times and accelerates delivery of other OpenTok events.
(+) Reduced audio/video latency
WebRTC offers significant improvements in latency, enabling more natural and effortless conversations.
(+) Freedom from plugins like Flash
With WebRTC and JavaScript WebSockets, you no longer need to rely on Flash for browser-based RTC.
(+) Native HTML5 elements
Customize the look and feel and work with video like you would any other element on a web page with the new video tag in HTML5.
The major players behind the conception and advancement of WebRTC standards and libraries are IETF, W3C, Java community, GSMA. The idea is to develop a lightweight browser-based call console to make SIP calls from a web page. This was successfully achieved using fundamental technologies: JavaScript, HTML5, WebSockets and TCP/UDP, and an open-source SIP server. It is good to note that no extra extension, plugin or gateway, such as Flash support, is required. It also offers cross-browser support, including Mozilla, Chrome and so on.
Bottlenecks
Although WebRTC is a great technology and holds very good potential, it is not devoid of problems
(-) Secure networks and firewalls block RTP
(-) Security in VPN and topology hiding
(-) Cross-platform concerns and incompatible codecs
(-) Late adopters like Microsoft and Apple
WebRTC forms a p2p communication channel between all the peers. This means that as the participant count grows, it becomes a mesh network topology, with each participant holding an incoming and an outgoing stream towards each of its peers.
Figure : Two-party peer-to-peer (p2p) call
Figure : Multiparty call in a mesh network (mesh-based WebRTC video conferencing)
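To see why a mesh stops scaling, the arithmetic can be sketched in a few lines. The function name meshLoad is invented for this example; it only counts connections and streams and assumes every participant sends one stream to every other participant.

```javascript
// Count connections and one-way media streams in a full mesh of N participants.
function meshLoad(participants) {
  const perPeer = Math.max(participants - 1, 0);
  return {
    uplinksPerPeer: perPeer,                         // outgoing streams each client encodes and sends
    downlinksPerPeer: perPeer,                       // incoming streams each client receives and decodes
    peerConnections: (participants * perPeer) / 2,   // distinct peer-to-peer links
    totalStreams: participants * perPeer,            // one-way streams across the whole call
  };
}

console.log(meshLoad(2));
console.log(meshLoad(6));
```

A two-party call needs a single connection, while six participants already need 15 connections and 30 one-way streams, with each client uploading its media five times.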
In the special case of broadcasting to a large number of viewers (who have no outgoing media stream), it is recommended to set up a Multipoint Control Unit (MCU), which relays the incoming stream to a large number of users without putting traffic load on the client from which the stream actually originates. Important note :
It should be noted that these diagrams do not depict the ICE and NAT traversal and have been simplified for better understanding. In real-world scenarios, almost all the time STUN and TURN servers are involved.
Also, WebRTC mandates the use of a secure origin (HTTPS) for any webpage which invokes getUserMedia to capture from user media devices such as the microphone and camera.
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then(function(stream) {
    var video = document.querySelector('video');
    // Older browsers may not have srcObject
    if ("srcObject" in video) {
      video.srcObject = stream;
    } else {
      // Avoid using this in new browsers, as it is going away.
      video.src = window.URL.createObjectURL(stream);
    }
    video.onloadedmetadata = function(e) {
      video.play();
    };
  })
  .catch(function(err) {
    console.log(err.name + ": " + err.message);
  });
DOMException errors on getUserMedia
Rejections of the returned promise are made by passing a DOMException error object to the promise’s failure handler. Possible errors are:
AbortError : Although the user and operating system both granted access to the hardware device, a problem occurred which prevented the device from being used.
NotAllowedError : One or more of the requested source devices cannot be used at this time. This will happen if the browsing context is insecure (http instead of https), or if the user has specified that the current browsing instance / session is not permitted access to the device, or has denied all access to user media devices globally.
NotFoundError : No media tracks of the type specified were found that satisfy the given constraints.
NotReadableError : Although the user granted permission to use the matching devices, a hardware error occurred at the operating system, browser, or Web page level which prevented access to the device.
OverconstrainedError : No candidate devices met the criteria requested. Its constraint property is a string naming the constraint which was not met, and its message property contains a human-readable string explaining the problem. Example constraints :
SecurityError : User media support is disabled on the Document on which getUserMedia() was called.
TypeError : The list of constraints specified is empty, or has all constraints set to false.
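One way to use these error names is to map them to messages an end user can act on. The sketch below is only an illustration: the function name describeMediaError and the message wording are assumptions, and it would typically be called from the catch handler of getUserMedia.

```javascript
// Map a getUserMedia rejection to a user-facing hint based on err.name.
// The wording of each message is illustrative, not taken from any specification.
function describeMediaError(err) {
  switch (err.name) {
    case "NotAllowedError":
      return "Permission denied, or the page is not served over HTTPS.";
    case "NotFoundError":
      return "No camera or microphone matching the request was found.";
    case "NotReadableError":
      return "The device is present but could not be read, possibly because it is in use.";
    case "OverconstrainedError":
      return "No device satisfies the constraint: " + err.constraint;
    case "AbortError":
      return "The device could not be started.";
    case "SecurityError":
      return "User media support is disabled for this document.";
    case "TypeError":
      return "The constraints were empty or all set to false.";
    default:
      return "Unexpected error: " + err.name;
  }
}

console.log(describeMediaError({ name: "OverconstrainedError", constraint: "width" }));
```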
Pan/Tilt/Zoom camera controls
RTCPeerConnection
enables audio and video communication between peers. It performs signal processing, codec handling, peer-to-peer communication, security, and bandwidth management.
stable : There is no offer/answer exchange in progress. This is also the initial state, in which case the local and remote descriptions are empty.
have-local-offer : Local description, of type “offer”, has been successfully applied.
have-remote-offer : Remote description, of type “offer”, has been successfully applied.
have-local-pranswer : Remote description of type “offer” has been successfully applied and a local description of type “pranswer” has been successfully applied.
have-remote-pranswer : Local description of type “offer” has been successfully applied and a remote description of type “pranswer” has been successfully applied.
closed : The RTCPeerConnection has been closed; its [[IsClosed]] slot is true.
RTCSdpType
offer : SDP offer.
pranswer : RTCSdpType of pranswer indicates that a description MUST be treated as an [SDP] answer, but not a final answer.
answer : treated as an [SDP] final answer, and the offer-answer exchange MUST be considered complete. A description used as an SDP answer may be applied as a response to an SDP offer or as an update to a previously sent SDP pranswer.
rollback : canceling the current SDP negotiation and moving the SDP [SDP] offer back to what it was in the previous stable state.
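The signalling states and description types above fit together as a small state machine. The table below is a simplified sketch of it: the source ("local" or "remote") says whether the description was passed to setLocalDescription or setRemoteDescription, and the helper name nextSignalingState is invented for this example. The browser enforces these transitions itself.

```javascript
// Allowed transitions, keyed by current state and "<source>:<description type>".
const TRANSITIONS = {
  "stable": {
    "local:offer": "have-local-offer",
    "remote:offer": "have-remote-offer",
  },
  "have-local-offer": {
    "local:offer": "have-local-offer",
    "remote:pranswer": "have-remote-pranswer",
    "remote:answer": "stable",
    "local:rollback": "stable",
  },
  "have-remote-offer": {
    "remote:offer": "have-remote-offer",
    "local:pranswer": "have-local-pranswer",
    "local:answer": "stable",
    "remote:rollback": "stable",
  },
  "have-local-pranswer": {
    "local:pranswer": "have-local-pranswer",
    "local:answer": "stable",
  },
  "have-remote-pranswer": {
    "remote:pranswer": "have-remote-pranswer",
    "remote:answer": "stable",
  },
};

// Return the next signalling state, or throw if the description is not valid here.
function nextSignalingState(state, source, type) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][source + ":" + type];
  if (!next) {
    throw new Error("Cannot apply " + source + " " + type + " in state " + state);
  }
  return next;
}

// The caller's view of a plain offer/answer exchange.
let state = "stable";
state = nextSignalingState(state, "local", "offer");   // have-local-offer
state = nextSignalingState(state, "remote", "answer"); // back to stable
console.log(state);
```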
RTCConfiguration
Defines a set of parameters to configure how the peer-to-peer communication via RTCPeerConnection is established or re-established.
iceServers of type sequence<RTCIceServer> : array of objects describing servers available to be used by ICE, such as STUN and TURN servers.
iceTransportPolicy of type RTCIceTransportPolicy : indicates which candidates the ICE Agent is allowed to use.
relay : ICE Agent uses only media relay candidates such as candidates passing through a TURN server.
all : The ICE Agent can use any type of candidate when this value is specified.
bundlePolicy of type RTCBundlePolicy : media-bundling policy to use when gathering ICE candidates. The bundle policy affects which media tracks are negotiated if the remote endpoint is not bundle-aware, and what ICE candidates are gathered. If the remote endpoint is bundle-aware, all media tracks and data channels are bundled onto the same transport. Types :
balanced : Gather ICE candidates for each media type in use (audio, video, and data). If the remote endpoint is not bundle-aware, negotiate only one audio and video track on separate transports.
max-compat : Gather ICE candidates for each track. If the remote endpoint is not bundle-aware, negotiate all media tracks on separate transports.
max-bundle : Gather ICE candidates for only one track. If the remote endpoint is not bundle-aware, negotiate only one media track.
rtcpMuxPolicy of type RTCRtcpMuxPolicy. rtcp-mux policy to use when gathering ICE candidates.
certificates of type sequence<RTCCertificate> : a set of certificates that the RTCPeerConnection uses to authenticate.
iceCandidatePoolSize of type octet, defaulting to 0 : size of the prefetched ICE pool as defined in [JSEP].
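Put together, a configuration object using the fields above might look like the sketch below. The STUN/TURN host names and the credentials are placeholders, not real services, and createPeerConnection is a hypothetical wrapper that simply returns null outside a WebRTC-capable browser.

```javascript
// All server URLs and credentials below are placeholders for illustration.
const configuration = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
  ],
  iceTransportPolicy: "all",   // use "relay" to force all media through TURN
  bundlePolicy: "balanced",    // or "max-compat" / "max-bundle"
  rtcpMuxPolicy: "require",
  iceCandidatePoolSize: 0,     // no prefetched candidates
};

// Create the connection only where the API exists (i.e. in a browser).
function createPeerConnection(config) {
  if (typeof RTCPeerConnection === "undefined") return null;
  return new RTCPeerConnection(config);
}

const pc = createPeerConnection(configuration);
console.log(pc ? "peer connection created" : "RTCPeerConnection not available here");
```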
RTCDataChannel
Allows bidirectional communication of arbitrary data between peers. It uses the same API as WebSockets and has very low latency.
(+) DataChannel is p2p and is also end-to-end encrypted, leading to higher privacy
(+) built-in security due to p2p transfer
(+) higher throughput than text transfer via a messaging server
(+) lower latency as p2p transfer takes the shortest route
getStats
allows the web application to retrieve a set of statistics about WebRTC sessions. These statistics are described in a separate W3C document.
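As a rough sketch of how such a report might be consumed, the function below walks a stats report and summarises the inbound RTP entries. It accepts anything with a forEach method, so a plain Map stands in for the RTCStatsReport here. The function name and the sample values are made up; the field names (type, kind, packetsLost, packetsReceived, jitter) are standard statistics fields.

```javascript
// Summarise loss and jitter for every inbound RTP stream in a stats report.
function summarizeInboundRtp(report) {
  const summary = [];
  report.forEach((stat) => {
    if (stat.type !== "inbound-rtp") return;
    const lost = stat.packetsLost || 0;
    const received = stat.packetsReceived || 0;
    const expected = lost + received;
    summary.push({
      kind: stat.kind,
      jitterSeconds: stat.jitter,
      lossFraction: expected > 0 ? lost / expected : 0,
    });
  });
  return summary;
}

// Stand-in for the result of "await pc.getStats()", with invented numbers.
const fakeReport = new Map([
  ["in-audio", { type: "inbound-rtp", kind: "audio", packetsLost: 25, packetsReceived: 75, jitter: 0.012 }],
  ["out-video", { type: "outbound-rtp", kind: "video", packetsSent: 500 }],
]);

console.log(summarizeInboundRtp(fakeReport));
```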
Basics for building a WebRTC based communication solution :-
Websockets for signalling / Offer Answer
TURN server like xirsys(paid), CoTURN(opensource , self hosted)
JS library for WebRTC wrappers
HTTPS-served webpage
WebRTC enabled Browser
Approaches to develop webrtc unified communication system
1. Pluggable module or npm
Source code for the WebRTC project is shipped as a pluggable library or npm module.
2. Collaboration as a Service, i.e. CaaS
Clients redirect users to our WebRTC platform for communication.
3. Communication Platform
We provide all communication and related services as a standalone platform
Updates in W3C 13 Dec , 2019
Over the years since its adoption, many associated features have been deprecated or removed from WebRTC-based platforms and environments, some of which are:
OAuth as a credential method for ICE servers
Negotiated RTCRtcpMuxPolicy (previously marked at risk)
voiceActivityDetection
RTCCertificate.getSupportedAlgorithms()
RTCRtpEncodingParameters: ptime, maxFrameRate, codecPayloadType, dtx, degradationPreference
RTCRtpDecodingParameters: encodings
RTCDataChannel.priority
Some of the newly added features include:
restartIce() method added to RTCPeerConnection
Introduced the concept of “perfect negotiation”, with an example to solve signalling races.
Implicit rollback in setRemoteDescription to solve races.
Implicit offer/answer creation in setLocalDescription to solve races.
References :
[1] WebRTC 1.0: Real-time Communication Between Browsers – W3C Candidate Recommendation 13 December 2019 https://www.w3.org/TR/webrtc/
WebRTC stands for Web Real-Time Communications and introduces a real-time media framework in the browser core alongside associated JavaScript APIs for controlling the media frame and HTML5 tags for displaying. If you are new to WebRTC, read “What is WebRTC?” From a technical point of view, WebRTC will hide all the complexity of real-time media behind a very simple JavaScript API.
WebRTC is a media framework which is independent of the signalling protocol, which means that we can plug in any form of signalling to support session establishment using the offer-answer handshake and SDP. Some of the popular options:
Polling
XHR (XMLHttpRequest)
Websocket ( HTTP upgraded )
SSE ( Server Sent Events )
socket.io ( use set of protocols for best compatibility and fallback)
HTTP/2
Other, less commonly used signalling options
FTP
HTTP
long poll
XMPP
MQTT
One may also send the local and remote SDP over any other communication mechanism, such as email, a REST API or any custom proprietary protocol.
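Because the transport is left open, the only thing every option has in common is a serialised description. A minimal sketch of such an envelope is shown below; the function names and the shortened SDP string are assumptions for this example, and the resulting text could be sent over a WebSocket, an HTTP request or any of the channels listed above.

```javascript
const VALID_TYPES = ["offer", "pranswer", "answer", "rollback"];

// Serialise a session description into a transport-agnostic text message.
function encodeSignal(description) {
  return JSON.stringify({ type: description.type, sdp: description.sdp });
}

// Parse a received message and reject anything that is not a known description type.
function decodeSignal(text) {
  const message = JSON.parse(text);
  if (!VALID_TYPES.includes(message.type)) {
    throw new Error("Unknown description type: " + message.type);
  }
  return { type: message.type, sdp: message.sdp };
}

// Hypothetical, truncated SDP used only to show the round trip.
const wire = encodeSignal({ type: "offer", sdp: "v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\n" });
console.log(decodeSignal(wire).type);
```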
SSL (Secure Sockets Layer) adds encryption capability to an otherwise readable packet.
DTLS (Datagram TLS) adds security to UDP packets, which are used by media streams and data channel messages.
TLS (Transport Layer Security) adds security to the TCP messages used in signalling, such as the SDP based offer-answer handshake which enables setup, modification or teardown of the session.
WebRTC offers web application developers the ability to write rich, realtime multimedia applications (think video chat) on the web, without requiring plugins, downloads or installs. Its purpose is to help build a strong RTC platform that works across multiple web browsers, across multiple platforms.
Web API – An API to be used by third-party developers for developing web-based video chat-like applications.
WebRTC Native C++ API – An API layer that enables browser makers to easily implement the Web API proposal
Transport / Session – The session components are built by re-using components from libjingle, without using or requiring the XMPP/jingle protocol.
RTP Stack – A network stack for RTP, the Real-time Transport Protocol.
Session Management – An abstracted session layer, allowing for call setup and management layer. This leaves the protocol implementation decision to the application developer.
VoiceEngine is a framework for the audio media chain, from sound card to the network.
NetEQ for Voice– A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.
Acoustic Echo Canceler (AEC) – The Acoustic Echo Canceler is a software-based signal processing component that removes, in real-time, the acoustic echo resulting from the voice being played out coming into the active microphone.
Noise Reduction (NR) – The Noise Reduction component is a software-based signal processing component that removes certain types of background noise usually associated with VoIP (hiss, fan noise, etc.).
REMB (Receiver Estimated Maximum Bitrate, receiver-side bandwidth estimation) is more common, while transport-wide-cc (sender-side bandwidth estimation) is the more modern and forward-looking approach.
BWE (Bandwidth Estimation)
FEC (Forward Error Correction) and ULPFEC (Uneven Level Protection Forward Error Correction)
RED (Redundant coding)
FIR (Full Intra Request)
PLI (Picture Loss Indication) for video
RTCRtpEncodingParameters dictionary describes a single configuration of a codec for an RTCRtpSender.
active : flag indicating whether this encoding is actively being used.
codecPayloadType : single 8-bit byte (or octet) specifying the codec to use for sending the stream.
dtx : used for audio to indicate whether discontinuous transmission is used (a feature by which a phone is turned off or the microphone muted automatically in the absence of voice activity).
maxBitrate : (unsigned long integer) maximum number of bits per second to allow for this encoding.
maxFramerate : (double-precision floating-point) maximum number of frames per second to allow for this encoding.
ptime: (unsigned long integer) preferred duration of a media packet in milliseconds used in audio encodings.
rid : (DOMString) if set, specifies an RTP stream ID (RID) to be sent using the RID header extension.
scaleResolutionDownBy :(double-precision floating-point) specifying a factor by which to scale down the video during encoding.
default value is 1.0, in which case the video’s size will be the same as the original.
2.0 scales the video frames down by a factor of 2 in each dimension, resulting in a video 1/4 the size of the original.
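The effect of scaleResolutionDownBy can be worked out with simple arithmetic. The sketch below does that for a hypothetical three-layer simulcast setup. The function name, the rid values and the 1280x720 source size are assumptions for illustration, and actual encoders may round the resulting dimensions slightly differently.

```javascript
// Compute the encoded frame size for a given scaleResolutionDownBy factor.
// Values below 1.0 are rejected, mirroring the RangeError raised by setParameters().
function scaledResolution(width, height, scaleResolutionDownBy = 1.0) {
  if (scaleResolutionDownBy < 1.0) {
    throw new RangeError("scaleResolutionDownBy must be >= 1.0");
  }
  return {
    width: Math.floor(width / scaleResolutionDownBy),
    height: Math.floor(height / scaleResolutionDownBy),
  };
}

// Hypothetical simulcast encodings: full, half and quarter resolution.
const encodings = [
  { rid: "f", active: true, scaleResolutionDownBy: 1.0 },
  { rid: "h", active: true, scaleResolutionDownBy: 2.0 },
  { rid: "q", active: true, scaleResolutionDownBy: 4.0 },
];

for (const encoding of encodings) {
  const size = scaledResolution(1280, 720, encoding.scaleResolutionDownBy);
  console.log(encoding.rid + ": " + size.width + "x" + size.height);
}
```

For a 1280x720 source, a factor of 2.0 gives 640x360, which is one quarter of the original pixel count.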
Update 2020 – This article was written very early in 2013, while WebRTC was still being standardised and was not yet widely adopted, as WebRTC had only begun in 2012.
Many more articles have been written since then to explain the details and applications of WebRTC. A list of these is below :
We are using an open-source HTML5 SIP client written entirely in JavaScript, to keep it light and to integrate easily with the SIP server. No extension, plugin or gateway is needed to initiate the call from the web client. The media stack relies on WebRTC. The client can be used to connect to any SIP or IMS network from an HTML5 and WebRTC enabled browser to make and receive audio/video calls and instant messages.
Proxy Server / WS to UDP Translator :
For the proposed solution we use a freeware lightweight SIP server which, besides acting as a normal SIP server and registrar, can also act as a translator engine that converts SIP over WS messages to SIP over UDP. One of the requirements is to terminate the call on a hard phone, such as a turret, which supports only SIP over UDP, so the overall picture needs a translator that can convert a SIP over WS request to SIP over UDP. With this component, use cases such as initiating a call from the web browser and terminating it at the hard phone become possible.
Soft Phone/ SIP Client :
We are using the Boghe IMS client as the softphone; it supports the audio codecs required to talk with the web client, namely PCMU and PCMA.
Working with the components discussed, we have successfully established the following use-case scenarios.
Call initiated from the browser and terminated on the browser :
(a) Signalling part – Initial handshake is done and the call is established. (Captured with Wireshark)
(b) Media part – SDP is exchanged, as captured by Wireshark, and both clients can exchange voice.
Call initiated from the browser and terminated on the softphone, and vice versa :
(a) Signalling part – Initial handshake is done and the call is established. (Captured with Wireshark)
(b) Media part – SDP is exchanged, as captured by Wireshark, and both clients can exchange voice, with some dependency on the machine being used.
Call initiated from the softphone and terminated on the softphone :
(a) Signalling part – Initial handshake is done and the call is established. (Captured with Wireshark)
(b) Media part – No hiccups; it works fine.
The structure for multi-network traversal using ICE (STUN and TURN) is described in the following diagram.
You can read more about NAT traversal using STUN and TURN here.
The detailed post on TURN servers for WebRTC (RFC5766-TURN-Server, Coturn, Xirsys) is here.