SIP Conferencing and Media Bridges

SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP must therefore not only negotiate multimedia streams but also provide conference control for building communication and collaboration apps and new, customisable solutions.

SIP Recap

SIP is an IETF-defined signalling protocol for controlling communication sessions over IP. Apart from VoIP, it is used in other multimedia technologies such as online games, video conferencing and instant messaging. It is an application-layer protocol that runs over TCP, UDP and SCTP. SIP is modelled on the Web protocol HTTP and is a request/response protocol.

SIPv1: SIPv1 was text-based. It used the Session Description Protocol (SDP) to describe sessions and UDP as the transport protocol. SIPv1 only handled session establishment and did not handle mid-conference controls.

SIPv2: The Simple Conference Invitation Protocol (SCIP) utilised the Transmission Control Protocol (TCP) as the transport protocol to manage conferences. It was based on HTTP and used e-mail addresses as identifiers for users. SIPv2 was also text-based, again modelled on HTTP, and could use both UDP and TCP as transport protocols. The combination of SIPv1 and SCIP resulted in the Session Initiation Protocol.

SIP is used to distribute session descriptions among potential participants. Once the session description is distributed, SIP can be used to negotiate and modify the parameters of the session and to terminate it. The role of SIP in a conference involves:

  • initiating conferences
  • inviting participants
  • enabling participants to join a conference
  • leaving a conference
  • terminating a conference
  • expelling participants
  • configuring media flow
  • controlling activities in a conference
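As an illustration of the signalling involved, the sketch below assembles a minimal SIP INVITE a participant might send to join a conference URI. All hosts, tags and branch values are made-up placeholders; a real UA generates these per RFC 3261 and carries an SDP offer in the body.

```python
def build_invite(conference_uri, from_uri, call_id, cseq=1):
    """Build a minimal SIP INVITE for joining a conference (illustrative only)."""
    # Hypothetical Via branch, tags and Contact; a real stack generates these.
    lines = [
        f"INVITE {conference_uri} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        f"From: <{from_uri}>;tag=1928301774",
        f"To: <{conference_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        "Contact: <sip:alice@client.example.com>",
        "Content-Type: application/sdp",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"

msg = build_invite("sip:conf123@bridge.example.com", "sip:alice@example.com", "a84b4c76e66710")
print(msg.splitlines()[0])  # INVITE sip:conf123@bridge.example.com SIP/2.0
```

Leaving the conference would be a BYE within the same dialog, mirroring the list above.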

Mesh vs star topology

A mesh has p2p streaming and thus maximum data privacy and low cost for the service provider, because there are no media streams for the provider to take care of. In fact it comes out of the box with WebRTC PeerConnections. But of course a p2p mesh-based architecture cannot scale. Although the communication provider is indifferent to the media stream traffic, the call quality of the session depends entirely on the end clients' processing and bandwidth, which in my experience cannot accommodate more than 20-25 participants in a call, even with an above-average bandwidth of 30-40 Mbps on both uplink and downlink.
On the other hand, in a star topology the participants only need to communicate with the media server, irrespective of the network conditions of the receivers.
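The scaling difference is easy to quantify: a full mesh needs a connection per pair of participants, while a star needs only one connection per participant. A quick sketch:

```python
def mesh_links(n):
    # Full mesh: every pair of participants keeps a direct PeerConnection.
    return n * (n - 1) // 2

def star_links(n):
    # Star: each participant keeps a single connection to the media server.
    return n

for n in (3, 10, 25):
    print(n, mesh_links(n), star_links(n))
```

At 25 participants the mesh already requires 300 connections versus 25 for the star.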

Centralised (star) structure

In a centralised (star) signalling model, all communication flows via a central control point.

Applications of the star topology are the MCU and the SFU.

Centralised Media / MCU


A Multipoint Control Unit (MCU) uses a mixer, as found in traditional video conferencing bridges: incoming streams are decoded, composed into a single stream and sent back to each participant.

  • (+) proven interworking with legacy systems
  • (+) single point to manage transcoding
  • (+) energy-efficient mode of operation, keeping client-side stream management low
  • (+) single point for DTMF inband/signalling processing
  • (-) CPU and resource intensive on the server side
  • (-) adds latency for traversal via the media server
  • (-) self-managed scaling, heavy traffic and resource maintenance
  • (-) possible security vulnerability, as the server decrypts media packets
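At its core, audio mixing in an MCU sums the decoded samples of all participants and clamps the result back into the sample range, which is exactly the CPU-heavy decode/mix/re-encode work listed above. A minimal PCM sketch (illustrative, not a production mixer):

```python
def mix_pcm(frames):
    """Mix equal-length 16-bit PCM frames by summing samples with clipping."""
    mixed = []
    for samples in zip(*frames):
        s = sum(samples)
        mixed.append(max(-32768, min(32767, s)))  # clamp to the int16 range
    return mixed

a = [1000, -2000, 30000]
b = [500, -500, 10000]
print(mix_pcm([a, b]))  # [1500, -2500, 32767]
```

The last sample saturates at 32767, showing why real mixers also apply gain control before summation.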

Centralised Media via SFU

SFU + simulcast

A Selective Forwarding Unit (SFU) is a newer topology where the centralised media server only forwards or proxies the streams without mixing them.

  • (+) scales for low-latency video streaming
  • (+) less CPU consumption on the server
  • (+) can control the output stream for each peer based on its network capabilities
  • (-) still susceptible to security vulnerabilities at the focal point
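A minimal model of SFU forwarding: each packet is copied to every other subscriber's downlink as-is, never decoded or mixed (class and names are illustrative):

```python
class Sfu:
    """Forwards each incoming packet to every other subscriber without decoding."""
    def __init__(self):
        self.peers = {}  # peer_id -> list acting as that peer's downlink queue

    def join(self, peer_id):
        self.peers[peer_id] = []

    def on_packet(self, sender_id, packet):
        for peer_id, queue in self.peers.items():
            if peer_id != sender_id:   # never echo a stream back to its sender
                queue.append(packet)   # forwarded as-is: no transcode, no mix

sfu = Sfu()
for p in ("alice", "bob", "carol"):
    sfu.join(p)
sfu.on_packet("alice", b"rtp-frame-1")
print(sfu.peers["bob"], sfu.peers["carol"], sfu.peers["alice"])
# [b'rtp-frame-1'] [b'rtp-frame-1'] []
```

Because packets are relayed without being decrypted into raw media, the CPU cost per stream stays low, which is precisely why SFUs scale better than MCUs.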

Decentralised structure

In a decentralised (mesh) signalling structure, participants communicate p2p.

Decentralised media, Multi unicast streaming

Decentralised media, Multicast streaming

Mesh based communication

Limitations of WebRTC mesh Architecture

WebRTC is intrinsically a p2p system, and as more participants join the session the network begins to resemble a mesh. Audio and textual data, being lighter than heavy video media streams, can still adapt to difficult conditions without much noticeable lag. Video streams, however, take a hit when peers are on constrained bandwidth and use different qualities of video sources.

Let's assume three different clients communicating in a WebRTC mesh session:

  1. WebRTC browser on a high-resolution system (desktop, laptop, kiosk) - this client will likely send a high-quality stream and would like to consume high quality as well
  2. Mobile browser or native WebRTC client - this will have an average-quality stream that may fluctuate owing to telecom network handover or instability while moving between locations
  3. Embedded system like a Raspberry Pi with a camera module - since this is an embedded system, likely part of an IoT surveillance setup, it will try to restrict outgoing usage and incoming stream consumption to a minimum

Some issues with a WebRTC mesh conference include:

  • The unmatched quality of individual p2p streams in a mesh makes it difficult to have a homogeneous session quality.
  • Video packets often go out of sync with audio packets, leading to delay or freezing due to packet loss.
  • Pixelated video when the resolution of the incoming video does not match the viewer's display mode, e.g. a low-quality 320×280-pixel video viewed on a desktop monitor with 1080×720 resolution.
  • Different source encoders at the peers' WebRTC clients behave differently, e.g. a WebRTC stream from an embedded system like an RPi will differ from that of a desktop browser like Safari or Firefox, or a mobile browser like Chrome on Android.

Although WebRTC's media stack auto-adjusts, manipulating combinations of bitrate and resolution in real time based on feedback packets to adapt your video stream to your own and the peer's bandwidth constraints, many difficulties remain in getting a large number of participants (in the order of a few tens to hundreds) to join a mesh session. Even with an excellent connection and the great bandwidth of 5G networks, it is just not feasible to host even up to 100 users on a common mesh-based video system.
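The ~20-25 participant ceiling mentioned earlier can be sanity-checked with a back-of-the-envelope calculation; the 1.5 Mbps per-stream figure below is an assumption for a mid-quality video stream, not a measured value:

```python
def max_mesh_peers(uplink_mbps, per_stream_mbps):
    """Largest mesh size where a peer can still upload one stream to everyone else."""
    # In a mesh each peer uploads n-1 copies of its stream, so we need
    # (n - 1) * per_stream_mbps <= uplink_mbps.
    return int(uplink_mbps // per_stream_mbps) + 1

# e.g. a 30 Mbps uplink and ~1.5 Mbps per mid-quality video stream
print(max_mesh_peers(30, 1.5))  # 21
```

The same arithmetic shows why 100 mesh participants is infeasible: it would demand roughly 150 Mbps of sustained uplink per peer at that bitrate.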

Unicast, Broadcast and Multicast media distribution

Unicast is one-to-one transmission: servers direct a separate stream towards every listener who wants to connect, so the stream is replicated many times across the network. Usage: RTC over the network between two specific endpoints.

Broadcast is one-to-all transmission within a range. Its types are limited broadcast and directed broadcast. Usage: conference streaming.

Multicast is one-to-many transmission to a group of subscribed receivers. Usage: IPTV that distributes to hundreds or thousands of viewers.
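For a single fixed-bitrate stream, the server-side fan-out difference between these modes can be sketched as a count of stream copies leaving the server (the function name is mine, purely illustrative):

```python
def server_egress_streams(viewers, mode):
    """How many copies of the stream leave the server for a given distribution mode."""
    if mode == "unicast":
        return viewers   # one copy per listener; replicated end to end
    if mode == "multicast":
        return 1         # single copy; routers replicate toward subscribed receivers
    raise ValueError(f"unknown mode: {mode}")

print(server_egress_streams(1000, "unicast"), server_egress_streams(1000, "multicast"))
```

This is why IPTV-scale distribution leans on multicast: the server's egress cost stays constant regardless of audience size.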

Despite both being star topologies, the SFU (Selective Forwarding Unit) differs from the MCU in that it does not do any heavy-duty processing on the media streams; it only receives the streams and routes them to the other peers.

MCU (Multipoint Control Unit) media servers, on the other hand, need a lot of computational strength to perform many operations on the RTP streams, such as mixing, multiplexing and filtering echo/noise.
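With simulcast (multiple encodings of the same source at different qualities, as pictured earlier), the SFU's per-destination routing reduces to picking the highest layer each subscriber's downlink can sustain. The layer table below is hypothetical:

```python
# Hypothetical simulcast layers: (name, required downlink in kbps)
LAYERS = [("1080p", 3000), ("720p", 1500), ("360p", 600), ("180p", 200)]

def pick_layer(downlink_kbps):
    """Pick the highest simulcast layer the subscriber's downlink can sustain."""
    for name, required in LAYERS:
        if downlink_kbps >= required:
            return name
    return None  # not even the lowest layer fits; pause video

print(pick_layer(4000), pick_layer(1600), pick_layer(250))  # 1080p 720p 180p
```

A real SFU re-evaluates this choice continuously from receiver feedback (e.g. REMB/transport-cc), but the selection logic is essentially this.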

Scalable Video Coding (SVC) for large groups

Scalable Video Coding (SVC) encodes the stream once as a set of layered sub-streams, while simulcast sends multiple independent versions of the same stream at different qualities, such as resolutions, from which the SFU can pick the appropriate one for each destination. The SFU can also forward different frame rates to different destinations based on their bandwidth. Some of the conference bridge types:

1. Bridge

A centralised entity to book, start and leave conferences; therefore potentially a single point of failure.

  • To create a conference: the conference is created on a bridge URL, the bridge registers on the SIP server, and participants join the conference on the bridge using INVITEs
  • To stop a conference: a participant can leave with a BYE, or the conference can be terminated by sending BYE to all
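A minimal sketch of the bridge's participant bookkeeping under this INVITE/BYE flow (the class and method names are mine, purely illustrative):

```python
class ConferenceBridge:
    """Centralised bridge: participants join with INVITE, leave with BYE,
    and terminating the conference sends BYE to everyone still in it."""
    def __init__(self, uri):
        self.uri = uri
        self.participants = set()
        self.sent = []  # log of (method, target) messages the bridge would send

    def on_invite(self, participant):
        self.participants.add(participant)

    def on_bye(self, participant):
        self.participants.discard(participant)

    def terminate(self):
        for p in sorted(self.participants):
            self.sent.append(("BYE", p))  # bridge tears down each remaining dialog
        self.participants.clear()

conf = ConferenceBridge("sip:conf123@bridge.example.com")
conf.on_invite("alice"); conf.on_invite("bob")
conf.on_bye("alice")      # alice leaves on her own
conf.terminate()          # bridge ends the conference for everyone left
print(conf.sent)  # [('BYE', 'bob')]
```

The single `ConferenceBridge` instance here also makes the single-point-of-failure concern concrete: lose it and all dialog state is gone.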

2. Endpoints as Mixer

Endpoints handle the streams and media is decentralised, which suits ad hoc conferences.

(-) mixer UAs cannot leave until the conference finishes

3. Mesh

Complex; more processing power is required on each UA.

  • (+) no single point of failure
  • (-) high network congestion and endpoint processing
  • (-) endpoints have to handle NAT traversal

Large scale multiparticipant WebRTC sessions

An MCU (Multipoint Control Unit), which acts as a bridge between all participants, is the traditionally used system for hosting large conferences. However, an MCU limits or lowers bandwidth usage by packing the streams together. An SFU (Selective Forwarding Unit), on the other hand, simply forwards the streams.

This setup is usually designed with heavy bandwidth and upload rates in mind, and it is more scalable and resilient to bad-quality streams than p2p-style mesh setups. As these media gateway servers scale to accommodate more simultaneous real-time users, their bandwidth consumption is heavy and expensive (something to keep in mind while buying instances from cloud providers like Azure or AWS).

Some of the many options for building an SFU (Selective Forwarding Unit) setup for WebRTC media streams are listed below:


Kurento is an opensource (Apache 2.0) WebRTC gateway that has built-in integration with OpenCV.

Pipeline Architecture Design Pattern

Features in KMS (Kurento Media Server) include augmentation, face recognition, filters, object tracking and even virtual fencing.

Other features like mixing, transcoding and recording, as well as client APIs, make it suitable for integration into rich multimedia applications.

  • (+) It can function as both an MCU and an SFU.
  • (+) Added media processing and transformations: augmented reality, blending, mixing, analyzing ..
  • (+) ML-friendly, with OpenCV filters (samples provided)
  • (+) pipelines used with computer vision
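Kurento connects media elements into pipelines; the pipeline pattern itself can be sketched generically (the stage names below are hypothetical stand-ins, not the KMS API):

```python
def make_pipeline(*stages):
    """Chain processing stages so each one's output feeds the next,
    mirroring how media elements are connected in a media pipeline."""
    def run(frame):
        for stage in stages:
            frame = stage(frame)
        return frame
    return run

# Hypothetical stages standing in for filters such as grayscale, face overlay, recorder
grayscale = lambda f: f + ">gray"
overlay   = lambda f: f + ">overlay"
record    = lambda f: f + ">rec"

pipeline = make_pipeline(grayscale, overlay, record)
print(pipeline("frame0"))  # frame0>gray>overlay>rec
```

The appeal of the pattern is that a computer-vision filter slots in as just another stage, without touching the endpoints on either side.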

Nightly builds, good documentation and developer traction make this a good choice. The latest version at the time of writing this article is Kurento 6.15.0, released in November 2020.


Licode is an opensource (MIT) WebRTC communication platform by Lynckia.

Simple and straightforward to build from source. The latest release is v8, from September 2019.

Erizo, its WebRTC core, is by default an SFU but can also be switched to MCU mode for more features like output streaming and transcoding.

It is written in C++ and exposes a Node.js API to communicate with the server.

It supports add-on modules, such as recording.


Jitsi is an opensource (Apache 2.0) video conferencing platform built around the Jitsi Videobridge (JVB).

JITSI Components

  • Jitsi VideoBridge – SFU
  • Jicofo – “Focus” component that initiates Jingle on behalf of JVB
  • Jigasi – SIP to XMPP signaling and media gateway
  • Jirecon – Component that allows for recording of RTP streams
  • Jibri – New live recording component

Other client side components and SDK

  • lib-jitsi-meet/Strophe.js – Javascript (Browser/Node.js)
  • XMPPFramework/MeetRTC_iOS – iOS
  • Smack – Java/Android
Jitsi conferencing (SFU)

  • (+) Supports a high-capacity SFU. Provides tools (Jibri) for recording and/or streaming.
  • (+) Has Android and iOS SDKs.
  • (-) Low SIP support (more focused on XMPP). It originally uses XMPP signalling but can communicate with SIP platforms using a gateway (Jigasi), which is part of the Jitsi project.

It is best used as a binary package on Debian/Ubuntu instead of compiling it yourself with Maven. The most recent release is 2.0.5390, released on 12 Jan 2021.


mediasoup is an opensource (ISC) SFU conferencing server for both WebRTC and plain, non-secured RTP.

It follows the producer-consumer architecture design pattern.

  • (+) It is signalling-agnostic
  • (+) Node.js module on the server (media handling in C++)
  • (+) Provides JS and C++ client libraries
  • (+) audio/video producers or consumers can be GStreamer and FFmpeg scripts
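The producer-consumer pattern can be sketched as a tiny router model: producers publish streams into the router, and consumers subscribe to a producer's stream (the names here are illustrative, not the library's actual API):

```python
class Router:
    """Minimal producer/consumer model: producers publish streams into the
    router, and each consumer subscribes to one producer's stream."""
    def __init__(self):
        self.producers = {}   # producer_id -> latest published frame
        self.consumers = {}   # consumer_id -> producer_id it consumes

    def produce(self, producer_id, frame):
        self.producers[producer_id] = frame

    def consume(self, consumer_id, producer_id):
        self.consumers[consumer_id] = producer_id

    def pull(self, consumer_id):
        # Consumers only ever see forwarded frames; the router never mixes.
        return self.producers.get(self.consumers[consumer_id])

router = Router()
router.produce("cam1", b"frame-42")
router.consume("viewer1", "cam1")
print(router.pull("viewer1"))  # b'frame-42'
```

Because producers and consumers are decoupled through the router, an external tool (e.g. an FFmpeg script) can act as either side without the other endpoints knowing.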

It is relatively new, with less documentation, but its simple and minimalistic design makes it easy to grasp and run.


The Janus WebRTC gateway is also opensource (GNU GPL v3).

Built in C. It has the ability to switch between SFU and MCU roles and provides plugins on top, such as recording.

By default it uses a WebSocket-based protocol, HTTP/JSON and XMPP, but it can communicate with SIP platforms too using a plugin.

Asterisk SFU

Asterisk is traditionally an MCU-based, pure SIP signalling and media server (GNU GPL v2) from Sangoma Technologies; recent versions also provide SFU-style bridging.

A powerful server at the core of many OTT/VoIP providers and call-centre platforms.

  • (+) Can be modified to fit any role using combinations of its hundreds of modules.
  • (-) The project does not provide a client SDK.


A live-streaming platform with SDKs for native (iOS, Android) and HTML5 clients, plus custom server-side applications.

  • (+) supports IP cameras, drones, RTSP, RTMP and hardware encoders (many client types)
  • (+) failover to HLS and Flash
