SIP conferencing and Media Bridges

SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP is therefore expected not only to set up multimedia streams but also to provide conference control, so that new and customisable communication and collaboration apps can be built on it.

The role of SIP in a conference involves:

  • initiating conferences
  • inviting participants
  • enabling them to join the conference
  • letting them leave the conference
  • terminating the conference
  • expelling participants
  • configuring media flow
  • controlling activities in the conference
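As a rough illustration of the list above, the sketch below maps each conference action to the SIP request commonly used for it, following the conventions of RFC 4579 style conferencing. The table name, the action keys and the helper functions are hypothetical, and a given conference server may use different mechanisms, especially for in-conference control.

```python
# Hypothetical lookup table: conference action -> SIP request commonly used.
# This follows RFC 4579 style conventions; real deployments may differ.
CONFERENCE_ACTIONS = {
    "initiate": "INVITE",            # first INVITE to the conference URI starts it
    "invite_participant": "REFER",   # ask the focus (or the invitee) to send an INVITE
    "join": "INVITE",                # participant dials the conference URI
    "leave": "BYE",                  # participant ends its own dialog
    "terminate": "BYE",              # focus sends BYE to every participant
    "expel": "REFER",                # REFER to the focus with method=BYE in Refer-To
    "configure_media": "re-INVITE",  # new SDP offer inside the existing dialog
    "control_activities": "deployment-specific",  # no single SIP method covers this
}


def sip_method_for(action: str) -> str:
    """Return the SIP request usually associated with a conference action."""
    return CONFERENCE_ACTIONS[action]


def expel_refer_to(participant_uri: str) -> str:
    """Build the Refer-To value that asks the focus to send BYE to a participant."""
    return f"<{participant_uri};method=BYE>"


if __name__ == "__main__":
    print(sip_method_for("join"))
    print("Refer-To:", expel_refer_to("sip:bob@example.com"))
```

The point of the table is that most conference control reuses ordinary dialog requests; only expelling and inviting third parties need REFER.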

Mesh vs star topology

A mesh uses p2p streaming, which gives maximum data privacy and low cost for the service provider because there are no media streams to take care of. In fact, it comes out of the box with WebRTC peer connections.

But of course a p2p mesh architecture does not scale. Although the communication provider no longer has to care about media stream traffic, the call quality of a session depends entirely on the end clients' processing power and bandwidth, which in my experience cannot accommodate more than 20-25 participants in a call even with an above-average bandwidth of 30-40 Mbps on both uplink and downlink.

In a star topology, on the other hand, the participants only need to communicate with the media server, irrespective of the network conditions of the receivers.
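To put rough numbers on the two topologies, here is a small sketch. It assumes every participant sends one video stream at a fixed bitrate and that the server in the star simply forwards streams without mixing them. The function names and the 1.5 Mbps figure are assumptions chosen for illustration, not measurements.

```python
def mesh_load(participants: int) -> dict:
    """Stream counts in a full mesh: every client sends to every other client."""
    others = participants - 1
    return {
        "uplinks_per_client": others,
        "downlinks_per_client": others,
        "connections_total": participants * others // 2,
    }


def star_load(participants: int) -> dict:
    """Stream counts in a star, assuming a forwarding (non-mixing) server."""
    return {
        "uplinks_per_client": 1,
        "downlinks_per_client": participants - 1,
        "connections_total": participants,
    }


def client_uplink_mbps(load: dict, stream_mbps: float) -> float:
    """Uplink bandwidth one client needs for its outgoing streams."""
    return load["uplinks_per_client"] * stream_mbps


if __name__ == "__main__":
    ASSUMED_STREAM_MBPS = 1.5  # assumed video bitrate, for illustration only
    for n in (4, 10, 25):
        mesh = client_uplink_mbps(mesh_load(n), ASSUMED_STREAM_MBPS)
        star = client_uplink_mbps(star_load(n), ASSUMED_STREAM_MBPS)
        print(f"{n} participants: mesh uplink {mesh} Mbps, star uplink {star} Mbps")
```

Under these assumptions the mesh uplink grows linearly with the number of participants while the star uplink stays constant, which is why the client's connection becomes the bottleneck in a mesh.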

Centralised (star) structure

In a centralised (star) signalling model, all communication flows via a centralised control point.

Decentralised (mesh) structure

In a decentralised (mesh) signalling structure, participants can communicate p2p.

Unicast vs Multicast Media Distribution

Decentralised media, multi-unicast streaming

Decentralised media, multicast streaming

Centralised media / MCU

Although both are star topologies, an SFU (Selective Forwarding Unit) differs from an MCU in that it does not do any heavy-duty processing on media streams; it only receives the streams and routes them to the other peers.

An MCU (Multipoint Control Unit) media server, on the other hand, needs a lot of computational power to perform operations on RTP streams such as mixing, multiplexing and filtering echo or noise.

Scalable Video Coding (SVC) for large groups

Simulcast sends multiple versions of the same stream at different qualities, such as different resolutions, so that the SFU can pick the appropriate one for each destination. The SFU can also forward different frame rates to different destinations based on their bandwidth.
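The selection step an SFU performs for each receiver can be sketched as follows. This is a minimal illustration, not how any particular SFU implements it: the `Layer` type, the bitrates and the fallback policy (send the lowest layer when nothing fits) are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """One simulcast encoding of a sender's video (hypothetical model)."""
    rid: str      # identifier of the encoding, e.g. "low"
    height: int   # vertical resolution in pixels
    kbps: int     # approximate bitrate of the encoding


def pick_layer(layers: List[Layer], available_kbps: int) -> Layer:
    """Pick the highest-bitrate layer that fits the receiver's bandwidth.

    If no layer fits, fall back to the lowest one (an assumed policy;
    a real SFU might pause video instead).
    """
    fitting = [layer for layer in layers if layer.kbps <= available_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer.kbps)
    return min(layers, key=lambda layer: layer.kbps)


if __name__ == "__main__":
    # Assumed example encodings, for illustration only.
    layers = [Layer("low", 180, 150), Layer("mid", 360, 500), Layer("high", 720, 1500)]
    for bandwidth in (100, 600, 4000):
        print(bandwidth, "kbps ->", pick_layer(layers, bandwidth).rid)
```

Because the choice is made per receiver, a desktop on a good link and a phone on a weak one can watch the same sender at different qualities without the SFU transcoding anything.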

Conference types

1. Bridge

A centralised entity used to book, start and leave a conference. It is therefore a potential single point of failure.

To create a conference: the conference is created on a bridge URL, the bridge registers on the SIP server, and participants join the conference on the bridge using INVITEs.

To stop a conference: any participant can leave with a BYE, or the conference can be terminated by sending a BYE to all participants.
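The bridge lifecycle described above can be modelled as a small state machine. The sketch below is a toy model: it only tracks state and records the requests a bridge would send, it does not speak SIP on the wire, and the class name, method names and URIs are hypothetical.

```python
class ConferenceBridge:
    """Toy model of a conference bridge; tracks state only, sends nothing."""

    def __init__(self, bridge_uri: str, registrar: str):
        self.bridge_uri = bridge_uri
        self.participants = set()
        # Requests the bridge would send, recorded as (method, target) pairs.
        # The bridge registers on the SIP server when it is created.
        self.sent = [("REGISTER", registrar)]

    def on_invite(self, participant_uri: str) -> int:
        """A participant joins by sending INVITE to the bridge URL."""
        self.participants.add(participant_uri)
        return 200  # 200 OK

    def on_bye(self, participant_uri: str) -> int:
        """A participant leaves by sending BYE."""
        self.participants.discard(participant_uri)
        return 200

    def terminate(self) -> None:
        """The bridge ends the conference by sending BYE to everyone left."""
        for uri in sorted(self.participants):
            self.sent.append(("BYE", uri))
        self.participants.clear()


if __name__ == "__main__":
    bridge = ConferenceBridge("sip:conf1@bridge.example.com", "sip:registrar.example.com")
    bridge.on_invite("sip:alice@example.com")
    bridge.on_invite("sip:bob@example.com")
    bridge.on_invite("sip:carol@example.com")
    bridge.on_bye("sip:bob@example.com")  # bob leaves on his own
    bridge.terminate()                    # everyone else receives a BYE
    print(bridge.sent)
```

Since every join, leave and teardown passes through this one object, the model also makes the single point of failure visible: if the bridge goes away, so does the conference state.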

2. Endpoints as Mixer

Endpoints handle the streams, so media is decentralised and this type suits ad hoc conferences.

Mixer UAs cannot leave until the conference finishes.

3. Mesh

Complex, and requires more processing power on each UA.

No single point of failure, but the endpoints have to handle NAT traversal.

Limitations of WebRTC mesh architecture

WebRTC is intrinsically a p2p system, and as more participants join the session the network begins to resemble a mesh. Audio and text data, being lighter than video streams, can still adjust to difficult conditions without much noticeable lag. Video streams, however, take a hit when peers have poor bandwidth and use video sources of different quality.

Let's assume 3 different clients communicating in a WebRTC mesh session:

  1. WebRTC browser on a high-resolution system (desktop, laptop, kiosk): this client will likely send a high-quality stream and would like to consume high quality as well.
  2. Mobile browser or native WebRTC client: this will send an average-quality stream that may fluctuate owing to telecom network handover or instability while moving between locations.
  3. Embedded system like a Raspberry Pi with a camera module: since this is an embedded system, likely part of an IoT surveillance system, it will try to keep its outgoing usage and incoming stream consumption to a minimum.

Some issues with a WebRTC mesh conference include:

  • Mismatched quality across the individual p2p streams in the mesh makes it difficult to have a homogeneous session quality.
  • Video packets often go out of sync with audio packets, and packet loss leads to delay or freezing.
  • Video pixelates when the resolution of the incoming video does not match the viewer's display mode, e.g. low-quality 320×280 pixel video viewed on a desktop monitor with 1080×720 resolution.
  • Different source encoders in the peers' WebRTC clients behave differently, e.g. a WebRTC stream from an embedded system like an RPi will differ from that of a WebRTC browser like Safari or Mozilla Firefox, or a mobile browser like Chrome on Android.

WebRTC's media stack does adjust automatically: combinations of bitrate and resolution are manipulated in real time, based on feedback packets, to fit the quality of your video stream to your own bandwidth constraints and those of the peer. Even so, there are many difficulties in getting a large number of participants (in the order of a few tens to hundreds) to join a mesh session. Even with an excellent connection and the bandwidth of 5G networks, it is just not feasible to host up to 100 users on a common mesh-based video system.
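The feedback-driven adjustment mentioned above can be illustrated with a simplified form of the loss-based rule described in the Google Congestion Control Internet-Draft: back off when reported loss exceeds 10%, probe upward when it is under 2%, and hold otherwise. This is a sketch only. The function name and the clamp bounds are assumptions, and real WebRTC stacks combine this with delay-based estimation and other logic.

```python
def next_bitrate_kbps(current_kbps: float, loss_fraction: float,
                      min_kbps: float = 50.0, max_kbps: float = 2500.0) -> float:
    """Simplified loss-based sender bitrate update.

    loss_fraction is the packet loss reported by the receiver (0.0 to 1.0).
    min_kbps and max_kbps are assumed clamp bounds, not values from a spec.
    """
    if loss_fraction > 0.10:
        # Heavy loss: reduce the rate in proportion to the loss.
        target = current_kbps * (1 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        # Negligible loss: probe for more bandwidth.
        target = current_kbps * 1.05
    else:
        # Moderate loss: hold the current rate.
        target = current_kbps
    return max(min_kbps, min(max_kbps, target))


if __name__ == "__main__":
    rate = 1000.0
    # Assumed sequence of loss reports, for illustration only.
    for loss in (0.0, 0.0, 0.05, 0.20, 0.20, 0.01):
        rate = next_bitrate_kbps(rate, loss)
        print(f"loss={loss:.2f} -> {rate:.0f} kbps")
```

In a mesh every peer connection runs its own loop like this, so one sender ends up maintaining a different rate towards each peer, which is part of why quality across the session is so uneven.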

Large scale multiparticipant WebRTC sessions

An MCU (Multipoint Control Unit), which acts as a bridge between all participants, is the system traditionally used to host large conferences. An MCU lowers bandwidth usage by packing the streams together.

An SFU (Selective Forwarding Unit), on the other hand, simply forwards the streams.

This setup is usually designed with heavy bandwidth and upload rates in mind, and it is more scalable and more resilient to bad-quality streams than p2p mesh setups. As these media gateway servers scale to accommodate more simultaneous real-time users, their bandwidth consumption becomes heavy and expensive (something to keep in mind when buying instances from cloud providers like Azure or AWS).

Some of the many options for building an SFU (Selective Forwarding Unit) setup for WebRTC media streams include:
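A back-of-the-envelope estimate shows why server bandwidth becomes the main cost. The sketch below assumes every participant sends one stream at a fixed bitrate and the SFU forwards each stream to every other participant, with no simulcast or pausing of unseen video. The function names and the 1 Mbps figure are assumptions for illustration.

```python
from typing import Tuple


def sfu_bandwidth_mbps(participants: int, stream_mbps: float) -> Tuple[float, float]:
    """Return (ingress, egress) bandwidth at the SFU in Mbps.

    Ingress: one stream from each participant.
    Egress: each stream is forwarded to every other participant.
    """
    ingress = participants * stream_mbps
    egress = participants * (participants - 1) * stream_mbps
    return ingress, egress


def gigabytes_per_hour(mbps: float) -> float:
    """Convert a sustained rate in megabits per second to gigabytes per hour."""
    return mbps * 3600 / 8 / 1000


if __name__ == "__main__":
    ASSUMED_STREAM_MBPS = 1.0  # assumed per-stream bitrate, for illustration only
    for n in (5, 10, 50):
        ingress, egress = sfu_bandwidth_mbps(n, ASSUMED_STREAM_MBPS)
        print(f"{n} users: in {ingress} Mbps, out {egress} Mbps, "
              f"{gigabytes_per_hour(egress):.1f} GB/h outbound")
```

Egress grows with the square of the room size under these assumptions, and outbound traffic is usually what cloud providers charge for, so it is worth running this kind of estimate before sizing instances.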


Kurento: open source (Apache 2.0) WebRTC gateway with built-in integration with OpenCV.

Features in KMS (Kurento Media Server) include augmentation, face recognition, filters, object tracking and even virtual fencing.

Other features like mixing, transcoding and recording, as well as client APIs, make it suitable for integration into rich multimedia applications.

It can function as both MCU and SFU.

Nightly builds, good documentation and developer traction make this a good choice. The latest version at the time of writing this article is Kurento 6.15.0, released in November 2020.


Licode: open source (MIT) WebRTC communication platform by Lynckia.

Simple and straightforward to build from source. The latest release is v8, from September 2019.

Erizo, its WebRTC core, is an SFU by default but can be switched to MCU for more features like output streaming and transcoding.

It is written in C++ and uses a Node.js API to communicate with the server.

Supports add-on modules such as recording.


Jitsi: open source (Apache 2.0) video conferencing built around Jitsi Videobridge (jvb), which supports a high-capacity SFU.

Provides a tool (Jibri) for recording and/or streaming. Also has Android and iOS SDKs.

It is best installed as a binary package on Debian/Ubuntu rather than compiled yourself with Maven.

Originally uses XMPP signalling but can communicate with SIP platforms through a gateway that is part of the Jitsi project.

The most recent release is 2.0.5390, released on 12 Jan 2021.


mediasoup: open source (ISC) SFU conferencing server for both WebRTC and plain, non-secured RTP.

It is signalling agnostic.

Relatively new, with less documentation; however, its simple and minimalistic design makes it easy to grasp and run.

Provides JS and C++ client libraries.

The latest release is v3, from March 2021.


Asterisk: MCU-based pure SIP signalling and media server (GNU GPL v2) from Sangoma Technologies.

Powerful server core for many OTT/VoIP providers and call centre platforms.

Can be adapted to any role using a combination of hundreds of modules.

The project does not provide a client SDK.

The latest is version 18.x, released in October 2020.


Janus: this WebRTC gateway is also open source (GNU GPL v3).

Built in C. It can switch between SFU and MCU and provides plugins on top, such as recording.

By default it uses a WebSocket-based protocol, but it can communicate with SIP platforms too.
