SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP is expected not only to set up multimedia streams but also to provide conference control, so that communication and collaboration apps can be built as new and customisable solutions.
The role of SIP in a conference involves:
- initiating conferences
- inviting participants
- enabling them to join a conference
- letting them leave a conference
- terminating a conference
- expelling participants
- configuring media flow
- controlling activities in the conference
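As a rough illustration of the list above, here is a minimal sketch of how these operations are commonly mapped onto SIP methods. The mapping follows general SIP conferencing practice (RFC 4579) and is not prescribed by this article. The names `CONFERENCE_ACTIONS` and `request_line` and the example URI are hypothetical. Conference-control activities are left out because they are often handled outside core SIP.

```python
# Illustrative mapping only (assumption: common RFC 4579-style usage).
CONFERENCE_ACTIONS = {
    "initiate conference": "INVITE",    # first INVITE to the conference URI
    "invite participant": "REFER",      # ask another party to send an INVITE
    "join conference": "INVITE",        # participant dials the conference URI
    "leave conference": "BYE",          # participant ends its own dialog
    "terminate conference": "BYE",      # focus sends BYE to every participant
    "expel participant": "BYE",         # focus sends BYE to that participant
    "configure media flow": "INVITE",   # re-INVITE with a new SDP offer
}


def request_line(action: str, uri: str) -> str:
    """Build the SIP request line (METHOD Request-URI SIP/2.0) for an action."""
    method = CONFERENCE_ACTIONS[action]
    return f"{method} {uri} SIP/2.0"


# Hypothetical conference URI, used only for illustration.
print(request_line("join conference", "sip:conf1@example.com"))
print(request_line("leave conference", "sip:conf1@example.com"))
```

A real user agent would add the usual headers (Via, From, To, Call-ID, CSeq) and an SDP body; the sketch only shows which method carries which conference operation.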
Mesh vs star topology
Yes, mesh has p2p streaming, which gives maximum data privacy and low cost for the service provider because there aren't any media streams to take care of. In fact, it comes out of the box with WebRTC peer connections.
But of course you can't scale a p2p mesh-based architecture. Although the communication provider is now indifferent to the media stream traffic, the call quality of the session depends entirely on the end clients' processing power and bandwidth, which in my experience cannot accommodate more than 20-25 participants in a call, even with an above-average bandwidth of 30-40 Mbps on both uplink and downlink.
On the other hand, in a star topology the participants only need to communicate with the media server, irrespective of the network conditions of the receivers.
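To make the scaling argument concrete, here is a small back-of-the-envelope sketch of the uplink load on each client. It assumes every participant sends one video stream at a fixed bitrate; the 1.5 Mbps figure and the function names are assumptions chosen for illustration, not measurements.

```python
def uplink_streams(participants: int, topology: str) -> int:
    """Number of media streams each client has to send."""
    if participants < 2:
        return 0
    if topology == "mesh":
        return participants - 1   # one copy per remote peer
    if topology == "star":
        return 1                  # a single copy to the media server
    raise ValueError(f"unknown topology: {topology}")


def uplink_mbps(participants: int, topology: str, stream_mbps: float = 1.5) -> float:
    """Uplink bandwidth per client, assuming a fixed per-stream bitrate."""
    return uplink_streams(participants, topology) * stream_mbps


for n in (4, 10, 25):
    print(n, "mesh:", uplink_mbps(n, "mesh"), "Mbps", "star:", uplink_mbps(n, "star"), "Mbps")
```

Under these assumptions a 25-party mesh needs 24 uplink streams (36 Mbps) per client, while a star keeps every client at a single uplink stream regardless of conference size.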
In a centralised (star) signalling model, all communication flows via a centralised control point.

In a decentralised (mesh) signalling structure, participants can communicate p2p.

Unicast vs Multicast Media Distribution
Decentralised media, multi-unicast streaming

Decentralised media, multicast

Centralised media / MCU

Although both are star topologies, an SFU (Selective Forwarding Unit) differs from an MCU in that it does not do any heavy-duty processing on the media streams; it only receives the streams and routes them to the other peers.
On the other hand, MCU (Multipoint Control Unit) media servers need a lot of computational power to perform many operations on the RTP streams, such as mixing, multiplexing, and filtering echo/noise.
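The difference in workload can be sketched with a toy model. The SFU function only copies a packet to the other peers, while the MCU function has to touch every audio sample to build a separate mix for each receiver (everyone except the receiver itself, often called mix-minus). This is a simplified illustration on raw 16-bit samples, with hypothetical function names; a real MCU also decodes, encodes, and handles video.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def sfu_forward(sender: str, packet: bytes, peers: List[str]) -> Dict[str, bytes]:
    """SFU: route the untouched packet to every peer except the sender."""
    return {peer: packet for peer in peers if peer != sender}


def mcu_mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """MCU: for each receiver, sum the samples of all other participants."""
    mixes = {}
    for receiver in frames:
        others = [samples for peer, samples in frames.items() if peer != receiver]
        summed = [sum(column) for column in zip(*others)]
        # Clip to the 16-bit range so loud speakers do not overflow.
        mixes[receiver] = [max(INT16_MIN, min(INT16_MAX, s)) for s in summed]
    return mixes


print(sfu_forward("a", b"rtp-payload", ["a", "b", "c"]))
print(mcu_mix_minus({"a": [1000, -2000], "b": [500, 500], "c": [32000, -32000]}))
```

The SFU path is a dictionary copy per packet, whereas the MCU path grows with both the number of participants and the number of samples, which is where the extra CPU cost comes from.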
Scalable Video Coding (SVC) for large groups
With SVC, a single stream is encoded in layers that the SFU can forward selectively, while simulcast sends multiple versions of the same stream at different qualities, such as resolutions, so that the SFU can pick the appropriate one for each destination. The SFU can also forward different frame rates to different destinations based on their bandwidth.
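Here is a minimal sketch of how an SFU might choose a simulcast version per receiver: forward the highest-bitrate layer that fits the receiver's estimated bandwidth, and fall back to the lowest one otherwise. The layer names, resolutions, and bitrates are made-up values, and `pick_layer` is a hypothetical helper rather than the API of any particular media server.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    height: int   # vertical resolution in pixels
    kbps: int     # approximate bitrate of this version


# Assumed simulcast versions published by one sender.
LAYERS = [Layer("low", 180, 150), Layer("mid", 360, 500), Layer("high", 720, 1500)]


def pick_layer(layers: List[Layer], available_kbps: int) -> Layer:
    """Return the best layer that fits, or the smallest one if none fits."""
    fitting = [layer for layer in layers if layer.kbps <= available_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer.kbps)
    return min(layers, key=lambda layer: layer.kbps)


for bandwidth in (2000, 600, 100):
    print(bandwidth, "kbps ->", pick_layer(LAYERS, bandwidth).name)
```

In practice the bandwidth estimate changes during the call, so a real SFU would re-run this kind of decision continuously and switch layers at keyframes.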
….
Conference types
1. Bridge
A centralised entity used to book, start, and leave the conference. It is therefore potentially a single point of failure.
To create a conference: the conference is created on a bridge URL, the bridge registers with the SIP server, and participants join the conference on the bridge using INVITEs.
To stop a conference: any participant can leave with a BYE, or the conference can be terminated by sending a BYE to all participants.
2. Endpoints as Mixer
Endpoints handle the streams, so media is decentralised, which makes this type suited to ad hoc conferences.
Mixer UAs cannot leave until the conference finishes.
3. Mesh
Complex, and requires more processing power on each UA.
No single point of failure, but endpoints have to handle NAT traversal.
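Going back to the bridge type, its create/join/leave/terminate flow can be modelled with a small in-memory sketch. It only tracks membership and which parties would receive a BYE; no SIP messages are actually sent. The class name, method names, and URIs are hypothetical.

```python
from typing import List


class ConferenceBridge:
    """Toy model of a conference hosted on a bridge URI."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.participants = set()
        self.active = True

    def on_invite(self, participant: str) -> bool:
        """A participant joins by sending an INVITE to the bridge URI."""
        if not self.active:
            return False          # conference already terminated
        self.participants.add(participant)
        return True

    def on_bye(self, participant: str) -> None:
        """A participant leaves with a BYE; the conference carries on."""
        self.participants.discard(participant)

    def terminate(self) -> List[str]:
        """End the conference; return everyone who would be sent a BYE."""
        targets = sorted(self.participants)
        self.participants.clear()
        self.active = False
        return targets


bridge = ConferenceBridge("sip:conf1@bridge.example.com")
for user in ("sip:alice@example.com", "sip:bob@example.com", "sip:carol@example.com"):
    bridge.on_invite(user)
bridge.on_bye("sip:bob@example.com")
print(bridge.terminate())
```

Because all state lives in the one bridge object, the sketch also shows why this type is a single point of failure: if the bridge goes away, so does the conference.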
References: