Media Architecture: RTP Topologies

With the sudden onset of Covid-19 and the growing trend of working from home, the demand for scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution.

This article is about media server setups that provide mid- to high-scale conferencing solutions over SIP to various endpoints, including SIP softphones, PBXs, carrier/PSTN, and WebRTC.

Point to Point

Endpoints communicate over unicast.
RTP and RTCP traffic is private between sender and receiver, even if the endpoints contain multiple SSRCs in the RTP session.
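As a minimal sketch of such a unicast session, two GStreamer pipelines can exchange RTP directly; the address 127.0.0.1 and port 5004 are placeholder values:

```shell
# sender: encode a test tone as Opus and ship it as RTP over unicast UDP
gst-launch-1.0 audiotestsrc ! audioconvert ! audioresample ! opusenc ! \
  rtpopuspay ! udpsink host=127.0.0.1 port=5004

# receiver: depayload and decode the stream arriving on the same port
gst-launch-1.0 udpsrc port=5004 \
  caps="application/x-rtp,media=audio,clock-rate=48000,encoding-name=OPUS" ! \
  rtpopusdepay ! opusdec ! autoaudiosink
```

Because the session is plain unicast, no other party sees the media unless a middlebox is introduced.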

Advantages of P2P:
  • Facilitates private communication between the parties
  • The only limits on the number of streams between participants are physical constraints such as bandwidth and the number of available ports

Point to Point via Middlebox

Same as above, but with a middlebox involved.

Mostly used for interoperability between otherwise non-interoperable endpoints, such as transcoding codecs or converting transports.
The middlebox does not use an SSRC of its own and keeps the SSRC of an RTP stream intact across the translation.

Subtypes of middlebox:

Transport/Relay Anchoring

Handles roles like NAT traversal by pinning the media path to a relay in a public address domain or to a TURN server.

Includes middleboxes for auditing or privacy control of participants' IP addresses.

Other SBC (Session Border Controller)-like characteristics are also part of this topology.

Transport Translator

Interconnects networks, for example multicast to unicast.

Handles media packetization to allow other media, such as non-RTP protocols, to connect to the session.

Media Translator

Modifies the media inside RTP streams, commonly known as transcoding.

Can go as far as fully decoding and re-encoding RTP streams.

In many cases it can also act on behalf of endpoints that do not support RTP mechanisms, receiving and responding to feedback reports and performing FEC (Forward Error Correction).
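As an illustrative sketch of a media translator, the following GStreamer pipeline receives H.264 over RTP, fully decodes it, and re-encodes it as VP8 before sending it back out; the ports and address are assumed placeholders:

```shell
# receive H.264/RTP on port 5004, transcode to VP8, re-send as RTP on port 5006
gst-launch-1.0 udpsrc port=5004 \
  caps="application/x-rtp,media=video,clock-rate=90000,encoding-name=H264" ! \
  rtph264depay ! avdec_h264 ! videoconvert ! \
  vp8enc deadline=1 ! rtpvp8pay ! udpsink host=127.0.0.1 port=5006
```

Note that a full decode/encode cycle like this costs CPU and adds latency, which is exactly the trade-off the SFU topology later avoids.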

    Back-To-Back RTP Session

Mostly like a translator middlebox, but it establishes a separate RTP session leg with each endpoint, bridging the two sessions.

Takes complete responsibility for forwarding the correct RTP payload and maintaining the relation between SSRCs and CNAMEs.

Advantages of Back-To-Back RTP Session:
  • The B2BUA / media bridge takes responsibility for relaying media and manages congestion

Disadvantages of Back-To-Back RTP Session:
  • It can be subjected to a MITM attack or carry a backdoor to eavesdrop on conversations

Point to Point using Multicast

Any-Source Multicast (ASM)

Traffic from any participant, sent to the multicast group address, reaches all other participants.

Source-Specific Multicast (SSM)

A selected sender streams to the multicast group, which distributes the stream to the receivers.

Point to Multipoint using Mesh

Many unicast RTP streams forming a mesh between the endpoints.

Point to Multipoint + Translator

A further variant of this topology is Point to Multipoint with a Mixer.

Media Mixing Mixer

Receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.

Media Switching Mixer

An RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

The mixer can reduce the bitrate or switch between sources, such as active speakers.

SFU (Selective Forwarding Unit)

The middlebox can select which of the potential sources (SSRCs) transmitting media will be sent to each of the endpoints. Each such transmission is set up as an independent RTP session.

Extensively used in videoconferencing topologies, together with scalable video coding and simulcast.

Advantages of SFU:
  • Low latency and low jitter-buffer requirements, by avoiding re-encoding

Disadvantages of SFU:
  • Unable to manage the network and control the bitrate

On a high level, one can safely assume that, given the current average internet bandwidth, mesh architectures make sense for 3-6 peers; any number above that requires a centralized media architecture.

Among the centralized media architectures, an SFU makes sense for at most 6-15 people in a conference; if the number of participants exceeds that, it may be necessary to switch to MCU mode.
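A back-of-the-envelope calculation illustrates why mesh stops scaling; the 800 kbps per-stream bitrate here is an assumed figure:

```shell
# full mesh: every peer uploads a copy of its stream to each of the other N-1 peers
N=6
BITRATE_KBPS=800   # assumed per-stream video bitrate
echo "mesh uplink per peer: $(( (N - 1) * BITRATE_KBPS )) kbps"   # 4000 kbps at N=6

# SFU: each peer uploads once and the server fans out,
# so uplink stays flat while downlink still grows with N
echo "SFU uplink per peer: ${BITRATE_KBPS} kbps"
echo "SFU downlink per peer: $(( (N - 1) * BITRATE_KBPS )) kbps"
```

At N=6 the mesh already demands a 4 Mbps sustained uplink per peer, which is why the 3-6 peer rule of thumb above holds.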

Other Hybrid Topologies

There are various topologies for multi-endpoint conferences. Hybrid topologies include forwarding video while mixing audio, or auto-switching between configurations as the load increases or decreases, or depending on a paid premium or free plan.

    Hybrid model

    Some endpoints receive forwarded streams while others receive mixed/composited streams.

Serverless models

A centralized topology in which one endpoint serves as the MCU or SFU.

Used by Jitsi and Skype.

Point to Multipoint Using Video-Switching MCUs

Much like an MCU, but unlike an MCU it can switch the stream's bitrate and resolution based on the active speaker, host or presenter, floor control, and similar characteristics.

This setup can embed the characteristics of a translator and a selector, and can even do congestion control based on RTCP.

To handle a multipoint conference scenario it acts as a translator, forwarding the selected RTP stream under its own SSRC with the appropriate CSRC values, and modifies the RTCP RRs it forwards between the domains.

Cascaded SFUs

Chaining SFUs reduces latency while also enabling scalability; however, it takes a toll on server network resources as well as endpoint resources.

    Transport Protocols

Before getting into an in-depth discussion of all possible types of media architectures in VoIP systems, let us compare TCP and UDP.

TCP is a reliable, connection-oriented protocol that exchanges a handshake (SYN, SYN-ACK, ACK) to establish a connection between the communicating parties. It sends packets sequentially, and individual packets can be resent when the receiver recognizes out-of-order delivery. It is thus used for session creation due to its error-correction and congestion-control features.

Once a session is established, media typically flows as RTP over UDP. UDP, even though less reliable (it guarantees neither delivery nor non-duplication and performs no error correction), is used for its low overhead and because packets of other protocols can be encapsulated (tunnelled) inside UDP packets. To provide end-to-end security, additional methods for authentication and encryption, such as SRTP, are layered on top.
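To make the overhead side of this trade-off concrete, here is a rough sketch of the fixed header cost of RTP over UDP over IPv4 for a typical 20 ms audio packetization (the packet rate is the only assumption):

```shell
# fixed per-packet headers: IPv4 (20 B) + UDP (8 B) + RTP (12 B)
IP_HDR=20; UDP_HDR=8; RTP_HDR=12
OVERHEAD=$(( IP_HDR + UDP_HDR + RTP_HDR ))
PPS=50   # 20 ms audio frames -> 50 packets per second
echo "header overhead: ${OVERHEAD} bytes/packet"
echo "extra bitrate: $(( OVERHEAD * 8 * PPS / 1000 )) kbps"   # 16 kbps
```

So even before the payload, a single audio leg carries roughly 16 kbps of header traffic; TCP's 20-byte-plus headers and retransmissions would add more while also introducing head-of-line blocking delay.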

Audio PCAP Storage and Privacy Constraints for Media Servers

A call session produces various traces for offline monitoring and analysis, which can include:

CDRs (Call Detail Records) – to/from numbers, ring time, answer time, duration, etc.

Signalling PCAPs – collected usually from the SIP application server, containing the SIP requests, SDP, and responses. They show the call-flow sequence: for example, who sent the INVITE and who sent the BYE or CANCEL, or how many times the call was updated or paused/resumed.

Media stats – jitter, buffer, RTT, and MOS for all legs, plus average values.

Audio PCAPs – recordings of the RTP streams and RTCP packets between the parties; capturing them requires explicit consent from the customer or user. VoIP companies complying with GDPR cannot record and preserve an audio stream for purposes such as audit, call-quality debugging, or internal inspection without that consent.

Throwing more light on audio PCAP storage: assuming the user gives explicit permission, here is an approach for carrying out the recording and storage operations.

Furthermore, strict access control, encryption, and anonymisation of the media packets are necessary to obfuscate the details of the call session.
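Assuming consent is in place, a capture-and-protect workflow can be sketched as below; the interface name, port range, and key handling are all deployment-specific assumptions:

```shell
# capture RTP/RTCP from the media interface into a pcap file
# (udp portrange 10000-20000 is an assumed media port range)
tcpdump -i eth0 -n udp portrange 10000-20000 -w rtp_capture.pcap

# encrypt the capture at rest before archiving, then restrict access
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in rtp_capture.pcap -out rtp_capture.pcap.enc
chmod 600 rtp_capture.pcap.enc
shred -u rtp_capture.pcap   # remove the plaintext copy
```

Access to the encrypted archive and its key should then be limited to the audited roles that the consent covers.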

    References :

To learn about the differences between media server topologies:

• centralized vs decentralized
• SFU vs MCU
• multicast vs unicast

Read – SIP conferencing and Media Bridges

SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario; however, supporting scalable conferences over VoIP is a market demand. SIP must not only set up multimedia streams but also provide conference control for building communication and collaboration apps for new and customisable solutions.

To read more about building a scalable VoIP server-side architecture, including:

• Clustering the servers with a common cache for high availability and prompt failure recovery
• Multi-tier architecture, i.e. separation between the data/session layer and the application server/engine layer
• Microservice-based architecture, i.e. the distinction between proxies such as load balancers and SBCs, backend services, OSS/BSS, etc.
• Containerization and autoscaling

    Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform


I have been contemplating the points that make a developer successful at building solutions and services for a Telecom Application Server. The trend has shown many variations: from pure IN programs such as VPN and prepaid billing logic, to SIP servlets for call parking and call completion, and from SIP servlets to JAIN SLEE open-standard-based communication.


GStreamer (LGPL) is a media-handling library written in C for applications such as streaming, recording, playback, mixing, and editing. It also powers enhanced applications such as transcoding, media format conversion, and streaming servers for embedded devices (read more about GStreamer on the RPi in my article here).
It encompasses various codecs and filters and is modular, with plugin development to extend its capabilities. Media-streaming application developers use it as part of their framework either at the broadcaster's end or as a media player.

    gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink

    More detailed reading :

GStreamer-1.8.1 RTSP server and client on Ubuntu – installation and configuration of an RTSP streaming server and client

crtmpserver + ffmpeg –

Streaming / broadcasting Live Video call to non webrtc supported browsers and media players

Attempts at streaming / broadcasting a live WebRTC video call to browsers without WebRTC support and to media players such as VLC, ffplay, the default video player in Linux, etc.

Continued: Streaming / broadcasting Live Video call to non webrtc supported browsers and media players

A continuation of the attempts, outcomes, and problems in building a WebRTC-to-RTP media framework that successfully streams / broadcasts WebRTC content to browsers without WebRTC support (Safari / IE) and to media players (VLC).

To continue with the basics of GStreamer, keep reading.

To list all GStreamer packages:

    pkg-config --list-all | grep gstreamer
    • gstreamer-gl-1.0 GStreamer OpenGL Plugins Libraries – Streaming media framework, OpenGL plugins libraries
• gstreamer-bad-video-1.0 GStreamer bad video library – Bad video library for GStreamer elements
    • gstreamer-tag-1.0 GStreamer Tag Library – Tag base classes and helper functions
    • gstreamer-bad-base-1.0 GStreamer bad base classes – Bad base classes for GStreamer elements
• gstreamer-net-1.0 GStreamer networking library – Network-enabled GStreamer plug-ins and clocking
    • gstreamer-sdp-1.0 GStreamer SDP Library – SDP helper functions
    • gstreamer-1.0 GStreamer – Streaming media framework
• gstreamer-bad-audio-1.0 GStreamer bad audio library, uninstalled – Bad audio library for GStreamer elements, Not Installed
• gstreamer-allocators-1.0 GStreamer Allocators Library – Allocators implementation
    • gstreamer-player-1.0 GStreamer Player – GStreamer Player convenience library
    • gstreamer-insertbin-1.0 GStreamer Insert Bin – Bin to automatically and insertally link elements
    • gstreamer-plugins-base-1.0 GStreamer Base Plugins Libraries – Streaming media framework, base plugins libraries
• gstreamer-vaapi-glx-1.0 GStreamer VA-API (GLX) Plugins Libraries – Streaming media framework, VA-API (GLX) plugins libraries
• gstreamer-codecparsers-1.0 GStreamer codec parsers – Bitstream parsers for GStreamer elements
• gstreamer-base-1.0 GStreamer base classes – Base classes for GStreamer elements
    • gstreamer-app-1.0 GStreamer Application Library – Helper functions and base classes for application integration
• gstreamer-vaapi-drm-1.0 GStreamer VA-API (DRM) Plugins Libraries – Streaming media framework, VA-API (DRM) plugins libraries
• gstreamer-check-1.0 GStreamer check unit testing – Unit testing helper library for GStreamer modules
    • gstreamer-vaapi-1.0 GStreamer VA-API Plugins Libraries – Streaming media framework, VA-API plugins libraries
    • gstreamer-controller-1.0 GStreamer controller – Dynamic parameter control for GStreamer elements
    • gstreamer-video-1.0 GStreamer Video Library – Video base classes and helper functions
    • gstreamer-vaapi-wayland-1.0 GStreamer VA-API (Wayland) Plugins Libraries – Streaming media framework, VA-API (Wayland) plugins libraries
    • gstreamer-fft-1.0 GStreamer FFT Library – FFT implementation
    • gstreamer-mpegts-1.0 GStreamer MPEG-TS – GStreamer MPEG-TS support
    • gstreamer-pbutils-1.0 GStreamer Base Utils Library – General utility functions
    • gstreamer-vaapi-x11-1.0 GStreamer VA-API (X11) Plugins Libraries – Streaming media framework, VA-API (X11) plugins libraries
    • gstreamer-rtp-1.0 GStreamer RTP Library – RTP base classes and helper functions
    • gstreamer-rtsp-1.0 GStreamer RTSP Library – RTSP base classes and helper functions
    • gstreamer-riff-1.0 GStreamer RIFF Library – RIFF helper functions
    • gstreamer-audio-1.0 GStreamer Audio library – Audio helper functions and base classes
    • gstreamer-plugins-bad-1.0 GStreamer Bad Plugin libraries – Streaming media framework, bad plugins libraries
    • gstreamer-rtsp-server-1.0 gst-rtsp-server – GStreamer based RTSP server

At the time this article was written, GStreamer was at an early 1.x version, newer than the then-stable 0.x series. Since then the library has been updated manyfold. The release highlights for major versions are summarised below, as the blog was updated over time.

Project: Making an IP surveillance system using GStreamer and Janus

Goal: build a turn-key, easily deployable surveillance solution.

    Features :

1. Pairing of Android mobile with the box
    2. Live streaming from Box to Android
    3. Video Recording inside the  box
    4. Auto parsing of recorded video around motion detection 
    5. Event listeners 
    6. 2 way audio
7. Inbuilt Media Control Unit
    8. Efficient use of bandwidth 
    9. Secure session while live-streaming


    1. Authentication ( OTP / username- password)
    2. Livestreaming on Opus / vp8 
    3. Session Security and keepalives for live-streaming sessions
    4. Sync local videos to cloud storage 
    5. Record and playback with timeline and events 
    6. Parsing and restructuring video ( transcoding may also be required ) 
    7. Coturn server for NAT and ICE
    8. Web platform on box ( user interface )+ NoSQL
    9. Web platform on Cloud server ( Admin interface )+ NoSQL
    10.  REST APIs for third party add-ons ( Node based )
    11. Android demo app for receiving the live stream and feeds

Various experiments and working GStreamer commands

    Local Network Stream 

To create /dev/video0 (load the Raspberry Pi camera V4L2 driver):

    modprobe bcm2835-v4l2

To stream via the RTSP server using rpicamsrc, with h264parse:

    ./gst-rtsp-server-1.4.4/examples/test-launch --gst-debug=2 '(rpicamsrc num-buffers=5000 ! 'video/x-h264,width=1080,height=720,framerate=30/1' ! h264parse ! rtph264pay name=pay0 pt=96 )'

./test-launch "( tcpclientsrc host= port=5000 ! gdpdepay ! rtph264pay name=pay0 pt=96 )"

    pipe raspivid to tcpserversink

    raspivid -t 0 -w 800 -h 600 -fps 25 -g 5 -b 4000000 -vf -n -o - | gst-launch-1.0 -v fdsrc ! h264parse ! gdppay ! tcpserversink host= port=5000;

    Stream Video over local Network with 15 fps

    raspivid -n -ih -t 0 -rot 0 -w 1280 -h 720 -fps 15 -b 1000000 -o - | nc -l -p 5001

    streaming video over local network with 30FPS and higher bitrate

    raspivid -n -t 0 -rot 0 -w 1920 -h 1080 -fps 30 -b 5000000 -o - | nc -l -p 5001


    Audio record to file
    Using arecord :

    arecord -D plughw:1 -c1 -r 48000 -f S16_LE -t wav -v file.wav;

Using PulseAudio:
pulseaudio -D

    gst-launch-1.0 -v pulsesrc device=hw:1 volume=8.0 ! audio/x-raw,format=S16LE ! audioconvert ! voaacenc bitrate=48000 ! aacparse ! flvmux ! filesink location = "testaudio.flv";

    Video record to file ( mpg)

gst-launch-1.0 -e rpicamsrc bitrate=500000 ! 'video/x-h264,width=640,height=480' ! mux. avimux name=mux ! filesink location=testvideo2.mpg;

    Video record to file ( flv )

    gst-launch-1.0 -e rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! flvmux ! filesink location="testvieo.flv";

    Video record to file ( h264)
gst-launch-1.0 -e rpicamsrc bitrate=500000 ! filesink location="raw3.h264";

    Video record to file ( mp4)

    gst-launch-1.0 -e rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! mp4mux ! filesink location=video.mp4;

    Audio + Video record to file ( flv)

gst-launch-1.0 -e \
rpicamsrc bitrate=500000 ! \
video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. \
pulsesrc volume=8.0 ! \
queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. \
flvmux name=muxout streamable=true ! filesink location='test44.flv';

    Audio + Video record to file ( flv) using pulsesrc

    gst-launch-1.0 -v --gst-debug-level=3 pulsesrc device="alsa_input.platform-asoc-simple-card.0.analog-stereo" volume=5.0 mute=FALSE ! audio/x-raw,format=S16LE,rate=48000,channels=1 ! audioresample ! audioconvert ! voaacenc ! aacparse ! flvmux ! filesink location="voicetest.flv";

    Audio + Video record to file (mp4)

gst-launch-1.0 -e \
rpicamsrc bitrate=500000 ! \
video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. \
pulsesrc volume=4.0 ! \
queue ! audioconvert ! voaacenc ! aacparse ! muxout. \
mp4mux name=muxout ! filesink location='test224.mp4';


stream raw audio over RTMP to rtmpsink

gst-launch-1.0 pulsesrc device=hw:1 volume=8.0 ! \
audio/x-raw,format=S24LE ! audioconvert ! voaacenc bitrate=48000 ! aacparse ! flvmux ! rtmpsink location="rtmp://";

stream AAC audio (aacparse) over RTMP to rtmpsink

    gst-launch-1.0 -v --gst-debug-level=3 pulsesrc device="alsa_input.platform-asoc-simple-card.0.analog-stereo" volume=5.0 mute=FALSE ! audio/x-raw,format=S16LE,rate=48000,channels=1 ! audioresample ! audioconvert ! voaacenc ! aacparse ! flvmux ! rtmpsink location="rtmp://";

    stream Video over RTMP

gst-launch-1.0 -e rpicamsrc bitrate=500000 ! \
video/x-h264,width=320,height=240,framerate=6/1 ! h264parse ! \
flvmux ! rtmpsink location='rtmp:// live=1';

    stream Audio + video over RTMP from rpicamsrc , framerate 10

    gst-launch-1.0 rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. pulsesrc volume=8.0 ! queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. flvmux name=muxout streamable=true ! rtmpsink location ='rtmp://';

    stream Audio + video over RTMP from rpicamsrc , framerate 30

    gst-launch-1.0 rpicamsrc bitrate=500000 ! video/x-h264,width=1280,height=720,framerate=30/1 ! h264parse ! muxout. pulsesrc ! queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. flvmux name=muxout ! queue ! rtmpsink location ='rtmp://';

VOD (Video on Demand)

    Stream h264 file over RTMP

gst-launch-1.0 -e filesrc location="raw3.h264" ! video/x-h264 ! h264parse ! flvmux ! rtmpsink location='rtmp://';

    Stream flv file over RTMP

gst-launch-1.0 -e filesrc location="testvieo.flv" ! \
flvdemux ! h264parse ! \
flvmux ! rtmpsink location='rtmp://';

    Github Repo for Livestreaming

Contains code for Android and iOS publishers, players for various platforms including HLS and Flash, streaming servers, Wowza playing modules, and WebRTC broadcast.

GStreamer 1.8.0 – 24 March 2016

Features: hardware-accelerated zero-copy video decoding on Android

    New video capture source for Android using the android.hardware.Camera API

    Windows Media reverse playback support (ASF/WMV/WMA)

    tracing system provides support for more sophisticated debugging tools

    high-level GstPlayer playback convenience API

    Initial support for the new Vulkan API

    Improved Opus audio codec support: Support for more than two channels; MPEG-TS demuxer/muxer can handle Opus; sample-accurate encoding/decoding/transmuxing with Ogg, Matroska, ISOBMFF (Quicktime/MP4), and MPEG-TS as container; new codec utility functions for Opus header and caps handling in pbutils library. The Opus encoder/decoder elements were also moved to gst-plugins-base (from -bad), and the opus RTP depayloader/payloader to -good.

    Asset proxy support in the GStreamer Editing Services

    GStreamer 1.16.0 – 19 April 2019.

    GStreamer WebRTC stack gained support for data channels for peer-to-peer communication based on SCTP, BUNDLE support, as well as support for multiple TURN servers.

    AV1 video codec support for Matroska and QuickTime/MP4 containers and more configuration options and supported input formats for the AOMedia AV1 encoder

    Closed Captions and other Ancillary Data in video

    planar (non-interleaved) raw audio

    GstVideoAggregator, compositor and OpenGL mixer elements are now in -base

    New alternate fields interlace mode where each buffer carries a single field

    WebM and Matroska ContentEncryption support in the Matroska demuxer

    new WebKit WPE-based web browser source element

    Video4Linux: HEVC encoding and decoding, JPEG encoding, and improved dmabuf import/export

    Hardware-accelerated Nvidia video decoder gained support for VP8/VP9 decoding, whilst the encoder gained support for H.265/HEVC encoding.

    Improvements to the Intel Media SDK based hardware-accelerated video decoder and encoder plugin (msdk): dmabuf import/export for zero-copy integration with other components; VP9 decoding; 10-bit HEVC encoding; video post-processing (vpp) support including deinterlacing; and the video decoder now handles dynamic resolution changes.

    ASS/SSA subtitle overlay renderer can now handle multiple subtitles that overlap in time and will show them on screen simultaneously

The Meson build is feature-complete (with the exception of plugin docs) and is now the recommended build system on all platforms. The Autotools build is scheduled to be removed in the next cycle.

    GStreamer Rust bindings and Rust plugins module

GStreamer Editing Services allows directly playing back serialized edit lists with playbin or (uri)decodebin
