Media Architecture and RTP Topologies

With the sudden onset of Covid-19 and the growing trend of working from home, the demand for scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution.

This article is about media server setups that provide mid- to high-scale conferencing solutions over SIP to various endpoints, including SIP softphones, PBXs, Carrier/PSTN and WebRTC.

Point to Point

Endpoints communicating over unicast
RTP and RTCP traffic is private between the sender and receiver, even if the endpoints contain multiple SSRCs in the RTP session.

Advantages of P2P
  • Facilitates private communication between the parties
  • The only limits on the number of streams between the participants are physical limitations such as bandwidth and the number of available ports

Point to Point via Middlebox

    Same as above but with a middle-box involved


    Mostly used for interoperability between otherwise non-interoperable endpoints, such as transcoding codecs or converting transports.
    The middlebox does not use an SSRC of its own and keeps the SSRC of an RTP stream intact across the translation.

    Subtypes of middlebox:

    Transport/Relay Anchoring

    Roles like NAT traversal, by pinning the media path to a relay in a public address domain or a TURN server

    Middleboxes for auditing or privacy control of participants' IPs

    Other SBC (Session Border Controller)-like characteristics are also part of this topology setup

    Transport translator

    interconnecting networks, e.g. multicast to unicast

    media re-packetization to allow other media, such as non-RTP protocols, to connect to the session

    Media translator

    modifies the media inside the RTP streams, commonly known as transcoding

    can do up to full encoding/decoding of RTP streams

    in many cases it can also act on behalf of endpoints that do not support RTP, receiving and responding to feedback reports and performing FEC (Forward Error Correction)

    Back-To-Back RTP Session

    Much like a translator middlebox, but it establishes separate RTP sessions (legs) with the endpoints, bridging the two sessions.

    Takes complete responsibility for forwarding the correct RTP payload and maintaining the relation between SSRCs and CNAMEs

    Advantages of Back-To-Back RTP Session
  • The B2BUA / media bridge takes responsibility for relaying media and manages congestion

    Disadvantages of Back-To-Back RTP Session
  • It can be subjected to a man-in-the-middle (MITM) attack, or have a backdoor used to eavesdrop on conversations

    Point to Point using Multicast

    Any-Source Multicast (ASM)

    traffic from any participant sent to the multicast group address reaches all other participants

    Source-Specific Multicast (SSM)

    A selective set of senders streams to the multicast group, which distributes the streams to the receivers

    Point to Multipoint using Mesh

    many unicast RTP streams forming a mesh

    Point to Multipoint + Translator

    Another variant of this topology is Point to Multipoint with Mixer

    Media Mixing Mixer

    receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.
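    As a rough illustration, a media-mixing mixer sums the decoded samples from the selected sources into one outgoing frame. A minimal sketch (function and variable names are my own, not from any particular mixer implementation):

```python
def mix_frames(frames):
    """Mix equal-length frames of 16-bit PCM samples (one list per
    selected participant) into a single output frame, clipping to
    the int16 range to avoid overflow."""
    mixed = [sum(samples) for samples in zip(*frames)]
    return [max(-32768, min(32767, s)) for s in mixed]

# two participants' decoded frames mixed into one outgoing frame
print(mix_frames([[1000, 30000], [2000, 30000]]))  # [3000, 32767]
```

    The mixed frame would then be re-encoded and sent out under the mixer's own SSRC, with the contributing sources listed as CSRCs.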

    Media Switching Mixer

    RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

    The Mixer can reduce bitrate or switch between sources like active speakers.

    SFU ( Selective Forwarding Unit)

    The middlebox can select which of the potential sources (SSRCs) transmitting media will be sent to each of the endpoints. Each such transmission is set up as an independent RTP session.

    Extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.

    Advantages of SFU
  • Low latency and low jitter-buffer requirement, since re-encoding is avoided

    Disadvantages of SFU
  • Unable to manage the network and control bitrate

    At a high level, one can safely assume that, given the current average internet bandwidth, mesh architectures make sense for 3-6 peers; any number above that requires a centralized media architecture.

    Among the centralized media architectures, an SFU makes sense for at most 6-15 people in a conference; however, if the number of participants exceeds that, it may be necessary to switch to MCU mode.
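    The back-of-the-envelope arithmetic behind those thresholds: in a full mesh every endpoint sends to and receives from every other endpoint, so streams grow quadratically, while with an SFU each endpoint uploads once and downloads n-1 streams. A sketch (helper names are illustrative):

```python
def mesh_total_streams(n):
    # every endpoint sends one unicast stream to each of the other n-1
    return n * (n - 1)

def per_endpoint_load(n, topology):
    # streams sent / received by a single endpoint
    if topology == "mesh":
        return {"up": n - 1, "down": n - 1}
    if topology == "sfu":
        return {"up": 1, "down": n - 1}
    raise ValueError(topology)

# at 6 peers a mesh already carries 30 unicast streams in total
print(mesh_total_streams(6))         # 30
print(per_endpoint_load(10, "sfu"))  # {'up': 1, 'down': 9}
```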

    Other Hybrid Topologies

    There are various topologies for multi-endpoint conferences. Hybrid topologies include forwarding video while mixing audio, auto-switching between configurations as load increases or decreases, or switching by plan tier (paid premium vs free).

    Hybrid model

    Some endpoints receive forwarded streams while others receive mixed/composited streams.

    Serverless models

    Centralized topology in which one endpoint serves as an MCU or SFU.

    Used by Jitsi and Skype

    Point to Multipoint Using Video-Switching MCUs

    Much like an MCU, but unlike an MCU it can switch the stream's bitrate and resolution based on the active speaker, host or presenter, and floor-control-like characteristics.

    This setup can embed the characteristics of a translator and a selector, and can even do congestion control based on RTCP

    To handle a multipoint conference scenario, it acts as a translator, forwarding the selected RTP stream under its own SSRC with the appropriate CSRC values, and modifies the RTCP RRs it forwards between the domains

    Cascaded SFUs

    Chaining SFUs reduces latency while also enabling scalability; however, it takes a toll on server network resources as well as endpoint resources

    Transport Protocols

    Before getting into an in-depth discussion of all possible types of media architectures in VoIP systems, let us compare TCP and UDP

    TCP is a reliable connection-oriented protocol that establishes a connection between the communicating parties via a handshake (SYN/ACK exchange). It sends packets sequentially, and individual packets can be resent when the receiver recognizes out-of-order packets. It is thus used for session creation, owing to its error-correction and congestion-control features.

    Once a session is established, media usually shifts to RTP over UDP. UDP, even though less reliable, guaranteeing neither non-duplication nor delivery and error correction, is used for media because of its low overhead and because packets of other protocols can be encapsulated (tunnelled) inside UDP packets. However, to provide end-to-end security, other methods for authentication and encryption are used.
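    To make the distinction concrete, the following minimal sketch sends one datagram over UDP on loopback: no handshake and no acknowledgement are involved, which is exactly why RTP must itself detect loss and reordering:

```python
import socket

# receiver: bind an ephemeral UDP port on loopback
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
port = rx.getsockname()[1]

# sender: fire-and-forget datagram, no connection establishment
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"\x80\x00" + b"fake-rtp-payload", ("127.0.0.1", port))

data, addr = rx.recvfrom(2048)
print(data[:2].hex())  # 8000 -> would be V=2, PT=0 in a real RTP header
tx.close()
rx.close()
```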

    Audio PCAP storage and Privacy constraints for Media Servers

    A call session produces various traces for offline monitoring and analysis, which can include

    CDRs (Call Detail Records) – to/from numbers, ring time, answer time, duration etc.

    Signalling PCAPs – usually collected from the SIP application server, containing the SIP requests, SDP and responses. They show the call-flow sequence: for example, who sent the INVITE and who sent the BYE or CANCEL, how many times the call was updated or paused/resumed, etc.

    Media stats – jitter, buffer, RTT and MOS for all legs, plus average values

    Audio PCAPs – recordings of the RTP stream and RTCP packets between the parties; these require explicit consent from the customer or user. VoIP companies complying with GDPR cannot record and preserve an audio stream for any purpose – audit, call-quality debugging or internal inspection – without that consent.

    Throwing more light on audio PCAP storage: assuming the user provides explicit permission, here is the approach for carrying out the recording and storage operations.

    Furthermore, strict access control, encryption and anonymisation of the media packets are necessary to obfuscate the details of the call session.

    References :

    To learn about the differences between media server topologies

    • centralized vs decentralised,
    • SFU vs MCU ,
    • multicast vs unicast ,

    Read – SIP conferencing and Media Bridges

    SIP conferencing and Media Bridges

    SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario; however, supporting scalable conferences over VoIP is a market demand. It is desired that SIP not only set up multimedia streams but also provide conference control for building communication and collaboration apps for new and customisable solutions.

    To read more about building a scalable VoIP server-side architecture, including

    • Clustering the servers with a common cache for high availability and prompt failure recovery
    • Multitier architecture, i.e. separation between the data/session layer and the application server/engine layer
    • Microservice-based architecture, i.e. differentiation between proxies such as load balancers, SBCs, backend services, OSS/BSS etc.
    • Containerization and autoscaling

    Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

    VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

    I have been contemplating points that make for a successful developer of solutions and services for a Telecom Application Server. The trend has shown many variations, from pure IN programs like VPN and prepaid billing logic, to SIP servlets for call parking and call completion, to JAIN SLEE open-standard-based communication.

    Video Codecs – H264 , H265 , AV1

    This article discusses the popularly adopted current standards for video codecs (compression/decompression), namely MPEG-2, H.264, H.265 and AV1

    MPEG 2

    MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
    generic coding of moving pictures and associated audio information
    combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

    better than MPEG 1

    evolved out of the shortcomings of MPEG-1, such as: an audio compression system limited to two channels (stereo); no standardized support for interlaced video, with poor compression; and only one standardized profile (Constrained Parameters Bitstream), which was unsuited for higher-resolution video.


    • over-the-air digital television broadcasting and in the DVD-Video standard.
    • TV stations, TV receivers, DVD players, and other equipment
    • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
    • XDCAM – professional file-based video recording format.
    • DVB – Application-specific restrictions on MPEG-2 video in the DVB standard:


    Advanced Video Coding (AVC), a.k.a. H.264, MPEG-4 AVC, or ITU-T H.264 / MPEG-4 Part 10 'Advanced Video Coding' (AVC)
    introduced in 2003

    Better than MPEG2

    40-50% bit rate reduction compared to MPEG-2

    Support Up to 4K (4,096×2,304) and 59.94 fps
    21 profiles ; 17 levels

    Compression Model

    Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find the ones that are redundant within subsequent frames, i.e. unchanged, such as background sections of the video. These areas are replaced with a short piece of information referencing the original pixels (motion prediction), using a mathematical function and the direction of motion

    Hybrid spatial-temporal prediction model
    Flexible partition of Macro Block(MB), sub MB for motion estimation
    Intra Prediction (extrapolate already decoded neighbouring pixels for prediction)
    Introduced multi-view extension
    9 directional modes for intra prediction
    Macro Blocks structure with maximum size of 16×16
    Entropy coding is CABAC(Context-adaptive binary arithmetic coding) and CAVLC(Context-adaptive variable-length coding )


    • most deployed video compression standard
    • Delivers high definition video images over direct-broadcast satellite-based television services,
    • Digital storage media and Blu-Ray disc formats,
    • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
    • Security and surveillance systems and DVB
    • Mobile video, media players, video chat


    High Efficiency Video Coding (HEVC), or H.265 or MPEG-H HEVC
    video compression standard designed to substantially improve coding efficiency
    stream high-quality videos in congested network environments or bandwidth constrained mobile networks
    Jan 2013
    product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

    better than H264

    overcome shortage of bandwidth, spectrum, storage
    bandwidth savings of approx. 45% over H.264 encoded content

    resolutions up to 8192×4320, including 8K UHD
    Supports up to 300 fps
    3 approved profiles, draft for additional 5 ; 13 levels
    Whereas macroblocks can span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving it the ability to compress information more efficiently.
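    The practical effect of the larger block size is easy to quantify: compare how many blocks tile a 1080p frame with 16×16 macroblocks vs 64×64 CTUs (illustrative helper, not part of any codec API):

```python
import math

def block_count(width, height, block_size):
    # number of square blocks needed to tile one frame
    return math.ceil(width / block_size) * math.ceil(height / block_size)

print(block_count(1920, 1080, 16))  # 8160 macroblocks (H.264)
print(block_count(1920, 1080, 64))  # 510 CTUs (HEVC, at the maximum CTU size)
```

    Fewer, larger blocks mean less per-block signalling overhead in areas of uniform content, which is part of where HEVC's bitrate savings come from.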

    multiview encoding – a stereoscopic video coding standard that allows efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream, exploiting the large amount of inter-view statistical dependency.

    Compression Model

    Enhanced Hybrid spatial-temporal prediction model
    CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures

    Motion estimation – intra prediction with more modes, asymmetric partitions in inter prediction
    Individual rectangular regions that divide the image are independent

    Parallel processing – the decoding process can be split across multiple parallel threads, taking advantage of multi-core processors.

    Wavefront Parallel Processing (WPP) – a sort of decision tree that grants a more productive and effectual compression.
    33 directional modes – DC intra prediction, planar prediction, Adaptive Motion Vector Prediction
    Entropy coding is only CABAC


    • cater to growing HD content for multi platform delivery
    • differentiated and premium 4K content

    reduced bitrate enables broadcasters and OTT vendors to bundle more channels / content on existing delivery mediums
    also provide greater video quality experience at same bitrate

    Using ffmpeg for H265 encoding

    I took an H.264 file (640×480), 30 seconds in duration and 3,908,744 bytes (3.9 MB on disk) in size, and converted it using ffmpeg

    After conversion it was an HEVC (Parameter Sets in Bitstream) MPEG-4 movie of only 621 KB, without any noticeable loss of clarity.

    > ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4
    ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers
      built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
      configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
      libavutil      56. 22.100 / 56. 22.100
      libavcodec     58. 35.100 / 58. 35.100
      libavformat    58. 20.100 / 58. 20.100
      libavdevice    58.  5.100 / 58.  5.100
      libavfilter     7. 40.101 /  7. 40.101
      libavresample   4.  0.  0 /  4.  0.  0
      libswscale      5.  3.100 /  5.  3.100
      libswresample   3.  3.100 /  3.  3.100
      libpostproc    55.  3.100 / 55.  3.100
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 1
        compatible_brands: isomavc1
        creation_time   : 2019-06-23T04:58:13.000000Z
      Duration: 00:00:29.84, start: 0.000000, bitrate: 1047 kb/s
        Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 1046 kb/s, 25 fps, 25 tbr, 25k tbn, 50k tbc (default)
        Metadata:
          creation_time   : 2019-06-23T04:58:13.000000Z
          handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
    Codec AVOption b (set bitrate (in bits/s)) specified for output file #0 (output.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream.
    Stream mapping:
      Stream #0:0 -> #0:0 (h264 (native) -> hevc (libx265))
    Press [q] to stop, [?] for help
    x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
    x265 [info]: build info [Mac OS X][clang 10.0.1][64 bit] 8bit+10bit+12bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
    x265 [info]: Main profile, Level-3 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: Slices                              : 1
    x265 [info]: frame threads / pool features       : 2 / wpp(8 rows)
    x265 [warning]: Source height < 720p; disabling lookahead-slices
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
    x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
    x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
    x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
    x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
    x265 [info]: tools: strong-intra-smoothing deblock sao
    Output #0, mp4, to 'output.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 1
        compatible_brands: isomavc1
        encoder         : Lavf58.20.100
        Stream #0:0(und): Video: hevc (libx265) (hev1 / 0x31766568), yuv420p, 640x480, q=2-31, 25 fps, 12800 tbn, 25 tbc (default)
        Metadata:
          creation_time   : 2019-06-23T04:58:13.000000Z
          handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
          encoder         : Lavc58.35.100 libx265
    frame=  746 fps= 64 q=-0.0 Lsize=     606kB time=00:00:29.72 bitrate= 167.2kbits/s speed=2.56x
    video:594kB audio:0kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 2.018159%
    x265 [info]: frame I:      3, Avg QP:27.18  kb/s: 1884.53
    x265 [info]: frame P:    179, Avg QP:27.32  kb/s: 523.32
    x265 [info]: frame B:    564, Avg QP:35.17  kb/s: 38.69
    x265 [info]: Weighted P-Frames: Y:5.6% UV:5.0%
    x265 [info]: consecutive B-frames: 1.6% 3.8% 9.3% 53.3% 31.9%
    encoded 746 frames in 11.60s (64.31 fps), 162.40 kb/s, Avg QP:33.25

    if you get error like

    Unknown encoder 'libx265'

    then reinstall ffmpeg with H.265 (libx265) support
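    A quick way to check whether your ffmpeg build includes the libx265 encoder, and the relevant build flags if you compile from source (package names and install steps vary by platform):

```shell
# list available encoders and look for libx265
ffmpeg -hide_banner -encoders | grep 265

# when building ffmpeg from source, enable the encoder explicitly
./configure --enable-gpl --enable-libx265
make && make install
```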


    AV1

    Realtime high-quality video encoder
    product of the Alliance for Open Media (AOM)
    Contained by Matroska, WebM, ISOBMFF, RTP (WebRTC)

    better than H265

    AV1 is royalty-free and overcomes the patent complexities around H.265/HEVC


    • Video transmission over the internet, VoIP, multi-party conferencing
    • Virtual / Augmented reality
    • Self-driving car streaming
    • Intended for use in HTML5 web video and WebRTC together with the Opus audio format

    Audio and Acoustic Signal Processing

    Audio signals are electronic representations of sound waves – longitudinal waves travelling through air, consisting of compressions and rarefactions. Audio signal processing focuses on computational methods for intentionally altering auditory signals or sounds in order to achieve a particular goal.

    Application of audio Signal processing in general

    • storage
    • data compression
    • music information retrieval
    • speech processing ( emotion recognition/sentiment analysis , NLP)
    • localization
    • acoustic detection
    • Transmission / Broadcasting – enhance their fidelity or optimize for bandwidth or latency.
    • noise cancellation
    • acoustic fingerprinting
    • sound recognition ( speaker Identification , biometric speech verification , voice commands )
    • synthesis – electronic generation of audio signals. Speech synthesisers can generate human like speech.
    • enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.)

    Effects for audio streams processing

    • delay or echo
      To simulate reverberation effect, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be of order 35 milliseconds or above.
      Implemented using tape delays or bucket-brigade devices.
    • flanger
      delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms).
      signal would fall out-of-phase with its partner, producing a phasing comb filter effect and then speed up until it was back in phase with the master
    • phaser
      signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
    • chorus
      delayed version of the signal is added to the original signal. above 5 ms to be audible. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
    • equalization
      frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters.
      overdrive effects such as the use of a fuzz box can be used to produce distorted sounds, such as for imitating robotic voices or to simulate distorted radiotelephone traffic
    • pitch shift
      shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
    • time stretching
      changing the speed of an audio signal without affecting its pitch.
    • resonators
      emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
    • modulation
      change the frequency or amplitude of a carrier signal in relation to a predefined signal.
    • compression
      reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
    • 3D audio effects
      place sounds outside the stereo basis
    • reverse echo
      swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
    • wave field synthesis
      spatial audio rendering technique for the creation of virtual acoustic environments
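    Of the effects above, delay/echo is the simplest to sketch in code: mix an attenuated, delayed copy of the signal back into itself. A plain-Python illustration (at an 8 kHz sampling rate, the ~35 ms threshold for a perceptible echo corresponds to about 280 samples of delay):

```python
def add_echo(samples, delay_samples, decay):
    """Return the input signal with one attenuated echo mixed in.
    delay_samples: echo offset in samples; decay: echo gain (0..1)."""
    out = list(samples) + [0.0] * delay_samples
    for i, s in enumerate(samples):
        out[i + delay_samples] += s * decay
    return out

# an impulse picks up a half-strength echo two samples later
print(add_echo([1.0, 0.0], 2, 0.5))  # [1.0, 0.0, 0.5, 0.0]
```

    A feedback variant (routing the output back into the delay line) produces the repeating, decaying echoes of a tape delay.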

    ASP applications in telephony and mobile phones, per the ITU (International Telecommunication Union)

    • Acoustic echo control
      aims to eliminate the acoustic feedback, which is particularly problematic in the speakerphone use-case during bidirectional voice
    • Noise control
      microphone doesn’t only pick up the desired speech signal, but often also unwanted background noise. Noise control tries to minimize those unwanted signals . Multi-microphone AASP, has enabled the suppression of directional interferers.
    • Gain control
      how loud a speech signal should be when leaving a telephony transmitter as well as when it is being played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively during operation in real-time.
    • Linear filtering
      ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
    • Speech coding: from analog POTS based call to G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder is a big leap in terms of call capacity. other speech coders with varying tradeoffs between compression ratio, speech quality, and computational complexity have been also made available. AASP provides higher quality wideband speech (approximately 150 Hz to 7 kHz).

    ASP applications in music playback

    AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs, such as listening to music, watching videos, and gaming

    • Post-processing
      techniques as equalization and filtering allow the user to adjust the timbre of the audio such as bass boost and parametric equalization. Other techniques like adding reverberation, pitch shift, time stretching etc
    • Audio (de)coding: audio codecs and containers like MP3 and AAC define how music is distributed, stored, and consumed, including in online music streaming services

    ASP for virtual assistants

    Virtual assistants include a variety of services such as Apple's Siri, Microsoft's Cortana, Google Now, Amazon's Alexa etc. ASP is used in

    • Speech enhancement
      multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
    • Speech recognition (speech-to-text): this draws ideas from multiple disciplinary fields including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major contribution to recognition accuracy improvement in speech recognition by AASP.
    • Speech synthesis (text-to-speech): this technology has come a very long way from its very robotic sounding introduction in the 1930s to making synthesized speech sound more and more natural.

    Other areas of ASP

    • Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).


    RealTime Transport protocol (RTP) and supporting protocols

    RTP is a protocol for delivering media streams end-to-end in real time over an IP network. Its applications include VoIP with SIP/XMPP, push-to-talk, WebRTC and teleconferencing, and IoT media streaming of audio/video or simulation data, over multicast or unicast network services, and so on.

    RTSP provides stream control features to an RTP stream along with session management.

    RTCP is also a companion protocol to RTP, used for feedback and inter-stream synchronization.

    • Receiver Reports (RRs) include information about the packet loss, interarrival jitter, and a timestamp allowing computation of the round-trip time between the sender and receiver.
    • Sender Reports (SR) include the number of packets and bytes sent, and a pair of timestamps facilitating inter-stream synchronization.
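    From the RR fields, a sender computes round-trip time as arrival time minus LSR (the last SR timestamp echoed back) minus DLSR (the receiver's delay since that SR), all expressed in the 32-bit "middle bits" NTP format (16-bit seconds, 16-bit fraction), as described in RFC 3550. A sketch of that arithmetic:

```python
def rtt_seconds(arrival_ntp, lsr, dlsr):
    """RTT = A - LSR - DLSR, in NTP 16.16 fixed point.
    All three inputs are 32-bit 'middle bits' NTP values."""
    rtt_fixed = (arrival_ntp - lsr - dlsr) & 0xFFFFFFFF
    return rtt_fixed / 65536.0

# SR leaves at lsr; receiver holds it dlsr = 1.0 s before reporting;
# the RR arrives 2.5 s after lsr, so the wire round trip is 1.5 s
lsr = 0x00010000                     # arbitrary reference instant
dlsr = 0x00010000                    # 1.0 s of holding time
arrival = lsr + dlsr + 3 * 0x8000    # 1.5 s of network time on top
print(rtt_seconds(arrival, lsr, dlsr))  # 1.5
```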

    SRTP provides security by end-to-end encryption while SDP provides session negotiation capabilities.

    In this article I will be going over RTP and its associated protocols in depth, to show the inner workings of an RTP media streaming session.

    RTP (Real-time Transport Protocol)

    RTP is independent of the underlying transport and network layers and can be described as an application layer protocol dealing with IP networks.

    It does not address resource reservation and does not guarantee quality-of-service for real-time services.
    However it does provide services like payload type identification, sequence numbering, timestamping and delivery monitoring.

    RTP packet captured via Wireshark

    The sequence numbers included in RTP allow the receiver to reconstruct the sender's packet sequence.

    Usage:
    Multimedia multi-participant conferences
    Storage of continuous data
    Interactive distributed simulation
    Active badge, control and measurement applications

    UDP provides best-effort delivery of datagrams for point-to-point as well as for multicast communications.

    RTP Session

    Real-Time Transport Protocol
        [Stream setup by SDP (frame 554)]
            [Setup frame: 554]
            [Setup Method: SDP]
        10.. .... = Version: RFC 1889 Version (2)
        ..0. .... = Padding: False
        ...0 .... = Extension: False
        .... 0000 = Contributing source identifiers count: 0
        0... .... = Marker: False
        Payload type: ITU-T G.711 PCMU (0)
        Sequence number: 39644
        [Extended sequence number: 39644]
        Timestamp: 2256601824
        Synchronization Source identifier: 0x78006c62 (2013293666)
        Payload: 7efefefe7efefe7e7efefe7e7efefe7e7efefe7e7efefe7e...

    Ordering via Timestamp (TS) and Sequence Number (SN)

    TS is used to put packets in the correct timing order

    SN is used to detect packet loss

    For a video frame that spans multiple packets, the TS is the same but the SNs are different
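    A receiver's playout logic can therefore reorder buffered packets by sequence number, remembering that the SN is only 16 bits and wraps at 65535. A simplified sketch, assuming the reordering window is smaller than 2^15 (helper names are my own):

```python
def reorder_by_seq(packets):
    """packets: list of (seq, payload) tuples in arrival order.
    Sort by sequence number relative to the first packet, so that
    wraparound (e.g. 65535 -> 0) is handled correctly."""
    base = packets[0][0]
    return sorted(packets, key=lambda p: (p[0] - base) & 0xFFFF)

pkts = [(65534, b"a"), (1, b"d"), (65535, b"b"), (0, b"c")]
print([seq for seq, _ in reorder_by_seq(pkts)])  # [65534, 65535, 0, 1]
```

    Real stacks keep an extended (32-bit or larger) sequence counter instead of re-basing on every sort, but the modular comparison is the same.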


    RTP payload type is a 7-bit numeric identifier that identifies a payload format. 


    • 0 PCMU
    • 1 reserved (previously FS-1016 CELP)
    • 2 reserved (previously G721 or G726-32)
    • 3 GSM
    • 4 G723
    • 8 PCMA
    • 9 G722
    • 12 QCELP
    • 13 CN
    • 14 MPA
    • 15 G728
    • 18 G729
    • 19 reserved (previously CN)


    • 25 CELB
    • 26 JPEG
    • 28 nv
    • 31 H261
    • 32 MPV
    • 33 MP2T
    • 34 H263
    • 72-76 reserved
    • 77–95 unassigned
    • dynamic H263-1998, H263-2000
    • dynamic (or profile) H264 AVC, H264 SVC, H265, theora, iLBC, PCMA-WB (wideband G.711 A-law), PCMU-WB (wideband G.711 μ-law), G718, G719, G7221, vorbis, opus, speex, VP8, VP9, raw, ac3, eac3

    Note: the difference between PCMA (G.711 A-law) and PCMU (G.711 μ-law) is that G.711 μ-law tends to give more resolution to the higher range of signals, while G.711 A-law provides more quantization levels at lower signal levels.
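    Both are logarithmic companding schemes that map linear samples onto 8 bits. A minimal μ-law (PCMU) encoder following the classic bias/segment/mantissa construction (illustrative only; production code would use a lookup table or the codec library at hand):

```python
def ulaw_encode(sample):
    """Encode a 16-bit linear PCM sample to one G.711 mu-law byte."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # find the segment: position of the highest set bit above bit 7
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the code bit-inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

print(hex(ulaw_encode(0)))  # 0xff -- silence encodes to 0xFF in mu-law
```

    A-law encoding follows the same segment idea with a different bias and inversion pattern, which is what shifts its quantization levels toward low amplitudes.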


    • dynamic tone
    • telephone event ( DTMF)

    These codes were initially specified in RFC 1890, "RTP Profile for Audio and Video Conferences with Minimal Control" (the AVP profile), superseded by RFC 3550, and are registered as MIME types in RFC 3555. Registering static payload types is now considered a deprecated practice in favor of dynamic payload type negotiation.

    Session identifiers

    In an RTP session, each participant maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants, either in RTP (as the SSRC or a CSRC) or in RTCP.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                             ....                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Synchronization source (SSRC)– 32-bit numeric SSRC identifier for source of a stream of RTP packets.

    This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier.

    example in packet above : Synchronization Source identifier: 0x78006c62 (2013293666)

    All packets from a synchronisation source form part of the same timing and sequence number space, so a receiver groups packets by synchronisation source for playback.

    Binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

    Contributing source (CSRC) – A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the SSRC identifiers of the sources, called CSRC list, that contributed to the generation of a particular packet into the RTP header of that packet.

    An example application is – audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).

    Constructor of an RTPpacket object from header fields and payload bitstream
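    As a hedged sketch of such a constructor (field handling is mine, and the fixed SSRC constant reuses the example value above — a real stack would choose it randomly per source), it serializes the 12-byte fixed header shown earlier and appends the payload:

    ```java
    // Illustrative RTPpacket built from header fields and a payload,
    // following the 12-byte fixed RTP header (RFC 3550). No CSRC list,
    // no extension, marker bit always 0.
    public class RTPpacket {
        static final int SSRC = 0x78006c62;  // fixed illustrative SSRC

        public final byte[] header = new byte[12];
        public final byte[] payload;

        public RTPpacket(int payloadType, int seqNum, int timestamp,
                         byte[] data, int dataLength) {
            header[0] = (byte) 0x80;                 // V=2, P=0, X=0, CC=0
            header[1] = (byte) (payloadType & 0x7F); // M=0, 7-bit PT
            header[2] = (byte) (seqNum >> 8);        // 16-bit sequence number
            header[3] = (byte) seqNum;
            for (int i = 0; i < 4; i++) {            // big-endian 32-bit fields
                header[4 + i] = (byte) (timestamp >> (8 * (3 - i)));
                header[8 + i] = (byte) (SSRC >> (8 * (3 - i)));
            }
            payload = new byte[dataLength];
            System.arraycopy(data, 0, payload, 0, dataLength);
        }

        // Full packet bitstream = header followed by payload.
        public byte[] getBitstream() {
            byte[] packet = new byte[12 + payload.length];
            System.arraycopy(header, 0, packet, 0, 12);
            System.arraycopy(payload, 0, packet, 12, payload.length);
            return packet;
        }
    }
    ```

    The five constructor arguments intentionally mirror the `new RTPpacket(PType, seqNum, timestamp, buf, length)` call used in the transmitter step further below.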


    Threading and Queues in RTP stacks

    Reception and transmission queues are handled by the RTP stack.

    Packet Reception – The application does not read packets directly from sockets but gets them from a reception queue. The RTP stack is responsible for updating this queue.

    Packet transmission – Packets are not written directly to sockets but inserted into a transmission queue handled by the stack.

    Incoming packet queue takes care of functions such as packet reordering or filtering out duplicate packets.

    Threading model – Most libraries use a separate execution thread for each RTP session to handle its queues.
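    A minimal illustration of such an incoming queue — reordering by sequence number and filtering duplicates, as described above. This is a sketch with invented names; it also ignores the 16-bit sequence wrap-around that a real stack must handle:

    ```java
    import java.util.HashSet;
    import java.util.PriorityQueue;
    import java.util.Set;

    // Reception queue sketch: packets are pushed in arrival order but
    // popped in sequence-number order; duplicates are dropped.
    public class ReceptionQueue {
        private final PriorityQueue<Integer> queue = new PriorityQueue<>();
        private final Set<Integer> seen = new HashSet<>();

        // Insert a packet's sequence number; duplicates are filtered out.
        public void push(int seqNum) {
            if (seen.add(seqNum)) {
                queue.add(seqNum);
            }
        }

        // The application reads packets in sequence order, not arrival order;
        // returns null when the queue is empty.
        public Integer pop() {
            return queue.poll();
        }
    }
    ```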

    RTSP (Real-Time Streaming Protocol)

    RTSP is a streaming session protocol used with RTP. It is also a network control protocol which uses TCP to maintain an end-to-end connection. Session protocols are really negotiation/session-establishment protocols that assist multimedia applications.

    Applications : control real-time streaming media applications such as live audio and HD video streaming.
    RTSP establishes a media session between RTSP end-points ( can be 2 RTSP media servers too) and initiates RTP streams to deliver the audio and video payload from the RTSP media servers to the clients.

    Flow for RTSP stream between client and Server

    1. Initialize the RTP stack on Server and Client – can be done by calling the constructor and initializing the object with arguments

    At Server

    Server rtspserver = new Server();

    At client

    Client rtspclient = new Client();

    2. Initiate TCP connection with the client and server respectively (via socket ) for the RTSP session

    At Server

    ServerSocket listenSocket = new ServerSocket(RTSPport);
    rtspserver.RTSPsocket = listenSocket.accept();
    rtspserver.ClientIPAddr = rtspserver.RTSPsocket.getInetAddress();

    At Client

    rtspclient.RTSPsocket = new Socket(ServerIPAddr, RTSP_server_port);

    3. Set input and output stream filters

    RTSPBufferedReader = new BufferedReader(new InputStreamReader(rtspserver.RTSPsocket.getInputStream()));
    RTSPBufferedWriter = new BufferedWriter(new OutputStreamWriter(rtspserver.RTSPsocket.getOutputStream()));

    4. Parse and Reply to RTSP commands

    ReadLine from RTSPBufferedReader and parse tokens to get the RTSP request type

    request = rtspserver.parse_RTSP_request();

    On receiving each request send the appropriate response using RTSPBufferedWriter


    The request can be any of DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN.

    5. TEARDOWN RTSP Command

    Either call the destructor, which releases the resources and ends the session, or send BYE explicitly and close the sockets.


    RTP processing

    1. At Transmitter ( Server) – packetization of the video data into RTP packets.

    This involves creating the packet, setting the fields in the packet header, and copying the payload (i.e., one video frame) into the packet.

    2. Get the next frame to send from the video and build the RTP packet

    RTPpacket rtp_packet = new RTPpacket(MJPEG_TYPE, imagenb, imagenb * FRAME_PERIOD, buf, video.getnextframe(buf));

    RTP header formation: the constructor above accepts the parameters PType, SequenceNumber, and TimeStamp for the header, while the buffer byte[] data and the data_length of the next frame in the buffer go in the packet payload.

    3. At Transmitter – retrieve the packet bitstream, store it in an array of bytes, and send it as a DatagramPacket over the UDP socket

    senddp = new DatagramPacket(packet_bits, packet_length, ClientIPAddr, RTP_dest_port);

    4. At Receiver – construct a DatagramSocket on the client’s RTP port and a DatagramPacket to receive RTP packets

    rcvdp = new DatagramPacket(buf, buf.length);

    5. Receiver RTP packet header and payload retrieval

    RTPpacket rtp_packet = new RTPpacket(rcvdp.getData(), rcvdp.getLength());
    rtp_packet.getpayload(payload); // payload is bitstreams

    6. Decode the payload as an image / video frame / audio segment and send it for consumption by a player, file, socket, etc.

    RTCP (Real-Time Transport Control Protocol )

    Real-time Transport Control Protocol (RTCP) defined in RFC 3550, is used to send control packets and feedback on QoS to participants in a call along with RTP which sends actual media packets.

    RTCP provides monitoring of the data delivery and QoS in a manner scalable to large multicast networks, and provides minimal control and identification functionality.

    Control and Management

    • Periodic transmission of control packets
    • Monitors data delivery on large multicast networks
    • The underlying protocol must provide multiplexing of the data and control packets
    • Provides feedback on the quality of the data distribution: congestion control, fault diagnosis, control of adaptive encoding
    • Observes the number of participants to adjust the rate of sending packets when scaling up
    • Conveys minimal session control information

    Gathers statistics on media connection

    Bytes sent, packets sent, lost packets, jitter, feedback and round trip delay.
    Application may use this information to increase the quality of service, perhaps by limiting flow or using a different codec.

    RTCP typically uses the next consecutive port after RTP. For example, if the screenshot shows port 20720 for RTP, the next consecutive port, 20721, carries RTCP.

    When RTCP is not being used or the CNAME identifier corresponding to a synchronization source has not been received yet, the participant associated with a synchronization source is not known.

    Types of RTCP packet

    1. SR: Sender report, for transmission and reception statistics from
      participants that are active senders
    2. RR: Receiver report, for reception statistics from participants
      that are not active senders and in combination with SR for
      active senders reporting on more than 31 sources
    3. SDES: Source description items, including CNAME
    4. BYE: Indicates end of participation
    5. APP: Application-specific functions

    The SR: Sender Report RTCP Packet


    Explanation for some attributes

    • fraction lost: 8 bits; the fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent
    • cumulative number of packets lost: 24 bits; the total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception
    • interarrival jitter: 32 bits; an estimate of the statistical variance of the RTP data packet interarrival time, measured in timestamp units
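    The arithmetic behind those fields follows RFC 3550: fraction lost is an 8-bit fixed-point value (lost/expected × 256) over the last reporting interval, and jitter is the running estimate J = J + (|D| − J)/16 in timestamp units. A sketch with illustrative names:

    ```java
    // Receiver-report arithmetic per RFC 3550 (names are illustrative).
    public class RtcpStats {
        // expected/received are counted since the previous SR/RR was sent.
        // Result is an 8-bit fixed-point fraction: lost/expected * 256.
        public static int fractionLost(int expectedInterval, int receivedInterval) {
            int lost = expectedInterval - receivedInterval;
            if (expectedInterval == 0 || lost <= 0) return 0;
            return (lost << 8) / expectedInterval;
        }

        // One update step of the interarrival jitter estimator.
        // transit = arrival time - RTP timestamp (both in timestamp units);
        // D is the difference of transit times of consecutive packets.
        public static double updateJitter(double jitter, int prevTransit, int transit) {
            int d = Math.abs(transit - prevTransit);  // |D(i-1, i)|
            return jitter + (d - jitter) / 16.0;
        }
    }
    ```

    For example, 25 packets lost out of 100 expected encodes as 64/256, i.e., a `fraction lost` byte of 64.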

    RR: Receiver Report RTCP Packet

    SDES: Source Description RTCP Packet

    SDES items can contain

    CNAME: Canonical End-Point Identifier SDES Item

    | CNAME=1 | length | user and domain name ...

    NAME: User Name SDES Item

    | NAME=2 | length | common name of source ...

    EMAIL: Electronic Mail Address SDES Item

    | EMAIL=3 | length | email address of source ...

    PHONE: Phone Number SDES Item

    | PHONE=4 | length | phone number of source ...

    LOC: Geographic User Location SDES Item

    | LOC=5 | length | geographic location of site ...

    TOOL: Application or Tool Name SDES Item

    | TOOL=6 | length | name/version of source appl. ...

    NOTE: Notice/Status SDES Item

    | NOTE=7 | length | note about the source ...

    PRIV: Private Extensions SDES Item

    | PRIV=8 | length | prefix length | prefix string ...
    |       ...       | value string ...

    BYE: Goodbye RTCP Packet

    APP: Application-Defined RTCP Packet

    Intended for experimental use

    Instance of RTCP sender and receiver reports on transmission and reception statistics

    Real-time Transport Control Protocol (Receiver Report)
        [Stream setup by SDP (frame 4)]
            [Setup frame: 4]
            [Setup Method: SDP]
        10.. .... = Version: RFC 1889 Version (2)
        ..0. .... = Padding: False
        ...0 0001 = Reception report count: 1
        Packet type: Receiver Report (201)
        Length: 7 (32 bytes)
        Sender SSRC: 0x796dd0d6 (2037240022)
        Source 1
            Identifier: 0x00000000 (0)
            SSRC contents
                Fraction lost: 0 / 256
                Cumulative number of packets lost: 1
            Extended highest sequence number received: 6534
                Sequence number cycles count: 0
                Highest sequence number received: 6534
            Interarrival jitter: 0
            Last SR timestamp: 0 (0x00000000)
            Delay since last SR timestamp: 0 (0 milliseconds)
    Real-time Transport Control Protocol (Source description)
        [Stream setup by SDP (frame 4)]
            [Setup frame: 4]
            [Setup Method: SDP]
        10.. .... = Version: RFC 1889 Version (2)
        ..0. .... = Padding: False
        ...0 0001 = Source count: 1
        Packet type: Source description (202)
        Length: 6 (28 bytes)
        Chunk 1, SSRC/CSRC 0x796DD0D6
            Identifier: 0x796dd0d6 (2037240022)
            SDES items
                Type: CNAME (user and domain) (1)
                Length: 8
                Text: 796dd0d6
                Type: NOTE (note about source) (7)
                Length: 5
                Text: telecomorg
                Type: END (0)

    Negative Acknowledgement (NACK) packets can be used to explicitly indicate that packets have not been received.

    Full Intra Request (FIR) and Picture Loss Indication (PLI) packets are used for video to indicate that there is a need for the sender to produce a refresh point( key frame) in the stream.

    Receiver-Estimated Maximum Bitrate (REMB) feedback packets signal to a sender the maximum bitrate a receiver wishes to receive.

    Transport-wide Congestion Control (TCC) feedback packets are used to provide detailed packet-by-packet reception information from a receiver to the sender.

    Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)

    RTCP provides continuous feedback about the overall reception quality from all receivers — thereby allowing the sender(s) in the mid term to adapt their coding scheme and transmission behaviour to the observed network quality of service (QoS).

    Plain RTP, however, makes no provision for timely feedback that would allow a sender to repair the media stream immediately: through retransmissions, retroactive Forward Error Correction (FEC) control, or media-specific mechanisms for some video codecs, such as reference picture selection. This is what the AVPF profile adds.

    Components of RTCP based feedback

    • Status reports, contained in sender report (SR)/receiver report (RR) packets, transmitted at regular intervals; can also contain SDES
    • FB (feedback) messages, which indicate loss or reception of particular pieces of a media stream

    Types of RTCP Feedback packet

    Minimal compound RTCP feedback packet

    Minimizes the size of the RTCP packet transmitted to convey feedback and maximizes the frequency at which feedback can be provided. It MUST contain only the mandatory information:

    • encryption prefix if necessary,
    • exactly one RR or SR,
    • exactly one SDES with only the CNAME item present, and
    • FB message(s)

    Full compound RTCP feedback packet

    MAY contain any additional number of RTCP packets.

    RTCP operation modes

    1. Immediate Feedback mode
    2. Early RTCP mode
    3. Regular RTCP Mode

    The application-specific feedback threshold is a function of a number of parameters, including (but not necessarily limited to):

    • type of feedback used (e.g., ACK vs. NACK),
    • bandwidth,
    • packet rate,
    • packet loss probability and distribution,
    • media type,
    • codec, and
    • (worst case or observed) frequency of events to report (e.g., frame received, packet lost).


    SRTP (Secure Real-time Transport Protocol)

    Neither RTP nor RTCP provides any flow encryption or authentication, which is where SRTP comes into the picture.

    SRTP is the security layer which resides between the RTP/RTCP application layer and the transport layer. It provides confidentiality, message authentication, and replay protection for both unicast and multicast RTP and RTCP streams.

    SRTP Packet

    The cryptographic context includes:

    • session key used directly in encryption/message authentication
    • master key securely exchanged random bit string used to derive session keys
    • other working session parameters ( master key lifetime, master key identifier and length, FEC parameters, etc)
    The cryptographic context must be maintained by both the sender and receiver of these streams.

    “Salting keys” are used to protect against pre-computation and time-memory trade-off attacks.


    RTP in a VoIP Communication system and Conference streaming


    The client encodes the same audio/video stream twice at different resolutions and bitrates and sends both to a router, which then decides who receives which of the streams.

    Multicast Audio Conference

    Assume a multicast group address and a pair of ports have been obtained. One port is used for audio data, and the other is used for control (RTCP) packets. The audio conferencing application used by each conference participant sends audio data in small chunks of a few milliseconds’ duration. Each chunk of audio data is preceded by an RTP header; RTP header and data are in turn contained in a UDP packet.

    The RTP header indicates what type of audio encoding (such as PCM, ADPCM or LPC) is contained in each packet so that senders can change the encoding during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or react to indications of network congestion.

    Every packet network occasionally loses, reorders, and delays packets by variable amounts of time. Thus the RTP header contains timing information and a sequence number that allow receivers to reconstruct the timing produced by the source. The sequence number can also be used by the receiver to estimate how many packets are being lost.

    For QoS, each instance of the audio application in the conference periodically multicasts a reception report plus the name of its user on the RTCP(control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings. In addition to the user name, other identifying information may also be included subject to control bandwidth limits.

    A site sends the RTCP BYE packet when it leaves the conference.

    Audio and Video Conference

    Audio and video media are transmitted as separate RTP sessions, separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same distinguished (canonical) name in the RTCP packets for both so that the sessions can be associated.

    Synchronized playback of a source’s audio and video is achieved using timing information carried in the RTCP packets

    Layered Encodings

    To handle the conflicting bandwidth requirements of heterogeneous receivers, multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion.
    Rate adaptation is best done with a layered encoding over a layered transmission system.

    In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Receivers can then adapt to network heterogeneity and control their reception bandwidth by joining only the appropriate subset of the multicast groups.

    Mixers, Translators and Monitors

    Note that in a VoIP system where SIP is the signalling protocol, a SIP signalling proxy never participates in the media flow and is thus media agnostic.


    Mixer

    An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner, and then forwards a new RTP packet.

    Example of a mixer for high-speed to low-speed packet stream conversion: in conferences where a few participants are connected through a low-speed link while others have high-speed links, instead of forcing lower-bandwidth, reduced-quality audio encoding on everyone, an RTP-level relay called a mixer may be placed near the low-bandwidth area.
    This mixer resynchronises incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes the reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one, and forwards the lower-bandwidth packet stream across the low-speed links.

    All data packets originating from a mixer will be identified as having the mixer as their synchronization source.
    The RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers.


    Translator

    An intermediate system that forwards RTP packets with their synchronization source identifier intact.

    Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

    Translator for a firewall limiting IP packet passage

    Some of the intended participants in the audio conference may be connected with high bandwidth links but might not be directly reachable via IP multicast, for reasons such as being behind an application-level firewall that will not let any IP packets pass. For these sites, mixing may not be necessary, in which case another type of RTP-level relay called a translator may be used.

    Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through a secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site’s internal network.

    Other cases :

    Video mixers can scale the images of individual people in separate video streams and composite them into one video stream to simulate a group scene.

    Translators are also used when connecting a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II, performing packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.


    Monitor

    An application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis, and long-term statistics.


    Multiplexing RTP Sessions

    In RTP, multiplexing is provided by the destination transport address (network address and port number), which is different for each RTP session (separate for audio and video). This helps in cases such as a change of encodings, a change of clock rates, detection of packet loss, and RTCP reporting.
    Moreover, an RTP mixer would not be able to combine interleaved streams of incompatible media into one stream.

    Interleaving packets with different RTP media types but using the same SSRC would introduce several problems.
    But multiplexing multiple related sources of the same medium in one RTP session using different SSRC values is the norm for multicast sessions.

    REMB ( Receiver Estimated Maximum Bitrate)

    REMB is an RTCP message used to provide bandwidth estimation in order to avoid creating congestion in the network.
    Support for this message is negotiated in the SDP offer/answer exchange.

    It contains the total estimated available bitrate on the path to the receiving side of this RTP session (in mantissa + exponent format), which the sender uses to configure the maximum bitrate of the video encoding.

    It also notifies of the available bandwidth in the network and is used by media servers to limit the amount of bitrate the sender is allowed to send.

    In Chrome it is deprecated in favor of the new sender side bandwidth estimation based on RTCP Transport Feedback messages.
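    A sketch of that mantissa + exponent packing — a 6-bit exponent and an 18-bit mantissa, so bitrate = mantissa × 2^exponent; the class and method names are illustrative:

    ```java
    // REMB bitrate encoding sketch: 6-bit exponent, 18-bit mantissa.
    public class Remb {
        // Pack a bitrate into {exponent, mantissa}; precision is lost
        // for values needing more than 18 mantissa bits.
        public static int[] encode(long bitrateBps) {
            int exp = 0;
            while (bitrateBps >> exp >= (1 << 18)) {
                exp++;
            }
            return new int[]{exp, (int) (bitrateBps >> exp)};
        }

        // bitrate = mantissa * 2^exponent
        public static long decode(int exponent, int mantissa) {
            return ((long) mantissa) << exponent;
        }
    }
    ```

    For example, 1 Mbps does not fit in 18 bits, so it is carried as mantissa 250000 with exponent 2.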

    Session Description Protocol (SDP) Capability Negotiation

    SDP Offer/Answer flow

    RTP can carry multiple formats. For each class of application (e.g., audio, video), RTP defines a profile and associated payload formats. The Session Description Protocol (SDP) is used to specify the parameters for the sessions.

    Usually in VoIP systems, SDP packets describing a session (codecs, open ports, media formats, etc.) are embedded in a SIP request such as INVITE.

    SDP can negotiate use of one out of several possible transport protocols. The offerer uses the expected least-common-denominator (plain RTP) as the actual configuration, and the alternative transport protocols as the potential configurations.

    m=audio 53456 RTP/AVP 0 18

    plain RTP (RTP/AVP)
    Secure RTP (RTP/SAVP)
    RTP with RTCP-based feedback (RTP/AVPF)
    Secure RTP with RTCP-based feedback (RTP/SAVPF)
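    For illustration, here is a hypothetical SDP offer fragment (the addresses and dynamic payload numbers are invented) extending the m-line above with `a=rtpmap` entries for both static and dynamic payload types:

    ```
    v=0
    o=- 0 0 IN IP4 192.0.2.10
    s=-
    c=IN IP4 192.0.2.10
    t=0 0
    m=audio 53456 RTP/AVP 0 18 111
    a=rtpmap:0 PCMU/8000
    a=rtpmap:18 G729/8000
    a=rtpmap:111 opus/48000/2
    ```

    The static types 0 and 18 could be omitted from the rtpmap lines, but the dynamic type 111 must be declared or the answerer cannot interpret it.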

    Adaptive bitrate control

    Adapt the audio and video codec bitrates to the available bandwidth, and hence optimize audio and video quality.
    For video, since the resolution is chosen only at the start, the encoder can use only the bitrate and frame-rate attributes to adapt at runtime.

    RTCP packet called TMMBR (Temporary Maximum Media Stream Bit Rate Request) is sent to the remote client


    SIP conferencing and Media Bridges

    SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario; however, supporting scalable conferences over VoIP is a market demand. SIP must not only set up multimedia streams but also provide conference control for building communication and collaboration apps for new and customisable solutions.

    Role of SIP in conference involves

    • initiating conferences
    • inviting participants
    • enabling them to join a conference
    • leaving a conference
    • terminating a conference
    • expelling participants
    • configuring media flow
    • controlling activities in a conference

    Mesh vs star topology

    Mesh uses p2p streaming, so it offers maximum data privacy and low cost for the service provider, because there are no media streams for the provider to handle. In fact, it comes out of the box with WebRTC peer connections.

    But of course you cannot scale a p2p mesh-based architecture. Although the communication provider is now indifferent to the media stream traffic, the call quality of the session depends entirely on the end clients’ processing power and bandwidth, which in my experience cannot accommodate more than 20-25 participants in a call, even with an above-average bandwidth of 30-40 Mbps on both uplink and downlink.

    On the other hand, in a star topology the participants only need to communicate with the media server, irrespective of the network conditions of the receivers.
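    The scaling argument can be made concrete: a full mesh needs n×(n−1) directed streams across the clients, while a star needs only 2n (one uplink and one downlink per client). A back-of-envelope sketch:

    ```java
    // Stream counts implied by the two topologies: every mesh participant
    // sends to and receives from every other participant, whereas a star
    // participant exchanges streams only with the media server.
    public class TopologyMath {
        public static int meshStreams(int participants) {
            return participants * (participants - 1);
        }

        public static int starStreams(int participants) {
            return 2 * participants;
        }
    }
    ```

    At 25 participants a mesh carries 600 directed streams across the clients’ uplinks and downlinks versus 50 for a star, which matches the experience above of mesh sessions struggling beyond a couple of dozen users.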

    Centralised (star) structure

    In a centralised (star) signalling model, all communication flows via a centralised control point.

    Decentralised (mesh) structure

    In a decentralised (mesh) signalling structure, participants can communicate p2p.

    Unicast vs Multicast Media Distribution

    Decentralised Media , Multi unicast streaming

    Decentralised media , Multicast streaming

    Centralised Media / MCU

    In spite of both being star topologies, an SFU (Selective Forwarding Unit) differs from an MCU in that it does not do any heavy-duty processing on media streams; it only fetches the streams and routes them to other peers.

    An MCU (Multipoint Control Unit), on the other hand, needs a lot of computational strength to perform many operations on RTP streams, such as mixing, multiplexing, and filtering echo/noise.

    Scalable Video Coding (SVC) for large groups

    While SVC carries multiple quality layers within a single encoded stream, simulcast sends multiple versions of the same stream at different qualities (such as resolutions), from which the SFU can pick the appropriate one for each destination. The SFU can also forward different frame rates to different destinations based on their bandwidth.
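    The per-receiver choice an SFU makes can be sketched as follows; the layer bitrates here are hypothetical, not values from any particular SFU:

    ```java
    // Illustrative simulcast layer selection: forward the highest layer
    // that fits the receiver's estimated bandwidth, else the lowest one.
    public class SimulcastSelector {
        // Encodings of the same stream, lowest to highest bitrate (bps).
        static final long[] LAYER_BITRATES = {150_000, 500_000, 1_500_000};

        public static int pickLayer(long availableBps) {
            for (int i = LAYER_BITRATES.length - 1; i >= 0; i--) {
                if (LAYER_BITRATES[i] <= availableBps) return i;
            }
            return 0;  // constrained receivers still get the base layer
        }
    }
    ```

    The same selection logic applies to SVC, except that the SFU drops enhancement layers of one stream instead of choosing among separate encodings.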

    Conference types

    1. Bridge

    A centralised entity to book, start, and leave conferences; therefore potentially a single point of failure.

    To create a conference: the conference is created on a bridge URL, the bridge registers on the SIP server, and participants join the conference on the bridge using INVITEs.

    To stop a conference: a participant can leave with BYE, or the conference can be terminated by sending BYE to all.

    2. Endpoints as Mixer

    Endpoints handle the streams (decentralised media), therefore suited to ad hoc conferences.

    Mixer UAs cannot leave until the conference finishes.

    3. Mesh

    Complex, and more processing power is required on each UA.

    No single point of failure, but endpoints have to handle NAT traversal.

    Limitations of WebRTC mesh architecture

    WebRTC is intrinsically a p2p system, and as more participants join the session the network begins to resemble a mesh. Audio and textual data, being lighter than heavy video media streams, can still adjust to difficult conditions without much noticeable lag. However, video streams take a hit when peers are on constrained bandwidth and use different qualities of video sources.

    Let’s assume 3 different clients communicating in a WebRTC mesh session:

    1. WebRTC browser on a high-resolution system (desktop, laptop, kiosk) – this client will likely produce a high-quality stream and would like to consume high quality as well
    2. Mobile browser or native WebRTC client – this will have an average-quality stream that may fluctuate owing to telecom network handover or instability while moving between locations
    3. Embedded system such as a Raspberry Pi with a camera module – since this is an embedded system, likely part of an IoT surveillance setup, it will try to restrict outgoing usage and incoming stream consumption to a minimum

    Some issues with WebRTC mesh conferences include:

    • Unmatched quality of the individual p2p streams in the mesh makes it difficult to have a homogeneous session quality.
    • Video packets often go out of sync with audio packets, leading to delay or freezing due to packet loss.
    • Video pixelates when the resolution of the incoming video does not match the viewer’s display mode, e.g., a low-quality 320×280 video viewed on a desktop monitor with 1080×720 resolution.
    • Source encoders at different peers’ WebRTC clients behave differently, e.g., the WebRTC stream from an embedded system like an RPi will differ from that of a desktop browser like Safari or Firefox, or a mobile browser like Chrome on Android.

    Although with auto-adjustments in WebRTC’s media stack, combinations of bitrate and resolution are manipulated in real time, based on feedback packets, to adapt your video quality to your own and your peer’s bandwidth constraints, there remain many difficulties in having a large number of participants (on the order of a few tens to hundreds) join a mesh session. Even with an excellent connection and the great bandwidth of 5G networks, it is just not feasible to host even up to 100 users on a common mesh-based video system.

    Large scale multiparticipant WebRTC sessions

    An MCU (Multipoint Control Unit), which acts as a bridge between all participants, is the system traditionally used to host large conferences. An MCU, however, limits or lowers bandwidth usage by packing the streams together.

    An SFU (Selective Forwarding Unit), on the other hand, simply forwards the streams.

    This setup is usually designed with heavy bandwidth and upload rates in mind, and it is more scalable and resilient to bad-quality streams than p2p mesh setups. As these media gateway servers scale to accommodate more simultaneous real-time users, their bandwidth consumption becomes heavy and expensive (something to keep in mind while buying instances from cloud providers like Azure or AWS).

    Some of the many options for an SFU (Selective Forwarding Unit) setup for WebRTC media streams include:


    Open-source (Apache 2.0) WebRTC gateway with built-in integration with OpenCV.

    Features in KMS (Kurento Media Server) include augmentation, face recognition, filters, object tracking, and even virtual fencing.

    Other features like mixing, transcoding, and recording, as well as client APIs, make it suitable for integration into rich multimedia applications.

    It can function as both MCU and SFU.

    Nightly builds, good documentation, and developer traction make this a good choice. The latest version at the time of writing this article is Kurento 6.15.0, released in November 2020.


    Opensource (MIT) WebRTC Comm platform by Lynckia.

    Simple and starightforward to build from source . Latest release is v8 on sep 2019.

    Erizo, its WebRTC core, is by default is SFU but also can be switched to MCU for more features like output streaming , tgranscoding.

    It is written in C++ and uses a Node.js API to communicate with the server.

    Supports add-on modules such as recording.


    Open-source (Apache 2.0) video conferencing server called Jitsi Videobridge (JVB). Supports a high-capacity SFU.

    Provides tools (Jibri) for recording and/or streaming. Also has Android and iOS SDKs.

    It is best installed as a binary package on Debian/Ubuntu instead of compiling it yourself with Maven.

    Originally uses XMPP signalling but can communicate with SIP platforms using a gateway (Jigasi) which is part of the Jitsi project.

    The most recent release is 2.0.5390, released on 12 Jan 2021.


    mediasoup is an open-source (ISC) SFU conferencing server for both WebRTC and plain, non-secured RTP.

    It is signalling-agnostic.

    Relatively new, with less documentation, but its simple and minimalistic design makes it easy to grasp and run.

    Provides JS and C++ client libraries.

    The latest release is v3, from March 2021.


    Asterisk is an MCU-based, pure-SIP signalling and media server (GNU GPL v2) from Sangoma Technologies.

    It is the powerful server core of many OTT/VoIP providers and call-centre platforms.

    It can be adapted to any role using a combination of hundreds of modules.

    The project does not provide a client SDK.

    The latest is version 18.x, released in October 2020.


    The Janus WebRTC gateway is also open source (GNU GPL v3).

    Built in C, it has the ability to switch between SFU and MCU roles and provides plugins on top, such as recording.

    By default it uses a WebSocket-based protocol but can communicate with SIP platforms too.

    Video analytics

    Today there are billions of video cameras in our homes, phones, ATMs, baby monitors, laptops, smart watches, traffic monitoring systems, IoT devices, bots — you name it. The underlying purpose of most of them is to capture media streams and optimize the content for further processing.

    Stages of video analytics:

    1. Data Acquisition

    Data gathered from multiple camera sources needs to be streamed, in aggregated form, to a data-processing unit in an analytics engine or to an archive. It may be subjected to overall monitoring, such as counting sunlight hours, or to detailed low-level object identification, such as facial recognition of passengers.

    2. Transformation to Data Sets

    The assimilated data is grouped into operable entities. Identification and classification are done by adding attributes to recognizable shapes, movements, edges and patterns.

    3. Calculate Deviation or Compliance

    A trained model recognizes normal behavior, and deviation from it is calculated.
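The deviation step above can be sketched with a simple statistical baseline in place of a trained model: treat per-frame metrics (here, a hypothetical motion score) collected during normal operation as the baseline, and flag frames that deviate beyond a z-score threshold. The numbers are illustrative assumptions:

```python
import statistics

def deviations(baseline, observed, threshold=3.0):
    """Flag observations more than `threshold` standard
    deviations away from the baseline mean (z-score test)."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [x for x in observed if abs(x - mean) / stdev > threshold]

normal_motion = [10, 12, 11, 9, 10, 11, 10, 12]  # motion scores of "normal" frames
live_motion = [10, 11, 55, 9]                    # 55 = a sudden movement
print(deviations(normal_motion, live_motion))    # flags the outlier frame
```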


    Video content analytics in surveillance

    Considering the use case of monitoring and surveillance cameras, there is a growing need for real-time video analytics to ”detect and determine the temporal and spatial events“. Whether surveillance-cam recordings are used as forensic evidence or just for monitoring incidents and reporting the crucial segments of video, both use cases involve filtering a vast amount of recorded or streaming media down to the exact parts the authorities are looking for. This involves custom identification and recognition of objects in frames.

    There is growing research into extracting real-time events of interest from computer vision, minimizing search time and maximizing accuracy.

    Consider following use-cases :

    1. Surveillance cams in solar farms or home-based setups to predict sunlight hours and forecast energy generation, described in greater detail here.
    2. Traffic monitoring cameras :
    • Automatic license / number-plate recognition – surveillance cams for traffic need to record vehicle plate numbers to identify and tag vehicles as they pass by
    • Car dashboard cams for investigative purposes after accidents and for insurance claims
    • Motion tracking – mapping vehicle movement to detect wrong turns, overtakes, parking violations etc.
    • Scanning for QR codes and passes at toll gates
    • Identifying over-speeding vehicles

    3. Security and Law enforcement

    • Trigger alarms or lockdowns on suspicious activity or intrusion into a secure facility
    • Virtual fencing and perimeter breach – mapping facial identification against known suspects
    • Detection of left items and acceleration of emergency response

    Communication based video analytics 

    Unified enterprise communication, conferences, meetings, online webcasts, webinars, social messengers and online project demos extensively use video analytics to build intuitive use cases and boost innovation around their platforms. A few examples from the vast number of use cases are:

    1. Sentiment analysis: capturing emotions by mapping keywords to ascertain whether the meeting was happy, positive and productive, or sad, complaining and negative
    2. Augmented reality for overlaying information such as an interactive manual or an image. Areas of current usage include e-learning and customer support.
    3. Dynamic masking for privacy
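The keyword-mapping idea behind sentiment analysis (point 1) can be sketched as a naive lookup over a meeting transcript. A production system would use a trained language model; the word lists here are purely illustrative assumptions:

```python
# Hypothetical keyword lists -- a real system would learn these.
POSITIVE = {"great", "happy", "productive", "agreed", "thanks"}
NEGATIVE = {"delay", "blocked", "complaint", "unhappy", "sad"}

def meeting_sentiment(transcript):
    """Crude sentiment score: +1 per positive keyword, -1 per negative."""
    words = transcript.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(meeting_sentiment("Great demo, everyone agreed it was productive"))
```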

    Video analytics for autonomous robots

    Self-driving drones, cars and even bots make extensive use of the feed from wide-angle / fisheye-lens cameras to create a 3D model of their movement in a given three-dimensional coordinate space.

    Key technologies include:

    1. Ego-motion estimation – mapping a 3D space from the captured camera feed
    2. Deep learning (part of AI) on the continuous feed from video cameras to find routes around obstacles
    3. Remote monitoring for an unmanned vehicle
    4. Sterile monitoring of an unreachable or hazardous area, for example a war zone or extraterrestrial objects such as the Moon, Mars or satellites

    Biometrics-based video analytics

    Video feeds are now often used for advanced search, redaction and facial recognition, which leads to features such as:

    • unlocking a laptop or phone
    • performing a click with a blink of the eyes
    • creating concentration maps of a webpage based on where the eyes focused

    Read more about the role of WebRTC in biometrics here.

    Video analytics in industrial and retail applications

    Applications of video analytics in the industrial landscape are manifold. On one hand it can be used for intelligence and information gathering, such as worker foot counts or machines left unattended; on the other hand, using specific image-optimization techniques, it can also audit automated testing of engines, machine parts, rotation counts etc.

    1. Flame and smoke detection – images from video streams are analysed for colour chrominance, flickering ratio, shape, pattern and moving direction to ascertain a fire hazard
    2. Collect demographics of the area with people counting
    3. Ensure quality control and procedural compliance
    4. Identify tail-gating or loitering


    List of a few companies focusing on video analytics:

    1. Avigilon
    2. 3VR
    3. Intelli-Vision
    4. IPsotek
    5. Aimetis


    Edge Analytics

    Edge analytics means performing data analytics at the application level, at the edge of the system architecture, instead of at the core or data-warehouse level. The advantages of computing at the fringes of the network, instead of in a centralized system, are faster response times and standalone, off-grid functionality.

    The humongous data collected by IoT devices, machinery, sensors, servers, firewalls, routers, switches, gateways and all other types of components is increasingly analysed and acted upon at the edge, independently, with machine learning, instead of in data centers and network operation centers. With the help of feedback loops and deep learning, one can add data-driven intelligence to how operations are performed on critical machine parts, such as in autonomous bots or industrial setups.

    Error Recovery and Streaming

    To control the incoming data stream, we divide and classify the content into packets, add a custom classification header and stream them down to the server. In the event of data congestion or back pressure, non-significant packets are either dropped or added to a new dead queue. The system is thus made stable for high availability and BCP / fail-over recovery.
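The drop / dead-queue behaviour described above can be sketched as a bounded buffer that, under back pressure, diverts non-significant packets to a dead-letter queue instead of stalling the stream. The packet labels and the significance flag are illustrative assumptions:

```python
from collections import deque

class StreamBuffer:
    """Bounded buffer: when full, non-significant packets are
    shunted to a dead-letter queue instead of blocking the stream."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.live = deque()   # packets still in the delivery path
        self.dead = deque()   # packets shed under back pressure

    def push(self, packet, significant):
        if len(self.live) < self.capacity or significant:
            self.live.append(packet)
        else:
            self.dead.append(packet)  # dropped to the dead queue

buf = StreamBuffer(capacity=2)
buf.push("keyframe", significant=True)
buf.push("delta1", significant=False)
buf.push("delta2", significant=False)  # buffer full -> dead queue
print(list(buf.live), list(buf.dead))
```

Significant packets (e.g. keyframes) always get through, so the session stays recoverable even while delta frames are being shed.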




    GStreamer (LGPL) is a media-handling library written in C, for applications such as streaming, recording, playback, mixing, editing attributes etc. It also supports enhanced applications such as transcoding, media format conversion and streaming servers for embedded devices (read more about GStreamer on the RPi in my article here).
    It encompasses various codecs and filters, and is modular, with plugin development to enhance its capabilities. Media-streaming application developers use it as part of their framework at either the broadcaster’s end or in the media player.

    gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink

    More detailed reading :

    GStreamer-1.8.1 rtsp server and client on ubuntu – install and configuration for an RTSP streaming server and client

    crtmpserver + ffmpeg

    Streaming / broadcasting live video calls to non-WebRTC-supported browsers and media players

    Attempts at streaming / broadcasting a live WebRTC video call to non-WebRTC-supported browsers and media players such as VLC, ffplay, the default video player in Linux etc.

    Continued: Streaming / broadcasting live video calls to non-WebRTC-supported browsers and media players

    A continuation of the attempts / outcomes and problems in building a WebRTC-to-RTP media framework that successfully streams / broadcasts WebRTC content to non-WebRTC-supported browsers (Safari / IE) and media players (VLC).

    To continue with the basics of GStreamer, keep reading.

    To list all installed GStreamer packages:

    pkg-config --list-all | grep gstreamer
    • gstreamer-gl-1.0 GStreamer OpenGL Plugins Libraries – Streaming media framework, OpenGL plugins libraries
    • gstreamer-bad-video-1.0 GStreamer bad video library – Bad video library for GStreamer elements
    • gstreamer-tag-1.0 GStreamer Tag Library – Tag base classes and helper functions
    • gstreamer-bad-base-1.0 GStreamer bad base classes – Bad base classes for GStreamer elements
    • gstreamer-net-1.0 GStreamer networking library – Network-enabled GStreamer plug-ins and clocking
    • gstreamer-sdp-1.0 GStreamer SDP Library – SDP helper functions
    • gstreamer-1.0 GStreamer – Streaming media framework
    • gstreamer-bad-audio-1.0 GStreamer bad audio library, uninstalled – Bad audio library for GStreamer elements, Not Installed
    • gstreamer-allocators-1.0 GStreamer Allocators Library – Allocators implementation
    • gstreamer-player-1.0 GStreamer Player – GStreamer Player convenience library
    • gstreamer-insertbin-1.0 GStreamer Insert Bin – Bin to automatically and insertally link elements
    • gstreamer-plugins-base-1.0 GStreamer Base Plugins Libraries – Streaming media framework, base plugins libraries
    • gstreamer-vaapi-glx-1.0 GStreamer VA-API (GLX) Plugins Libraries – Streaming media framework, VA-API (GLX) plugins libraries
    • gstreamer-codecparsers-1.0 GStreamer codec parsers – Bitstream parsers for GStreamer elements
    • gstreamer-base-1.0 GStreamer base classes – Base classes for GStreamer elements
    • gstreamer-app-1.0 GStreamer Application Library – Helper functions and base classes for application integration
    • gstreamer-vaapi-drm-1.0 GStreamer VA-API (DRM) Plugins Libraries – Streaming media framework, VA-API (DRM) plugins libraries
    • gstreamer-check-1.0 GStreamer check unit testing – Unit testing helper library for GStreamer modules
    • gstreamer-vaapi-1.0 GStreamer VA-API Plugins Libraries – Streaming media framework, VA-API plugins libraries
    • gstreamer-controller-1.0 GStreamer controller – Dynamic parameter control for GStreamer elements
    • gstreamer-video-1.0 GStreamer Video Library – Video base classes and helper functions
    • gstreamer-vaapi-wayland-1.0 GStreamer VA-API (Wayland) Plugins Libraries – Streaming media framework, VA-API (Wayland) plugins libraries
    • gstreamer-fft-1.0 GStreamer FFT Library – FFT implementation
    • gstreamer-mpegts-1.0 GStreamer MPEG-TS – GStreamer MPEG-TS support
    • gstreamer-pbutils-1.0 GStreamer Base Utils Library – General utility functions
    • gstreamer-vaapi-x11-1.0 GStreamer VA-API (X11) Plugins Libraries – Streaming media framework, VA-API (X11) plugins libraries
    • gstreamer-rtp-1.0 GStreamer RTP Library – RTP base classes and helper functions
    • gstreamer-rtsp-1.0 GStreamer RTSP Library – RTSP base classes and helper functions
    • gstreamer-riff-1.0 GStreamer RIFF Library – RIFF helper functions
    • gstreamer-audio-1.0 GStreamer Audio library – Audio helper functions and base classes
    • gstreamer-plugins-bad-1.0 GStreamer Bad Plugin libraries – Streaming media framework, bad plugins libraries
    • gstreamer-rtsp-server-1.0 gst-rtsp-server – GStreamer based RTSP server

    At the time of writing this article, GStreamer was at a much earlier version in the 1.x series, which was newer than the then-stable 0.x. Since then the library has been updated manyfold; release highlights for major versions are summarised below, as this blog was updated over time.

    Project: Making an IP surveillance system using GStreamer and Janus

    To build a turnkey, easily deployable surveillance solution.

    Features :

    1. Pairing of Android mobile with the box
    2. Live streaming from Box to Android
    3. Video Recording inside the  box
    4. Auto parsing of recorded video around motion detection 
    5. Event listeners 
    6. 2 way audio
    7. Inbuilt Media Control Unit
    8. Efficient use of bandwidth 
    9. Secure session while live-streaming


    1. Authentication ( OTP / username- password)
    2. Livestreaming on Opus / vp8 
    3. Session Security and keepalives for live-streaming sessions
    4. Sync local videos to cloud storage 
    5. Record and playback with timeline and events 
    6. Parsing and restructuring video ( transcoding may also be required ) 
    7. Coturn server for NAT and ICE
    8. Web platform on box ( user interface )+ NoSQL
    9. Web platform on Cloud server ( Admin interface )+ NoSQL
    10.  REST APIs for third party add-ons ( Node based )
    11. Android demo app for receiving the live stream and feeds

    Various experiments and working GStreamer commands

    Local Network Stream 

    To create /dev/video0

    modprobe bcm2835-v4l2

    To stream on the RTSP server using rpicamsrc with h264parse:

    ./gst-rtsp-server-1.4.4/examples/test-launch --gst-debug=2 "( rpicamsrc num-buffers=5000 ! video/x-h264,width=1080,height=720,framerate=30/1 ! h264parse ! rtph264pay name=pay0 pt=96 )"

    ./test-launch "( tcpclientsrc host= port=5000 ! gdpdepay ! rtph264pay name=pay0 pt=96 )"

    pipe raspivid to tcpserversink

    raspivid -t 0 -w 800 -h 600 -fps 25 -g 5 -b 4000000 -vf -n -o - | gst-launch-1.0 -v fdsrc ! h264parse ! gdppay ! tcpserversink host= port=5000;

    Stream Video over local Network with 15 fps

    raspivid -n -ih -t 0 -rot 0 -w 1280 -h 720 -fps 15 -b 1000000 -o - | nc -l -p 5001

    streaming video over local network with 30FPS and higher bitrate

    raspivid -n -t 0 -rot 0 -w 1920 -h 1080 -fps 30 -b 5000000 -o - | nc -l -p 5001


    Audio record to file
    Using arecord :

    arecord -D plughw:1 -c1 -r 48000 -f S16_LE -t wav -v file.wav;

    Using pulse:
    pulseaudio -D

    gst-launch-1.0 -v pulsesrc device=hw:1 volume=8.0 ! audio/x-raw,format=S16LE ! audioconvert ! voaacenc bitrate=48000 ! aacparse ! flvmux ! filesink location = "testaudio.flv";

    Video record to file ( mpg)

    gst-launch-1.0 -e rpicamsrc bitrate=500000 ! 'video/x-h264,width=640,height=480' ! mux. avimux name=mux ! filesink location=testvideo2.mpg;

    Video record to file ( flv )

    gst-launch-1.0 -e rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! flvmux ! filesink location="testvieo.flv";

    Video record to file ( h264)
    gst-launch-1.0 -e rpicamsrc bitrate=500000 ! filesink location="raw3.h264";

    Video record to file ( mp4)

    gst-launch-1.0 -e rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! mp4mux ! filesink location=video.mp4;

    Audio + Video record to file ( flv)

    gst-launch-1.0 -e \
    rpicamsrc bitrate=500000 ! \
    video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. \
    pulsesrc volume=8.0 ! \
    queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. \
    flvmux name=muxout streamable=true ! filesink location='test44.flv';

    Audio + Video record to file ( flv) using pulsesrc

    gst-launch-1.0 -v --gst-debug-level=3 pulsesrc device="alsa_input.platform-asoc-simple-card.0.analog-stereo" volume=5.0 mute=FALSE ! audio/x-raw,format=S16LE,rate=48000,channels=1 ! audioresample ! audioconvert ! voaacenc ! aacparse ! flvmux ! filesink location="voicetest.flv";

    Audio + Video record to file (mp4)

    gst-launch-1.0 -e \
    rpicamsrc bitrate=500000 ! \
    video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. \
    pulsesrc volume=4.0 ! \
    queue ! audioconvert ! voaacenc ! aacparse ! muxout. \
    mp4mux name=muxout ! filesink location='test224.mp4';


    stream raw Audio over RTMP to rtmpsink

    gst-launch-1.0 pulsesrc device=hw:1 volume=8.0 ! \
    audio/x-raw,format=S24LE ! audioconvert ! voaacenc bitrate=48000 ! aacparse ! flvmux ! rtmpsink location="rtmp://";

    stream aacparse Audio over RTMP to rtmpsink

    gst-launch-1.0 -v --gst-debug-level=3 pulsesrc device="alsa_input.platform-asoc-simple-card.0.analog-stereo" volume=5.0 mute=FALSE ! audio/x-raw,format=S16LE,rate=48000,channels=1 ! audioresample ! audioconvert ! voaacenc ! aacparse ! flvmux ! rtmpsink location="rtmp://";

    stream Video over RTMP

    gst-launch-1.0 -e rpicamsrc bitrate=500000 ! \
    video/x-h264,width=320,height=240,framerate=6/1 ! h264parse ! \
    flvmux ! rtmpsink location='rtmp:// live=1';

    stream Audio + video over RTMP from rpicamsrc , framerate 10

    gst-launch-1.0 rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. pulsesrc volume=8.0 ! queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. flvmux name=muxout streamable=true ! rtmpsink location ='rtmp://';

    stream Audio + video over RTMP from rpicamsrc , framerate 30

    gst-launch-1.0 rpicamsrc bitrate=500000 ! video/x-h264,width=1280,height=720,framerate=30/1 ! h264parse ! muxout. pulsesrc ! queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. flvmux name=muxout ! queue ! rtmpsink location ='rtmp://';

    VOD (Video on Demand)

    Stream h264 file over RTMP

    gst-launch-1.0 -e filesrc location="raw3.h264" ! video/x-h264 ! h264parse ! flvmux ! rtmpsink location='rtmp://';

    Stream flv file over RTMP

    gst-launch-1.0 -e filesrc location="testvieo.flv" ! \
    video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! \
    flvmux ! rtmpsink location='rtmp://';

    Github Repo for Livestreaming

    Contains code for Android and iOS publishers, players on various platforms including HLS and Flash, streaming servers, Wowza playing modules and WebRTC broadcast.

    GStreamer 1.8.0 – 24 March 2016

    Features hardware-accelerated zero-copy video decoding on Android

    New video capture source for Android using the android.hardware.Camera API

    Windows Media reverse playback support (ASF/WMV/WMA)

    tracing system provides support for more sophisticated debugging tools

    high-level GstPlayer playback convenience API

    Initial support for the new Vulkan API

    Improved Opus audio codec support: Support for more than two channels; MPEG-TS demuxer/muxer can handle Opus; sample-accurate encoding/decoding/transmuxing with Ogg, Matroska, ISOBMFF (Quicktime/MP4), and MPEG-TS as container; new codec utility functions for Opus header and caps handling in pbutils library. The Opus encoder/decoder elements were also moved to gst-plugins-base (from -bad), and the opus RTP depayloader/payloader to -good.

    Asset proxy support in the GStreamer Editing Services

    GStreamer 1.16.0 – 19 April 2019.

    GStreamer WebRTC stack gained support for data channels for peer-to-peer communication based on SCTP, BUNDLE support, as well as support for multiple TURN servers.

    AV1 video codec support for Matroska and QuickTime/MP4 containers and more configuration options and supported input formats for the AOMedia AV1 encoder

    Closed Captions and other Ancillary Data in video

    planar (non-interleaved) raw audio

    GstVideoAggregator, compositor and OpenGL mixer elements are now in -base

    New alternate fields interlace mode where each buffer carries a single field

    WebM and Matroska ContentEncryption support in the Matroska demuxer

    new WebKit WPE-based web browser source element

    Video4Linux: HEVC encoding and decoding, JPEG encoding, and improved dmabuf import/export

    Hardware-accelerated Nvidia video decoder gained support for VP8/VP9 decoding, whilst the encoder gained support for H.265/HEVC encoding.

    Improvements to the Intel Media SDK based hardware-accelerated video decoder and encoder plugin (msdk): dmabuf import/export for zero-copy integration with other components; VP9 decoding; 10-bit HEVC encoding; video post-processing (vpp) support including deinterlacing; and the video decoder now handles dynamic resolution changes.

    ASS/SSA subtitle overlay renderer can now handle multiple subtitles that overlap in time and will show them on screen simultaneously

    The Meson build is now feature-complete (with the exception of plugin docs) and is the recommended build system on all platforms. The Autotools build is scheduled to be removed in the next cycle.

    GStreamer Rust bindings and Rust plugins module

    GStreamer Editing Services allows directly playing back serialized edit list with playbin or (uri)decodebin


    crtmpserver + ffmpeg

    This post shows the process of installing, running and using crtmpserver on an Ubuntu 64-bit machine with GStreamer.

    gcc and cmake

    We shall build these directly from source. For this we first need to determine whether gcc is installed on the machine.

    GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project supporting various programming languages (C, C++, Objective-C, Fortran, Java, Ada, Go etc).

    If it is not installed, run the following command:

    sudo apt-get install build-essential

    Once it is installed, it can be tested by printing the version:

    gcc --version

    cmake is a software compilation tool. It uses compiler-independent configuration files and generates native makefiles and workspaces that can be used in different compiler environments.


    To get the source code from git, install git first, then clone the project:

    sudo apt-get install git
    git clone
    cd crtmpserver/builders/cmake

    Next we create all the makefiles using cmake:

    cmake .


    Run make to do the compilation:

    make

    Run the server using the following command. It should print out a list of ports and their respective functions:

    ./crtmpserver/crtmpserver crtmpserver/crtmpserver.lua

    | Services|
    | c | ip | port| protocol stack name | application name |
    |tcp|| 1112| inboundJsonCli| admin|
    |tcp|| 1935| inboundRtmp| appselector|
    |tcp|| 8081| inboundRtmps| appselector|
    |tcp|| 8080| inboundRtmpt| appselector|
    |tcp|| 6666| inboundLiveFlv| flvplayback|
    |tcp|| 9999| inboundTcpTs| flvplayback|
    |tcp|| 6665| inboundLiveFlv| proxypublish|
    |tcp|| 8989| httpEchoProtocol| samplefactory|
    |tcp|| 8988| echoProtocol| samplefactory|
    |tcp|| 1111| inboundHttpXmlVariant| vptests|

    If you see the following types of errors while pushing a stream to crtmpserver, they just denote that your pipe is not using the correct format:

    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: ->
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/http/basehttpprotocol.cpp:281 Headers section too long
    /home/altanai/crtmpserver/sources/thelib/src/protocols/http/basehttpprotocol.cpp:153 Unable to read response headers: CTCP(16) <-> TCP(13) <-> [IHTT(14)] <-> IH4R(15)
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [IH4R(15)]
    /home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [IH4R(15)] unregistered from application: appselector
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: ->
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/ts/inboundtsprotocol.cpp:211 I give up. I'm unable to detect the ts chunk size
    /home/altanai/crtmpserver/sources/thelib/src/protocols/ts/inboundtsprotocol.cpp:136 Unable to determine chunk size
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [ITS(17)]
    /home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [ITS(17)] unregistered from application: flvplayback
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: ->
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/rtmp/inboundrtmpprotocol.cpp:77 Handshake type not implemented: 85
    /home/altanai/crtmpserver/sources/thelib/src/protocols/rtmp/basertmpprotocol.cpp:309 Unable to perform handshake
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [IR(19)]
    /home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [IR(19)] unregistered from application: appselector
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: ->
    /home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/inboundliveflvprotocol.cpp:51 _waitForMetadata: 1
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/baseliveflvappprotocolhandler.cpp:45 protocol CTCP(16) <-> TCP(20) <-> [ILFL(21)] registered to app flvplayback
    /home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/inboundliveflvprotocol.cpp:102 Frame too large: 6324058
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
    /home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
    /home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [ILFL(21)]
    /home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/baseliveflvappprotocolhandler.cpp:58 protocol [ILFL(21)] unregistered from app flvplayback


    Download and install ffmpeg from git

     git clone ffmpeg
    cd ffmpeg

    Once the source code is obtained, we need to configure, make and make install it.
    We need the following plugins for muxing and encoding, like libx264 for h264parse, so we configure with the following options:

    ./configure \
      --prefix="$HOME/ffmpeg_build" \
      --pkg-config-flags="--static" \
      --extra-cflags="-I$HOME/ffmpeg_build/include" \
      --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
      --bindir="$HOME/bin" \
      --enable-gpl \
      --enable-libass \
      --enable-libfreetype \
      --enable-libopus \
      --enable-libtheora \
      --enable-libvorbis \
      --enable-libx264 \
      --enable-libx265

    Then make and make install:

    make
    sudo make install

    In case of errors on the ffmpeg configure command, you need to install the respective missing / not-found library:


    sudo apt-get install libass-dev


    sudo apt-get install libmp3lame-dev


    sudo apt-get install autoconf
    sudo apt-get install libtool
    wget -O libaacplus-2.0.2.tar.gz
    tar -xzf libaacplus-2.0.2.tar.gz
    cd libaacplus-2.0.2
    ./ --with-parameter-expansion-string-replace-capable-shell=/bin/bash --host=arm-unknown-linux-gnueabi --enable-static
    sudo make install

    Vorbis is a compressed audio format for mid to high quality (8kHz-48.0kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps/channel. It is in the same competitive class as MPEG-4 AAC.

    tar -jxvf libvorbis-1.3.2.tar.bz2
    cd libvorbis-1.3.2
    ./configure && make && make install

    x264 is a library for encoding video streams into the H.264/MPEG-4 AVC compression format, and is released under the terms of the GNU GPL.

    git clone git://
    cd x264
    ./configure --host=arm-unknown-linux-gnueabi --enable-static --disable-opencl
    sudo make install

    libvpx is an emerging open video compression library which is gaining popularity for distributing high definition video content on the internet.

    sudo apt-get install checkinstall
    git clone
    cd libvpx
    sudo checkinstall --pkgname=libvpx --pkgversion="1:$(date +%Y%m%d%H%M)-git" --backup=no --deldoc=yes --fstrans=no --default

    librtmp provides support for the RTMP content streaming protocol developed by Adobe and commonly used to distribute content to flash video players on the web.

    sudo apt-get install libssl-dev
    cd /home/pi/src
    git clone git://
    cd rtmpdump
    make SYS=posix
    sudo checkinstall --pkgname=rtmpdump --pkgversion="2:$(date +%Y%m%d%H%M)-git" --backup=no --deldoc=yes --fstrans=no --default


    Additionally, the "pkg-config --list-all" command lists all the installed libraries.

    RTMP streaming

    1. Start the stream from the Linux machine using ffmpeg:

    ffmpeg -f video4linux2 -s 320x240 -i /dev/video0 -f flv -s qvga -b 750000 -ar 11025 -metadata streamName=aaa "tcp://<hidden_ip>:6666/live";


    2. View the incoming packets and stats on the terminal at crtmpserver.


    3. Play back the live stream from another machine using ffplay:

    ffplay -i rtmp://server_ip:1935/live/ccc


    RTSP streaming

    1. Start the RTSP stream from the Linux machine using ffmpeg,

    here using resolution 320×240 and stream name test:

    ffmpeg -f video4linux2 -s 320x240 -i /dev/video0 -an -r 10 -c:v libx264 -q 1 -f rtsp -metadata title=test rtsp://server_ip:5554/flvplayback


    2. View the incoming packets and stats on the terminal at crtmpserver.

    3. Play back the live stream from another machine using ffplay or VLC:


    ffplay rtsp://server_ip:5554/flvplayback/test



    vlc rtsp://server_ip:5554/flvplayback/test



    GStreamer-1.8.1 rtsp server and client on ubuntu

    GStreamer is a streaming media framework, based on graphs of filters which operate on media data.

    GStreamer is constructed using a pipes-and-filters architecture.
    The basic structure of a stream pipeline is that you start with a stream source (camera, screengrab, file etc.) and end with a stream sink (screen window, file, network etc.). The ! operator links the elements together; the connection points between elements are called pads.

    Data that flows through pads is described by caps (short for capabilities). Caps can be thought of as a MIME type (e.g. audio/x-raw, video/x-raw) along with a set of properties (e.g. width, height, depth).

    Source Code

    Download the latest archives from

    Source code on git :

    Primarily 3 files are required

    1. gstreamer-1.8.1.tar.xz
    2. gst-plugins-base-1.8.1.tar.xz
    3. gst-rtsp-server-1.8.1.tar.xz

    If the destination machine is an EC2 instance, one can also scp the tar.xz files there.

    To extract the tar.xz files use tar -xf <filename>; it will create a folder for each package.



    sudo apt-get install build-essential



    GLib >= 2.40.0

    GLib package contains low-level libraries useful for providing data structure handling for C, portability wrappers and interfaces for such runtime functionality as an event loop, threads, dynamic loading and an object system.

    sudo apt-get install libglib2.0-dev


    Installing GStreamer 1.8.1. GStreamer creates a media stream with elements and properties, as will be shown in later sections of this tutorial.

    cd gstreamer-1.8.1
    ./configure
    make
    sudo make install


    After installation, export the library path:

    export LD_LIBRARY_PATH=/usr/local/lib

    then verify the installation of GStreamer with

    gst-inspect-1.0

    which provides information on installed GStreamer modules, i.e. prints out a long list (about 123 in my case) of plugins that are installed, such as coreelements:

    capsfilter: CapsFilter
    ximagesink: ximagesink: Video sink
    videorate: videorate: Video rate adjuster
    typefindfunctions: image/x-quicktime: qif, qtif, qti
    typefindfunctions: video/quicktime: mov, mp4
    typefindfunctions: application/x-3gp: 3gp
    typefindfunctions: audio/x-m4a: m4a
    typefindfunctions: video/x-nuv: nuv
    typefindfunctions: video/x-h265: h265, x265, 265
    typefindfunctions: video/x-h264: h264, x264, 264
    typefindfunctions: video/x-h263: h263, 263
    typefindfunctions: video/mpeg4: m4v
    typefindfunctions: video/mpeg-elementary: mpv, mpeg, mpg
    typefindfunctions: application/ogg: ogg, oga, ogv, ogm, ogx, spx, anx, axa, axv
    typefindfunctions: video/mpegts: ts, mts
    typefindfunctions: video/mpeg-sys: mpe, mpeg, mpg
    typefindfunctions: audio/x-gsm: gsm

    gst plugins

    Now build the plugins

    cd gst-plugins-base-1.8.1
    ./configure
    make
    sudo make install


    gst plugins good

    cd gst-plugins-good-1.8.1
    ./configure
    make
    sudo make install

    RTSP Server

    Now make and install the rtsp server

    cd gst-rtsp-server-1.8.1
    ./configure

    Last few lines from the console traces:

    Version : 1.8.1
    Source code location : .
    Prefix : /usr/local
    Compiler : gcc -std=gnu99
    CGroups example : no


    Then run make (it will also compile the examples) and install:

    make
    sudo make install


    stream video test src

    ~/mediaServer/gst-rtsp-server-1.8.1/examples]$ ./test-launch --gst-debug=0 "( videotestsrc ! video/x-raw,format=(yuv),width=352,height=288,framerate=15/1 ! x264enc ! rtph264pay name=pay0 pt=96 )"
    stream ready at rtsp://


    Manual for developers :

    Simplest pipeline

    gst-launch-1.0 fakesrc ! fakesink
    ➜ ~ gst-launch-1.0 fakesrc ! fakesink
    Setting pipeline to PAUSED ...
    Pipeline is PREROLLING ...
    Pipeline is PREROLLED ...
    Setting pipeline to PLAYING ...
    New clock: GstSystemClock

    To stop, press Ctrl+C:

    ^Chandling interrupt. Interrupt: Stopping pipeline ...
    Execution ended after 0:00:48.004547887
    Setting pipeline to PAUSED ...
    Setting pipeline to READY ...
    Setting pipeline to NULL ...
    Freeing pipeline ...

    Or to display to an autovideosink:

    gst-launch-1.0 videotestsrc ! autovideosink

    To capture the webcam:

    gst-launch-1.0 v4l2src ! xvimagesink

    Wowza REST APIs and HTTP Providers

    This article shows the different ways to make calls to Wowza Media Engine from external applications and environments for various purposes, such as getting server status, listeners, connections, applications and their streams, etc.

    HTTP Providers

    HTTP Providers are Java classes that are configured on a per-virtual host basis.

    Some pre-packaged HTTP providers that return data in XML:

    1. HTTPConnectionCountsXML

    Returns connection information such as vhost, application, application instance, messages-in bytes rate, messages-out bytes rate, etc.



    2. HTTPConnectionInfo

    Returns detailed connection information.



    3. HTTPServerVersion

    Returns the Wowza Media Server version and build number. It’s the default HTTP Provider on port 1935.

    url : http://[wowza-ip-address]:1935

    Wowza Streaming Engine 4 Monthly Edition 4.1.1 build13180

    4. HTTPLiveStreamRecord

    Provides a web interface to record online streams.

    url : http://[wowza-ip-address]:8086/livestreamrecord


    5. HTTPServerInfoXML

    Returns server and connection information

    url : http://[wowza-ip-address]:8086/serverinfo


    6. HTTPClientAccessPolicy

    It is used for fetching the Microsoft Silverlight clientaccesspolicy.xml from the conf folder.

    7. HTTPCrossdomain

    To get the Adobe Flash crossdomain.xml file from [install-dir]/conf folder.


    8. Dynamic method for generating adaptive bitrate manifests and playlists from SMIL data.


    9. Stream Manager

    The Stream Manager returns all applications and their streams in a web interface.

    url : http://[wowza-ip-address]:8086/streammanager


    10. HTTPTranscoderThumbnail

    Returns a bitmap image from the source stream being transcoded.

    url : http://[wowza-ip-address]:8086/transcoderthumbnail?application=[application-name]&streamname=[stream-name]&format=[jpeg or png]&size=[widthxheight]

    Each HTTP provider can be configured with a different request filter and authentication method (none, basic, digest). We can even create our own substitutes for the HTTP providers, as described in the next section.
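For reference, an HTTP provider entry lives under a HostPort in [install-dir]/conf/VHost.xml. A sketch of such an entry (the BaseClass is one of the stock providers above; the request-filter and authentication values shown are illustrative):

```xml
<HTTPProvider>
    <BaseClass>com.wowza.wms.http.HTTPServerVersion</BaseClass>
    <RequestFilters>*</RequestFilters>
    <AuthenticationMethod>none</AuthenticationMethod>
</HTTPProvider>
```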

    Extending HTTProvider2Base

    The following code snippet describes the process of creating a Wowza web service that returns JSON containing all the application names.

    Imports to build an HTTP provider

    import com.wowza.wms.application.*;
    import com.wowza.wms.vhost.*;
    import com.wowza.wms.http.*;
    import com.wowza.wms.httpstreamer.model.*;
    // standard library classes used below
    import java.util.Iterator;
    import java.util.List;
    import java.net.URLEncoder;
    import java.io.UnsupportedEncodingException;
    //since we want to return in json format
    import org.json.simple.JSONObject;

    The class declaration is as follows

    public class DCWS extends HTTProvider2Base

    The code to extract application names

    public JSONObject listChannels() {
        JSONObject obj = new JSONObject();
        // get vhost names from the singleton and iterate through them
        List<String> vhostNames = VHostSingleton.getVHostNames();
        Iterator<String> iter = vhostNames.iterator();
        while (iter.hasNext()) {
            String vhostName = iter.next();
            IVHost vhost = (IVHost) VHostSingleton.getInstance(vhostName);
            List<String> appNames = vhost.getApplicationNames();
            Iterator<String> appNameIterator = appNames.iterator();
            int i = 0;
            while (appNameIterator.hasNext()) {
                String applicationName = appNameIterator.next();
                try {
                    String key = "channel" + (++i);
                    obj.put(key, URLEncoder.encode(applicationName, "UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    // skip applications whose names cannot be URL-encoded
                }
            }
        }
        return obj;
    }

    The code which responds to the HTTP request overrides onHTTPRequest() from HTTProvider2Base and writes the JSON produced by listChannels() to the response's output stream.



    Wowza RTMP Authentication with Third party Token provider over Tiny Encryption Algorithm (TEA)

    This article is focused on Wowza RTMP Authentication with a third-party token provider over the Tiny Encryption Algorithm (TEA), and is a continuation of the previous post about setting up a basic RTMP Authentication module on Wowza Engine above version 4.

    The task is divided into 3 parts .

    1. RTMP Encoder Application
    2. Wowza RTMP Auth module
    3. Third party Authentication Server

    The component diagram is as follows :

    Copy of Publisher App iOS

    The detailed explanation of the components are :

    1.Wowza RTMP Auth module

    The Wowza server receives an RTMP stream URL in a format such as:


    It considers the username and password to be user credentials. The RTMP Auth module invokes the getPassword() function of the deployed application class, passing the username as a parameter. The username is then encrypted using TEA (Tiny Encryption Algorithm).

    TEA is a block cipher based on symmetric (private) key encryption. Input is a 64-bit block of plain or cipher text with a 128-bit key, resulting in an output of cipher or plain text respectively.

    The code for encryption  is

    TEA.encrypt( username, sharedSecret );
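For illustration, here is a minimal self-contained sketch of the TEA block operation itself; the TEA helper used above wraps something equivalent, and the class and method names below are hypothetical:

```java
public class TinyTEA {
    private static final int DELTA = 0x9e3779b9;
    private static final int ROUNDS = 32;

    // Encrypt one 64-bit block (v[0], v[1]) in place with a 128-bit key k[0..3].
    public static void encryptBlock(int[] v, int[] k) {
        int v0 = v[0], v1 = v[1], sum = 0;
        for (int i = 0; i < ROUNDS; i++) {
            sum += DELTA;
            v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >>> 5) + k[1]);
            v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >>> 5) + k[3]);
        }
        v[0] = v0;
        v[1] = v1;
    }

    // Decrypt reverses the 32 rounds, starting from the final sum value.
    public static void decryptBlock(int[] v, int[] k) {
        int v0 = v[0], v1 = v[1], sum = DELTA * ROUNDS;
        for (int i = 0; i < ROUNDS; i++) {
            v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >>> 5) + k[3]);
            v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >>> 5) + k[1]);
            sum -= DELTA;
        }
        v[0] = v0;
        v[1] = v1;
    }
}
```

A real helper additionally needs a string-to-block encoding and a mode of operation for inputs longer than 8 bytes, which is what the library class takes care of.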

    The code to make a connection to third party auth server is

     URL url = new URL(serverTokenValidatorURL);
     URLConnection connection = url.openConnection();
     connection.setDoOutput(true);
     OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
     out.write("clientid=" + TEA.encrypt(username, sharedSecret));
     out.flush();

    The shared secret is the common key held by both the auth server and the Wowza server. It must be at least a 16-character alphanumeric / special-character key; an example of a shared secret is abcdefghijklmnop. The value can be stored as a property in the Application.xml file.



    The value of serverTokenValidatorURL is the third-party auth server endpoint listening for REST POST requests.

    The code for receiving the incoming  resulting json data is

    	ObjectMapper mapper = new ObjectMapper();
    	JsonNode node = mapper.readTree(connection.getInputStream()); 
    	node = node.get("publisherToken") ;
    	String token = node.asText();
            String token2 =TEA.decrypt(token, sharedSecret);

    2.Third party Authentication Server

    The third-party auth server stores the passwords for users or performs OAuth-based authentication. It uses the shared secret key to decrypt the token based on TEA, as explained in the section above.

    The code to decrypt the incoming clientId

    TEA.decrypt(id, sharedSecret);

    Add your own custom logic to check files, databases, etc. to obtain the password corresponding to the username decrypted above.

    The code to encrypt the password for the user if it exists, or send an invalid response if it does not, is

            try {
                String clientID = TEA.decrypt(id, sharedSecret);
                String token = findUserPassword(clientID);
                token = TEA.encrypt(token, sharedSecret);
                return "{\"publisherToken\":\"" + token + "\"}";
            } catch (Exception ex) {
                return "{\"error\":\"Invalid Client\"}";
            }
    The final callflow thus becomes :

    Copy of Publisher App iOS (1)



    Wowza Secure URL params Authentication for streams in an application

    This post is useful for securing the publishers of a common application through username/password credentials tied to specific stream names. It uses the Module Core Security to prompt the user for credentials.

    The detailed code below checks the RTMP query string for parameters and performs two checks: is the user allowed to connect, and is the user allowed to stream on the given stream name.

    Initialize the hashmap containing publisher clients and the IApplicationInstance

    HashMap <Integer, String> publisherClients =null;
    IApplicationInstance appInstance = null;

    On app start, initialize the IApplicationInstance object.

    public void onAppStart(IApplicationInstance appInstance) {
        this.appInstance = appInstance;
    }

    onConnect is called when any publisher tries to connect to the media server. At this event, collect the username and clientId from the client.
    Check whether publisherClients already contains the userName which the client has provided; otherwise reject the connection.

    public void onConnect(IClient client, RequestFunction function, AMFDataList params) {
        AMFDataObj obj = params.getObject(2);
        AMFData data = obj.get("app");
        // the app string looks like "appName?username=xyz"; split off the query part
        String[] paramlist = data.toString().split("\\?");
        String[] userParam = paramlist[1].split("=");
        String userName = userParam[1];
        if (this.publisherClients == null) {
            this.publisherClients = new HashMap<Integer, String>();
        }
        if (!this.publisherClients.containsValue(userName)) {
            this.publisherClients.put(client.getClientId(), userName);
        } else {
            client.rejectConnection("User already connected: " + userName);
        }
    }
    AMFDataItem: class for marshalling data between Wowza Pro server and Flash client.

    When the user starts to publish a stream after a successful connection, the publish function is called. It extracts the stream name from the client (function extractStreamName()) and checks whether the user is allowed to stream on the given stream name (function isStreamNotAllowed()).

    public void publish(IClient client, RequestFunction function, AMFDataList params) {
        String streamName = extractStreamName(client, function, params);
        if (isStreamNotAllowed(client, streamName)) {
            sendClientOnStatusError(client, "NetStream.Publish.Denied", "Stream name not allowed for the logged in user: " + streamName);
        } else {
            invokePrevious(client, function, params);
        }
    }

    The function called when a publisher disconnects from the server. It removes the client from publisherClients.

    public void onDisconnect(IClient client) {
        this.publisherClients.remove(client.getClientId());
    }

    The function to extract a streamname is

    public String extractStreamName(IClient client, RequestFunction function, AMFDataList params) {
        String streamName = params.getString(PARAM1);
        if (streamName != null) {
            String streamExt = MediaStream.BASE_STREAM_EXT;
            String[] streamDecode = ModuleUtils.decodeStreamExtension(streamName, streamExt);
            streamName = streamDecode[0];
            streamExt = streamDecode[1];
        }
        return streamName;
    }

    The function to check if the stream name is allowed for the given user

    public boolean isStreamNotAllowed(IClient client, String streamName) {
        WMSProperties localWMSProperties = client.getAppInstance().getProperties();
        String allowedStreamName = localWMSProperties.getPropertyStr(this.publisherClients.get(client.getClientId()));
        String sName;
        if (streamName.contains("?")) {
            sName = streamName.substring(0, streamName.lastIndexOf("?"));
        } else {
            sName = streamName;
        }
        return !sName.toLowerCase().equals(allowedStreamName.toLowerCase());
    }
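The same check can be isolated into a small self-contained sketch (class and method names are illustrative) to show the intended behaviour: strip any "?query" suffix from the published name, then compare case-insensitively against the stream name configured for that user.

```java
public class StreamNameCheck {
    // Strip any "?query" suffix from the published stream name.
    public static String baseStreamName(String streamName) {
        int q = streamName.lastIndexOf('?');
        return (q >= 0) ? streamName.substring(0, q) : streamName;
    }

    // A user may publish only on the stream name configured for them,
    // compared case-insensitively.
    public static boolean isAllowed(String publishedName, String allowedStreamName) {
        return baseStreamName(publishedName).equalsIgnoreCase(allowedStreamName);
    }
}
```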

    On adding the application to the Wowza server, make sure that ModuleCoreSecurity is present under Modules in Application.xml:

    <Module>
        <Name>ModuleCoreSecurity</Name>
        <Description>Core Security Module for Applications</Description>
        <Class>com.wowza.wms.security.ModuleCoreSecurity</Class>
    </Module>

    Also ensure that the property securityPublishRequirePassword is present under Properties.
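A sketch of that property entry in Application.xml (assuming the standard Boolean property layout):

```xml
<Property>
    <Name>securityPublishRequirePassword</Name>
    <Value>true</Value>
    <Type>Boolean</Type>
</Property>
```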


    Add the user credentials as properties too. For example, to give access to testuser with password 123456 to stream on myStream, include the following:
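Assuming the property layout read by isStreamNotAllowed() (property name = username, value = the allowed stream name), the entry would look like:

```xml
<Property>
    <Name>testuser</Name>
    <Value>myStream</Value>
</Property>
```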


    Also include the mapping of user and password inside of conf/publish.password file

    # Publish password file (format [username][space][password])
    # username password

    testuser 123456

    Wowza RTMP Authenticate Module

    The purpose of this article is to use the RTMP Authentication Module in Wowza Engine. This will enable us to intercept a connect request with a username and password to be checked against any outside source, such as a database, password file, third-party token provider, or third-party OAuth. Once the password provided by the user is verified against the authentic password from the external source, the user is allowed to connect and publish.

    Step 1 : Create a new Wowza Media Server Project in Eclipse .  It is assumed that user has already integrated WowzaIDE into eclipse .

    File -> New -> Wowza Media Server Project  

    Step 2: Give any project name . I named it as “RTMPAuthSampleCode”.

    wowza RTMP Auth

    Step 3 : Point the location to the existing Wowza Engine installed in the local environment.

    It is usually in /usr/local/WowzaStreamingEngine/

    Wowza RTMP Auth

    Step 4 : Proceed with the creation; uncheck the event methods as we are not using them right now.


    Step 5: Put the code in the class.

    The class RTMPAuthSampleCode extends AuthenticateUsernamePasswordProviderBase. It is mandatory to define getPassword(String username) and userExists(String username). ModuleRTMPAuthenticate will invoke getPassword for connection requests from users.


    We can add any source for obtaining the password for a given username, which will be matched against the password supplied by the user. If it matches, the user is granted access; otherwise we can return null or an error message.

    We may use various ways of obtaining user credentials, such as databases, password files, or third-party token providers. I will be discussing more ways to do RTMP authentication, especially using a third-party token provider with TEA.encrypt and a shared secret, in the next blog.

    Step 6: Build the project and Run.

    Project-> Build the Project 

    Run -> Run Configurations … -> WowzaMediaServer_RTMPAuthSampleCode

    To run the modules on my Ubuntu 64-bit 14.04 system, I also needed to provide

    -Dcom.wowza.wms.native.base="linux" inside the VM Arguments.


    Step 7: Click Run to start the wowza Media Engine

    Step 8 : Open the Manager Console of Wowza.

    It is a web-based GUI for managing the applications and checking for incoming streams. The manager script can be started with

    sudo ./usr/local/WowzaStreamingEngine/manager/bin/

    The console can be opened at


    Also you can see that RTMPAuthSampleCode.jar would have been copied to /usr/local/WowzaStreamingEngine/lib folder.

    Step 9: Add module to applications

    Add folder “RTMPAuthSampleCode” inside /usr/local/WowzaStreamingEngine/applications folder .

    Step 10 : Add conf

    Add folder “RTMPAuthSampleCode” inside /usr/local/WowzaStreamingEngine/conf  folder

    Copy paste Application.xml from conf folder inside RTMPAuthSampleCode folder and make the following changes .

    Add the ModuleRTMPAuthenticate module to Modules

    <Module>
        <Name>ModuleRTMPAuthenticate</Name>
        <Description>ModuleRTMPAuthenticate</Description>
        <Class>com.wowza.wms.security.ModuleRTMPAuthenticate</Class>
    </Module>

    and comment ModuleCoreSecurity

    <!-- <Module>
        <Name>ModuleCoreSecurity</Name>
        <Description>Core Security Module for Applications</Description>
        <Class>com.wowza.wms.security.ModuleCoreSecurity</Class>
    </Module> -->

    Step 11: Add the property usernamePasswordProviderClass to Properties,

    usually present inside <Application> at the bottom of the Application.xml file.
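A sketch of the property; the fully qualified class name is whatever package your project uses (shown here with a hypothetical one):

```xml
<Property>
    <Name>usernamePasswordProviderClass</Name>
    <Value>com.mycompany.RTMPAuthSampleCode</Value>
</Property>
```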


    Step 12 : Make Authentication.xml file inside /usr/local/WowzaStreamingEngine/conf folder.

    Note that from Wowza 4 and later versions, Authentication.xml comes bundled inside wms-server.jar, which is in the lib folder. However, for me, without giving an explicit Authentication.xml file the program froze, and using my own simple Authentication.xml gave problems with the digest. Hence follow the process below to get a working Authentication.xml file inside the conf folder.

    Expand the archive and  inside the extracted folder wms-server copy the file from location wms-server/com/wowza/wms/conf/Authentication.xml to /usr/local/WowzaStreamingEngine/conf.

    Step 13 : Restart Wowza Media Engine .

    Step 14 : Use any RTMP encoder, such as Adobe Flash Media Live Encoder or GoCoder, or your own app (could not use this with ffmpeg), and try to connect to the application RTMPAuthSampleCode with username test and password 1234.

    Step 15 : Observe the logs for incoming streams and traces from getPassword.

    If you want the user test to have permission to publish a stream to this application, then return 1234 from getPassword; otherwise return null.

    References :

    1. Media security overview
    2. How to integrate Wowza user authentication with external authentication systems (ModuleRTMPAuthenticate)
    3. How to enable username/password authentication for RTMP and RTSP publishing
    4. configuration ref 4.2

    WebRTC Live Stream Broadcast

    WebRTC has the potential to drive the live streaming broadcasting area with its powerful no-plugin, no-installation, open-standard policy. However, the only roadblock is the VP8 codec, which differs from the traditional H.264 codec used by almost all media servers, media control units, etc.

    This post is the first in a series on building a WebRTC-based broadcasting solution. Note that a p2p session differs from a broadcasting session: a peer-to-peer session implies bidirectional media streaming, whereas broadcasting implies unidirectional media flow.

    Scalable Broadcasting and Live streaming alternatives

    1. WebRTC multi peers

    Since WebRTC is a p2p technology, it is convenient to build a network of WebRTC client viewers in which each peer passes the stream on to 3 other peers in different sessions. In this fashion a fission-chain-like structure is created, where a single stream originating at the first peer is replicated to 3 others, which is in turn replicated to 9 peers, and so on.

    WebRTC Scalable Streaming Server -WebRTC multi peers

    Advantages :

    1. Scalable without investment in media servers.
    2. No additional space required on the service provider's network.

    Disadvantage :

    1. The entire set of end clients under a node gets disconnected if a single node breaks.
    2. Since sessions are dynamically created, it is difficult to maintain a map with a fallback option in case of service disruption at any single node.
    3. Clients incur a bandwidth load of 2 Mbps incoming (stream from the upstream peer) and 6 Mbps outgoing (for 3 connected peers).
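The growth of this replication tree can be sketched with a small illustrative helper (not part of any WebRTC API):

```java
public class FanOut {
    // Total viewers reachable in a replication tree where every peer
    // forwards the stream to `fanout` other peers, down to `depth` levels.
    public static long viewers(int fanout, int depth) {
        long total = 0, level = 1;
        for (int d = 0; d < depth; d++) {
            level *= fanout;   // peers at this level of the tree
            total += level;    // accumulate all peers below the publisher
        }
        return total;
    }
}
```

With fanout 3 as described above, two levels already reach 3 + 9 = 12 viewers, but every added level also adds a hop of latency and another potential point of failure.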

    2. Torrent based WebRTC chain

    To overcome the shortcoming of the previous tree-based broadcasting approach, it is suggested to use a chained broadcasting mechanism.

    WebRTC Scalable Streaming Server- Single chain connection

    To improve the efficiency of this mechanism for slow-bandwidth connections, we can stop their outgoing stream, converting them into pure consumers. This way the connections are mapped and arranged in such a fashion that every alternate peer is connected to 2 peers for stream replication, while the slow-bandwidth clients are attached as independent endpoints.

    3. WebRTC Relay nodes for multiple peers

    The aim here is to build a carrier-grade WebRTC stream broadcasting platform, capable of using WebRTC's MediaStream and PeerConnection APIs, along with repeaters, to make a scalable broadcasting / live streaming solution using Socket.IO for behavior control and signalling.

    Algorithm :

    At the Publisher’s end

    1. GetUserMedia
    2. Start Room “liveConf”
    3. Add outgoing stream to session “liveConf “ with peer “BR” in 1 way transport .

    1 outgoing audio stream -> 1 Mbps on 1 RTP port
    1 outgoing video stream -> 1 Mbps on 1 more RTP port
    Total required: 2 Mbps and 2 RTP ports

    At the Repeater layer (high upload and download bandwidth )

    4. Peer “BR” opens parallel rooms “liveConf_1”, “liveConf_2”, and so on, with 4 other peers “Repeater1”, “Repeater2”, etc.
    5. Repeater1 gets the remote stream from “liveConf_1” and adds it as a local stream to “liveConf_1_1”

    Here the upload bandwidth is high and each repeater is capable of handling 6 outgoing streams. Therefore 4 repeaters can handle up to 24 streams easily.
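The repeater sizing above is simple ceiling division; as an illustrative helper (names are hypothetical):

```java
public class RepeaterPlan {
    // Number of repeaters needed when each repeater can serve
    // `perRepeater` outgoing streams to the viewers.
    public static int repeatersNeeded(int viewers, int perRepeater) {
        return (viewers + perRepeater - 1) / perRepeater;  // ceiling division
    }
}
```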

    At the Viewer’s end

    6. Viewer joins room “liveConf_1_1”
    7. Play the incoming stream on the WebRTC browser video element

    WebRTC Relay nodes for multiple peers


    Advantages :

    1. As 6 viewers can connect to 1 repeater for the feed, a total of 24 viewers will require only 4 repeaters.
    2. Only 2 Mbps consumption at the publisher's end and 2 Mbps at each viewer's end.

    4. WebRTC  recorder to Broadcasting Media Server VOD

    This process is essentially NOT a live streaming solution but a Video On Demand type of implementation for a recorded webRTC stream .

    The figure shows a WebRTC node which can record the WebRTC streams as WebM files. Audio and video can be recorded together on Firefox. With Chrome, one needs to merge a separately recorded WebM (video) and WAV (audio) file into a single WebM file containing both audio and video, and then store it in the VOD server's repository.

    WebRTC Scalable Streaming Server  - WebRTC Chunk recorder to Broadcasting Media Server VOD
    WebRTC Chunk recorder to Broadcasting Media Server VOD

    Although media servers do not inherently support the WebM format, a few new-age lightweight media servers such as Kurento are capable of this.

    Advantages :

    1. Can solve the end goal of broadcasting from a webrtc browser to multiple webrtc browsers without incurring extra load on any client machine ( Obviously assuming that  Media Server handles the distribution of video and load sharing automatically )


    Disadvantages :

    1. It is not live streaming.
    2. For significantly longer recorded streams, the delta in streaming delay increases considerably. Ideally this delta should be no more than 5 minutes.