Low Latency Media streaming

  1. NTP Synchronization and measuring Latency
  2. Latency reduction
  3. Lip Sync ( Audio Video Synchronization)
  4. Synchronize clocks
  5. Congestion Control in Real Time Systems
    1. Feedback loop
    2. MTU determination
    3. RTCP Feedback
      1. Resolving congestion via RTCP
  6. Transport Protocol Optimization
  7. Faster session establishment
  8. Trade-Off of Latency vs Quality
    1. Lossy vs Lossless compression
    2. Intra frame vs Inter frame compression
  9. Container Formats for Streaming
    1. TCP based streaming
    2. UDP based streaming
    3. Real-Time Messaging Protocol (RTMP)
    4. RTSP (Real-Time Streaming Protocol)
    5. LL ( Low Latency) – HTTP Live Streaming (HLS)
    6. MPEG-DASH
    7. WebRTC (Web Based Real Time Communication)
    8. Transport protocol
    9. P2P (Peer-to-Peer) Connections
    10. Choice of Codecs
    11. Network Adaptation
    12. CMAF (Common Media Application Format)

Key components of low-latency streaming are systems that can support second or even sub-second latency. To achieve this, a system should have:

  • Low-latency encoders on edge devices, ideally with <100 ms processing time plus small receive buffers
  • Global distribution with many PoPs (points of presence), ideally within ~50 ms of the user; anycast routing works best for lowest latency
  • Stateless protocols / soft-state session tracking
  • Fast codecs over a fast transport such as UDP

Designing a low-latency streaming protocol and platform involves overcoming significant technical challenges across multiple layers of the network stack.

NTP Synchronization and measuring Latency

As latency is central to maintaining good QoS in glass-to-glass streaming, and is also an input to congestion control, it is critical to measure latency as precisely as possible.

An NTP timestamp counts the seconds that have elapsed since January 1, 1900 and can represent time values to a resolution of roughly 0.2 nanoseconds (ns). In the RTP spec it is a 64-bit value where the top 32 bits represent seconds and the bottom 32 bits represent the fraction of a second. For measuring latency in RTP, however, an NTP time server is not used; instead the audio capture clock forms the basis for the NTP time calculation.

RTP timestamp / RTP clock rate = (NTP time + offset) × scale

The latency is then calculated using accurate mappings between NTP time and RTP media timestamps, carried in RTCP packets sent for each media stream.
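
As a minimal sketch (not tied to any particular library), assuming the last RTCP Sender Report has been parsed into ntpSeconds, ntpFraction and rtpTimestamp fields, this is how the 64-bit NTP value maps an RTP timestamp back to sender wall-clock time and yields a one-way latency estimate:

// Convert a 64-bit NTP timestamp (32-bit seconds + 32-bit fraction) to seconds.
function ntpToSeconds(ntpSeconds, ntpFraction) {
  return ntpSeconds + ntpFraction / 2 ** 32;
}

// Map an RTP timestamp to the sender's wall-clock time using the NTP/RTP pair
// carried in the last RTCP Sender Report (wrap-around handling omitted).
function rtpToSenderTime(rtpTimestamp, sr, clockRate) {
  const elapsed = (rtpTimestamp - sr.rtpTimestamp) / clockRate; // seconds since the SR
  return ntpToSeconds(sr.ntpSeconds, sr.ntpFraction) + elapsed;
}

// One-way latency for a packet, assuming sender and receiver clocks are
// NTP-synchronized; receiverNtpNow is the receiver's NTP time at arrival.
function oneWayLatency(rtpTimestamp, sr, clockRate, receiverNtpNow) {
  return receiverNtpNow - rtpToSenderTime(rtpTimestamp, sr, clockRate);
}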

Latency can be induced at various points in the system:

  1. Transmitter latency in capture, encoding and/or packetization
  2. Network latency, including gateways, load balancing and buffering
  3. Media-path latency, e.g. TURN servers for NAT traversal, delays due to low bandwidth, and transcoding delays in media servers
  4. Receiver delays in playback due to buffering, and playout delay in the decoder caused by hardware constraints

The delay can be caused by one or many stages of the media path and is the cumulative sum of all the individual delays. For this reason, TCP is a bad candidate due to the latency it incurs in packet reordering and its fairness-oriented congestion control. While TCP continues to provide signaling transport, the media is streamed over RTP/UDP.

Latency reduction

Although modern media stacks such as WebRTC are designed to adapt to dynamic network conditions, bandwidth unpredictability still leads to packet loss and eventually low QoS. Effective techniques to reduce latency:

  • Dynamic network analysis and bandwidth estimation for adaptive streaming ensure low-latency stream reception at the remote end.
  • Silence suppression: an effective way to save bandwidth
    • (+) typical bandwidth reduction after enabling silence suppression is ~50%
  • Noise filtering and background blurring are also efficient ways to reduce network traffic
  • Forward error correction or redundant encoding techniques help to recover from packet loss faster than retransmission requests via NACK
  • Increased compression can also optimize packetization and transmission of raw data for low latency
  • Predictive decoding and endpoint-controlled congestion

Ineffective techniques that do not improve QoS even if they reduce latency:

  • Minimizing headers in every PDU: some extra headers such as CSRC or the timestamp can be removed to create an RTPLite, but significant disadvantages include having to rebuild those functionalities with custom logic, for example:
    • (-) removing the timestamp would lead to issues with cross-media synchronization (lip sync), jitter estimation and packet-loss handling
    • (-) removing contributing source identifiers (CSRC) could lead to issues managing source identity in multicast or via a media gateway
  • Too many TURN/STUN servers and excessive candidate gathering
  • Lowering resolution or bitrate may achieve low latency but is far from the high-definition experience that users expect.

Lip Sync ( Audio Video Synchronization)

Many real-time communication and streaming platforms have separate audio and video processing pipelines. The outputs of these two can drift due to differing latency or speed and may appear out of sync at playback on the receiver's end. As the skew increases, viewers perceive it as bad quality.

According to convention, at the input to the encoder the audio should not lead the video by more than 15 ms and should not lag the video by more than 45 ms. Generally the lip-sync tolerance is around ±15 ms.
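
A hedged sketch of how a receiver could check audio/video skew against the thresholds above, assuming both streams have already been mapped onto the shared NTP clock (e.g. with a helper like rtpToSenderTime earlier); the variable names are illustrative:

// audioTime / videoTime: capture times (seconds, shared NTP clock) of the
// samples currently being played out.
function lipSyncStatus(audioTime, videoTime) {
  const skewMs = (audioTime - videoTime) * 1000; // positive: audio leads video
  if (skewMs > 15) return `audio leads video by ${skewMs.toFixed(1)} ms (limit 15 ms)`;
  if (skewMs < -45) return `audio lags video by ${(-skewMs).toFixed(1)} ms (limit 45 ms)`;
  return 'within lip-sync tolerance';
}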

Synchronize clocks

The clock helps make streaming faster in various ways by letting endpoints calculate delays precisely:

  • NTP timestamps help the endpoints measure their delays.
  • Sequence numbers detect losses, since the sequence number increases by 1 for each packet transmitted. Timestamps increase by the time span covered by a packet, which is useful for restoring timing order, playout delay compensation, etc.

To compute transmission time, the sender and receiver clocks need to be synchronized with millisecond precision. This is unlikely in a heterogeneous environment, as hosts may have different clock speeds. Clock synchronization can be achieved in various ways:

  1. NTP synchronization: for multimedia conferences the NTP timestamp from the RTCP SR gives a common time reference that can associate the independent media timestamps with a shared wall-clock time. This allows media synchronization between the streams of a sender in a single session. Additionally, RFC 3550 specifies one media timestamp in the RTP data header and a mapping between that timestamp and a globally synchronized (NTP) clock, carried as RTCP timestamp mappings.

2. Multicast synchronization: receivers synchronize to the sender's RTP-header media timestamp

  • (+) good approach for multicast sessions

3. Round-trip time measurement as a workaround for clock sync: the round-trip propagation delay can help a host sync its clock with its peers (a worked sketch appears at the end of this section).

roundtrip_time = current time - reflected send timestamp
transmission_time = roundtrip_time / 2
  • (-) this approach assumes equal time for sending and receiving a packet, which is not the case in cellular networks; it is thus not suited to time-asymmetric networks.
  • (-) transmission time can also vary with jitter (some packets arrive in bursts and some are delayed)
  • (-) subject to packet loss

4. Adjust playout delay via the marker bit: the marker bit indicates the beginning of a talkspurt. This lets the receiver adjust the playout delay to compensate for the different clock rates between sender and receiver and/or network delay jitter.

Marker Bit Header in RTP with GSM payload

Receivers can perform delay adaptation using the marker bit as long as the reordering time of the marker-bit packet with respect to other packets is less than the playout delay. Otherwise the receiver waits for the next talkspurt.

Sequence of RTP with GSM payload

Similar examples from WebRTC RTP dumps
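
As promised above, a minimal sketch of the round-trip-time workaround (option 3): estimating RTT and a symmetric-path clock offset from a reflected timestamp, the same idea NTP uses. The t0..t3 names are only illustrative:

// t0: request sent (local clock), t1: request received (peer clock),
// t2: reply sent (peer clock),    t3: reply received (local clock). All in ms.
function rttAndOffset(t0, t1, t2, t3) {
  const roundtripTime = (t3 - t0) - (t2 - t1);     // time actually spent on the wire
  const transmissionTime = roundtripTime / 2;      // assumes a symmetric path
  const clockOffset = ((t1 - t0) + (t2 - t3)) / 2; // peer clock minus local clock
  return { roundtripTime, transmissionTime, clockOffset };
}

// Example: peer clock ~100 ms ahead, ~40 ms round trip.
console.log(rttAndOffset(1000, 1120, 1125, 1045));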

Congestion Control in Real Time Systems

Congestion occurs when we have reached the peak load the network and path can handle. There can be many reasons for congestion, such as limits imposed by the ISP, high usage at certain times, or failures of some network resources causing other relays to be overloaded. Congestion can result in:

  • dropping excess packets => high packet loss
  • increased buffering -> excess packets queue up and cause erratic delivery -> high jitter
  • progressively increasing round-trip time
  • explicit congestion notifications, which trigger other nodes to activate their own congestion control.

A real-time communication system may be efficient at encoding/decoding but will eventually be limited by the network. Signaling congestion dynamically helps the platform adapt and ensure satisfactory quality without losing too many packets on the network path. There has been extensive research on congestion control for both TCP and UDP transports. Simplistic methods use ACKs to detect packet drops and OWD (one-way delay) to infer that congestion may be occurring, then go into avoidance mode by reducing the bitrate and/or the sending window.

UDP/RTP streams have the support of well-designed RTCP feedback to deduce a congestion situation proactively, before it happens. Some WebRTC approaches work around the problem of congestion by providing simulcast, SVC (temporal/frame-rate, spatial/picture-size, SNR/quality/fidelity layers), redundant encoding, etc. The following attributes can help infer congestion in a network (a simple detection sketch follows the list):

  • increasing RTT (round-trip time)
  • increasing OWD (one-way delay)
  • occurrence of packet loss
  • queuing delay gradient = queue length in bits / capacity of the bottleneck link
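
Here is the promised detection sketch. It is a simplified illustration (not any specific standardized algorithm): the change in one-way delay between consecutive packets, or packet groups, is compared against a threshold to flag growing or draining queues; the 12.5 ms default is only an assumption:

// Very simplified delay-gradient overuse detector.
// Each sample is { sendTs, recvTs } in milliseconds for a packet or packet group.
function makeOveruseDetector(thresholdMs = 12.5) {
  let prev = null;
  return function onSample(sample) {
    if (prev === null) { prev = sample; return 'normal'; }
    // Change in one-way delay between consecutive samples; any fixed clock offset cancels out.
    const delayGradient = (sample.recvTs - prev.recvTs) - (sample.sendTs - prev.sendTs);
    prev = sample;
    if (delayGradient > thresholdMs) return 'overuse';   // queues are building up
    if (delayGradient < -thresholdMs) return 'underuse'; // queues are draining
    return 'normal';
  };
}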

Feedback loop

A feedback loop between the video encoder and the congestion controller can significantly help the host avoid bad QoS.
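
A hedged sketch of such a loop: the congestion controller's signal ('overuse', 'normal', 'underuse') is mapped onto a new target bitrate that is pushed to the encoder. encoder.setTargetBitrate() is a hypothetical API on your encoder wrapper, and the constants are only illustrative:

// AIMD-style mapping from congestion signals to an encoder target bitrate.
function adaptBitrate(encoder, signal, currentBps, minBps = 150000, maxBps = 2500000) {
  let target = currentBps;
  if (signal === 'overuse') {
    target = currentBps * 0.85;   // back off multiplicatively while queues grow
  } else if (signal === 'normal') {
    target = currentBps + 20000;  // probe upward additively
  }                               // 'underuse': hold and let queues drain
  target = Math.min(maxBps, Math.max(minBps, target));
  encoder.setTargetBitrate(target); // hypothetical encoder API
  return target;
}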

MTU determination

The maximum transmission unit (MTU) determines how large a packet can be when sent over a network. The MTU differs along the path, and path MTU discovery is an effective way to determine the largest packet size that can be sent.
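
A hedged sketch of datagram-style path MTU discovery by probing: binary-search the largest payload that is still delivered. sendProbe(size) is a hypothetical async function that sends a padded, non-fragmentable datagram of the given size and resolves to true if it was acknowledged:

// Binary-search the largest probe size that still makes it end to end.
async function discoverPathMtu(sendProbe, low = 576, high = 1500) {
  let best = low;
  while (low <= high) {
    const size = Math.floor((low + high) / 2);
    if (await sendProbe(size)) {
      best = size;       // this size fits; try something larger
      low = size + 1;
    } else {
      high = size - 1;   // too big; try something smaller
    }
  }
  return best;           // largest payload size that got through
}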

RTCP Feedback

To avoid spending resources on retransmissions and suffering gaps in playback, the system needs to detect congestion as it builds up. The ways to detect congestion are observing one's own send buffer and observing the receiver's feedback. RTP supports mechanisms that allow a form of congestion control on longer time scales.

Resolving congestion via RTCP

Some popular approaches to overcoming congestion boil down to limiting speed and sending less:

  • Throttling video frame acquisition at the sender when the send buffer is full
  • Changing the audio/video encoding rate
  • Reducing the video frame rate or video image size at the transmitter
  • Modifying the source encoder

The aim of these algorithms is usually a trade-off between throughput and latency. Hence maximizing throughput while penalizing delay is a formulation often used in more modern congestion control algorithms:

  • LEDBAT (Low Extra Delay Background Transport)
  • NADA (Network-Assisted Dynamic Adaptation), which combines loss and delay signals using OWD
  • SCReAM (Self-Clocked Rate Adaptation for Multimedia)
  • GCC (Google Congestion Control), which runs a Kalman filter over the end-to-end OWD (one-way delay) and compares it against an adaptive threshold to throttle the sending rate.

Transport Protocol Optimization

A low-latency transport such as UDP is most appropriate for real-time transmission of media packets, due to its smaller packets and ACK-less operation. A TCP transport is not ideal for such delay-sensitive environments. Some points that show TCP is unsuited to RTP:

  • Missing-packet discovery and retransmission take at least one round-trip time using ACKs, which results either in an audible gap in playout or in the retransmitted packet being discarded altogether by the decoder buffer.
  • TCP packets carry more header overhead than UDP.
  • TCP cannot support multicast.
  • TCP congestion control is inapplicable to real-time media because it reduces the congestion window when packet loss is detected. This is unsuited to codecs with a fixed sampling rate, such as PCM at 64 kb/s plus header overhead.

Faster session establishment

Lower-layer protocols are still required to have control over resources in switches, routers and other network relay points. However, RTP provides to the application layer the means to ensure a fast real-time stream and its feedback, including timestamps and synchronization control of the different streams.

  1. Mux the streams: RTP and RTCP share the same socket and connection instead of using two separate connections.
  • (+) Since the same port is used, less ICE candidate gathering is required in WebRTC.

2. Prioritize PoPs (points of presence) under quality control over open-internet relay points.

3. Parallelize AAA (authentication and authorization) with session establishment instead of serializing it.

Trade-Off of Latency vs Quality

To achieve reliable transmission the media needs to be compressed (made smaller), which may lead to some loss of picture quality.

Lossy vs Lossless compression

Lossless compression incurs higher latency than lossy compression.

Lossless compression:
  • (+) Better picture quality
  • (-) Higher power consumption
  • Suited for file storage

Lossy compression:
  • (-) Lower picture quality
  • (+) Lower power consumption at encoder and decoder
  • Suited for real-time streaming

Intra frame vs Inter frame compression

Intra-frame compression reduces the bits needed to describe a single frame (like JPEG); it is suited for still images.

Inter-frame compression reduces the bits needed to decode a series of frames by removing duplicate information. Frame types:
  • I-frame: a complete picture without any loss
  • P-frame: a partial picture carrying a delta from the previous frame
  • B-frame: a partial picture using modifications from previous and future pictures

Container Formats for Streaming

Some time ago Flash + RTMP was the popular choice for streaming. Streaming involves segmenting audio or video into smaller chunks which can be easily transmitted over networks. Container formats hold an encoded video and audio track in a single file, which is then streamed using the streaming protocol. The content is encoded in different resolutions and bitrates. Once received, it is again stored in a container format (MP4, FLV).

ABR formats are HTTP-based, media-streaming communications protocols. As the connection gets slower, the protocol adjusts the requested bitrate to the available bandwidth. Therefore, it can work on different network bandwidths, such as 3G or 4G.

TCP based streaming

  • (-) slow start-up due to the three-way handshake, TCP slow start and the congestion avoidance phase
  • (+) SACK avoids resending the whole chain of packets when one packet is lost

UDP based streaming

One of the first television broadcasting techniques, used for example in IPTV over fibre with repeaters, was multicast broadcasting of MPEG Transport Stream content over UDP. This is suited to internal, closed networks but less so to external networks, which suffer from interference, traffic shaping, congested channels, hardware errors, damaged cables, and software-level problems. In that case, not only is low latency required, but also retransmission of lost packets.

  • (-) needs FEC to recover lost packets, which adds overhead (see the sketch below)
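
A minimal sketch of the idea behind simple parity FEC (in the spirit of XOR-based schemes such as RFC 5109): one parity packet is produced over a group of media packets, so any single lost packet in the group can be rebuilt without a retransmission. Packets are plain Uint8Arrays padded to equal length; signalling of the original packet lengths is omitted:

// XOR all packets in a group to build one parity packet.
function buildParity(packets) {
  const len = Math.max(...packets.map(p => p.length));
  const parity = new Uint8Array(len);
  for (const p of packets) {
    for (let i = 0; i < p.length; i++) parity[i] ^= p[i];
  }
  return parity;
}

// Recover a single missing packet by XORing the parity with the surviving packets.
function recoverMissing(survivors, parity) {
  const recovered = Uint8Array.from(parity);
  for (const p of survivors) {
    for (let i = 0; i < p.length; i++) recovered[i] ^= p[i];
  }
  return recovered;
}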

Real-Time Messaging Protocol (RTMP)

Developed by Macromedia, which was acquired by Adobe in 2005. Originally developed to support Flash streaming, RTMP enables segmented streaming. RTMP codecs:

  • Audio Codecs: AAC, AAC-LC, HE-AAC+V1 and V2, OPUS, MP3, SPEEX, VORBIS
  • Video Codecs: H.264, VP6, VP8

RTMP is widely used for ingest and usually has ~5 s latency. RTMP works on TCP using default port 1935; variants exist, including one that runs over UDP instead of the chunked TCP stream. Its characteristics include:

  • (-) not HTTP compatible
  • (-) may get blocked by some firewalls
  • (-) delay of 2-3 s, up to 30 s
  • (+) multicast supported
  • (+) low buffering
  • (-) no support for VP9/HEVC/AV1
  • (-) medium latency, 2-5 s

RTMP forms several virtual channels on which audio, video, metadata, etc. are transmitted.

RTMP Body
RTMP packet

RTSP (Real-Time Streaming Protocol)

RTSP is primarily used for streaming and controlling media such as audio and video on IP networks. It relies on external codecs and security mechanisms. It is not typically used for low-latency streaming because of its design and the way it handles data transfer. RTSP uses TCP (Transmission Control Protocol) as its transport protocol, which provides reliability and error correction but also introduces additional overhead and latency compared to UDP. RTSP cannot adjust the video and audio bitrate, resolution and frame rate in real time to minimize the impact of network congestion.

  • (-) legacy; requires native system software for playback
  • (+) often used by surveillance or IoT systems with a higher latency tolerance
  • (-) mobile support is mostly limited to Android clients

LL ( Low Latency) – HTTP Live Streaming (HLS)

While traditional HLS uses large segments of 6-10 seconds, LL-HLS uses 3-5 seconds or less. Additionally, LL-HLS has partial segments which allow the HTTP player to start playing before the full segment is received. Leveraging HTTP/2 and HTTP/3 makes LL-HLS even faster and better performing, allowing faster quality switching.

HLS can adapt the bitrate of the video to the actual speed of the connection using a container format such as fragmented MP4. Apple's HLS protocol used the MPEG transport stream container format, .ts (MPEG-TS).

  • Audio Codecs: AAC-LC, HE-AAC+ v1 & v2, xHE-AAC, FLAC, Apple Lossless
  • Video Codecs: H.265, H.264

Sub-2-second latency is achieved using fragmented ~200 ms chunks.

  • (+) relies on ABR to produce ultra-high-quality streams
  • (+) widely compatible, including HTML5 video players
  • (+) secure, with HTTPS and DRM (Digital Rights Management)
  • (-) higher latency (sub-2-second at best) than WebRTC
  • (-) proprietary: HLS-capable encoders are not always accessible or affordable

MPEG -DASH

MPEG-DASH (Dynamic Adaptive Streaming over HTTP) is primarily used for streaming on-demand video and audio content over HTTP. It is an HTTP-based streaming protocol from MPEG, created as an alternative to HLS. While MPEG-DASH is an open standard, HLS is the one widely used on Apple devices.

MPEG-DASH and HLS are very similar: both protocols run over HTTP, use TCP as their transport protocol, break video into segments with an accompanying manifest/index file, and offer adaptive bitrate streaming. MPEG-DASH uses .mp4 containers, whereas HLS streams are traditionally delivered in .ts format.

MPEG-DASH uses HTTP (Hypertext Transfer Protocol) as its transport, which allows for better compatibility with firewalls and other network devices, but also introduces additional overhead and latency, making it unsuitable for low-latency streaming.

  • (+) supports adaptive streaming (ABR)
  • (+) open standard
  • (-) not natively supported on Apple devices
  • (-) not provided out of the box in browsers; requires additional player software for playback
  • (-) high latency, ~10-30 s

WebRTC (Web Based Real Time Communication)

Low latency is imperative for use cases that require mission-critical communication, such as emergency calls for first responders, interactive collaboration and communication services, real-time remote object detection, etc. Other use cases where low latency is essential are banking communication, financial trading, VR gaming, etc. When low-latency streaming is combined with high-definition (HD) quality, the complexity grows tenfold. One instance where good video quality is as important as sensitivity to delay is telehealth for patient-doctor communication. WebRTC is the most widely used P2P, web-based streaming technology, with sub-second latency and built-in latency control. However, it is not suited to multicast streaming use cases.

Transport protocol

WebRTC uses UDP as its transport protocol, which allows for faster and more efficient data transfer compared to TCP (Transmission Control Protocol). UDP does not require the same level of error correction and retransmission as TCP, which results in lower latency.

P2P (Peer-to-Peer) Connections

WebRTC allows for P2P connections between clients, which reduces the number of hops that data must travel through and thus reduces latency. WebRTC also use Data channel API which can transmit data p2p swiftly such as state changes or messages.

Choice of Codecs

Most WebRTC providers use the VP9 codec (successor to VP8) for video compression, which is great at providing high quality with reduced data.

Network Adaptation

WebRTC adapts to networks in real time by adjusting bitrate, resolution and frame rate as network conditions change. This auto-adjusting quality also helps WebRTC mitigate losses as congestion builds up (a monitoring sketch follows the list below).

  • (+) open source, with support for other open, standardized technologies such as the VP8, VP9 and AV1 video codecs and the Opus audio codec. Certain H.264 profiles and other telecom codecs are also supported
  • (+) secure media transport with SRTP
  • (-) still evolving
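
As a monitoring hook for the network adaptation described above, a hedged sketch that polls the standard WebRTC statistics API for loss, jitter and round-trip time (field names follow the W3C webrtc-stats spec; availability varies by browser):

// Poll an RTCPeerConnection for basic congestion indicators on the video streams.
async function sampleNetworkStats(pc) {
  const report = await pc.getStats();
  const out = {};
  report.forEach(stat => {
    if (stat.type === 'inbound-rtp' && stat.kind === 'video') {
      out.packetsLost = stat.packetsLost;
      out.jitterSeconds = stat.jitter;
    }
    if (stat.type === 'remote-inbound-rtp' && stat.kind === 'video') {
      out.rttSeconds = stat.roundTripTime;
    }
  });
  return out;
}

// Example: log every 2 seconds and let the application react (e.g. lower resolution).
// setInterval(() => sampleNetworkStats(pc).then(console.log), 2000);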

CMAF (Common Media Application Format)

A format to enable HTTP-based streaming. It is compatible with DASH and HLS players by using a new, uniform transport container file. Apple and Microsoft proposed CMAF to the Moving Picture Experts Group (MPEG) in 2016. CMAF features:

  • (+) Simpler
  • (+) chunked encoder and chunked transfer
  • (-) 3-5 s of latency
  • (+) increases CDN efficiency

CMAF Addressable Media Objects : CMAF header , CMAF segment , CMAF chunk , CMAF track file. Logical components of CMAF

  • CMAF Track: contains encoded media samples, including audio, video, and subtitles. Media samples are stored in a CMAF specified container. Tracks are made up of a CMAF Header and one or more CMAF Fragments.
  • CMAF Switching Set: contains alternative tracks that can be switched and spliced at CMAF Fragment boundaries to adaptively stream the same content at different bit rates and resolutions.
  • Aligned CMAF Switching Set: two or more CMAF Switching Sets encoded from the same source with alternative encoding; for example, different codecs, and time aligned to each other.
  • CMAF Selection Set: a group of switching sets of the same media type that may include alternative content (for example, different languages or camera angles) or alternative encoding (for example, different codecs).

CMAF Presentation: one or more presentation time synchronized selection sets.



Media Architecture, RTP topologies

  1. Point to Point
  2. Point to Point via Middle-box
    1. Translator
    2. Transport/Relay Anchoring
    3. Transport translator
    4. Media translator
    5. Back-To-Back RTP Session
  3. Point to Point using Multicast
    1. Any-Source Multicast (ASM)
    2. Source-Specific Multicast (SSM)
  4. Point to Multipoint using Mesh
  5. Point to Multipoint + Translator
    1. Media Mixing Mixer
    2. Media Switching Mixer
  6. SFU ( Selective Forwarding Unit)
  7. Simulcast
  8. SVC ( Scalable Video Coding)
  9. Hybrid Topologies
    1. Hybrid model of forwarding and mixed streaming
    2. Serverless models
    3. Point to Multipoint Using Video-Switching MCUs
    4. Cascaded SFUs
    5. Multipath RTP
  10. Transport Protocols
  11. Audio PCAP storage and Privacy constraints for Media Servers

With the sudden onset of Covid-19 and the growing trend of working from home, the demand for scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution. This article is about a media server setup that provides mid-to-high-scale conferencing over SIP to various endpoints including SIP softphones, PBXs, carrier/PSTN and WebRTC.

Point to Point

Endpoints communicate over unicast. RTP and RTCP traffic is private between sender and receiver even if the endpoints contain multiple SSRCs in the RTP session.

Advantages of P2P:
  • (+) Facilitates private communication between the parties

Disadvantages of P2P:
  • (-) The only limits on the number of streams between the participants are physical ones, such as bandwidth and the number of available ports

Point to Point via Middle-box

Same as above but with a middle-box involved. Middle Box type are :

Translator

Mostly used for interoperability between otherwise non-interoperable endpoints, e.g. transcoding codecs or converting transports. A translator does not use an SSRC of its own and keeps the SSRC of an RTP stream across the translation.

Subtypes of middlebox:

Transport/Relay Anchoring

Roles like NAT traversal, by pinning the media path to a relay in a public address domain or a TURN server.

Middleboxes for auditing or privacy control of participant’s IP

Other SBC ( Session Border Gateways) like characteristics are also part of this topology setup

Transport translator

They interconnect networks, for example multicast to unicast, and provide media packetization to allow other media, such as non-RTP protocols, to connect to the session.

Media translator

Modifies the media inside RTP streams, commonly known as transcoding. It can do up to full encoding/decoding of RTP streams. In many cases it can also act on behalf of non-RTP-capable endpoints, receiving and responding to feedback reports and performing FEC (forward error correction).

Back-To-Back RTP Session

Much like a translator middlebox, but it establishes separate RTP session legs with the endpoints, bridging the two sessions. It takes complete responsibility for forwarding the correct RTP payload and maintaining the relation between SSRCs and CNAMEs.

Advantages of a Back-to-Back RTP Session:
  • (+) The B2BUA / media bridge takes responsibility for relaying and managing congestion

Disadvantages of a Back-to-Back RTP Session:
  • (-) It can be subject to MITM attacks or provide a backdoor to eavesdrop on conversations

Point to Point using Multicast

Any-Source Multicast (ASM)

Traffic from any participant sent to the multicast group address reaches all other participants.

Source-Specific Multicast (SSM)

A selected sender streams to the multicast group, which delivers the stream to the receivers.

Point to Multipoint using Mesh

Many unicast RTP streams forming a mesh.

Point to Multipoint + Translator

Some more variants of this topology are Point to Multipoint with Mixer

Media Mixing Mixer

Receives RTP streams from several endpoints and selects the stream(s) to be included in a media-domain mix. The selection can be through static configuration or by dynamic, content-dependent means such as voice activation. The mixer then creates a single outgoing RTP stream from this mix.

Media Switching Mixer

RTP mixer based on media switching avoids the media decoding and encoding operations in the mixer, as it conceptually forwards the encoded media stream.

The Mixer can reduce bitrate or switch between sources like active speakers.

SFU ( Selective Forwarding Unit)

The middlebox can select which of the potential sources (SSRCs) transmitting media will be sent to each of the endpoints. Each such transmission is set up as an independent RTP session.

Extensively used in videoconferencing topologies with scalable video coding as well as simulcasting.

Advantages of SFU:
  • (+) Low latency and low jitter-buffer requirements, by avoiding re-encoding
  • (+) Saves encoding/decoding CPU utilization at the server

Disadvantages of SFU:
  • (-) Unable to manage the network and control bitrate
  • (-) Creates a higher load on the receiver compared with an MCU

On a high level, one can safely assume that, given current average internet bandwidth, mesh architectures make sense for 3-6 peers; any number above that requires a centralized media architecture.

Among the centralized media architectures, an SFU makes sense for at most 6-15 people in a conference; if the number of participants exceeds that, it may need to switch to MCU mode.
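
A small sketch of why those participant thresholds fall out of the per-client stream counts; the exact cut-over points depend on bitrate and uplink capacity, and the 1.5 Mb/s per-stream figure below is only an assumption:

// Per-client stream counts for an N-party call under each topology.
function perClientStreams(n) {
  return {
    mesh: { up: n - 1, down: n - 1 }, // every peer sends to and receives from all others
    sfu:  { up: 1,     down: n - 1 }, // one uplink; the server forwards everyone else's stream
    mcu:  { up: 1,     down: 1 },     // the server mixes everything into a single stream
  };
}

// Example: rough uplink need at an assumed 1.5 Mb/s per video stream.
const participants = 8, bpsPerStream = 1.5e6;
const t = perClientStreams(participants);
console.log(`mesh uplink: ${(t.mesh.up * bpsPerStream / 1e6).toFixed(1)} Mb/s, ` +
            `sfu uplink: ${(t.sfu.up * bpsPerStream / 1e6).toFixed(1)} Mb/s`);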

Simulcast

Encode in multiple variations and let the SFU decide which endpoint should receive which stream type.

Advantages of SFU + Simulcast:
  • (+) Simulcast can ensure endpoints receive a media stream matching their requirements/bandwidth/display

Disadvantages of SFU + Simulcast:
  • (-) Uplink bandwidth requirement is high
  • (-) CPU-intensive for the sender, which encodes many variations of the outgoing stream

SVC ( Scalable Video Coding)

Encodes in multiple layers based on various modalities such as

  • Signal-to-noise ratio
  • temporal
  • Spatial

Hybrid Topologies

There are various topologies for multi-endpoint conferences. Hybrid topologies include forwarding video while mixing audio, or auto-switching between configurations as load increases or decreases, or based on a paid premium vs free plan.

Hybrid model of forwarding and mixed streaming

Some endpoints receive forwarded streams while others receive mixed/composited streams.

Serverless models

Centralized topology in which one endpoint serves as an MCU or SFU. Used by Jitsi and Skype

Point to Multipoint Using Video-Switching MCUs

Much like an MCU, but unlike a full MCU it can switch the forwarded stream's bitrate and resolution based on the active speaker, host or presenter, and floor-control-like characteristics.

This setup can embed the characteristics of translator, selector and can even do congestion control based on RTCP

To handle a multipoint conference scenario it acts as a translator, forwarding the selected RTP stream under its own SSRC with the appropriate CSRC values, and modifies the RTCP RRs it forwards between the domains.

Cascaded SFUs

Chained SFUs reduce latency while also enabling scalability; however, they take a toll on the server network as well as endpoint resources.

Multipath RTP

Transport Protocols

RTP (Real-time Transport Protocol) is designed for delivering real-time audio, video, and other time-sensitive data over IP networks. However, RTP itself does not handle transport-layer delivery—it relies on underlying transport protocols. 

TCP is a reliable, connection-oriented protocol that exchanges SYN and ACK segments to establish a connection between communicating parties. It sends packets sequentially, and individual packets can be resent when the receiver detects out-of-order or missing packets. It is thus used for session creation due to its error correction and congestion control features.

Once a session is established, the media shifts to RTP over UDP. UDP, even though less reliable (no guarantee against duplication and no delivery error correction), is used because packets of other protocols can be encapsulated inside UDP datagrams, which suits RTP encapsulation. However, to provide end-to-end security, additional methods for authentication and encryption are used.

SCTP could theoretically be an underlying transport for RTP, but it is not used as such due to its limited support by network middleboxes, OSs and embedded devices. It is therefore only used for browser WebRTC data channels (alongside RTP for media).

QUIC is another modern alternative for RTP streams; it has built-in encryption (TLS 1.3). With multiplexing it also avoids head-of-line blocking.

Protocol | Latency | Reliability | Congestion Control | Multicast | Use Case
UDP | ✅ Very low | ❌ No | ❌ None built in, but GCC, BBR etc. can be layered on top | ✅ Yes | Live streaming, VoIP
TCP | ❌ High | ✅ Yes | ✅ Yes | ❌ No | Video calls over unstable networks
SCTP | ⚠ Medium | ✅ Configurable | ✅ Yes | ❌ No | WebRTC data, telecom
QUIC | ⚠ Medium | ✅ Yes | ✅ Yes | ❌ No | WebRTC, video conferencing

Audio PCAP storage and Privacy constraints for Media Servers

A call session produces various traces for offline monitoring and analysis, which can include:

CDR (Call Detail Records) – to and from numbers, ring time, answer time, duration, etc.

Signaling PCAPs – usually collected from the SIP application server, containing the SIP requests, SDP and responses. They show the call flow sequence, for example who sent the INVITE and who sent the BYE or CANCEL, and how many times the call was updated or paused/resumed.

Media stats – jitter, buffer, RTT and MOS for all legs, plus average values.

Audio PCAPs – recordings of the RTP stream and RTCP packets between the parties; these require explicit consent from the customer or user. VoIP companies complying with GDPR cannot record audio streams of calls and preserve them for purposes such as audits, call-quality debugging or internal inspection.

Throwing more light on audio PCAP storage: assuming the user provides explicit permission, here is the approach for carrying out the recording and storage operations.

Furthermore, strict access control, encryption and anonymisation of the media packets are necessary to obfuscate the details of the call session.

References :

To learn about the difference between Media Server topologies

  • centralized vs decentralised,
  • SFU vs MCU ,
  • multicast vs unicast ,

Read – SIP conferencing and Media Bridges

SIP conferencing and Media Bridges

SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. It is desired that SIP not only transport multimedia streams but also provide conference control for building communication and collaboration apps as new and customisable solutions.

To read more about building a scalable VoIP Server Side architecture and

  • Clustering the Servers with common cache for High availability and prompt failure recovery
  • Multitier architecture, i.e. separation between the data/session layer and the application server/engine layer
  • Microservice-based architecture, i.e. separating proxies like load balancers, SBCs, backend services, OSS/BSS etc.
  • Containerization and autoscaling

Read – VoIP/ OTT / Telecom Solution startup’s strategy for Building a scalable flexible SIP platform

VoIP/ OTT / Telecom Solution startup’s strategy for building a scalable flexible SIP platform

A scalable and flexible platform. Let's go in depth to discuss how one can go about achieving scalability in SIP platforms: multi-geography scaling via a universal router, clustered SIP telephony servers for high availability, multi-tier cluster architecture, role abstraction / microservice-based architecture, distributed event management and event-driven architecture, containerization, autoscaling, security, policies…

Video Codecs – H264 , H265 , AV1

This article discusses the popularly adopted current standards for video codecs (compression/decompression), namely MPEG-2, H.264, H.265 and AV1.


Compression algorithms differ from media containers: codecs compress the information in the raw stream to reduce its size for streaming applications, while media files are containers used simply for playback from a stored location.

Examples of Codecs: H.261, H.263, VC-1, MPEG-1, MPEG-2, MPEG-4, AVS1, AVS2, AVS3, VP8, VP9, AV1, AVC/H.264, HEVC/H.265, VVC/H.266, EVC, LCEVC

Examples of containers include: MPEG-1 System Stream, MPEG-2 Program Stream, MPEG-2 Transport Stream, MP4, MOV, MKV, WebM, AVI, FLV, IVF, MXF, HEIC and so on.

MPEG 2

MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
Generic coding of moving pictures and associated audio information
Combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

MPEG2 is better than MPEG 1

It evolved out of the shortcomings of MPEG-1, such as an audio compression system limited to two channels (stereo), no standardized support for interlaced video (with poor compression when used), and only one standardized profile (Constrained Parameters Bitstream), which was unsuited to higher-resolution video.

Application

  • over-the-air digital television broadcasting and in the DVD-Video standard.
  • TV stations, TV receivers, DVD players, and other equipment
  • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
  • XDCAM – professional file-based video recording format.
  • DVB – Application-specific restrictions on MPEG-2 video in the DVB standard

MPEG -4

Video coding standards :-
MPEG-4 Part 2 Visual (ISO/IEC 14496-2) released in 1999 as MPEG-4 video codec
MPEG-4 Part 10 Advanced Video Coding (ISO/IEC 14496-10) released in 2003 as AVC/H.264 video codec;
MPEG-4 Part 14 (ISO/IEC 14496-14) MP4 file format is a media container. rather than a Codec ( compression algorithm).

H264

Introduced in 2004 as Advanced Video Coding (AVC)/H.264 or MPEG-4 AVC or ITU-T H.264/MPEG-4 Part 10 ‘Advanced Video Coding’ (AVC). It is a widely supported vendor agnostic solution.

MPEG-4 Part 10 AVC/H.264 is better than MPEG2

  • 40-50% bit rate reduction compared to MPEG-2
  • Resolution support 4K (4,096×2,304) and 59.94 fps
  • 21 profiles ; 17 levels

Compression Model

Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find the ones that are redundant within subsequent frames, i.e. unchanged, such as background sections of the video. These areas are replaced with a short piece of information referencing the original pixels (motion prediction), using a mathematical function and the direction of motion.

  • Hybrid spatial-temporal prediction model
  • Flexible partitioning of macroblocks (MB) and sub-MBs for motion estimation
  • Intra prediction (extrapolates already-decoded neighbouring pixels for prediction)
  • Introduced a multi-view extension
  • 9 directional modes for intra prediction
  • Macroblock structure with a maximum size of 16×16
  • Entropy coding is CABAC (context-adaptive binary arithmetic coding) and CAVLC (context-adaptive variable-length coding)

Applications of H264

  • most deployed video compression standard
  • Delivers high definition video images over direct-broadcast satellite-based television services,
  • Digital storage media and Blu-Ray disc formats,
  • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems and DVB
  • Mobile video, media players, video chat

H265

High Efficiency Video Coding (HEVC), also known as H.265 or MPEG-H HEVC, streams high-quality video in congested network environments or bandwidth-constrained mobile networks.

  • (+) 2 times the video compression with the same video quality as H264.
  • (-) higher processing power required

Introduced in Jan 2013 as product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

H265 is better than H264

It overcomes shortages of bandwidth, spectrum and storage, and delivers bandwidth savings of approx. 45% over H.264-encoded content.

  • Resolutions up to 8192×4320, including 8K UHD
  • Supports up to 300 fps
  • 3 approved profiles, draft for additional 5 ; 13 levels

Whereas macroblocks can span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving it the ability to compress information more efficiently.

Multiview encoding – stereoscopic video coding standard for video compression that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It also packs a large amount of inter-view statistical dependencies.

Compression Model

  1. Enhanced hybrid spatial-temporal prediction model
  2. CTUs (coding tree units) supporting a larger block structure (64×64) with more variable sub-partition structures
  3. Motion estimation – intra prediction with more modes, asymmetric partitions in inter prediction; the individual rectangular regions that divide the image are independent
  4. Parallel processing – the decoding process can be split across multiple parallel threads, taking advantage of multi-core processors
  5. Wavefront Parallel Processing (WPP) – a sort of decision tree that grants more productive and effective compression
  6. 33 directional modes for intra prediction – DC intra prediction, planar prediction, advanced motion vector prediction
  7. Entropy coding is CABAC only

Applications of H265

  • cater to growing HD content for multi platform delivery
  • differentiated and premium 4K content

The reduced bitrate enables broadcasters and OTT vendors to bundle more channels/content on existing delivery mediums, and also provides a greater video quality experience at the same bitrate.

Using ffmpeg for H265 encoding

I took an H.264 file (640×480), 30 seconds in duration and 3,908,744 bytes (3.9 MB on disk) in size, and converted it using ffmpeg.

After conversion it was an HEVC (Parameter Sets in Bitstream) MPEG-4 movie of only 621 KB, without any visible loss of clarity.

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4  
                                            
ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers   
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)   
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr   
libavutil      56. 22.100 / 56. 22.100   
libavcodec     58. 35.100 / 58. 35.100   
libavformat    58. 20.100 / 58. 20.100   
libavdevice    58.  5.100 / 58.  5.100   
libavfilter     7. 40.101 /  7. 40.101   
libavresample   4.  0.  0 /  4.  0.  0   
libswscale      5.  3.100 /  5.  3.100   
libswresample   3.  3.100 /  3.  3.100   
libpostproc    55.  3.100 / 55.  3.100 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':   
...

If you get error like

Unknown encoder 'libx265'

then reinstall ffmpeg with h265 support

An HEVC bitstream is an ordered sequence of syntax elements. Each syntax element is placed into a logical packet called a NAL (network abstraction layer) unit. There are 64 different NAL unit types. They can be grouped into 10 classes (a header-parsing sketch follows the list):

  1. VPS – Video parameter set
  2. SPS – Sequence parameter set
  3. PPS – Picture parameter set
  4. Slice (different types)
  5. AUD – Access unit delimiter signals the start of video frame
  6. EOS – End of sequence
  7. EOB – End of bitstream
  8. FD – Filler data for bitrate smoothening
  9. SEI – Supplemental enhancement information such as picture timing, color space information, etc.
  10. Reserved and unspecified
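
As the sketch promised above, parsing the two-byte HEVC NAL unit header is enough to identify these classes; the type is the 6 bits that follow the forbidden-zero bit:

// Parse the 2-byte HEVC NAL unit header.
function parseHevcNalHeader(byte0, byte1) {
  return {
    forbiddenZeroBit:   (byte0 >> 7) & 0x01,  // must be 0
    nalUnitType:        (byte0 >> 1) & 0x3f,  // e.g. 32 = VPS, 33 = SPS, 34 = PPS, 35 = AUD
    nuhLayerId:         ((byte0 & 0x01) << 5) | (byte1 >> 3),
    nuhTemporalIdPlus1: byte1 & 0x07,
  };
}

// Example: bytes 0x40 0x01 identify a VPS NAL unit (type 32).
console.log(parseHevcNalHeader(0x40, 0x01));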

AV1

A real-time, high-quality video codec, a product of the Alliance for Open Media (AOM). It can be contained in Matroska, WebM, ISOBMFF and RTP (WebRTC).

Av1 is better than H265

  • (+) AV1 is royalty-free and avoids the patent complexities around H.265/HEVC

Applications

  • Video transmission over the internet, VoIP, multi-party conferencing
  • Virtual / augmented reality
  • Streaming for self-driving cars
  • Intended for use in HTML5 web video and WebRTC together with the Opus audio format

SVC (Scalable Video Coding)

SVC standardises the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. A subset bitstream can represent a lower spatial resolution (smaller screen), lower temporal resolution (lower frame rate), or lower-quality video signal (dropped packets) compared to the base stream it is derived from.

  • Temporal (frame rate) scalability is enabled by structuring motion compensation dependencies so that complete pictures (i.e. their associated packets) can be dropped from the bitstream.
  • Spatial (picture size) scalability is enabled with video coded at multiple spatial resolutions
  • SNR/Quality/Fidelity scalability: video is coded at a single spatial resolution but at different qualities. The data and decoded samples of lower qualities can be used to predict data or samples of higher qualities in order to reduce the bit rate to code the higher qualities.
  • Combined scalability: a combination of the 3 scalability modalities described above.

Not all codecs support all modes. While AV1 and VP9 support the majority of the defined modes, VP8 only supports temporal scalability (e.g. “L1T2”, “L1T3”); H.264/SVC supports both temporal and spatial scalability but only permits transport of simulcast on distinct SSRCs.

VP8

// VP8 capability entry listing the scalability modes it supports (temporal layers only):
{
  "clockRate": 90000,
  "mimeType": "video/VP8",
  "scalabilityModes": [
    "L1T1",
    "L1T2",
    "L1T3"
  ]
},

// Sending three simulcast encodings, each with 3 temporal layers (L1T3).
// `constraints`, `selfView` and `pc` (an RTCPeerConnection) are assumed to be defined elsewhere.
const stream = await navigator.mediaDevices.getUserMedia(constraints);
selfView.srcObject = stream;
pc.addTransceiver(stream.getAudioTracks()[0], {direction: 'sendonly'});
pc.addTransceiver(stream.getVideoTracks()[0], {
  direction: 'sendonly',
  sendEncodings: [
    {rid: 'q', scaleResolutionDownBy: 4.0, scalabilityMode: 'L1T3'},
    {rid: 'h', scaleResolutionDownBy: 2.0, scalabilityMode: 'L1T3'},
    {rid: 'f', scalabilityMode: 'L1T3'},
  ]
});



Audio and Acoustic Signal Processing

Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. Audio signal processing focuses on computational methods for intentionally altering auditory signals or sounds in order to achieve a particular goal.

Applications of audio signal processing in general:

  • storage
  • data compression
  • music information retrieval
  • speech processing ( emotion recognition/sentiment analysis , NLP)
  • localization
  • acoustic detection
  • Transmission / Broadcasting – enhance their fidelity or optimize for bandwidth or latency.
  • noise cancellation
  • acoustic fingerprinting
  • sound recognition ( speaker Identification , biometric speech verification , voice commands )
  • synthesis – electronic generation of audio signals. Speech synthesisers can generate human like speech.
  • enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.)

Effects for audio streams processing

  • delay or echo
    To simulate a reverberation effect, one or several delayed signals are added to the original signal. To be perceived as an echo, the delay has to be of the order of 35 milliseconds or above (see the sketch after this list).
    Implemented using tape delays or bucket-brigade devices.
  • flanger
    delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms).
    signal would fall out-of-phase with its partner, producing a phasing comb filter effect and then speed up until it was back in phase with the master
  • phaser
    signal is split, a portion is filtered with a variable all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
  • chorus
    delayed version of the signal is added to the original signal. above 5 ms to be audible. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
  • equalization
    frequency response is adjusted using audio filter(s) to produce desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters.
    overdrive effects such as the use of a fuzz box can be used to produce distorted sounds, such as for imitating robotic voices or to simulate distorted radiotelephone traffic
  • pitch shift
    shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
  • time stretching
    changing the speed of an audio signal without affecting its pitch.
  • resonators
    emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
  • modulation
    change the frequency or amplitude of a carrier signal in relation to a predefined signal.
  • compression
    reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
  • 3D audio effects
    place sounds outside the stereo basis
  • reverse echo
    swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
  • wave field synthesis
    spatial audio rendering technique for the creation of virtual acoustic environments
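
As a worked example for the delay/echo item at the top of this list, a minimal sketch that mixes a delayed, attenuated copy back into a mono PCM buffer (a Float32Array); the 250 ms delay and 0.5 gain defaults are only illustrative:

// Simple echo: out[i] = in[i] + gain * in[i - delaySamples].
function addEcho(samples, sampleRate, delayMs = 250, gain = 0.5) {
  const delaySamples = Math.round(sampleRate * delayMs / 1000);
  const out = Float32Array.from(samples);
  for (let i = delaySamples; i < samples.length; i++) {
    out[i] += gain * samples[i - delaySamples];
  }
  return out;
}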

ASP applications in telephony and mobile phones, per the ITU (International Telecommunication Union)

  • Acoustic echo control
    aims to eliminate the acoustic feedback, which is particularly problematic in the speakerphone use-case during bidirectional voice
  • Noise control
    microphone doesn’t only pick up the desired speech signal, but often also unwanted background noise. Noise control tries to minimize those unwanted signals . Multi-microphone AASP, has enabled the suppression of directional interferers.
  • Gain control
    how loud a speech signal should be when leaving a telephony transmitter as well as when it is being played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively during operation in real-time.
  • Linear filtering
    ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
  • Speech coding: from analog POTS based call to G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder is a big leap in terms of call capacity. other speech coders with varying tradeoffs between compression ratio, speech quality, and computational complexity have been also made available. AASP provides higher quality wideband speech (approximately 150 Hz to 7 kHz).

ASP applications in music playback

AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs, such as listening to music, watching videos, and gaming

  • Post-processing
    techniques as equalization and filtering allow the user to adjust the timbre of the audio such as bass boost and parametric equalization. Other techniques like adding reverberation, pitch shift, time stretching etc
  • Audio (de)coding: audio codecs like MP3 and AAC define how music is distributed, stored and consumed, including in online music streaming services

ASP for virtual assistants

Virtual assistants include a variety of services such as Apple's Siri, Microsoft's Cortana, Google Now, Amazon's Alexa, etc. ASP is used in:

  • Speech enhancement
    multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
  • Speech recognition (speech-to-text): this draws ideas from multiple disciplinary fields including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major contribution to recognition accuracy improvement in speech recognition by AASP.
  • Speech synthesis (text-to-speech): this technology has come a very long way from its very robotic sounding introduction in the 1930s to making synthesized speech sound more and more natural.

Other areas of ASP

  • Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).

Ref :
wikipedia – https://en.wikipedia.org/wiki/Audio_signal_processing
IEEE – https://signalprocessingsociety.org/publications-resources/blog/audio-and-acoustic-signal-processing%E2%80%99s-major-impact-smartphones

RTCP Reports and QoE metric calculation

RTCP works alongside RTP to monitor and control media streams, providing QoS feedback, synchronization and session management. This write-up describes the key formats and functions of the protocol.

  1. RTCP (Real-Time Transport Control Protocol )
    1. RTCP Control and Management
    2. Gathers statistics on media connection
    3. SR: Sender Report RTCP Packet
    4. RR: Receiver Report RTCP Packet
    5. SDES: Source Description RTCP Packet
    6. BYE: Goodbye RTCP Packet
    7. APP: Application-Defined RTCP Packet
  2. RTCP XR (Extended Reports) 
  3. Extended RTP Profile for RTCP Based Feedback (RTP/AVPF)
  4. RTCP operation modes
  5.  RTCP for multicast sessions with unicast feedback
  6. RTCP Extensions for Multiplexed Media Streams

    RTCP (Real-Time Transport Control Protocol )

    Real-time Transport Control Protocol (RTCP), defined in RFC 3550, is used to send control packets and QoS feedback to participants in a call, alongside RTP, which carries the actual media packets. RTCP provides monitoring of the data delivery and QoS in a manner scalable to large multicast networks, and provides minimal control and identification functionality.

    RTCP is typically on port RTP+1, e.g., RTP=5004 → RTCP=5005.
    Also RTCP uses 5% of total session bandwidth (RTP uses 95%). One can also adjust RTCP report intervals to avoid congestion.

    RTCP Control and Management

    RTCP provides feedback on the quality of the data distribution, congestion control, fault diagnosis and control of adaptive encoding. It is a periodic transmission of control packets. Since the control traffic is not self-limiting, the RRs (Receiver Reports) from participants should be rate-adjusted to limit the traffic toward the sender; the number of participants is therefore observed to estimate the RTCP transmission interval when scaling up. This allows the communication system to monitor multimedia delivered on large multicast networks with hundreds of receivers. The underlying protocol must provide multiplexing of the data and control packets, which convey minimal session control information such as bytes sent, packets sent, lost packets, jitter, feedback and round-trip delay.

    Gathers statistics on media connection

    Some metrics gathered by RTCP reports are :

    • timestamps
      • tp: time of the last RTCP transmission
      • tc: the current time
      • tn: the next scheduled RTCP transmission time
      • pmembers: the estimated number of session members at the time tn was last recomputed
      • members: the current estimate of the number of session members
      • senders: the current estimate of the number of senders in the session
      • rtcp_bw: the target RTCP bandwidth
      • avg_rtcp_size: the average size of RTCP packets sent and received
    • flags
      • initial: true if no RTCP packet has been sent yet
      • we_sent: true if the participant is a sender, i.e. has sent an RTP packet recently
    • constants
      • n: if we are a sender, n = senders; otherwise n = the number of receivers = members − senders
      • C: if we are a sender, C = avg_rtcp_size / (25% of rtcp_bw); otherwise C = avg_rtcp_size / (75% of rtcp_bw)

    The time interval calculation should be randomized, and should give at least 25% of the RTCP bandwidth to senders; if senders make up more than 25% of members, the bandwidth is split equally among all members. This is done so that transmissions are uniformly distributed, avoiding unintended synchronization or bursts of RTCP packets toward the sender (a worked sketch follows the steps below).

    • step 1: Tmin = 2.5 seconds if we have not yet sent an RTCP packet, else Tmin = 5 seconds.
    • step 2: Td = max(Tmin, n*C)
    • step 3: T = a random value between 0.5 and 1.5 times Td.
    • step 4: the resulting T is divided by e − 3/2 = 1.21828 to compensate for the fact that the timer reconsideration algorithm converges to a value of the RTCP bandwidth below the intended average.
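
    As a worked sketch of the steps above (in the spirit of RFC 3550), with the bandwidth split, the randomization and the 1.21828 compensation applied:

    function rtcpInterval({ members, senders, rtcpBw, weSent, avgRtcpSize, initial }) {
      // Give senders at least 25% of the RTCP bandwidth when they are a minority.
      let n = members;
      let bw = rtcpBw;
      if (senders <= 0.25 * members) {
        if (weSent) { bw *= 0.25; n = senders; }
        else        { bw *= 0.75; n = members - senders; }
      }
      const c = avgRtcpSize / bw;            // per-member cost in seconds
      const tmin = initial ? 2.5 : 5.0;      // 2.5 s before the first RTCP packet
      const td = Math.max(tmin, n * c);      // deterministic calculated interval
      const t = td * (0.5 + Math.random());  // randomize into [0.5*Td, 1.5*Td]
      return t / 1.21828;                    // compensate for timer reconsideration (e - 3/2)
    }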

    Applications may use this information to increase the quality of service, perhaps by limiting flow or using a different codec.

    RTCP often uses the next consecutive (odd-numbered) port after RTP (which uses an even-numbered port). The example screenshot shows port 20720 for RTP

    And next consecutive port 20721 for RTCP

    When RTCP is not being used or the CNAME identifier corresponding to a synchronization source has not been received yet, the participant associated with a synchronization source is not known.

    • (+) RTCP helps in monitoring the quality of service for every session
    • (+) RTCP sender and receiver reports allow the implementation of adaptive streaming, where senders scale their bandwidth consumption based on network load.
    • (+) RTCP SDES contains additional information such as CNAME, which helps in tracing troublesome multimedia sources (via email, phone number, etc.)

    Types of RTCP packet

    1. SR: Sender report, for transmission and reception statistics from participants that are active senders
    2. RR: Receiver report, for reception statistics from participants that are not active senders and in combination with SR for active senders reporting on more than 31 sources
    3. SDES: Source description items, including CNAME,email or phone
    4. BYE: Indicates end of participation
    5. APP: Application-specific functions

An SR is issued if a site has sent any data packets during the interval since issuing the last report; otherwise an RR is issued.

    SR: Sender Report RTCP Packet

    Sender Report RTCP Packet.

The sender information section of the Sender Report RTCP packet is 20 octets long and is present in every sender report packet. It summarizes the data transmissions from this sender.

    • NTP timestamp 64 bit
    • RTP Timestamp 32 bit
    • sender’s packet count: 32 bits, total number of RTP data packets transmitted by the sender since starting transmission up until the time this SR packet was generated.
    • sender’s octet count: 32 bits
    SR Report in RTCP

    Explanation for some attributes

    • highest sequence number received: 32 bits
    • fraction lost: 8 bits, fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent
    • cumulative number of packets lost: 24 bits size, total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception.
    • interarrival jitter: 32 bits, estimate of the statistical variance of the RTP data packet interarrival time, measured in timestamp units. The jitter J is the mean deviation (smoothed absolute value) of the difference D in packet spacing at the receiver compared to the sender for a pair of packets (a minimal estimator sketch follows the figure below).
    RTCP SR ( Senders Report)
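
A minimal sketch of the RFC 3550 jitter estimator, D(i,j) = (Rj - Ri) - (Sj - Si) smoothed as J += (|D| - J)/16, assuming arrival times are already expressed in RTP timestamp units (the class and method names are illustrative, not from any particular stack):

public class JitterEstimator {
    private long prevArrival;   // arrival time of previous packet, in RTP timestamp units
    private long prevRtpTs;     // RTP timestamp of previous packet
    private double jitter;
    private boolean first = true;

    public void onPacket(long arrivalTs, long rtpTs) {
        if (!first) {
            long d = (arrivalTs - prevArrival) - (rtpTs - prevRtpTs);
            jitter += (Math.abs(d) - jitter) / 16.0;   // smoothed absolute deviation
        }
        first = false;
        prevArrival = arrivalTs;
        prevRtpTs = rtpTs;
    }

    public long reportedJitter() { return (long) jitter; } // value carried in SR/RR blocks
}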

Synchronization and exposing delays using RTCP: for multimedia conferences, the NTP timestamp from the RTCP SR is used to give a common time reference that associates these independent timestamps with a shared wall-clock time. The NTP timestamps also help the endpoints measure their delays.

    RR: Receiver Report RTCP Packet

    Snapshot

    SDES: Source Description RTCP Packet

SDES item types (abbreviation, value, description):

    • END (0) : end of SDES list
    • CNAME (1) : canonical name
    • NAME (2) : user name
    • EMAIL (3) : user’s electronic mail address
    • PHONE (4) : user’s phone number
    • LOC (5) : geographic user location
    • TOOL (6) : name of application or tool
    • NOTE (7) : notice about the source
    • PRIV (8) : private extensions

    BYE: Goodbye RTCP Packet

    APP: Application-Defined RTCP Packet

    Intended for experimental use

    Instance of RTCP sender and receiver reports on transmission and reception statistics

    Real-time Transport Control Protocol (Receiver Report)
        [Stream setup by SDP (frame 4)]
            [Setup frame: 4]
            [Setup Method: SDP]
        10.. .... = Version: RFC 1889 Version (2)
        ..0. .... = Padding: False
        ...0 0001 = Reception report count: 1
        Packet type: Receiver Report (201)
        Length: 7 (32 bytes)
        Sender SSRC: 0x796dd0d6 (2037240022)
        Source 1
            Identifier: 0x00000000 (0)
            SSRC contents
                Fraction lost: 0 / 256
                Cumulative number of packets lost: 1
            Extended highest sequence number received: 6534
                Sequence number cycles count: 0
                Highest sequence number received: 6534
            Interarrival jitter: 0
            Last SR timestamp: 0 (0x00000000)
            Delay since last SR timestamp: 0 (0 milliseconds)
    Real-time Transport Control Protocol (Source description)
        [Stream setup by SDP (frame 4)]
            [Setup frame: 4]
            [Setup Method: SDP]
        10.. .... = Version: RFC 1889 Version (2)
        ..0. .... = Padding: False
        ...0 0001 = Source count: 1
        Packet type: Source description (202)
        Length: 6 (28 bytes)
        Chunk 1, SSRC/CSRC 0x796DD0D6
            Identifier: 0x796dd0d6 (2037240022)
            SDES items
                Type: CNAME (user and domain) (1)
                Length: 8
                Text: 796dd0d6
                Type: NOTE (note about source) (7)
                Length: 5
                Text: telecomorg
                Type: END (0)

    Negative Acknowledgment (NACK) packets can be used to explicitly indicate that packets have not been received.

    Full Intra Request (FIR) and Picture Loss Indication (PLI) packets are used for video to indicate that there is a need for the sender to produce a refresh point( key frame) in the stream.

    Receiver-Estimated Maximum Bitrate (REMB) feedback packets signal to a sender the maximum bitrate a receiver wishes to receive.

    Transport-wide Congestion Control (TCC) feedback packets are used to provide detailed packet-by-packet reception information from a receiver to the sender.

    RTCP XR (Extended Reports) 

    The purpose of the extended reporting format is to convey information that supplements the six statistics that are contained in the report blocks used by RTCP’s Sender Report (SR) and Receiver Report (RR) packets.

        0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | BT | type-specific | block length |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : type-specific block contents :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Categories

    1. packet-by-packet reports on received or lost RTP packets

    Loss RLE Report Block (1)

        0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | BT=1 | rsvd. | T | block length |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | SSRC of source |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | begin_seq | end_seq |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | chunk 1 | chunk 2 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ... :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | chunk n-1 | chunk n |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Duplicate RLE Report Block ( 2)

    Packet Receipt Times Report Block (3)

        0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | BT=3 | rsvd. | T | block length |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | SSRC of source |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | begin_seq | end_seq |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Receipt time of packet begin_seq |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Receipt time of packet (begin_seq + 1) mod 65536 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ... :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Receipt time of packet (end_seq - 1) mod 65536 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    2. reference time information between RTP participants

    Receiver Reference Time Report Block : Receiver-end wallclock timestamps(4)
    DLRR Report Block : delay since the last Receiver Reference Time Report Block was received (5)

    3. metrics relating to packet receipts, that are summary in nature

    Statistics summary block (6)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | BT=6 |L|D|J|ToH|rsvd.| block length = 9 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | SSRC of source |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | begin_seq | end_seq |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | lost_packets |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | dup_packets |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | min_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | max_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | mean_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | dev_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | min_ttl_or_hl | max_ttl_or_hl |mean_ttl_or_hl | dev_ttl_or_hl |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    VOIP metric report block (7)

          0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | BT=7 | reserved | block length = 8 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | SSRC of source |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | loss rate | discard rate | burst density | gap density |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | burst duration | gap duration |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | round trip delay | end system delay |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | signal level | noise level | RERL | Gmin |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | R factor | ext. R factor | MOS-LQ | MOS-CQ |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | RX config | reserved | JB nominal |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | JB maximum | JB abs max |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Extended RTP Profile for RTCP Based Feedback (RTP/AVPF)

RTCP provides continuous feedback about the overall reception quality from all receivers, thereby allowing the sender(s) in the mid-term to adapt their coding scheme and transmission behavior to the observed network quality of service (QoS).

Plain RTCP, however, makes no provision for timely feedback that would allow a sender to repair the media stream immediately through retransmissions, retroactive Forward Error Correction (FEC) control, or media-specific mechanisms for some video codecs, such as reference picture selection; the AVPF profile addresses this.

      Components of RTCP based feedback

      • Status reports contained in sender report (SR)/receiver report (RR) packets transmitted at regular intervals; these may also contain SDES items
      • FB (feedback) messages, indicating loss or reception of particular pieces of a media stream

      Types of RTCP Feedback packet

      Minimal compound RTCP feedback packet

This minimizes the size of the RTCP packet transmitted to convey feedback and maximizes the frequency at which feedback can be provided. It MUST contain only the mandatory information:

      • encryption prefix if necessary,
      • exactly one RR or SR,
      • exactly one SDES with only the CNAME item present, and
      • FB message(s)

      Full compound RTCP feedback packet

MAY contain any number of additional RTCP packets.

      RTCP operation modes

      1. Immediate Feedback mode
      2. Early RTCP mode
      3. Regular RTCP Mode

      The Application specific feedback threshold is a function of a number of parameters including (but not necessarily limited to):

      • type of feedback used (e.g., ACK vs. NACK),
      • bandwidth,
      • packet rate,
      • packet loss probability and distribution,
      • media type,
      • codec,
      • (worst case or observed) frequency of events to report (e.g., frame received, packet lost).

      Payload specific Feedback messages

      Three payload-specific FB messages are defined so far plus an application layer FB message. They are identified by means of the FMT parameter as follows:

      • 0: unassigned
      • 1: Picture Loss Indication (PLI)
      • 2: Slice Loss Indication (SLI)
      • 3: Reference Picture Selection Indication (RPSI)
      • 4-14: unassigned
      • 15: Application layer FB (AFB) message
      • 16-30: unassigned
      • 31: reserved for future expansion of the sequence number space

       RTCP for multicast sessions with unicast feedback

In single-source multicast sessions (e.g., IPTV, live streaming), receivers provide feedback via unicast RTCP. This feedback can quickly overwhelm the sender, as it receives as many reports as there are viewers, which can be very large in big deployments such as webinars.

To mitigate this feedback implosion, the sender aggregates reports in a Multicast Acquisition Report (MAR) instead of processing individual multicast feedback.

      a=rtcp-fb:* x-mar unicast   // Supports MAR over unicast
      a=rtcp-fb:* nack unicast // NACKs sent via unicast

      RTCP Extensions for Multiplexed Media Streams

For multiplexed media streams, where different kinds of media share a common port, the payload type and SSRC are used to distinguish the streams. RTCP can be multiplexed onto the same port too; the offer/answer negotiation carries the following attribute in SDP:

      a=rtcp-mux
      Number of ports : 2 when RTP and RTCP use different ports, 1 with RTP/RTCP mux.
      NAT traversal : complex, with per-stream overhead for ICE candidate gathering when on different ports; relatively simple with mux, only one pinhole.

      References :

      • RFC 3611 RTP Control Protocol Extended Reports (RTCP XR)
      • RFC 4585 Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)
      • RFC 7002 RTP Control Protocol (RTCP) Extended Report (XR) Block for Discard Count Metric Reporting
      • RFC 7003 RTP Control Protocol (RTCP) Extended Report (XR) Block for Burst/Gap Discard Metric Reporting

      RealTime Transport protocol (RTP) and supporting protocols


RTP is a protocol for delivering media streams end-to-end in real time over an IP network. Its applications include VoIP with SIP/XMPP, push-to-talk, WebRTC and teleconferencing, IoT media streaming, and audio/video or simulation data delivered over multicast or unicast network services.

      RTSP provides stream control features to an RTP stream along with session management.

RTCP is the companion protocol to RTP, used for feedback and inter-stream (audio/video) synchronization.

      • Receiver Reports (RRs) include information about the packet loss, interarrival jitter, and a timestamp allowing computation of the round-trip time between the sender and receiver.
      • Sender Reports( SR) include the number of packets and bytes sent, and a pair of timestamps facilitating inter-stream synchronization.

      SRTP provides security by end-to-end encryption while SDP provides session negotiation capabilities.

In this article I will be going over RTP and its associated protocols in depth to show the inner workings of an RTP media streaming session.

      RTP (Real-time Transport Protocol)

RTP handles real-time multimedia transport end to end between network components and is defined in RFC 3550. RTP has an extensible header format and simplifies application integration (encryption, padding) and the use of proxies, mixers, translators, etc.

Packet structure of RTP
The RTP header contains the timestamp, media source identifier (SSRC), payload (codec) type and sequence number.

RTP is independent of the underlying transport and network layers and can be described as an application layer protocol typically running over IP networks. While RTP was originally adapted from vat (now obsolete), it was designed to be protocol independent, i.e., it can be used over non-IP transports like ATM AAL5 as well as over IPv4 and IPv6. It does not address resource reservation and does not guarantee quality of service for real-time services. However, it does provide services like payload type identification, sequence numbering, timestamping and delivery monitoring.

      RTP Packet via Wireshark
      RTP Packet Headers and position in packet

The sequence numbers included in RTP allow the receiver to reconstruct the sender’s packet sequence. Usage: multimedia multi-participant conferences, storage of continuous data, interactive distributed simulation, active badge, control and measurement applications.

      RTP Session

      Real-Time Transport Protocol
          [Stream setup by SDP (frame 554)]
              [Setup frame: 554]
              [Setup Method: SDP]
          10.. .... = Version: RFC 1889 Version (2)
          ..0. .... = Padding: False
          ...0 .... = Extension: False
          .... 0000 = Contributing source identifiers count: 0
          0... .... = Marker: False
          Payload type: ITU-T G.711 PCMU (0)
          Sequence number: 39644
          [Extended sequence number: 39644]
          Timestamp: 2256601824
          Synchronization Source identifier: 0x78006c62 (2013293666)
          Payload: 7efefefe7efefe7e7efefe7e7efefe7e7efefe7e7efefe7e...

      Ordering via Timestamp (TS) and Sequence Number (SN)

      • TS (timestamp) is used to place packets in correct timing order
      • SN (sequence number) is used to detect packet loss

      For a video frame that spans multiple packets, the TS is the same but the SN differs (see the sketch below).
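
An illustrative sketch (the class and names are hypothetical) of how a receiver can use the sequence number to detect loss and the timestamp to group packets belonging to the same frame:

public class RtpReceiverState {
    private int expectedSeq = -1;
    private long currentFrameTs = -1;

    public void onPacket(int seq, long ts) {
        if (expectedSeq >= 0 && seq != expectedSeq) {
            int lost = (seq - expectedSeq) & 0xFFFF;     // modulo-65536 gap
            System.out.println(lost + " packet(s) lost before seq " + seq);
        }
        expectedSeq = (seq + 1) & 0xFFFF;
        if (ts != currentFrameTs) {
            currentFrameTs = ts;                         // new frame starts; same TS spans one frame
        }
    }
}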

      Payload

      RTP payload type is a 7-bit numeric identifier that identifies a payload format. 

      Audio

      • 0 PCMU
      • 1 reserved (previously FS-1016 CELP)
      • 2 reserved (previously G721 or G726-32)
      • 3 GSM
      • 4 G723
      • 8 PCMA
      • 9 G722
      • 12 QCELP
      • 13 CN
      • 14 MPA
      • 15 G728
      • 18 G729
      • 19 reserved (previously CN)

      Video

      • 25 CELB
      • 26 JPEG
      • 28 nv
      • 31 H261
      • 32 MPV
      • 33 MP2T
      • 34 H263
      • 72-76 reserved
      • 77–95 unassigned
      • dynamic H263-1998, H263-2000
      • dynamic (or via profile): H264 AVC, H264 SVC, H265, theora, iLBC, PCMA-WB (G711 A-law), PCMU-WB (G711 u-law), G718, G719, G7221, vorbis, opus, speex, VP8, VP9, raw, ac3, eac3

Note: the difference between PCMA (G.711 A-law) and PCMU (G.711 μ-law) is that μ-law tends to give more resolution to higher-range signals while A-law provides more quantization levels at lower signal levels.

      Dynamic Payloads

Dynamic payload types in the RTP A/V profile, unlike the static ones above, are not assigned by IANA. They are assigned by means outside of the RTP profile or protocol specifications.

      Tones

      • dynamic tone
      • telephone event ( DTMF)

These codes were initially specified in RFC 1890, “RTP Profile for Audio and Video Conferences with Minimal Control” (AVP profile), superseded by RFC 3551, and are registered as MIME types in RFC 3555. Registering static payload types is now considered a deprecated practice in favor of dynamic payload type negotiation.

      Session identifiers

SSRC was designed for distinguishing several sources by labelling them differently. In an RTP session, each participant maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP.

          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |V=2|P|X|  CC   |M|     PT      |       sequence number         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                           timestamp                           |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |           synchronization source (SSRC) identifier            |
         +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
         |            contributing source (CSRC) identifiers             |
         |                             ....                              |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      

Synchronization source (SSRC) is a 32-bit numeric SSRC identifier for the source of a stream of RTP packets. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier. All packets from a synchronisation source form part of the same timing and sequence number space, so a receiver groups packets by synchronisation source for playback. The binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

      Contributing source (CSRC) – A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the SSRC identifiers of the sources, called CSRC list, that contributed to the generation of a particular packet into the RTP header of that packet. An example application is – audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).
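
A minimal sketch of parsing the fixed RTP header shown above from a received datagram, following the RFC 3550 field layout (illustrative only; extension headers and padding are not handled):

import java.nio.ByteBuffer;

public class RtpHeader {
    public static void parse(byte[] pkt) {
        ByteBuffer b = ByteBuffer.wrap(pkt);
        int b0 = b.get() & 0xFF;
        int version = b0 >> 6;                 // V, must be 2
        boolean padding = (b0 & 0x20) != 0;    // P
        boolean extension = (b0 & 0x10) != 0;  // X
        int csrcCount = b0 & 0x0F;             // CC
        int b1 = b.get() & 0xFF;
        boolean marker = (b1 & 0x80) != 0;     // M
        int payloadType = b1 & 0x7F;           // PT
        int seq = b.getShort() & 0xFFFF;       // sequence number
        long timestamp = b.getInt() & 0xFFFFFFFFL;
        long ssrc = b.getInt() & 0xFFFFFFFFL;
        long[] csrc = new long[csrcCount];     // CSRC list inserted by mixers
        for (int i = 0; i < csrcCount; i++) csrc[i] = b.getInt() & 0xFFFFFFFFL;
        System.out.printf("v=%d pt=%d seq=%d ts=%d ssrc=0x%08x%n",
                version, payloadType, seq, timestamp, ssrc);
    }
}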

      Timestamp calculation

The timestamp is picked independently from other streams in the session. It is incremented by the packetization interval times the sampling rate. For example:

      • audio at 8000 Hz sampled in 20 ms blocks has timestamps t1:160, t2:t1+160, ... (actual sampling may differ slightly from this nominal rate)
      • video with a 90 kHz clock rate at 30 fps has timestamps t1:3000, t2:t1+3000, ...; at 25 fps t1:3600, t2:t1+3600. In some software encoders, timestamps can also be computed from the system clock, such as gettimeofday()
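
A small arithmetic sketch of these increments (an illustrative helper, not part of any particular stack):

public class TimestampIncrement {
    static long audioIncrement(int clockRateHz, int ptimeMs) {
        return (long) clockRateHz * ptimeMs / 1000;    // 8000 Hz, 20 ms -> 160
    }
    static long videoIncrement(int clockRateHz, double fps) {
        return Math.round(clockRateHz / fps);          // 90 kHz, 30 fps -> 3000
    }
    public static void main(String[] args) {
        System.out.println(audioIncrement(8000, 20));  // 160
        System.out.println(videoIncrement(90000, 30)); // 3000
        System.out.println(videoIncrement(90000, 25)); // 3600
    }
}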

Cross-media synchronization using timestamps: an RTP timestamp and an NTP timestamp form a pair that identifies the absolute time of a particular sample in the stream.

      NTP timestamp in RTCP Sender Report SR
      Numerical timestamp in RTP packet in the same session
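
A hedged sketch of using the (NTP, RTP) timestamp pair from the latest SR to map an RTP timestamp onto wall-clock time for cross-media synchronization (names are illustrative; 32-bit timestamp wrap handling is omitted):

public class MediaClockMapper {
    private final double srNtpSeconds;   // NTP time from the SR, expressed in seconds
    private final long srRtpTs;          // RTP timestamp from the same SR
    private final int clockRate;         // e.g. 8000 for PCMU, 90000 for video

    public MediaClockMapper(double srNtpSeconds, long srRtpTs, int clockRate) {
        this.srNtpSeconds = srNtpSeconds;
        this.srRtpTs = srRtpTs;
        this.clockRate = clockRate;
    }

    /** Wall-clock time (seconds) at which the sample with this RTP timestamp was taken. */
    public double toWallclock(long rtpTs) {
        long delta = rtpTs - srRtpTs;
        return srNtpSeconds + (double) delta / clockRate;
    }
}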

      UDP provides best-effort delivery of datagrams for point-to-point as well as for multicast communications.

      Threading and Queues by RTP stacks

Reception and transmission queues are handled by the RTP stack.

Packet reception – the application does not read packets directly from sockets but gets them from a reception queue. The RTP stack is responsible for updating this queue.

Packet transmission – packets are not written directly to sockets but are inserted into a transmission queue handled by the stack.

The incoming packet queue takes care of functions such as packet reordering and filtering out duplicate packets.

Threading model – most libraries use a separate execution thread per RTP session to handle the queues.

      RTSP (Real-Time Streaming Protocol)

RTSP is a streaming session protocol that controls RTP streams. It is a network control protocol which uses TCP to maintain an end-to-end connection. Session protocols are negotiation/session-establishment protocols that assist multimedia applications.

      Applications : control real-time streaming media applications such as live audio and HD video streaming.
      RTSP establishes a media session between RTSP end-points ( can be 2 RTSP media servers too) and initiates RTP streams to deliver the audio and video payload from the RTSP media servers to the clients.

      Flow for RTSP stream between client and Server

1. Initialize the RTP stack on the server and client – can be done by calling the object constructor and initializing the object with arguments

      At Server

      Server rtspserver = new Server();

      At client

Client rtspclient = new Client();

      2. Initiate TCP connection with the client and server respectively (via socket ) for the RTSP session

      At Server

      ServerSocket listenSocket = new ServerSocket(RTSPport);
      rtspserver.RTSPsocket = listenSocket.accept();
      rtspserver.ClientIPAddr = rtspserver.RTSPsocket.getInetAddress();

      At Client

      rtspclient.RTSPsocket = new Socket(ServerIPAddr, RTSP_server_port);

      3. Set input and output stream filters

RTSPBufferedReader = new BufferedReader(new InputStreamReader(rtspserver.RTSPsocket.getInputStream()));
RTSPBufferedWriter = new BufferedWriter(new OutputStreamWriter(rtspserver.RTSPsocket.getOutputStream()));

      4. Parse and Reply to RTSP commands

      ReadLine from RTSPBufferedReader and parse tokens to get the RTSP request type

      request = rtspserver.parse_RTSP_request();

      On receiving each request send the appropriate response using RTSPBufferedWriter

      rtspserver.send_RTSP_response();

The request can be any of DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN.

5. TEARDOWN RTSP command

Either call the destructor, which releases the resources and ends the session, or send BYE explicitly and close the sockets

      rtspserver.RTSPsocket.close();
      rtspserver.RTPsocket.close();

      RTP processing

1. At the transmitter (server) – packetization of the video data into RTP packets.

This involves creating the packet, setting the fields in the packet header, and copying the payload (i.e., one video frame) into the packet.

      Get next frame to send from the video and build the RTP packet

      RTPpacket rtp_packet = new RTPpacket(MJPEG_TYPE, imagenb, imagenb * FRAME_PERIOD, buf, video.getnextframe(buf));

RTP header formation from the above accepts the parameters PType, SequenceNumber, TimeStamp, the buffer byte[] data and the data_length of the next frame in the buffer; these go into the packet.

2. At the transmitter – retrieve the packet bitstream, store it in an array of bytes and send it as a datagram packet over the UDP socket

      senddp = new DatagramPacket(packet_bits, packet_length, ClientIPAddr, RTP_dest_port);
      RTPsocket.send(senddp);

3. At the receiver – construct a new DatagramSocket to receive RTP packets on the client’s RTP port

      rcvdp = new DatagramPacket(buf, buf.length);
      RTPsocket.receive(rcvdp);

4. At the receiver – RTP packet header and payload retrieval

      RTPpacket rtp_packet = new RTPpacket(rcvdp.getData(), rcvdp.getLength());
      rtp_packet.getsequencenumber(); 
      rtp_packet.getpayload(payload); // payload is bitstreams

5. Decode the payload as an image/video frame/audio segment and send it for consumption by a player, file or socket, etc.

      SRTP (Secure Real-time Transport Protocol)

Neither RTP nor RTCP provides any encryption or authentication, which is where SRTP comes into the picture. SRTP is the security layer which resides between the RTP/RTCP application layer and the transport layer. It provides confidentiality, message authentication, and replay protection for both unicast and multicast RTP and RTCP streams.

      SRTP Packet

The cryptographic context includes:

      • the session key, used directly in encryption/message authentication
      • the master key, a securely exchanged random bit string used to derive session keys
      • other working session parameters (master key lifetime, master key identifier and length, FEC parameters, etc.)

It must be maintained by both the sender and receiver of these streams.

“Salting keys” are used to protect against pre-computation and time-memory trade-off attacks.

      To learn more about SRTP specifically visit : https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/

      RTP in a VoIP Communication system and Conference streaming

      Simulcast

The client encodes the same audio/video stream two or more times at different resolutions and bitrates and sends these to a router (SFU), which then decides who receives which of the streams.
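
One common way to signal simulcast in SDP is the rid-based mechanism of RFC 8851/8853; a minimal illustrative snippet (the rid identifiers are examples, not mandated names) could look like:

a=rid:hi send
a=rid:med send
a=rid:lo send
a=simulcast:send hi;med;lo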

      Multicast Audio Conference

Assume a multicast group address and a pair of ports have been obtained. One port is used for audio data, and the other is used for control (RTCP) packets. The audio conferencing application used by each conference participant sends audio data in small chunks of a few tens of milliseconds duration. Each chunk of audio data is preceded by an RTP header; the RTP header and data are in turn contained in a UDP packet.

      The RTP header indicates what type of audio encoding (such as PCM, ADPCM or LPC) is contained in each packet so that senders can change the encoding during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or react to indications of network congestion.

Packet networks occasionally lose and reorder packets and delay them by variable amounts of time. Thus the RTP header contains timing information and a sequence number that allow receivers to reconstruct the timing produced by the source. The sequence number can also be used by the receiver to estimate how many packets are being lost.

      For QoS, each instance of the audio application in the conference periodically multicasts a reception report plus the name of its user on the RTCP(control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings. In addition to the user name, other identifying information may also be included subject to control bandwidth limits.

      A site sends the RTCP BYE packet when it leaves the conference.

      Audio and Video Conference

      Audio and video media are transmitted as separate RTP sessions, separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same distinguished (canonical) name in the RTCP packets for both so that the sessions can be associated.

      Synchronized playback of a source’s audio and video is achieved using timing information carried in the RTCP packets

      Layered Encodings

With heterogeneous receivers having conflicting bandwidth requirements, multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion.
Rate adaptation is best done with a layered encoding over a layered transmission system.

      In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Receivers can then adapt to network heterogeneity and control their reception bandwidth by joining only the appropriate subset of the multicast groups.

      Mixers, Translators and Monitors

Note that in a VoIP system where SIP is the signaling protocol, a SIP signaling proxy never participates in the media flow and is thus media agnostic.

      Mixer

      An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet.

Example of a mixer for high-speed to low-speed packet stream conversion: in conferences where a few participants are connected through a low-speed link while others have high-speed links, instead of forcing a lower-bandwidth, reduced-quality audio encoding on everyone, an RTP-level relay called a mixer may be placed near the low-bandwidth area.

      This mixer resynchronises incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream across the low-speed links.
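
A toy illustration of the mixing step only: summing time-aligned 16-bit PCM frames from several sources with clamping (a real mixer also resamples, resynchronizes and re-encodes to the lower-bandwidth codec):

public class AudioMixer {
    public static short[] mix(short[][] frames) {
        int len = frames[0].length;
        short[] out = new short[len];
        for (int i = 0; i < len; i++) {
            int sum = 0;
            for (short[] f : frames) sum += f[i];           // sum aligned samples
            out[i] = (short) Math.max(Short.MIN_VALUE,
                     Math.min(Short.MAX_VALUE, sum));        // clamp to avoid overflow
        }
        return out;
    }
}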

      All data packets originating from a mixer will be identified as having the mixer as their synchronization source.

      RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers.

      Translator

      An intermediate system that forwards RTP packets with their synchronization source identifier intact.

      Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

      Translator for Firewall Limiting IP packet pass

      Some of the intended participants in the audio conference may be connected with high bandwidth links but might not be directly reachable via IP multicast, for reasons such as being behind an application-level firewall that will not let any IP packets pass. For these sites, mixing may not be necessary, in which case another type of RTP-level relay called a translator may be used.

Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through a secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site’s internal network.

      Other cases :

Video mixers can scale the images of individual people in separate video streams and composite them into one video stream to simulate a group scene.

Translators are also used when connecting a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II, or for packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.

      Monitor

      An application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis and long-term statistics.

      Multiplexing RTP Sessions

In RTP, multiplexing is provided by the destination transport address (network address and port number), which is different for each RTP session (separate for audio and video). This helps in cases where there is a change of encoding or clock rate, and with detection of packet loss and RTCP reporting. Moreover, an RTP mixer would not be able to combine interleaved streams of incompatible media into one stream.

Interleaving packets with different RTP media types but using the same SSRC would introduce several problems. However, multiplexing multiple related sources of the same medium in one RTP session using different SSRC values is the norm for multicast sessions.

      REMB ( Receiver Estimated Maximum Bitrate)

REMB is an RTCP message used to provide a bandwidth estimate in order to avoid creating congestion in the network. Support for this message is negotiated in the offer/answer SDP exchange. It contains the total estimated available bitrate on the path to the receiving side of this RTP session, encoded in mantissa + exponent format. REMB is used by

      • the sender to configure the maximum bitrate of the video encoding, and
      • the receiver (and media servers) to notify the available bandwidth in the network and limit the bitrate the sender is allowed to send.

      In Chrome it is deprecated in favor of the new sender side bandwidth estimation based on RTCP Transport Feedback messages.
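
A small sketch of the mantissa + exponent arithmetic used by REMB (a 6-bit exponent and an 18-bit mantissa in the draft; this is illustration of the encoding only, not a full packet builder):

public class Remb {
    /** Decode: bitrate in bps = mantissa * 2^exponent. */
    static long decode(int exponent, long mantissa) {
        return mantissa << exponent;
    }

    /** Encode a bitrate into {exponent, mantissa}, shrinking the mantissa to fit 18 bits. */
    static long[] encode(long bitrateBps) {
        int exp = 0;
        long mantissa = bitrateBps;
        while (mantissa > 0x3FFFF) {   // 18-bit limit
            mantissa >>= 1;
            exp++;
        }
        return new long[]{exp, mantissa};
    }
}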

      Session Description Protocol (SDP) Capability Negotiation

      SDP Offer/Answer flow

RTP can carry multiple formats. For each class of application (e.g., audio, video), RTP defines a profile and associated payload formats. The Session Description Protocol is used to specify the parameters of the session.

Usually in VoIP systems, SDP bodies describing a session (codecs, open ports, media formats, etc.) are embedded in a SIP request such as INVITE.

      SDP can negotiate use of one out of several possible transport protocols. The offerer uses the expected least-common-denominator (plain RTP) as the actual configuration, and the alternative transport protocols as the potential configurations.

      m=audio 53456 RTP/AVP 0 18
      a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF
      

      plain RTP (RTP/AVP)
      Secure RTP (RTP/SAVP)
      RTP with RTCP-based feedback (RTP/AVPF)
      Secure RTP with RTCP-based feedback (RTP/SAVPF)

      Adaptive bitrate control

Adapt the audio and video codec bitrates to the available bandwidth and hence optimize audio and video quality. For video, since the resolution is chosen only at the start, the encoder uses the bitrate and frame-rate attributes at runtime to adapt.

An RTCP packet called TMMBR (Temporary Maximum Media Stream Bit Rate Request) is sent to the remote client.



      SIP conferencing and Media Bridges


SIP is the most popular signaling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP is therefore expected to handle multiple multimedia streams and to provide conference control for building communication and collaboration apps, with solutions that are customisable.

      SIP Recap

SIP is an IETF-defined signaling protocol for controlling communication sessions over IP. Apart from VoIP, it is used in other multimedia technologies like online games, video conferencing, instant messaging and other services. It is an application layer protocol which runs over TCP, UDP and SCTP. SIP is modeled on the Web protocol HTTP and is a request/response protocol.

SIPv1: SIPv1 was text-based. It used the Session Description Protocol (SDP) to describe sessions and UDP as the transport protocol. SIPv1 only handled session establishment and did not handle mid-conference controls.

SCIP: the Simple Conference Invitation Protocol (SCIP) utilized the Transmission Control Protocol (TCP) as the transport protocol to manage conferences. It was based on HTTP and used e-mail addresses as identifiers for users. SIP v2, the combination of SIPv1 and SCIP, was also text-based, modeled on HTTP, and could use both UDP and TCP as transport protocols; this combination became the Session Initiation Protocol.

      SIP is used to distribute session descriptions among potential participants. Once the session description is distributed, SIP can be used to negotiate and modify session parameters. SIP can also terminate the session. Role of SIP in conference involves

      • initiating confs
      • inviting participants
      • enabling them to join conf
      • leave conf
      • terminate conf
      • expel participants
      • configure media flow
      • control activities in conf

      Mesh vs star topology

A mesh has p2p streaming and thus maximum data privacy and low cost for the service provider, because there are no media streams for the provider to take care of. In fact it comes out of the box with WebRTC peer connections. But of course a p2p mesh-based architecture does not scale: although the communication provider is now indifferent to the media stream traffic, the call quality of the session depends entirely on the end clients' processing power and bandwidth, which in my experience cannot accommodate more than 20-25 participants in a call even with an above-average bandwidth of 30-40 Mbps uplink and downlink.
On the other hand, in a star topology the participants only need to communicate with the media server, irrespective of the network conditions of the other receivers.

      Centralised (star) structure

In a centralized (star) signaling model, all communication flows via a centralized control point.

Applications of the star topology are the MCU and the SFU.

      Centralised Media / MCU

      Centralized Media / MCU

A Multipoint Control Unit (MCU) uses a mixer, as found in video conferencing bridges.

      • (+) proven interworking with legacy systems
      • (+) single point to manage transcoding
      • (+) energy efficient mode of operation, keeping client-side stream management low
      • (+) single point for DTMF inband/signaling processing
      • (-) CPU and resource intensive on the server side
      • (-) adds latency for traversal via the media server
      • (-) self-managed scaling, heavy traffic and resources to maintain
      • (-) possible security vulnerability as the server decrypts media packets

      Centralised Media via SFU

      SFU + simulcast

A Selective Forwarding Unit (SFU) is a newer topology where the centralized media server only forwards or proxies the streams without mixing.

      • (+) scales for low latency video streaming
      • (+) less CPU consumption of server
      • (+) can control the output stream for each peer based on their network capabilities
      • (-) still susceptible to security vulnerabilities at the focal point.

      Decentralised structure

In a decentralized (mesh) signaling structure, participants communicate p2p.

      Decentralized media, Multi unicast streaming

      Decentralized media, Multicast streaming

      Mesh based communication

      Limitations of WebRTC mesh Architecture

WebRTC is intrinsically a p2p system, and as more participants join the session the network begins to resemble a mesh. Audio and textual data, being lighter than heavy video media streams, can still adjust to difficult conditions without much noticeable lag. However, video streams take a hit when peers are on constrained bandwidth and use different qualities of video sources.

Let’s assume 3 different clients communicating in a WebRTC mesh session:

      1. WebRTC browser on a high-resolution system (desktop, laptop, kiosk) – this client will likely have a high-quality stream and would like to consume high quality as well
      2. Mobile browser or native WebRTC client – this will have an average-quality stream which may fluctuate owing to telecom network handover or instability while moving between locations
      3. Embedded system like a Raspberry Pi with a camera module – since this is an embedded system, likely part of an IoT surveillance system, it will try to restrict outgoing usage and incoming stream consumption to a minimum

      Some issue with WebRTC mesh conference include

      • Unmatched quality of the individual p2p streams in the mesh makes it difficult to have homogeneous session quality.
      • Video packets often go out of sync with audio packets, leading to delay or freezing due to packet loss.
      • Pixelated video when the resolution of the incoming video does not match the viewer’s display mode, e.g., a low-quality 320×280 pixel video viewed on a desktop monitor with 1080×720 resolution.
      • Different source encoders at the peers’ WebRTC clients behave differently, e.g., a WebRTC stream from an embedded system like an RPi will differ from that of a desktop browser like Safari or Firefox, or a mobile browser like Chrome on Android.

Although WebRTC’s media stack automatically manipulates combinations of bitrate and resolution in real time, based on feedback packets, to adapt video quality to the bandwidth constraints of both you and the peer, there remain many difficulties in having a large number of participants (on the order of a few tens to hundreds) join a mesh session. Even with an excellent connection and the bandwidth of 5G networks it is just not feasible to host even 100 users on a common mesh-based video system.

      Unicast, BroastCast and Multicast media distribution

      Unicast: one-to-one transmission. Usage: RTC over the network between two specific endpoints.

      Broadcast: one-to-all within a range; its types are limited broadcast and direct broadcast. Usage: conference streaming.

      Multicast: one-to-many; servers direct a single stream towards any listener who wants to connect, and the stream is replicated many times across the network. Usage: IPTV that distributes to hundreds or thousands of viewers.

In spite of both being star topologies, an SFU (Selective Forwarding Unit) is different from an MCU: in contrast to the MCU it does not do any heavy-duty processing on media streams, it only fetches the streams and routes them to other peers.

On the other hand, MCU (Multipoint Control Unit) media servers need a lot of computational strength to perform many operations on the RTP streams, such as mixing, multiplexing, and filtering echo/noise.

      Scalable Video Coding (SVC) for large groups

While simulcast sends multiple independently encoded versions of the same stream at different qualities (such as resolutions), from which the SFU picks the appropriate one per destination, SVC encodes base and enhancement layers within a single stream so the SFU can drop layers per receiver. The SFU can also forward different frame rates to different destinations based on their bandwidth. Some of the conference bridge types are:

      1. Bridge

A centralised entity to book, start and leave a conference; therefore potentially a single point of failure.

      • To create conf : conf created on a bridge URL , bridge registers on SIP Server, participants join the conf on the bridge using INVITES
      • To stop conf : either participant can Leave with BYE or conf can terminate by sending BYE to all

      2. Endpoints as Mixer

Endpoints handle the streams (decentralized media), therefore suited for ad hoc conferences.

(-) the mixer UA cannot leave until the conference finishes

      3. Mesh

Complex; more processing power is required on each UA.

      • (+) no single point of failure
      • (-) high network congestion and endpoint processing
      • (-) endpoints have to handle NATIng

      Large scale multiparticipant WebRTC sessions

An MCU (Multipoint Control Unit), which acts as a bridge between all participants, is the traditional system used to host large conferences. An MCU limits or lowers bandwidth usage by packing the streams together. An SFU (Selective Forwarding Unit), on the other hand, simply forwards the streams.

This setup is usually designed with heavy bandwidth and upload rates in mind and is more scalable and resilient to bad-quality streams than p2p mesh setups. As these media gateway servers scale to accommodate more simultaneous real-time users, their bandwidth consumption is heavy and expensive (something to keep in mind while buying instances from cloud providers like Azure or AWS).

Some of the many options for building an SFU (Selective Forwarding Unit) setup for WebRTC media streams are listed below:

      Kurento

An open-source (Apache 2.0) WebRTC gateway with built-in integration with OpenCV.

      Pipeline Architecture Design Pattern

Features in KMS (Kurento Media Server) include augmentation, face recognition, filters, object tracking and even virtual fencing.

      Other features like mixing , transcoding, recording as well as client APIs make it suitable for integration into rich multimedia applications.

      • (+) It can function as both MCU and SFU.
      • (+) Added media processing and transformations – augmented reality, blending, mixing, analyzing
      • (+) ML friendly + OpenCV filters (samples provided)
      • (+) pipelines can be used with computer vision

Nightly builds, good documentation and developer traction make this a good choice. The latest version at the time of writing this article is Kurento 6.15.0, released in November 2020.

      Licode

An open-source (MIT) WebRTC communication platform by Lynckia.

Simple and straightforward to build from source. The latest release is v8, from September 2019.

Erizo, its WebRTC core, is by default an SFU but can also be switched to MCU mode for more features like output streaming and transcoding.

It is written in C++ and provides a Node.js API for communicating with the server.

Supports add-on modules such as recording.

      Jitsi

An open-source (Apache 2.0) video conferencing stack whose SFU is called the Jitsi Videobridge (JVB).

      JITSI Components

      • Jitsi VideoBridge – SFU
      • Jicofo – “Focus” component that initiates Jingle on behalf of JVB
      • Jigasi – SIP to XMPP signaling and media gateway
      • Jirecon – Component that allows for recording of RTP streams
      • Jibri – New live recording component

      Other client side components and SDK

      • lib-jitsi-meet/Strophe.js – Javascript (Browser/Node.js)
      • XMPPFramework/MeetRTC_iOS – iOS
      • Smack – Java/Android
      Jitsi conferencing (SFU)
      • (+) Supports high capacity SFU. Provides tools ( jibri) for recording and/or streaming.
      • (+) Has Android and iOS SDKs.
      • (-) Low SIP support (more focused on XMPP). It originally uses XMPP signaling but can communicate with SIP platforms using a gateway which is part of the Jitsi project.

It is best used as a binary package on Debian/Ubuntu instead of compiling it yourself with Maven. The most recent release is 2.0.5390, released on 12 Jan 2021.

      MediaSoup

An open-source (ISC) SFU conferencing server for both WebRTC and plain, non-secured RTP.

Producer-consumer architecture design pattern.

      • (+) It is signaling agnostic
      • (+) Node.js module on the server (media handling in C++)
      • (+) Provides JS and C++ client libraries
      • (+) audio/video producers or consumers can be GStreamer and FFmpeg scripts

Relatively new with less documentation; however, its simple and minimalistic design makes it easy to grasp and run.

      Janus

An open-source (GNU GPL v3) WebRTC gateway.

Built in C. It has the ability to switch between SFU and MCU roles and provides plugins on top, like recording.

By default it uses a WebSocket-based protocol, HTTP/JSON and XMPP, but it can communicate with SIP platforms too using a plugin.

      Asterisk SFU

An MCU-based, pure SIP signaling and media server (GNU GPL v2) from Sangoma Technologies.

A powerful server at the core of many OTT/VoIP providers and call centre platforms.

      • (+) Can be adapted to any role using a combination of hundreds of modules.
      • (-) Project does not provide client SDK.

      Red5

Live streaming with SDKs for native (iOS, Android) and HTML5, plus custom server-side applications.

      • (+) supports IP cameras, drones, RTSP, RTMP, hardware encoders (many client instances)
      • (+) failover to HLS and Flash

      JANUS as WebRTC SFU

This section introduces a Janus-based SFU media architecture for WebRTC systems. It also aims at evaluating the performance of the VP9 simulcast p2p model against a Janus-based SFU system.


      Janus

Janus is a popular, small-footprint gateway/media server with support for WebRTC features like JSEP/SDP, ICE, DTLS-SRTP and DataChannels. Its transport plugins expose the Janus API over different transports – HTTP, WebSockets, RabbitMQ, Unix sockets, MQTT and so on – while other plugins provide functionality such as a video SFU, audio MCU, SIP/RTSP gateways, broadcasting, etc.
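
As an illustration of the transport-agnostic Janus API (the endpoint, port and transaction ids below are defaults/examples, not specific to any deployment), a session can be created and a plugin attached over the REST transport:

# create a session (default REST endpoint is http://<host>:8088/janus)
curl -s -X POST http://localhost:8088/janus -d '{"janus":"create","transaction":"t1"}'
# the response carries the session id in data.id

# attach a plugin (e.g. the echo test) to that session
curl -s -X POST http://localhost:8088/janus/<session_id> -d '{"janus":"attach","plugin":"janus.plugin.echotest","transaction":"t2"}'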

      Centralised media conferencing / mixing service using Janus

      Streaming Video via CDN

Tip: use mixing for audio and relaying for video in sessions with many participants.

      Server Selection and Setup from source

Janus setup on a public EC2 instance: since a media server's logs can quickly fill up disk space, make sure to choose a large instance such as c5.large. (A sample log file size is shown in the screenshot.)

      Install prerequisite libraries

      sudo apt-get install libmicrohttpd-dev libjansson-dev libnice-dev libssl-dev libsrtp-dev libsofia-sip-ua-dev libglib2.0-dev libopus-dev libogg-dev libini-config-dev libcollection-dev libwebsockets-dev pkg-config gengetopt automake libtool doxygen graphviz git libconfig-dev cmake

Make sure libsrtp v2.x is installed instead of the 1.4 version that Debian/Ubuntu installs by default, such as:

      libsrtp0-dev is already the newest version (1.4.5~20130609~dfsg-2ubuntu1).

      Manually installing libsrtp2 from https://github.com/cisco/libsrtp

      wget https://github.com/cisco/libsrtp/archive/v2.2.0.tar.gz
      tar xfv v2.2.0.tar.gz
      cd libsrtp-2.2.0
      ./configure --prefix=/usr --enable-openssl
      make shared_library && sudo make install

Since we are going to use WebSockets for signaling, ensure lws (libwebsockets) >= 2.0.0 is installed:

      git clone https://libwebsockets.org/repo/libwebsockets
      cd libwebsockets
      mkdir build
      cd build
cmake -DLWS_MAX_SMP=1 -DLWS_WITHOUT_EXTENSIONS=0 -DCMAKE_INSTALL_PREFIX:PATH=/usr -DCMAKE_C_FLAGS="-fpic" ..
      make && sudo make install

Also, libnice (at least v0.1.16) needs to be installed.

      Get Janus source code and configure

      git clone git://github.com/meetecho/janus-gateway.git
      cd janus-gateway
      sh autogen.sh
./configure \
--disable-data-channels \
--disable-docs \
--prefix=/opt/janus LDFLAGS="-L/usr/local/lib -Wl,-rpath=/usr/local/lib" CFLAGS="-I/usr/local/include"

At this point the configure script prints a configuration summary such as the one below. Ensure WebSockets is "yes" before starting the make process.

      Compiler:                  gcc
      libsrtp version:           2.x
      SSL/crypto library:        OpenSSL
      DTLS set-timeout:          not available
      Mutex implementation:      GMutex (native futex on Linux)
      DataChannels support:      no
      Recordings post-processor: no
      TURN REST API client:      no
      Doxygen documentation:     no
      Transports:
          REST (HTTP/HTTPS):     yes
          WebSockets:            yes
          RabbitMQ:              no
          MQTT:                  no
          Unix Sockets:          yes
          Nanomsg:               no
      Plugins:
          Echo Test:             yes
          Streaming:             yes
          Video Call:            yes
          SIP Gateway:           yes
          NoSIP (RTP Bridge):    yes
          Audio Bridge:          yes
          Video Room:            yes
          Voice Mail:            yes
          Record&Play:           yes
          Text Room:             yes
          Lua Interpreter:       no
          Duktape Interpreter:   no
      Event handlers:
          Sample event handler:  no
          WebSocket ev. handler: yes
          RabbitMQ event handler:no
          MQTT event handler:    no
          Nanomsg event handler: no
          GELF event handler:    yes
      External loggers:
          JSON file logger:      no
      JavaScript modules:        no
      

      Make and install

      make && sudo make install

      Making configs

      sudo make configs

      For details refer to actual README on Janus git repo https://github.com/meetecho/janus-gateway

      Start

By default, Janus starts in the foreground as a console application that can be stopped with Ctrl+C.

      ./janus

Starting Janus as a background or daemon service:

      sudo ./janus -N -d 0 --daemon --log-file /var/log/janus

      tailing the log file

      > tail -f /var/log/janus
      ---------------------------------------------------
        Starting Meetecho Janus (WebRTC Server) v0.11.1
      ---------------------------------------------------
      
      Checking command line arguments...
      Debug/log level is 0
      Debug/log timestamps are disabled
      Debug/log colors are enabled

      references : https://janus.conf.meetecho.com/docs/service.html

      Issues with installation of janus

Before resolving any issue, it is important to verify whether you are running multiple versions of Janus on your system:

      altanai@xx:~/janus-gateway$ janus --version
      Janus commit: bed081c03ad4854f3956898646198100a8ed82cf
      Compiled on:  Fri Apr  2 07:28:14 UTC 2021
      
      janus 0.11.1
      
      altanai@xx:~/janus-gateway$ ./janus --version
      Janus commit: c970528037d46d52a4bc1fd7db2a44c900c2baf2
      Compiled on:  Thu Apr  8 18:01:36 UTC 2021
      
      janus 0.10.8

If this is so, delete the one you don't require; I'd remove the one installed globally.

      Cmake version error

If you see an error in the rabbitmq installation, which is part of the Janus installation,

CMake 3.12...3.18 or higher is required.  You are running version 3.10.2

then remove the existing cmake version and install a newer one:

      sudo apt remove cmake
      wget https://github.com/Kitware/CMake/releases/download/v3.20.1/cmake-3.20.1-linux-x86_64.sh
      sh cmake-3.20.1-linux-x86_64.sh

      check version

      >> cmake-3.20.1-linux-x86_64/bin/cmake --version
      cmake version 3.20.1
      

libwebsockets error in libjanus_websockets.so

      Couldn't load transport plugin 'libjanus_websockets.so': /opt/janus/lib/janus/transports/libjanus_websockets.so: undefined symbol: lws_hdr_custom_length

      Libwebsockets (LWS) is a flexible, lightweight pure C library for implementing modern network protocols easily with a tiny footprint, using a nonblocking event loop.

First check the version of libwebsockets you have on your machine. If it is less than version 2.0.0:

      >> apt show libwebsockets-dev
      
      Package: libwebsockets-dev
      Version: 2.0.3-3build1
      Priority: optional
      Section: universe/libdevel
      Source: libwebsockets
      Origin: Ubuntu
      Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
      Original-Maintainer: Laszlo Boszormenyi (GCS) <gcs@debian.org>
      Bugs: https://bugs.launchpad.net/ubuntu/+filebug
      Installed-Size: 502 kB
      Depends: libwebsockets8 (= 2.0.3-3build1), libev-dev, libssl-dev, libuv1-dev, zlib1g-dev
      Homepage: https://libwebsockets.org/
      Download-Size: 133 kB
      APT-Manual-Installed: yes
      APT-Sources: http://us-west-1.ec2.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
      Description: lightweight C websockets library - development files
       Libwebsockets is a lightweight pure C library for both websockets
       clients and servers built to use minimal CPU and memory resources
       and provide fast throughput in both directions.
       .
       This package contains the header files needed for developing programs
       using libwebsockets and a static library.
      

Follow the steps to install libwebsockets >= 2.0.0; the current v3.2-stable branch is good: http://libwebsockets.org/

      Failures due to ICE

      ICE failed for component 1 in stream 1… 
      No WebRTC media anymore
       WebRTC resources freed;

      Self Signed SSL certs

      Make a new directory for SSL certs

      mkdir /home/ubuntu/ssl
      
      generate certs 
      
      openssl req -new -newkey rsa:4096 -nodes -keyout key.pem -out cert.csr
      openssl x509 -req -sha256 -days 365 -in cert.csr -signkey key.pem -out cert.pem
      chmod 600 cert.csr
      chmod 600 cert.pem
      chmod 600 key.pem

Remember to allow advanced access in the browser if the SSL certs are self-signed and not from a verified CA.
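To double-check what was generated, the certificate subject and validity dates can be printed with openssl, for example:

openssl x509 -in cert.pem -noout -subject -dates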

      Update janus config

The path for the Janus configuration is usually /opt/janus/etc/janus/janus.jcfg, as given by the prefix during installation. Some of the configuration sections are:

      >> vi /opt/janus/etc/janus/janus.jcfg
      general: {
         configs_folder = "/opt/janus/etc/janus" # Configuration files folder
         plugins_folder = "/opt/janus/lib/janus/plugins" # Plugins folder
         transports_folder = "/opt/janus/lib/janus/transports" # Transports folder
         events_folder = "/opt/janus/lib/janus/events" # Event handlers folder
         loggers_folder = "/opt/janus/lib/janus/loggers" # External loggers folder
      ....
      }
• SSL certs for DTLS can also be specified under certificates
• Restricting media ports to a range comes in handy when you have to selectively open ports on a cloud instance such as an AWS instance
      media: {
              ipv6 = false
              #min_nack_queue = 500
              rtp_port_range = "20000-40000"
              #dtls_mtu = 1200
              #no_media_timer = 1
              #slowlink_threshold = 4
              #twcc_period = 100
              #dtls_timeout = 500
      ....
      }
• Changing NAT settings for STUN/TURN servers used to gather ICE candidates

Using 1:1 NAT (suited to Amazon EC2) requires specifying the public address

      nat_1_1_mapping = "1.1.1.1"
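On Amazon EC2, the public IPv4 address to use in this mapping can typically be fetched from the instance metadata service (this assumes IMDSv1 is enabled; newer instances may require an IMDSv2 token):

curl -s http://169.254.169.254/latest/meta-data/public-ipv4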
• To change the HTTP transport, set http to true

Go to the installation location for the Janus config

      cd /opt/janus/etc/janus/
      > vi /opt/janus/etc/janus/janus.transport.http.jcfg
      
      general: {
          #events = true                 # Whether to notify event handlers about transport events (default=true)
          json = "indented"              # Whether the JSON messages should be indented (default),
                                         # plain (no indentation) or compact (no indentation and no spaces)
          base_path = "/janus"           # Base path to bind to in the web server (plain HTTP only)
          http = true                    # Whether to enable the plain HTTP interface
          port = 8088                    # Web server HTTP port
    #interface = "eth0"            # Whether we should bind this server to a specific interface only
          #ip = "x.x.x.x"                # Whether we should bind this server to a specific IP address (v4 or v6) only
          https = true                   # Whether to enable HTTPS (default=false)
          secure_port = 8089             # Web server HTTPS port, if enabled
          #secure_interface = "eth0"     # Whether we should bind this server to a specific interface only
          #secure_ip = "192.168.0.1"     # Whether we should bind this server to a specific IP address (v4 or v6) only
          #acl = "127.,192.168.0."       # Only allow requests coming from this comma separated list of addresses
      }

      Change path for certs and key pem files

      > vi janus.jcfg
      
      certificates: {
              cert_pem = "/home/ubuntu/ssl/cert.pem"
        cert_key = "/home/ubuntu/ssl/key.pem"
              #cert_pwd = "secretpassphrase"
              #dtls_accept_selfsigned = false
              #dtls_ciphers = "your-desired-openssl-ciphers"
              #rsa_private_key = false
            #ciphers = "PFS:-VERS-TLS1.0:-VERS-TLS1.1:-3DES-CBC:-ARCFOUR-128"
      }

There are options for other ciphers and passphrases; leave them at their defaults for now.

• To change the ws transport for WebSockets
      general: {
              #events = true            # Whether to notify event handlers about transport events (default=true)
              json = "indented"         # Whether the JSON messages should be indented (default),
                                        # plain (no indentation) or compact (no indentation and no spaces)
              #pingpong_trigger = 30    # After how many seconds of idle, a PING should be sent
              #pingpong_timeout = 10    # After how many seconds of not getting a PONG, a timeout should be detected
      
              ws = true                 # Whether to enable the WebSockets API
              ws_port = 8188            # WebSockets server port
              #ws_interface = "eth0"    # Whether we should bind this server to a specific interface only
              #ws_ip = "192.168.0.1"    # Whether we should bind this server to a specific IP address only
              wss = true                # Whether to enable secure WebSockets
              wss_port = 8989           # WebSockets server secure port, if enabled
              #wss_interface = "eth0"   # Whether we should bind this server to a specific interface only
              #wss_ip = "192.168.0.1"   # Whether we should bind this server to a specific IP address only
              #ws_logging = "err,warn"  # libwebsockets debugging level as a comma separated list of things
                                        # to debug, supported values: err, warn, notice, info, debug, parser,
                                        # header, ext, client, latency, user, count (plus 'none' and 'all')
              #ws_acl = "127.,192.168.0." # Only allow requests coming from this comma separated list of addresses
      }

Similar to the HTTP transport config above, also update the certificates section with a valid path for the SSL certs.

      Check Janus Running

      Running ports check ( http )

      > sudo lsof -i  | grep janus
      janus     26450          ubuntu    7u  IPv6 418550      0t0  UDP *:rfe 
      janus     26450          ubuntu    8u  IPv6 418551      0t0  UDP *:5004 
      janus     26450          ubuntu   17u  IPv6 418557      0t0  TCP *:omniorb (LISTEN)
      janus     26450          ubuntu   20u  IPv6 672518      0t0  TCP *:8089 (LISTEN)
      janus     26450          ubuntu   24u  IPv4 672519      0t0  TCP *:8188 (LISTEN)
      janus     26450          ubuntu   25u  IPv4 672520      0t0  TCP *:8989 (LISTEN) 

      Janus INFO

      Going to deployed janus URL like http://x.x.x.x:8088/janus/info
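The same endpoint can also be queried from a shell, for example:

curl -s http://x.x.x.x:8088/janus/info

which returns a JSON document similar to the one below.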

         "janus": "server_info",
         "transaction": "rGuo8HNI4wu",
         "name": "Janus WebRTC Gateway",
         "version": 26,
         "version_string": "0.2.6",
         "author": "Meetecho s.r.l.",
         "commit-hash": "not-a-git-repo",
         "compile-time": "Wed Feb 28 07:02:15 UTC 2018",
         "log-to-stdout": true,
         "log-to-file": false,
         "data_channels": false,
         "session-timeout": 60,
         "server-name": "MyJanusInstance",
         "local-ip": "x.x.x.x",
         "ipv6": false,
         "ice-lite": false,
         "ice-tcp": false,
         "api_secret": false,
         "auth_token": false,
         "event_handlers": false,
         "transports": {
            "janus.transport.http": {
               "name": "JANUS REST (HTTP/HTTPS) transport plugin",
               "author": "Meetecho s.r.l.",
               "description": "This transport plugin adds REST (HTTP/HTTPS) support to the Janus API via libmicrohttpd.",
               "version_string": "0.0.2",
               "version": 2
            },
            "janus.transport.websockets": {
               "name": "JANUS WebSockets transport plugin",
               "author": "Meetecho s.r.l.",
               "description": "This transport plugin adds WebSockets support to the Janus API via libwebsockets.",
               "version_string": "0.0.1",
               "version": 1
            }
         },
         "events": {},
         "plugins": {
            "janus.plugin.voicemail": {
               "name": "JANUS VoiceMail plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a plugin implementing a very simple VoiceMail service for Janus, recording Opus streams.",
               "version_string": "0.0.6",
               "version": 6
            },
            "janus.plugin.audiobridge": {
               "name": "JANUS AudioBridge plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a plugin implementing an audio conference bridge for Janus, mixing Opus streams.",
               "version_string": "0.0.10",
               "version": 10
            },
            "janus.plugin.echotest": {
               "name": "JANUS EchoTest plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a trivial EchoTest plugin for Janus, just used to showcase the plugin interface.",
               "version_string": "0.0.7",
               "version": 7
      },
      "janus.plugin.recordplay": {
         "name": "JANUS Record&Play plugin",
         "author": "Meetecho s.r.l.",
         "description": "This is a trivial Record&Play plugin for Janus, to record WebRTC sessions and replay them.",
               "version_string": "0.0.4",
               "version": 4
            },
            "janus.plugin.textroom": {
               "name": "JANUS TextRoom plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a plugin implementing a text-only room for Janus, using DataChannels.",
               "version_string": "0.0.2",
               "version": 2
            },
            "janus.plugin.videoroom": {
               "name": "JANUS VideoRoom plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a plugin implementing a videoconferencing SFU (Selective Forwarding Unit) for Janus, that is an audio/video router.",
               "version_string": "0.0.9",
               "version": 9
            },
            "janus.plugin.sipre": {
               "name": "JANUS SIPre plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a simple SIP plugin for Janus (based on libre instead of Sofia), allowing WebRTC peers to register at a SIP server and call SIP user agents through the gateway.",
               "version_string": "0.0.1",
               "version": 1
            },
            "janus.plugin.videocall": {
               "name": "JANUS VideoCall plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a simple video call plugin for Janus, allowing two WebRTC peers to call each other through the gateway.",
               "version_string": "0.0.6",
               "version": 6
            },
            "janus.plugin.streaming": {
               "name": "JANUS Streaming plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a streaming plugin for Janus, allowing WebRTC peers to watch/listen to pre-recorded files or media generated by gstreamer.",
               "version_string": "0.0.8",
               "version": 8
            },
            "janus.plugin.nosip": {
               "name": "JANUS NoSIP plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a simple RTP bridging plugin that leaves signalling details (e.g., SIP) up to the application.",
               "version_string": "0.0.1",
               "version": 1
            },
            "janus.plugin.sip": {
               "name": "JANUS SIP plugin",
               "author": "Meetecho s.r.l.",
               "description": "This is a simple SIP plugin for Janus, allowing WebRTC peers to register at a SIP server and call SIP user agents through the gateway.",
               "version_string": "0.0.7",
               "version": 7
            }
         }
      }
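As a further sanity check of the Janus REST API, a session can be created by POSTing to the /janus endpoint; a hypothetical example (the transaction value is any random string chosen by the client):

curl -s -X POST http://x.x.x.x:8088/janus -H "Content-Type: application/json" -d '{"janus":"create","transaction":"abc123"}'

A successful response contains "janus": "success" along with a numeric session id in the data field.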

      ECHO Test

An Echo test is part of Janus and can be reached via the deployed server URL https://x.x.x.x:8084/echotest.html

For a valid SDP message, notice the origin (o=), media (m=) and connection (c=) lines in chrome://webrtc-internals

Local Description (munged)

      O line represents origin
      
      type: offer, sdp: v=0
      o=- 1331622865522329207 2 IN IP4 127.0.0.1
      
      
      Audio Media stream in m line 
      
      m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
      c=IN IP4 0.0.0.0
      
      Video media stream in m line 
      
      m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123
      c=IN IP4 0.0.0.0
      
      
      Data
      
      m=application 9 UDP/DTLS/SCTP webrtc-datachannel
      c=IN IP4 0.0.0.0
      

      Remote Description

      Origin Line
      
      type: answer, sdp: v=0
      o=- 6397148042001786590 2 IN IP4 54.193.51.199
      
      Audio Stream
      
      m=audio 9 UDP/TLS/RTP/SAVPF 111 9 0 8 106 105 13 110 112 113
      c=IN IP4 54.193.51.199
      
      Video Stream
      
      m=video 9 UDP/TLS/RTP/SAVPF 96 98 100 102 127 125 108 97 99 101 122 121 107 109
      c=IN IP4 54.193.51.199
      
      Data 
      
      m=application 0 UDP/DTLS/SCTP 0
      c=IN IP4 54.193.51.199

      VideoCall – https://x.x.x.x:8084/videocalltest.html

      Run

      ./opt/janus/bin/janus

The client attaches to the plugin and then proceeds to join the room.

After this, Janus proceeds with JSEP offer/answer signalling and ICE checks, and then performs keepalive checks for the duration of the session.

      Stream Types

      Unicast

      a=sendonly
      a=recvonly
      a=sendrecv
      a=inactive

      Simulcast

      https://x.x.x.x:8084/videocalltest.html?simulcast=true

LD Library Path Exception

If you see a missing encryption exception, start Janus with LD_LIBRARY_PATH set:

      export LD_LIBRARY_PATH=/usr/lib && /opt/janus/bin/janus --debug-level=7 --stun-server=stun.l.google.com:19302 --event-handlers --full-trickle

      ICE resolution in Runtime

      Trickle ICE Command Line params give console output as

      [janus.cfg]
          general: {
              debug_level: 7
          }
          certificates: {
          }
          nat: {
              stun_server: stun.l.google.com
              stun_port: 19302
              full_trickle: true
          }
          media: {
          }
          transports: {
          }
          plugins: {
          }
          events: {
              broadcast: yes
          }
          loggers: {
          }

      With NAT 1:1 mapping

      export LD_LIBRARY_PATH=/usr/lib && /opt/janus/bin/janus --debug-level=7 --nat-1-1=54.193.51.199 --stun-server=stun.l.google.com:19302 --rtp-port-range=20000-50000 

1:1 NAT mapping coordinates with the Amazon EC2 public IP

      [janus.cfg]
          general: {
              debug_level: 7
          }
          certificates: {
          }
          nat: {
              nat_1_1_mapping: 54.193.51.199
          }
          media: {
              rtp_port_range: 20000-50000
          }
          transports: {
          }
          plugins: {
          }
          events: {
          }
          loggers: {
          }

      WebRTC Client for Janus System

      Test client

      Using the html sample clients

      cd html
      admin.html            citeus.html      e2etest.html   index.html      nosiptest.html          siptest.html        textroomtest.html   voicemailtest.html audiobridgetest.html  demos.html       echotest.html  multiopus.html  recordplaytest.html     streamingtest.html  videocalltest.html  vp9svctest.html canvas.html           devicetest.html  footer.html    navbar.html     screensharingtest.html  support.html        videoroomtest.html

Check the lsb release and then install Node.js to host the client page

      >> lsb_release -a
      curl -sL https://deb.nodesource.com/setup_15.x | sudo -E bash -
      sudo apt-get install -y nodejs

      Http WebServer

      npm init
      npm install http-server
      node_modules/http-server/bin/http-server
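Most browsers require a secure context (HTTPS) for getUserMedia, so it may help to serve the demo pages over TLS as well. A possible sketch reusing the self-signed certs generated earlier (the -S/-C/-K/-p flags belong to the npm http-server package; verify them against your installed version):

node_modules/http-server/bin/http-server -S -C /home/ubuntu/ssl/cert.pem -K /home/ubuntu/ssl/key.pem -p 8084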

      To be able to compile native addons from npm you’ll need to install the development tools:

      sudo apt install build-essential

Open the client in a WebRTC-supported browser at http://x.x.x.x:8080/videoroomtest.html

If you see issues such as

       No WebRTC media anymore; 0x7fbe20001a40 0x7fbe20002440
       WebRTC resources freed; 0x7fbe20001a40 0x7fbe20001320
       Handle and related resources freed; 0x7fbe20001a40 0x7fbe20001320

then it is likely a NAT issue and you should look at the jcfg config file to see whether the interface is specified correctly.

      # Generic settings
      interface = "x.x.x.x"                                     # Interface to use (will be used in SDP)
      server_name = "WebrtcMCUJanus"                                  # Public name of this Janus instance

      Security consideration before taking janus live

      Hide the dependencies

      hide_dependencies = true  

By default, a call to the "info" endpoint of either the Janus or Admin API also returns the versions of the main dependencies (e.g., libnice, libsrtp, which crypto library is in use and so on). Should you want that info not to be disclosed, set 'hide_dependencies' to true.
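Besides hiding dependencies, it is also worth protecting the Janus and Admin APIs with shared secrets before going live. A minimal sketch of the relevant janus.jcfg options (option names as found in recent janus.jcfg samples; verify against your version):

general: {
        api_secret = "janusrocks"          # shared secret that clients must include in every Janus API request
        admin_secret = "janusoverlord"     # shared secret for the Admin/Monitor API
        #token_auth = true                 # alternatively, use token-based authentication instead of a static secret
}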

A common spam attack (source: https://groups.google.com/g/meetecho-janus/c/IcJd3e4V1F8)

Issue: Janus is throwing the error "Invalid url /ws/v1/cluster/apps/new-application"

Possible solution: changing the port of the HTTP transport

      References


To read more about media architectures and topologies such as point-to-point, point-to-multipoint, multicast, translators, mixers and SFUs, go to:

      Media Architecture, RTP topologies

With the sudden onset of Covid-19 and the growing trend of working from home, the demand for scalable conferencing solutions and virtual meeting rooms has skyrocketed. Here is my advice if you are building an auto-scalable conferencing solution.

      To read more on the topologies of conferencing systems ( Mesh vs Star , Unicast vs Multicast and SVC) read :

      SIP conferencing and Media Bridges

SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. It is desired that SIP not only set up multimedia streams but also provide conference control for building communication and collaboration apps and new, customisable solutions.

      Video analytics

Today, there are billions of video cameras in our homes, phones, ATMs, baby monitors, laptops, smart watches, traffic monitoring, IoT devices, bots, you name it. The underlying purpose of most of them is to capture media streams and optimize the content for further processing.

      Stages of video Analytics-

1. Data Acquisition

Data gathered from multiple camera sources needs to be streamed in an aggregated manner to a data processing unit in an analytics engine, or archived. It may be subjected to overall monitoring (such as sunlight hours) or to detailed low-level object identification (such as facial recognition of passengers).

2. Transformation to Data Sets

The assimilated data is grouped into operable entities. Identification and classification are done by adding attributes to recognizable shapes, movements, edges and patterns.

3. Calculate deviation or compliance

A trained model recognizes normal behavior, and deviation from it is calculated.

       

      Video content Analytics in surveillance 

Considering the use case of monitoring and surveillance cameras, there is a growing need for realtime video analytics to "detect and determine temporal and spatial events". Whether surveillance cam recordings are used as forensic evidence or just for monitoring incidents and reporting the crucial segments of video, both use cases involve filtering a vast amount of recorded or streaming media to extract the exact parts that the authorities are looking for. This involves custom identification and recognition of objects in frames.

There is growing research into extracting real-time events of interest, minimizing the search time and maximizing the accuracy obtained from computer vision.

      Consider following use-cases :

1. Surveillance cams in solar farms or home-based setups to predict sunlight hours and forecast energy generation value. Described in greater detail here.
      2. Traffic monitoring cameras :
      • Automatic license / number plate recognition – surveillance cams for traffic need to record vehicle plate number to identify and tag the vehicles as they pass by  .
      • Car  dashboard cams for investigative purposes  post accidents and insurance claims
      • Motion tracking – Mapping the vehicle movement to detect any wrong turns , overtakes , parking etc
      • Scan for QR codes and passes at toll gates.
      • Identifying over-speeding vehicles

      3. Security and Law enforcement

      • Trigger alarms or lockdowns  on suspicious activity or intrusion into safe facility
      • Virtual fencing and perimeter breach – Map facial identification from known suspects
      • Detection of left items and acceleration of emergency response

      Communication based video analytics 

Unified enterprise communication, conferences, meetings, online webcasts, webinars, social messengers and online project demos extensively use video analytics for building intuitive use cases and boosting innovation around their platforms. A few examples from the vast number of use cases are:

1. Sentiment Analysis: capturing emotions by mapping keywords to ascertain whether the meeting was happy, positive and productive, or sad, complaining and negative
2. Augmented Reality for overlaying information such as an interactive manual or an image. Areas of current usage include e-learning and customer support.
3. Dynamic masking for privacy

      Autonomous Robot purpose Video analytics

Self-driving drones, cars and even bots extensively use the feed from wide-angle / fish-eye lens cameras to create a 3D model of their movement in a given space of a 3-dimensional coordinate system.

      Key technologies includes :

1. Ego-motion estimation – mapping a 3D space with the captured camera feed
2. Deep Learning (part of AI) on the continuous feed from video cameras to find a route around obstacles
3. Remote monitoring for an unmanned vehicle
4. Sterile monitoring of unreachable or hazardous areas, for example war zones or extra-terrestrial objects such as the moon, Mars and satellites

      Bio-metrics based Video analytics 

Often the video feed is now used for advanced search, redaction and facial recognition, which leads to features such as

      • unlocking laptop or phone
      • performing click with blink of eyes
      • creating concentration maps of webpage based on where eyes focused

Read more about the role of WebRTC in biometrics here

      Video analytics in Industrial and Retail applications 

Applications of video analytics in the industrial landscape are manifold. On one hand it can be used for intelligence and information gathering, such as worker footfall counts or machines left unattended; on the other hand, using specific image optimization techniques, it can also audit automated testing of engines, machine parts, rotation counts etc.

1. Flame and Smoke Detection – images from video streams are analysed for color chrominance, flickering ratio, shape, pattern and moving direction to ascertain a fire hazard.
2. Collect demographics of the area with people counting
3. Ensure quality control and procedural compliance
4. Identify tailgating or loitering

       

      List of few companies focusing on Video Analytics :

      1. Avigilon -http://avigilon.com/products/video-analytics/video-analytics/
      2. 3VR – http://www.3vr.com/products/videoanalytics
      3. Intelli-vision – https://www.intelli-vision.com/
      4. IPsotek – https://www.ipsotek.com
      5. aimetis – http://www.aimetis.com/

       

      Edge Analytics

Edge analytics means performing data analytics at the application level, at the edge of the system architecture, instead of at the core or data-warehouse level. The advantages of computation at the fringes of the network instead of in a centralized system are faster response times and standalone off-grid functionality.

The humongous data collected by IoT devices, machinery, sensors, servers, firewalls, routers, switches, gateways and all other types of components is increasingly getting analysed and acted upon at the edge, independently with machine learning, instead of in data centers and network operation centers. With the help of feedback loops and deep learning one can add data-driven intelligence to how operations are performed at critical machine parts such as autonomous bots or industrial setups.

      Error Recovery and streaming

To control the incoming data stream, we divide and classify the content into packets, add a custom classification header and stream them down to the server. In the event of data congestion or back pressure, some non-significant packets are either dropped or moved to a separate dead-letter queue. The system is thus made stable for high availability and BCP / fail-over recovery.

       

       

      Gstreamer

GStreamer (LGPL) is a media handling library written in C for applications such as streaming, recording, playback, mixing and editing, and even enhanced applications such as transcoding, media format conversion and streaming servers for embedded devices (read more about GStreamer on the RPi in my article here).

It is a pipeline-based multimedia framework. It encompasses various codecs and filters and is modular, with plugin development to enhance its capabilities. Media streaming application developers use it as part of their framework either at the broadcaster's end or as a media player.

A simple pipeline example:

      gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink

Example 2, playing an MP4 file:

      gst-launch-1.0 filesrc location=file_example_MP4_480_1_5MG.mp4 ! qtdemux ! h264parse ! avdec_h264 ! autovideosink

      Key Features of GStreamer include

      1. Modular and Plugin-Based Architecture:
        • allowing developers to extend its functionality by adding new plugins.
        • Plugins provide support for various codecs, formats, filters, and input/output devices.
      2. Pipeline-Based Processing:
        • GStreamer uses a pipeline model where multimedia data flows through a series of interconnected elements (e.g., sources, filters, sinks).
        • Each element performs a specific task, such as decoding, encoding, or rendering.
      3. Cross-Platform Support:
        • GStreamer runs on Linux, Windows, macOS, Android, and other platforms, making it versatile for different environments.
      4. Wide Range of Supported Formats:
        • GStreamer supports a vast array of audio and video formats, including MP3, AAC, H.264, VP9, and more.
        • It can handle both compressed and uncompressed media streams.
      5. Real-Time Streaming:
        • GStreamer is capable of real-time streaming, making it suitable for applications like video conferencing, IP cameras, and live broadcasting.
6. Customizable and Extensible: GStreamer’s API allows for fine-grained control over media processing. Language bindings include:
        • C: The native API for GStreamer.
        • Python: Bindings for Python .
        • C++: Bindings for C++.
        • Rust: Bindings for Rust.
      7. Integration with Other Technologies, frameworks and libraries, such as OpenCV for computer vision, PulseAudio for audio, and WebRTC for real-time communication.

      Core Concepts:

      • Elements: The basic building blocks of a GStreamer pipeline. Examples include:
        • Source elements: Capture or generate data (e.g., reading from a file or a webcam).
        • Filter elements: Process data (e.g., decoding, encoding, or applying effects).
        • Sink elements: Output data (e.g., rendering video to a screen or writing to a file).
      • Pads: Connection points on elements where data flows in (sink pads) or out (source pads).
      • Pipeline: A container for elements that defines the flow of media data.
      • Bus: A communication channel for messages (e.g., errors, state changes) between the pipeline and the application.

      Some Use Cases:

      Server-Side Processing: GStreamer can be used for transcoding, mixing, or analyzing media on servers.

      Media Players: GStreamer is used in media players like Totem and Rhythmbox.

gst-launch-1.0 playbin uri=file://$(pwd)/file_example_MP4_480_1_5MG.mp4

You can also play with debug options:

GST_DEBUG=2 gst-launch-1.0 filesrc location=yourfile.mp4 ! qtdemux name=demux \
demux.video_0 ! queue ! decodebin ! autovideosink \
demux.audio_0 ! queue ! decodebin ! autoaudiosink

      Video Editing: It can be used for non-linear video editing and processing.

Streaming: GStreamer is used in applications like IPTV, video conferencing, and live streaming.
One can use any of the following sample sources for test runs (a quick sketch using one of them follows this list):

      • videotestsrc: Generates test video patterns.
      • audiotestsrc: Generates test audio tones.
      • v4l2src: Captures video from a webcam (Linux).
      • ximagesrc: Captures the screen (Linux).
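A short self-contained test recording can be produced entirely from a test source; a minimal sketch, assuming the x264enc and mp4mux elements are installed (they ship with the standard plugin sets):

gst-launch-1.0 -e videotestsrc num-buffers=300 ! videoconvert ! x264enc ! mp4mux ! filesink location=test_pattern.mp4

num-buffers=300 makes videotestsrc send EOS after roughly 10 seconds at the default 30 fps, so the file is finalized automatically.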

      Embedded Systems: It is popular in embedded devices for multimedia playback and capture.

      GStreamer-1.8.1 rtsp server and client on ubuntu – Install and configuration for a RTSP Streaming server and Client

      Streaming / broadcasting Live Video call to non webrtc supported browsers and media players

       Attempts of streaming / broadcasting Live Video WebRTC call to non WebRTC supported browsers and media players such as VLC , ffplay , default video player in Linux etc .

      Code Snippets

      To list all packages of Gstreamer

      pkg-config --list-all | grep gstreamer
• gstreamer-gl-1.0 GStreamer OpenGL Plugins Libraries – Streaming media framework, OpenGL plugins libraries
• gstreamer-bad-video-1.0 GStreamer bad video library – Bad video library for GStreamer elements
• gstreamer-tag-1.0 GStreamer Tag Library – Tag base classes and helper functions
• gstreamer-bad-base-1.0 GStreamer bad base classes – Bad base classes for GStreamer elements
• gstreamer-net-1.0 GStreamer networking library – Network-enabled GStreamer plug-ins and clocking
• gstreamer-sdp-1.0 GStreamer SDP Library – SDP helper functions
• gstreamer-1.0 GStreamer – Streaming media framework
• gstreamer-bad-audio-1.0 GStreamer bad audio library, uninstalled – Bad audio library for GStreamer elements, Not Installed
• gstreamer-allocators-1.0 GStreamer Allocators Library – Allocators implementation
• gstreamer-player-1.0 GStreamer Player – GStreamer Player convenience library
• gstreamer-insertbin-1.0 GStreamer Insert Bin – Bin to automatically and insertally link elements
• gstreamer-plugins-base-1.0 GStreamer Base Plugins Libraries – Streaming media framework, base plugins libraries
• gstreamer-vaapi-glx-1.0 GStreamer VA-API (GLX) Plugins Libraries – Streaming media framework, VA-API (GLX) plugins libraries
• gstreamer-codecparsers-1.0 GStreamer codec parsers – Bitstream parsers for GStreamer elements
• gstreamer-base-1.0 GStreamer base classes – Base classes for GStreamer elements
• gstreamer-app-1.0 GStreamer Application Library – Helper functions and base classes for application integration
• gstreamer-vaapi-drm-1.0 GStreamer VA-API (DRM) Plugins Libraries – Streaming media framework, VA-API (DRM) plugins libraries
• gstreamer-check-1.0 GStreamer check unit testing – Unit testing helper library for GStreamer modules
• gstreamer-vaapi-1.0 GStreamer VA-API Plugins Libraries – Streaming media framework, VA-API plugins libraries
• gstreamer-controller-1.0 GStreamer controller – Dynamic parameter control for GStreamer elements
• gstreamer-video-1.0 GStreamer Video Library – Video base classes and helper functions
• gstreamer-vaapi-wayland-1.0 GStreamer VA-API (Wayland) Plugins Libraries – Streaming media framework, VA-API (Wayland) plugins libraries
• gstreamer-fft-1.0 GStreamer FFT Library – FFT implementation
• gstreamer-mpegts-1.0 GStreamer MPEG-TS – GStreamer MPEG-TS support
• gstreamer-pbutils-1.0 GStreamer Base Utils Library – General utility functions
• gstreamer-vaapi-x11-1.0 GStreamer VA-API (X11) Plugins Libraries – Streaming media framework, VA-API (X11) plugins libraries
• gstreamer-rtp-1.0 GStreamer RTP Library – RTP base classes and helper functions
• gstreamer-rtsp-1.0 GStreamer RTSP Library – RTSP base classes and helper functions
• gstreamer-riff-1.0 GStreamer RIFF Library – RIFF helper functions
• gstreamer-audio-1.0 GStreamer Audio library – Audio helper functions and base classes
• gstreamer-plugins-bad-1.0 GStreamer Bad Plugin libraries – Streaming media framework, bad plugins libraries
• gstreamer-rtsp-server-1.0 gst-rtsp-server – GStreamer based RTSP server

At the time of writing this article, GStreamer was at a fairly early 1.x version, which was newer than its then-stable 0.x series. Since then the library has been updated manyfold; release highlights for major versions were summarised as the blog was updated over time.

      Ex1 : Capture video from your webcam and save it as an MP4 file

      > v4l2-ctl --list-devices
      Integrated_Webcam_HD: Integrate (usb-0000:00:14.0-5):
      /dev/video0
gst-launch-1.0 -e v4l2src ! videoconvert ! x264enc ! mp4mux ! filesink location=webcam_output.mp4

Ex 2: Record Screen and Save as MP4

gst-launch-1.0 -e ximagesrc ! videoconvert ! x264enc ! mp4mux ! filesink location=screen_record.mp4

The -e flag makes gst-launch-1.0 forward an EOS on interrupt (Ctrl+C), so that mp4mux can finalize the file; without it the recording may be left unplayable after an interrupt.

Project: Making an IP surveillance system using GStreamer and Janus

      To build a turn-key easily deployable surveillance solution 

      Features :

1. Pairing of Android mobile with box
      2. Live streaming from Box to Android
      3. Video Recording inside the  box
      4. Auto parsing of recorded video around motion detection 
      5. Event listeners 
      6. 2 way audio
7. Inbuilt Media Control Unit
      8. Efficient use of bandwidth 
      9. Secure session while live-streaming

      Modules

      1. Authentication ( OTP / username- password)
      2. Livestreaming on Opus / vp8 
      3. Session Security and keepalives for live-streaming sessions
      4. Sync local videos to cloud storage 
      5. Record and playback with timeline and events 
      6. Parsing and restructuring video ( transcoding may also be required ) 
      7. Coturn server for NAT and ICE
      8. Web platform on box ( user interface )+ NoSQL
      9. Web platform on Cloud server ( Admin interface )+ NoSQL
      10.  REST APIs for third party add-ons ( Node based )
      11. Android demo app for receiving the live stream and feeds

Varying experiments and working GStreamer commands

      Local Network Stream 

      To create /dev/video0

      modprobe bcm2835-v4l2

To stream via the RTSP server using rpicamsrc and h264parse. Adjust the pipeline based on your specific requirements (e.g., resolution, bitrate, or codec).

./gst-rtsp-server-1.4.4/examples/test-launch --gst-debug=2 "( rpicamsrc num-buffers=5000 ! video/x-h264,width=1080,height=720,framerate=30/1 ! h264parse ! rtph264pay name=pay0 pt=96 )"

./test-launch "( tcpclientsrc host=127.0.0.1 port=5000 ! gdpdepay ! rtph264pay name=pay0 pt=96 )"
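On the receiving side, the stream published by test-launch can be played back with any RTSP client; a sketch assuming the gst-rtsp-server example's default port 8554 and mount point /test:

gst-launch-1.0 rtspsrc location=rtsp://<rpi-ip>:8554/test latency=200 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! autovideosink

or simply vlc rtsp://<rpi-ip>:8554/test.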

      pipe raspivid to tcpserversink

      raspivid -t 0 -w 800 -h 600 -fps 25 -g 5 -b 4000000 -vf -n -o - | gst-launch-1.0 -v fdsrc ! h264parse ! gdppay ! tcpserversink host=127.0.0.1 port=5000;

      Stream Video over local Network with 15 fps

      raspivid -n -ih -t 0 -rot 0 -w 1280 -h 720 -fps 15 -b 1000000 -o - | nc -l -p 5001

      streaming video over local network with 30FPS and higher bitrate

      raspivid -n -t 0 -rot 0 -w 1920 -h 1080 -fps 30 -b 5000000 -o - | nc -l -p 5001
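On the viewing machine, the raw H.264 elementary stream published through netcat above can be picked up and played back; a sketch using ffplay (-f h264 forces the raw H.264 demuxer, and - reads from stdin):

nc <rpi-ip> 5001 | ffplay -f h264 -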

      Recording

      Audio record to file
      Using arecord :

      arecord -D plughw:1 -c1 -r 48000 -f S16_LE -t wav -v file.wav;

Using pulse (start the PulseAudio daemon first):
pulseaudio -D

      gst-launch-1.0 -v pulsesrc device=hw:1 volume=8.0 ! audio/x-raw,format=S16LE ! audioconvert ! voaacenc bitrate=48000 ! aacparse ! flvmux ! filesink location = "testaudio.flv";

      Video record to file ( mpg)

gst-launch-1.0 -e rpicamsrc bitrate=500000 ! 'video/x-h264,width=640,height=480' ! mux. avimux name=mux ! filesink location=testvideo2.mpg;

      Video record to file ( flv )

      gst-launch-1.0 -e rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! flvmux ! filesink location="testvieo.flv";

      Video record to file ( h264)
gst-launch-1.0 -e rpicamsrc bitrate=500000 ! filesink location="raw3.h264";

      Video record to file ( mp4)

      gst-launch-1.0 -e rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! mp4mux ! filesink location=video.mp4;

      Audio + Video record to file ( flv)

gst-launch-1.0 -e \
rpicamsrc bitrate=500000 ! \
video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. \
pulsesrc volume=8.0 ! \
queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. \
flvmux name=muxout streamable=true ! filesink location='test44.flv';

      Audio + Video record to file ( flv) using pulsesrc

      gst-launch-1.0 -v --gst-debug-level=3 pulsesrc device="alsa_input.platform-asoc-simple-card.0.analog-stereo" volume=5.0 mute=FALSE ! audio/x-raw,format=S16LE,rate=48000,channels=1 ! audioresample ! audioconvert ! voaacenc ! aacparse ! flvmux ! filesink location="voicetest.flv";

      Audio + Video record to file (mp4)

gst-launch-1.0 -e \
rpicamsrc bitrate=500000 ! \
video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. \
pulsesrc volume=4.0 ! \
queue ! audioconvert ! voaacenc ! muxout. \
flvmux name=muxout streamable=true ! filesink location='test224.mp4';

      Streaming

Stream raw audio over RTMP to rtmpsink

gst-launch-1.0 pulsesrc device=hw:1 volume=8.0 ! \
audio/x-raw,format=S24LE ! audioconvert ! voaacenc bitrate=48000 ! aacparse ! flvmux ! rtmpsink location="rtmp://192.168.0.3:1935/live/test";

Stream AAC (aacparse) audio over RTMP to rtmpsink

      gst-launch-1.0 -v --gst-debug-level=3 pulsesrc device="alsa_input.platform-asoc-simple-card.0.analog-stereo" volume=5.0 mute=FALSE ! audio/x-raw,format=S16LE,rate=48000,channels=1 ! audioresample ! audioconvert ! voaacenc ! aacparse ! flvmux ! rtmpsink location="rtmp://www.altani.com:1935/voice/1/test";

      stream Video over RTMP

gst-launch-1.0 -e rpicamsrc bitrate=500000 ! \
video/x-h264,width=320,height=240,framerate=6/1 ! h264parse ! \
flvmux ! rtmpsink location='rtmp://52.66.125.31:1935/live/test live=1';

      stream Audio + video over RTMP from rpicamsrc , framerate 10

      gst-launch-1.0 rpicamsrc bitrate=500000 ! video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! muxout. pulsesrc volume=8.0 ! queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. flvmux name=muxout streamable=true ! rtmpsink location ='rtmp://www.altanai.com/live/test44';

      stream Audio + video over RTMP from rpicamsrc , framerate 30

      gst-launch-1.0 rpicamsrc bitrate=500000 ! video/x-h264,width=1280,height=720,framerate=30/1 ! h264parse ! muxout. pulsesrc ! queue ! audioconvert ! voaacenc bitrate=65536 ! aacparse ! muxout. flvmux name=muxout ! queue ! rtmpsink location ='rtmp://www.altanai.com/live/test44';

      VOD ( video On Demand )

      Stream h264 file over RTMP

gst-launch-1.0 -e filesrc location="raw3.h264" ! video/x-h264 ! h264parse ! flvmux ! rtmpsink location='rtmp://www.altanai.com/live/test';

      Stream flv file over RTMP

gst-launch-1.0 -e filesrc location="testvieo.flv" ! \
video/x-h264,width=320,height=240,framerate=10/1 ! h264parse ! \
flvmux ! rtmpsink location='rtmp://192.168.0.3:1935/live/test';

      Github Repo for Livestreaming

      https://github.com/altanai/Livestreaming

Debugging errors while running a GStreamer pipeline

Keep Ubuntu updated.

Keep GStreamer and its plugins updated:

      sudo apt update
      sudo apt install gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly
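After updating, it may help to confirm that the tools and the specific elements used in the pipelines above are actually present; for example:

gst-inspect-1.0 --version
gst-inspect-1.0 x264enc

gst-inspect-1.0 <element> prints the element's pads, caps and properties, or an error if the corresponding plugin is missing.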


      References :

      https://gstreamer.freedesktop.org

      crtmpserver + ffmpeg

This post will show the process of installing, running and using crtmpserver on an Ubuntu 64-bit machine with gstreamer.

      gcc and cmake

We shall build these directly from source. For this we first need to determine if gcc is installed on the machine.

If it is not installed, run the following command.

      GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project supporting various programming languages( C, C++, Objective-C, Fortran, Java, Ada, Go etc).

      sudo apt-get install build-essential
      

Once it is installed, it can be tested by printing the version (gcc --version).

cmake is a software build tool. It uses compiler-independent configuration files and generates native makefiles and workspaces that can be used in different compiler environments.

      Crtmpserver

To get the source code from git, install git first, then clone the project from https://github.com/j0sh/crtmpserver

sudo apt-get install git
git clone https://github.com/j0sh/crtmpserver.git
cd crtmpserver/builders/cmake
      

Next we create all the makefiles using cmake.

      cmake .
      


      Run make to do compilation

      make
      


Run using the following command. It should print out a list of ports and their respective functions.

      ./crtmpserver/crtmpserver crtmpserver/crtmpserver.lua
      

+---------------------------------------------------------------------------+
|                                  Services                                 |
+---+---------+------+------------------------+-----------------------------+
| c |   ip    | port | protocol stack name    | application name            |
+---+---------+------+------------------------+-----------------------------+
|tcp| 0.0.0.0 | 1112 | inboundJsonCli         | admin                       |
|tcp| 0.0.0.0 | 1935 | inboundRtmp            | appselector                 |
|tcp| 0.0.0.0 | 8081 | inboundRtmps           | appselector                 |
|tcp| 0.0.0.0 | 8080 | inboundRtmpt           | appselector                 |
|tcp| 0.0.0.0 | 6666 | inboundLiveFlv         | flvplayback                 |
|tcp| 0.0.0.0 | 9999 | inboundTcpTs           | flvplayback                 |
|tcp| 0.0.0.0 | 6665 | inboundLiveFlv         | proxypublish                |
|tcp| 0.0.0.0 | 8989 | httpEchoProtocol       | samplefactory               |
|tcp| 0.0.0.0 | 8988 | echoProtocol           | samplefactory               |
|tcp| 0.0.0.0 | 1111 | inboundHttpXmlVariant  | vptests                     |
+---+---------+------+------------------------+-----------------------------+

If you see the following types of errors while pushing a stream to crtmpserver, they just denote that your pipeline is not using the correct format.

/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:55524 -> 0.0.0.0:8080
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/http/basehttpprotocol.cpp:281 Headers section too long
/home/altanai/crtmpserver/sources/thelib/src/protocols/http/basehttpprotocol.cpp:153 Unable to read response headers: CTCP(16) <-> TCP(13) <-> [IHTT(14)] <-> IH4R(15)
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [IH4R(15)]
/home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [IH4R(15)] unregistered from application: appselector
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:44964 -> 0.0.0.0:9999
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/ts/inboundtsprotocol.cpp:211 I give up. I'm unable to detect the ts chunk size
/home/altanai/crtmpserver/sources/thelib/src/protocols/ts/inboundtsprotocol.cpp:136 Unable to determine chunk size
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [ITS(17)]
/home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [ITS(17)] unregistered from application: flvplayback
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:37754 -> 0.0.0.0:1935
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/rtmp/inboundrtmpprotocol.cpp:77 Handshake type not implemented: 85
/home/altanai/crtmpserver/sources/thelib/src/protocols/rtmp/basertmpprotocol.cpp:309 Unable to perform handshake
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [IR(19)]
/home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [IR(19)] unregistered from application: appselector
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:48368 -> 0.0.0.0:6666
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/inboundliveflvprotocol.cpp:51 _waitForMetadata: 1
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/baseliveflvappprotocolhandler.cpp:45 protocol CTCP(16) <-> TCP(20) <-> [ILFL(21)] registered to app flvplayback
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/inboundliveflvprotocol.cpp:102 Frame too large: 6324058
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [ILFL(21)]
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/baseliveflvappprotocolhandler.cpp:58 protocol [ILFL(21)] unregistered from app flvplayback
      

      ffmpeg

      Download and install ffmpeg from git

       git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg
      cd ffmpeg
      

Once the source code is obtained we need to configure, make and install it.
We need the following plugins for muxing and encoding, such as libx264 for H.264, so we configure with the following options:

./configure \
  --prefix="$HOME/ffmpeg_build" \
  --pkg-config-flags="--static" \
  --extra-cflags="-I$HOME/ffmpeg_build/include" \
  --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
  --bindir="$HOME/bin" \
  --enable-gpl \
  --enable-libass \
  --enable-libfreetype \
  --enable-libopus \
  --enable-libtheora \
  --enable-libvorbis \
  --enable-libx264 \
  --enable-libx265 \
  --enable-nonfree
      

Then make and make install:

      make
      sudo make install
      


In case of errors on the ffmpeg configure command, you need to install the respective missing / not-found library.

      libass

      sudo apt-get install libass-dev
      

      lamemp3

      sudo apt-get install libmp3lame-dev
      

      libaacplus

      sudo apt-get install autoconf
      sudo apt-get install libtool
      
      wget -O libaacplus-2.0.2.tar.gz http://tipok.org.ua/downloads/media/aacplus/libaacplus/libaacplus-2.0.2.tar.gz
      tar -xzf libaacplus-2.0.2.tar.gz
      cd libaacplus-2.0.2
      ./autogen.sh --with-parameter-expansion-string-replace-capable-shell=/bin/bash --host=arm-unknown-linux-gnueabi --enable-static
      
      make
      sudo make install
      

      libvorbis
A compressed audio format for mid to high quality (8kHz-48.0kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps/channel. It is in the same rank as MPEG-4 AAC.

      wget http://downloads.xiph.org/releases/vorbis/libvorbis-1.3.2.tar.bz2
      tar -zxvf libvorbis-1.3.2.tar.bz2
      cd libvorbis-1.3.2
./configure && make && make install
      

      libx264
A library for encoding video streams into the H.264/MPEG-4 AVC compression format, released under the terms of the GNU GPL.

      git clone git://git.videolan.org/x264
      cd x264
      ./configure --host=arm-unknown-linux-gnueabi --enable-static --disable-opencl
      make
      sudo make install
      

      libvpx
      libvpx is an emerging open video compression library which is gaining popularity for distributing high definition video content on the internet.

      sudo apt-get install checkinstall
      git clone https://chromium.googlesource.com/webm/libvpx
      cd libvpx
      ./configure
      make
sudo checkinstall --pkgname=libvpx --pkgversion="1:$(date +%Y%m%d%H%M)-git" --backup=no --deldoc=yes --fstrans=no --default
      

      librtmp
      librtmp provides support for the RTMP content streaming protocol developed by Adobe and commonly used to distribute content to flash video players on the web.

      sudo apt-get install libssl-dev
      cd /home/pi/src
      git clone git://git.ffmpeg.org/rtmpdump
      cd rtmpdump
      make SYS=posix
sudo checkinstall --pkgname=rtmpdump --pkgversion="2:$(date +%Y%m%d%H%M)-git" --backup=no --deldoc=yes --fstrans=no --default
      

      Reference:
      http://www.videolan.org/developers/x265.html
      https://trac.ffmpeg.org/wiki/CompilationGuide/RaspberryPi
      http://wiki.serviio.org/doku.php?id=howto:linux:install:raspbian
      http://lame.sourceforge.net/

Additionally, the "pkg-config --list-all" command lists all the installed libraries.


      RTMP streaming

1. Start the stream from the Linux machine using ffmpeg

      ffmpeg -f video4linux2 -s 320x240 -i /dev/video0 -f flv -s qvga -b 750000 -ar 11025 -metadata streamName=aaa "tcp://<hidden_ip>:6666/live";
      


2. View the incoming packets and stats on the terminal at crtmpserver

3. Playback the livestream from another machine

      using ffplay
      ffplay -i rtmp://server_ip:1935/live/ccc
      


      RTSP streaming

1. Start the RTSP stream from the Linux machine using ffmpeg

Here using resolution 320x240 and stream name test

      ffmpeg -f video4linux2 -s 320x240 -i /dev/video0 -an -r 10 -c:v libx264 -q 1 -f rtsp -metadata title=test rtsp://server_ip:5554/flvplayback
      

2. View the incoming packets and stats on the terminal at crtmpserver

3. Playback the livestream from another machine using

      ffplay

      ffplay rtsp://server_ip:5554/flvplayback/test
      


      VLC

      vlc rtsp://server_ip:5554/flvplayback/test
      

       

       

      GStreamer-1.8.1 rtsp server and client on ubuntu

      GStreamer is a streaming media framework, based on graphs of filters which operate on media data.

      Gstreamer is constructed using a pipes and filters architecture.
      The basic structure of a stream pipeline is that you start with a stream source (camera, screengrab, file etc) and end with a stream sink (screen window, file, network etc). In gst-launch syntax the ! operator links the elements together; the connection points on the elements are called pads.

      Data that flows through pads is described by caps (short for capabilities). Caps can be thought of as a mime-type (e.g. audio/x-raw, video/x-raw) along with a set of properties (e.g. width, height, depth).
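
      For example, a caps filter can be placed between two elements to force a particular raw video format. A minimal sketch, using elements that ship with gst-plugins-base/-good:

      gst-launch-1.0 videotestsrc ! video/x-raw,width=640,height=480,framerate=30/1 ! videoconvert ! autovideosink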

      Source Code

      Download the latest archives from https://gstreamer.freedesktop.org/src/

      Source code on git : https://github.com/GStreamer

      Primarily 4 archives are required

      1. gstreamer-1.8.1.tar.xz
      2. gst-plugins-base-1.8.1.tar.xz
      3. gst-plugins-good-1.8.1.tar.xz
      4. gst-rtsp-server-1.8.1.tar.xz

      If the destination machine is an EC2 instance, one can also scp the tar.xz files there.

      To extract the tar.xz files use tar -xf <filename>; it will create a folder for each package.

      Prerequisites

      build-essential

      sudo apt-get install build-essential

      bison

      flex
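
      Both bison and flex are available from the standard Ubuntu repositories:

      sudo apt-get install bison flex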

      GLib >= 2.40.0

      The GLib package contains low-level libraries useful for providing data structure handling for C, portability wrappers and interfaces for runtime functionality such as an event loop, threads, dynamic loading and an object system.

      sudo apt-get install libglib2.0-dev

      gstreamer

      Installing gstreamer 1.8.1. GStreamer creates a media stream from elements and properties, as will be shown in later sections of this tutorial.

      cd gstreamer-1.8.1
      ./configure
      make
      sudo make install
      


      After installation, export the library path

      export LD_LIBRARY_PATH=/usr/local/lib
      

      Then verify the gstreamer installation with

      gst-inspect-1.0
      

      gst-inspect-1.0 provides information on the installed gstreamer modules, i.e. it prints a long list (about 123 in my case) of installed plugins, such as coreelements:

      capsfilter: CapsFilter
      ximagesink: ximagesink: Video sink
      videorate: videorate: Video rate adjuster
      typefindfunctions: image/x-quicktime: qif, qtif, qti
      typefindfunctions: video/quicktime: mov, mp4
      typefindfunctions: application/x-3gp: 3gp
      typefindfunctions: audio/x-m4a: m4a
      typefindfunctions: video/x-nuv: nuv
      typefindfunctions: video/x-h265: h265, x265, 265
      typefindfunctions: video/x-h264: h264, x264, 264
      typefindfunctions: video/x-h263: h263, 263
      typefindfunctions: video/mpeg4: m4v
      typefindfunctions: video/mpeg-elementary: mpv, mpeg, mpg
      typefindfunctions: application/ogg: ogg, oga, ogv, ogm, ogx, spx, anx, axa, axv
      typefindfunctions: video/mpegts: ts, mts
      typefindfunctions: video/mpeg-sys: mpe, mpeg, mpg
      typefindfunctions: audio/x-gsm: gsm
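
      gst-inspect-1.0 can also be run against a single element to list its pads, caps and properties, e.g. for videotestsrc from gst-plugins-base:

      gst-inspect-1.0 videotestsrc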

      gst plugins

      Now build the plugins

      cd gst-plugins-base-1.8.1
      ./configure
      make
      sudo make install
      

       

      gst plugins good

      cd gst-plugins-good-1.8.1
      ./configure
       make
      sudo make install
      

      RTSP Server

      Now make and install the rtsp server

      cd gst-rtsp-server-1.8.1
      ./configure
      

      last few lines from console traces

      Configuration
      Version : 1.8.1
      Source code location : .
      Prefix : /usr/local
      Compiler : gcc -std=gnu99
      CGroups example : no

      make
      

      It will compile the examples .

      sudo make install
      

       

      stream video test src

      ~/mediaServer/gst-rtsp-server-1.8.1/examples]$ ./test-launch --gst-debug=0 "( videotestsrc ! video/x-raw,format=(yuv),width=352,height=288,framerate=15/1 ! x264enc ! rtph264pay name=pay0 pt=96 )"
      stream ready at rtsp://127.0.0.1:8554/test
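
      To play this stream back as a client, either ffplay (built earlier) or a playbin pipeline can be used. A minimal sketch:

      ffplay rtsp://127.0.0.1:8554/test
      gst-launch-1.0 playbin uri=rtsp://127.0.0.1:8554/test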
      

      Ref:

      Manual for developers : https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-rtsp-server/html/index.html


      Simplest pipeline

      gst-launch-1.0 fakesrc ! fakesink
      
      
      ➜ ~ gst-launch-1.0 fakesrc ! fakesink
      Setting pipeline to PAUSED ...
      Pipeline is PREROLLING ...
      Pipeline is PREROLLED ...
      Setting pipeline to PLAYING ...
      New clock: GstSystemClock

      To stop, press Ctrl+C:

      ^C handling interrupt. Interrupt: Stopping pipeline ...
      Execution ended after 0:00:48.004547887
      Setting pipeline to PAUSED ...
      Setting pipeline to READY ...
      Setting pipeline to NULL ...
      Freeing pipeline ...

      Or to display to an autovideosink:

      gst-launch-1.0 videotestsrc ! autovideosink

      To capture the webcam:

      gst-launch-1.0 v4l2src ! xvimagesink
      

      Wowza REST APIs and HTTP Providers

      This article shows the different ways to make calls to Wowza Media Engine from external applications and environments for various purposes, such as getting server status, listeners, connections, applications and their streams, etc.

      HTTP Providers

      HTTP Providers are Java classes that are configured on a per-virtual host basis.

      Some pre-packaged HTTP providers that return data in XML:

      1. HTTPConnectionCountsXML

      Returns connection information such as vhost, application, application instance, message-in byte rate, message-out byte rate, etc.

      http://[wowza-ip-address]:8086/connectioncounts


      2. HTTPConnectionInfo

      Returns detailed connection information.

      http://[wowza-ip-address]:8086/connectioninfo

      server=1

      3. HTTPServerVersion

      Returns the Wowza Media Server version and build number. It’s the default HTTP Provider on port 1935.

      url : http://[wowza-ip-address]:1935

      Wowza Streaming Engine 4 Monthly Edition 4.1.1 build13180

      4. HTTPLiveStreamRecord

      Gets the web interface to record online streams.

      url : http://[wowza-ip-address]:8086/livestreamrecord


      5. HTTPServerInfoXML

      Returns server and connection information

      url : http://[wowza-ip-address]:8086/serverinfo


      6. HTTPClientAccessPolicy

      It is used for fetching the Microsoft Silverlight clientaccesspolicy.xml from the conf folder.

      7. HTTPCrossdomain

      To get the Adobe Flash crossdomain.xml file from [install-dir]/conf folder.

      8. HTTPProviderMediaList

      Dynamic method for generating adaptive bitrate manifests and playlists from SMIL data.

      9. HTTPStreamManager

      The Stream Manager returns all applications and their streams in a web interface.

      url : http://[wowza-ip-address]:8086/streammanager


      10. HTTPTranscoderThumbnail

      Returns a bitmap image from the source stream being transcoded.

      url: http://[wowza-ip-address]:8086/transcoderthumbnail?application=[application-name]&streamname=[stream-name]&format=[jpeg or png]&size=[widthxheight]

      Each HTTP provider can be configured with a different request filter and authentication method (none, basic, digest). We can even create our own substitutes for the HTTP providers, as described in the next section.
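
      As a rough sketch (exact defaults vary by install), each provider is registered under the HostPort section of conf/VHost.xml with its class, request filter and authentication method, along these lines:

      <HTTPProvider>
      <BaseClass>com.wowza.wms.http.HTTPConnectionCountsXML</BaseClass>
      <RequestFilters>connectioncounts*</RequestFilters>
      <AuthenticationMethod>admin-digest</AuthenticationMethod>
      </HTTPProvider>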

      Extending HTTProvider2Base

      The following code snippet describes the process of creating a Wowza web service that returns a JSON object containing all the values.

      Imports to build a HTTP provider

      
      import com.wowza.wms.application.*;
      import com.wowza.wms.vhost.*;
      import com.wowza.wms.http.*;
      import com.wowza.wms.httpstreamer.model.*;

      // utility imports used by listChannels() below
      import java.util.List;
      import java.util.Iterator;
      import java.net.URLEncoder;
      import java.io.UnsupportedEncodingException;

      // since we want to return in json format
      import org.json.simple.JSONObject;
      
      

      The class declaration is as follows

      
      public class DCWS extends HTTProvider2Base
      {
      
      ....
      
      }
      
      

      The code to extract application names

      
      public JSONObject listChannels() {

          JSONObject obj = new JSONObject();

          // get the vhost names and iterate through them
          List<String> vhostNames = VHostSingleton.getVHostNames();
          Iterator<String> iter = vhostNames.iterator();
          while (iter.hasNext())
          {
              String vhostName = iter.next();
              IVHost vhost = (IVHost) VHostSingleton.getInstance(vhostName);
              List<String> appNames = vhost.getApplicationNames();
              Iterator<String> appNameIterator = appNames.iterator();

              int i = 0;
              while (appNameIterator.hasNext())
              {
                  String applicationName = appNameIterator.next();

                  try {
                      String key = "channel" + (++i);
                      obj.put(key, URLEncoder.encode(applicationName, "UTF-8"));
                  }
                  catch (UnsupportedEncodingException e) {
                      e.printStackTrace();
                  }
              }
          }
          return obj;
      }
      
      

      The code which responds to HTTP request

      TBD..
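
      A minimal sketch of such a handler, assuming the standard HTTProvider2Base override onHTTPRequest(IVHost, IHTTPRequest, IHTTPResponse) and the listChannels() helper defined earlier:

      public void onHTTPRequest(IVHost vhost, IHTTPRequest req, IHTTPResponse resp)
      {
          try {
              // serialize the application list built by listChannels() as the HTTP response body
              String json = listChannels().toJSONString();
              resp.setHeader("Content-Type", "application/json");
              resp.getOutputStream().write(json.getBytes("UTF-8"));
          } catch (Exception e) {
              e.printStackTrace();
          }
      }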


      Wowza RTMP Authentication with Third party Token provider over Tiny Encryption Algorithm (TEA)

      This article is focused on Wowza RTMP authentication with a third-party token provider over the Tiny Encryption Algorithm (TEA) and is a continuation of the previous post about setting up a basic RTMP authentication module on Wowza Engine version 4 and above.

      The task is divided into 3 parts .

      1. RTMP Encoder Application
      2. Wowza RTMP Auth module
      3. Third party Authentication Server


      The detailed explanation of the components is as follows:

      1.Wowza RTMP Auth module

      The Wowza server receives an RTMP stream URL in the format:

      rtmp://username:pass@wowzaip:1935/Application/streamname

      It considers username and pass to be the user credentials. The RTMP auth module invokes the getPassword() function inside the deployed application class, passing the username as a parameter. The username is then encrypted using TEA (Tiny Encryption Algorithm).

      TEA is a block cipher based on symmetric (private) key encryption. The input is a 64-bit block of plaintext or ciphertext together with a 128-bit key, producing the corresponding ciphertext or plaintext as output.

      The code for encryption  is

      
      TEA.encrypt( username, sharedSecret );
      
      

      The code to make a connection to third party auth server is

      
      url = new URL(serverTokenValidatorURL);

      URLConnection connection;
      connection = url.openConnection();
      connection.setDoOutput(true);

      OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
      out.write("clientid=" + TEA.encrypt(username, sharedSecret));
      out.close();
      

      The sharedSecret is the common key held by both the auth server and the Wowza server. It must be at least a 16-character alphanumeric / special-character key; an example of a shared secret is abcdefghijklmnop. The value can be stored as a property in the Application.xml file.

      <Property>
      <Name>secureTokenSharedSecret</Name>
      <Value><![CDATA[abcdefghijklmnop]]></Value>
      </Property>

      <Property>
      <Name>serverTokenValidatorURL</Name>
      <Value>http://127.0.0.1:8080/TokenProvider/authentication/token</Value>
      </Property>

      The value of serverTokenValidatorURL is the third-party auth server listening for REST POST requests.
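
      As a sketch, the two properties can be read from the application instance, for example in an onAppStart hook (the field names here are illustrative, not part of the Wowza API):

      // assumed fields in the auth module
      String sharedSecret;
      String serverTokenValidatorURL;

      public void onAppStart(IApplicationInstance appInstance)
      {
          // read the values configured in Application.xml
          WMSProperties props = appInstance.getProperties();
          sharedSecret = props.getPropertyStr("secureTokenSharedSecret");
          serverTokenValidatorURL = props.getPropertyStr("serverTokenValidatorURL");
      }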

      The code for receiving the incoming  resulting json data is

      
      ObjectMapper mapper = new ObjectMapper();
      JsonNode node = mapper.readTree(connection.getInputStream());
      node = node.get("publisherToken");
      String token = node.asText();
      String token2 = TEA.decrypt(token, sharedSecret);
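
      Putting the snippets together, a getPassword(String username) implementation along these lines is one way to wire it up (a sketch only; imports and field initialization as in the snippets above, error handling simplified):

      public String getPassword(String username)
      {
          try {
              // POST the TEA-encrypted username to the third-party auth server
              URL url = new URL(serverTokenValidatorURL);
              URLConnection connection = url.openConnection();
              connection.setDoOutput(true);

              OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
              out.write("clientid=" + TEA.encrypt(username, sharedSecret));
              out.close();

              // parse the JSON response and decrypt the returned publisherToken
              ObjectMapper mapper = new ObjectMapper();
              JsonNode node = mapper.readTree(connection.getInputStream());
              String token = node.get("publisherToken").asText();
              return TEA.decrypt(token, sharedSecret);
          } catch (Exception e) {
              e.printStackTrace();
              return null;
          }
      }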
      
      

      2.Third party Authentication Server

      The 3rd-party auth server stores the passwords for users or performs OAuth-based authentication. It uses the shared secret key to decrypt the token with TEA, as explained in the section above.

      The code to decrypt the incoming clientId

      
      TEA.decrypt(id, sharedSecret);
      
      

      Add your own custom logic to check files, databases, etc. to obtain the password corresponding to the username decrypted above.

      The code to encrypt the password for the user if one exists, or to send an invalid response if none exists, is

      
      try {
          String clientID = TEA.decrypt(id, sharedSecret);

          String token = findUserPassword(clientID);

          token = TEA.encrypt(token, sharedSecret);

          return "{\"publisherToken\":\"" + token + "\"}";

      } catch (Exception ex) {
          return "{\"error\":\"Invalid Client\"}";
      }
      
      


      Wowza Secure URL params Authentication for streams in an application

      To secure the publishers of a common application with a username-password pair specific to each stream name, this post is useful. It uses the Module Core Security to prompt the user for credentials.

      The detailed code below checks the RTMP query string for parameters and performs two checks: is the user allowed to connect, and is the user allowed to stream on the given stream name.

      Initialize the hashmap containing publisher clients and the IApplicationInstance:

      HashMap<Integer, String> publisherClients = null;
      IApplicationInstance appInstance = null;
      

      On app start, initialize the IApplicationInstance object.

      public void onAppStart(IApplicationInstance appInstance)
      {
          this.appInstance = appInstance;
      }
      

      onConnect is called when any publisher tries to connect to the media server. At this event, collect the username and client ID from the client and store them in publisherClients; if an entry already exists for that client, reject the connection.

      public void onConnect(IClient client, RequestFunction function, AMFDataList params)
      {
          AMFDataObj obj = params.getObject(2);
          AMFData data = obj.get("app");

          if (data.toString().contains("?")) {

              // app arrives as "appName?username=<user>"; extract the username
              String[] paramlist = data.toString().split("\\?");
              String[] userParam = paramlist[1].split("=");
              String userName = userParam[1];

              if (this.publisherClients == null) {
                  this.publisherClients = new HashMap<Integer, String>();
              }

              if (this.publisherClients.get(client.getClientId()) == null) {
                  this.publisherClients.put(client.getClientId(), userName);
              } else {
                  client.rejectConnection();
              }
          }
      }
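
      For illustration only (hypothetical host, application and stream names), with the parsing above a publisher would connect using a URL of roughly this shape, with the username riding as a query parameter on the application name:

      rtmp://[wowza-ip-address]:1935/myApplication?username=testUser   (stream name: myStream)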
      

      AMFDataItem is the class for marshalling data between the Wowza server and the Flash client.

      When the user starts to publish a stream after a successful connection, the publish function is called. It extracts the stream name from the client (function extractStreamName()) and checks whether the user is allowed to stream on the given stream name (function isStreamNotAllowed()).

      public void publish(IClient client, RequestFunction function, AMFDataList params)
      {
          String streamName = extractStreamName(client, function, params);
          if (isStreamNotAllowed(client, streamName))
          {
              sendClientOnStatusError(client, "NetStream.Publish.Denied", "Stream name not allowed for the logged in user: " + streamName);
              client.rejectConnection();
          }
          else {
              invokePrevious(client, function, params);
          }
      }
      

      onDisconnect is called when the publisher disconnects from the server; it removes the client from publisherClients.

      public void onDisconnect(IClient client)
      {
          if (this.publisherClients != null) {
              this.publisherClients.remove(client.getClientId());
          }
      }
      

      The function to extract a stream name is

      
      public String extractStreamName(IClient client, RequestFunction function, AMFDataList params)
      {
          String streamName = params.getString(PARAM1);
          if (streamName != null)
          {
              String streamExt = MediaStream.BASE_STREAM_EXT;

              String[] streamDecode = ModuleUtils.decodeStreamExtension(streamName, streamExt);
              streamName = streamDecode[0];
              streamExt = streamDecode[1];
          }

          return streamName;
      }
      

      The function to check if the stream name is allowed for the given user is

      
      public boolean isStreamNotAllowed(IClient client, String streamName)
      {
          WMSProperties localWMSProperties = client.getAppInstance().getProperties();
          // the property named after the user holds the stream name that user may publish
          String allowedStreamName = localWMSProperties.getPropertyStr(this.publisherClients.get(client.getClientId()));
          String sName = "";
          if (streamName.contains("?"))
              sName = streamName.substring(0, streamName.lastIndexOf("?"));
          else
              sName = streamName;
          return !sName.toLowerCase().equals(allowedStreamName.toLowerCase());
      }
      

      When adding the application to the Wowza server, make sure that ModuleCoreSecurity is present under Modules in Application.xml

      <Module>
      <Name>ModuleCoreSecurity</Name>
      <Description>Core Security Module for Applications</Description>
      <Class>com.wowza.wms.security.ModuleCoreSecurity</Class>
      </Module>
      
      
      
      
      
      

      Also ensure that the property securityPublishRequirePassword is present under Properties

      <Property>
      <Name>securityPublishRequirePassword</Name>
      <Value>true</Value>
      <Type>Boolean</Type>
      </Property>
      

      Add the user credentials as properties too. For example, to give access to testuser with password 123456 to stream on myStream, include the following:

      <Property>
      <Name>testUser</Name>
      <Value>myStream</Value>
      <Type>String</Type>
      </Property>
      

      Also include the mapping of user and password inside the conf/publish.password file

      # Publish password file (format [username][space][password])
      # username password

      testuser 123456