Video Codecs – H264 , H265 , AV1

This article discusses the widely adopted current standards for video codecs (compression/decompression), namely MPEG-2, H.264, H.265 and AV1.

MPEG 2

MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
generic coding of moving pictures and associated audio information
combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

better than MPEG 1

evolved out of the shortcomings of MPEG-1, such as an audio compression system limited to two channels (stereo), no standardized support for interlaced video (and poor compression when used for it), and only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited to higher-resolution video.

Application

  • over-the-air digital television broadcasting and in the DVD-Video standard.
  • TV stations, TV receivers, DVD players, and other equipment
  • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
  • XDCAM – professional file-based video recording format.
  • DVB – the DVB standard applies application-specific restrictions to MPEG-2 video.

H264

Advanced Video Coding (AVC), also known as H.264 or MPEG-4 AVC; in full, ITU-T H.264 / MPEG-4 Part 10 ‘Advanced Video Coding’
introduced in 2004

Better than MPEG2

40-50% bit rate reduction compared to MPEG-2

Support Up to 4K (4,096×2,304) and 59.94 fps
21 profiles ; 17 levels

Compression Model

Video compression relies on predicting motion between frames. The encoder compares different parts of a video frame with the subsequent frames to find the parts that are redundant, i.e. unchanged, such as background sections. These areas are replaced with a short piece of information that references the original pixels (inter-frame motion prediction), using a mathematical function and the direction of motion.
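
The idea of replacing a block with a reference into the previous frame can be sketched with a brute-force block-matching search. This is only an illustration under simplifying assumptions (tiny greyscale frames as nested lists, sum of absolute differences as the cost, a ±2 pixel search window); the helper names are hypothetical and real encoders use far more elaborate searches.

```python
def make_frame(top, left, size=8, square=4):
    """Build a dark size x size frame with one bright square at (top, left)."""
    return [[200 if top <= y < top + square and left <= x < left + square else 10
             for x in range(size)] for y in range(size)]


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_motion_vector(prev, cur, top, left, size=4, search=2):
    """Find the (dy, dx) offset into prev that best matches a block of cur."""
    target = get_block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best_vector = (0, 0)
    best_cost = sad(target, get_block(prev, top, left, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that would fall outside the previous frame.
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, get_block(prev, y, x, size))
                if cost < best_cost:
                    best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost


previous_frame = make_frame(2, 2)
current_frame = make_frame(2, 3)  # the square moved one pixel to the right
vector, cost = best_motion_vector(previous_frame, current_frame, top=2, left=3)
print(vector, cost)
```

Here the block is found one pixel to the left in the previous frame with zero residual, so the encoder would only need to store the vector instead of the 16 pixels.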

Hybrid spatial-temporal prediction model
Flexible partition of Macro Block(MB), sub MB for motion estimation
Intra Prediction (extrapolate already decoded neighbouring pixels for prediction)
Introduced multi-view extension
9 directional modes for intra prediction
Macro Blocks structure with maximum size of 16×16
Entropy coding is CABAC(Context-adaptive binary arithmetic coding) and CAVLC(Context-adaptive variable-length coding )

Applications

  • most deployed video compression standard
  • Delivers high definition video images over direct-broadcast satellite-based television services,
  • Digital storage media and Blu-Ray disc formats,
  • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems and DVB
  • Mobile video, media players, video chat

H265

High Efficiency Video Coding (HEVC), or H.265 or MPEG-H HEVC
video compression standard designed to substantially improve coding efficiency
stream high-quality videos in congested network environments or bandwidth constrained mobile networks
Jan 2013
product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

better than H264

overcome shortage of bandwidth, spectrum, storage
bandwidth savings of approx. 45% over H.264 encoded content

resolutions up to 8192×4320, including 8K UHD
Supports up to 300 fps
3 approved profiles, draft for additional 5 ; 13 levels
Whereas H.264 macroblocks span block sizes from 4×4 to 16×16, H.265 coding tree units (CTUs) can be as large as 64×64, which lets the codec compress large uniform areas more efficiently.

multiview encoding – a stereoscopic video coding extension that allows efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It exploits the large amount of statistical dependency between the views.

Compression Model

Enhanced Hybrid spatial-temporal prediction model
CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures
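
The variable sub-partitioning of a CTU can be pictured as a quadtree. The sketch below is an assumption-laden illustration, not the HEVC decision process: the `is_uniform` callback is a hypothetical stand-in for the encoder's rate-distortion decision, and the 64 / 8 size limits are simply the values the x265 log further down reports.

```python
def split_ctu(x, y, size, is_uniform, min_size=8):
    """Recursively split a square block into four quadrants.

    Splitting stops when the block is judged uniform or reaches min_size.
    Returns the leaf blocks as (x, y, size) tuples.
    """
    if size <= min_size or is_uniform(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_ctu(x + dx, y + dy, half, is_uniform, min_size)
    return leaves


def detail_in_top_left(x, y, size):
    """Hypothetical content test: only the top-left 16x16 corner has detail."""
    return not (x < 16 and y < 16)


leaves = split_ctu(0, 0, 64, detail_in_top_left)
for leaf in leaves:
    print(leaf)
```

Large flat areas stay as 32×32 or 16×16 blocks, while the detailed corner is split down to 8×8, which is the efficiency argument made above.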

Motion estimation – intra prediction with more modes, asymmetric partitions in inter prediction
Tiles – the individual rectangular regions that divide the image are independent of each other

Parallel processing – the decoding process can be split across multiple parallel threads, taking advantage of multi-core processors.

Wavefront Parallel Processing (WPP) – rows of CTUs are processed in parallel, each row starting once the first two CTUs of the row above are done, so the entropy coding context can still be inherited from the row above.
33 directional intra prediction modes, plus DC and planar prediction; Advanced Motion Vector Prediction (AMVP)
Entropy coding is only CABAC
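
As a rough illustration of wavefront parallel processing: in HEVC a CTU can start once its left neighbour and the CTU above and to the right are finished, so each row trails the row above by two CTUs. The sketch below assumes every CTU takes one time step and that a thread is available per row; both are simplifications.

```python
def wavefront_schedule(rows, cols):
    """Earliest start step of each CTU under the WPP dependency rule."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = []
            if c > 0:
                # Left neighbour in the same row must be finished.
                ready.append(start[r][c - 1] + 1)
            if r > 0:
                # Top-right neighbour (clamped at the last column) must be finished.
                ready.append(start[r - 1][min(c + 1, cols - 1)] + 1)
            start[r][c] = max(ready, default=0)
    return start


schedule = wavefront_schedule(rows=3, cols=5)
for row in schedule:
    print(row)
total_steps = schedule[-1][-1] + 1
print("parallel steps:", total_steps, "sequential steps:", 3 * 5)
```

With 3 rows of 5 CTUs the wavefront finishes in 9 steps instead of 15.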

Applications

  • cater to growing HD content for multi platform delivery
  • differentiated and premium 4K content

reduced bitrate enables broadcasters and OTT vendors to bundle more channels / content on existing delivery mediums
also provide greater video quality experience at same bitrate

Using ffmpeg for H265 encoding

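I took an H.264 file (640×480, 30 seconds long, 3,908,744 bytes, i.e. 3.9 MB on disk) and converted it using ffmpeg.

After conversion it was an HEVC (Parameter Sets in Bitstream) MPEG-4 movie of only 621 KB, with no visible loss of clarity. Note that CRF 28 is a lossy setting, and that the audio options in the command below went unused because the input has no audio stream.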

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4
ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers
  built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isomavc1
    creation_time   : 2019-06-23T04:58:13.000000Z
  Duration: 00:00:29.84, start: 0.000000, bitrate: 1047 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480, 1046 kb/s, 25 fps, 25 tbr, 25k tbn, 50k tbc (default)
    Metadata:
      creation_time   : 2019-06-23T04:58:13.000000Z
      handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
Codec AVOption b (set bitrate (in bits/s)) specified for output file #0 (output.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream.
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> hevc (libx265))
Press [q] to stop, [?] for help
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Mac OS X][clang 10.0.1][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-3 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(8 rows)
x265 [warning]: Source height < 720p; disabling lookahead-slices
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing deblock sao
Output #0, mp4, to 'output.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isomavc1
    encoder         : Lavf58.20.100
    Stream #0:0(und): Video: hevc (libx265) (hev1 / 0x31766568), yuv420p, 640x480, q=2-31, 25 fps, 12800 tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2019-06-23T04:58:13.000000Z
      handler_name    : h264@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-3+deb9u1
      encoder         : Lavc58.35.100 libx265
frame=  746 fps= 64 q=-0.0 Lsize=     606kB time=00:00:29.72 bitrate= 167.2kbits/s speed=2.56x
video:594kB audio:0kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 2.018159%
x265 [info]: frame I:      3, Avg QP:27.18  kb/s: 1884.53
x265 [info]: frame P:    179, Avg QP:27.32  kb/s: 523.32
x265 [info]: frame B:    564, Avg QP:35.17  kb/s: 38.69
x265 [info]: Weighted P-Frames: Y:5.6% UV:5.0%
x265 [info]: consecutive B-frames: 1.6% 3.8% 9.3% 53.3% 31.9%

encoded 746 frames in 11.60s (64.31 fps), 162.40 kb/s, Avg QP:33.25
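
To put numbers on the saving, the figures quoted above and in the log can be compared directly. This is a small sketch; the output size is approximated as 621,000 bytes from the "621 KB" figure, which is an assumption about how that number was rounded.

```python
def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (1 - after / before)


input_bytes = 3_908_744   # size of the H.264 source file
output_bytes = 621_000    # assumed from "621 KB"
input_kbps = 1047.0       # container bitrate reported for the input
output_kbps = 167.2       # bitrate reported on the final progress line

size_ratio = input_bytes / output_bytes
print(f"size ratio: {size_ratio:.1f}x")
print(f"size reduction: {reduction_percent(input_bytes, output_bytes):.1f}%")
print(f"bitrate reduction: {reduction_percent(input_kbps, output_kbps):.1f}%")
```

Both measures come out at roughly 84%, which is well beyond the typical figure quoted earlier; part of that comes from the CRF 28 quality target rather than from the codec alone.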

If you get an error like

Unknown encoder 'libx265'

then reinstall ffmpeg with H.265 (libx265) support.
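
One way to check for the encoder before converting is to look through the listing printed by `ffmpeg -encoders`. The sketch below assumes each encoder line has a flags column followed by the encoder name; `SAMPLE_LISTING` is a made-up excerpt used only for illustration, and the helper names are hypothetical.

```python
import shutil
import subprocess


def has_encoder(listing: str, name: str) -> bool:
    """True if `name` appears as the second column of any listing line."""
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_supports(name: str) -> bool:
    """Ask the locally installed ffmpeg (if any) whether it has an encoder."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return False
    result = subprocess.run([exe, "-hide_banner", "-encoders"],
                            capture_output=True, text=True, check=False)
    return has_encoder(result.stdout, name)


# Assumed shape of the listing, not captured from a real run.
SAMPLE_LISTING = """\
 V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... libx265              libx265 H.265 / HEVC (codec hevc)
"""

print(has_encoder(SAMPLE_LISTING, "libx265"))
```

Calling `ffmpeg_supports("libx265")` on the machine that will do the conversion then tells you whether a reinstall is needed.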

AV1

Realtime High quality video encoder
product of the Alliance for Open Media (AOM)
Contained by Matroska , WebM , ISOBMFF , RTP (WebRTC)

better than H265

AV1 is royalty free and overcomes the patent complexities around H265/HEVC

Applications

  • Video transmission over internet , voip , multi conference
  • Virtual / Augmented reality
  • self driving cars streaming
  • intended for use in HTML5 web video and WebRTC together with the Opus audio format

Real-time Transport Protocol (RTP) & RTP Control Protocol (RTCP)

In a VOIP system, where SIP is a signaling protocol , a SIP proxy never participates in the media flow, thus it is media agnostic.

SDP packets describing a session (codecs, open ports, media formats, etc.) are embedded in a SIP request such as INVITE.
After an SDP offer/answer exchange, RTP and RTCP ensure that the media streams flow between the endpoints.

RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services.

RTCP is the control protocol, which monitors data delivery and QoS in a manner scalable to large multicast networks and provides minimal control and identification functionality.

RTP (Real-time Transport Protocol)

A protocol framework
supports use of RTP-level translators and mixers.
independent of the underlying transport and network layers.
does not address resource reservation
does not guarantee quality-of-service for real-time services.
services like payload type identification, sequence numbering, timestamping and delivery monitoring.

[Figure: RTP packet via Wireshark]

The sequence numbers included in RTP allow the receiver to reconstruct the sender’s packet sequence.

Usage :
Multimedia multi-participant conferences
Storage of continuous data
Interactive distributed simulation
Active badge, control and measurement applications

UDP provides best-effort delivery of datagrams for point-to-point as well as for multicast communications.

SRTP (Secure Real-time Transport Protocol)

Provides confidentiality, message authentication, and replay protection for both unicast and multicast RTP and RTCP streams.
Security layer which resides between the RTP/RTCP application layer and the transport layer

SRTP Packet

The cryptographic context includes

  • session key: used directly in encryption/message authentication
  • master key: securely exchanged random bit string used to derive session keys
  • other working session parameters (master key lifetime, master key identifier and length, FEC parameters, etc.)

It must be maintained by both the sender and receiver of these streams.

“Salting keys” are used to protect against pre-computation and time-memory tradeoff attacks.

To learn more about SRTP specifically visit : https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/

RTP Session

In an RTP session, each participant maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP.

Real-Time Transport Protocol
    [Stream setup by SDP (frame 554)]
        [Setup frame: 554]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 .... = Extension: False
    .... 0000 = Contributing source identifiers count: 0
    0... .... = Marker: False
    Payload type: ITU-T G.711 PCMU (0)
    Sequence number: 39644
    [Extended sequence number: 39644]
    Timestamp: 2256601824
    Synchronization Source identifier: 0x78006c62 (2013293666)
    Payload: 7efefefe7efefe7e7efefe7e7efefe7e7efefe7e7efefe7e...
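
The fields in the capture above all come from the 12-byte fixed RTP header. As a minimal sketch, the header can be decoded with the standard `struct` module; the sample packet is rebuilt here from the values shown in the dump (only the first four payload bytes are included), and the function name is hypothetical.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and any CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    csrcs = struct.unpack(f"!{csrc_count}I", packet[12:12 + 4 * csrc_count])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(csrcs),
    }


# Header rebuilt from the Wireshark values above: V=2, PT=0 (PCMU), no CSRCs.
sample = struct.pack("!BBHII", 0x80, 0x00, 39644, 2256601824, 0x78006C62)
sample += bytes.fromhex("7efefefe")
header = parse_rtp_header(sample)
print(header)
```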

Synchronization source (SSRC)

32-bit numeric SSRC identifier for source of a stream of RTP packets.
All packets from a synchronisation source form part of the same timing and sequence number space, so a receiver groups packets by synchronisation source for playback.

Binding of the SSRC identifiers is provided through RTCP.
If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

Contributing source (CSRC)

A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer.
The mixer inserts a list of the SSRC identifiers of the sources , called CSRC list, that contributed to the generation of a particular packet into the RTP header of that packet.

An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).
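
The talker-indication case can be sketched from the mixer's side: it writes its own SSRC into the header and appends the SSRCs of the contributing talkers as the CSRC list. The SSRC values below are invented for the example and the function name is hypothetical.

```python
import struct


def build_mixer_header(mixer_ssrc, contributing_ssrcs, seq, timestamp,
                       payload_type=0):
    """Build an RTP header for a mixed packet, including its CSRC list."""
    count = len(contributing_ssrcs)
    if count > 15:
        raise ValueError("the CSRC count field is 4 bits, so at most 15 entries")
    first = (2 << 6) | count  # version 2, no padding, no extension
    header = struct.pack("!BBHII", first, payload_type & 0x7F,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, mixer_ssrc)
    return header + struct.pack(f"!{count}I", *contributing_ssrcs)


# Two talkers (made-up SSRCs) mixed into one outgoing packet.
mixed = build_mixer_header(0x11111111, [0xAAAA0001, 0xBBBB0002],
                           seq=1, timestamp=160)
print(mixed.hex())
```

A receiver reads the count from the low four bits of the first byte and then knows how many 32-bit CSRC entries follow the fixed header.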

RTSP (Real-Time Streaming Protocol)

a network control protocol
uses TCP to maintain an end-to-end connection
controls real-time streaming media applications such as live audio and HD video streaming
establishes a media session between RTSP endpoints (which can be RTSP media servers too) and initiates RTP streams to deliver the audio and video payload from the RTSP media servers to the clients.

RTCP (Real-Time Transport Control Protocol )

Real-time Transport Control Protocol (RTCP) is defined in RFC 3550 and is used to send control packets and QoS feedback to the participants in a call, alongside RTP, which carries the actual media packets.

  • Periodic transmission of control packet
  • Monitor data delivery on large multicast networks
  • Underlying protocol must provide multiplexing of the data and control packets
  • Provide feedback on the quality of the data distribution, for congestion control, fault diagnosis and control of adaptive encoding
  • Observe the number of participants, so that the rate of sending control packets can be scaled as the session grows
  • convey minimal session control information

RTCP often uses the next consecutive port after the RTP port, for example 20720 for RTP and 20721 for RTCP.

Types of RTCP packet

  1. SR: Sender report, for transmission and reception statistics from
    participants that are active senders
  2. RR: Receiver report, for reception statistics from participants
    that are not active senders and in combination with SR for
    active senders reporting on more than 31 sources
  3. SDES: Source description items, including CNAME
  4. BYE: Indicates end of participation
  5. APP: Application-specific functions

The SR: Sender Report RTCP Packet

[Figures: Sender Report RTCP packet; expanded Sender Report RTCP packet; SR report in RTCP]

Explanation for some attributes

  • fraction lost: 8 bits; the fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent
  • cumulative number of packets lost: 24 bits; the total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception
  • interarrival jitter: 32 bits; an estimate of the statistical variance of the RTP data packet interarrival time, measured in timestamp units
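
Two of these fields are simple running calculations at the receiver. The sketch below follows the formulas given in RFC 3550 (fraction lost as an 8-bit fixed-point value, jitter smoothed with a gain of 1/16); the function names and the sample numbers are made up for illustration.

```python
def fraction_lost(expected_interval, received_interval):
    """Packets lost in the last interval as a fraction of 256."""
    lost = expected_interval - received_interval
    if expected_interval == 0 or lost <= 0:
        return 0
    return (lost << 8) // expected_interval


def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """One step of the interarrival jitter estimate.

    `arrival` and `rtp_timestamp` must be in the same timestamp units.
    Returns the new jitter and the transit time to carry to the next packet.
    """
    transit = arrival - rtp_timestamp
    delta = abs(transit - prev_transit)
    return jitter + (delta - jitter) / 16.0, transit


print(fraction_lost(expected_interval=100, received_interval=75))
print(update_jitter(0.0, prev_transit=100, arrival=1300, rtp_timestamp=1184))
```

Losing 25 of 100 expected packets gives 64, which Wireshark would display as "64 / 256".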

RR: Receiver Report RTCP Packet

SDES: Source Description RTCP Packet

SDES items can contain

CNAME: Canonical End-Point Identifier SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    CNAME=1    |     length    | user and domain name        ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

NAME: User Name SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     NAME=2    |     length    | common name of source       ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

EMAIL: Electronic Mail Address SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    EMAIL=3    |     length    | email address of source     ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

PHONE: Phone Number SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PHONE=4    |     length    | phone number of source      ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

LOC: Geographic User Location SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     LOC=5     |     length    | geographic location of site ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TOOL: Application or Tool Name SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     TOOL=6    |     length    |name/version of source appl. ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

NOTE: Notice/Status SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     NOTE=7    |     length    | note about the source       ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

PRIV: Private Extensions SDES Item

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     PRIV=8    |     length    | prefix length |prefix string...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...             |                  value string               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
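
All of the SDES items above share the same type / length / text layout, so they can be written and read with a few lines of code. This is a sketch only: it treats every item as plain text (PRIV has extra internal structure that is ignored here), the example values are invented, and the helper names are hypothetical.

```python
SDES_TYPES = {"CNAME": 1, "NAME": 2, "EMAIL": 3, "PHONE": 4,
              "LOC": 5, "TOOL": 6, "NOTE": 7, "PRIV": 8}


def encode_sdes_item(kind: str, text: str) -> bytes:
    """Encode one SDES item as type (1 byte), length (1 byte), text."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("SDES item text is limited to 255 bytes")
    return bytes([SDES_TYPES[kind], len(data)]) + data


def decode_sdes_items(buf: bytes) -> list:
    """Decode consecutive SDES items until the END (0) marker."""
    items, offset = [], 0
    while offset < len(buf) and buf[offset] != 0:
        kind, length = buf[offset], buf[offset + 1]
        text = buf[offset + 2:offset + 2 + length].decode("utf-8")
        items.append((kind, text))
        offset += 2 + length
    return items


chunk = (encode_sdes_item("CNAME", "alice@example.com")
         + encode_sdes_item("TOOL", "demo 1.0")
         + b"\x00")
decoded = decode_sdes_items(chunk)
print(decoded)
```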

BYE: Goodbye RTCP Packet

APP: Application-Defined RTCP Packet

Intended for experimental use

Instance of RTCP sender and receiver reports on transmission and reception statistics

Real-time Transport Control Protocol (Receiver Report)
    [Stream setup by SDP (frame 4)]
        [Setup frame: 4]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 0001 = Reception report count: 1
    Packet type: Receiver Report (201)
    Length: 7 (32 bytes)
    Sender SSRC: 0x796dd0d6 (2037240022)
    Source 1
        Identifier: 0x00000000 (0)
        SSRC contents
            Fraction lost: 0 / 256
            Cumulative number of packets lost: 1
        Extended highest sequence number received: 6534
            Sequence number cycles count: 0
            Highest sequence number received: 6534
        Interarrival jitter: 0
        Last SR timestamp: 0 (0x00000000)
        Delay since last SR timestamp: 0 (0 milliseconds)
Real-time Transport Control Protocol (Source description)
    [Stream setup by SDP (frame 4)]
        [Setup frame: 4]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 0001 = Source count: 1
    Packet type: Source description (202)
    Length: 6 (28 bytes)
    Chunk 1, SSRC/CSRC 0x796DD0D6
        Identifier: 0x796dd0d6 (2037240022)
        SDES items
            Type: CNAME (user and domain) (1)
            Length: 8
            Text: 796dd0d6
            Type: NOTE (note about source) (7)
            Length: 5
            Text: telecomorg
            Type: END (0)
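
The "Length: 7 (32 bytes)" and "Length: 6 (28 bytes)" lines in the capture follow from the RTCP common header, whose length field counts 32-bit words minus one. A minimal sketch of reading that header and walking a compound packet is shown below; the packet bodies are zero-filled placeholders rather than the real capture, and the function names are hypothetical.

```python
import struct


def parse_rtcp_header(packet: bytes) -> dict:
    """Decode the 4-byte common header that starts every RTCP packet."""
    first, packet_type, length_words = struct.unpack("!BBH", packet[:4])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "count": first & 0x1F,
        "packet_type": packet_type,
        "length_bytes": (length_words + 1) * 4,
    }


def split_compound(datagram: bytes) -> list:
    """Return the header of each RTCP packet stacked in one datagram."""
    headers, offset = [], 0
    while offset + 4 <= len(datagram):
        header = parse_rtcp_header(datagram[offset:])
        headers.append(header)
        offset += header["length_bytes"]
    return headers


# A Receiver Report (201) followed by a Source Description (202).
receiver_report = struct.pack("!BBH", 0x81, 201, 7) + bytes(28)
source_description = struct.pack("!BBH", 0x81, 202, 6) + bytes(24)
headers = split_compound(receiver_report + source_description)
for header in headers:
    print(header)
```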

Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)

RTCP provides continuous feedback about the overall reception quality from all receivers, thereby allowing the sender(s) in the mid-term to adapt their coding scheme and transmission behaviour to the observed network quality of service (QoS).

RTP makes no provision for timely feedback that would allow a sender to repair the media stream immediately: through retransmissions, retroactive Forward Error Correction (FEC) control, or media-specific mechanisms for some video codecs, such as reference picture selection.

Components of RTCP based feedback

  • Status reports contained in sender report (SR)/receiver report (RR) packets, transmitted at regular intervals. Can also contain SDES
  • FB (feedback) messages, which indicate loss or reception of particular pieces of a media stream

Types of RTCP Feedback packet

Minimal compound RTCP feedback packet

minimize the size of the RTCP packet transmitted to convey feedback
maximize the frequency at which feedback can be provided
MUST contain only the mandatory information :

  • encryption prefix if necessary,
  • exactly one RR or SR,
  • exactly one SDES with only the CNAME item present, and
  • FB message(s)

Full compound RTCP feedback packet

MAY contain any number of additional RTCP packets

RTCP operation modes

  1. Immediate Feedback mode
  2. Early RTCP mode
  3. Regular RTCP Mode

The Application specific feedback threshold is a function of a number of parameters including (but not necessarily limited to):

  • type of feedback used (e.g., ACK vs. NACK),
  • bandwidth,
  • packet rate,
  • packet loss probability and distribution,
  • media type,
  • codec, and
  • (worst case or observed) frequency of events to report (e.g., frame received, packet lost).

To read on SRTP session with RTP/SAVP and crypto attributes , read https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/

Conference streaming

Simulcast

The client encodes the same audio/video stream twice, at different resolutions and bitrates, and sends both to a router, which then decides which receiver gets which stream.

Multicast Audio Conference

Assume obtaining a multicast group address and pair of ports. One port is used for audio data, and the other is used for control (RTCP) packets.
The audio conferencing application used by each conference participant sends audio data in small chunks of, say, 20 ms duration.
Each chunk of audio data is preceded by an RTP header; RTP header and data are in turn contained in a UDP packet.

The RTP header indicates what type of audio encoding (such as PCM, ADPCM or LPC) is contained in each packet so that senders can change the encoding during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or react to indications of network congestion.

Packet networks occasionally lose and reorder packets and delay them by variable amounts of time. The RTP header therefore contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source.
The sequence number can also be used by the receiver to estimate how many packets are being lost.
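
As a sketch of how a receiver might estimate loss from sequence numbers alone, the 16-bit counter first has to be extended so that it survives wrap-around and reordering. The version below is a simplification (duplicates are counted as received, and no validity checks are made on the source); the function names are hypothetical.

```python
def extend_seq(highest_extended: int, seq: int) -> int:
    """Map a 16-bit sequence number to the extended value nearest the highest seen."""
    candidate = (highest_extended & ~0xFFFF) | seq
    if candidate < highest_extended - 0x8000:
        candidate += 0x10000   # the counter wrapped forward
    elif candidate > highest_extended + 0x8000:
        candidate -= 0x10000   # a late packet from before the wrap
    return candidate


def estimate_loss(received_seqs):
    """Return (expected, received, lost) for sequence numbers in arrival order."""
    base = highest = received_seqs[0]
    for seq in received_seqs[1:]:
        highest = max(highest, extend_seq(highest, seq))
    expected = highest - base + 1
    received = len(received_seqs)
    return expected, received, expected - received


# Packet 65535 never arrives; packets 2 and 3 arrive out of order.
expected, received, lost = estimate_loss([65533, 65534, 0, 1, 3, 2])
print(expected, received, lost)
```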

For QoS, each instance of the audio application in the conference periodically multicasts a reception report plus the name of its user on the RTCP(control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings. In addition to the user name, other identifying information may also be included subject to control bandwidth limits.

A site sends the RTCP BYE packet when it leaves the conference.

Audio and Video Conference

Audio and video media are transmitted as separate RTP sessions, separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same distinguished (canonical) name in the RTCP packets for both so that the sessions can be associated.

Synchronized playback of a source’s audio and video is achieved using timing information carried in the RTCP packets for both sessions.

Layered Encodings

Because heterogeneous receivers have conflicting bandwidth requirements, multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion.
Rate adaptation can be done by combining a layered encoding with a layered transmission system.

In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Receivers can then adapt to network heterogeneity and control their reception bandwidth by joining only the appropriate subset of the multicast groups.
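
The receiver-side choice described above can be sketched as picking the longest prefix of layers that fits the available bandwidth, since each enhancement layer only makes sense on top of the ones below it. The layer bitrates here are invented numbers and the function name is hypothetical.

```python
def layers_to_join(layer_kbps, available_kbps):
    """Count how many layers (base layer first) fit in the available bandwidth.

    Returns the number of multicast groups to join and their total bitrate.
    """
    joined, total = 0, 0
    for rate in layer_kbps:
        if total + rate > available_kbps:
            break
        total += rate
        joined += 1
    return joined, total


# Base layer plus three enhancement layers, one multicast group each.
layers = [64, 128, 256, 512]
print(layers_to_join(layers, available_kbps=500))
print(layers_to_join(layers, available_kbps=50))
```

A receiver on a 500 kb/s link joins the first three groups (448 kb/s), while one on a 50 kb/s link cannot even take the base layer.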

Mixers , Translators and Monitors

Mixer

An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet.

Example: a mixer for high-speed to low-speed packet stream conversion. In a conference where a few participants are connected through a low-speed link while the others have high-speed links, instead of forcing a lower-bandwidth, reduced-quality audio encoding on everyone, an RTP-level relay called a mixer may be placed near the low-bandwidth area.
This mixer resynchronises incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream across the low-speed links.

All data packets originating from a mixer will be identified as having the mixer as their synchronization source.
The RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers.

Translator

An intermediate system that forwards RTP packets with their synchronization source identifier intact.

Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

Translator for a firewall that blocks IP packets

Some of the intended participants in the audio conference may be connected with high bandwidth links but might not be directly reachable via IP multicast, for reasons such as being behind an application-level firewall that will not let any IP packets pass. For these sites, mixing may not be necessary, in which case another type of RTP-level relay called a translator may be used.

Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through a secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site’s internal network.

Other cases :

A video mixer can scale the images of individual people in separate video streams and composite them into one video stream to simulate a group scene.

A translator can connect a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II, or perform packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.

Monitor

An application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis and long-term statistics.

Multiplexing RTP Sessions

In RTP, multiplexing is provided by the destination transport address (network address and port number), which is different for each RTP session (separate for audio and video). This helps in cases where there is a change in encodings or clock rates, in detecting packet loss, and in RTCP reporting.
Moreover, an RTP mixer would not be able to combine interleaved streams of incompatible media into one stream.

Interleaving packets with different RTP media types but using the same SSRC would introduce several problems.
But multiplexing multiple related sources of the same medium in one RTP session using different SSRC values is the norm for multicast sessions.

REMB ( Receiver Estimated Maximum Bitrate)

REMB is an RTCP message used to provide bandwidth estimation in order to avoid creating congestion in the network.
Support for this message is negotiated in the Offer/Answer SDP exchange.

It contains the total estimated available bitrate on the path to the receiving side of this RTP session (in mantissa + exponent format).
The sender uses it to configure the maximum bitrate of the video encoding.

It is also used to notify the sender of the available bandwidth in the network, and by media servers to limit the bitrate the sender is allowed to send.

In Chrome it is deprecated in favor of the new sender side bandwidth estimation based on RTCP Transport Feedback messages.
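
The mantissa + exponent format mentioned above can be illustrated with a short Python sketch. It assumes the field widths of the REMB draft as I understand it (a 6-bit exponent and an 18-bit mantissa, with bitrate = mantissa * 2^exponent); the function names are made up for this example.

```python
def encode_remb_bitrate(bitrate_bps: int) -> tuple:
    """Split a bitrate into (exponent, mantissa) so the mantissa fits in 18 bits."""
    if bitrate_bps < 0:
        raise ValueError("bitrate must be non-negative")
    exponent = 0
    mantissa = bitrate_bps
    while mantissa > 0x3FFFF:      # 18-bit maximum
        mantissa >>= 1             # drop the lowest bit, precision is lost here
        exponent += 1
    if exponent > 0x3F:            # 6-bit maximum
        raise ValueError("bitrate too large for the REMB field")
    return exponent, mantissa

def decode_remb_bitrate(exponent: int, mantissa: int) -> int:
    """Rebuild the bitrate in bits per second."""
    return mantissa << exponent

exp, man = encode_remb_bitrate(1_000_000)
print(exp, man, decode_remb_bitrate(exp, man))
```

Because low-order bits are shifted out during encoding, large bitrates are only approximated; the receiver's estimate is coarse anyway, so this loss is harmless in practice.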

Session Description Protocol (SDP) Capability Negotiation

SDP capability negotiation can be used to negotiate the use of one out of several possible transport protocols. The offerer uses the expected least-common-denominator (plain RTP) as the actual configuration, and the alternative transport protocols as the potential configurations.

m=audio 53456 RTP/AVP 0 18
a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF

plain RTP (RTP/AVP)
Secure RTP (RTP/SAVP)
RTP with RTCP-based feedback (RTP/AVPF)
Secure RTP with RTCP-based feedback (RTP/SAVPF)
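
To make the numbering in the a=tcap line concrete, here is a small Python sketch that parses the attribute into numbered transport capabilities. It assumes the convention that the number after "tcap:" applies to the first protocol and the following protocols are numbered consecutively; parse_tcap is a hypothetical helper, not part of any SDP library.

```python
def parse_tcap(line: str) -> dict:
    """Map transport capability numbers to protocol names for one a=tcap line."""
    prefix = "a=tcap:"
    if not line.startswith(prefix):
        raise ValueError("not a tcap attribute: %r" % line)
    first, *protocols = line[len(prefix):].split()
    start = int(first)
    # Capabilities are numbered consecutively from the first number
    return {start + i: proto for i, proto in enumerate(protocols)}

caps = parse_tcap("a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF")
for number, proto in caps.items():
    print(number, proto)
```

With the offer above, capability 1 is RTP/SAVPF, 2 is RTP/SAVP and 3 is RTP/AVPF, while plain RTP/AVP from the m= line remains the actual configuration.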

Adaptive bitrate control

Adapt the audio and video codec bitrates to the available bandwidth, and hence optimize audio and video quality.
For video, since the resolution is chosen only at the start, the encoder uses only the bitrate and frame-rate attributes to adapt at runtime.

An RTCP packet called TMMBR (Temporary Maximum Media Stream Bit Rate Request) is sent to the remote client.


SIP conferencing and Media Bridges

SIP is the most popular signalling protocol in the VoIP ecosystem. It is best suited to a caller-callee scenario, yet supporting scalable conferences over VoIP is a market demand. SIP is therefore expected not only to set up multimedia streams but also to provide conference control for building new and customisable communication and collaboration apps.

Role of SIP in conference involves

  • initiating confs
  • inviting participants
  • enabling them to join conf
  • leave conf
  • terminate conf
  • expel participants
  • configure media flow
  • control activities in conf

Centralised vs Mesh signalling for Multi participant conf

In a Centralised signalling model , all communication flows via a centralised control point

In a decentralised or mesh signalling structure , participants can communicate p2p

Unicast vs Multicast Media Distribution

Decentralised Media , Multi unicast streaming

Decentralised media , Multicast

Centralised Media / MCU

Conference types

1. Bridge

Centralised entity to book conf , start conf , leave conf . Therefore single point of failure potentially .

To create conf : conf created on a bridge URL , bridge registers on SIP Server, participants join the conf on the bridge using INVITES

To stop conf : either participant can Leave with BYE or conf can terminate by sending BYE to all

2. Endpoints as Mixer

Endpoints handle the streams, so media is decentralised; this makes the model suited to ad hoc conferences

mixer UAs cannot leave until the conf finishes

3. Mesh

complex, and more processing power is required on each UA

no single point of failure, but endpoints have to handle NAT traversal
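
The extra load of a mesh compared with a bridge can be estimated with simple counting. The Python sketch below assumes a full mesh where every pair of participants has one connection, and a bridge where every participant has one connection to the central entity; the function names are illustrative only.

```python
def mesh_connections(participants: int) -> int:
    """Connections in a full mesh: one per pair of participants."""
    return participants * (participants - 1) // 2

def bridge_connections(participants: int) -> int:
    """Connections with a central bridge: one per participant."""
    return participants

def mesh_streams_sent_per_endpoint(participants: int) -> int:
    """Each mesh endpoint sends its media to every other participant."""
    return max(participants - 1, 0)

for n in (3, 5, 10):
    print(n, mesh_connections(n), bridge_connections(n), mesh_streams_sent_per_endpoint(n))
```

The quadratic growth of mesh_connections is the reason a mesh needs more processing power and bandwidth on each UA as the conference grows.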

crtmpserver + ffmpeg

This post will show the process of installing , running and using crtmpserver on ubuntu 64 bit machine with gstreamer .

gcc and cmake

We shall build gstreamer directly from sources . For this we first need to determine if gcc is installed on the machine .

GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project supporting various programming languages (C, C++, Objective-C, Fortran, Java, Ada, Go etc).

If it is not installed, run the following command

sudo apt-get install build-essential

Once it is installed, it can be tested by printing the version (gcc --version)

Screenshot from 2016-06-09 11-24-33.png

cmake is a build configuration tool. It uses compiler-independent configuration files and generates native makefiles and workspaces that can be used in different compiler environments.

Crtmpserver

To get the source code from git, install git first. Then clone the project from https://github.com/j0sh/crtmpserver

sudo apt-get install git
git clone https://github.com/j0sh/crtmpserver.git
cd crtmpserver/builders/cmake

Next we create all the makefiles using cmake.

cmake .

Output should look as follows

Screenshot from 2016-06-09 11-47-05

Run make to do compilation

make

Screenshot from 2016-06-09 11-57-19

Run it using the following command. It should print out a list of ports and their respective functions

./crtmpserver/crtmpserver crtmpserver/crtmpserver.lua

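+-----------------------------------------------------------------------------+
|                                                                     Services|
+---+---------------+-----+-------------------------+-------------------------+
| c |      ip       | port|   protocol stack name   |    application name     |
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 1112|           inboundJsonCli|                    admin|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 1935|              inboundRtmp|              appselector|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 8081|             inboundRtmps|              appselector|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 8080|             inboundRtmpt|              appselector|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 6666|           inboundLiveFlv|              flvplayback|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 9999|             inboundTcpTs|              flvplayback|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 6665|           inboundLiveFlv|             proxypublish|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 8989|         httpEchoProtocol|            samplefactory|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 8988|             echoProtocol|            samplefactory|
+---+---------------+-----+-------------------------+-------------------------+
|tcp|        0.0.0.0| 1111|    inboundHttpXmlVariant|                  vptests|
+---+---------------+-----+-------------------------+-------------------------+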

If you see the following types of errors while pushing a stream to crtmpserver, they just denote that your pipe is not using the correct format.

/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:55524 -> 0.0.0.0:8080
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/http/basehttpprotocol.cpp:281 Headers section too long
/home/altanai/crtmpserver/sources/thelib/src/protocols/http/basehttpprotocol.cpp:153 Unable to read response headers: CTCP(16) <-> TCP(13) <-> [IHTT(14)] <-> IH4R(15)
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [IH4R(15)]
/home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [IH4R(15)] unregistered from application: appselector
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:44964 -> 0.0.0.0:9999
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/ts/inboundtsprotocol.cpp:211 I give up. I'm unable to detect the ts chunk size
/home/altanai/crtmpserver/sources/thelib/src/protocols/ts/inboundtsprotocol.cpp:136 Unable to determine chunk size
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [ITS(17)]
/home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [ITS(17)] unregistered from application: flvplayback
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:37754 -> 0.0.0.0:1935
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/rtmp/inboundrtmpprotocol.cpp:77 Handshake type not implemented: 85
/home/altanai/crtmpserver/sources/thelib/src/protocols/rtmp/basertmpprotocol.cpp:309 Unable to perform handshake
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [IR(19)]
/home/altanai/crtmpserver/sources/thelib/src/application/baseclientapplication.cpp:240 Protocol [IR(19)] unregistered from application: appselector
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpacceptor.cpp:154 Client connected: 127.0.0.1:48368 -> 0.0.0.0:6666
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/inboundliveflvprotocol.cpp:51 _waitForMetadata: 1
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:119 Handlers count changed: 11->12 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/baseliveflvappprotocolhandler.cpp:45 protocol CTCP(16) <-> TCP(20) <-> [ILFL(21)] registered to app flvplayback
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/inboundliveflvprotocol.cpp:102 Frame too large: 6324058
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/tcpcarrier.cpp:89 Unable to signal data available
/home/altanai/crtmpserver/sources/thelib/src/netio/epoll/iohandlermanager.cpp:129 Handlers count changed: 12->11 IOHT_TCP_CARRIER
/home/altanai/crtmpserver/sources/thelib/src/protocols/protocolmanager.cpp:45 Enqueue for delete for protocol [ILFL(21)]
/home/altanai/crtmpserver/sources/thelib/src/protocols/liveflv/baseliveflvappprotocolhandler.cpp:58 protocol [ILFL(21)] unregistered from app flvplayback

ffmpeg

Download and install ffmpeg from git

git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg
cd ffmpeg

Once the source code is obtained we need to configure, make and make install it.
We need external libraries for muxing and encoding, such as libx264 for H.264, so we configure with the following options

./configure \
  --prefix="$HOME/ffmpeg_build" \
  --pkg-config-flags="--static" \
  --extra-cflags="-I$HOME/ffmpeg_build/include" \
  --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
  --bindir="$HOME/bin" \
  --enable-gpl \
  --enable-libass \
  --enable-libfreetype \
  --enable-libopus \
  --enable-libtheora \
  --enable-libvorbis \
  --enable-libx264 \
  --enable-libx265 \
  --enable-nonfree

Then run make and make install

make
sudo make install

Screenshot from 2016-06-09 16-59-49

In case of errors from the ffmpeg configure command, you need to install the respective missing / not found library

libass

sudo apt-get install libass-dev

lamemp3

sudo apt-get install libmp3lame-dev

libaacplus

sudo apt-get install autoconf
sudo apt-get install libtool

wget -O libaacplus-2.0.2.tar.gz http://tipok.org.ua/downloads/media/aacplus/libaacplus/libaacplus-2.0.2.tar.gz
tar -xzf libaacplus-2.0.2.tar.gz
cd libaacplus-2.0.2
./autogen.sh --with-parameter-expansion-string-replace-capable-shell=/bin/bash --host=arm-unknown-linux-gnueabi --enable-static

make
sudo make install

libvorbis
Vorbis is a compressed audio format for mid to high quality (8kHz-48.0kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps/channel. It is in the same class as MPEG-4 AAC.

wget http://downloads.xiph.org/releases/vorbis/libvorbis-1.3.2.tar.bz2
tar -xjvf libvorbis-1.3.2.tar.bz2
cd libvorbis-1.3.2
./configure && make && make install

libx264
x264 is a library for encoding video streams into the H.264/MPEG-4 AVC compression format, and is released under the terms of the GNU GPL.

git clone git://git.videolan.org/x264
cd x264
./configure --host=arm-unknown-linux-gnueabi --enable-static --disable-opencl
make
sudo make install

libvpx
libvpx is an emerging open video compression library which is gaining popularity for distributing high definition video content on the internet.

sudo apt-get install checkinstall
git clone https://chromium.googlesource.com/webm/libvpx
cd libvpx
./configure
make
sudo checkinstall --pkgname=libvpx --pkgversion="1:$(date +%Y%m%d%H%M)-git" --backup=no --deldoc=yes --fstrans=no --default

librtmp
librtmp provides support for the RTMP content streaming protocol developed by Adobe and commonly used to distribute content to flash video players on the web.

sudo apt-get install libssl-dev
cd /home/pi/src
git clone git://git.ffmpeg.org/rtmpdump
cd rtmpdump
make SYS=posix
sudo checkinstall --pkgname=rtmpdump --pkgversion="2:$(date +%Y%m%d%H%M)-git" --backup=no --deldoc=yes --fstrans=no --default

Reference:
http://www.videolan.org/developers/x265.html
https://trac.ffmpeg.org/wiki/CompilationGuide/RaspberryPi
http://wiki.serviio.org/doku.php?id=howto:linux:install:raspbian
http://lame.sourceforge.net/

Additionally, the “pkg-config --list-all” command lists all the installed libraries.


RTMP streaming

1.start the stream from linux machine using ffmpeg

ffmpeg -f video4linux2 -s 320x240 -i /dev/video0 -f flv -s qvga -b 750000 -ar 11025 -metadata streamName=aaa "tcp://<hidden_ip>:6666/live";

Screenshot from 2016-06-11 17-50-02

2.view the incoming packets and stats on terminal at crtmpserver

Screenshot from 2016-06-11 17-53-22

3.playback the livestream from another machine

using ffplay
ffplay -i rtmp://server_ip:1935/live/ccc

Screenshot from 2016-06-09 15-43-58

RTSP streaming

1.start the rtsp stream from linux machine using ffmpeg

here using resolution 320×240 and stream name test

ffmpeg -f video4linux2 -s 320x240 -i /dev/video0 -an -r 10 -c:v libx264 -q 1 -f rtsp -metadata title=test rtsp://server_ip:5554/flvplayback


2.view the incoming packets and stats on terminal at crtmpserver

3.playback the livestream from another machine using

ffplay

ffplay rtsp://server_ip:5554/flvplayback/test

Screenshot from 2016-06-09 18-17-07

VLC

vlc rtsp://server_ip:5554/flvplayback/test

 

 

GStreamer-1.8.1 rtsp server and client on ubuntu

GStreamer is a streaming media framework, based on graphs of filters which operate on media data.

Gstreamer is constructed using a pipes and filter architecture.
The basic structure of a stream pipeline is that you start with a stream source (camera, screengrab, file etc) and end with a stream sink (screen window, file, network etc). Elements are connected through their pads, and in gst-launch syntax the ! links the source pad of one element to the sink pad of the next.

Data that flows through pads is described by caps (short for capabilities). Caps can be thought of as a mime-type (e.g. audio/x-raw, video/x-raw) along with a set of properties (e.g. width, height, depth).
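
To show how elements, the ! link and caps fit together textually, here is a small Python sketch that assembles a pipeline description string of the kind passed to gst-launch-1.0. It does not call GStreamer itself; build_pipeline and caps are hypothetical helper names used only for this illustration.

```python
def caps(media_type: str, **fields) -> str:
    """Format a caps string such as video/x-raw,width=352,height=288."""
    parts = [media_type] + ["%s=%s" % (key, value) for key, value in fields.items()]
    return ",".join(parts)

def build_pipeline(*stages: str) -> str:
    """Join elements and caps filters with the gst-launch link operator."""
    return " ! ".join(stage.strip() for stage in stages)

description = build_pipeline(
    "videotestsrc",                                                   # source element
    caps("video/x-raw", width=352, height=288, framerate="15/1"),     # caps filter
    "autovideosink",                                                  # sink element
)
print("gst-launch-1.0 " + description)
```

The printed command line can be pasted into a shell once GStreamer is installed, as described in the sections that follow.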

Source Code

Download the latest archives from https://gstreamer.freedesktop.org/src/

Source code on git : https://github.com/GStreamer

Primarily 3 files are required

  1. gstreamer-1.8.1.tar.xz
  2. gst-plugins-base-1.8.1.tar.xz
  3. gst-rtsp-server-1.8.1.tar.xz

If the destination machine is an EC2 instance, one can also scp the tar.xz files there

To extract the tar.xz files use tar -xf <filename> it will create a folder for each package.

Prerequisites

build-essential

sudo apt-get install build-essential

bison

flex

GLib >= 2.40.0

GLib package contains low-level libraries useful for providing data structure handling for C, portability wrappers and interfaces for such runtime functionality as an event loop, threads, dynamic loading and an object system.

sudo apt-get install libglib2.0-dev

gstreamer

Installing gstreamer 1.8.1. GStreamer creates a media stream with elements and properties, as will be shown in later sections of this tutorial.

cd gstreamer-1.8.1
./configure
make
sudo make install


after installation  export the path

export LD_LIBRARY_PATH=/usr/local/lib

then verify the installation of the gstreamer by

gst-inspect-1.0

This prints information on the installed GStreamer modules, i.e. a long list (about 123 entries in my case) of installed plugins and their elements, such as:

coreelements: capsfilter: CapsFilter
ximagesink: ximagesink: Video sink
videorate: videorate: Video rate adjuster
typefindfunctions: image/x-quicktime: qif, qtif, qti
typefindfunctions: video/quicktime: mov, mp4
typefindfunctions: application/x-3gp: 3gp
typefindfunctions: audio/x-m4a: m4a
typefindfunctions: video/x-nuv: nuv
typefindfunctions: video/x-h265: h265, x265, 265
typefindfunctions: video/x-h264: h264, x264, 264
typefindfunctions: video/x-h263: h263, 263
typefindfunctions: video/mpeg4: m4v
typefindfunctions: video/mpeg-elementary: mpv, mpeg, mpg
typefindfunctions: application/ogg: ogg, oga, ogv, ogm, ogx, spx, anx, axa, axv
typefindfunctions: video/mpegts: ts, mts
typefindfunctions: video/mpeg-sys: mpe, mpeg, mpg
typefindfunctions: audio/x-gsm: gsm

gst plugins

Now build the plugins

cd gst-plugins-base-1.8.1
./configure
make
sudo make install

 

gst plugins good

cd gst-plugins-good-1.8.1
./configure
make
sudo make install

RTSP Server

Now make and install the rtsp server

cd gst-rtsp-server-1.8.1
./configure

last few lines from console traces

Configuration
Version : 1.8.1
Source code location : .
Prefix : /usr/local
Compiler : gcc -std=gnu99
CGroups example : no

make

It will compile the examples .

sudo make install

 

stream video test src

~/mediaServer/gst-rtsp-server-1.8.1/examples$ ./test-launch --gst-debug=0 "( videotestsrc ! video/x-raw,format=(yuv),width=352,height=288,framerate=15/1 ! x264enc ! rtph264pay name=pay0 pt=96 )"
stream ready at rtsp://127.0.0.1:8554/test

Ref:

Manual for developers : https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-rtsp-server/html/index.html


Simplest pipeline

gst-launch-1.0 fakesrc ! fakesink

➜ ~ gst-launch-1.0 fakesrc ! fakesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock

To stop, press Ctrl+C

^Chandling interrupt.
Interrupt: Stopping pipeline ...
Execution ended after 0:00:48.004547887
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

Or, to display a test pattern on a video sink

gst-launch-1.0 videotestsrc ! autovideosink

To capture from a webcam

gst-launch-1.0 v4l2src ! xvimagesink

Screenshot from 2016-05-20 13-06-56.png

Wowza RTMP Authentication with Third party Token provider over Tiny Encryption Algorithm (TEA)

This article is focused on Wowza RTMP authentication with a third party token provider over the Tiny Encryption Algorithm (TEA), and is a continuation of the previous post about setting up a basic RTMP Authentication module on Wowza Engine version 4 and above.

The task is divided into 3 parts .

  1. RTMP Encoder Application
  2. Wowza RTMP Auth module
  3. Third party Authentication Server

The component diagram is as follows :

Copy of Publisher App iOS

The detailed explanation of the components is:

1.Wowza RTMP Auth module

The Wowza Server receives a rtmp stream url in the format as :

rtmp://username:pass@wowzaip:1935/Application/streamname

It considers the username and pass to be user credentials . RTMP auth Module invokes the getPassword() function inside of deployed application class  passing the username as parameter.  The username is then  encrypted using TEA ( Tiny Encryption algorithm)
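
To illustrate which parts of the publish URL carry the credentials, the following Python sketch splits a URL of the format above with the standard library. It only demonstrates the URL structure and is not how Wowza itself parses the request; parse_publish_url and the sample values are assumptions made for this example.

```python
from urllib.parse import urlsplit

def parse_publish_url(url: str) -> dict:
    """Split rtmp://username:pass@host:port/Application/streamname into its parts."""
    parts = urlsplit(url)
    application, _, stream_name = parts.path.lstrip("/").partition("/")
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "application": application,
        "stream_name": stream_name,
    }

info = parse_publish_url("rtmp://test:1234@wowzaip:1935/RTMPAuthSampleCode/mystream")
print(info["username"], info["application"], info["stream_name"])
```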

TEA is a block cipher based on symmetric (private) key encryption. It takes a 64-bit block of plain or cipher text and a 128-bit key, and outputs a 64-bit block of cipher or plain text respectively.
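
For readers who want to see the 64-bit block and 128-bit key in action, below is a minimal Python sketch of the standard TEA round function (32 rounds, constant 0x9E3779B9). It works on a single 8-byte block; the big-endian byte order and the function names are assumptions of this sketch, and the TEA.encrypt(String, String) helper used in this article may pad and encode its input differently.

```python
import struct

DELTA = 0x9E3779B9   # TEA key schedule constant
MASK = 0xFFFFFFFF    # keep values within 32 bits
ROUNDS = 32

def tea_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Encrypt one 8-byte block with a 16-byte key."""
    if len(block) != 8 or len(key) != 16:
        raise ValueError("TEA needs an 8-byte block and a 16-byte key")
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return struct.pack(">2I", v0, v1)

def tea_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Decrypt one 8-byte block by running the rounds in reverse."""
    if len(block) != 8 or len(key) != 16:
        raise ValueError("TEA needs an 8-byte block and a 16-byte key")
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return struct.pack(">2I", v0, v1)

key = b"abcdefghijklmnop"          # 16 bytes = 128-bit shared secret
cipher = tea_encrypt_block(b"username", key)
print(cipher.hex(), tea_decrypt_block(cipher, key))
```

A username longer than 8 bytes would have to be padded and split into several blocks, which is the job of a wrapper such as the TEA class used in the code that follows.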

The code for encryption  is


TEA.encrypt( username, sharedSecret );

The code to make a connection to third party auth server is


// Uses java.net.URL, java.net.URLConnection and java.io.OutputStreamWriter
URL url = new URL(serverTokenValidatorURL);

URLConnection connection = url.openConnection();
connection.setDoOutput(true); // send a request body (POST)

OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
out.write("clientid=" + TEA.encrypt(username, sharedSecret));
out.close();

The shared secret is the common key held by both the auth server and the Wowza server. It must be at least 16 characters long (alphanumeric or special characters). An example of a shared secret is abcdefghijklmnop. The value can be stored as a property in the Application.xml file.

<Property>
<Name>secureTokenSharedSecret</Name>
<Value><![CDATA[abcdefghijklmnop]]></Value>
</Property>

<Property>
<Name>serverTokenValidatorURL</Name>
<Value>http://127.0.0.1:8080/TokenProvider/authentication/token</Value>
</Property>

The value of serverTokenValidatorURL is the address of the third party auth server listening for REST POST requests.

The code for receiving the incoming  resulting json data is


ObjectMapper mapper = new ObjectMapper();
JsonNode node = mapper.readTree(connection.getInputStream());
node = node.get("publisherToken");
String token = node.asText();
// Decrypt the token with the same shared secret
String token2 = TEA.decrypt(token, sharedSecret);

2.Third party Authentication Server

The 3rd party Auth server stores the passwords for users or performs oauth based authentication . It uses a shared secret key to decrypt the token based on TEA as explained in above section .

The code to decrypt the incoming clientId


TEA.decrypt(id, sharedSecret);

Add own custom logic to check files , databases etc for obtaining the password corresponding to the username as decrypted above.

The code to encrypt the password for the user if it exists, or to send an invalid response if it does not, is


try {
    // Recover the username sent by the Wowza module
    String clientID = TEA.decrypt(id, sharedSecret);

    // findUserPassword is the custom lookup described above
    String token = findUserPassword(clientID);
    token = TEA.encrypt(token, sharedSecret);

    return "{\"publisherToken\":\"" + token + "\"}";
} catch (Exception ex) {
    return "{\"error\":\"Invalid Client\"}";
}

The final callflow thus becomes :

Copy of Publisher App iOS (1)


Wowza RTMP Authenticate Module

The purpose of this article is to use the RTMP Authentication Module in Wowza Engine. This will enable us to intercept a connect request so that the username and password can be checked against any outside source such as a database, password file, third party token provider, third party OAuth etc. Once the password provided by the user is verified against the authentic password from the external source, the user is allowed to connect and publish.

Step 1 : Create a new Wowza Media Server Project in Eclipse .  It is assumed that user has already integrated WowzaIDE into eclipse .

File -> New -> Wowza Media Server Project  

Step 2: Give any project name . I named it as “RTMPAuthSampleCode”.

wowza RTMP Auth

Step 3 :   Point the location to existing Wowza Engine installed in local environment .

It is usually in /usr/local/WowzaStreamingEngine/

Wowza RTMP Auth

Step 4 : Proceed with the creation , uncheck the event methods as we are not using them right now .

Screenshot from 2015-09-17 13:10:24

Step 5: Put the code in class.

The class RTMPAuthSampleCode extends AuthenticateUsernamePasswordProviderBase. It is mandatory to define getPassword(String username) and userExists(String username). ModuleRTMPAuthenticate will invoke getPassword for connection requests from users.

Screenshot from 2015-09-17 13:11:58

We can add any source of obtaining password for a given username which will be matched to the password supplied by user . If it matches he will be granted access otherwise we can return null or error message .

We may use various ways of obtaining user credentials, like a database, password files, a third party token provider etc. I will be discussing more ways to do RTMP authentication, especially using a third party token provider with TEA.encrypt and a shared secret, in the next blog.

Step 6: Build the project and Run.

Project-> Build the Project 

Run -> Run Configurations … -> WowzaMediaServer_RTMPAuthSampleCode

To run the modules on my 64-bit Ubuntu 14.04 system, I also needed to provide

-Dcom.wowza.wms.native.base="linux" inside the VM Arguments. It is highlighted in the figure below.

Screenshot from 2015-09-17 13:12:23

Step 7: Click Run to start the wowza Media Engine

Step 8 : Open the Manager Console of Wowza.

The Manager is a web-based GUI for managing the applications and checking for incoming streams. The manager script can be started with

sudo /usr/local/WowzaStreamingEngine/manager/bin/startmgr.sh

The console can be opened at http://127.0.0.1:8088

Screenshot from 2015-09-17 13:53:58

Also you can see that RTMPAuthSampleCode.jar would have been copied to /usr/local/WowzaStreamingEngine/lib folder.

Step 9: Add module to applications

Add folder “RTMPAuthSampleCode” inside /usr/local/WowzaStreamingEngine/applications folder .

Step 10 : Add conf

Add folder “RTMPAuthSampleCode” inside /usr/local/WowzaStreamingEngine/conf  folder

Copy paste Application.xml from conf folder inside RTMPAuthSampleCode folder and make the following changes .

Add the ModuleRTMPAuthenticate module to Modules

<Module>
     <Name>ModuleRTMPAuthenticate</Name>
     <Description>ModuleRTMPAuthenticate</Description>
     <Class>com.wowza.wms.security.ModuleRTMPAuthenticate</Class>
</Module>

and comment ModuleCoreSecurity

<!--    <Module>
     <Name>ModuleCoreSecurity</Name>
     <Description>Core Security Module for Applications</Description>
     <Class>com.wowza.wms.security.ModuleCoreSecurity</Class>
</Module> -->

Step 11: Add property usernamePasswordProviderClass to Properties .

The Properties section is usually present inside Application, at the bottom of the Application.xml file

<Property>
<Name>usernamePasswordProviderClass</Name>
<Value>com.wowza.wms.example.authenticate.RTMPAuthSampleCode</Value>
</Property>

Step 12 : Make Authentication.xml file inside /usr/local/WowzaStreamingEngine/conf folder.

Note that from Wowza 4 onwards, Authentication.xml comes bundled inside wms-server.jar, which is in the lib folder. However, for me the program froze without an explicit Authentication.xml file, and using my own simple Authentication.xml gave problems with the digest. Hence follow the process below to get a working Authentication.xml file inside the conf folder.

Expand the wms-server.jar archive and, inside the extracted folder wms-server, copy the file wms-server/com/wowza/wms/conf/Authentication.xml to /usr/local/WowzaStreamingEngine/conf.

Step 13 : Restart Wowza Media Engine .

Step 14 : Use any RTMP encoder as Adobe Live Media Encoder or Gocoder or your own app ( could not use this with ffmpeg ) and  try to connect to application RTMPAuthSampleCode with username test and password 1234.

Step 15 : Observe the logs for incoming streams and traces from getPassword.

 If you want the user test to have permission to publish stream to this application then return 1234 from getPassword else return null .

References :

  1. Media security overview
    http://www.wowza.com/forums/content.php?115-MediaSecurity-AddOn-Package-(SecureToken-RTMP-RTSP-Authentication-and-more
  2. How to integrate Wowza user authentication with external authentication systems (ModuleRTMPAuthenticate)
    http://www.wowza.com/forums/content.php?236-How-to-integrate-Wowza-user-authentication-with-external-authentication-systems-%28ModuleRTMPAuthenticate%29
  3. How to enable username/password authentication for RTMP and RTSP publishing
    http://www.wowza.com/forums/content.php?449-How-to-enable-username-password-authentication-for-RTMP-and-RTSP-publishing
  4. configuration ref 4.2 http://www.wowza.com/resources/WowzaStreamingEngine_ConfigurationReference.pdf

WebRTC Live Stream Broadcast

WebRTC has the potential to drive the Live Streaming broadcasting area with its powerful no plugin , no installation , open standard  policy . However the only roadblock is the VP8 codec which differs from the traditional H264 codec that is used by almost all the media servers , media control units , etc .

This post is the first in a series on building a WebRTC based broadcasting solution. Note that a p2p session differs from a broadcasting session: a peer-to-peer session involves bidirectional media streaming, whereas broadcasting involves only unidirectional media flow.

Scalable Broadcasting and Live streaming alternatives

1. WebRTC multi peers

Since WebRTC is p2p technology , it is convenient to build a  network of webrtc client viewers which can pass on the stream to 3 other peers in different session. In this fashion a fission chain like structure is created where a single stream originated to first peer is replicated to 3 others which is in turn replicated to 9 peers etc .

WebRTC Scalable Streaming Server -WebRTC multi peers

Advantages :

  1. Scalable without the investment of media servers
  2. No additional space required at service providers network .

Disadvantage :

  1. All the downstream clients of a node get disconnected if that single node breaks.
  2. Since sessions are dynamically created, it is difficult to maintain a map with a fallback option in case of service disruption at any single node.
  3. Each client incurs a bandwidth load of 2 Mbps incoming (the stream from its upstream peer) and 6 Mbps outgoing (for its 3 connected peers).
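
The growth of the fission chain and the per-client bandwidth quoted above can be worked out with a few lines of Python. This sketch assumes every peer relays to exactly 3 others and that one stream costs 2 Mbps, as in the text; the function names are illustrative.

```python
def viewers_at_level(level: int, fan_out: int = 3) -> int:
    """Number of peers at a given depth of the tree (level 0 is the first peer)."""
    return fan_out ** level

def total_peers(depth: int, fan_out: int = 3) -> int:
    """All peers from level 0 down to the given depth."""
    return sum(fan_out ** level for level in range(depth + 1))

def peer_bandwidth_mbps(stream_mbps: int = 2, fan_out: int = 3) -> tuple:
    """(incoming, outgoing) load on a relaying peer."""
    return stream_mbps, stream_mbps * fan_out

for level in range(4):
    print(level, viewers_at_level(level), total_peers(level))
print(peer_bandwidth_mbps())
```

The audience grows geometrically with depth, but so does the number of viewers lost when a node near the root drops out, which is the first disadvantage listed above.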

2. Torrent based WebRTC chain

To overcome the shortcomings of the previous tree based broadcasting approach, it is suggested to use a chained broadcasting mechanism.

WebRTC Scalable Streaming Server v1 (4)
WebRTC Scalable Streaming Server- Single chain connection

To improve this mechanism and increase efficiency for slow-bandwidth connections, we can stop their outgoing stream, converting them into consumers only. This way the connections are mapped and arranged so that every alternate peer is connected to 2 peers for stream replication. The slow-bandwidth clients can be attached as independent endpoints.

WebRTC Scalable Streaming Server v1 (3)

3. WebRTC Relay nodes for multiple peers

The aim here is to build a carrier-grade WebRTC stream broadcasting platform, which is capable of using WebRTC's mediastream and peerconnection APIs, along with repeaters, to make a scalable broadcasting / live streaming solution using socketio for behavior control and signalling.

Algorithm:

At the Publisher's end

1. Call getUserMedia to capture the local audio and video.
2. Start room “liveConf”.
3. Add the outgoing stream to session “liveConf” with peer “BR” as a one-way transport.

1 outgoing audio stream -> 1 Mbps on 1 RTP port
1 outgoing video stream -> 1 Mbps on 1 more RTP port
Total required: 2 Mbps and 2 RTP ports

At the Repeater layer (high upload and download bandwidth)

4. Peer “BR” opens parallel rooms “liveConf_1”, “liveConf_2”, and so on with 4 other peers “Repeater1”, “Repeater2”, and so on.
5. Repeater1 gets the remote stream from “liveConf_1” and adds it as the local stream to “liveConf_1_1”.

Here the upload bandwidth is high and each repeater is capable of handling 6 outgoing streams. Therefore a total of 4 repeaters can handle up to 24 streams.

At the Viewer's end

6. The viewer joins room “liveConf_1_1”.
7. Play the incoming stream in the WebRTC browser's video element.
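The room bookkeeping in the steps above can be sketched as follows. This is a hedged illustration only: the capacity of 6 viewers per repeater and the room name “liveConf” come from the text, while extending the naming pattern to “liveConf_2_1”, “liveConf_3_1” and so on, and the function names themselves, are assumptions made for this example.

```python
import math

VIEWERS_PER_REPEATER = 6   # outgoing streams one repeater handles (from the text)
BASE_ROOM = "liveConf"     # publisher's room name (from the text)


def repeaters_needed(viewers: int, per_repeater: int = VIEWERS_PER_REPEATER) -> int:
    """Smallest number of repeaters that can feed `viewers` viewers."""
    return math.ceil(viewers / per_repeater)


def assign_rooms(viewer_ids, base_room: str = BASE_ROOM,
                 per_repeater: int = VIEWERS_PER_REPEATER) -> dict:
    """Map each viewer to the room its repeater republishes the stream into.

    Assumed naming: repeater n receives in "<base>_n" and republishes
    in "<base>_n_1".
    """
    rooms = {}
    for index, viewer in enumerate(viewer_ids):
        repeater_no = index // per_repeater + 1   # fill repeaters in order
        rooms[viewer] = f"{base_room}_{repeater_no}_1"
    return rooms


if __name__ == "__main__":
    viewers = [f"viewer{i}" for i in range(24)]
    print(repeaters_needed(len(viewers)))
    print(assign_rooms(viewers)["viewer6"])
```

Filling repeaters in order keeps the mapping deterministic, which makes it easier to tell which viewers are affected when a particular repeater fails.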

WebRTC Relay nodes for multiple peers

Advantages:

  1. As 6 viewers can connect to 1 repeater for the feed, a total of 24 viewers requires only 4 repeaters.
  2. Only 2 Mbps is consumed at the publisher's end and 2 Mbps at each viewer's end.

4. WebRTC recorder to Broadcasting Media Server VOD

This process is essentially NOT a live streaming solution but a Video On Demand (VOD) type of implementation for a recorded WebRTC stream.

The figure shows a WebRTC node that records the WebRTC streams as WebM files. Audio and video can be recorded together in Firefox. With Chrome, one needs to merge a separately recorded WebM (video) file and WAV (audio) file into a single WebM file containing both audio and video, and then store it in the VOD server's repository.

WebRTC Scalable Streaming Server - WebRTC Chunk recorder to Broadcasting Media Server VOD
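The article does not say which tool performs the merge of the separately recorded video and audio files. One common option is the ffmpeg command-line tool, and the sketch below assumes it is installed and built with the libopus encoder. The function names and file names are hypothetical. The video track is copied unchanged, while the WAV audio is re-encoded to Opus because WebM does not carry raw PCM audio.

```python
import subprocess


def build_merge_command(video_path: str, audio_path: str, output_path: str) -> list:
    """Build an ffmpeg argument list that muxes video and audio into one WebM."""
    return [
        "ffmpeg",
        "-i", video_path,      # input 0: video-only WebM
        "-i", audio_path,      # input 1: WAV audio
        "-map", "0:v:0",       # take the first video stream of input 0
        "-map", "1:a:0",       # take the first audio stream of input 1
        "-c:v", "copy",        # keep the recorded video as is
        "-c:a", "libopus",     # re-encode PCM audio to Opus for WebM
        output_path,
    ]


def merge_recording(video_path: str, audio_path: str, output_path: str) -> None:
    """Run the merge; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)


if __name__ == "__main__":
    print(" ".join(build_merge_command("video.webm", "audio.wav", "merged.webm")))
```

The merged file can then be copied into whatever location the VOD server reads from.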

Although media servers do not inherently support the WebM format, a few new-age lightweight media servers such as Kurento are capable of this.

Advantages:

  1. Achieves the end goal of broadcasting from one WebRTC browser to multiple WebRTC browsers without incurring extra load on any client machine (assuming that the media server handles video distribution and load sharing automatically).

Disadvantages:

  1. It is not live streaming.
  2. For significantly longer recorded streams, the delta in streaming delay increases considerably. Ideally this delta should be no more than 5 minutes.