Video Codecs – H264, H265, AV1

This article discusses the widely adopted current standards for video codecs (compression / decompression), namely MPEG-2, H264, H265 and AV1.


Codecs differ from media containers. A codec is a compression algorithm: it compresses the information in a raw stream to reduce its size for storage and streaming. A container is a file or stream format that packages already-encoded video and audio so that it can be stored and played back from a set location.

Examples of Codecs: H.261, H.263, VC-1, MPEG-1, MPEG-2, MPEG-4, AVS1, AVS2, AVS3, VP8, VP9, AV1, AVC/H.264, HEVC/H.265, VVC/H.266, EVC, LCEVC

Examples of containers include: MPEG-1 System Stream, MPEG-2 Program Stream, MPEG-2 Transport Stream, MP4, MOV, MKV, WebM, AVI, FLV, IVF, MXF, HEIC and so on.

MPEG-2

MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU) is the standard for "generic coding of moving pictures and associated audio information".
It combines lossy video compression and lossy audio compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

MPEG-2 is better than MPEG-1

MPEG-2 evolved out of the shortcomings of MPEG-1, such as an audio compression system limited to two channels (stereo), no standardized support for interlaced video (which therefore compressed poorly), and only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited for higher-resolution video.

Applications

  • Over-the-air digital television broadcasting and the DVD-Video standard
  • TV stations, TV receivers, DVD players, and other equipment
  • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
  • XDCAM – professional file-based video recording format.
  • DVB – Application-specific restrictions on MPEG-2 video in the DVB standard

MPEG-4

Video coding standards:
MPEG-4 Part 2 Visual (ISO/IEC 14496-2), released in 1999 as the MPEG-4 video codec
MPEG-4 Part 10 Advanced Video Coding (ISO/IEC 14496-10), released in 2003 as the AVC/H.264 video codec
MPEG-4 Part 14 (ISO/IEC 14496-14), the MP4 file format, is a media container rather than a codec (compression algorithm).

H264

Introduced in 2003 as Advanced Video Coding (AVC), also known as H.264, MPEG-4 AVC or ITU-T H.264 / MPEG-4 Part 10. It is a widely supported, vendor-agnostic solution.

MPEG-4 Part 10 AVC/H.264 is better than MPEG2

  • 40-50% bit rate reduction compared to MPEG-2
  • Supports resolutions up to 4K (4,096×2,304) and frame rates up to 59.94 fps
  • 21 profiles; 17 levels
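As a rough worked example of what a 40-50% bit rate reduction means in practice, the sketch below scales a hypothetical MPEG-2 bit rate down to the corresponding H.264 range. The 8000 kbit/s input and the function name are illustrative assumptions, not figures taken from either standard.

```python
def reduced_bitrate(bitrate_kbps: float, reduction: float) -> float:
    """Bit rate left after a fractional reduction (0.40 means 40% lower)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return bitrate_kbps * (1.0 - reduction)


# Hypothetical MPEG-2 stream at 8000 kbit/s (assumed figure, for illustration only).
mpeg2_kbps = 8000.0
h264_high = reduced_bitrate(mpeg2_kbps, 0.40)  # 40% reduction
h264_low = reduced_bitrate(mpeg2_kbps, 0.50)   # 50% reduction
print(f"H.264 at similar quality: roughly {h264_low:.0f}-{h264_high:.0f} kbit/s")
```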

Compression Model

Video compression relies on predicting motion between frames. The encoder compares parts of a video frame with the subsequent frames to find regions that are redundant, i.e. unchanged, such as background sections of the video. These regions are replaced with a short piece of information that references the original pixels (inter-frame motion prediction), expressed as a mathematical function and a direction of motion.

Hybrid spatial-temporal prediction model
Flexible partitioning of macroblocks (MB) and sub-macroblocks for motion estimation
Intra prediction (extrapolates already decoded neighbouring pixels for prediction)
Introduced a multi-view extension
9 intra prediction modes for 4×4 blocks (8 directional plus DC)
Macroblock structure with a maximum size of 16×16
Entropy coding is CABAC (context-adaptive binary arithmetic coding) or CAVLC (context-adaptive variable-length coding)
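The temporal half of this model can be illustrated with a toy block-matching search: for one block of the current frame it looks for the best-matching block in the previous frame and returns the displacement (the motion vector). This is only a minimal sketch under simplifying assumptions (tiny grayscale frames held as lists of lists, sum of absolute differences as the cost, exhaustive search). The function names are hypothetical, and real H.264 encoders use far more elaborate search and partitioning strategies.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: rows of pixel values


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, size: int) -> int:
    """Sum of absolute differences between a block in cur and a block in ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def best_motion_vector(cur: Frame, ref: Frame, cx: int, cy: int,
                       size: int, search: int) -> Tuple[int, int, int]:
    """Search ref within +/-search pixels; return (mvx, mvy, cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (mvx, mvy, cost)
    return best  # never None for a block inside the frame: (0, 0) is always tested


def make_frame(width: int, height: int, obj_x: int, obj_y: int) -> Frame:
    """Black frame with a bright 2x2 object whose top-left corner is (obj_x, obj_y)."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(2):
        for dx in range(2):
            frame[obj_y + dy][obj_x + dx] = 200
    return frame


# The object moves 2 pixels right and 1 pixel down between the two frames,
# so the vector from the current block back to its reference is (-2, -1).
previous = make_frame(8, 8, 1, 2)
current = make_frame(8, 8, 3, 3)
mvx, mvy, cost = best_motion_vector(current, previous, cx=3, cy=3, size=2, search=3)
print(mvx, mvy, cost)
```

With a perfect match the encoder only needs to transmit the motion vector for that block; when the match is imperfect it also transmits the (usually small) residual.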

Applications of H264

  • The most widely deployed video compression standard
  • Delivery of high-definition video over direct-broadcast satellite television services
  • Digital storage media and Blu-ray Disc formats
  • Terrestrial, cable, satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems and DVB
  • Mobile video, media players, video chat

H265

High Efficiency Video Coding (HEVC), also known as H.265 or MPEG-H HEVC, is designed to stream high-quality video in congested network environments or bandwidth-constrained mobile networks.

  • (+) Roughly twice the compression of H264 at the same video quality
  • (-) Requires more processing power

Introduced in January 2013 as the product of a collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

H265 is better than H264

H265 eases shortages of bandwidth, spectrum and storage, delivering bandwidth savings of approximately 45% over H.264-encoded content.

  • Resolutions up to 8192×4320, including 8K UHD
  • Supports up to 300 fps
  • 3 approved profiles, with 5 more in draft; 13 levels

Whereas H264 macroblocks span block sizes from 4×4 to 16×16, H265 coding tree units (CTUs) can cover blocks as large as 64×64, which gives the codec the ability to compress information more efficiently.
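To get a feel for the difference in block size, the sketch below counts how many top-level blocks are needed to cover one frame with 16×16 macroblocks versus 64×64 CTUs. The 1920×1080 resolution and the function name are assumptions chosen for illustration; partial blocks at the frame edge are counted as whole blocks.

```python
import math


def blocks_per_frame(width: int, height: int, block_size: int) -> int:
    """Number of square blocks needed to cover a frame (edge blocks count as whole)."""
    return math.ceil(width / block_size) * math.ceil(height / block_size)


width, height = 1920, 1080  # example 1080p frame (assumed resolution)
macroblocks = blocks_per_frame(width, height, 16)  # H.264 macroblocks
ctus = blocks_per_frame(width, height, 64)         # largest H.265 CTUs
print(macroblocks, ctus)
```

Fewer top-level units means less per-block signalling for large uniform areas, while each CTU can still be split into smaller units where the picture is detailed.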

Multiview encoding – a stereoscopic video coding standard that allows efficient encoding of video sequences captured simultaneously from multiple camera angles into a single video stream. It also exploits the large amount of statistical dependency between views.

Compression Model

  1. Enhanced hybrid spatial-temporal prediction model
  2. CTUs (coding tree units) supporting a larger block structure (64×64) with more variable sub-partition structures
  3. Motion estimation – intra prediction with more modes and asymmetric partitions in inter prediction. The image can also be divided into individual rectangular regions (tiles) that are independent of each other
  4. Parallel processing – the decoding process can be split across multiple parallel threads, taking advantage of multi-core processors
  5. Wavefront Parallel Processing (WPP) – rows of CTUs are processed in parallel, each row starting once the first CTUs of the row above have been processed
  6. 33 directional intra prediction modes, plus DC and planar prediction; Advanced Motion Vector Prediction
  7. Entropy coding is CABAC only

Applications of H265

  • Catering to growing HD content for multi-platform delivery
  • Differentiated and premium 4K content

The reduced bitrate enables broadcasters and OTT vendors to bundle more channels and content on existing delivery mediums, or to provide a better video quality experience at the same bitrate.

Using ffmpeg for H265 encoding

I took an H264 file (640×480, 30 seconds long, 3,908,744 bytes, 3.9 MB on disk) and converted it using ffmpeg.

After conversion it was an HEVC (Parameter Sets in Bitstream) MPEG-4 movie of only 621 KB, without any visible loss of clarity.

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4  
                                            
ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers   
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)   
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr   
libavutil      56. 22.100 / 56. 22.100   
libavcodec     58. 35.100 / 58. 35.100   
libavformat    58. 20.100 / 58. 20.100   
libavdevice    58.  5.100 / 58.  5.100   
libavfilter     7. 40.101 /  7. 40.101   
libavresample   4.  0.  0 /  4.  0.  0   
libswscale      5.  3.100 /  5.  3.100   
libswresample   3.  3.100 /  3.  3.100   
libpostproc    55.  3.100 / 55.  3.100 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':   
...
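The size reduction reported for this conversion can be checked with a few lines of arithmetic. The byte counts come from the text above; treating 621 KB as 621,000 bytes is an assumption (the exact output size in bytes is not given), and the function name is only illustrative.

```python
from typing import Tuple


def size_reduction(original_bytes: int, converted_bytes: int) -> Tuple[float, float]:
    """Return (compression ratio, fraction of bytes saved)."""
    ratio = original_bytes / converted_bytes
    saved = 1.0 - converted_bytes / original_bytes
    return ratio, saved


original = 3_908_744    # H.264 input size in bytes, from the text
converted = 621 * 1000  # 621 KB output; 1 KB = 1000 bytes is an assumption
ratio, saved = size_reduction(original, converted)
print(f"about {ratio:.1f}x smaller, {saved:.0%} of the bytes saved")
```

Note that these figures cover the whole file, including any audio track re-encoded by the `-c:a aac -b:a 128k` options, and that `-crf 28` is a lossy setting, so the result depends on the content.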

If you get an error like

Unknown encoder 'libx265'

then reinstall ffmpeg with H265 (libx265) support.
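One way to check for this up front is to look for libx265 in the output of `ffmpeg -encoders`. The sketch below is a hedged example: the helper names are hypothetical, the sample listing is an illustrative excerpt whose exact rows depend on the ffmpeg build, and the parsing simply assumes that each encoder row has the form "flags name description".

```python
import subprocess


def has_encoder(encoders_listing: str, name: str) -> bool:
    """True if name appears as an encoder in the text printed by `ffmpeg -encoders`."""
    for line in encoders_listing.splitlines():
        fields = line.split()
        # Encoder rows are assumed to look like: "<flags> <name> <description...>"
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_encoders_listing() -> str:
    """Run ffmpeg and return its encoder listing (requires ffmpeg on PATH)."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative excerpt of a listing; the real rows depend on the ffmpeg build.
sample = """ V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... libx265              libx265 H.265 / HEVC (codec hevc)"""
print(has_encoder(sample, "libx265"))
```

On a machine with ffmpeg installed, `has_encoder(ffmpeg_encoders_listing(), "libx265")` performs the same check against the real build.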

An HEVC bitstream is an ordered sequence of syntax elements. The syntax elements are placed into logical packets called NAL (Network Abstraction Layer) units. The 6-bit type field in the NAL unit header allows 64 different NAL unit types, which can be grouped into 10 classes:

  1. VPS – Video parameter set
  2. SPS – Sequence parameter set
  3. PPS – Picture parameter set
  4. Slice (different types)
  5. AUD – Access unit delimiter, signals the start of a video frame
  6. EOS – End of sequence
  7. EOB – End of bitstream
  8. FD – Filler data for bitrate smoothing
  9. SEI – Supplemental enhancement information such as picture timing, color space information, etc.
  10. Reserved and unspecified
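The class of a NAL unit can be read from its two-byte header. The sketch below decodes that header and maps the type code to the classes listed above. The helper names are hypothetical, the numeric codes are quoted from the H.265 NAL unit type table as I understand it, and the input is assumed to be a NAL unit with its start code already stripped.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    nal_unit_type: int
    layer_id: int
    temporal_id: int


# Non-VCL type codes (prefix and suffix SEI both map to the SEI class).
NON_VCL_CLASSES = {
    32: "VPS", 33: "SPS", 34: "PPS", 35: "AUD",
    36: "EOS", 37: "EOB", 38: "FD", 39: "SEI", 40: "SEI",
}


def parse_nal_header(data: bytes) -> NalHeader:
    """Decode the 2-byte HEVC NAL unit header (start code already removed)."""
    if len(data) < 2:
        raise ValueError("HEVC NAL unit header is 2 bytes long")
    if data[0] & 0x80:
        raise ValueError("forbidden_zero_bit is set")
    nal_unit_type = (data[0] >> 1) & 0x3F                # 6 bits
    layer_id = ((data[0] & 0x01) << 5) | (data[1] >> 3)  # 6 bits
    temporal_id = (data[1] & 0x07) - 1                   # stored as temporal id plus 1
    return NalHeader(nal_unit_type, layer_id, temporal_id)


def classify(nal_unit_type: int) -> str:
    """Map a NAL unit type code to one of the classes in the list above."""
    if nal_unit_type in NON_VCL_CLASSES:
        return NON_VCL_CLASSES[nal_unit_type]
    if 0 <= nal_unit_type <= 9 or 16 <= nal_unit_type <= 21:
        return "Slice"
    return "Reserved and unspecified"


header = parse_nal_header(bytes([0x40, 0x01]))  # header bytes of a VPS NAL unit
print(header, classify(header.nal_unit_type))
```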

AV1

AV1 is a real-time, high-quality video codec, a product of the Alliance for Open Media (AOM). It can be carried in Matroska, WebM, ISOBMFF and RTP (WebRTC).

AV1 is better than H265

  • (+) AV1 is royalty-free and avoids the patent complexities around H265/HEVC

Applications

  • Video transmission over the internet, VoIP, multi-party conferencing
  • Virtual / augmented reality
  • Streaming for self-driving cars
  • Intended for use in HTML5 web video and WebRTC together with the Opus audio format

SVC (Scalable Video Coding)

SVC standardises the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. A subset bitstream is derived by dropping packets from the larger bitstream and can represent a lower spatial resolution (smaller screen), a lower temporal resolution (lower frame rate), or a lower-quality video signal compared to the bitstream it is derived from.

  • Temporal (frame rate) scalability is enabled by structuring the motion compensation dependencies so that complete pictures (i.e. their associated packets) can be dropped from the bitstream.
  • Spatial (picture size) scalability is enabled with video coded at multiple spatial resolutions
  • SNR/Quality/Fidelity scalability: video is coded at a single spatial resolution but at different qualities. The data and decoded samples of lower qualities can be used to predict data or samples of higher qualities in order to reduce the bit rate to code the higher qualities.
  • Combined scalability: a combination of the 3 scalability modalities described above.
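Temporal scalability is the easiest of these to illustrate. In the sketch below every frame is tagged with a temporal layer, and dropping the higher layers leaves a lower frame rate that is still decodable, because frames are assumed to reference only frames in the same or a lower layer. The repeating 0, 2, 1, 2 layer pattern and the function names are assumptions made for illustration, not something mandated by a particular codec.

```python
from typing import List, Sequence

# Temporal layer of each frame in a repeating 4-frame pattern with three
# temporal layers (a common hierarchical arrangement, assumed for illustration).
T3_PATTERN = (0, 2, 1, 2)


def temporal_ids(frame_count: int, pattern: Sequence[int] = T3_PATTERN) -> List[int]:
    """Temporal layer ID for each frame index."""
    return [pattern[i % len(pattern)] for i in range(frame_count)]


def keep_up_to(layer: int, ids: Sequence[int]) -> List[int]:
    """Indices of the frames that survive when layers above `layer` are dropped."""
    return [i for i, tid in enumerate(ids) if tid <= layer]


ids = temporal_ids(8)
full = keep_up_to(2, ids)     # all frames: full frame rate
half = keep_up_to(1, ids)     # every second frame: half frame rate
quarter = keep_up_to(0, ids)  # every fourth frame: quarter frame rate
print(full, half, quarter)
```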

Not all codecs support all modes. While AV1 and VP9 support the majority of the defined scalability modes, VP8 only supports temporal scalability (e.g. “L1T2”, “L1T3”); H.264/SVC supports both temporal and spatial scalability but only permits transport of simulcast on distinct SSRCs.

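VP8

A VP8 codec capability entry listing only temporal scalability modes:

{
  "clockRate": 90000,
  "mimeType": "video/VP8",
  "scalabilityModes": [
    "L1T1",
    "L1T2",
    "L1T3"
  ]
}

Sending three simulcast encodings, each with three temporal layers (L1T3):

// Assumes this runs inside an async function, that pc is an RTCPeerConnection,
// selfView is a <video> element and constraints is a getUserMedia constraints object.
const stream = await navigator.mediaDevices.getUserMedia(constraints);
selfView.srcObject = stream;
pc.addTransceiver(stream.getAudioTracks()[0], {direction: 'sendonly'});
pc.addTransceiver(stream.getVideoTracks()[0], {
  direction: 'sendonly',
  sendEncodings: [
    // Quarter, half and full resolution, each with three temporal layers.
    {rid: 'q', scaleResolutionDownBy: 4.0, scalabilityMode: 'L1T3'},
    {rid: 'h', scaleResolutionDownBy: 2.0, scalabilityMode: 'L1T3'},
    {rid: 'f', scalabilityMode: 'L1T3'},
  ]
});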
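Mode names such as "L1T3" pack the layer counts into a short string: the digit after L is the number of spatial layers and the digit after T is the number of temporal layers. The sketch below decodes only this plain form, which is the one used in this article; other variants of the naming scheme are deliberately rejected, and the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class ScalabilityMode(NamedTuple):
    spatial_layers: int
    temporal_layers: int


# Plain "L<spatial>T<temporal>" names only, e.g. "L1T3".
_MODE_RE = re.compile(r"^L(\d)T(\d)$")


def parse_scalability_mode(mode: str) -> ScalabilityMode:
    """Split a mode name like 'L1T3' into its spatial and temporal layer counts."""
    match = _MODE_RE.match(mode)
    if match is None:
        raise ValueError(f"unsupported scalability mode: {mode!r}")
    return ScalabilityMode(int(match.group(1)), int(match.group(2)))


for name in ("L1T1", "L1T2", "L1T3"):
    print(name, parse_scalability_mode(name))
```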


References :