Video Codecs – H.264, H.265, AV1

This article discusses the widely adopted current standards for video codecs (compression/decompression), namely MPEG-2, H.264, H.265 and AV1.

A codec differs from a media container. A codec compresses the information in the raw stream to reduce its size for storage and streaming, while a container is a file format that packages the already compressed video, audio and metadata streams for storage, transport and playback.

Examples of Codecs: H.261, H.263, VC-1, MPEG-1, MPEG-2, MPEG-4, AVS1, AVS2, AVS3, VP8, VP9, AV1, AVC/H.264, HEVC/H.265, VVC/H.266, EVC, LCEVC

Examples of containers include: MPEG-1 System Stream, MPEG-2 Program Stream, MPEG-2 Transport Stream, MP4, MOV, MKV, WebM, AVI, FLV, IVF, MXF, HEIC and so on.


MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU)
Generic coding of moving pictures and associated audio information
Combination of lossy video compression and lossy audio data compression methods, which permit storage and transmission of movies using currently available storage media and transmission bandwidth.

MPEG-2 is better than MPEG-1

MPEG-2 evolved out of the shortcomings of MPEG-1:

  • An audio compression system limited to two channels (stereo)
  • No standardized support for interlaced video, which compressed poorly
  • Only one standardized “profile” (Constrained Parameters Bitstream), which was unsuited to higher-resolution video


Applications of MPEG-2

  • Over-the-air digital television broadcasting and the DVD-Video standard
  • TV stations, TV receivers, DVD players and other equipment
  • MOD and TOD – recording formats for consumer digital file-based camcorders
  • XDCAM – professional file-based video recording format
  • DVB – application-specific restrictions on MPEG-2 video in the DVB standard


Video coding standards:
  • MPEG-4 Part 2 Visual (ISO/IEC 14496-2), released in 1999 as the MPEG-4 video codec
  • MPEG-4 Part 10 Advanced Video Coding (ISO/IEC 14496-10), released in 2003 as the AVC/H.264 video codec
  • MPEG-4 Part 14 (ISO/IEC 14496-14), the MP4 file format, which is a media container rather than a codec (compression algorithm)


H.264 was first published in 2003 as Advanced Video Coding (AVC), also referred to as MPEG-4 AVC or ITU-T H.264/MPEG-4 Part 10. It is a widely supported, vendor-agnostic solution.

MPEG-4 Part 10 AVC/H.264 is better than MPEG-2

  • 40-50% bit rate reduction compared to MPEG-2
  • Resolution support up to 4K (4,096×2,304) at up to 59.94 fps
  • 21 profiles; 17 levels

Compression Model

Video compression relies on predicting motion between frames. The encoder compares regions of a frame with the preceding or following frames to find the parts that are redundant, i.e. unchanged, such as background sections of the video. These areas are replaced with a short reference to the original pixels (inter-frame prediction) together with a motion vector that describes the direction and distance of motion.
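The idea of finding where a block of pixels has moved between two frames can be sketched with a brute-force block-matching search. This is only an illustrative sketch, not how any particular encoder implements motion estimation: the function names, the toy 8×8 frames and the sum-of-absolute-differences (SAD) cost are assumptions chosen for clarity.

```javascript
// Sum of absolute differences (SAD) between a block of the current frame at
// (bx, by) and the block of the reference frame displaced by (dx, dy).
// Frames are 2-D arrays of luma values, indexed as frame[y][x].
function sad(cur, ref, bx, by, dx, dy, size) {
  let total = 0;
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      total += Math.abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]);
    }
  }
  return total;
}

// Exhaustive search over a +/-range window; returns the displacement with
// the lowest SAD, i.e. the motion vector pointing into the reference frame.
function findMotionVector(cur, ref, bx, by, size, range) {
  const height = ref.length;
  const width = ref[0].length;
  let best = { dx: 0, dy: 0, cost: Infinity };
  for (let dy = -range; dy <= range; dy++) {
    for (let dx = -range; dx <= range; dx++) {
      const x0 = bx + dx;
      const y0 = by + dy;
      // Skip candidates that fall outside the reference frame.
      if (x0 < 0 || y0 < 0 || x0 + size > width || y0 + size > height) continue;
      const cost = sad(cur, ref, bx, by, dx, dy, size);
      if (cost < best.cost) best = { dx, dy, cost };
    }
  }
  return best;
}

// Toy 8x8 frame (hypothetical data) with a bright 2x2 patch at (patchX, patchY).
function makeFrame(patchX, patchY) {
  const frame = Array.from({ length: 8 }, () => new Array(8).fill(0));
  for (let y = 0; y < 2; y++) {
    for (let x = 0; x < 2; x++) {
      frame[patchY + y][patchX + x] = 200;
    }
  }
  return frame;
}

const refFrame = makeFrame(2, 3); // patch at x=2 in the reference frame
const curFrame = makeFrame(3, 3); // patch has moved one pixel to the right
const mv = findMotionVector(curFrame, refFrame, 3, 3, 2, 2);
console.log(mv);
```

Instead of coding the 2×2 block again, the encoder would only need to store the motion vector and the (here zero) residual. Real encoders use much faster search strategies and sub-pixel precision.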

  • Hybrid spatial-temporal prediction model
  • Flexible partitioning of macroblocks (MB) and sub-macroblocks for motion estimation
  • Intra prediction (extrapolates already decoded neighbouring pixels)
  • 9 intra prediction modes (8 directional plus DC) for 4×4 blocks
  • Multi-view extension
  • Macroblock structure with a maximum size of 16×16
  • Entropy coding with CABAC (context-adaptive binary arithmetic coding) or CAVLC (context-adaptive variable-length coding)

Applications of H264

  • The most widely deployed video compression standard
  • High-definition video over direct-broadcast satellite television services
  • Digital storage media and the Blu-ray Disc format
  • Terrestrial, cable, satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems, and DVB
  • Mobile video, media players and video chat


High Efficiency Video Coding (HEVC), also known as H.265 or MPEG-H HEVC, streams high-quality video in congested network environments and bandwidth-constrained mobile networks.

  • (+) Roughly twice the compression of H.264 at the same video quality
  • (-) Higher processing power required

Introduced in January 2013 as a product of the collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

H.265 is better than H.264

H.265 addresses shortages of bandwidth, spectrum and storage, with bandwidth savings of approximately 45% over H.264-encoded content.

  • Resolutions up to 8192×4320, including 8K UHD
  • Supports up to 300 fps
  • 3 approved profiles, draft for additional 5 ; 13 levels

Whereas H.264 macroblocks are at most 16×16 samples, with partitions down to 4×4, HEVC coding tree units (CTUs) can be as large as 64×64 samples, which lets the encoder compress large uniform regions more efficiently.
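To get a feel for the difference in block size, the number of top-level blocks needed to cover one frame can be counted. The sketch below assumes a 1920×1080 frame (an example resolution, not taken from the text) and rounds partial blocks at the frame edge up to a whole block.

```javascript
// Number of blockSize x blockSize blocks needed to tile a frame.
// Partial blocks at the right and bottom edges count as whole blocks.
function blocksPerFrame(width, height, blockSize) {
  return Math.ceil(width / blockSize) * Math.ceil(height / blockSize);
}

const macroblocks = blocksPerFrame(1920, 1080, 16); // H.264 16x16 macroblocks: 120 * 68
const ctus = blocksPerFrame(1920, 1080, 64);        // HEVC 64x64 CTUs: 30 * 17

console.log({ macroblocks, ctus });
```

Fewer, larger top-level units mean less per-block signalling overhead in flat regions, while the CTU can still be split recursively where the picture has detail.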

Multiview encoding – a stereoscopic video coding extension that efficiently encodes video sequences captured simultaneously from multiple camera angles in a single video stream. It exploits the large amount of statistical dependency between views.

Compression Model

  1. Enhanced hybrid spatial-temporal prediction model
  2. CTUs (coding tree units) supporting a larger block structure (64×64) with more variable sub-partition structures
  3. Motion estimation – intra prediction with more modes and asymmetric partitions in inter prediction. Tiles, the individual rectangular regions that divide the image, are independent of each other
  4. Parallel processing – the decoding process can be split across multiple parallel threads, taking advantage of multi-core processors
  5. Wavefront Parallel Processing (WPP) – rows of CTUs are processed in parallel, each row starting once the first two CTUs of the row above are done, so that entropy-coding context can still be carried from row to row
  6. 35 intra prediction modes (33 directional modes plus DC and planar prediction) and Advanced Motion Vector Prediction (AMVP)
  7. Entropy coding uses CABAC only
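The parallelism offered by Wavefront Parallel Processing can be illustrated with a small scheduling sketch. It assumes an idealised model that is not part of the original text: every CTU takes one time step, there are enough threads for every row, and a CTU may start once its left neighbour and the CTU above and to the right of it are finished. The function name `wppSchedule` is hypothetical.

```javascript
// Earliest start step of every CTU in a rows x cols grid under the
// simplified WPP dependency model described above.
function wppSchedule(rows, cols) {
  const start = [];
  for (let r = 0; r < rows; r++) {
    start.push(new Array(cols).fill(0));
    for (let c = 0; c < cols; c++) {
      // The left neighbour must be finished first.
      const left = c > 0 ? start[r][c - 1] + 1 : 0;
      // So must the CTU above-right (clamped to the last column).
      const aboveRight = r > 0 ? start[r - 1][Math.min(c + 1, cols - 1)] + 1 : 0;
      start[r][c] = Math.max(left, aboveRight);
    }
  }
  return start;
}

// A 1920x1080 frame with 64x64 CTUs is 17 rows by 30 columns.
const schedule = wppSchedule(17, 30);
const parallelSteps = schedule[16][29] + 1; // steps until the last CTU is done
const sequentialSteps = 17 * 30;            // one CTU after another

console.log({ parallelSteps, sequentialSteps });
```

Each row trails the row above by two CTUs, which is where the diagonal "wavefront" shape comes from.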

Applications of H265

  • Caters to the growing volume of HD content for multi-platform delivery
  • Differentiated and premium 4K content

The reduced bitrate lets broadcasters and OTT vendors bundle more channels and content on existing delivery mediums, and also provide better video quality at the same bitrate.
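As a rough worked example of bundling more channels on an existing medium, the sketch below compares how many channels fit into a fixed-capacity link. The 40 Mbps link and the 8 Mbps H.264 channel are hypothetical numbers, and the 45% saving is the approximate figure quoted earlier for H.265 over H.264.

```javascript
// Bitrate of an H.265 channel given the H.264 bitrate and a fractional saving.
function hevcBitrateMbps(h264Mbps, savingsFraction) {
  return h264Mbps * (1 - savingsFraction);
}

// Whole channels that fit into a link of the given capacity.
function channelsThatFit(linkMbps, channelMbps) {
  return Math.floor(linkMbps / channelMbps);
}

const linkMbps = 40;        // hypothetical link capacity
const h264ChannelMbps = 8;  // hypothetical per-channel H.264 bitrate
const h265ChannelMbps = hevcBitrateMbps(h264ChannelMbps, 0.45);

const h264Channels = channelsThatFit(linkMbps, h264ChannelMbps);
const h265Channels = channelsThatFit(linkMbps, h265ChannelMbps);

console.log({ h264Channels, h265Channels });
```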

Using ffmpeg for H265 encoding

I took an H.264 file (640×480), 30 seconds long, of size 3,908,744 bytes (3.9 MB on disk) and converted it using ffmpeg.

After conversion it was an HEVC (Parameter Sets in Bitstream) MPEG-4 movie of only 621 KB, without any visible loss of clarity.

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4  
ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers   
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)   
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr   
libavutil      56. 22.100 / 56. 22.100   
libavcodec     58. 35.100 / 58. 35.100   
libavformat    58. 20.100 / 58. 20.100   
libavdevice    58.  5.100 / 58.  5.100   
libavfilter     7. 40.101 /  7. 40.101   
libavresample   4.  0.  0 /  4.  0.  0   
libswscale      5.  3.100 /  5.  3.100   
libswresample   3.  3.100 /  3.  3.100   
libpostproc    55.  3.100 / 55.  3.100 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':   

If you get an error like

Unknown encoder 'libx265'

then reinstall ffmpeg with H.265 support, i.e. a build configured with --enable-libx265 as in the configuration line above.
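The saving from the conversion above can be put into numbers. The sketch uses the file sizes and duration quoted in the text; treating "621 KB" as 621,000 bytes is an assumption, and the bitrates are averages over the whole file, audio included.

```javascript
const inputBytes = 3908744;     // H.264 source size quoted in the text
const outputBytes = 621 * 1000; // 621 KB, assuming decimal kilobytes
const durationSeconds = 30;

// Average bitrate of a file in kilobits per second.
function averageBitrateKbps(bytes, seconds) {
  return (bytes * 8) / seconds / 1000;
}

// Percentage by which the file shrank.
function sizeReductionPercent(before, after) {
  return (1 - after / before) * 100;
}

const inputKbps = averageBitrateKbps(inputBytes, durationSeconds);
const outputKbps = averageBitrateKbps(outputBytes, durationSeconds);
const reduction = sizeReductionPercent(inputBytes, outputBytes);

console.log(inputKbps.toFixed(1), outputKbps.toFixed(1), reduction.toFixed(1));
```

Under these assumptions the file shrinks by roughly 84%. A reduction this large reflects the chosen -crf 28 quality setting as well as the codec change, so it should not be read as a pure codec-versus-codec comparison.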

An HEVC bitstream is an ordered sequence of syntax elements. The syntax elements are placed into logical packets called NAL (network abstraction layer) units. There are 64 different NAL unit types, which can be grouped into 10 classes:

  1. VPS – Video parameter set
  2. SPS – Sequence parameter set
  3. PPS – Picture parameter set
  4. Slice (different types)
  5. AUD – Access unit delimiter signals the start of video frame
  6. EOS – End of sequence
  7. EOB – End of bitstream
  8. FD – Filler data for bitrate smoothing
  9. SEI – Supplemental enhancement information such as picture timing, color space information, etc.
  10. Reserved and unspecified
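Every HEVC NAL unit starts with a two-byte header whose 6-bit type field is what gives the 64 possible types. The following sketch decodes that header; the bit layout and the non-VCL type numbers (32 to 40) follow the H.265 specification as far as I am aware, while the function name and the returned object shape are illustrative choices.

```javascript
// Names of the non-VCL NAL unit types listed above.
const HEVC_NAL_NAMES = {
  32: 'VPS', 33: 'SPS', 34: 'PPS', 35: 'AUD',
  36: 'EOS', 37: 'EOB', 38: 'FD',
  39: 'SEI (prefix)', 40: 'SEI (suffix)',
};

// Header layout (16 bits): forbidden_zero_bit(1), nal_unit_type(6),
// nuh_layer_id(6), nuh_temporal_id_plus1(3).
function parseHevcNalHeader(byte0, byte1) {
  const type = (byte0 >> 1) & 0x3f;
  let name = HEVC_NAL_NAMES[type];
  if (!name) {
    // Types below 32 are VCL types (coded slice data or reserved VCL values).
    name = type < 32 ? 'VCL (slice data)' : 'reserved/unspecified';
  }
  return {
    forbiddenZeroBit: byte0 >> 7,
    nalUnitType: type,
    layerId: ((byte0 & 0x01) << 5) | (byte1 >> 3),
    temporalId: (byte1 & 0x07) - 1,
    name,
  };
}

const vpsHeader = parseHevcNalHeader(0x40, 0x01);
const spsHeader = parseHevcNalHeader(0x42, 0x01);
const sliceHeader = parseHevcNalHeader(0x26, 0x01);
console.log(vpsHeader.name, spsHeader.name, sliceHeader.name);
```

In an Annex B byte stream, these two bytes follow immediately after each start code (00 00 01), so scanning for start codes and decoding the header is enough to list the NAL units of a file.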


AV1 is a high-quality video codec suitable for real-time encoding, a product of the Alliance for Open Media (AOM). It can be carried in Matroska, WebM, ISOBMFF and RTP (WebRTC).

AV1 is better than H.265

  • (+) AV1 is royalty-free and avoids the patent complexities around H.265/HEVC


Applications of AV1

  • Video transmission over the internet, VoIP and multi-party conferencing
  • Virtual / augmented reality
  • Streaming from self-driving cars
  • HTML5 web video and WebRTC, together with the Opus audio format

SVC (Scalable Video Coding)

SVC standardises the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. Each subset bitstream is obtained by dropping packets from the full bitstream and can represent a lower spatial resolution (smaller screen), a lower temporal resolution (lower frame rate) or a lower-quality video signal compared to the bitstream it is derived from.

  • Temporal (frame rate) scalability is enabled by structuring motion compensation dependencies so that complete pictures (i.e. their associated packets) can be dropped from the bitstream.
  • Spatial (picture size) scalability is enabled by coding video at multiple spatial resolutions.
  • SNR/Quality/Fidelity scalability: video is coded at a single spatial resolution but at different qualities. The data and decoded samples of lower qualities can be used to predict data or samples of higher qualities in order to reduce the bit rate needed to code the higher qualities.
  • Combined scalability: a combination of the 3 scalability modalities described above.

Not all codecs support all modes. While AV1 and VP9 support the majority of the modes defined in the table, VP8 only supports temporal scalability (e.g. “L1T2”, “L1T3”). H.264/SVC supports both temporal and spatial scalability but only permits transport of simulcast on distinct SSRCs.
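Mode identifiers such as “L1T3” encode the number of spatial layers (the digit after L, or after S for simulcast-style modes) and the number of temporal layers (the digit after T). A small parser makes this concrete. It is a sketch: the function names are hypothetical, only the common LxTy / SxTy pattern with the optional "h", "_KEY" and "_KEY_SHIFT" suffixes is handled, and the frame-rate helper assumes a dyadic structure in which each extra temporal layer doubles the frame rate.

```javascript
// Split a scalability mode identifier into its layer counts.
// Returns null for strings that do not follow the LxTy / SxTy pattern.
function parseScalabilityMode(mode) {
  const match = /^([LS])(\d)T(\d)(h)?(_KEY(?:_SHIFT)?)?$/.exec(mode);
  if (!match) return null;
  return {
    simulcast: match[1] === 'S',
    spatialLayers: Number(match[2]),
    temporalLayers: Number(match[3]),
  };
}

// Frame rate available when decoding up to each temporal layer,
// assuming each layer doubles the frame rate of the one below it.
function temporalFrameRates(fullFps, temporalLayers) {
  const rates = [];
  for (let t = 0; t < temporalLayers; t++) {
    rates.push(fullFps / 2 ** (temporalLayers - 1 - t));
  }
  return rates;
}

const mode = parseScalabilityMode('L1T3');
const rates = temporalFrameRates(30, mode.temporalLayers);
console.log(mode, rates);
```

With a 30 fps source and three temporal layers, a receiver on a poor link could be forwarded only the lowest layer and still decode a valid, lower frame rate stream.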


      "clockRate": 90000,
      "mimeType": "video/VP8",
      "scalabilityModes": [
The following snippet sends three simulcast encodings of the camera track, each with three temporal layers (L1T3):

    // Runs inside an async function; constraints, selfView and pc are defined elsewhere.
    const stream = await navigator.mediaDevices.getUserMedia(constraints);
    selfView.srcObject = stream;
    pc.addTransceiver(stream.getAudioTracks()[0], {direction: 'sendonly'});
    pc.addTransceiver(stream.getVideoTracks()[0], {
      direction: 'sendonly',
      sendEncodings: [
        {rid: 'q', scaleResolutionDownBy: 4.0, scalabilityMode: 'L1T3'},
        {rid: 'h', scaleResolutionDownBy: 2.0, scalabilityMode: 'L1T3'},
        {rid: 'f', scalabilityMode: 'L1T3'},
      ]
    });
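The effect of scaleResolutionDownBy on the three encodings can be worked out without a browser. The sketch below assumes a 1280×720 capture (a hypothetical size, the original does not state one) and treats a missing scaleResolutionDownBy as 1, i.e. full resolution.

```javascript
// The three encodings from the snippet above, reduced to the fields
// that affect resolution.
const sendEncodings = [
  { rid: 'q', scaleResolutionDownBy: 4.0 },
  { rid: 'h', scaleResolutionDownBy: 2.0 },
  { rid: 'f' }, // no scaling factor given: treated as 1 here
];

// Resolution of each encoding for a given capture size.
function encodedSizes(width, height, encodings) {
  return encodings.map((encoding) => {
    const scale = encoding.scaleResolutionDownBy || 1;
    return {
      rid: encoding.rid,
      width: Math.floor(width / scale),
      height: Math.floor(height / scale),
    };
  });
}

const sizes = encodedSizes(1280, 720, sendEncodings);
console.log(sizes);
```

The rid values q, h and f are simply labels for the quarter, half and full resolution encodings.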


Telecommunications convergence – VoIP, PBX and IMS

Telecom platforms have evolved rapidly over the last few decades, from mobile phone networks that enabled universal communication regardless of location to today's high-bandwidth, high-data-rate entertainment and streaming applications. The affordable personal communication system has converged with enterprise-level communication systems that cater to low-latency, end-to-end encrypted scenarios.

As bandwidth has increased, so has the proliferation of VoIP systems. From the user’s perspective, modern mobile devices deliver a converged, multimedia communication and entertainment experience.


VoIP, short for Voice over IP, is so called because it not only converts analog voice into digital packets but also carries that voice data over IP networks such as a LAN, a WAN or the Internet using the Internet Protocol (IP).

  • A VoIP system on a LAN (Local Area Network) uses the LAN as its backbone to establish communication between endpoints. For example: an office communication system within the same enterprise or building.
  • Similarly, VoIP over a WAN (Wide Area Network) uses an IP PBX and a VoIP service provider to enable communication across the Internet. For example: OTT providers and internet calls.
  • With the services of telecom providers in addition to the above, a VoIP call can also be terminated on a regular phone over GSM / PSTN via gateways.

For a provider of an IP telephony system, a number of factors come into the picture, such as:

  1. Bandwidth:
    • Low bandwidth has always been a big concern for IP calls, since it leads to packet loss and therefore noisy audio. While a LAN connection ensures a good experience, calls over the internet or a VoIP PBX are not necessarily as clean.
    • Network switching between different internet service providers also causes congestion and lag.
  2. Interoperability: Connecting remote workers and employees to the VoIP network requires interoperability between their handheld devices (Android, iOS, tablets, smart watches) and other types of communication devices such as hardphones, desktop systems, kiosks and surveillance cameras, which is challenging given the differences in underlying OS and networking support.
  3. Traffic: The maximum number of simultaneous calls, or the peak traffic rate, can create bottlenecks in the communication channel or, worse still, result in high bandwidth usage. For example, a P2P conference call between 5 parties creates a mesh network in which each participant has 4 outgoing and 4 incoming channels.
  4. QoS (Quality of Service):
    • Call drops
    • Prioritization of important calls
    • Security against attacks and hacks
    • Keeping information secure with end-to-end encryption
  5. AAA:
    Managing authentication, authorization and accounting
  6. Reusing existing hardware:
    • Replacing old hardware or installing softphone apps on mobiles etc.
    • Reusing old servers and managing the setup between datacentres and cloud deployments
    • Administrative hurdles between different countries and geographies for using hardware
  7. Scaling:
    • How quickly can it scale up or scale down?
    • Will the communication system grow horizontally or vertically?
    • How can the growing system accommodate new users, physical office locations, remote centres, call centres etc.?
  8. Codecs:
    Under low-bandwidth conditions it is a good idea to switch to a lower resolution (in the case of video) and a low-bandwidth codec (in the case of audio).
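The traffic example in the list above (a peer-to-peer call between 5 parties) generalises easily. The sketch below counts connections and streams in a full mesh; the 500 kbps per-stream figure used for the uplink estimate is a hypothetical value.

```javascript
// Connection and stream counts for a full-mesh call in which every
// participant sends its media directly to every other participant.
function meshStats(participants) {
  const perPeer = participants - 1;
  return {
    outgoingPerParticipant: perPeer,
    incomingPerParticipant: perPeer,
    peerConnections: (participants * perPeer) / 2, // each pair shares one connection
    totalStreams: participants * perPeer,          // one directed stream per ordered pair
  };
}

// Uplink bandwidth one participant needs to send its media to everyone else.
function uplinkKbps(participants, kbpsPerStream) {
  return (participants - 1) * kbpsPerStream;
}

const fiveParty = meshStats(5);
const fivePartyUplink = uplinkKbps(5, 500); // 500 kbps per stream is an assumption

console.log(fiveParty, fivePartyUplink);
```

Because the per-participant load grows linearly and the total stream count grows quadratically, larger conferences are usually routed through a media server rather than a mesh.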

Other factors include privacy, accountability, lawful interception (a legal requirement for many enterprises), auditing and SLAs (Service Level Agreements), for example guaranteeing that the system is up 99.99% of the time and agreeing to pay compensation if it is down for longer than the remaining 0.01%.
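It helps to translate an availability percentage into an actual downtime budget. The sketch below does the arithmetic for a 99.99% SLA; the 365-day year and 30-day month are simplifying assumptions.

```javascript
// Downtime (in minutes) permitted by an availability target over a period.
function allowedDowntimeMinutes(availabilityPercent, periodDays) {
  const totalMinutes = periodDays * 24 * 60;
  return (totalMinutes * (100 - availabilityPercent)) / 100;
}

const perYear = allowedDowntimeMinutes(99.99, 365); // about 52.56 minutes
const perMonth = allowedDowntimeMinutes(99.99, 30); // about 4.32 minutes

console.log(perYear.toFixed(2), perMonth.toFixed(2));
```

A 99.99% target therefore leaves less than an hour of downtime per year, which has to cover planned maintenance as well as failures unless the SLA excludes it.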

Some of the positive aspects of using IP communication over traditional communication systems are :

Higher ROI: Return on investment is a big factor for SMEs before making the switch to IP telephony in place of a traditional established system such as landline phones and cables. However, once the VoIP communication system is set up, it can reduce call costs substantially, by as much as 70%.

Third-party integrations: It is often necessary to integrate the communication system with CRM (customer relationship management) systems, sales management systems or other lead-tracking systems that are driven by communication with possible clients or investors (called leads). Since most web portals run over IP using HTTP, VoIP fits in very well, with click-to-call on the webpage itself among other features such as directory integration, notifications and call scripts.

VAS (Value Added Services): Value added services are services built on top of the underlying mobile call and SMS services. These could be innovative use cases built using IVR / DTMF, such as cricket scores and astrology updates, or call recording, find-me-follow-me applications for multiple devices, voicemail / visual voicemail, re-routing to a home phone or an assistant's phone, caller ID etc. In short, they add intelligence to the way calls are managed.

Hosting the PBX

Unified communication solutions offered as SaaS or IaaS refer to on-premise or cloud-hosted IP PBX solutions. A comparison of the two follows.

On-premise:
  • The solution is usually of the SaaS nature (software as a service) and is hosted by the consumer / business unit itself.
  • Hosting the solution on premise and setting up the infrastructure means more customization and flexibility, but it also means more investment and maintenance.

Cloud-based:
  • The service provider offers its infrastructure to the consumer as a service and bills monthly, yearly etc.
  • Hosting the solution on the cloud is often a quick setup with a relatively low upfront payment. Billing is carried out either per user or based on consumption. The data is synced to cloud servers for storage and can be fetched from there when required, for example cloud-synced call logs or a contact book.

Convergence Vision 

Some of the latest industry trends in telecom convergence are:

Fixed Mobile Convergence (FMC) integrates a user’s fixed desk phone with their mobile phone. Call continuity is a VAS (value added service) that lets the user switch a call between devices, including softphones, even mid-call. It has multi-faceted advantages, such as not missing calls when out of the office and having the same call preferences, such as blocked numbers and IVR settings, on every device.


Unified Communication refers to the accessibility of all communication and collaboration services from the user's call agent (phone / softphone). These services can include file transfer, chat, conferencing, call settings, blocking, white-listing, fax, cloud sync, call logs, caller ID, favorites and recording.


Bring your own device (BYOD) is one of the hottest trends across almost all industry domains: the user is expected, or given the option, to bring a personal laptop for official use. It is the responsibility of the enterprise communication system to integrate the device seamlessly with the in-office communication system and to apply the same privileges and security for business-critical applications as preset in the configuration settings. BYOD increases flexibility and productivity while keeping infrastructure costs down.

IMS provides Network Interoperability and Access Independence

Image Source unknown. Represents the convergence of IMS subsystem with various access types

IMS-based telecommunication convergence is described in the figure below:

  • clients get direct connectivity to IP PBX in offices or hotels
  • home users connect through cable wires or Wifi/WiMax
  • non SIP based legacy endpoints connect via signalling and media gateways

The access endpoints connect to a single managed core IP network which interconnects with the IMS core. The back-end system manages not only calls and sessions but also registration, billing, operations and administration.

IMS convergence vision
picture courtesy – unknown

 Intelligent Network   —>    Next Generation IMS System 

Signalling protocols are migrating across the telco industry, for example from Signalling System 7 (SS7) to the Session Initiation Protocol (SIP). Similarly, legacy network nodes such as the signal transfer point (STP) are being migrated to the call session control function (CSCF) of IMS, which allows the rapid development and deployment of enhanced, revenue-generating multimedia services for fixed, mobile and cable operators.

IMS architecture enables operators to seamlessly run a plethora of next-generation converged services over their fixed, mobile and cable networks, achieve a faster time-to-market for new services and have fewer performance bottlenecks.

converged telecommunications

Business benefits of IMS 

  1. Delivering Services: Delivering services and applications on a “wherever, however, whenever” basis.
  2. Multimedia services: Enabling service providers to offer multimedia services across both next-gen, packet-switched networks and traditional circuit-switched networks.
  3. Protocol stack: The IMS architecture provides pipes and protocols onto which service providers can conveniently attach any number of applications.
  4. Open standards: The IMS architecture is based on open standards, which makes it possible for hardware and software from different vendors to integrate seamlessly.

As a subscriber, one of the main benefits of the IMS architecture is the capacity of the network to deliver the same set of services whatever the access network used.


This is made possible by centralizing the service execution process. A specific call server of the control plane (the Serving Call Session Control Function, S-CSCF) is responsible for invoking the application servers based on criteria provisioned in the central database. The S-CSCF obtains these criteria (called Initial Filter Criteria) during the user’s registration in the IMS network.
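The way an S-CSCF might pick application servers from the Initial Filter Criteria can be sketched as a filter-and-sort step. This is a heavily simplified model, not the 3GPP data format: real criteria match on much more than the SIP method, and the object shape, the example.com URIs and the function name are all hypothetical. The sketch assumes that a lower priority number is evaluated first.

```javascript
// Hypothetical, simplified Initial Filter Criteria as they might look
// after being downloaded at registration time.
const initialFilterCriteria = [
  { priority: 2, method: 'INVITE', applicationServer: 'sip:voicemail.example.com' },
  { priority: 1, method: 'INVITE', applicationServer: 'sip:call-screening.example.com' },
  { priority: 3, method: 'MESSAGE', applicationServer: 'sip:messaging.example.com' },
];

// Application servers to invoke for a request, in priority order.
// filter() returns a new array, so the provisioned list is not reordered.
function applicationServersFor(request, criteria) {
  return criteria
    .filter((ifc) => ifc.method === request.method)
    .sort((a, b) => a.priority - b.priority)
    .map((ifc) => ifc.applicationServer);
}

const inviteChain = applicationServersFor({ method: 'INVITE' }, initialFilterCriteria);
console.log(inviteChain);
```

Because the criteria live in the central database rather than in the access network, the same chain of application servers is applied whichever access type the subscriber registers from.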

Circuit Switched Voice –> Packet based VOIP 

Voice over IP has revolutionized the telecommunication space. It also makes the communication experience richer, with a series of enhanced features and extended possibilities. The number of users migrating from traditional circuit-switched networks to IP has been substantial in recent years. CSPs are embracing VoIP technology as a potential revenue generator and investing heavily to create value propositions for themselves in VoIP.


In conclusion, here are the top business benefits of adopting a converged and unified IP telephony solution such as IMS and SIP:

Cost Savings: Saving money is the number-one reason most businesses and households switch to a VoIP system. VoIP systems don’t require a phone cabinet or on-site routing equipment, just phones.

Features: VoIP also allows users to take advantage of advanced features only available on internet-based phone systems. Features like online call monitoring, and online phone system access to add or configure extensions are also available with VoIP systems.

Flexibility: VoIP allows people to go mobile, calling directly from their cell phone while being charged at low VoIP rates.

Tracking Options: Since VoIP is an internet-based system, users can track and manage their system from their computer. Most VoIP systems allow users to track call volume and call time fairly easily, a feature that can be especially helpful for businesses that bill clients hourly or for time spent on the phone.