WebRTC Audio/Video Codecs


A codec signifies the media stream's compression and decompression. For peers to exchange media successfully, they need to agree on a common set of codecs for the session. The codec lists are exchanged between peers as part of the offer and answer (the SDP in SIP).

WebRTC provides containerless, bare MediaStreamTrack objects and does not itself mandate codecs for these tracks. Instead, the codecs are specified by two separate RFCs:

  1. RFC 7874 (WebRTC Audio Codec and Processing Requirements) specifies at least the Opus codec as well as G.711's PCMA and PCMU formats.
  2. RFC 7742 (WebRTC Video Processing and Codec Requirements) specifies support for VP8 and H.264's Constrained Baseline profile for video.

In WebRTC, media is protected using Datagram Transport Layer Security (DTLS) and the Secure Real-time Transport Protocol (SRTP). In this article we are going to discuss only the audio/video codec and processing requirements.

WebRTC is free and open source, and its working bodies promote royalty-free codecs. The IETF RTCWEB working group ensures that royalty-free codecs are mandatory to implement, while other codecs remain optional for WebRTC non-browsers.

WebRTC browsers MUST implement the VP8 video codec as described in RFC 6386 and H.264 Constrained Baseline, as required by RFC 7742.

Figure: WebRTC video codec and processing requirements – media flow in a WebRTC call

WebRTC Video Codecs

Most of the codecs below follow a lossy discrete cosine transform (DCT) based algorithm for encoding. A sample SDP offer from the Chrome browser v80 for Linux includes these profiles:

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123
a=rtcp-mux
a=rtcp-rsize

a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96

a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 profile-id=0
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100

a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:122 rtx/90000
a=fmtp:122 apt=102

a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:121 rtx/90000
a=fmtp:121 apt=127

a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125

a=rtpmap:108 H264/90000
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 red/90000
a=rtpmap:120 rtx/90000
a=fmtp:120 apt=124
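
Which of these codecs is actually negotiated can be steered from JavaScript before the offer is created. Below is a minimal sketch (not part of the trace above) using RTCRtpTransceiver.setCodecPreferences() to put VP8 first; it assumes pc is an RTCPeerConnection that already has a video transceiver:

// Reorder codec preferences so VP8 is offered first.
const transceiver = pc.getTransceivers().find(t => t.receiver.track.kind === 'video');
const { codecs } = RTCRtpReceiver.getCapabilities('video');
const vp8First = [
  ...codecs.filter(c => c.mimeType.toLowerCase() === 'video/vp8'),
  ...codecs.filter(c => c.mimeType.toLowerCase() !== 'video/vp8'),
];
transceiver.setCodecPreferences(vp8First); // takes effect at the next offer/answer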

VP8

Developed by On2 Technologies, later acquired by Google, which open-sourced the codec.

The reference encoder is the libvpx library.

  • Supported containers – 3GP, Ogg, WebM
  • (+) supports simulcast
  • (+) now free of royalty fees
  • (+) no limit on frame rate or data rate

Maximum resolution of 16384×16384 pixels.

VP8 encoders must limit the streams they send to conform to the values indicated by receivers in the corresponding max-fr and max-fs SDP attributes.
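
For example, a receiver that can only handle 720p at 30 fps could signal that in its SDP answer (a hypothetical illustration; max-fs is counted in macroblocks, and 1280×720 is 80×45 = 3600 macroblocks):

a=rtpmap:96 VP8/90000
a=fmtp:96 max-fs=3600;max-fr=30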

Encode and decode pixels with an implied 1:1 (square) aspect ratio.

VP9

VP9 is the successor to the older VP8 and comparable to HEVC, as they both deliver similar bit rates.

  • Supported containers – MP4, Ogg, WebM
  • (+) Open and free of royalties and any other licensing requirements

H264/AVC constrained

AVC's Constrained Baseline (CBP) profile is the one compliant with WebRTC.

  • proprietary, patented codec, maintained by MPEG / ITU

RFC 7742 requires Constrained Baseline Profile Level 1.2 and recommends H.264 Constrained High Profile Level 1.3. Constrained Baseline is a subset of the Main profile, suited to low-delay, low-complexity applications on lower-powered processing devices such as mobile phones.

Multiview Video Coding – allows multiple views of the same scene, such as stereoscopic video.

Other profiles, which are not supported, are Baseline (BP), Extended (XP), Main (MP), High (HiP), Progressive High (ProHiP), High 10 (Hi10P), High 4:2:2 (Hi422P) and High 4:4:4 Predictive.

  • supported containers are 3GP, MP4, WebM

Parameter settings:

  • packetization-mode
  • max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
  • sprop-parameter-sets: H.264 allows sequence and picture information to be sent both in-band and out-of-band. WebRTC implementations must signal this information in-band.
  • Supplemental Enhancement Information (SEI) "filler payload" and "full frame freeze" messages (used during video switching in MCU streams)
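
Taking one of the fmtp lines from the Chrome SDP above, the six hex digits of profile-level-id break down as follows:

a=fmtp:125 ...;profile-level-id=42e01f
           profile_idc = 0x42 (66 => Baseline)
           profile-iop = 0xe0 (constraint_set0/1/2 flags set => Constrained Baseline)
           level_idc   = 0x1f (31 => Level 3.1)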

AV1 (AOMedia Video 1)

An open format designed by the Alliance for Open Media. It is royalty free and designed especially for internet video, the HTML video element, and WebRTC.

  • higher data compression rates than VP9 and H.265/HEVC

It offers three profiles with increasing support for color depths and chroma subsampling:
1. main,
2. high, and
3. professional

  • supports HDR
  • supports Variable Frame Rate
  • Supported containers are ISOBMFF, MPEG-TS, MP4, WebM
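
Browser support for AV1 in WebRTC is still uneven, so it is worth probing at runtime; a minimal sketch:

// Check whether this browser can receive AV1 over WebRTC.
const videoCodecs = RTCRtpReceiver.getCapabilities('video').codecs;
const av1Supported = videoCodecs.some(c => c.mimeType.toLowerCase() === 'video/av1');
console.log('AV1 receive support:', av1Supported);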

Stats for a video-based media stream track

timestamp 04/05/2020, 14:25:59
ssrc 3929649593
isRemote false
mediaType video
kind video
trackId RTCMediaStreamTrack_sender_2
transportId RTCTransport_0_1
codecId RTCCodec_1_Outbound_96
[codec] VP8 (payloadType: 96)
firCount 0
pliCount 9
nackCount 476
qpSum 912936
[qpSum/framesEncoded] 32.86666666666667
mediaSourceId RTCVideoSource_2
packetsSent 333664
[packetsSent/s] 29.021823604499957
retransmittedPacketsSent 0
bytesSent 342640589
[bytesSent/s] 3685.7715977714947
headerBytesSent 8157584
retransmittedBytesSent 0
framesEncoded 52837
[framesEncoded/s] 30.022576142586164
keyFramesEncoded 31
totalEncodeTime 438.752
[totalEncodeTime/framesEncoded_in_ms] 3.5333333333331516
totalEncodedBytesTarget 335009905
[totalEncodedBytesTarget/s] 3602.7091371103397
totalPacketSendDelay 20872.8
[totalPacketSendDelay/packetsSent_in_ms] 6.89655172416302
qualityLimitationReason bandwidth
qualityLimitationResolutionChanges 20
encoderImplementation libvpx
Graph for Video Track in chrome://webrtc-internals
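
The same counters can also be polled programmatically with getStats(); a minimal sketch, assuming pc is the active RTCPeerConnection:

// Log a few outbound video RTP stats once per second.
setInterval(async () => {
  const report = await pc.getStats();
  report.forEach(stat => {
    if (stat.type === 'outbound-rtp' && stat.kind === 'video') {
      console.log('framesEncoded:', stat.framesEncoded,
                  'nackCount:', stat.nackCount,
                  'qualityLimitationReason:', stat.qualityLimitationReason);
    }
  });
}, 1000);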

Video codecs not supported by WebRTC

These need active realtime media transcoding.

H.263

Already used for video conferencing on PSTN (Public Switched Telephone Networks), RTSP, and SIP (IP-based videoconferencing) systems.

  • suited for low bandwidth networks
  • (-) not compatible with WebRTC
    • but many media gateways include realtime transcoding between H.263-based SIP systems and VP8-based WebRTC ones to enable video communication between them

H.265 / HEVC

A proprietary format covered by a number of patents; licensing is managed by MPEG LA.

  • Container – MP4

Interoperability between non-WebRTC compatible and WebRTC compatible endpoints

With the rise of the Internet of Things, many endpoints, especially IP cameras connected to Raspberry Pi-like SoCs (systems on chips), want to stream directly to the browser within their own private network, or even over the public network using TURN / STUN.

The figure below shows how such a call flow is possible between an IP camera (such as a baby cam) and its parent monitoring it on a WebRTC-supported mobile phone browser. The process involves streaming the content from the IoT device over RTSP and transcoding between H.264 and VP8 in real time.

Figure: interoperability between non-WebRTC compatible and WebRTC compatible endpoints
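
A rough sketch of the transcoding leg with ffmpeg (the camera URL, bitrate and RTP target are placeholders, not values from a real deployment):

ffmpeg -rtsp_transport tcp -i rtsp://192.168.1.10:554/stream \
       -an -c:v libvpx -b:v 1M -deadline realtime -cpu-used 4 \
       -f rtp rtp://127.0.0.1:5004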

WebRTC Audio Codecs


WebRTC endpoints should implement the audio codecs OPUS and PCMA / PCMU, along with Comfort Noise and DTMF events.

Trace of the audio codecs supported in Chrome (Version 80.0.3987.149, Official Build, 64-bit, on Ubuntu):

m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126

a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000

Opus

Opus is a lossy audio compression format developed by the Internet Engineering Task Force (IETF), targeting a broad range of interactive real-time applications over the Internet, from speech to music, and supporting multiple compression algorithms.

  • Constant and variable bitrate encoding – 6 kbit/s to 510 kbit/s
  • frame sizes – 2.5 ms to 60 ms
  • sampling rates – 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced).
  • container- Ogg, WebM, MPEG-TS, MP4

As an open format standardized through RFC 6716, a reference implementation is provided under the 3-clause BSD license. All known software patents which cover Opus are licensed under royalty-free terms.

  • (+) flexible, suited for both speech (via SILK) and music (via CELT)
  • (+) support for mono and stereo
  • (+) built-in FEC (Forward Error Correction), making it resilient to packet loss
  • (+) compression adjustability for unpredictable networks
  • (-) highly CPU intensive (unsuitable for embedded devices like the Raspberry Pi)
  • (-) processing and memory intensive

For all cases where the endpoint is able to process audio at a sampling rate higher than 8 kHz, the W3C recommends that Opus be offered before PCMA/PCMU.
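
Opus parameters such as stereo or a bitrate cap travel in the fmtp line, so one common (if crude) way to tune them is SDP munging before setLocalDescription(). A sketch, assuming Opus has payload type 111 as in the trace above and that this runs inside an async function:

// Append stereo and a 128 kbit/s average-bitrate cap to the Opus fmtp line.
function tuneOpus(sdp) {
  return sdp.replace(
    'a=fmtp:111 minptime=10;useinbandfec=1',
    'a=fmtp:111 minptime=10;useinbandfec=1;stereo=1;maxaveragebitrate=128000');
}
const offer = await pc.createOffer();
await pc.setLocalDescription({ type: 'offer', sdp: tuneOpus(offer.sdp) });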

AAC (Advanced Audio Coding)

Part of the MPEG-4 standard. Lossy compression, but with a number of profiles suiting each use case, from high-quality surround sound to low-fidelity speech-only audio.

  • supported containers – MP4, ADTS, 3GP

G.711 (PCMA and PCMU)

G.711 is an ITU standard (1972) for audio compression. It is primarily used in telephony.

The ITU published Pulse Code Modulation (PCM) with either µ-law or A-law encoding, which is vital for interfacing with the standard telecom network and carriers. G.711 PCM (A-law) is known as PCMA and G.711 PCM (µ-law) as PCMU.

It is the required standard in many voice-based systems and technologies, for example in H.320 and H.323 specifications.

  • Fixed 64 kbit/s bit rate
  • supports 3GP container formats

G.722

An ITU standard (1988), encoded using Adaptive Differential Pulse Code Modulation (ADPCM), which is suited for voice compression.

  • 7 kHz wideband audio codec
  • operates at bitrates of 48, 56 and 64 kbit/s
  • containers used 3GP, AMR-WB

G.722 improves speech quality thanks to a wider speech bandwidth of up to 50–7000 Hz, compared to G.711's 300–3400 Hz.

Comfort noise (CN)

Artificial background noise used to fill gaps in a transmission instead of pure silence. It prevents jarring silences and RTP timeouts.

It should be used for streams encoded with G.711 or any other supported codec that does not provide its own CN. Use of Discontinuous Transmission (DTX) / CN by senders is optional.

Internet Low Bitrate Codec (iLBC)

An open-source narrowband speech codec for VoIP and streaming audio.

  • 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames.
  • Defined by IETF RFCs 3951 and 3952.

Internet Speech Audio Codec (iSAC)

iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. It is designed for voice transmissions which are encapsulated within an RTP stream.

  • 16 kHz or 32 kHz sampling frequency
  • adaptive and variable bit rate of 12 to 52 kbps.

Speex

A patent-free audio compression format designed for speech, and also a free software speech codec used in VoIP applications and podcasts. It is now largely obsolete, with Opus as its official successor.

AMR-WB

Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech coding standard that provides improved speech quality. This codec is generally available on mobile phones.

  • wider speech bandwidth of 50–7000 Hz.
  • data rates between 6.6 and 23.85 kbit/s

DTMF and ‘audio/telephone-event’ media type

Endpoints may send DTMF events at any time and should suppress any in-band dual-tone multi-frequency (DTMF) tones.

DTMF events list

| Event | Tone
| 0 | DTMF digit "0"
| 1 | DTMF digit "1"
| 2 | DTMF digit "2"
| 3 | DTMF digit "3"
| 4 | DTMF digit "4"
| 5 | DTMF digit "5"
| 6 | DTMF digit "6"
| 7 | DTMF digit "7"
| 8 | DTMF digit "8"
| 9 | DTMF digit "9"
| 10 | DTMF digit "*"
| 11 | DTMF digit "#"
| 12 | DTMF digit "A"
| 13 | DTMF digit "B"
| 14 | DTMF digit "C"
| 15 | DTMF digit "D"
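
In the browser these events are sent through the RTCDTMFSender attached to an audio sender; a minimal sketch, assuming pc is a connected RTCPeerConnection:

const audioSender = pc.getSenders().find(s => s.track && s.track.kind === 'audio');
if (audioSender && audioSender.dtmf) {
  // arguments: digits, tone duration (ms), inter-tone gap (ms)
  audioSender.dtmf.insertDTMF('1234#', 100, 70);
  audioSender.dtmf.ontonechange = e => console.log('played tone:', e.tone);
}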

Stats for Audio Media track

Stats for Audio Media include

  • headerBytesSent
  • packetsSent
  • bytesSent
timestamp 04/05/2020, 14:25:59
ssrc 3005719707
isRemote false
mediaType audio
kind audio
trackId RTCMediaStreamTrack_sender_1
transportId RTCTransport_0_1
codecId RTCCodec_0_Outbound_111
[codec] opus (payloadType: 111)
mediaSourceId RTCAudioSource_1
packetsSent 88277
[packetsSent/s] 50.03762690431027
retransmittedPacketsSent 0
bytesSent 1977974
[bytesSent/s] 150.11288071293083
headerBytesSent 2118648
retransmittedBytesSent 0
Graphs in chrome://webrtc-internals for Audio

DataChannel

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
b=AS:30
a=ice-ufrag:blj+
a=ice-pwd:Ytdofc24WZYWRAnyNSNhuF4F
a=ice-options:trickle
a=fingerprint:sha-256 18:2F:B9:13:A1:BA:33:0C:D0:59:DB:83:9A:EA:38:0B:D7:DC:EC:50:20:6E:89:54:CC:E8:70:10:80:2B:8C:EE
a=setup:active
a=mid:2
a=sctp-port:5000
a=max-message-size:262144
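
An m=application section like the one above is produced when a data channel is created before the offer; a minimal sketch:

const pc = new RTCPeerConnection();
// Label and options are arbitrary; 'sctp' matches the label in the stats below.
const dc = pc.createDataChannel('sctp', { ordered: true });
dc.onopen = () => dc.send('hello over SCTP');
dc.onmessage = e => console.log('received:', e.data);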

Stats for Datachannel

Statistics RTCDataChannel_1
timestamp 04/05/2020, 14:25:59
label sctp
protocol
datachannelid 1
state open
messagesSent 1
[messagesSent/s] 0
bytesSent 228
[bytesSent/s] 0
messagesReceived 1
[messagesReceived/s] 0
bytesReceived 228
[bytesReceived/s] 0

References:

Quick links : If you are new to WebRTC read : Introduction to WebRTC is at https://telecom.altanai.com/2013/08/02/what-is-webrtc/

Layers of WebRTC at https://telecom.altanai.com/2013/07/31/webrtc/

Video Codecs – H264, H265, AV1

This article discusses the popularly adopted current standards for video codecs (compression / decompression), namely MPEG-2, H.264, H.265 and AV1.


Compression algorithms differ from media containers: codecs compress the information in a raw stream to reduce its size for streaming applications, while media containers are just file formats used for playback from a set location.

Examples of Codecs: H.261, H.263, VC-1, MPEG-1, MPEG-2, MPEG-4, AVS1, AVS2, AVS3, VP8, VP9, AV1, AVC/H.264, HEVC/H.265, VVC/H.266, EVC, LCEVC

Examples of containers include: MPEG-1 System Stream, MPEG-2 Program Stream, MPEG-2 Transport Stream, MP4, MOV, MKV, WebM, AVI, FLV, IVF, MXF, HEIC and so on.

MPEG-2

MPEG-2 (a.k.a. H.222/H.262 as defined by the ITU) is the standard for generic coding of moving pictures and associated audio information. It combines lossy video compression and lossy audio data compression methods that permit storage and transmission of movies using currently available storage media and transmission bandwidth.

MPEG-2 is better than MPEG-1

It evolved out of the shortcomings of MPEG-1, such as: an audio compression system limited to two channels (stereo); no standardized support for interlaced video, with poor compression when it was used; and only one standardized "profile" (Constrained Parameters Bitstream), which was unsuited for higher-resolution video.

Application

  • over-the-air digital television broadcasting and in the DVD-Video standard.
  • TV stations, TV receivers, DVD players, and other equipment
  • MOD and TOD – recording formats for use in consumer digital file-based camcorders.
  • XDCAM – professional file-based video recording format.
  • DVB – Application-specific restrictions on MPEG-2 video in the DVB standard

MPEG-4

Video coding standards:
MPEG-4 Part 2 Visual (ISO/IEC 14496-2), released in 1999 as the MPEG-4 video codec;
MPEG-4 Part 10 Advanced Video Coding (ISO/IEC 14496-10), released in 2003 as the AVC/H.264 video codec;
MPEG-4 Part 14 (ISO/IEC 14496-14), the MP4 file format, which is a media container rather than a codec (compression algorithm).

H264

Introduced in 2004 as Advanced Video Coding (AVC)/H.264, also known as MPEG-4 AVC or ITU-T H.264 / MPEG-4 Part 10 'Advanced Video Coding' (AVC). It is a widely supported, vendor-agnostic solution.

MPEG-4 Part 10 AVC/H.264 is better than MPEG-2

  • 40-50% bit rate reduction compared to MPEG-2
  • Resolution support 4K (4,096×2,304) and 59.94 fps
  • 21 profiles ; 17 levels

Compression Model

Video compression relies on predicting motion between frames. It works by comparing different parts of a video frame to find those that are redundant in subsequent frames, i.e. unchanged areas such as background sections of the video. These areas are replaced with short information referencing the original pixels (intra-frame motion prediction), using a mathematical function and the direction of motion.

  • Hybrid spatial-temporal prediction model
  • Flexible partitioning of macroblocks (MB) and sub-MBs for motion estimation
  • Intra prediction (extrapolates already-decoded neighbouring pixels for prediction)
  • Introduced a multi-view extension
  • 9 directional modes for intra prediction
  • Macroblock structure with a maximum size of 16×16
  • Entropy coding with CABAC (context-adaptive binary arithmetic coding) and CAVLC (context-adaptive variable-length coding)

Applications of H264

  • most deployed video compression standard
  • Delivers high definition video images over direct-broadcast satellite-based television services,
  • Digital storage media and Blu-Ray disc formats,
  • Terrestrial, Cable, Satellite and Internet Protocol television (IPTV)
  • Security and surveillance systems and DVB
  • Mobile video, media players, video chat

H265

High Efficiency Video Coding (HEVC), also known as H.265 or MPEG-H HEVC, streams high-quality video in congested network environments or bandwidth-constrained mobile networks.

  • (+) twice the video compression at the same video quality as H264
  • (-) higher processing power required

Introduced in January 2013 as a product of collaboration between the ITU Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

H265 is better than H264

It overcomes shortages of bandwidth, spectrum and storage, and achieves bandwidth savings of approx. 45% over H.264-encoded content.

  • Resolutions up to 8192×4320, including 8K UHD
  • Supports up to 300 fps
  • 3 approved profiles, with a draft for an additional 5; 13 levels

Whereas H.264 macroblocks span 4×4 to 16×16 block sizes, CTUs can process as many as 64×64 blocks, giving H.265 the ability to compress information more efficiently.

Multiview encoding – a stereoscopic video coding standard that allows efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream by exploiting the large amount of inter-view statistical dependency.

Compression Model

  1. Enhanced Hybrid spatial-temporal prediction model
  2. CTU ( coding tree units) supporting larger block structure (64×64) with more variable sub partition structures
  3. Motion estimation – intra prediction with more modes, and asymmetric partitions in inter prediction. The individual rectangular regions (tiles) that divide the image are independent.
  4. Paralleling processing computing – decoding process can be split up across multiple parallel process threads, taking advantage multi-core processors.
  5. Wavefront Parallel Processing (WPP) – rows of CTUs can be encoded or decoded in parallel, each row starting once the required CTUs of the row above are done, yielding more productive and effective compression.
  6. 33 directional intra-prediction modes, plus DC and planar prediction; Advanced Motion Vector Prediction (AMVP)
  7. Entropy coding is only CABAC

Applications of H265

  • caters to the growing HD content demand for multi-platform delivery
  • differentiated and premium 4K content

The reduced bitrate enables broadcasters and OTT vendors to bundle more channels / content onto existing delivery media,
and also provides a greater video quality experience at the same bitrate.

Using ffmpeg for H265 encoding

I took an H.264 file (640×480), duration 30 seconds, of size 3,908,744 bytes (3.9 MB on disk) and converted it using ffmpeg.

After conversion it was an HEVC (parameter sets in bitstream) MPEG-4 movie of only 621 KB, without any visible loss of clarity.

> ffmpeg -i pivideo3.mp4 -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4  
                                            
ffmpeg version 4.1.4 Copyright (c) 2000-2019 the FFmpeg developers   
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)   
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.4_2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-12.0.1.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr   
libavutil      56. 22.100 / 56. 22.100   
libavcodec     58. 35.100 / 58. 35.100   
libavformat    58. 20.100 / 58. 20.100   
libavdevice    58.  5.100 / 58.  5.100   
libavfilter     7. 40.101 /  7. 40.101   
libavresample   4.  0.  0 /  4.  0.  0   
libswscale      5.  3.100 /  5.  3.100   
libswresample   3.  3.100 /  3.  3.100   
libpostproc    55.  3.100 / 55.  3.100 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pivideo3.mp4':   
...

If you get an error like

Unknown encoder 'libx265'

then reinstall ffmpeg with H.265 (libx265) support; a build sketch follows.
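
A sketch of rebuilding ffmpeg from source with x265 enabled (package names and paths vary by platform; the libx265 development headers must be installed first):

./configure --enable-gpl --enable-libx265
make -j4
sudo make install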

An HEVC bitstream is an ordered sequence of syntax elements. Each syntax element is placed into a logical packet called a NAL (network abstraction layer) unit. There are 64 different NAL unit types, which can be grouped into 10 classes:

  1. VPS – Video parameter set
  2. SPS – Sequence parameter set
  3. PPS – Picture parameter set
  4. Slice (different types)
  5. AUD – Access unit delimiter signals the start of video frame
  6. EOS – End of sequence
  7. EOB – End of bitstream
  8. FD – Filler data, for bitrate smoothing
  9. SEI – Supplemental enhancement information such as picture timing, color space information, etc.
  10. Reserved and unspecified

AV1

A realtime, high-quality video encoder, the product of the Alliance for Open Media (AOM). It can be carried in Matroska, WebM, ISOBMFF, and RTP (WebRTC).

AV1 is better than H265

  • (+) AV1 is royalty free and overcomes the patent complexities around H265/HEVC

Applications

  • video transmission over the internet, VoIP, multi-party conferencing
  • virtual / augmented reality
  • streaming for self-driving cars
  • intended for use in HTML5 web video and WebRTC together with the Opus audio format

SVC (Scalable Video Coding)

SVC standardises the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. A subset bitstream can represent a lower spatial resolution (smaller screen), lower temporal resolution (lower frame rate), or lower quality video signal (e.g. after dropped packets) compared to the base stream it is derived from.

  • Temporal (frame rate) scalability is enabled by structuring motion compensation dependencies so that complete pictures (i.e. their associated packets) can be dropped from the bitstream.
  • Spatial (picture size) scalability is enabled with video coded at multiple spatial resolutions
  • SNR/Quality/Fidelity scalability: video is coded at a single spatial resolution but at different qualities. The data and decoded samples of lower qualities can be used to predict data or samples of higher qualities in order to reduce the bit rate to code the higher qualities.
  • Combined scalability: a combination of the 3 scalability modalities described above.

Not all codecs support all modes. While AV1 and VP9 support the majority of the scalability modes defined in the W3C WebRTC-SVC specification, VP8 only supports temporal scalability (e.g. "L1T2", "L1T3"); H.264/SVC supports both temporal and spatial scalability but only permits transport of simulcast on distinct SSRCs.

VP8

The VP8 capability entry below (as reported by RTCRtpSender.getCapabilities('video') in a browser implementing WebRTC-SVC) lists its supported scalability modes:

{
  "clockRate": 90000,
  "mimeType": "video/VP8",
  "scalabilityModes": [
    "L1T1",
    "L1T2",
    "L1T3"
  ]
},

A usage example with sendEncodings (this must run inside an async function):

const stream = await navigator.mediaDevices.getUserMedia(constraints);
selfView.srcObject = stream;
pc.addTransceiver(stream.getAudioTracks()[0], {direction: 'sendonly'});
pc.addTransceiver(stream.getVideoTracks()[0], {
  direction: 'sendonly',
  sendEncodings: [
    {rid: 'q', scaleResolutionDownBy: 4.0, scalabilityMode: 'L1T3'},
    {rid: 'h', scaleResolutionDownBy: 2.0, scalabilityMode: 'L1T3'},
    {rid: 'f', scalabilityMode: 'L1T3'},
  ]
});



Streaming / broadcasting Live Video call to non webrtc supported browsers and media players

As the title of this article suggests, I am going to pen my attempts at streaming / broadcasting a live WebRTC video call to non-WebRTC-supported browsers and media players such as VLC, ffplay, the default video player in Linux, etc.

Some of the high-level architectures for streaming WebRTC video to multiple endpoints can be viewed in the post below.

Aim: I will be attempting to create a lightweight WebRTC-to-raw/H.264 transcoder by making my own media engine which takes input from a WebRTC peerconnection or getUserMedia. I am sharing my past experiments in the hope of helping someone whose objective may be the same, since many non-WebRTC-supported endpoints (RPi, kiosks, mobile browsers) could benefit heavily from WebRTC streaming. Even if your objective is not the same as mine, you may gain some insight into what not to do when making a media transcoder.


Attempt 1: use a one-to-many broadcasting API in JS

<table class="visible">
<tr>
<td style="text-align: right;">
<input type="text" id="conference-name" placeholder="Broadcast Name"> </td>
<td> <select id="broadcasting-option"> <option>Audio + Video</option> <option>Only Audio</option> <option>Screen</option> </select> </td>
<td> <button id="start-conferencing">Start Broadcasting</button> </td> </tr>
</table>
<table id="rooms-list" class="visible"></table>
<div id="participants"></div>
<script src="RTCPeerConnection-v1.5.js"></script>
<script src="firebase.js"></script>
<script src="broadcast.js"></script>
<script src="broadcast-ui.js"></script>

It uses the API from webrtc-experiment.com. The broadcast is in one direction only; the viewers are never asked for their mic / webcam permissions.

Problem: the broadcast works for WebRTC browsers only and doesn't support non-WebRTC players / browsers.

Attempt 1.1: Stream the media directly to nodejs through a websocket

window.addEventListener('DOMContentLoaded', function () {

    var v = document.getElementById('v');
    navigator.getUserMedia = (navigator.getUserMedia ||
        navigator.webkitGetUserMedia ||
        navigator.mozGetUserMedia ||
        navigator.msGetUserMedia);

    if (navigator.getUserMedia) {
// Request access to video only
        navigator.getUserMedia(
            {
                video: true,
                audio: false
            },
            function (stream) {
                var url = window.URL || window.webkitURL;
                v.src = url ? url.createObjectURL(stream) : stream;
                v.play();

                var ws = new WebSocket('ws://localhost:3000', 'echo-protocol');
                waitForSocketConnection(ws, function () {

                    console.log(" url.createObjectURL(stream)-----", url.createObjectURL(stream))
                    ws.send(stream);

                    console.log("message sent!!!");
                });

            },
            function (error) {
                alert('Something went wrong. (error code ' + error.code + ')');
                return;
            }
        );
    } else {
        alert('Sorry, the browser you are using doesn\'t support getUserMedia');
        return;
    }
});

//Make the function wait until the connection is made...
function waitForSocketConnection(socket, callback) {
    setTimeout(
        function () {
            if (socket.readyState === 1) {
                console.log("Connection is made")
                if (callback != null) {
                    callback();
                }
                return;

            } else {
                console.log("wait for connection...")
                waitForSocketConnection(socket, callback);
            }

        }, 5); // wait 5 milliseconds before checking the connection again...
}

Problem: the video arrives in the form of a buffer and does not play.

Attempt 2: Record the WebRTC media (5 secs each) into chunks of webm format -> transfer them to the other end -> append the chunks together like a regular file

This process involved the following components :

  • Recorder JavaScript library: RecordRTC.js
  • Transfer mechanism: record using RecordRTC.js -> send to the media server at the other end -> stitch the small webm files together into a big one at runtime and play
  • Programs :

Code for the video recorder

navigator.getUserMedia(videoConstraints, function (stream) {

    video.onloadedmetadata = function () {
        video.width = 320;
        video.height = 240;

        var options = {
            type: isRecordVideo ? 'video' : 'gif',
            video: video,
            canvas: {
                width: canvasWidth_input.value,
                height: canvasHeight_input.value
            }
        };

        recorder = window.RecordRTC(stream, options);
        recorder.startRecording();
    };
    video.src = URL.createObjectURL(stream);
}, function () {
    if (document.getElementById('record-screen').checked) {
        if (location.protocol === 'http:')
            alert('https is mandatory to capture screen.');
        else
            alert('Multi-capturing of screen is not allowed.Have you enabled flag: "Enable screen capture support in getUserMedia"?');
    } else
        alert('Webcam access is denied.');
});

Code for the video appender

var FILE1 = '1.webm';
var FILE2 = '2.webm';
var FILE3 = '3.webm';
var FILE4 = '4.webm';
var FILE5 = '5.webm';

var NUM_CHUNKS = 5;
var video = document.querySelector('video');

window.MediaSource = window.MediaSource || window.WebKitMediaSource;
if (!window.MediaSource) {
    alert('MediaSource API is not available');
}

var mediaSource = new MediaSource();
video.src = window.URL.createObjectURL(mediaSource);

function callback(e) {
    var sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vorbis,vp8"');
    GET(FILE1, function (uInt8Array) {
        var file = new Blob([uInt8Array], {type: 'video/webm'});
        var i = 1;
        (function readChunk_(i) {
            var reader = new FileReader();
            reader.onload = function (e) {
                sourceBuffer.appendBuffer(new Uint8Array(e.target.result));
                if (i == NUM_CHUNKS) mediaSource.endOfStream();
                else {
                    if (video.paused) {
                        video.play(); // Start playing after 1st chunk is appended.
                    }
                    readChunk_(++i);
                }
            };
            reader.readAsArrayBuffer(file);
        })(i); // Start the recursive call by self calling.
    });
}

mediaSource.addEventListener('sourceopen', callback, false);
mediaSource.addEventListener('webkitsourceopen', callback, false);
mediaSource.addEventListener('webkitsourceended', function (e) {
    logger.log('mediaSource readyState: ' + this.readyState);
}, false);

// function get the video via XHR
function GET(url, callback) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, true);
    xhr.responseType = 'arraybuffer';
    xhr.send();
    xhr.onload = function (e) {
        if (xhr.status != 200) {
            alert("Unexpected status code " + xhr.status + " for " + url);
            return false;
        }
        callback(new Uint8Array(xhr.response));
    };
}

Shortcomings of this approach

  1. The webm files failed to play in most media players.
  2. The recorder can only record either a video or an audio file at a time.

Attempt 2 (continued): Chunking and a media proxy

Since the previous approach failed on non-WebRTC endpoints, the next iteration was to channel the WebRTC media via a nodejs server, thus disrupting the peer-to-peer media stream in favour of a centralized / proxied media stream. This would enable me to obtain raw media packets from the stream using low-level C-based VP8 decoder libraries and then re-encode them to H.264 or other media formats suitable for the endpoints.

In theory, the media could be re-encoded using the openH264 library and the frames then sent to players. The snippets below are illustrative pseudocode only: MediaSource.addSourceBuffer() actually accepts just a MIME type string, and VP9Decoder, loadBuffer, demuxPackets and codec are hypothetical helpers.

let mediaSource = new MediaSource();
// hypothetical two-argument form: the real addSourceBuffer() takes only the MIME string
let sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs=vp9',
    new VP9Decoder());
let buffer = await loadBuffer();
sourceBuffer.appendBuffer(buffer);

Further extending the idea to uncompressed video ('video/raw' is likewise not a registered MSE type):

let mediaSource = new MediaSource();
let sourceBuffer = mediaSource.addSourceBuffer('video/raw; codecs=yuv420p');
for (const p of demuxPackets()) {
    let frame = await codec.decode(p);
    sourceBuffer.appendBuffer(frame);
}

At least, that was the plan.

Attempt 2.1: Record the WebRTC media (5 secs each) into chunks of webm format (RecordRTC.js) -> use the Kurento JS API (kws-media-api.js) to make an HTTP endpoint for the recorded webm files -> append the chunks together like a regular file at runtime

// UI elements
function getByID(id) {
    return document.getElementById(id);
}

var recordAudio = getByID('record-audio'),
    recordVideo = getByID('record-video'),
    stopRecordingAudio = getByID('stop-recording-audio'),
    stopRecordingVideo = getByID('stop-recording-video'),
    broadcasting = getByID('broadcasting');

var canvasWidth_input = getByID('canvas-width-input'),
    canvasHeight_input = getByID('canvas-height-input');

var video = getByID('video');
var audio = getByID('audio');

// Audio video constraints
var videoConstraints = {
    audio: false,
    video: {
        mandatory: {},
        optional: []
    }
};

var audioConstraints = {
    audio: true,
    video: false
};

// Recording and stop recording - to be converted into real-time capture and chunking
const ws_uri = 'ws://localhost:8888/kurento';
var URL_SMALL = "http://localhost:8080/streamtomp4/approach1/5561840332.webm";

var audioStream;
var recorder;

recordAudio.onclick = function () {
    if (!audioStream)
        navigator.getUserMedia(audioConstraints, function (stream) {
            if (window.IsChrome) stream = new window.MediaStream(stream.getAudioTracks());
            audioStream = stream;
            audio.src = URL.createObjectURL(audioStream);
            audio.muted = true;
            audio.play();
            // "audio" is a default type
            recorder = window.RecordRTC(stream, {
                type: 'audio'
            });
            recorder.startRecording();
        }, function () {
        });
    else {
        audio.src = URL.createObjectURL(audioStream);
        audio.muted = true;
        audio.play();
        if (recorder) recorder.startRecording();
    }
    window.isAudio = true;
    this.disabled = true;
    stopRecordingAudio.disabled = false;
};

Recording and stopping the video recording into small media files (chunks):

recordVideo.onclick = function () {
    recordVideoOrGIF(true);
};
stopRecordingAudio.onclick = function () {
    this.disabled = true;
    recordAudio.disabled = false;
    audio.src = '';

    if (recorder)
        recorder.stopRecording(function (url) {
            audio.src = url;
            audio.muted = false;
            audio.play();

            document.getElementById('audio-url-preview').innerHTML = '<a href="' + url + '" target="_blank">Recorded Audio URL</a>';
        });
};
function recordVideoOrGIF(isRecordVideo) {
    navigator.getUserMedia(videoConstraints, function (stream) {

        video.onloadedmetadata = function () {
            video.width = 320;
            video.height = 240;

            var options = {
                type: isRecordVideo ? 'video' : 'gif',
                video: video,
                canvas: {
                    width: canvasWidth_input.value,
                    height: canvasHeight_input.value
                }
            };

            recorder = window.RecordRTC(stream, options);
            recorder.startRecording();
        };
        video.src = URL.createObjectURL(stream);
    }, function () {
        if (document.getElementById('record-screen').checked) {
            if (location.protocol === 'http:')
                alert('https is mandatory to capture screen.');
            else
                alert('Multi-capturing of screen is not allowed. Capturing process is denied. Have you enabled the flag: "Enable screen capture support in getUserMedia"?');
        } else
            alert('Webcam access is denied.');
    });

    window.isAudio = false;

    if (isRecordVideo) {
        recordVideo.disabled = true;
        stopRecordingVideo.disabled = false;
    } else {
        recordGIF.disabled = true;
        stopRecordingGIF.disabled = false;
    }
}

stopRecordingVideo.onclick = function () {
    this.disabled = true;
    recordVideo.disabled = false;

    if (recorder)
        recorder.stopRecording(function (url) {
            video.src = url;
            video.play();
            document.getElementById('video-url-preview').innerHTML = '<a href="' + url + '" target="_blank">Recorded Video URL</a>';

        });
};

Broadcasting the chunks to the media engine

function onerror(error) {
    console.log(" error occured");
    console.error(error);
}

broadcast.onclick = function () {
var videoOutput = document.getElementById("videoOutput");
KwsMedia(ws_uri, function (error, kwsMedia) {
    if (error) return onerror(error);
    // Create pipeline
    kwsMedia.create('MediaPipeline', function (error, pipeline) {
        if (error) return onerror(error);
        // Create pipeline media elements (endpoints & filters)
        pipeline.create('PlayerEndpoint', {uri: URL_SMALL}, function (error, player) {
                if (error) return console.error(error);

                pipeline.create('HttpGetEndpoint', function (error, httpGet) {
                    if (error) return onerror(error);
                    // Connect media element between them
                    player.connect(httpGet, function (error, pipeline) {
                        if (error) return onerror(error);
                        // Set the video on the video tag
                        httpGet.getUrl(function (error, url) {
                            if (error) return onerror(error);
                            videoOutput.src = url;
                            console.log(url);
                            // Start player
                            player.play(function (error) {
                                if (error) return onerror(error);
                                console.log('player.play');
                            });
                        });
                    });

                    // Subscribe to HttpGetEndpoint EOS event
                    httpGet.on('EndOfStream', function (event) {
                        console.log("EndOfStream event:", event);
                    });
                });
            });
    });
}, onerror);
}

Problem: dissecting the live video into small files and appending them to each other on reception is an expensive, time- and resource-consuming process. It also involves heavy buffering and other problems pertaining to real-time streaming.

Attempt 2.2: Send the recorded chunks of webm to a port on a Linux server. Use socket programming to pick up these individual files and play them using the VLC player from a UDP port of the Linux server.


End result: the small file containers play, but slow buffering makes this approach incompatible with streaming file chunks and appending them as a single file.

Attempt 2.3: Send the recorded chunks of webm to a socket on a Linux server. Use socket programming to pick up these individual webm files and convert them to H.264 format so that they can be sent to a media server.

This process involved the following components :

  • Recorder JavaScript library: RecordRTC.js
  • Transfer mechanism: WebRTC endpoint -> call handler (records in chunks) -> ffmpeg / gstreamer to put it on RTP -> streaming server like Wowza -> viewers
  • Programs: an HTML webpage with a WebSocket connection -> a nodejs program that writes content from the websocket to a Linux socket -> a nodejs program that reads that socket and prints the content on the console

Snippet to transfer the recorded webm files over a websocket to the nodejs program:

// Make the function wait until the connection is made.
function waitForSocketConnection(socket, callback) {
    setTimeout(
        function () {
            if (socket.readyState === 1) {
                console.log("Connection is made")
                if (callback != null) 
                    callback();
            } else {
                console.log("wait for connection...")
                waitForSocketConnection(socket, callback);
            }
        }, 5); // wait 5 milliseconds before checking the connection again...
}

function previewFile() {
    var preview = document.querySelector('img');
    var file = document.querySelector('input[type=file]').files[0];
    var reader = new FileReader();

    reader.onloadend = function () {
        preview.src = reader.result;
        console.log(" reader result ", reader.result);

        var video = document.getElementById("v");
        video.src = reader.result;
        console.log(" video played ");

        var ws = new WebSocket('ws://localhost:3000', 'echo-protocol');
        waitForSocketConnection(ws, function () {
            ws.send(reader.result);
            console.log("message sent!!!");
        });

    }

    if (file) {
        // converts to base64 encoded string of the file data
        //reader.readAsDataURL(file);
        reader.readAsBinaryString(file);
    } else {
        preview.src = "";
    }
}

Program for the Linux socket sender, which creates the socket for the webm files in nodejs:

var net = require('net');
var fs = require('fs');
var socketPath = '/tmp/tfxsocket';
var http = require('http');
var stream = require('stream');
var util = require('util');

var WebSocketServer = require('ws').Server;
var port = 3000;
var serverUrl = "localhost";

var socket;
/*----------http server -----------*/
var server = http.createServer(function (request, response) {});
server.listen(port, serverUrl);
console.log('HTTP Server running at ', serverUrl, port);

/*------websocket server ----------*/
var wss = new WebSocketServer({server: server});

wss.on("connection", function (ws) {
    console.log("websocket connection open");
    ws.on('message', function (message) {
        console.log(" stream recived from broadcast client on port 3000 ");
        var s = require('net').Socket();
        s.connect(socketPath);
        s.write(message);
        console.log(" send the stream to socketPath", socketPath);
    });

    ws.on("close", function () {
        console.log("websocket connection close")
    });
});

Program for the Linux socket listener using nodejs. Here the socket node is /tmp/mysocket:

var net = require('net');
var client = net.createConnection("/tmp/mysocket");
client.on("connect", function() {
    console.log("connected to mysocket");
});
client.on("data", function(data) {
    console.log(data);
});
client.on('end', function() {
    console.log('server disconnected');
});

Output 1: Video Buffer displayed


Output 2: Payload from the video displayed, showing that the pipeline works, but no playable output yet.


ffmpeg invocation for transferring the content from the socket to a UDP IP and port (using, for example, the mpegts muxer):

ffmpeg -i unix://tmp/mysocket -f mpegts udp://192.168.0.119:8083
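
On the receiving side, the stream could in principle be picked up with ffplay (assuming the mpegts mux above):

ffplay udp://192.168.0.119:8083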

Problems with this approach: the video was only passing through the socket and contained no meaningful information when I tried to play it or inspect it on the console.


Attempt 3: Use an existing media engine like Kurento to do the transcoding for me

Send the live WebRTC stream from a Kurento WebRTC endpoint to a Kurento HTTP endpoint, then play it using the Mozilla VLC web plugin.

The VLC Mozilla plugin can be embedded by:

<embed type="application/x-vlc-plugin" name="video2"
       autoplay="yes" loop="no" hidden="no"
       target="rtp://@192.165.0.119:8086" />

(Screenshots: the Mozilla VLC plugin failing to play from a WebRTC endpoint.)

Problem: the VLC Mozilla plugin was unable to play the video, and Mozilla-only playback was a difficult option for most consumers.

Continued in the next article: Streaming / broadcasting Live Video call to non webrtc supported browsers and media players.
