Fault Tolerance and Error Correction in WebRTC

  1. Fluctuating Networks
    1. Dynamic Bandwidth estimation
    2. JitterBuffer
    3. SDP renegotiation
  2. Demand for High Quality Video
    1. Tradeoff between Latency vs Quality
    2. Layering for adaptive streaming
    3. Better compression algorithms vs CPU compute
    4. Full INTRA-frame Request (FIR)
    5. Picture Loss Indication (PLI)
    6. Redundant Encoding (RED) in Media Packets
  3. Congestion
    1. Feedback Loop
    2. Overcome congestion with lower bitrate
    3. Reduce frame quality and resolution
    4. Congestion control algorithms: Google Congestion Control (GCC)
  4. Low Network Strength and High Packet Loss
    1. Recovering Lost packets
    2. Acknowledgement to identify packet loss
    3. Forward Error Correction (FEC)
  5. Long distance Calls and High Round Trip Time
    1. Using Receiver reports and Sender Reports from RTCP to adjust to network conditions
  6. Low Latency Media Streaming
    1. Measuring latency
  7. NTP Synchronization of Audio Video Sync
  8. Demand for higher security on WebRTC’s CPaaS
    1. End to End Encryption
    2. Minimize Public-private mapping pairs via RTCP-mux

Fluctuating Networks

WebRTC has built-in capabilities to detect network glitches and adapt to changing conditions. Some of the techniques it uses are listed below.

Dynamic Bandwidth estimation

Available bandwidth depends on network strength and is affected by the other users on the network. Under heterogeneous network conditions, bandwidth estimation is a critical step in improving call quality and the end-user experience.

JitterBuffer

An unreliable or fluctuating network delivers some packets on time and delays others, so packets arrive in bursts. A jitter buffer is an effective way to manage jitter: it ensures a steady delivery of packets even when the peers transmit at fluctuating rates.

A jitter buffer accepts packets as soon as they arrive and holds them until the frame can be fully reconstructed. Once all packets of a frame have been placed in the buffer (in any order), it emits the frame for decoding, which the player can then play back to the user. Note that several RTP packets can have the same timestamp if they are part of the same video frame.

  • (+) dynamically manages unordered packets and reconstructs a frame after accumulating all of its packets
  • (-) can introduce latency for packets that arrive early
  • (-) needs active resizing by means of feedback
    • for high-speed, good networks the jitter buffer can be small
    • for congested and disruptive networks it is better to keep a longer buffer, which also adds some latency
  • (-) the buffer has limited capacity, so a packet can expire if it is not received within a duration “jitterBufferDelay”.
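
The buffering behaviour described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any browser implements it: the Packet class, the FrameJitterBuffer name and in particular the start flag (marking the first packet of a frame) are assumptions made for the example, and sequence-number wraparound and packet expiry are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int          # RTP sequence number
    timestamp: int    # RTP timestamp, shared by all packets of one frame
    start: bool       # hypothetical flag: first packet of the frame
    marker: bool      # marker bit: last packet of the frame
    payload: bytes


class FrameJitterBuffer:
    """Collects packets per timestamp and releases only complete frames."""

    def __init__(self):
        self._frames = defaultdict(dict)  # timestamp -> {seq: Packet}

    def push(self, pkt):
        """Store a packet; return the frame payload once it is complete."""
        packets = self._frames[pkt.timestamp]
        packets[pkt.seq] = pkt
        first = next((p.seq for p in packets.values() if p.start), None)
        last = next((p.seq for p in packets.values() if p.marker), None)
        if first is None or last is None:
            return None  # frame boundaries not known yet
        expected = range(first, last + 1)  # ignores 16-bit wraparound
        if any(seq not in packets for seq in expected):
            return None  # still waiting for a packet in the middle
        del self._frames[pkt.timestamp]
        return b"".join(packets[seq].payload for seq in expected)
```

Packets may be pushed in any order; push returns None until the last missing packet of a frame arrives, at which point the payloads are joined in sequence order.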

SDP renegotiation

-TBD

Demand for High Quality Video

Applications such as telehealth, advertising and broadcasting over WebRTC media streams demand high quality video.

Tradeoff between Latency vs Quality

Reducing resolution, frame rate and bit rate is effective for congestion control. However, it is not suited to high-definition video use cases such as gaming, telehealth or the broadcast of a concert, as it may hinder the user experience.

Layering for adaptive streaming

Using I-frames, P-frames and B-frames efficiently in the codec, combined with predictive machine learning models, can make packet loss unnoticeable to the human eye. For video, the marker (M) bit in the RTP header typically marks the last packet of a frame rather than a keyframe; keyframes are identified from the codec payload.

  • (-) more complex compression algorithms

Better compression algorithms vs CPU compute

A better performing compression algorithm produces fewer bits to encode the same video quality as its predecessor.

  • (-) higher performing compression engines almost always have higher energy consumption and carbon footprint
  • (+) resilient to network fluctuations

Full INTRA-frame Request (FIR)

Requests a key frame so that the receiver can decode the stream. For example, when a new peer joins the conference, a key frame is required before decoding of the video stream can start.

Picture Loss Indication (PLI)

When the partial frames given to the decoder cannot be processed, a PLI message is sent to the sender. When the sender receives the PLI message, it produces a new I-frame to help the receiver decode the stream.

a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100

  • FIR : requests a full key frame from the sender when a new member enters the session.
  • PLI : requests a full key frame from the sender when partial frames were given to the decoder but it was unable to decode them. Causes of a PLI request could be a decoder crash or heavy loss.

Redundant Encoding (RED) in Media Packets

Recovers packet loss under lossy networks by adding extra bits of information in following packets.

  • (+) good for unpredictable networks

LBRR ( low bit-rate redundancy) – tbd

Congestion

Congestion is created when a network path has reached its maximum limits which could be due to

  • failures (switches, routers, cables, fibres ...)
  • over-subscription and operating at peak bandwidth
  • broadcast storms
  • inapt BGP routing and congestion detection
    • BGP is responsible for finding the shortest routable path for a packet

The direct consequences of congestion for any network transport can be

  • High Latency
  • Connection Timeouts
  • Low throughput
  • Packet loss
  • Queueing delay

The same applies to WebRTC streams: if a network is congested, buffers overflow and packets are dropped. Excessive packet drops increase both transmission time and jitter. To overcome this, adaptive buffering is used, with the buffer adjusted as jitter increases or decreases.

Feedback Loop

A congestion notifier and detection algorithm can analyze the RTCP metrics for possible congestion in the network route and suggest options to overcome it. This feedback loop is part of the adaptive bitrate and bandwidth estimation process.

Overcome congestion with lower bitrate

Rate limiting the information sent is one way to overcome congestion, even though it can lead to bad call quality at the receiver’s end and is atypical for realtime communication systems.

Reduce frame quality and resolution

Full HD constraint
vga constraints

Congestion control algorithms: Google Congestion Control (GCC)

Bandwidth estimation and congestion control are often paired as one operational unit. Packet loss and inter-packet arrival times primarily drive the bandwidth estimation and enable GCC to flag congestion.

  • On the receiver side, TMMBR/TMMBN (Temporary Maximum Media Stream Bit Rate Request/Notification) and REMB (Receiver Estimated Maximum Bitrate) exchange the bandwidth estimates.
  • On the sender side, TWCC (Transport-wide Congestion Control) can be used.
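
To make the loss-driven part of this concrete, here is a small Python sketch of a loss-based rate controller. It is an illustration under assumptions, not the GCC implementation: the function name is made up, and the 2% and 10% thresholds with the 5% increase step follow my reading of the loss-based controller in the GCC Internet-Draft, so treat them as indicative values. The delay-based part of GCC is not shown.

```python
def loss_based_estimate(previous_bps, loss_fraction):
    """Return the next sender-side bitrate estimate in bits per second.

    loss_fraction is the fraction of packets lost since the last report
    (0.0 to 1.0), for example taken from RTCP receiver reports.
    """
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the observed loss.
        return previous_bps * (1 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Negligible loss: probe upwards by 5%.
        return previous_bps * 1.05
    # Moderate loss: hold the current estimate.
    return previous_bps
```

Called once per feedback report, the estimate climbs slowly on a clean path and drops quickly once loss exceeds the upper threshold.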

Other congestion control algorithms

  • QUIC Loss Detection and Congestion Control, RFC 9002
  • Coupled Congestion Control for RTP Media, RFC 8699
  • NADA: A Unified Congestion Control Scheme for Real-Time Media – Network Working group
  • Self-Clocked Rate Adaptation for Multimedia – RMCAT WG
  • SCReAM – Mobile optimised congestion control algorithm by Ericsson

Low Network Strength and High Packet Loss

Packet loss is the loss of packets in transmission, which could be owing to

  • network resources and path
  • transmission medium congestion
  • the application's inability to absorb delayed packets
  • Maximum Transmission Unit (MTU) size : the measure of how large a single packet can be.

Recovering Lost packets

A high definition video stream requires low or no packet loss and fast recovery from any loss that does occur. RTP intrinsically has no means of recovering lost packets. Instead, low bit rate redundancy can be added to the packets themselves to make up for any loss. Retransmission of lost packets can also be built on top of RTP using the sequence number field in the RTP header.

Acknowledgement to identify packet loss

A receiver can notify the sender of possible packet loss by sending acknowledgements.

  • Selective Acknowledgement (SACK) : notifies the sender of multiple received packets, thereby indicating gaps
  • Negative Acknowledgement (NACK) : notifies the sender of packets lost
    • RTCP packet type 193 denotes the legacy NACK of RFC 2032; the generic NACK of RFC 4585 is a transport-layer feedback message (packet type 205, FMT 1).
  • (+) a higher NACK count is suggestive of high packet loss
  • (-) the round trip needed to send the NACK and then wait for the retransmitted packet can cause significant delay
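
A receiver decides what to NACK by watching for gaps in the RTP sequence numbers. The following Python sketch shows one way to do that; the function name is hypothetical and the logic is deliberately simplified (it only compares a new packet with the highest sequence number seen so far), but it does handle the 16-bit wraparound of the sequence number.

```python
def missing_sequence_numbers(highest_seen, incoming):
    """Sequence numbers skipped between the last packet and a new one."""
    gap = (incoming - highest_seen) & 0xFFFF  # distance modulo 2**16
    if gap == 0 or gap >= 0x8000:
        # Duplicate or late (reordered) packet: nothing newly missing.
        return []
    return [(highest_seen + i) & 0xFFFF for i in range(1, gap)]
```

For example, receiving packet 104 right after packet 100 yields 101, 102 and 103 as candidates for a NACK.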

Forward Error Correction (FEC)

The sender proactively sends redundant data so that lost packets don't affect the stream on the receiver’s end.

  • (+) the receiver doesn't have to request extra data; the sender adds it by itself at the RTP level
  • (+) less delay than NACK, which incurs a round trip time
  • (-) involves extra bandwidth.
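
The principle behind FEC can be shown with a single XOR parity packet protecting a group of media packets. This is a toy sketch in Python with made-up function names; it assumes equal-length packets and at most one loss per group, whereas the FEC schemes actually used with RTP (such as ULPFEC or FlexFEC) are considerably more elaborate.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """XOR all packets of a group into one parity packet."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Rebuild the single missing packet from the survivors and the parity."""
    return reduce(xor_bytes, received, parity)
```

The sender transmits the group plus the parity packet; if exactly one media packet is lost, XOR-ing everything that did arrive reproduces it without a round trip.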

Long distance Calls and High Round Trip Time

Geographical distance can add significant delay to transmission time. Transmission time is an important metric in call quality analysis. However, calculating it as the difference between the timestamp of sending and the timestamp of receiving requires the two system clocks to be perfectly in sync, which is unreliable.

transmission_time = timestamp_receive - timestamp_send

For this reason RTT (Round Trip Time), which is measured against a single clock, is a better basis for the estimate and avoids clock synchronization errors.

transmission_time = rtt / 2

Using Receiver reports and Sender Reports from RTCP to adjust to network conditions

Sender and receiver reports (SR and RR) provide a summary of the connection and of the quality of the media streaming over it.

RTCP Senders report for WebRTC media stream
RTCP receivers report for WebRTC media stream

Low Latency Media Streaming

Latency accumulates across capturing user media, encoding, transmission, network delays, buffering, decoding and playback. Many factors are involved in latency management, such as queuing delays, media path, CPU utilization, etc.

Optimize Compute resource

  • mobile agents have less computing power
  • cameras with features such as auto focus or other adjustments take more time to capture
  • the network should have suitable bandwidth and strength

Reduce information to be encoded and sent

  • Subject focus and blurring the background
  • Filtering noise at source
  • Voice Activity Detection (VAD)
    • send extra data in FEC only if voice activity is detected in the packet
  • Echo Cancellation

Measuring latency

Synchronizing clocks in distributed systems is a tough task, so latency measurement mostly either relies on NTP or uses other means of synchronization.

NTP Synchronization of Audio Video Sync

During the buffering of incoming packets (which can range from a few tens of milliseconds to a few hundred milliseconds) the streams are synchronized.

RTP uses two time bases for sync, NTP time and RTP time (which are not required to be in sync).

  • NTP timestamp : 64-bit unsigned value that indicates the time at which this RTCP SR packet was sent. Formatted as seconds since Jan 1, 1900 in the upper 32 bits and the fraction of a second in the lower 32 bits.
  • RTP timestamp : corresponds to the same instant as the NTP timestamp, expressed in the units of the RTP media clock.
    • The majority of video formats use a 90 kHz clock.
    • For the receiver to sync audio and video streams, the two streams must be from the same clock.

Frame 300: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface 0 (outbound)
....
Packet type: Sender Report (200)
Length: 6 (28 bytes)
Sender SSRC: 0x39a659b4 (967203252)
Timestamp, MSW: 3855754463 (0xe5d224df)
Timestamp, LSW: 2364654374 (0x8cf1c326)
[MSW and LSW as NTP timestamp: Mar 8, 2022 18:54:23.550563999 UTC]
RTP timestamp: 1110449770
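
The two timestamp words in the capture above can be decoded by hand, and the NTP/RTP pair can then be used to place any later RTP timestamp on the wall clock. The Python sketch below does both; the function names are my own, and the 90 kHz default clock rate is an assumption that only holds for typical video payloads.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_datetime(msw, lsw):
    """Convert the SR's NTP timestamp (two 32-bit words) to UTC."""
    # msw counts whole seconds since 1900, lsw is a fraction in 1/2**32 s.
    return NTP_EPOCH + timedelta(seconds=msw) + timedelta(seconds=lsw / 2**32)


def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_wallclock, clock_rate=90000):
    """Map an RTP timestamp to wall-clock time using the last SR's pair."""
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF   # 32-bit wraparound
    if delta >= 0x80000000:
        delta -= 1 << 32                        # timestamp is before the SR
    return sr_wallclock + timedelta(seconds=delta / clock_rate)


sr_time = ntp_to_datetime(3855754463, 2364654374)
one_second_later = rtp_to_wallclock(1110449770 + 90000, 1110449770, sr_time)
```

Here sr_time comes out as 8 March 2022, 18:54:23 UTC plus a fraction of a second, in line with the value Wireshark prints. A receiver that does this mapping for both the audio and the video stream can align them on the shared wall clock.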

Demand for higher security on WebRTC’s CPaaS

WebRTC uses the Stream Control Transmission Protocol (SCTP) over a DTLS connection for its data channels, as an alternative to plain TCP or UDP.

Features :

  • Multihoming : one or both endpoints of a connection can consist of more than one IP address. This enables transparent failover between redundant network paths
  • Multistreaming : transmits several independent streams of chunks in parallel
  • SCTP offers retransmission similar to TCP and partial reliability like UDP.
  • Heartbeat to keep the connection alive, with exponential backoff if a packet hasn't arrived.
  • Validation and acknowledgment mechanisms protect against flooding attacks

SCTP frames data as datagrams and not as a byte stream

  • (+) SCTP enables multiplexing in WebRTC
  • (+) It has flow control and congestion avoidance support

End to End Encryption

The end-to-end encryption model of WebRTC is a good defence against MITM (man-in-the-middle) attacks; however, it is not yet 100% foolproof. I discussed more security loopholes and concerns in WebRTC and realtime communication platforms in the article WebRTC App and webpage Security.

Minimize Public-private mapping pairs via RTCP-mux

Traditionally, two separate ports were used for RTP and RTCP in SIP/RTP based realtime communication systems, so demultiplexing of these data streams was performed at the transport layer.

With rtcp-mux, NAT traversal is simplified as only a single port is used for media and control messages.

  • (+) easier to manage security by gathering ICE candidates for a single port only instead of 2
  • (+) increases the system's capacity for media sessions using the same number of ports
  • (+) further simplified using BUNDLE, as all media sessions and their control messages flow on the same port.
  • WebRTC has rtcp-mux capabilities, thus simplifying the ICE candidate pairing
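
Once RTP and RTCP share a port, the receiver has to tell them apart from the packet contents. A common rule, which I am stating here from my understanding of the rtcp-mux specification (RFC 5761), is to look at the second byte: RTCP packet types occupy 192 to 223, and RTP payload types are chosen so that this byte never falls in that range. A minimal Python sketch with a hypothetical function name:

```python
def is_rtcp(packet):
    """Classify a packet received on a port shared by RTP and RTCP."""
    if len(packet) < 2:
        raise ValueError("too short to be RTP or RTCP")
    # For RTCP the second byte is the packet type (e.g. 200 = SR, 201 = RR).
    # For RTP it is the marker bit followed by the 7-bit payload type.
    return 192 <= packet[1] <= 223
```

This only works if the session avoids RTP payload types 64 to 95, which is why dynamic payload types are normally taken from 96 to 127.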

References :

RTCP Reports and QoE metric calculation

RTCP works alongside RTP to monitor and control media streams with QoS feedback, synchronization and session management. This writeup describes the key formats and functions of this protocol.

  1. RTCP (Real-Time Transport Control Protocol )
    1. RTCP Control and Management
    2. Gathers statistics on media connection
    3. SR: Sender Report RTCP Packet
    4. RR: Receiver Report RTCP Packet
    5. SDES: Source Description RTCP Packet
    6. BYE: Goodbye RTCP Packet
    7. APP: Application-Defined RTCP Packet
  2. RTCP XR (Extended Reports) 
  3. Extended RTP Profile for RTCP Based Feedback (RTP/AVPF)
  4. RTCP operation modes
  5.  RTCP for multicast sessions with unicast feedback
  6. RTCP Extensions for Multiplexed Media Streams

    RTCP (Real-Time Transport Control Protocol )

    Real-time Transport Control Protocol (RTCP), defined in RFC 3550, is used to send control packets and QoS feedback to the participants in a call, alongside RTP, which carries the actual media packets. RTCP monitors data delivery and QoS in a manner scalable to large multicast networks, and provides minimal control and identification functionality.

    RTCP is typically on port RTP+1, e.g., RTP=5004 → RTCP=5005.
    RTCP traffic is also limited to about 5% of the total session bandwidth (RTP uses the remaining 95%). RTCP report intervals can be adjusted to avoid congestion.

    RTCP Control and Management

    RTCP provides feedback on the quality of the data distribution, which supports congestion control, fault diagnosis and control of adaptive encoding. It is a periodic transmission of control packets. Since the control traffic is not self-limiting, the Receiver Reports (RR) from participants must be rate-adjusted to limit traffic to the sender; the number of participants is therefore observed to estimate the RTCP transmission interval as the session scales up. This allows the communication system to monitor multimedia delivered on large multicast networks with hundreds of receivers. The underlying protocol must provide multiplexing of the data and control packets, which convey minimal session control information such as bytes sent, packets sent, lost packets, jitter, feedback and round trip delay.

    Gathers statistics on media connection

    Some metrics gathered by RTCP reports are :

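    • timestamps and counters
      • tp : last RTCP transmit time
      • tc : current time
      • tn : next scheduled RTCP transmission time
      • pmembers : estimated count of members at the time tn was last recomputed
      • members : current estimate of the number of session members
      • senders : current estimate of the number of senders
      • rtcp_bw : target RTCP bandwidth
      • avg_rtcp_size : average compound RTCP packet size
    • flags
      • initial : true if the application has not yet sent an RTCP packet
      • we_sent : true if the participant is a sender, i.e. has recently sent an RTP packet
    • constants
      • n : for a receiver, the number of receivers = members – senders (for a sender, n = senders)
      • C : for a sender C = avg_rtcp_size / (25% of rtcp_bw), otherwise C = avg_rtcp_size / (75% of rtcp_bw)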

    The interval calculation is randomized, but it gives at least 25% of the RTCP bandwidth to senders; if senders make up more than 25% of the members, the bandwidth is split equally among all participants. Randomization spreads reports uniformly and avoids unintended synchronization or bursts of RTCP packets to the sender.

    • step 1 : Tmin = 2.5 seconds if the participant has not yet sent an RTCP packet, else Tmin = 5 seconds.
    • step 2 : Td = max(Tmin, n*C)
    • step 3 : T is a random value uniformly distributed between 0.5 and 1.5 times Td.
    • step 4 : the resulting T is divided by e - 3/2 = 1.21828 to compensate for the fact that the timer reconsideration algorithm converges to a value of the RTCP bandwidth below the intended average.
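
The four steps can be put together in a short Python function. This is a sketch modelled on the interval computation in RFC 3550, not a drop-in implementation: the function signature is my own, rtcp_bw is assumed to be in octets per second, and the rng parameter exists only so that the random factor can be pinned in tests.

```python
import math
import random


def rtcp_interval(members, senders, rtcp_bw, we_sent, avg_rtcp_size,
                  initial, rng=random.random):
    """Seconds to wait before sending the next RTCP packet."""
    # Step 1: minimum interval, halved before the first report.
    t_min = 2.5 if initial else 5.0

    # Senders get 25% of the RTCP bandwidth unless they exceed
    # 25% of the membership, in which case everyone shares equally.
    n = members
    if senders <= 0.25 * members:
        if we_sent:
            rtcp_bw *= 0.25
            n = senders
        else:
            rtcp_bw *= 0.75
            n = members - senders

    # Step 2: deterministic interval Td = max(Tmin, n * C).
    t_d = max(t_min, n * avg_rtcp_size / rtcp_bw)

    # Step 3: randomize uniformly between 0.5 and 1.5 times Td.
    t = t_d * (rng() + 0.5)

    # Step 4: compensate for timer reconsideration.
    return t / (math.e - 1.5)
```

With 100 members, 10 senders, 1000 octets per second of RTCP bandwidth and 100-octet reports, a receiver gets Td = 90 * 100 / 750 = 12 seconds before randomization.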

    The application may use this information to improve the quality of service, perhaps by limiting flow or using a different codec.

    RTCP often uses the next consecutive (odd-numbered) port after the (even-numbered) RTP port. For example, a capture may show port 20720 for RTP and the next consecutive port, 20721, for RTCP.

    When RTCP is not being used or the CNAME identifier corresponding to a synchronization source has not been received yet, the participant associated with a synchronization source is not known.

    • (+) RTCP helps in monitoring the quality of service for every session
    • (+) RTCP sender and receiver reports allow the implementation of adaptive streaming where senders scale their bandwidth consumption based on network load.
    • (+) RTCP SDES contains additional information like CNAME, which helps in tracing troublesome multimedia sources (via email, phone number, etc.)

    Types of RTCP packet

    1. SR: Sender report, for transmission and reception statistics from participants that are active senders
    2. RR: Receiver report, for reception statistics from participants that are not active senders and in combination with SR for active senders reporting on more than 31 sources
    3. SDES: Source description items, including CNAME, email or phone
    4. BYE: Indicates end of participation
    5. APP: Application-specific functions

    SR is issued if a site has sent any data packets during the interval since issuing the last report or the previous one, otherwise the RR is issued.

    SR: Sender Report RTCP Packet

    In the expanded Sender Report RTCP packet, the sender information block is 20 octets long and is present in every sender report packet. It summarizes the data transmissions from this sender.

    • NTP timestamp 64 bit
    • RTP Timestamp 32 bit
    • sender’s packet count: 32 bits, total number of RTP data packets transmitted by the sender since starting transmission up until the time this SR packet was generated.
    • sender’s octet count: 32 bits
    SR Report in RTCP

    Explanation for some attributes

    • highest sequence number received: 32 bits
    • fraction lost: 8 bits, fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent
    • cumulative number of packets lost: 24 bits size, total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception.
    • interarrival jitter: 32 bits, an estimate of the statistical variance of the RTP data packet interarrival time, measured in timestamp units. The jitter J is the mean deviation (smoothed absolute value) of the difference D in packet spacing at the receiver compared to the sender for a pair of packets.
    RTCP SR ( Senders Report)
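
The jitter definition above translates into a one-line running update, with the 1/16 gain specified in RFC 3550. The Python sketch below uses a function name of my own and assumes that send and receive times are both already expressed in RTP timestamp units.

```python
def update_jitter(jitter, prev_send, prev_recv, send, recv):
    """One interarrival jitter update for a new packet.

    send/recv are the RTP timestamp of a packet and its arrival time,
    both in RTP timestamp units; prev_* belong to the previous packet.
    """
    # D: change in packet spacing at the receiver versus at the sender.
    d = (recv - prev_recv) - (send - prev_send)
    # Smoothed mean deviation with gain 1/16.
    return jitter + (abs(d) - jitter) / 16
```

If two packets sent 160 units apart arrive 176 units apart, D is 16 and a jitter of 0 moves to 1.0; packets that keep their spacing leave the estimate decaying towards 0.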

    Synchronization and exposing delays using RTCP : in multimedia conferences each stream carries its own independent RTP timestamps. The NTP timestamp from the RTCP SR gives a common time reference that associates these independent timestamps with a shared wall-clock time. The NTP timestamps also help the endpoints measure their delays.
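
One delay that falls out of these timestamps directly is the round trip time. When a sender receives an RR, it can subtract the "last SR timestamp" (LSR) and the "delay since last SR" (DLSR) carried in the report block from the arrival time of the RR. All three values use the middle 32 bits of the NTP format, i.e. units of 1/65536 second. A minimal Python sketch with function names of my own choosing:

```python
def ntp_middle_32(msw, lsw):
    """Middle 32 bits of a 64-bit NTP timestamp (16.16 fixed point)."""
    return ((msw & 0xFFFF) << 16) | (lsw >> 16)


def round_trip_time(arrival, lsr, dlsr):
    """RTT in seconds from an RR report block.

    arrival: time the RR was received, as middle-32-bit NTP
    lsr:     "last SR timestamp" field of the report block
    dlsr:    "delay since last SR" field of the report block
    """
    return ((arrival - lsr - dlsr) & 0xFFFFFFFF) / 65536
```

Using the worked example from RFC 3550 (arrival 0xb7108000, LSR 0xb7052000, DLSR 0x00054000), the result is 6.125 seconds. Because all timestamps are read from the sender's own clock, no clock synchronization between the peers is needed.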

    RR: Receiver Report RTCP Packet

    Snapshot

    SDES: Source Description RTCP Packet

    abbrev.  name                            value
    END      end of SDES list                    0
    CNAME    canonical name                      1
    NAME     user name                           2
    EMAIL    user’s electronic mail address      3
    PHONE    user’s phone number                 4
    LOC      geographic user location            5
    TOOL     name of application or tool         6
    NOTE     notice about the source             7
    PRIV     private extensions                  8

    BYE: Goodbye RTCP Packet

    APP: Application-Defined RTCP Packet

    Intended for experimental use

    Instance of RTCP sender and receiver reports on transmission and reception statistics

    Real-time Transport Control Protocol (Receiver Report)
        [Stream setup by SDP (frame 4)]
            [Setup frame: 4]
            [Setup Method: SDP]
        10.. .... = Version: RFC 1889 Version (2)
        ..0. .... = Padding: False
        ...0 0001 = Reception report count: 1
        Packet type: Receiver Report (201)
        Length: 7 (32 bytes)
        Sender SSRC: 0x796dd0d6 (2037240022)
        Source 1
            Identifier: 0x00000000 (0)
            SSRC contents
                Fraction lost: 0 / 256
                Cumulative number of packets lost: 1
            Extended highest sequence number received: 6534
                Sequence number cycles count: 0
                Highest sequence number received: 6534
            Interarrival jitter: 0
            Last SR timestamp: 0 (0x00000000)
            Delay since last SR timestamp: 0 (0 milliseconds)
    Real-time Transport Control Protocol (Source description)
        [Stream setup by SDP (frame 4)]
            [Setup frame: 4]
            [Setup Method: SDP]
        10.. .... = Version: RFC 1889 Version (2)
        ..0. .... = Padding: False
        ...0 0001 = Source count: 1
        Packet type: Source description (202)
        Length: 6 (28 bytes)
        Chunk 1, SSRC/CSRC 0x796DD0D6
            Identifier: 0x796dd0d6 (2037240022)
            SDES items
                Type: CNAME (user and domain) (1)
                Length: 8
                Text: 796dd0d6
                Type: NOTE (note about source) (7)
                Length: 5
                Text: telecomorg
                Type: END (0)
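
Every RTCP packet in the dump above starts with the same 4-byte header, and decoding it is enough to walk through a compound packet. The following Python sketch (the function name and the returned dictionary keys are my own) unpacks that header; note how the length field counts 32-bit words minus one, which is why Wireshark shows "Length: 7 (32 bytes)".

```python
import struct


def parse_rtcp_header(data):
    """Decode the 4-byte header shared by every RTCP packet."""
    first, packet_type, length_words = struct.unpack("!BBH", data[:4])
    return {
        "version": first >> 6,               # always 2
        "padding": bool(first & 0x20),
        "count": first & 0x1F,               # report count / source count
        "packet_type": packet_type,          # 200 SR, 201 RR, 202 SDES, ...
        "length_bytes": (length_words + 1) * 4,
    }


rr = parse_rtcp_header(bytes([0x81, 201, 0x00, 0x07]))
```

For the Receiver Report above, rr holds version 2, one report block, packet type 201 and a total length of 32 bytes; the next packet of the compound (the SDES) starts right after those 32 bytes.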

    Negative Acknowledgment (NACK) packets can be used to explicitly indicate that packets have not been received.

    Full Intra Request (FIR) and Picture Loss Indication (PLI) packets are used for video to indicate that there is a need for the sender to produce a refresh point( key frame) in the stream.

    Receiver-Estimated Maximum Bitrate (REMB) feedback packets signal to a sender the maximum bitrate a receiver wishes to receive.

    Transport-wide Congestion Control (TCC) feedback packets are used to provide detailed packet-by-packet reception information from a receiver to the sender.

    RTCP XR (Extended Reports) 

    The purpose of the extended reporting format is to convey information that supplements the six statistics that are contained in the report blocks used by RTCP’s Sender Report (SR) and Receiver Report (RR) packets.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      BT       | type-specific |         block length          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :                 type-specific block contents                  :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Categories

    1. packet-by-packet reports on received or lost RTP packets

    Loss RLE Report Block (1)

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     BT=1      | rsvd. |   T   |         block length          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        SSRC of source                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           begin_seq           |            end_seq            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            chunk 1            |            chunk 2            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :                              ...                              :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           chunk n-1           |            chunk n            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Duplicate RLE Report Block (2)

    Packet Receipt Times Report Block (3)

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     BT=3      | rsvd. |   T   |         block length          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        SSRC of source                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           begin_seq           |            end_seq            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |               Receipt time of packet begin_seq                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       Receipt time of packet (begin_seq + 1) mod 65536        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :                              ...                              :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |        Receipt time of packet (end_seq - 1) mod 65536         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    2. reference time information between RTP participants

    Receiver Reference Time Report Block : Receiver-end wallclock timestamps(4)
    DLRR Report Block : delay since the last Receiver Reference Time Report Block was received (5)

    3. metrics relating to packet receipts, that are summary in nature

    Statistics summary block (6)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | BT=6 |L|D|J|ToH|rsvd.| block length = 9 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | SSRC of source |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | begin_seq | end_seq |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | lost_packets |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | dup_packets |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | min_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | max_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | mean_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | dev_jitter |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | min_ttl_or_hl | max_ttl_or_hl |mean_ttl_or_hl | dev_ttl_or_hl |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    VOIP metric report block (7)

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     BT=7      |   reserved    |       block length = 8        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        SSRC of source                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   loss rate   | discard rate  | burst density |  gap density  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |        burst duration         |         gap duration          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       round trip delay        |       end system delay        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | signal level  |  noise level  |     RERL      |     Gmin      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   R factor    | ext. R factor |    MOS-LQ     |    MOS-CQ     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   RX config   |   reserved    |          JB nominal           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          JB maximum           |          JB abs max           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Extended RTP Profile for RTCP Based Feedback (RTP/AVPF)

      RTP provides continuous feedback about the overall reception quality from all receivers, thereby allowing the sender(s) in the mid-term to adapt their coding scheme and transmission behavior to the observed network quality of service (QoS).

      However, RTP makes no provision for timely feedback that would allow a sender to repair the media stream immediately: through retransmissions, retroactive Forward Error Correction (FEC) control, or media-specific mechanisms for some video codecs, such as reference picture selection. The RTP/AVPF profile extends RTCP to allow such timely feedback.

      Components of RTCP based feedback

      • Status reports contained in sender report (SR) / receiver report (RR) packets, transmitted at regular intervals. These can also contain SDES.
      • FB (feedback) messages, which indicate loss or reception of particular pieces of a media stream.

      Types of RTCP Feedback packet

      Minimal compound RTCP feedback packet

      This is to minimize the size of the RTCP packet transmitted to convey feedback and maximize the frequency at which feedback can be provided. MUST contain only the mandatory information :

      • encryption prefix if necessary,
      • exactly one RR or SR,
      • exactly one SDES with only the CNAME item present, and
      • FB message(s)

      Full compound RTCP feedback packet

      MAY contain any additional number of RTCP packets.

      RTCP operation modes

      1. Immediate Feedback mode
      2. Early RTCP mode
      3. Regular RTCP Mode

      The Application specific feedback threshold is a function of a number of parameters including (but not necessarily limited to):

      • type of feedback used (e.g., ACK vs. NACK),
      • bandwidth,
      • packet rate,
      • packet loss probability and distribution,
      • media type,
      • codec,
      • (worst case or observed) frequency of events to report (e.g., frame received, packet lost).

      Payload specific Feedback messages

      Three payload-specific FB messages are defined so far plus an application layer FB message. They are identified by means of the FMT parameter as follows:

      • 0: unassigned
      • 1: Picture Loss Indication (PLI)
      • 2: Slice Loss Indication (SLI)
      • 3: Reference Picture Selection Indication (RPSI)
      • 4-14: unassigned
      • 15: Application layer FB (AFB) message
      • 16-30: unassigned
      • 31: reserved for future expansion of the sequence number space
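
As an example of a payload-specific feedback message, a PLI is one of the smallest RTCP packets there is: the common header with FMT = 1 and packet type 206, followed by the SSRC of the packet sender and the SSRC of the media source, with no further feedback control information. The Python sketch below builds one; the function and constant names are my own.

```python
import struct

PSFB = 206      # RTCP packet type for payload-specific feedback
FMT_PLI = 1     # feedback message type for Picture Loss Indication


def build_pli(sender_ssrc, media_ssrc):
    """Build a 12-byte PLI: common header plus the two SSRC fields."""
    first = (2 << 6) | FMT_PLI   # version 2, no padding, FMT in low 5 bits
    length_words = 2             # packet length in 32-bit words, minus one
    return struct.pack("!BBHII", first, PSFB, length_words,
                       sender_ssrc, media_ssrc)


pli = build_pli(0x796DD0D6, 0x39A659B4)
```

The resulting 12 bytes start with 0x81 0xCE 0x00 0x02. In a real session the PLI would travel inside a compound RTCP packet as described above rather than on its own, unless reduced-size RTCP has been negotiated.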

      RTCP for multicast sessions with unicast feedback

      In single-source multicast sessions (e.g., IPTV, live streaming), receivers provide feedback via unicast RTCP. This feedback can quickly overwhelm the sender, as it receives as many reports as there are viewers, which can be very many in large-scale deployments such as webinars.

      To mitigate this feedback implosion, the sender aggregates reports in a Multicast Acquisition Report (MAR) instead of processing the feedback of each multicast receiver individually.

      a=rtcp-fb:* x-mar unicast   // Supports MAR over unicast
      a=rtcp-fb:* nack unicast // NACKs sent via unicast

      RTCP Extensions for Multiplexed Media Streams

      For multiplexed media streams, where different kinds of media share a common port, the payload type and SSRC are used to distinguish streams. RTCP can be multiplexed on the same port too. The offer/answer negotiation uses the following attribute in SDP:

      a=rtcp-mux

      Features | RTP and RTCP on different ports | RTP/RTCP mux
      Number of ports | 2 | 1
      NAT simplicity | complex, with per-stream overhead for ICE candidate gathering | relatively simple, only 1 pinhole

      References :

      • RFC 3611 RTP Control Protocol Extended Reports (RTCP XR)
      • RFC 4585 Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)
      • RFC 7002 RTP Control Protocol (RTCP) Extended Report (XR) Block for Discard Count Metric Reporting
      • RFC 7003 RTP Control Protocol (RTCP) Extended Report (XR) Block for Burst/Gap Discard Metric Reporting

      RealTime Transport protocol (RTP) and supporting protocols


      RTP is a protocol for delivering media streams end-to-end in real time over an IP network. Its applications include VoIP with SIP/XMPP, push-to-talk, WebRTC and teleconferencing, IoT media streaming, and the transport of audio, video or simulation data over multicast or unicast network services.

      RTSP provides stream control features to an RTP stream along with session management.

      RTCP is a companion protocol to RTP, used for feedback and inter-stream synchronization.

      • Receiver Reports (RRs) include information about the packet loss, interarrival jitter, and a timestamp allowing computation of the round-trip time between the sender and receiver.
      • Sender Reports (SRs) include the number of packets and bytes sent, and a pair of timestamps facilitating inter-stream synchronization.
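
      The interarrival jitter carried in Receiver Reports is a running estimate rather than a single measurement. The sketch below follows the estimator described in RFC 3550 (each new sample moves the estimate by 1/16 of the difference). The class name is hypothetical, and 32-bit timestamp wraparound is ignored to keep the example short.

```java
// Illustrative sketch only: JitterEstimator is a hypothetical name.
final class JitterEstimator {
    private boolean first = true;
    private long lastTransit;
    private double jitter; // in RTP timestamp units

    /**
     * Feeds one packet. Both arguments are expressed in RTP timestamp units:
     * arrival is the local arrival time, rtpTimestamp comes from the packet header.
     * Returns the updated jitter estimate.
     */
    double update(long arrival, long rtpTimestamp) {
        long transit = arrival - rtpTimestamp;
        if (first) {
            first = false;
            lastTransit = transit;
            return jitter;
        }
        long d = Math.abs(transit - lastTransit); // change in relative transit time
        lastTransit = transit;
        jitter += (d - jitter) / 16.0;            // smoothing gain of 1/16
        return jitter;
    }
}
```

      With 8000 Hz audio, a packet that arrives 80 timestamp units (10 ms) later than expected raises the estimate from 0 to 5 units; later on-time packets let it decay again.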

      SRTP provides security by encrypting and authenticating the media, while SDP provides session negotiation capabilities.

      In this article I will go over RTP and its associated protocols in depth to show the inner workings of an RTP media streaming session.

      RTP (Real-time Transport Protocol)

      RTP, defined in RFC 3550, handles real-time multimedia transport between end-to-end network components. Its header format is extensible, which simplifies application integration (encryption, padding) and the use of proxies, mixers, translators and similar intermediaries.

      [Figure: Packet structure of RTP]

      The RTP header contains the timestamp, an identifier of the media source, the codec (payload) type and the sequence number.

      RTP is independent of the underlying transport and network layers and can be described as an application-layer protocol that is typically used over IP networks. Although RTP was originally adapted from VAT (now obsolete), it was designed to be protocol independent, i.e. it can be used with non-IP protocols such as ATM and AAL5 as well as with IPv4 and IPv6. It does not address resource reservation and does not guarantee quality of service for real-time services. However, it does provide services such as payload type identification, sequence numbering, timestamping and delivery monitoring.

      RTP Packet via Wireshark
      RTP Packet Headers and position in packet

      The sequence numbers included in RTP allow the receiver to reconstruct the sender’s packet sequence.

      Usage: multimedia multi-participant conferences, storage of continuous data, interactive distributed simulation, active badge, and control and measurement applications.

      RTP Session

      Real-Time Transport Protocol
          [Stream setup by SDP (frame 554)]
              [Setup frame: 554]
              [Setup Method: SDP]
          10.. .... = Version: RFC 1889 Version (2)
          ..0. .... = Padding: False
          ...0 .... = Extension: False
          .... 0000 = Contributing source identifiers count: 0
          0... .... = Marker: False
          Payload type: ITU-T G.711 PCMU (0)
          Sequence number: 39644
          [Extended sequence number: 39644]
          Timestamp: 2256601824
          Synchronization Source identifier: 0x78006c62 (2013293666)
          Payload: 7efefefe7efefe7e7efefe7e7efefe7e7efefe7e7efefe7e...
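
      To make the dissection above more concrete, here is a minimal sketch of how the 12-byte fixed RTP header could be decoded by hand. The class and field names are hypothetical and do not come from any particular RTP library; CSRC entries and header extensions are not parsed.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: RtpHeader is a hypothetical class.
final class RtpHeader {
    final int version;
    final boolean padding;
    final boolean extension;
    final int csrcCount;
    final boolean marker;
    final int payloadType;
    final int sequenceNumber; // 16-bit, unsigned
    final long timestamp;     // 32-bit, unsigned
    final long ssrc;          // 32-bit, unsigned

    private RtpHeader(ByteBuffer buf) {
        int b0 = buf.get() & 0xFF;
        int b1 = buf.get() & 0xFF;
        version = b0 >> 6;
        padding = (b0 & 0x20) != 0;
        extension = (b0 & 0x10) != 0;
        csrcCount = b0 & 0x0F;
        marker = (b1 & 0x80) != 0;
        payloadType = b1 & 0x7F;
        sequenceNumber = buf.getShort() & 0xFFFF;
        timestamp = buf.getInt() & 0xFFFFFFFFL;
        ssrc = buf.getInt() & 0xFFFFFFFFL;
    }

    /** Parses the fixed header; ByteBuffer reads big-endian (network order) by default. */
    static RtpHeader parse(byte[] packet) {
        if (packet.length < 12) {
            throw new IllegalArgumentException("RTP fixed header needs at least 12 bytes");
        }
        return new RtpHeader(ByteBuffer.wrap(packet));
    }
}
```

      Fed with the header bytes of the packet captured above, the parser would report version 2, payload type 0 (PCMU), sequence number 39644, timestamp 2256601824 and SSRC 0x78006c62.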

      Ordering via Timestamp (TS) and Sequence Number (SN)

      • TS (Timestamp) is used to place packets in the correct timing order.
      • SN (Sequence Number) is used to detect packet loss.

      For a video frame that spans multiple packets, the TS is the same but the SN is different for each packet.
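
      Loss detection from sequence numbers has one subtlety: the field is only 16 bits wide and wraps around from 65535 to 0. The sketch below shows one simplified way to count gaps across the wrap. The class name is hypothetical, and a late packet that fills an earlier gap is simply ignored rather than subtracted from the loss count.

```java
// Illustrative sketch only: SeqGapDetector is a hypothetical name.
final class SeqGapDetector {
    private int lastSeq = -1; // -1 means nothing has been received yet
    private long lost = 0;

    /** Feeds one received sequence number; returns how many packets were skipped before it. */
    int onPacket(int seq) {
        if (lastSeq < 0) {
            lastSeq = seq;
            return 0;
        }
        int delta = (seq - lastSeq) & 0xFFFF; // forward distance modulo 2^16
        if (delta == 0 || delta >= 0x8000) {
            return 0; // duplicate, or a late/reordered packet: ignored in this sketch
        }
        lastSeq = seq;
        int gap = delta - 1;
        lost += gap;
        return gap;
    }

    long lostSoFar() {
        return lost;
    }
}
```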

      Payload

      RTP payload type is a 7-bit numeric identifier that identifies a payload format. 

      Audio

      • 0 PCMU
      • 1 reserved (previously FS-1016 CELP)
      • 2 reserved (previously G721 or G726-32)
      • 3 GSM
      • 4 G723
      • 8 PCMA
      • 9 G722
      • 12 QCELP
      • 13 CN
      • 14 MPA
      • 15 G728
      • 18 G729
      • 19 reserved (previously CN)

      Video

      • 25 CELB
      • 26 JPEG
      • 28 nv
      • 31 H261
      • 32 MPV
      • 33 MP2T
      • 34 H263
      • 72-76 reserved
      • 77–95 unassigned
      • dynamic H263-1998, H263-2000
      • dynamic (or profile) H264 AVC, H264 SVC, H265, theora, iLBC, PCMA-WB (G711 A-law), PCMU-WB (G711 μ-law), G718, G719, G7221, vorbis, opus, speex, VP8, VP9, raw, ac3, eac3

      Note on the difference between PCMA (G.711 A-law) and PCMU (G.711 μ-law): G.711 μ-law tends to give more resolution to higher-range signals, while G.711 A-law provides more quantization levels at lower signal levels.

      Dynamic Payloads

      Dynamic payload types in the RTP A/V profile, unlike the static ones above, are not assigned by IANA. They are assigned by means outside of the RTP profile or protocol specifications.

      Tones

      • dynamic tone
      • telephone event ( DTMF)

      These codes were initially specified in RFC 1890, “RTP Profile for Audio and Video Conferences with Minimal Control” (AVP profile), superseded by RFC 3551, and are registered as MIME types in RFC 3555. Registering static payload types is now considered a deprecated practice in favor of dynamic payload type negotiation.

      Session identifiers

      SSRC was designed for distinguishing several sources by labelling them differently. In an RTP session, each participant maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP.

          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |V=2|P|X|  CC   |M|     PT      |       sequence number         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |                           timestamp                           |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |           synchronization source (SSRC) identifier            |
         +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
         |            contributing source (CSRC) identifiers             |
         |                             ....                              |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      

      Synchronization source (SSRC) is a 32-bit numeric identifier for the source of a stream of RTP packets. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback. The binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

      Contributing source (CSRC) – A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the SSRC identifiers of the sources, called CSRC list, that contributed to the generation of a particular packet into the RTP header of that packet. An example application is – audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).

      Timestamp calculation

      The timestamp is picked independently of the other streams in the session. It is incremented by the packetization interval times the sampling rate. For example:

      • audio sampled at 8000 Hz and packetized in 20 ms blocks has block timestamps t1:160, t2:t1+160, and so on. The actual sampling rate may differ slightly from this nominal rate.
      • video with a 90 kHz clock rate and a frame rate of 30 f/s has block timestamps t1:3000, t2:t1+3000, and so on. For 25 fps, t1:3600, t2:t1+3600. In some software coders, timestamps can also be computed from the system clock, for example with gettimeofday().
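
      The arithmetic behind these increments is small enough to write down. The helper below is a hypothetical sketch, not part of any RTP library, and reproduces the numbers from the two examples.

```java
// Illustrative sketch only: RtpTimestampStep is a hypothetical helper.
final class RtpTimestampStep {
    /** Timestamp increment for audio packetized in fixed-duration blocks. */
    static long perPacket(int clockRateHz, int packetMillis) {
        return (long) clockRateHz * packetMillis / 1000;
    }

    /** Timestamp increment between video frames at a constant frame rate. */
    static long perFrame(int clockRateHz, int framesPerSecond) {
        return clockRateHz / framesPerSecond;
    }
}
```

      perPacket(8000, 20) gives 160, perFrame(90000, 30) gives 3000 and perFrame(90000, 25) gives 3600.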

      Cross media synchronization using timestamp: RTP timestamp and NTP timestamps form a pair that identify the absolute time of a particular sample in the stream.

      NTP timestamp in RTCP Sender Report SR
      Numerical timestamp in RTP packet in the same session
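
      A receiver can use that pair to place any later RTP timestamp of the same stream on the sender's wallclock, which is what makes lip sync between audio and video possible. The sketch below assumes the NTP time from the Sender Report has already been converted to seconds; the class and parameter names are hypothetical.

```java
// Illustrative sketch only: RtpClockMapper is a hypothetical name.
final class RtpClockMapper {
    private final long srRtpTimestamp;   // RTP timestamp from the last Sender Report
    private final double srWallclockSec; // NTP time from the same SR, converted to seconds
    private final int clockRateHz;       // RTP clock rate of this stream

    RtpClockMapper(long srRtpTimestamp, double srWallclockSec, int clockRateHz) {
        this.srRtpTimestamp = srRtpTimestamp;
        this.srWallclockSec = srWallclockSec;
        this.clockRateHz = clockRateHz;
    }

    /** Maps an RTP timestamp of this stream to sender wallclock time in seconds. */
    double toWallclock(long rtpTimestamp) {
        // Narrowing to int keeps the low 32 bits, so the signed difference
        // stays correct across a wrap of the 32-bit RTP timestamp.
        int diff = (int) (rtpTimestamp - srRtpTimestamp);
        return srWallclockSec + (double) diff / clockRateHz;
    }
}
```

      Two streams from the same sender are in sync when their samples map to the same wallclock value, even though their RTP timestamps use different clock rates and random offsets.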

      UDP provides best-effort delivery of datagrams for point-to-point as well as for multicast communications.

      Threading and Queues by RTP stacks

      Reception and transmission queues are handled by the RTP stack.

      Packet reception – The application does not read packets directly from sockets but gets them from a reception queue. The RTP stack is responsible for updating this queue.

      Packet transmission – Packets are not written directly to sockets but inserted into a transmission queue handled by the stack.

      The incoming packet queue takes care of functions such as packet reordering and filtering out duplicate packets.

      Threading model – Most libraries use a separate execution thread for each RTP session to handle the queues.
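
      A reception queue of this kind can be sketched in a few lines. The version below is only an assumption about how a stack might do it, not the code of any specific library: it keys packets by sequence number, drops duplicates, and hands packets to the application strictly in order. For brevity it ignores 16-bit wraparound and never skips over a lost packet, which a real stack would do after a timeout.

```java
import java.util.TreeMap;

// Illustrative sketch only: ReceptionQueue is a hypothetical class.
final class ReceptionQueue {
    private final TreeMap<Integer, byte[]> pending = new TreeMap<>();
    private int nextSeq = -1; // next sequence number to hand to the application

    /** Called by the stack's receive thread. Returns false for duplicates and stale packets. */
    synchronized boolean offer(int seq, byte[] payload) {
        if (nextSeq < 0) {
            nextSeq = seq; // the first packet defines where playback starts
        }
        if (seq < nextSeq || pending.containsKey(seq)) {
            return false;
        }
        pending.put(seq, payload);
        return true;
    }

    /** Called by the application. Returns the next in-order payload, or null if it has not arrived. */
    synchronized byte[] poll() {
        byte[] payload = pending.remove(nextSeq);
        if (payload != null) {
            nextSeq++;
        }
        return payload;
    }
}
```

      Both methods are synchronized because, in the threading model described above, offer() runs on the stack's session thread while poll() is called from the application.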

      RTSP (Real-Time Streaming Protocol)

      RTSP is a streaming session protocol that works together with RTP. It is also a network control protocol, which uses TCP to maintain an end-to-end connection. Session protocols are negotiation and session-establishment protocols that assist multimedia applications.

      Applications: controlling real-time streaming media applications such as live audio and HD video streaming.
      RTSP establishes a media session between RTSP endpoints (which can also be two RTSP media servers) and initiates RTP streams to deliver the audio and video payload from the RTSP media servers to the clients.

      Flow for RTSP stream between client and Server

      1. Initialize the RTP stack on Server and Client – This can be done by calling the constructor of the object and initializing the object with arguments

      At Server

      Server rtspserver = new Server();

      At client

      Client rtspclient = new Client();

      2. Initiate TCP connection with the client and server respectively (via socket ) for the RTSP session

      At Server

      ServerSocket listenSocket = new ServerSocket(RTSPport);
      rtspserver.RTSPsocket = listenSocket.accept();
      rtspserver.ClientIPAddr = rtspserver.RTSPsocket.getInetAddress();

      At Client

      rtspclient.RTSPsocket = new Socket(ServerIPAddr, RTSP_server_port);

      3. Set input and output stream filters

      RTSPBufferedReader = new BufferedReader(new InputStreamReader(rtspserver.RTSPsocket.getInputStream()));
      RTSPBufferedWriter = new BufferedWriter(new OutputStreamWriter(rtspserver.RTSPsocket.getOutputStream()));

      4. Parse and Reply to RTSP commands

      ReadLine from RTSPBufferedReader and parse tokens to get the RTSP request type

      request = rtspserver.parse_RTSP_request();

      On receiving each request send the appropriate response using RTSPBufferedWriter

      rtspserver.send_RTSP_response();

      The request can be any of DESCRIBE, SETUP, PLAY, PAUSE or TEARDOWN
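
      The body of parse_RTSP_request() is not shown in this article. As a rough idea of what such parsing involves, the sketch below takes the text of one request (as it would be assembled from lines read through RTSPBufferedReader), extracts the method and URI from the request line, and picks up the CSeq header. All names in it are hypothetical.

```java
// Illustrative sketch only: RtspRequestParser is a hypothetical class,
// not the parse_RTSP_request() implementation referred to above.
final class RtspRequestParser {
    enum Method { DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN }

    static final class Request {
        final Method method;
        final String uri;
        final int cseq; // -1 when no CSeq header was found

        Request(Method method, String uri, int cseq) {
            this.method = method;
            this.uri = uri;
            this.cseq = cseq;
        }
    }

    /** Parses "METHOD uri RTSP/1.0" followed by header lines up to the first empty line. */
    static Request parse(String text) {
        String[] lines = text.split("\r?\n");
        String[] tokens = lines[0].trim().split("\\s+");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + lines[0]);
        }
        Method method = Method.valueOf(tokens[0]); // IllegalArgumentException for other methods
        int cseq = -1;
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0 && lines[i].substring(0, colon).trim().equalsIgnoreCase("CSeq")) {
                cseq = Integer.parseInt(lines[i].substring(colon + 1).trim());
            }
        }
        return new Request(method, tokens[1], cseq);
    }
}
```

      The CSeq value matters because the response sent back through RTSPBufferedWriter has to echo it.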

      5. TEARDOWN RTSP Command

      Either call the destructor, which releases the resources and ends the session, or send the BYE explicitly and close the sockets

      rtspserver.RTSPsocket.close();
      rtspserver.RTPsocket.close();

      RTP processing

      1. At Transmitter ( Server) – packetization of the video data into RTP packets.

      This involves creating the packet, setting the fields in the packet header, and copying the payload (i.e., one video frame) into the packet.

      2. Get the next frame to send from the video and build the RTP packet

      RTPpacket rtp_packet = new RTPpacket(MJPEG_TYPE, imagenb, imagenb * FRAME_PERIOD, buf, video.getnextframe(buf));

      The RTP header is formed from the parameters above: PType, SequenceNumber and TimeStamp. The buffer byte[] data and the data_length of the next frame in the buffer go into the packet payload.
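
      The RTPpacket class itself is not listed here. The sketch below shows one way such a constructor could lay out the bytes, under the assumptions that there is no padding, no extension, no CSRC list and no marker bit. The class name, the method and the extra ssrc parameter are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: RtpPacketBuilder is hypothetical and is not the
// RTPpacket class used in the snippets of this article.
final class RtpPacketBuilder {
    static final int HEADER_SIZE = 12;

    /** Serializes a 12-byte fixed header followed by dataLength bytes of payload. */
    static byte[] build(int payloadType, int sequenceNumber, long timestamp,
                        long ssrc, byte[] data, int dataLength) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + dataLength); // big-endian by default
        buf.put((byte) 0x80);                  // V=2, P=0, X=0, CC=0
        buf.put((byte) (payloadType & 0x7F));  // M=0 plus the 7-bit payload type
        buf.putShort((short) sequenceNumber);  // low 16 bits
        buf.putInt((int) timestamp);           // low 32 bits
        buf.putInt((int) ssrc);                // low 32 bits
        buf.put(data, 0, dataLength);
        return buf.array();
    }
}
```

      The resulting byte array is what gets wrapped in a DatagramPacket and sent over the UDP socket.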

      3. At Transmitter – Retrieve the packet bitstream, store it in an array of bytes and send it as a Datagram packet over the UDP socket

      senddp = new DatagramPacket(packet_bits, packet_length, ClientIPAddr, RTP_dest_port);
      RTPsocket.send(senddp);

      4. At Receiver – Construct a new DatagramSocket to receive RTP packets on the client’s RTP port

      rcvdp = new DatagramPacket(buf, buf.length);
      RTPsocket.receive(rcvdp);

      5. At Receiver – RTP packet header and payload retrieval

      RTPpacket rtp_packet = new RTPpacket(rcvdp.getData(), rcvdp.getLength());
      rtp_packet.getsequencenumber(); 
      rtp_packet.getpayload(payload); // payload is bitstreams

      6. Decode the payload as image/ video frame / audio segment and send for consumption by player or file or socket etc.

      SRTP (Secure Real-time Transport Protocol)

      Neither RTP nor RTCP provides any flow encryption or authentication, which is where SRTP comes into the picture. SRTP is the security layer which resides between the RTP/RTCP application layer and the transport layer. It provides confidentiality, message authentication, and replay protection for both unicast and multicast RTP and RTCP streams.

      SRTP Packet

      The cryptographic context includes:

      • session key, used directly in encryption and message authentication
      • master key, a securely exchanged random bit string used to derive the session keys
      • other working session parameters (master key lifetime, master key identifier and length, FEC parameters, etc.)

      The context must be maintained by both the sender and the receiver of these streams.

      “Salting keys” are used to protect against pre-computation and time-memory tradeoff attacks.

      To learn more about SRTP specifically visit : https://telecom.altanai.com/2018/03/16/secure-communication-with-rtp-srtp-zrtp-and-dtls/

      RTP in a VoIP Communication system and Conference streaming

      Simulcast

      The client encodes the same audio/video stream twice, at different resolutions and bitrates, and sends both encodings to a router, which then decides which receiver gets which of the streams.

      Multicast Audio Conference

      Assume that a multicast group address and a pair of ports have been obtained. One port is used for audio data, and the other is used for control (RTCP) packets. The audio conferencing application used by each conference participant sends audio data in small chunks of, say, 20 ms duration. Each chunk of audio data is preceded by an RTP header; the RTP header and data are in turn contained in a UDP packet.

      The RTP header indicates what type of audio encoding (such as PCM, ADPCM or LPC) is contained in each packet so that senders can change the encoding during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or react to indications of network congestion.

      Packet networks occasionally lose and reorder packets and delay them by variable amounts of time. The RTP header therefore contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source. The sequence number can also be used by the receiver to estimate how many packets are being lost.

      For QoS, each instance of the audio application in the conference periodically multicasts a reception report plus the name of its user on the RTCP(control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings. In addition to the user name, other identifying information may also be included subject to control bandwidth limits.

      A site sends the RTCP BYE packet when it leaves the conference.

      Audio and Video Conference

      Audio and video media are transmitted as separate RTP sessions: separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same distinguished (canonical) name in the RTCP packets for both so that the sessions can be associated.

      Synchronized playback of a source’s audio and video is achieved using timing information carried in the RTCP packets.

      Layered Encodings

      When heterogeneous receivers have conflicting bandwidth requirements, multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion.
      Rate adaptation can be achieved by combining a layered encoding with a layered transmission system.

      In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Receivers can then adapt to network heterogeneity and control their reception bandwidth by joining only the appropriate subset of the multicast groups.
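
      How a receiver picks "the appropriate subset" is left open here. One simple policy, given as an assumption rather than as the method of any standard, is to join layers from the base layer upwards for as long as their cumulative bitrate fits the available bandwidth. The names below are hypothetical.

```java
// Illustrative sketch only: LayerSelector and its policy are assumptions.
final class LayerSelector {
    /**
     * layerKbps[i] is the extra bandwidth layer i adds on top of layers 0..i-1.
     * Returns how many layers (multicast groups), counted from the base layer,
     * fit into availableKbps.
     */
    static int layersToJoin(int[] layerKbps, int availableKbps) {
        int used = 0;
        int count = 0;
        for (int kbps : layerKbps) {
            if (used + kbps > availableKbps) {
                break; // higher layers depend on this one, so stop at the first that does not fit
            }
            used += kbps;
            count++;
        }
        return count;
    }
}
```

      With layers of 64, 128 and 256 kbit/s, a receiver with 200 kbit/s available would join the first two groups and leave the third.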

      Mixers, Translators and Monitors

      Note that in a VoIP system where SIP is the signalling protocol, a SIP signalling proxy never participates in the media flow and is therefore media agnostic.

      Mixer

      An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet.

      Example: a mixer for converting a high-speed packet stream to a low-speed one. In a conference where a few participants are connected through a low-speed link while the others have high-speed links, instead of forcing a lower-bandwidth, reduced-quality audio encoding on everyone, an RTP-level relay called a mixer may be placed near the low-bandwidth area.

      This mixer resynchronises incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream across the low-speed links.

      All data packets originating from a mixer will be identified as having the mixer as their synchronization source.

      RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers.

      Translator

      An intermediate system that forwards RTP packets with their synchronization source identifier intact.

      Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

      Translator for Firewall Limiting IP packet pass

      Some of the intended participants in the audio conference may be connected with high bandwidth links but might not be directly reachable via IP multicast, for reasons such as being behind an application-level firewall that will not let any IP packets pass. For these sites, mixing may not be necessary, in which case another type of RTP-level relay called a translator may be used.

      Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through a secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site’s internal network.

      Other cases :

      Video mixers can scale the images of individual people in separate video streams and composite them into one video stream to simulate a group scene.

      Translators can be used to connect a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II, or for packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.

      Monitor

      An application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis and long-term statistics.

      Multiplexing RTP Sessions

      In RTP, multiplexing is provided by the destination transport address (network address and port number), which is different for each RTP session (separate for audio and video). This helps in cases where there is a change of encoding or clock rate, and with detecting packet loss and with RTCP reporting.
      Moreover, an RTP mixer would not be able to combine interleaved streams of incompatible media into one stream.

      Interleaving packets with different RTP media types but using the same SSRC would introduce several problems. But multiplexing multiple related sources of the same medium in one RTP session using different SSRC values is the norm for multicast sessions.

      REMB ( Receiver Estimated Maximum Bitrate)

      REMB is an RTCP message used to provide bandwidth estimation in order to avoid creating congestion in the network. Support for this message is negotiated in the offer/answer SDP exchange. It contains the total estimated available bitrate on the path to the receiving side of this RTP session (in mantissa + exponent format). REMB is used by

      • the sender, to configure the maximum bitrate of the video encoding.
      • media servers, to notify the available bandwidth in the network and to limit the bitrate the sender is allowed to send.

      In Chrome it is deprecated in favor of the new sender side bandwidth estimation based on RTCP Transport Feedback messages.
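
      The mantissa + exponent format of REMB means the bitrate is carried as mantissa × 2^exp. The sketch below assumes the field widths of the REMB draft (6-bit exponent, 18-bit mantissa); the class and method names are hypothetical.

```java
// Illustrative sketch only: RembBitrate is a hypothetical helper.
final class RembBitrate {
    private static final long MAX_MANTISSA = 0x3FFFF; // 18 bits (assumed field width)

    /** bitrate in bits per second = mantissa * 2^exp */
    static long decode(int exp, int mantissa) {
        return ((long) mantissa) << exp;
    }

    /** Returns {exp, mantissa}; low-order bits that do not fit into 18 bits are dropped. */
    static int[] encode(long bitrateBps) {
        int exp = 0;
        long mantissa = bitrateBps;
        while (mantissa > MAX_MANTISSA) {
            mantissa >>= 1;
            exp++;
        }
        return new int[] { exp, (int) mantissa };
    }
}
```

      An estimate of 1,000,000 bit/s, for instance, is encoded as mantissa 250000 with exponent 2. The encoding is lossy for values that need more than 18 significant bits, which is acceptable for an estimate.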

      Session Description Protocol (SDP) Capability Negotiation

      SDP Offer/Answer flow

      RTP can carry multiple formats. For each class of application (e.g., audio, video), RTP defines a profile and associated payload formats. The Session Description Protocol is used to specify the parameters for the sessions.

      Usually in VoIP systems, SDP bodies describing a session with codecs, open ports, media formats etc. are embedded in a SIP request such as INVITE.

      SDP can negotiate use of one out of several possible transport protocols. The offerer uses the expected least-common-denominator (plain RTP) as the actual configuration, and the alternative transport protocols as the potential configurations.

      m=audio 53456 RTP/AVP 0 18
      a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF
      

      plain RTP (RTP/AVP)
      Secure RTP (RTP/SAVP)
      RTP with RTCP-based feedback (RTP/AVPF)
      Secure RTP with RTCP-based feedback (RTP/SAVPF)
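
      The profile an endpoint is offering can be read straight from the m= line. The following sketch splits such a line into its parts and derives whether the profile is a secure one and whether it includes RTCP feedback. The class is hypothetical and handles only the simple "port" form, not "port/count".

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: SdpMediaLine is a hypothetical class.
final class SdpMediaLine {
    final String media;         // e.g. "audio"
    final int port;
    final String profile;       // e.g. "RTP/AVP", "RTP/SAVPF"
    final List<String> formats; // payload type numbers as strings

    private SdpMediaLine(String media, int port, String profile, List<String> formats) {
        this.media = media;
        this.port = port;
        this.profile = profile;
        this.formats = formats;
    }

    /** Parses a line of the form "m=<media> <port> <proto> <fmt> ...". */
    static SdpMediaLine parse(String line) {
        if (!line.startsWith("m=")) {
            throw new IllegalArgumentException("not an m= line: " + line);
        }
        String[] t = line.substring(2).trim().split("\\s+");
        if (t.length < 4) {
            throw new IllegalArgumentException("incomplete m= line: " + line);
        }
        return new SdpMediaLine(t[0], Integer.parseInt(t[1]), t[2],
                Arrays.asList(t).subList(3, t.length));
    }

    boolean isSecure() {
        return profile.contains("SAVP"); // matches RTP/SAVP and RTP/SAVPF
    }

    boolean hasFeedback() {
        return profile.endsWith("AVPF"); // matches RTP/AVPF and RTP/SAVPF
    }
}
```

      For the m= line shown above, this yields media "audio", port 53456, profile RTP/AVP and formats 0 and 18, with both flags false.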

      Adaptive bitrate control

      Adaptive bitrate control adapts the audio and video codec bitrates to the available bandwidth, and hence optimizes audio and video quality. For video, since the resolution is chosen only at the start, the encoder uses only the bitrate and frame-rate attributes to adapt at runtime.

      To request a lower rate, an RTCP packet called TMMBR (Temporary Maximum Media Stream Bit Rate Request) is sent to the remote client.


      Secure Communication with SRTP and key management protocols like SDES, ZRTP and DTLS


      With the advent of Voice over IP, protecting real-time streams of data, audio and video from eavesdropping or modification over the open internet became critically important. Secure Real-time Transport Protocol (SRTP) is a profile of the Real-time Transport Protocol (RTP) that can provide confidentiality, message authentication and replay protection to the RTP traffic and to its control traffic, the Real-time Transport Control Protocol (RTCP). ZRTP is a protocol that negotiates the keys and other information required to set up an SRTP audio and video session. To read about the Real-time Transport Protocol (RTP), the RTP Control Protocol (RTCP) and its feedback before reading about adding security to them, use the article link below.

      SRTP (Secure Real-time Transport Protocol)

      SRTP provides a framework for encryption and message authentication of RTP and RTCP streams by negotiating keys.

      It is not a transport but a profile of the Real-time Transport Protocol (RTP) for securing RTP streams in addition to providing confidentiality, integrity protection, source authentication, and replay protection.

      The SRTP specification also defines how to set up and maintain a cryptographic context. This context holds all the data necessary to perform the security operations, for example the SRTP encryption keys, the packet sequence counters, authentication keys, and so on. Each SRTP session, which is the same as an RTP session, has its own context. Thus a bidirectional SRTP communication requires two different SRTP cryptographic contexts.

      Features of SRTP

      It is a framework for encryption and message authentication of RTP and RTCP streams.
      Offers confidentiality and integrity of the entire RTP and RTCP packets, together with protection against replayed packets.
      – secure for unicast and multicast RTP applications
      – low computational cost and small footprint
      – high throughput and low packet expansion to support bandwidth economy.
      – permits upgrading with new cryptographic transforms,
      – protection for heterogeneous environments (mix of wired and wireless networks)

      It is independent of the underlying transport, network and physical layers used by RTP, and in particular has a high tolerance to packet loss and re-ordering.

      Normal RTP Packet
      SecureRTP Packet

      SRTCP (Secure RTCP)

      The Secure RTCP (SRTCP) packet format is similar to that of SRTP: it carries the authentication tag and MKI fields, plus two additional fields:

      • SRTCP index
      • Encrypt-flag

      Key management protocols for SRTP

      Since SRTP does not contain an integrated key management solution, one can employ any of the following key management protocols

      SDES (Session Description Protocol Security Descriptions) – SRTP Key management

      It is a way to negotiate the key/cryptographic parameters for SRTP.
      Keys are transported in the SDP attachment of a SIP message using TLS transport layer (SSLv3/TLSv1) or other methods like S/MIME.

      The media attribute defined by SDES is “crypto”:
      a=crypto:<tag> <crypto-suite> inline:<key||salt> [session-parms]

      SDES packet

      3 commonly used crypto suites are :

      1. AES_CM_128_HMAC_SHA1_80
      2. AES_CM_128_HMAC_SHA1_32
      3. F8_128_HMAC_SHA1_32
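
      To show what the crypto attribute carries, the sketch below parses one such line and splits the base64 "inline" value into master key and master salt. It assumes the key lengths that RFC 4568 gives for the three suites above (a 16-byte master key followed by a 14-byte master salt) and skips the optional lifetime and MKI parts. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Illustrative sketch only: SdesCrypto is a hypothetical class.
final class SdesCrypto {
    final int tag;
    final String suite;
    final byte[] masterKey;  // 16 bytes
    final byte[] masterSalt; // 14 bytes

    private SdesCrypto(int tag, String suite, byte[] masterKey, byte[] masterSalt) {
        this.tag = tag;
        this.suite = suite;
        this.masterKey = masterKey;
        this.masterSalt = masterSalt;
    }

    /** Parses "a=crypto:<tag> <suite> inline:<base64 key||salt>[|lifetime][|MKI:length]". */
    static SdesCrypto parse(String line) {
        String prefix = "a=crypto:";
        if (!line.startsWith(prefix)) {
            throw new IllegalArgumentException("not a crypto attribute: " + line);
        }
        String[] t = line.substring(prefix.length()).trim().split("\\s+");
        if (t.length < 3 || !t[2].startsWith("inline:")) {
            throw new IllegalArgumentException("missing inline key parameter");
        }
        String keyInfo = t[2].substring("inline:".length());
        int bar = keyInfo.indexOf('|'); // optional lifetime / MKI suffix, ignored here
        if (bar >= 0) {
            keyInfo = keyInfo.substring(0, bar);
        }
        byte[] keySalt = Base64.getDecoder().decode(keyInfo);
        if (keySalt.length != 30) {
            throw new IllegalArgumentException("expected 16-byte key + 14-byte salt");
        }
        return new SdesCrypto(Integer.parseInt(t[0]), t[1],
                Arrays.copyOfRange(keySalt, 0, 16), Arrays.copyOfRange(keySalt, 16, 30));
    }
}
```

      Because the key travels in clear text inside the SDP, this scheme is only as secure as the signalling channel that carries it, which is why SDES relies on TLS or S/MIME.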

      DTLS – SRTP Key management

      DTLS keying happens on the media path, independent of any out-of-band signalling channel present.
      DTLS differs from TLS in that it is UDP oriented, i.e. it runs over an unreliable, packet-oriented transport. These characteristics make it a good candidate for lower-latency, fast-handshake use cases such as WebRTC.
      DTLS supports replay protection, and its handshake flow includes:

      Client                     Server
      |---- ClientHello -------->|
      |<-- HelloVerifyRequest ---|
      |-- ClientHello + Cookie ->|
      |<------ ServerHello ------|
      |<----- Certificate -------|
      |<-- ServerHelloDone ------|
      |---- ClientKeyExchange -->|
      |---- Finished ----------->|
      |<----- Finished ----------|
      |   Secure Communication   |

      • ClientHello, with protocol version, random number, session ID and supported cipher suites. Additionally, a cookie exchange and sequence numbers protect against DoS attacks.
      • HelloVerifyRequest, sent by the server to verify the client's identity. To pass this check, the client needs to return the cookie to confirm its origin.
      • ServerHello, confirming the selected cipher suite and protocol version, together with a salt (random number).
      • The following further steps are optionally performed by the server:
        • Server may send a Certificate message to authenticate itself if required by the selected cipher suite.
        • Server sends key exchange parameters if they need to be transmitted, such as for Diffie-Hellman.
        • Server may request a client certificate for mutual authentication.
      • ServerHelloDone
      • Client Certificate
      • ClientKeyExchange
      • Finished, the handshake is completed

      Jitsi Client SRTP configuration

      An offer can include any of –

      • plain RTP (RTP/AVP),
      • RTP with RTCP-based feedback (RTP/AVPF),
      • Secure RTP (RTP/SAVP), or
      • Secure RTP with RTCP-based feedback (RTP/SAVPF)

      SDP for RTP/AVP

      v=0
      o=987654321-jitsi.org 0 0 IN IP4 x.x.x.x
      s=-
      c=IN IP4 x.x.x.x
      t=0 0
      m=audio 24380 RTP/AVP 9
      a=rtcp-xr:voip-metrics
      a=rtpmap:9 G722/8000
      a=sendrecv
      m=audio 24400 RTP/AVP 9
      a=rtcp-xr:voip-metrics
      a=rtpmap:9 G722/8000
      a=sendrecv

      or

      v=0
      o=987654321-jitsi.org 0 0 IN IP4 x.x.x.x
      s=-
      c=IN IP4 x.x.x.x
      t=0 0
      m=audio 5018 UDP/TLS/RTP/SAVP 9
      a=rtpmap:9 G722/8000
      a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level
      a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level
      a=rtcp-xr:voip-metrics
      a=setup:actpass
      a=fingerprint:sha-1 B9:0F:89:EE:BD:1F:B1:C4:86:B6:D7:5C:25:88:53:F4:02:F4:F5:91
      m=audio 5018 RTP/SAVPF 9
      a=rtpmap:9 G722/8000
      a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level
      a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level
      a=rtcp-xr:voip-metrics
      a=setup:actpass
      a=fingerprint:sha-1 B9:0F:89:EE:BD:1F:B1:C4:86:B6:D7:5C:25:88:53:F4:02:F4:F5:91

      The m line indicates which mode of RTP and RTCP is being offered.

      The following is a case where the offerer/caller wants to establish a Secure RTP audio stream on top of plain RTP, with DTLS-SRTP as the key management protocol.

      type: offer, sdp: 
      v=0
      o=- 2977074634695769063 2 IN IP4 127.0.0.1
      s=-
      t=0 0
      a=group:BUNDLE 0 1 2
      a=msid-semantic: WMS i2CKXQdort5QF76tyO5SUKyyyyPfMYR4kjZO
      m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 110 112 113 126
      c=IN IP4 0.0.0.0
      a=rtcp:9 IN IP4 0.0.0.0
      a=ice-ufrag:w5/T
      a=ice-pwd:zuPM49QcEX3cKRQiKylJU4Y6
      a=ice-options:trickle
      a=fingerprint:sha-256 5A:70:05:55:C1:5A:82:51:02:D3:00:A3:BF:E7:EF:62:DF:29:EB:F2:9F:5F:51:58:12:D9:4C:AA:41:36:86:13
      a=setup:actpass
      a=mid:0
      a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
      a=extmap:9 urn:ietf:params:rtp-hdrext:sdes:mid
      a=sendrecv
      a=msid:i2CKXQdort5QF76tyO5SUKyyyyPfMYR4kjZO 5ffdb0f9-48b1-43bc-9f63-ea032643aeba
      a=rtcp-mux
      a=rtpmap:111 opus/48000/2
      a=rtcp-fb:111 transport-cc
      a=fmtp:111 minptime=10;useinbandfec=1
      a=rtpmap:103 ISAC/16000
      a=rtpmap:104 ISAC/32000
      a=rtpmap:9 G722/8000
      a=rtpmap:0 PCMU/8000
      a=rtpmap:8 PCMA/8000
      a=rtpmap:110 telephone-event/48000
      a=rtpmap:112 telephone-event/32000
      a=rtpmap:113 telephone-event/16000
      a=rtpmap:126 telephone-event/8000
      a=ssrc:2215726670 cname:e6egqLfRbLu6vH45
      a=ssrc:2215726670 msid:i2CKXQdort5QF76tyO5SUKyyyyPfMYR4kjZO 5ffdb0f9-48b1-43bc-9f63-ea032643aeba
      a=ssrc:2215726670 mslabel:i2CKXQdort5QF76tyO5SUKyyyyPfMYR4kjZO
      a=ssrc:2215726670 label:5ffdb0f9-48b1-43bc-9f63-ea032643aeba
      m=application 9 DTLS/SCTP 5000
      c=IN IP4 0.0.0.0
      a=ice-ufrag:w5/T
      a=ice-pwd:zuPM49QcEX3cKRQiKylJU4Y6
      a=ice-options:trickle
      a=fingerprint:sha-256 5A:70:05:55:C1:5A:82:51:02:D3:00:A3:BF:E7:EF:62:DF:29:EB:F2:9F:5F:51:58:12:D9:4C:AA:41:36:86:13
      a=setup:actpass
      a=mid:2
      a=sctpmap:5000 webrtc-datachannel 1024

      SRTP on Kamailio

      For secure communication Kamailio supports: digest SIP user authentication, authorization via ACL or group membership, IP and network authentication, TLS support for SIP signaling, transparent handling of SRTP for secure audio, TLS domain name extension support, authentication and authorization against a database (MySQL, PostgreSQL, UnixODBC, BerkeleyDB, Oracle, text files), RADIUS and DIAMETER.

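      FreeSWITCH dialplan condition that sets the flag rtp_secure_media to true when SRTP is active, that is, when rtp_has_crypto matches one of the listed crypto suites. The intent is to apply it to calls where both TLS and SRTP are active; the condition as shown tests only the crypto suite.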

      <condition field="${rtp_has_crypto}" expression="^(AES_CM_128_HMAC_SHA1_32|AES_CM_128_HMAC_SHA1_80)$" break="never">	
          <action application="set" data="rtp_secure_media=true"/>
      </condition>

      INVITE from a Jitsi client offering alternative audio media descriptions in one SDP: UDP/TLS/RTP/SAVP, RTP/SAVPF, RTP/SAVP and RTP/AVP. Whichever one the other endpoint accepts is communicated back in the SDP of the 200 OK.

      INVITE sip:99999999999@x.x.x.x:5080 SIP/2.0
         Call-ID: 2a34d1e981602c82c345513f3f2f89ed@0:0:0:0:0:0:0:0
         CSeq: 1 INVITE
         From: "altanai" ;tag=bed49270
         To: 
         Via: SIP/2.0/UDP y.y.y.y:5060;branch=z9hG4bK-3130-9657d2ae9b662779bc08cdd32881828f
         Max-Forwards: 70
         Contact: "altanai" 
         User-Agent: Jitsi2.10.5550Mac OS X
         Content-Type: application/sdp
         Content-Length: 2336
         v=0
         o=7777777777-jitsi.org 0 0 IN IP4 y.y.y.y
         s=-
         c=IN IP4 y.y.y.y
         t=0 0
         m=audio 5016 UDP/TLS/RTP/SAVP 9
         a=rtpmap:9 G722/8000
         a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level
         a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level
         a=rtcp-xr:voip-metrics
         a=setup:actpass
         a=fingerprint:sha-1 55:CF:25:5D:D5:65:71:C8:D9:FF:97:AD:CC:F2:08:DB:38:DD:81:38
         m=audio 5016 RTP/SAVPF 9
         a=rtpmap:9 G722/8000
         a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level
         a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level
         a=rtcp-xr:voip-metrics
         a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:Ekb2qAA8F7VCmz0FMSrad0rIt8duHQFedu/KxMbD
         a=crypto:2 AES_CM_128_HMAC_SHA1_32 inline:rEeGiaLCUbFw0sS0FxARgX9i5pwEj/frxxbgGkch
         a=crypto:3 AES_192_CM_HMAC_SHA1_80 inline:up9VO2T/rfu8V0cecA4RuG0aWgSaCC5gD/p/RdY1odg1p/0Pto0=
         a=crypto:4 AES_192_CM_HMAC_SHA1_32 inline:6yLDM31gAuwrlL0qkH72QYJLwtzX1IX+Z+7UML3VA5CpIbUWeAw=
         a=crypto:5 AES_256_CM_HMAC_SHA1_80 inline:2Q3b3UpPJMosXTrm/0Ui5q3Mw8tQ6ig5Xq0jt4Ibj0t5hVQx5KBRbC+8sMJDMg==
         a=crypto:6 AES_256_CM_HMAC_SHA1_32 inline:yVs8C3xPFY2LAUXIH+dlgBBNSz+jm1cbAQlAgv8hPKGe1zfu2wzx1d465UfFzQ==
         a=crypto:7 F8_128_HMAC_SHA1_80 inline:bhIPhj1TryAB63p/g8B3gL5NXJJ7V4kbjXqYaU54
         a=setup:actpass
         a=fingerprint:sha-1 55:CF:25:5D:D5:65:71:C8:D9:FF:97:AD:CC:F2:08:DB:38:DD:81:38
         m=audio 5016 RTP/SAVP 9
         a=rtpmap:9 G722/8000
         a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level
         a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level
         a=rtcp-xr:voip-metrics
         a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:Ekb2qAA8F7VCmz0FMSrad0rIt8duHQFedu/KxMbD
         a=crypto:2 AES_CM_128_HMAC_SHA1_32 inline:rEeGiaLCUbFw0sS0FxARgX9i5pwEj/frxxbgGkch
         a=crypto:3 AES_192_CM_HMAC_SHA1_80 inline:up9VO2T/rfu8V0cecA4RuG0aWgSaCC5gD/p/RdY1odg1p/0Pto0=
         a=crypto:4 AES_192_CM_HMAC_SHA1_32 inline:6yLDM31gAuwrlL0qkH72QYJLwtzX1IX+Z+7UML3VA5CpIbUWeAw=
         a=crypto:5 AES_256_CM_HMAC_SHA1_80 inline:2Q3b3UpPJMosXTrm/0Ui5q3Mw8tQ6ig5Xq0jt4Ibj0t5hVQx5KBRbC+8sMJDMg==
         a=crypto:6 AES_256_CM_HMAC_SHA1_32 inline:yVs8C3xPFY2LAUXIH+dlgBBNSz+jm1cbAQlAgv8hPKGe1zfu2wzx1d465UfFzQ==
         a=crypto:7 F8_128_HMAC_SHA1_80 inline:bhIPhj1TryAB63p/g8B3gL5NXJJ7V4kbjXqYaU54
         m=audio 5016 RTP/AVP 9
         a=rtpmap:9 G722/8000
         a=extmap:1 urn:ietf:params:rtp-hdrext:csrc-audio-level
         a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level
         a=rtcp-xr:voip-metrics

      In secure mode, Kamailio selects the SRTP block of the audio SDP and responds with it in the 200 OK.
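The selection among alternative m= lines can be sketched in Python. This is only an illustration of how an answerer might pick the first acceptable secure audio description from an offer like the one above. It is not Kamailio's actual logic, and the names `media_profiles`, `classify_profile` and `pick_secure_profile` are hypothetical helpers introduced here.

```python
def media_profiles(sdp):
    """Return the transport profile (third field) of every m= line, in offer order."""
    profiles = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line.split()
            if len(fields) >= 3:
                profiles.append(fields[2])
    return profiles


def classify_profile(profile):
    """Map an SDP transport profile to a coarse security class."""
    if profile.startswith("UDP/TLS/RTP/SAVP"):
        return "dtls-srtp"   # SRTP keyed through a DTLS handshake
    if profile in ("RTP/SAVP", "RTP/SAVPF"):
        return "srtp"        # SRTP, keys typically carried in a=crypto lines
    if profile in ("RTP/AVP", "RTP/AVPF"):
        return "rtp"         # plain, unencrypted RTP
    return "other"


def pick_secure_profile(sdp, accepted=("dtls-srtp", "srtp")):
    """Return the first offered profile whose class is in `accepted`, else None."""
    for profile in media_profiles(sdp):
        if classify_profile(profile) in accepted:
            return profile
    return None


# The four alternatives from the Jitsi INVITE, reduced to their m= lines.
OFFER = """\
m=audio 5016 UDP/TLS/RTP/SAVP 9
m=audio 5016 RTP/SAVPF 9
m=audio 5016 RTP/SAVP 9
m=audio 5016 RTP/AVP 9
"""
```

Calling `pick_secure_profile(OFFER, accepted=("srtp",))` restricts the choice to the SDES-style SRTP blocks, which is closer to the behaviour described for the secure mode.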

      RTP to SRTP Bridging in FreeSWITCH

      ZRTP can be enabled globally and overridden on a per-channel basis; see http://wiki.freeswitch.org/wiki/ZRTP for how to enable ZRTP.

      When using SRTP it is critical not to offer or accept variable bit rate codecs; doing so would leak information and possibly compromise the SRTP stream (FS-6404).

      Supported SRTP Crypto Suites:

      AEAD_AES_256_GCM_8

      This algorithm is identical to AEAD_AES_256_GCM (see Section 5.2 of [RFC5116]), except that the tag length, t, is 8, and an authentication tag with a length of 8 octets (64 bits) is used. An AEAD_AES_256_GCM_8 ciphertext is exactly 8 octets longer than its corresponding plaintext.

      AEAD_AES_128_GCM_8

      This algorithm is identical to AEAD_AES_128_GCM (see Section 5.1 of [RFC5116]), except that the tag length, t, is 8, and an authentication tag with a length of 8 octets (64 bits) is used. An AEAD_AES_128_GCM_8 ciphertext is exactly 8 octets longer than its corresponding plaintext.

      AES_CM_256_HMAC_SHA1_80 | AES_CM_192_HMAC_SHA1_80 | AES_CM_128_HMAC_SHA1_80

      AES_CM_128_HMAC_SHA1_80 is the SRTP default: AES Counter Mode encryption with HMAC-SHA1 message authentication and an 80-bit authentication tag. The master key length is 128 bits, and the default lifetime is a maximum of 2^48 SRTP packets or 2^31 SRTCP packets, whichever comes first.

      AES_CM_256_HMAC_SHA1_32 | AES_CM_192_HMAC_SHA1_32 | AES_CM_128_HMAC_SHA1_32

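      AES_CM_128_HMAC_SHA1_32 is identical to AES_CM_128_HMAC_SHA1_80 except that the authentication tag is 32 bits. For the 128-bit suites the length of the base64-decoded key and salt value MUST be 30 octets, i.e., 240 bits (16-octet key plus 14-octet salt); otherwise, the crypto attribute is considered invalid. The 192-bit and 256-bit variants use a longer key with the same 14-octet salt, giving 38 and 46 octets respectively.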
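The key length rule can be checked mechanically. The following is a minimal sketch, assuming a crypto attribute of the simple form `a=crypto:<tag> <suite> inline:<key>` as seen in the SDP examples in this article; `parse_crypto_attribute` and `crypto_attribute_is_valid` are hypothetical helper names, and the table covers only the AES counter mode suites (both the RFC 6188 spelling `AES_192_CM_...` and the `AES_CM_192_...` spelling that appears in the FreeSWITCH logs).

```python
import base64

# Expected length in octets of master key + master salt for each suite.
# 128-bit suites: 16 + 14 = 30; 192-bit: 24 + 14 = 38; 256-bit: 32 + 14 = 46.
EXPECTED_KEY_SALT_OCTETS = {
    "AES_CM_128_HMAC_SHA1_80": 30,
    "AES_CM_128_HMAC_SHA1_32": 30,
    "AES_192_CM_HMAC_SHA1_80": 38,
    "AES_192_CM_HMAC_SHA1_32": 38,
    "AES_256_CM_HMAC_SHA1_80": 46,
    "AES_256_CM_HMAC_SHA1_32": 46,
    # Alternative spelling used in the FreeSWITCH logs of this article.
    "AES_CM_192_HMAC_SHA1_80": 38,
    "AES_CM_192_HMAC_SHA1_32": 38,
    "AES_CM_256_HMAC_SHA1_80": 46,
    "AES_CM_256_HMAC_SHA1_32": 46,
}


def parse_crypto_attribute(line):
    """Split 'a=crypto:<tag> <suite> inline:<key>' into (tag, suite, key_salt_bytes)."""
    line = line.strip()
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute: %r" % line)
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("crypto attribute needs tag, suite and key parameters")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("unsupported key method: %r" % method)
    # Drop optional '|lifetime' and '|MKI:length' parts, keep only the key.
    key_b64 = key_info.split("|")[0]
    # Some stacks omit base64 padding, so restore it before decoding.
    key_b64 += "=" * (-len(key_b64) % 4)
    return int(tag), suite, base64.b64decode(key_b64)


def crypto_attribute_is_valid(line):
    """True when the suite is known and the decoded key||salt has the expected length."""
    _, suite, key_salt = parse_crypto_attribute(line)
    expected = EXPECTED_KEY_SALT_OCTETS.get(suite)
    return expected is not None and len(key_salt) == expected
```

For instance, the first crypto line of the Jitsi offer above carries a 40-character base64 string, which decodes to 30 octets and therefore passes the check for AES_CM_128_HMAC_SHA1_80.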

      AES_CM_128_NULL_AUTH

      Uses the SRTP default cipher (AES-128 Counter Mode) but no authentication method. This policy is NOT RECOMMENDED unless it is unavoidable; see Section 7.5 of [RFC3711].

      SRTP variables that modify behaviors based on direction/leg:

      rtp_secure_media

      possible values:
      mandatory – Accept/Offer SAVP negotiation ONLY
      optional – Accept/Offer SAVP/AVP with SAVP preferred
      forbidden – More useful for inbound to deny SAVP negotiation
      false – implies forbidden
      true – implies mandatory

      If not set, the default is to accept SAVP inbound when it is offered.

      rtp_secure_media_inbound | rtp_secure_media_outbound

      This is the same as rtp_secure_media, but would apply to either inbound or outbound offers specifically.

      How to specify crypto suites: by default, when no crypto suites are specified, FreeSWITCH offers crypto suites from strongest to weakest and accepts the strongest suite that both endpoints have in common. To force specific crypto suites, append them as a comma-separated list in the order in which they should be offered.

      Examples:
      rtp_secure_media=mandatory:AES_CM_256_HMAC_SHA1_80,AES_CM_256_HMAC_SHA1_32
      rtp_secure_media=true:AES_CM_256_HMAC_SHA1_80,AES_CM_256_HMAC_SHA1_32
      rtp_secure_media=optional:AES_CM_256_HMAC_SHA1_80
      rtp_secure_media=true:AES_CM_256_HMAC_SHA1_80

      Additionally, this can be narrowed down to either the inbound or the outbound direction:

      rtp_secure_media_inbound=true:AEAD_AES_256_GCM_8
      rtp_secure_media_inbound=mandatory:AEAD_AES_256_GCM_8
      rtp_secure_media_outbound=true:AEAD_AES_128_GCM_8
      rtp_secure_media_outbound=optional:AEAD_AES_128_GCM_8
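The value syntax shown in these examples, and the "strongest suite in common" rule, can be sketched as follows. This is an illustrative model only, not FreeSWITCH code: `parse_rtp_secure_media` and `choose_suite` are hypothetical names, and `DEFAULT_ORDER` is an assumption copied from the order in which the crypto keys appear in the FreeSWITCH debug log later in this article.

```python
# true/false are aliases, as listed under "possible values".
ALIASES = {"true": "mandatory", "false": "forbidden"}
MODES = ("mandatory", "optional", "forbidden")

# Assumed strongest-to-weakest order, taken from the debug log in this article.
DEFAULT_ORDER = [
    "AEAD_AES_256_GCM_8",
    "AEAD_AES_128_GCM_8",
    "AES_CM_256_HMAC_SHA1_80",
    "AES_CM_192_HMAC_SHA1_80",
    "AES_CM_128_HMAC_SHA1_80",
    "AES_CM_256_HMAC_SHA1_32",
    "AES_CM_192_HMAC_SHA1_32",
    "AES_CM_128_HMAC_SHA1_32",
    "AES_CM_128_NULL_AUTH",
]


def parse_rtp_secure_media(value):
    """Parse '<mode>[:suite1,suite2,...]' into (normalized_mode, suite_list)."""
    mode_text, _, suite_text = value.partition(":")
    mode = mode_text.strip().lower()
    mode = ALIASES.get(mode, mode)
    if mode not in MODES:
        raise ValueError("unknown rtp_secure_media mode: %r" % mode_text)
    suites = [s.strip() for s in suite_text.split(",") if s.strip()]
    return mode, suites


def choose_suite(local_preference, remote_offered):
    """Return the first locally preferred suite that the remote side also offers."""
    remote = set(remote_offered)
    for suite in local_preference:
        if suite in remote:
            return suite
    return None


# Usage: fall back to the default order when no suite list was given.
mode, suites = parse_rtp_secure_media("true:AES_CM_256_HMAC_SHA1_80,AES_CM_256_HMAC_SHA1_32")
chosen = choose_suite(suites or DEFAULT_ORDER,
                      ["AES_CM_128_HMAC_SHA1_80", "AES_CM_256_HMAC_SHA1_32"])
```

Keeping the list ordered by preference means the negotiation is a single pass over the local list, and forcing specific suites is simply a matter of supplying a shorter list.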
      

      rtp_secure_media_suites

      Optionally, rtp_secure_media_suites can be used to dictate the suite list, so that rtp_secure_media=[optional|mandatory|false|true] is set on its own without a suite list appended to the rtp_secure_media* variables.

      In vars.xml, the valid options for the SIP TLS version setting are sslv2, sslv3, sslv23, tlsv1, tlsv1.1 and tlsv1.2; the default is tlsv1,tlsv1.1,tlsv1.2. See http://wiki.freeswitch.org/wiki/Tls

      TLS cipher suite: the default is ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH. The actual ciphers supported vary per platform. Running openssl ciphers -v 'ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH' shows what is available in your version of OpenSSL.
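The same cipher string can also be expanded from Python, which may be handy on hosts where the openssl command line tool is not installed. This is a small sketch using the standard `ssl` module; note that it reports the ciphers of the OpenSSL build that Python is linked against, which is an assumption to keep in mind because it may differ from the library FreeSWITCH uses. The function name `available_ciphers` is hypothetical.

```python
import ssl

# Default FreeSWITCH TLS cipher string quoted in the text above.
CIPHER_STRING = "ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH"


def available_ciphers(cipher_string=CIPHER_STRING):
    """Return (name, protocol) pairs for the ciphers the string expands to."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Raises ssl.SSLError if the string selects no cipher at all.
    context.set_ciphers(cipher_string)
    return [(c["name"], c["protocol"]) for c in context.get_ciphers()]


if __name__ == "__main__":
    for name, protocol in available_ciphers():
        print(protocol, name)
```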

      SRTP to RTP over multiple crypto suites

      Logs and explanation for RTP to SRTP translation in FreeSWITCH

      A client at 7777777777@ is trying to call 9999999999@ , which FreeSWITCH has to proxy and convert from RTP to SRTP. The following debug logs from the sofia external profile show this process.

      A SIP INVITE carrying a plain RTP (RTP/AVP) SDP offer is received.

      INVITE sip:9999999999@:5080;transport=UDP SIP/2.0
         Via: SIP/2.0/UDP :47851;branch=z9hG4bK-524287-1---7cc8ad9383e9787d;rport
         Max-Forwards: 70
         Contact: :47851;transport=UDP>
         To: :5080;transport=UDP>
         From: :5080;transport=UDP>;tag=5df9f82c
         Call-ID: lFNvnuABQfOpROxfFp-MZQ..
         CSeq: 1 INVITE
         Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, MESSAGE, OPTIONS, INFO, SUBSCRIBE
         Content-Type: application/sdp
         User-Agent: Z 5.2.28 rv2.8.115
         Allow-Events: presence, kpml, talk
         Content-Length: 607
         
         v=0
         o=Z 20472192 0 IN IP4 
         s=Z
         c=IN IP4 
         t=0 0
         m=audio 8000 RTP/AVP 106 9 3 111 0 8 97 110 112 98 101 100 99 102
         a=rtpmap:106 opus/48000/2
         a=fmtp:106 minptime=20; cbr=1; maxaveragebitrate=40000; useinbandfec=1
         a=rtpmap:111 speex/16000
         a=rtpmap:97 iLBC/8000
         a=fmtp:97 mode=20
         a=rtpmap:110 speex/8000
         a=rtpmap:112 speex/32000
         a=rtpmap:98 telephone-event/48000
         a=fmtp:98 0-16
         a=rtpmap:101 telephone-event/8000
         a=fmtp:101 0-16
         a=rtpmap:100 telephone-event/16000
         a=fmtp:100 0-16
         a=rtpmap:99 telephone-event/32000
         a=fmtp:99 0-16
         a=rtpmap:102 G726-32/8000
         a=sendrecv
      [NOTICE] switch_channel.c:1104 New Channel sofia/external/7777777777@:5080 [ed5e07ee-bd00-4a47-b4e1-6abc9dd23ed6]
      [DEBUG] switch_core_state_machine.c:584 (sofia/external/7777777777@:5080) Running State Change CS_NEW (Cur 1 Tot 33)
      [DEBUG] sofia.c:10078 sofia/external/7777777777@:5080 receiving invite from :4642 version: 1.9.0 -742-8f1be0 64bit
      [DEBUG] sofia.c:7291 Channel sofia/external/7777777777@:5080 entering state [received][100]
      
      [DEBUG] sofia.c:7301 Remote SDP:
      v=0
      o=Z 20472192 0 IN IP4 
      s=Z
      c=IN IP4 
      t=0 0
      m=audio 8000 RTP/AVP 106 9 3 111 0 8 97 110 112 98 101 100 99 102
      a=rtpmap:106 opus/48000/2
      a=fmtp:106 minptime=20; cbr=1; maxaveragebitrate=40000; useinbandfec=1
      a=rtpmap:111 speex/16000
      a=rtpmap:97 iLBC/8000
      a=fmtp:97 mode=20
      a=rtpmap:110 speex/8000
      a=rtpmap:112 speex/32000
      a=rtpmap:98 telephone-event/48000
      a=fmtp:98 0-16
      a=rtpmap:101 telephone-event/8000
      a=fmtp:101 0-16
      a=rtpmap:100 telephone-event/16000
      a=fmtp:100 0-16
      a=rtpmap:99 telephone-event/32000
      a=fmtp:99 0-16
      a=rtpmap:102 G726-32/8000
      [DEBUG] sofia.c:7693 (sofia/external/7777777777@:5080) State Change CS_NEW -> CS_INIT
      State NEW
      Running State Change CS_INIT (Cur 1 Tot 33)
      Standard INIT
      State Change CS_INIT -> CS_ROUTING
      State INIT going to sleep
      Running State Change CS_ROUTING (Cur 1 Tot 33)
      Callstate Change DOWN -> RINGING
      State ROUTING
      
      send 389 bytes to udp/[]:4642 at 07:08:27.376085:
      SIP/2.0 100 Trying
      
      Via: SIP/2.0/UDP :47851;branch=z9hG4bK-524287-1---7cc8ad9383e9787d;rport=4642;received=
         From: :5080;transport=UDP>;tag=5df9f82c
         To: :5080;transport=UDP>
         Call-ID: lFNvnuABQfOpROxfFp-MZQ..
         CSeq: 1 INVITE
         User-Agent: FreeSWITCH-mod_sofia/1.9.0-742-8f1b7e0~64bit
         Content-Length: 0

      After the INVITE is received and answered with a 100 Trying reply, routing and the secure RTP transformation begin: crypto keys are added and the call is forwarded to the destination.

      Standard EXECUTE
      ed5e07ee EXECUTE sofia/external/7777777777@:5080 set(rtp_secure_media=optional)
      [rtp_secure_media]=[optional]
      ed5e07ee EXECUTE sofia/external/7777777777@:5080 log(INFO Forwarding calls 9999999999@ )
      Forwarding calls 9999999999@
      …
      Set Local audio crypto Key [1 AEAD_AES_256_GCM_8 inline:aHJ1yquBtm4Lzfi2oMpe6cV7IBEy3YgKxrJ3qjvLuRXSuZfHcV4VtVNwHDw]
      Set Local video crypto Key [1 AEAD_AES_256_GCM_8 inline:qeJbqlSbnKBNew575hSZ3LX78o6GBsjgOrSMxzGH/zb1E7mkls1Mda93U9w]
      Set Local text crypto Key [1 AEAD_AES_256_GCM_8 inline:VghMVsjWQwnOAAjBJ1NTB3jZgfpNV/Yu4poxkAPMqkC7C+fhPKApCJrWg3U]
      Set Local audio crypto Key [2 AEAD_AES_128_GCM_8 inline:7XNrjjwC/eOVnWlBSp74DfiIGAEYn/BN+latfA]
      Set Local video crypto Key [2 AEAD_AES_128_GCM_8 inline:UQrFpy9Q7L5DI/ww4e5IAmwy7BxSw5yd/T0v0Q]
      Set Local text crypto Key [2 AEAD_AES_128_GCM_8 inline:ZqkEPrUFHkaQ+7CROp52H/JO0MbrYWk/Eyl9lQ]
      Set Local audio crypto Key [3 AES_CM_256_HMAC_SHA1_80 inline:PTGAm2KlbfuKtIUVGtXknKKzALAzfILZJuPOjfO9S07eWRE6FR0aMUvjuehJgw]
      Set Local video crypto Key [3 AES_CM_256_HMAC_SHA1_80 inline:ahHIB0o/dp3SliYWK9BkxM7TfzILwG0bjDn7JuvYi+puRkTM4mYvvsSmywLaYA]
      Set Local text crypto Key [3 AES_CM_256_HMAC_SHA1_80 inline:crAs8dPcWJkEEGj5nqTvFGl/TWpxxb86k+dX5gBXhh+q6DO2pEqWNkQmm55aLA]
      Set Local audio crypto Key [4 AES_CM_192_HMAC_SHA1_80 inline:SLBJWjgMdfiYX7TUwWQ9CmqUsILLJrpBIVjbfuQmpBIFLvvA/XU]
      Set Local video crypto Key [4 AES_CM_192_HMAC_SHA1_80 fNazWgWwNRPjUKNHVqkz44]
      Set Local text crypto Key [4 AES_CM_192_HMAC_SHA1_80 inline:hbe9qqETBSK5hRQ8DI9mXL4QAjjGSR8tGDiTHCJF3yxCrRk1ajk]
      Set Local audio crypto Key [5 AES_CM_128_HMAC_SHA1_80 inline:8q8mer9N2V4qVxnaazuJeT0KXgW2scONy36J3KaS]
      Set Local video crypto Key [5 AES_CM_128_HMAC_SHA1_80 inline:TP5NQ1yB8ZSCCwZMgXur9VHZ5SlpNfnXePj7eZrk]
      Set Local text crypto Key [5 AES_CM_128_HMAC_SHA1_80 inline:HT3F3iYG8H/majhBZbOs2Z8ye/WEVGT5Oytx2oQS]
      Set Local audio crypto Key [6 AES_CM_256_HMAC_SHA1_32 inline:fEohh92lX2xLmeFYlt8YouM2jN4z5pU05d90BYfoAKU6m4CWv8g8AnifDUKk9A]
      Set Local video crypto Key [6 AES_CM_256_HMAC_SHA1_32 inline:+uBNmLcvj41hXoMxNlMNBpq68gU4PmLwYcdopEB/X/jfPElkUgHfguPIgIFJUg]
      Set Local text crypto Key [6 AES_CM_256_HMAC_SHA1_32 inline:cqk7D3+KMQ+31R4FFDRRzn/aluyIgjxBL59vfxcsdf5OW9izEJtU+06GewJyIA]
      Set Local audio crypto Key [7 AES_CM_192_HMAC_SHA1_32 inline:Tv25TfP9fQZ+ljs/tFlHohkckiK4F6cemzEjHSvo2+q6No4ai+o]
      Set Local video crypto Key [7 AES_CM_192_HMAC_SHA1_32 inline:CY/Dizd1QrlobZtgnigr0hWE+oDSx4S1F51Zpo4aZamN+8ZMdp8]
      Set Local text crypto Key [7 AES_CM_192_HMAC_SHA1_32 inline:aEox/7IMps5c+uOWbosZ618+opkJV/GnrKc2EnAhVnDNeo91+No]
      Set Local audio crypto Key [8 AES_CM_128_HMAC_SHA1_32 inline:0LwKGyljIed0zhukiMMyD5ive0ZsyybwBrnevcAv]
      Set Local video crypto Key [8 AES_CM_128_HMAC_SHA1_32 inline:eZN8rAG8UPPntdYxsg1kkWL4qMsVgTiGGiS4UeUM]
      Set Local text crypto Key [8 AES_CM_128_HMAC_SHA1_32 inline:bAYzbfr+El8usaTkPBR6iFuTda4uLNGjyx9lQWkX]
      Set Local audio crypto Key [9 AES_CM_128_NULL_AUTH inline:5m3142gGG1HZ5VnoXsAOyopSwDCYbrIsGpdbEO3D]
      Set Local video crypto Key [9 AES_CM_128_NULL_AUTH inline:zXk67wjwRhSilq0kiz5TWxXqrxuTaWTA3qqbVo/G]
      Set Local text crypto Key [9 AES_CM_128_NULL_AUTH inline:FRP9CJbBO+PRj6I9RSBAiMxRZ/qFtyrEXPfxocG0]
      sending invite version: 1.9.0 -742-8f1b7e0 64bit
      Local SDP:
      v=0
      o=FreeSWITCH 1552960557 1552960558 IN IP4
      s=FreeSWITCH
      c=IN IP4
      t=0 0
      m=audio 18750 RTP/SAVP 102 9 0 8 103 101
      a=rtpmap:102 opus/48000/2
      a=fmtp:102 useinbandfec=1; maxaveragebitrate=30000; maxplaybackrate=48000; ptime=20; minptime=10; maxptime=40; stereo=1
      a=rtpmap:9 G722/8000
      a=rtpmap:0 PCMU/8000
      a=rtpmap:8 PCMA/8000
      a=rtpmap:103 telephone-event/48000
      a=fmtp:103 0-16
      a=rtpmap:101 telephone-event/8000
      a=fmtp:101 0-16
      a=crypto:1 AEAD_AES_256_GCM_8 inline:aHJ1yquBtm4Lzfi2oMpe6cV7IBEy3YgKxrJ3qjvLuRXSuZfHcV4VtVNwHDw
      a=crypto:2 AEAD_AES_128_GCM_8 inline:7XNrjjwC/eOVnWlBSp74DfiIGAEYn/BN+latfA
      a=crypto:3 AES_CM_256_HMAC_SHA1_80 inline:PTGAm2KlbfuKtIUVGtXknKKzALAzfILZJuPOjfO9S07eWRE6FR0aMUvjuehJgw
      a=crypto:4 AES_CM_192_HMAC_SHA1_80 inline:SLBJWjgMdfiYX7TUwWQ9CmqUsILLJrpBIVjbfuQmpBIFLvvA/XU
      a=crypto:5 AES_CM_128_HMAC_SHA1_80 inline:8q8mer9N2V4qVxnaazuJeT0KXgW2scONy36J3KaS
      a=crypto:6 AES_CM_256_HMAC_SHA1_32 inline:fEohh92lX2xLmeFYlt8YouM2jN4z5pU05d90BYfoAKU6m4CWv8g8AnifDUKk9A
      a=crypto:7 AES_CM_192_HMAC_SHA1_32 inline:Tv25TfP9fQZ+ljs/tFlHohkckiK4F6cemzEjHSvo2+q6No4ai+o
      a=crypto:8 AES_CM_128_HMAC_SHA1_32 inline:0LwKGyljIed0zhukiMMyD5ive0ZsyybwBrnevcAv
      a=crypto:9 AES_CM_128_NULL_AUTH inline:5m3142gGG1HZ5VnoXsAOyopSwDCYbrIsGpdbEO3D
      a=ptime:20
      a=sendrecv

      Once the SDP is ready with crypto keys, it is then forwarded to the next hop.

      send 2104 bytes to udp/[]:5060 at 07:08:27.378167:
      INVITE sip:9999999999@ SIP/2.0
         Via: SIP/2.0/UDP :5080;rport;branch=z9hG4bKmF251mK2pN35B
         Max-Forwards: 69
         From: "7777777777" >;tag=vcKeKD6SN02cB
         To: >
         Call-ID: a27898fd-c4b8-1237-ddaa-02a933b32da0
         CSeq: 1935861 INVITE
         Contact: :5080>
         User-Agent: FreeSWITCH-mod_sofia/1.9.0-742-8f1b7e0~64bit
         Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, MESSAGE, INFO, UPDATE, REGISTER, REFER, NOTIFY
         Supported: timer, path, replaces
         Allow-Events: talk, hold, conference, refer
         Content-Type: application/sdp
         Content-Disposition: session
         Content-Length: 1304
         X-FS-Support: update_display,send_info
         Remote-Party-ID: "7777777777" >;party=calling;screen=yes;privacy=off
         v=0
         o=FreeSWITCH 1552960557 1552960558 IN IP4 
         s=FreeSWITCH
         c=IN IP4 
         t=0 0
         m=audio 18750 RTP/SAVP 102 9 0 8 103 101
         a=rtpmap:102 opus/48000/2
         a=fmtp:102 useinbandfec=1; maxaveragebitrate=30000; maxplaybackrate=48000; ptime=20; minptime=10; maxptime=40; stereo=1
         a=rtpmap:9 G722/8000
         a=rtpmap:0 PCMU/8000
         a=rtpmap:8 PCMA/8000
         a=rtpmap:103 telephone-event/48000
         a=fmtp:103 0-16
         a=rtpmap:101 telephone-event/8000
         a=fmtp:101 0-16
         a=crypto:1 AEAD_AES_256_GCM_8 inline:aHJ1yquBtm4Lzfi2oMpe6cV7IBEy3YgKxrJ3qjvLuRXSuZfHcV4VtVNwHDw
         a=crypto:2 AEAD_AES_128_GCM_8 inline:7XNrjjwC/eOVnWlBSp74DfiIGAEYn/BN+latfA
         a=crypto:3 AES_CM_256_HMAC_SHA1_80 inline:PTGAm2KlbfuKtIUVGtXknKKzALAzfILZJuPOjfO9S07eWRE6FR0aMUvjuehJgw
         a=crypto:4 AES_CM_192_HMAC_SHA1_80 inline:SLBJWjgMdfiYX7TUwWQ9CmqUsILLJrpBIVjbfuQmpBIFLvvA/XU
         a=crypto:5 AES_CM_128_HMAC_SHA1_80 inline:8q8mer9N2V4qVxnaazuJeT0KXgW2scONy36J3KaS
         a=crypto:6 AES_CM_256_HMAC_SHA1_32 inline:fEohh92lX2xLmeFYlt8YouM2jN4z5pU05d90BYfoAKU6m4CWv8g8AnifDUKk9A
         a=crypto:7 AES_CM_192_HMAC_SHA1_32 inline:Tv25TfP9fQZ+ljs/tFlHohkckiK4F6cemzEjHSvo2+q6No4ai+o
         a=crypto:8 AES_CM_128_HMAC_SHA1_32 inline:0LwKGyljIed0zhukiMMyD5ive0ZsyybwBrnevcAv
         a=crypto:9 AES_CM_128_NULL_AUTH inline:5m3142gGG1HZ5VnoXsAOyopSwDCYbrIsGpdbEO3D
         a=ptime:20
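The RTP to SRTP translation visible in these logs (an RTP/AVP offer comes in, an RTP/SAVP offer with a=crypto lines goes out) can be sketched in Python. This is a simplified illustration under stated assumptions, not what FreeSWITCH does internally: it only rewrites the audio m= line and appends freshly generated keys for the 128-bit counter mode suites, and the names `make_crypto_line` and `secure_audio_offer` are hypothetical.

```python
import base64
import secrets

# Master key + master salt length in octets for the suites handled here.
KEY_SALT_OCTETS = {
    "AES_CM_128_HMAC_SHA1_80": 30,
    "AES_CM_128_HMAC_SHA1_32": 30,
}


def make_crypto_line(tag, suite):
    """Build an a=crypto line with a freshly generated random key and salt."""
    key_salt = secrets.token_bytes(KEY_SALT_OCTETS[suite])
    encoded = base64.b64encode(key_salt).decode("ascii")
    return "a=crypto:%d %s inline:%s" % (tag, suite, encoded)


def secure_audio_offer(sdp, suites):
    """Rewrite RTP/AVP audio sections to RTP/SAVP and append one crypto line per suite."""
    out = []
    pending = []  # crypto lines to append at the end of the current audio section
    for line in sdp.splitlines():
        if line.startswith("m="):
            # A new media section starts: flush crypto lines of the previous one.
            out.extend(pending)
            pending = []
            fields = line.split(" ")
            if fields[0] == "m=audio" and len(fields) > 3 and fields[2] == "RTP/AVP":
                fields[2] = "RTP/SAVP"
                line = " ".join(fields)
                pending = [make_crypto_line(i, s) for i, s in enumerate(suites, start=1)]
        out.append(line)
    out.extend(pending)
    return "\r\n".join(out) + "\r\n"


# Usage with a cut-down version of the incoming offer.
PLAIN_OFFER = "v=0\r\nt=0 0\r\nm=audio 8000 RTP/AVP 0 8\r\na=rtpmap:0 PCMU/8000\r\na=sendrecv\r\n"
SECURED = secure_audio_offer(PLAIN_OFFER, ["AES_CM_128_HMAC_SHA1_80", "AES_CM_128_HMAC_SHA1_32"])
```

Offering several suites with ascending tags, as in the log, lets the answerer reply with the single tag it accepts.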

      Multimedia Internet Keying (MIKEY) – Key management of SRTP

      MIKEY can establish multiple security contexts or cryptographic sessions with a single message.
      It can be used in peer-to-peer or broadcast scenarios where one entity generates the key and needs to distribute it to a number of participants.

      Modes of operation

      • Pre-Shared Key
      • Public Key Encryption
      • Diffie-Hellman
      • HMAC-Authenticated Diffie-Hellman
      • RSA-R
      • TICKET
      • IBAKE
      • SAKKE

      References