Real-time Transport Protocol (RTP) and RTP Control Protocol (RTCP)

In a VoIP system where SIP is the signaling protocol, a SIP proxy never participates in the media flow, so it is media agnostic.

SDP bodies describing a session (codecs, open ports, media formats, etc.) are embedded in a SIP request such as INVITE.
After the SDP offer/answer exchange, RTP and RTCP ensure that media streams flow between the endpoints.

RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services.

RTCP is the control protocol. It provides monitoring of data delivery and QoS in a manner scalable to large multicast networks, and provides minimal control and identification functionality.


Protocol framework:
Supports the use of RTP-level translators and mixers.
Is independent of the underlying transport and network layers.
Does not address resource reservation.
Does not guarantee quality of service for real-time services.
Provides services such as payload type identification, sequence numbering, timestamping and delivery monitoring.

The sequence numbers included in RTP allow the receiver to reconstruct the sender’s packet sequence.

Usage:
Multimedia multi-participant conferences
Storage of continuous data
Interactive distributed simulation
Active badge, control and measurement applications

Simple Multicast Audio Conference

Assume the conference has obtained a multicast group address and a pair of ports. One port is used for audio data, and the other is used for control (RTCP) packets.
The audio conferencing application used by each conference participant sends audio data in small chunks of, say, 20 ms duration.
Each chunk of audio data is preceded by an RTP header; RTP header and data are in turn contained in a UDP packet.
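
The packetization described above can be sketched in a few lines of Python. This is a minimal illustration, not a full RTP stack: the function name `build_rtp_packet` is hypothetical, and it assumes a fixed 12-byte header with no padding, no extension and no CSRC entries. The field values are borrowed from the RTP capture shown later in this post.

```python
import struct

def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Pack a fixed 12-byte RTP header followed by the payload.

    Assumes version 2, no padding, no header extension, no CSRC list.
    """
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,               # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,     # 32-bit media timestamp
                         ssrc & 0xFFFFFFFF)          # 32-bit source identifier
    return header + payload

# 20 ms of 8 kHz G.711 audio is 160 one-byte samples (payload type 0 = PCMU).
packet = build_rtp_packet(0, 39644, 2256601824, 0x78006C62, b"\xfe" * 160)
```

The resulting bytes would then be handed to a UDP socket (for example with `socket.sendto`) addressed to the RTP port of the session.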

The RTP header indicates what type of audio encoding (such as PCM, ADPCM or LPC) is contained in each packet so that senders can change the encoding during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or react to indications of network congestion.

The Internet, like other packet networks, occasionally loses and reorders packets and delays them by variable amounts of time. To cope with this, the RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source.
The sequence number can also be used by the receiver to estimate how many packets are being lost.
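
As a rough sketch of how a receiver might estimate loss from sequence numbers, the helper below tracks the highest sequence number seen, allowing for 16-bit wrap-around and reordered packets. The name `estimate_loss` is hypothetical, and the sketch assumes there are no duplicate packets and that the first packet received is the start of the stream.

```python
def estimate_loss(seqs):
    """Return (expected, lost) for a list of received 16-bit sequence numbers."""
    if not seqs:
        return 0, 0
    base = seqs[0]          # first sequence number seen
    max_seq = base          # highest sequence number seen so far (16-bit)
    cycles = 0              # wrap-around count, already shifted left by 16
    for seq in seqs[1:]:
        delta = (seq - max_seq) & 0xFFFF
        if 0 < delta < 0x8000:
            # Packet is ahead of the current maximum; detect a wrap past 65535.
            if seq < max_seq:
                cycles += 1 << 16
            max_seq = seq
        # Otherwise the packet is late (reordered) and does not move the maximum.
    expected = cycles + max_seq - base + 1
    return expected, expected - len(seqs)

# 65535 and 2 never arrive; the sequence wraps from 65534 to 0.
expected, lost = estimate_loss([65533, 65534, 0, 1, 3])
```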

For QoS, each instance of the audio application in the conference periodically multicasts a reception report plus the name of its user on the RTCP (control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings. In addition to the user name, other identifying information may also be included, subject to control bandwidth limits.

A site sends the RTCP BYE packet when it leaves the conference.

Audio and Video Conference

Audio and video media are transmitted as separate RTP sessions. Separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses.
There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same distinguished (canonical) name in the RTCP packets for both so that the sessions can be associated.

Synchronized playback of a source’s audio and video is achieved using timing information carried in the RTCP packets.
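
One way to picture this: an RTCP sender report pairs a wall-clock (NTP) time with the RTP timestamp of the same instant, so a receiver can map any RTP timestamp of that stream onto the shared wall clock. The sketch below assumes that pair has already been extracted from a sender report. The function name `rtp_to_wallclock` and the sample numbers are illustrative only.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to wall-clock seconds using one sender report."""
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:          # interpret as a signed 32-bit difference
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate

# Audio uses an 8 kHz clock and video a 90 kHz clock, so raw RTP timestamps
# are not comparable, but the mapped wall-clock times are.
audio_time = rtp_to_wallclock(168000, 160000, 1000.0, 8000)
video_time = rtp_to_wallclock(945000, 900000, 1000.5, 90000)
play_together = abs(audio_time - video_time) < 0.001
```

Here both samples map to the same wall-clock instant, so a player would present that audio chunk and video frame together.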

Mixers, Translators and Monitors


A mixer is an intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet.

Example of a mixer for high-speed to low-speed packet stream conversion

In a conference where a few participants are connected through a low-speed link while the others have high-speed links, an RTP-level relay called a mixer may be placed near the low-bandwidth area instead of forcing everyone to use a lower-bandwidth, reduced-quality audio encoding.
This mixer resynchronizes incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream across the low-speed links.

All data packets originating from a mixer will be identified as having the mixer as their synchronization source.
The RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers.


A translator is an intermediate system that forwards RTP packets with their synchronization source identifier intact.

Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

Translator for a firewall that blocks IP packets

Some of the intended participants in the audio conference may be connected with high bandwidth links but might not be directly reachable via IP multicast, for reasons such as being behind an application-level firewall that will not let any IP packets pass. For these sites, mixing may not be necessary, in which case another type of RTP-level relay called a translator may be used.

Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through a secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site’s internal network.

Other cases:

A video mixer can scale the images of individual people in separate video streams and composite them into one video stream to simulate a group scene.

A translator can connect a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II, or perform packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.


A monitor is an application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis and long-term statistics.

Layered Encodings

When heterogeneous receivers have conflicting bandwidth requirements, multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion.
Rate adaptation can be achieved by combining a layered encoding with a layered transmission system.

In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Receivers can then adapt to network heterogeneity and control their reception bandwidth by joining only the appropriate subset of the multicast groups.
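
A receiver-side selection of layers could look roughly like the following. This is only a sketch under simple assumptions: layers are cumulative (each layer is useful only if all lower layers are also received), their bit rates are known in advance, and the receiver has some estimate of its available bandwidth. The name `layers_to_join` and the rates used are hypothetical.

```python
def layers_to_join(layer_rates_kbps, available_kbps):
    """Return the indexes of the layers (multicast groups) a receiver should join.

    Layers are taken in order from the base layer upward until the next
    one would exceed the available bandwidth.
    """
    joined = []
    total = 0
    for index, rate in enumerate(layer_rates_kbps):
        if total + rate > available_kbps:
            break
        total += rate
        joined.append(index)
    return joined

# Base layer at 64 kbit/s plus two enhancement layers.
selected = layers_to_join([64, 128, 256], available_kbps=200)
```

A receiver on a faster link would simply get a longer list and join more multicast groups, without the sender changing anything.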

RTP Session

In an RTP session, each participant maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP.

Real-Time Transport Protocol
    [Stream setup by SDP (frame 554)]
        [Setup frame: 554]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 .... = Extension: False
    .... 0000 = Contributing source identifiers count: 0
    0... .... = Marker: False
    Payload type: ITU-T G.711 PCMU (0)
    Sequence number: 39644
    [Extended sequence number: 39644]
    Timestamp: 2256601824
    Synchronization Source identifier: 0x78006c62 (2013293666)
    Payload: 7efefefe7efefe7e7efefe7e7efefe7e7efefe7e7efefe7e...
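
To connect the dissected fields above to the bytes on the wire, here is a small parsing sketch. The function name `parse_rtp_header` is hypothetical, and the sample packet is rebuilt from the field values in the capture rather than taken from the original frame. Header extensions and padding are not handled.

```python
import struct

def parse_rtp_header(data):
    """Decode the fixed RTP header and any CSRC entries that follow it."""
    if len(data) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", data, 0)
    csrc_count = byte0 & 0x0F
    csrcs = [struct.unpack_from("!I", data, 12 + 4 * i)[0]
             for i in range(csrc_count)]
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        "payload": data[12 + 4 * csrc_count:],
    }

# Header rebuilt from the values shown in the capture, plus two payload bytes.
sample = struct.pack("!BBHII", 0x80, 0, 39644, 2256601824, 0x78006C62) + b"\x7e\xfe"
fields = parse_rtp_header(sample)
```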

Synchronization source (SSRC)

The SSRC is a 32-bit numeric identifier for the source of a stream of RTP packets.
All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback.

The binding of the SSRC identifiers is provided through RTCP.
If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

Contributing source (CSRC)

A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer.
The mixer inserts a list of the SSRC identifiers of the sources that contributed to the generation of a particular packet, called the CSRC list, into the RTP header of that packet.

An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).
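
On the receiving side, talker indication might be sketched as a lookup from the CSRC list of a mixed packet to names learned earlier, for example from RTCP source descriptions. The function `current_talkers`, the identifiers and the names below are made up for illustration.

```python
def current_talkers(csrc_list, names_by_ssrc):
    """Resolve the CSRC entries of a mixed packet to display names."""
    return [names_by_ssrc.get(ssrc, "unknown (0x%08x)" % ssrc)
            for ssrc in csrc_list]

# Hypothetical table of SSRC -> name, built from earlier RTCP packets.
names = {0x11: "alice@example.com", 0x22: "bob@example.com"}
talkers = current_talkers([0x22, 0x33], names)
```

Every mixed packet carries the mixer's own SSRC, so the CSRC list is the only per-packet hint the receiver has about who is actually speaking.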


RTCP is based on the periodic transmission of control packets to all participants in the session. The underlying protocol must provide multiplexing of the data and control packets. RTCP:
-provides feedback on the quality of the data distribution, which supports congestion control, fault diagnosis and control of adaptive encodings
-carries a persistent transport-level identifier for an RTP source called the canonical name or CNAME, which is used to keep track of each participant
-lets each participant observe the number of participants, which is used to calculate the rate at which packets are sent so that RTCP scales to large sessions
-conveys minimal session control information
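
The scaling idea can be shown with a deliberately simplified calculation. RFC 3550 recommends keeping RTCP traffic to about 5% of the session bandwidth with a minimum reporting interval of 5 seconds. The sketch below uses only those two rules and leaves out the sender/receiver bandwidth split and the randomization that the RFC also specifies. The function name `rtcp_interval` is hypothetical.

```python
def rtcp_interval(members, avg_rtcp_size_bytes, session_bw_bps,
                  rtcp_fraction=0.05, min_interval=5.0):
    """Simplified RTCP reporting interval in seconds.

    The shared RTCP budget is divided among all members, so the interval
    grows with the number of participants.
    """
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_fraction / 8
    interval = members * avg_rtcp_size_bytes / rtcp_bw_bytes_per_s
    return max(interval, min_interval)

small_session = rtcp_interval(4, 100, 64000)      # clamped to the minimum
large_session = rtcp_interval(1000, 100, 64000)   # each member reports rarely
```

With a 64 kbit/s session, four participants each report at the 5 second minimum, while a thousand participants each report only about once every 250 seconds, so the total control traffic stays roughly constant.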

Example of RTCP receiver report and source description packets carrying reception statistics

Real-time Transport Control Protocol (Receiver Report)
    [Stream setup by SDP (frame 4)]
        [Setup frame: 4]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 0001 = Reception report count: 1
    Packet type: Receiver Report (201)
    Length: 7 (32 bytes)
    Sender SSRC: 0x796dd0d6 (2037240022)
    Source 1
        Identifier: 0x00000000 (0)
        SSRC contents
            Fraction lost: 0 / 256
            Cumulative number of packets lost: 1
        Extended highest sequence number received: 6534
            Sequence number cycles count: 0
            Highest sequence number received: 6534
        Interarrival jitter: 0
        Last SR timestamp: 0 (0x00000000)
        Delay since last SR timestamp: 0 (0 milliseconds)
Real-time Transport Control Protocol (Source description)
    [Stream setup by SDP (frame 4)]
        [Setup frame: 4]
        [Setup Method: SDP]
    10.. .... = Version: RFC 1889 Version (2)
    ..0. .... = Padding: False
    ...0 0001 = Source count: 1
    Packet type: Source description (202)
    Length: 6 (28 bytes)
    Chunk 1, SSRC/CSRC 0x796DD0D6
        Identifier: 0x796dd0d6 (2037240022)
        SDES items
            Type: CNAME (user and domain) (1)
            Length: 8
            Text: 796dd0d6
            Type: NOTE (note about source) (7)
            Length: 5
            Text: plivo
            Type: END (0)
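
The receiver report above can be decoded with a short parser. This is a sketch, not a complete RTCP implementation: `parse_receiver_report` is a hypothetical name, it handles only packet type 201, and the sample bytes are rebuilt from the field values in the capture. Note how the length field counts 32-bit words minus one, so a length of 7 means (7 + 1) * 4 = 32 bytes.

```python
import struct

def parse_receiver_report(data):
    """Decode an RTCP receiver report header and its report blocks."""
    byte0, packet_type, length_words, sender_ssrc = struct.unpack_from("!BBHI", data, 0)
    reports = []
    for i in range(byte0 & 0x1F):                 # reception report count
        src, lost_word, ext_seq, jitter, lsr, dlsr = struct.unpack_from(
            "!IIIIII", data, 8 + 24 * i)          # each block is 24 bytes
        cumulative = lost_word & 0xFFFFFF
        if cumulative & 0x800000:                 # 24-bit signed value
            cumulative -= 1 << 24
        reports.append({
            "ssrc": src,
            "fraction_lost": lost_word >> 24,     # numerator over 256
            "cumulative_lost": cumulative,
            "seq_cycles": ext_seq >> 16,
            "highest_seq": ext_seq & 0xFFFF,
            "jitter": jitter,
            "last_sr": lsr,
            "delay_since_last_sr": dlsr,
        })
    return {
        "version": byte0 >> 6,
        "packet_type": packet_type,
        "length_bytes": (length_words + 1) * 4,
        "sender_ssrc": sender_ssrc,
        "reports": reports,
    }

# Packet rebuilt from the values in the capture above.
sample_rr = (struct.pack("!BBHI", 0x81, 201, 7, 0x796DD0D6)
             + struct.pack("!IIIIII", 0, 1, 6534, 0, 0, 0))
rr = parse_receiver_report(sample_rr)
```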

Multiplexing RTP Sessions

In RTP, multiplexing is provided by the destination transport address (network address and port number), which is different for each RTP session (separate for audio and video). Keeping the sessions separate helps when encodings or clock rates change, when detecting packet loss, and in RTCP reporting.
Moreover, an RTP mixer would not be able to combine interleaved streams of incompatible media into one stream.

Interleaving packets with different RTP media types but using the same SSRC would introduce several problems.
But multiplexing multiple related sources of the same medium in one RTP session using different SSRC values is the norm for multicast sessions.


RFC 3550 – RTP: A Transport Protocol for Real-Time Applications

