Real-time Transport Protocol (RTP) is the standard network protocol used to deliver audio and video over IP-based networks. Whenever a VoIP call is made, a video conference is held, or a radio transmission is carried over an IP network, RTP is almost certainly the mechanism carrying that media from one point to another. It is, in practical terms, the backbone of real-time voice communication on modern networks.
Defined in IETF RFC 3550, RTP works by packaging audio or video data into a stream of packets, each stamped with timing information and a sequence number. These two pieces of metadata are what make real-time delivery possible: the timestamp allows the receiving end to reconstruct audio at the correct pace, while the sequence number makes it possible to detect when packets have arrived out of order or gone missing entirely. RTP typically runs over UDP rather than TCP, a deliberate design choice that prioritises speed over guaranteed delivery, since a slightly imperfect packet arriving on time is far more useful in a voice call than a perfect packet that arrives too late.
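To make the role of the sequence number and timestamp concrete, the following is a minimal sketch of reading the 12-byte fixed header that RFC 3550 places at the front of every RTP packet. The names `RtpHeader` and `parse_rtp_header` are illustrative assumptions, not part of any particular library, and the sketch ignores CSRC lists and header extensions.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    marker: bool
    sequence_number: int  # 16-bit counter, increments by one per packet
    timestamp: int        # 32-bit media clock value (sampling instant)
    ssrc: int             # identifies the stream's source


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    # Network byte order: two single bytes, a 16-bit field, two 32-bit fields.
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = first >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    return RtpHeader(
        version=version,
        payload_type=second & 0x7F,
        marker=bool(second & 0x80),
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 0, sequence 4660, timestamp 160.
sample = struct.pack("!BBHII", 0x80, 0x00, 4660, 160, 0xDEADBEEF) + b"\x00" * 160
header = parse_rtp_header(sample)
```

Everything a passive monitor needs for loss and jitter analysis is in these few fields; the payload itself does not have to be decoded.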
RTP and the metrics that matter
Because RTP carries the audio itself, not just the signalling about a call, it is the primary source of truth for in-call quality. The key metrics derived from RTP streams are the same ones that define whether a call sounds good or bad:
Packet loss occurs when RTP packets fail to reach their destination. Even modest loss rates, above 1 to 2 percent, are audible to most listeners as clicks, gaps, or missing syllables. Higher loss rates make calls effectively unusable.
Jitter refers to variation in the arrival time of RTP packets. Networks don’t deliver packets with perfect regularity, and most endpoints use a jitter buffer to smooth out this variability. But when jitter exceeds the buffer’s capacity, packets are discarded, which has the same effect as packet loss.
Latency is the end-to-end delay between a packet being sent and received. For conversational voice, one-way latency above around 150 ms starts to feel unnatural; above 400 ms, conversations become genuinely difficult to hold.
These three metrics – loss, jitter, and latency – are the diagnostic foundation of voice quality monitoring. None of them are visible without inspecting the RTP stream itself, which is precisely why RTP monitoring is so central to understanding real-world call quality.
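As a rough illustration of how two of these metrics fall out of the RTP header fields, the sketch below estimates packet loss from sequence numbers and interarrival jitter using the running estimator described in RFC 3550. The function names, the input shapes, and the 8000 Hz clock rate are assumptions chosen for the example. One-way latency is left out because it cannot be derived from a single capture point without synchronised clocks or round-trip information.

```python
from typing import List, Tuple


def loss_fraction(seqs: List[int]) -> float:
    """Fraction of packets missing, given 16-bit sequence numbers in arrival order."""
    if not seqs:
        return 0.0
    base = highest = seqs[0]  # 'highest' is extended beyond 16 bits on wraparound
    for seq in seqs[1:]:
        delta = (seq - highest) & 0xFFFF
        if delta < 0x8000:
            # Forward movement (possibly across the 65535 -> 0 wrap).
            highest += delta
        # Otherwise the packet is old or reordered; it still counts as received.
    expected = highest - base + 1
    lost = max(expected - len(seqs), 0)  # duplicates could push this below zero
    return lost / expected


def interarrival_jitter(packets: List[Tuple[float, int]], clock_rate: int = 8000) -> float:
    """Running jitter estimate in seconds, after RFC 3550 section 6.4.1.

    packets: (arrival_time_seconds, rtp_timestamp) pairs in arrival order.
    clock_rate: assumed media clock in Hz. Timestamp wraparound is not handled.
    """
    jitter = 0.0
    prev_transit = None
    for arrival, rtp_ts in packets:
        # Relative transit time in timestamp units; only differences matter.
        transit = arrival * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter / clock_rate


# One packet (sequence number 1) never arrives; the counter wraps in between.
loss = loss_fraction([65534, 65535, 0, 2, 3])

# Three 20 ms packets, the last one arriving 10 ms late.
jitter_s = interarrival_jitter([(0.00, 0), (0.02, 160), (0.05, 320)])
```

The division by 16 is the smoothing factor RFC 3550 specifies, so a single late packet nudges the estimate rather than dominating it.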
RTP in telecom networks
In telecom environments, RTP streams flow constantly and at enormous scale. Every active call generates at least two RTP streams, one in each direction, and a single carrier may be handling hundreds of thousands of concurrent calls at any given moment. This scale makes RTP both indispensable and challenging to monitor effectively.
The difficulty is compounded by the fact that RTP issues don’t always originate where they manifest. A packet loss event visible in an RTP stream at the receiving end might be caused by congestion several network hops upstream, a misconfigured QoS policy, a codec incompatibility, or a problem on an interconnect link between carriers. Without visibility into the RTP stream at multiple points across the network, isolating the root cause is more guesswork than engineering.
This is where the depth of RTP monitoring makes a real operational difference. Capturing every packet, correlating RTP media streams with SIP signalling, and applying high-resolution time-slicing to detect short-lived degradation events, rather than averaging them away, is what separates actionable insight from a dashboard that just confirms things are broadly fine.
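The effect of time-slicing can be shown with a small sketch. It assumes a stream of consecutively numbered packets (wraparound already unwrapped) and groups them into fixed-size windows; the function name `windowed_loss` and the window size are illustrative assumptions, not a description of any specific product's method.

```python
from typing import Iterable, List


def windowed_loss(received_seqs: Iterable[int], packets_per_window: int = 10) -> List[float]:
    """Loss fraction for each window of consecutive sequence numbers."""
    seen = set(received_seqs)
    if not seen:
        return []
    first, last = min(seen), max(seen)
    slices = []
    for start in range(first, last + 1, packets_per_window):
        end = min(start + packets_per_window, last + 1)
        expected = end - start
        got = sum(1 for seq in range(start, end) if seq in seen)
        slices.append((expected - got) / expected)
    return slices


# 500 packets (10 seconds of audio at an assumed 20 ms per packet),
# with a single burst of 10 consecutive packets lost.
received = [seq for seq in range(500) if not 200 <= seq < 210]

overall = 1 - len(received) / 500      # loss averaged over the whole call
slices = windowed_loss(received, 10)   # loss per 200 ms slice
worst = max(slices)
```

In this example the call-wide average is 2 percent, while one 200 ms slice shows total loss. The average hides exactly the kind of burst a listener would hear as a dropped syllable.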
RTP in air traffic control
The adoption of IP-based voice infrastructure in air traffic control has brought RTP into one of the most demanding environments it operates in. Modern ATC voice systems, built to the EUROCAE ED-137 standard for IP radio and telephony, use RTP to carry communications between ground stations, voice switches, and ultimately to aircraft via VHF/UHF radio.
In this context, RTP monitoring isn’t just a tool for improving service quality; it’s a component of safety assurance. ATC communication has zero tolerance for the kinds of degradation that might be considered acceptable in commercial telephony. A controller who mishears a clearance, or a pilot who receives a broken transmission, represents a risk that the industry works extremely hard to eliminate.
What makes RTP monitoring in ATC particularly important is the need to catch degradation proactively. By the time a controller reports an audio problem, the event has already happened. Continuous analysis of RTP streams – tracking packet loss, jitter, and latency in real time against defined thresholds – allows operations teams to identify deteriorating conditions before they reach the point of impact.
RTP and RTCP: the monitoring companion
RTP is almost always accompanied by its sibling protocol, RTP Control Protocol (RTCP). While RTP carries the media, RTCP carries periodic control messages between endpoints, including statistics on packet loss, jitter, and round-trip time as seen by each side of the call. RTCP provides a useful high-level view of call health, but it has limitations: it reports averages over intervals, which can mask short bursts of degradation that are nonetheless clearly audible. Comprehensive RTP monitoring goes beyond what RTCP reports, inspecting the packet stream directly to capture the full picture of in-call quality at the resolution that matters.
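For reference, the statistics RTCP carries live in fixed 24-byte report blocks inside sender and receiver reports. The sketch below decodes one such block following the layout in RFC 3550; the names `ReportBlock` and `parse_report_block` are assumptions for illustration, and the surrounding RTCP packet header is not handled.

```python
import struct
from typing import NamedTuple


class ReportBlock(NamedTuple):
    ssrc: int                 # source this report is about
    fraction_lost: float      # loss since the previous report, 0.0 to just under 1.0
    cumulative_lost: int      # total packets lost (signed 24-bit value)
    highest_seq: int          # extended highest sequence number received
    jitter: int               # interarrival jitter in RTP timestamp units
    last_sr: int              # middle 32 bits of the last sender report's NTP time
    delay_since_last_sr: int  # units of 1/65536 seconds


def parse_report_block(block: bytes) -> ReportBlock:
    """Decode one 24-byte RTCP report block (RFC 3550, section 6.4.1)."""
    if len(block) < 24:
        raise ValueError("report block is shorter than 24 bytes")
    ssrc, lost_word, highest_seq, jitter, lsr, dlsr = struct.unpack("!IIIIII", block[:24])
    # Top 8 bits: fraction lost as fixed point with the binary point at the left.
    fraction_lost = (lost_word >> 24) / 256.0
    # Low 24 bits: cumulative loss, sign-extended.
    cumulative = lost_word & 0xFFFFFF
    if cumulative & 0x800000:
        cumulative -= 1 << 24
    return ReportBlock(ssrc, fraction_lost, cumulative, highest_seq, jitter, lsr, dlsr)


# Hypothetical block: a quarter of the packets lost in the last interval, 10 lost overall.
raw = struct.pack("!IIIIII", 0x1234, (64 << 24) | 10, 70000, 40, 0, 0)
report = parse_report_block(raw)
```

Note that `fraction_lost` covers the whole interval since the previous report, which is the averaging behaviour described above.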