How can you take VoIP monitoring to the next level?

Michael Wallbaum

About Voipfuture 

Voipfuture is a voice over IP service monitoring and analytics company, offering the only carrier-grade dual-visibility platform on the market.

The use of VoIP services has exploded over recent years. The rise of remote work and the expansion of the 5G network mean that the industry will only keep growing. In 2021, the global VoIP market was valued at $85.2 billion (USD), with an estimated compound annual growth rate (CAGR) of 3.8%; the industry is projected to reach $102.5 billion (USD) by 2026.

Wider use in a range of new scenarios means providers need accurate VoIP monitoring to ensure the quality of their service. Given the increasingly crowded marketplace, it has never been easier for clients to find a different option and switch CSP (Communication Service Provider). With this in mind, providers are looking for new tools and metrics to improve VoIP monitoring, ensure superior in-call quality, and pursue the best user experience possible.

The gap between signaling and media KPIs

There are established and well-defined IETF standards for SIP (Session Initiation Protocol). Signaling KPIs such as SEER (Session Establishment Effectiveness Ratio) and SRD (Session Request Delay) are good tools for reflecting the quality of establishing and maintaining the call. 

However, there is considerable room for improvement when it comes to media KPIs. 

The difference in quality between media and signaling KPIs is in some sense understandable. The task of assessing the quality of a call’s content (audio) is harder to distill into a small number of convenient KPIs. Many competing factors play a role, all contributing to what the end-user hears.

On the other hand, signaling KPIs only have to reflect whether the call was established and maintained, not the specific media the user received. SIP signaling is also easier to monitor because a) it uses dedicated ports, so one knows where to look, and b) there are far fewer packets to analyze, as signaling accounts for only 1-2% of a call’s bandwidth.

Conventional media KPIs for VoIP monitoring, management, and troubleshooting include:

  • MOS (Mean Opinion Score)
  • Packet Loss
  • Jitter
  • Delay

With these metrics alone, providers struggle to get a full picture of the actual user experience. Instead, the metrics often end up as mere figures quoted in service-level agreements, with little connection to the in-call quality actually delivered.

Real-life examples where this gap becomes a reality

Take MOS, for example. It was originally an empirical, subjective measurement: people were played audio files and asked to rate the quality on a scale of one to five. This process has since been emulated using machines. The ITU (International Telecommunication Union) recommends MOS be determined for 8 to 30-second calls.

But what is the MOS of a one-hour call? 

There is no ITU definition to quantify this. Systems may provide the MOS for an entire call, but what does this truly represent? Is it the minimum for a given media flow direction? Is it an average? What if one direction is shorter than the other?

Other persistent issues with media KPIs include:

  • Codecs – Modern mobile networks utilize packets with different modes and bit rates throughout the call. How is this accounted for during assessment?
  • Packet loss – Quoting a packet loss rate of 1% doesn’t provide enough information to know its actual effect. Losing every hundredth packet is not a problem, but the same number of packets lost in a group can frustrate users and cause them to give up on their call.
  • Aggregating KPIs – How do you combine metrics describing a large number of calls of different durations? As it stands, there is no agreed-upon approach for aggregating and generating KPIs or statistics that take this into account.
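To make the packet loss point concrete, here is a minimal Python sketch comparing two hypothetical streams with the same 1% loss rate. The function name `loss_stats`, the stream length, and the 20 ms packet interval mentioned in the comments are illustrative assumptions, not part of any standard or product.

```python
def loss_stats(received):
    """Return (loss rate, longest run of consecutive lost packets).

    `received` is a list of booleans, one per expected packet,
    where True means the packet arrived.
    """
    total = len(received)
    lost = received.count(False)
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return lost / total, longest


# Two hypothetical streams of 1000 packets, each losing exactly 10 packets.
# Evenly spread: every hundredth packet is lost.
uniform = [i % 100 != 99 for i in range(1000)]
# Grouped: ten packets in a row are lost. At an assumed 20 ms per packet,
# that is a 200 ms gap in the audio.
burst = [not (500 <= i < 510) for i in range(1000)]

print(loss_stats(uniform))  # (0.01, 1)
print(loss_stats(burst))    # (0.01, 10)
```

Both streams report the same loss rate, yet only the second one contains a gap long enough to be clearly audible, which is why a single percentage says little about the user experience.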

Effectively translating user experience into meaningful KPIs for VoIP monitoring is not simple. However, a new approach is showing the way.

New: How does time slicing transform VoIP monitoring?

Time slicing was developed to handle the large number of packets in RTP (Real-time Transport Protocol) monitoring. This approach breaks down a call into more manageable chunks and assesses each on its own merits. This is done by segmenting the RTP media stream (i.e., the packets it consists of) into fixed time slices (e.g., 5 seconds) and producing a Quality Data Record (QDR) for each slice that summarizes performance during that period.
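The segmentation step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Voipfuture's implementation: the `QualityDataRecord` fields, the `slice_stream` function, and the input format (arrival time in milliseconds plus RTP sequence number) are all hypothetical, packets are assumed to arrive in order, and sequence number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class QualityDataRecord:
    """Hypothetical, heavily reduced QDR: one summary per time slice."""
    slice_index: int
    packets_received: int = 0
    packets_lost: int = 0


def slice_stream(packets, slice_ms=5000):
    """Group (arrival_ms, sequence_number) pairs into fixed time slices.

    A gap in the sequence numbers is counted as loss in the slice
    where the next packet arrives.
    """
    records = {}
    prev_seq = None
    for arrival_ms, seq in packets:
        idx = arrival_ms // slice_ms
        rec = records.setdefault(idx, QualityDataRecord(idx))
        rec.packets_received += 1
        if prev_seq is not None and seq > prev_seq + 1:
            rec.packets_lost += seq - prev_seq - 1
        prev_seq = seq
    return [records[i] for i in sorted(records)]


# Hypothetical 12-second stream, one packet every 20 ms,
# with sequence numbers 300 to 304 missing.
packets = [(seq * 20, seq) for seq in range(600) if not 300 <= seq <= 304]

for qdr in slice_stream(packets):
    print(qdr)
```

With this input, the second slice shows 245 received and 5 lost packets, while the first and last slices show no loss. A real QDR would carry many more fields, such as the ones listed in this section.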

RTP stream image
Source: Voipfuture

For VoIP monitoring, every single time slice contains a significant amount of information, such as:

  • Source/recipient of packets
  • Codecs used
  • Packet Loss
  • Jitter
  • Policy conformance
  • And much more

With fixed-length time slices, media KPIs determined for each are directly comparable.

Taking a time slicing approach to in-call quality assessment facilitates two different layers of analysis:

Detailed View

This provides in-call metrics to understand performance and simplify troubleshooting by determining the root causes behind any problems present.

Aggregated View

With the call segmented into fixed time slices, providers can also precisely assess service performance with a bird’s-eye view facilitated by effective QDR aggregation.

Impact of timeslicing image
Source: Voipfuture

Time Slicing in action

Troubleshooting simplified

Analyzing the QDRs that make up an RTP stream allows for patterns affecting in-call quality to be easily identified.

Image of QDRs
Source: Voipfuture

The above example shows the QDRs from an access router. By visualizing temporal details, a clear pattern emerges with periodic packet loss and jitter. In this example, the network management system was polling the router once every minute via SNMP (Simple Network Management Protocol), and the resulting periodic degradation caused recurring user complaints.

A new media KPI for continual MOS assessment

By assessing each QDR individually, time slicing allows MOS to be tracked continuously throughout a call.

Image of MOS
Source: Voipfuture

By setting a MOS threshold, providers can create a new media KPI known as the Good Minute Ratio (GMR) that helps describe in-call quality in greater detail. GMR is simply the ratio of good minutes (minutes exceeding a defined MOS threshold) to the total number of minutes in the call – 97% in the example above (97 good minutes/100 total minutes).

This can be linked to any group of media streams, for example, specific locations or incoming RTP from a single interconnection partner.
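As a rough illustration of the GMR calculation, the sketch below takes one MOS value per minute and counts the minutes above a threshold. The function name `good_minute_ratio` and the threshold of 4.0 are assumptions made for this example. How a per-minute MOS is derived from the underlying time slices is not covered here.

```python
def good_minute_ratio(minute_mos, threshold=4.0):
    """Share of minutes whose MOS exceeds the threshold.

    `minute_mos` holds one MOS value per minute. The default
    threshold of 4.0 is an arbitrary example value.
    """
    if not minute_mos:
        raise ValueError("at least one minute of data is required")
    good = sum(1 for mos in minute_mos if mos > threshold)
    return good / len(minute_mos)


# Hypothetical 100 minutes of media: 97 good minutes and 3 degraded ones.
minutes = [4.2] * 97 + [3.1] * 3
print(good_minute_ratio(minutes))  # 0.97
```

The same function works for a single call or for the pooled minutes of any group of streams, since every input value covers the same amount of time.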

Grouping and aggregation to provide meaningful analysis

While time slicing is the tool, the statistics and improved KPIs it produces are what transform VoIP monitoring. To do this, you need to implement effective grouping and aggregation.

The simplest form of this would be a media trunk that assesses stream quality between two endpoints (IP addresses for sender/receiver) in both directions. A more advanced approach incorporates SIP trunks to correlate the signaling data with the media stream in real-time to provide statistics about both, for different call setups and media directions.
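A simple media trunk grouping could look like the following sketch. It assumes each QDR is reduced to a (source IP, destination IP, MOS) tuple and reports, per direction, the share of time slices above a MOS threshold. The function name `aggregate_by_trunk`, the tuple layout, the threshold, and the addresses (taken from the ranges reserved for documentation) are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_trunk(qdrs, threshold=4.0):
    """Return {(src_ip, dst_ip): share of slices with MOS above threshold}.

    Each direction between two endpoints is kept as a separate key,
    so both media directions can be compared.
    """
    counts = defaultdict(lambda: [0, 0])  # (src, dst) -> [good, total]
    for src, dst, mos in qdrs:
        entry = counts[(src, dst)]
        if mos > threshold:
            entry[0] += 1
        entry[1] += 1
    return {trunk: good / total for trunk, (good, total) in counts.items()}


A, B = "192.0.2.1", "198.51.100.7"  # example endpoints
qdrs = [
    (A, B, 4.3), (A, B, 4.1), (A, B, 2.9), (A, B, 4.2),  # A to B
    (B, A, 4.4), (B, A, 4.4),                            # B to A
]

for trunk, ratio in aggregate_by_trunk(qdrs).items():
    print(trunk, ratio)
```

Because every QDR covers the same fixed duration, the slices from calls of different lengths can be pooled directly without any weighting. Correlating these groups with SIP signaling data, as in the more advanced approach, would require an additional call identifier that this sketch leaves out.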

Source: Voipfuture


Successful VoIP monitoring to improve user experience requires effective signaling AND media KPIs. While existing signaling KPIs adequately reflect how well calls are established and maintained, metrics regarding media streams lack detail and don’t come close to describing the real-life experience they are meant to capture.

Every packet of data transferred using VoIP has a very different context. By segmenting the RTP stream with time slicing, providers can better understand their in-call quality performance and simplify troubleshooting. Plus, time slicing allows for the creation of advanced tools to effectively aggregate data and create new media KPIs that genuinely reflect user experience. This is exactly what we do at Voipfuture.
