Mean Opinion Score (MOS) is a numerical measure of perceived audio quality in voice communications. It expresses, on a scale from 1 to 5, how good a voice call sounds to the person listening: 5 is excellent, indistinguishable from talking to someone in the same room, while 1 is so degraded that communication is effectively impossible. The scores in between represent the range of real-world experience: slight but noticeable impairments, clearly audible degradation, and calls that are difficult to follow but technically still connected.
What makes MOS useful is that it reduces a complex, subjective experience to a single number that can be tracked over time, compared across call paths, and set as a threshold in service level agreements. Rather than relying on complaint rates or the impressions of operations staff, MOS gives teams a consistent, quantifiable reference point for voice quality. It is widely used across the telecoms industry and has become a standard metric wherever the intelligibility of a voice path needs to be measured and managed.
How MOS is determined
The original MOS methodology, defined in ITU-T recommendation P.800, was based on panels of human listeners rating audio samples under controlled conditions. Test subjects would listen to recordings of speech passed through a system under evaluation and score what they heard. Averaged across a sufficiently large group, those scores produced a reliable indication of perceptual quality that correlated well with real-world user experience.
Running subjective listening tests at scale is not practical for live network monitoring, so the industry developed algorithmic models that estimate MOS from measurable signal and network parameters. The two most widely adopted are PESQ (Perceptual Evaluation of Speech Quality, ITU-T P.862) and its successor POLQA (Perceptual Objective Listening Quality Analysis, ITU-T P.863). Both are full-reference models: they compare a clean reference signal against the degraded signal that emerges from the system under test, account for impairments such as noise, distortion, clipping, and codec artefacts, and produce a score that correlates closely with what a human listener would give.
A further approach, used extensively in network monitoring, estimates MOS from network-layer metrics rather than the audio signal itself. Models in this category, including those derived from the ITU-T E-model, calculate a predicted MOS from values such as packet loss, jitter, and codec type. This makes it possible to estimate voice quality continuously from RTP stream data without needing to decode or analyse the audio, which is particularly valuable at scale.
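To make the network-metric approach concrete, the sketch below implements a simplified version of the E-model calculation: packet loss and one-way delay are converted into impairment factors, combined into a transmission rating factor R, and mapped to a MOS estimate. The `ie` and `bpl` codec parameters are illustrative values drawn from published tables (the defaults approximate G.711 with packet loss concealment); a production implementation would follow ITU-T G.107 in full, with many more terms and codec-specific calibration.

```python
def estimate_mos(packet_loss_pct: float, one_way_delay_ms: float,
                 ie: float = 0.0, bpl: float = 25.1) -> float:
    """Simplified E-model (ITU-T G.107 style) MOS estimate.

    ie / bpl are codec-dependent equipment-impairment parameters.
    The defaults here are illustrative, not authoritative.
    """
    # Effective equipment impairment grows with packet loss; bpl
    # expresses the codec's robustness to loss.
    ie_eff = ie + (95.0 - ie) * packet_loss_pct / (packet_loss_pct + bpl)

    # Delay impairment: mild at low one-way delay, then increasingly
    # severe past ~177 ms (piecewise approximation).
    d = one_way_delay_ms
    delay_impairment = 0.024 * d + (0.11 * (d - 177.3) if d > 177.3 else 0.0)

    # Transmission rating factor, starting from the default basic
    # signal-to-noise rating R0 = 93.2.
    r = 93.2 - delay_impairment - ie_eff

    # Map R to MOS using the standard cubic conversion.
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
```

A clean path (no loss, 20 ms delay) scores near the codec's ceiling, while adding a few percent of packet loss pulls the estimate down sharply, which matches the intuition that loss is usually the dominant impairment on IP voice paths.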
MOS in telecom networks
In telecom environments, MOS functions as a key performance indicator for voice quality across interconnects, carrier routes, and customer-facing services. Operations teams use it to monitor the quality of call paths in real time, identify underperforming routes, and investigate the source of quality complaints before they escalate.
The score thresholds that define acceptable quality in telecom vary by context, but a useful rule of thumb is that scores above 4.0 represent good quality, scores in the 3.5 to 4.0 range are acceptable for most services, and anything below 3.5 will generate noticeable dissatisfaction. Scores below 3.0 indicate serious degradation that most users will find difficult to tolerate.
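Those rule-of-thumb bands can be expressed as a small classifier, useful for alerting or dashboard colouring. The cut-offs and labels below mirror the thresholds in the text; they are conventions, not a formal standard, and should be tuned to the service in question.

```python
def quality_band(mos: float) -> str:
    """Map a MOS value to the rule-of-thumb quality bands.

    The thresholds are illustrative conventions, not standardised values.
    """
    if mos > 4.0:
        return "good"
    if mos >= 3.5:
        return "acceptable"
    if mos >= 3.0:
        return "noticeably degraded"
    return "seriously degraded"
```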
MOS does not exist in isolation. It is shaped by the same underlying conditions that define overall voice quality: packet loss lowers it, by an amount that depends on the severity and pattern of the loss; jitter affects it through the compensating behaviour of jitter buffers; codec choice sets a ceiling on what is achievable regardless of how clean the network path is. Monitoring MOS alongside the individual metrics that drive it gives operations teams both a top-level indicator and the diagnostic detail needed to act on it.
One limitation worth understanding is that averaged MOS scores can mask short-lived impairments. A call that is mostly excellent but contains a five-second burst of severe degradation may still average out to a score that looks acceptable. This is why the granularity of measurement matters: tracking MOS at fine time resolution, rather than as a single figure per call, reveals the kind of transient events that listeners notice but averages hide.
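The masking effect is easy to demonstrate numerically. The sketch below compares the whole-call average against the worst short window for a hypothetical stream of per-second MOS samples: a three-minute call that is excellent apart from a five-second severe burst still averages above 4.2, while the worst five-second window sits at 1.5.

```python
def windowed_minimum(per_second_mos: list[float], window: int = 5):
    """Return (call_average, worst_window_average) for a stream of
    per-second MOS samples. A short severe burst barely moves the
    call average but dominates the worst window.
    """
    avg = sum(per_second_mos) / len(per_second_mos)
    worst = min(
        sum(per_second_mos[i:i + window]) / window
        for i in range(len(per_second_mos) - window + 1)
    )
    return avg, worst

# Hypothetical 180-second call: excellent throughout except a
# 5-second burst of severe degradation.
samples = [4.3] * 180
samples[60:65] = [1.5] * 5
avg, worst = windowed_minimum(samples)
# avg ≈ 4.22 (looks acceptable); worst 5-s window = 1.5 (an audible failure)
```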
MOS in air traffic control
In air traffic control, MOS takes on a significance that goes beyond the service quality concerns of commercial telephony. ATC voice communication is a safety-critical system, and the intelligibility of every exchange between a controller and a pilot is part of the safety case for the operation of controlled airspace. A MOS score that would be considered merely suboptimal in a consumer context represents a genuine operational risk in an ATC environment.
The technical standards that govern IP-based ATC voice systems, in particular EUROCAE ED-137 for voice over IP in aeronautical communications, set requirements for audio quality that translate directly into MOS thresholds. These are not aspirational targets; they are defined minimum levels that must be maintained across every operational voice path, at all times. Compliance is not a one-time certification exercise but an ongoing operational obligation.
This is where continuous voice quality monitoring becomes essential. Tracking MOS in real time across all active voice paths allows operations teams to identify when quality is approaching a threshold before it crosses one. A channel that shows a gradual decline in MOS over several hours is giving an early warning of something changing in the network or radio environment. Catching that pattern proactively, rather than waiting for a controller to report a problem, is the difference between a maintenance action and a safety event.
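One simple way to turn a gradual decline into an actionable signal is to fit a least-squares slope to recent periodic MOS readings and flag sustained negative trends before any absolute threshold is crossed. The sketch below is a minimal illustration with made-up hourly readings; real monitoring would work over many more samples and apply noise filtering.

```python
def mos_trend(samples: list[float]) -> float:
    """Least-squares slope of a sequence of periodic MOS readings,
    in MOS units per sample interval. A sustained negative slope is
    an early warning even while absolute MOS remains above threshold.
    """
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Hypothetical hourly readings drifting downward while still "acceptable":
readings = [4.2, 4.15, 4.1, 4.0, 3.95, 3.9]
slope = mos_trend(readings)  # ≈ -0.063 MOS per hour: flag for investigation
```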
The challenge specific to ATC is scale combined with the requirement for continuous availability. A facility may have dozens or hundreds of active frequencies, each carrying its own RTP stream, each subject to its own network path and radio conditions. Monitoring MOS meaningfully across that environment requires automated analysis at a level of granularity that reflects how ATC voice actually behaves, including the ability to detect impairments that last only seconds but are nonetheless clearly audible to a controller or pilot.
MOS and its practical limits
MOS is a valuable and widely trusted metric, but it is worth understanding what it does not capture. It measures perceptual quality at a point in time and from the perspective of the listener, but it does not directly reflect whether an instruction was understood correctly. A call can score reasonably well on MOS and still contain a moment of degradation severe enough to cause a mishearing. This is why MOS works best as one component of a broader quality assurance framework, alongside metrics that capture the specific impairments that drive degradation rather than just their combined perceptual effect.
It is also worth noting that different MOS estimation methods do not always produce identical scores for the same audio. PESQ, POLQA, and network-model-based estimates each have their own calibration and their own strengths. Understanding which method a monitoring system uses, and what assumptions it makes, is important context for interpreting the numbers it produces.