
Measure User Experience with Mean Opinion Score

Assess user experience quantitatively with Mean Opinion Score (MOS), a standardized metric that evaluates user satisfaction and perception of quality.

Mean Opinion Score or MOS: What it Means and Why it Matters

June 14, 2024
by Rohith Ramesh

Introduction

Are you tired of dropped calls or straining to hear clearly during important conversations? If so, you're not alone. The quality of your VoIP calls can significantly impact your communication experience. This is where the Mean Opinion Score, or MOS, comes into play.

MOS score is a crucial metric that helps you measure and improve the quality of your VoIP calls. It considers various factors, such as packet loss, latency, and jitter, which can affect call clarity and consistency.

This blog explores how MOS can help you assess and improve VoIP quality.

Understanding Mean Opinion Score

Mean Opinion Score (MOS) is a crucial metric in assessing the quality of voice and video sessions in telecommunications. It quantifies the human-judged overall quality of an event or experience, typically rated on a scale from 1 (bad) to 5 (excellent).

Initially, MOS was determined through surveys conducted by expert observers. However, modern methods often use objective measurement methods to approximate human rankings, producing a MOS that reflects the average of various human-scored parameters.

Defining a Good MOS Score

MOS scores range from 1.0 to 5.0, with higher scores indicating better sound quality. A score of 3.5 suggests that about half of the users have experienced subpar voice quality. Most VoIP calls fall within the 3.5 to 4.2 range.

Achieving a perfect score of 5.0 is rare because people seldom give perfect ratings. A MOS score of 4.3, indicating excellent voice quality, is a realistic and desirable target.

However, it's important to note that over 50% of remote workers still report audio quality as a significant issue during conference calls.


Applications of Mean Opinion Score

Mean Opinion Score can be applied wherever human subjective experience is valuable. It is often used to evaluate digital approximations of real-world phenomena.

Key areas where MOS is commonly used include:

  • Static image compression (e.g., JPG, GIF)
  • Audio codecs (e.g., MP3, Vorbis, AAC, Opus)
  • Video codecs (e.g., H.264, VP8)

MOS is also frequently employed in streaming sessions where network issues affect communication quality.

Determining Mean Opinion Score 

Originally, MOS was a subjective measurement reflecting listeners' perceptions of voice quality and clarity. It involved listeners scoring calls in a controlled environment meeting specific size and noise criteria.

In VoIP, MOS measurements are more objective, providing a quality measure of the network. ITU-T Recommendation P.862, known as PESQ (Perceptual Evaluation of Speech Quality), defines an objective method for estimating MOS on VoIP networks. This algorithm-based method uses real voice samples as test signals to model subjective tests.

To ensure accuracy, tests of modern telecom equipment must use speech-like signals, because other test signals can produce unpredictable or unreliable results.

Measuring Mean Opinion Score 

Call quality can be assessed for MOS in various ways, including algorithms that predict MOS scores, which are common in VoIP networks. Human assessment remains the most effective method, but it is not always practical for larger networks. The final MOS score is the average of all participants' scores and falls within the range of 1 to 5. A score of 5 signifies excellent call quality, while a score of 1 denotes sound that is indecipherable.
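The averaging step can be sketched in a few lines of Python. This is a minimal illustration, assuming the 1 to 5 rating scale used elsewhere in this article. The function names and the panel of ratings are hypothetical, and the confidence interval uses a simple normal approximation rather than any method prescribed by a standard.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Average a list of 1-5 opinion ratings into a single MOS."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)


def confidence_half_width_95(ratings):
    """Approximate 95% confidence half-width (normal approximation)."""
    if len(ratings) < 2:
        return float("nan")
    return 1.96 * stdev(ratings) / sqrt(len(ratings))


# Hypothetical panel of eight listeners rating the same call.
panel = [4, 5, 3, 4, 4, 2, 5, 4]
print(f"MOS = {mean_opinion_score(panel):.2f} "
      f"+/- {confidence_half_width_95(panel):.2f}")
```

Reporting the interval alongside the mean shows how much the score could shift with a different panel, which matters when only a few listeners are available.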

Modern tests often use algorithms focusing on factors like response time, codec speed, and other metrics to predict how voice quality would be perceived. Real voice signals test clarity, delay, packet loss, and jitter, estimating a MOS score. While this is an estimate compared to human-based MOS, it is more practical, scalable, and quantifiable.

After data collection, calculations for the R-Factor (Rating-Factor) are performed. R-Factor metrics account for factors that can degrade call quality beyond network errors, such as propagation delay, packetization delay, and jitter buffer.
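As an illustration of how an R-Factor relates to MOS, the sketch below applies the R-to-MOS conversion published in the ITU-T G.107 E-model. The article does not say which conversion a given tool uses, so treat this as one common mapping rather than the method used by any particular product. The function name is hypothetical.

```python
def r_factor_to_mos(r):
    """Convert an E-model R-Factor (0-100) to an estimated MOS.

    Uses the conversion from ITU-T G.107:
    MOS = 1 + 0.035*R + R*(R - 60)*(100 - R)*7e-6 for 0 < R < 100.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R, so clamp it.
    return max(1.0, mos)


# 93.2 is the E-model's default R value with no impairments.
print(f"{r_factor_to_mos(93.2):.2f}")  # prints 4.41
print(f"{r_factor_to_mos(70):.2f}")
```

An R-Factor of 93.2 maps to a MOS of about 4.4, which is why estimated scores for narrowband calls top out well below 5.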


Metrics Considered in Mean Opinion Score 

MOS evaluates three main metrics:

  • Listening Quality
  • Transmission Quality
  • Conversational Quality

Mean opinion score testing can assess all three metrics simultaneously or focus on one aspect at a time. Participants in controlled samples rate each metric individually during testing. Measurements are collected to determine factors like latency or one-way delay, aiming to identify areas for network improvement.

In automated tests, algorithms are employed to evaluate voice signals, predict how humans perceive quality, and estimate a MOS score.

Factors Contributing to a Low Mean Opinion Score in Video and Voice Calls

Various elements along the communication chain, from sender to receiver, can lower mean opinion scores. Human factors such as a participant's health, as well as problems with audio/video equipment and computer settings, can degrade communication quality. However, network-related problems are often the most noticeable and quantifiable in these calls. Factors like jitter, latency, and packet loss can be measured numerically and directly impact the perceived quality of the call.
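To show what measuring these factors numerically can look like, here is a minimal sketch that computes jitter with the running estimator described in RFC 3550 and packet loss from sequence numbers. The function names and sample timestamps are hypothetical. It works in milliseconds rather than RTP timestamp units and ignores sequence-number wraparound.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, in milliseconds."""
    jitter = 0.0
    previous_transit = None
    for sent, received in zip(send_times_ms, recv_times_ms):
        transit = received - sent
        if previous_transit is not None:
            delta = abs(transit - previous_transit)
            # Exponential smoothing with gain 1/16, as in RFC 3550.
            jitter += (delta - jitter) / 16
        previous_transit = transit
    return jitter


def packet_loss_percent(received_sequence_numbers, expected_count):
    """Percentage of expected packets that never arrived."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    lost = expected_count - len(set(received_sequence_numbers))
    return 100.0 * lost / expected_count


# Hypothetical capture: packets sent every 20 ms, the last one delayed.
sent = [0, 20, 40]
received = [50, 70, 106]
print(interarrival_jitter(sent, received))      # prints 1.0
print(packet_loss_percent([1, 2, 3, 5], 5))     # prints 20.0
```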


Factors Influencing VoIP MOS Test Scores

VoIP MOS scores are relative: they depend on the many factors that affect voice quality. Unlike traditional phone lines, VoIP systems are susceptible to unique elements affecting MOS scores, including:

  • Hardware
  • Bandwidth
  • Jitter
  • Latency
  • Packet Loss
  • Codec Version

The Codec version plays a significant role, as compression ratios can notably affect voice quality. Non-compressed codecs offer superior voice quality, minimizing susceptibility to audio quality loss.

While compression systems can conserve bandwidth, there's a trade-off with voice quality. Optimal VoIP codecs balance bandwidth conservation with minimal voice quality degradation. Recent codecs supporting HD-Voice can enhance quality but may sacrifice bandwidth efficiency.
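The bandwidth side of this trade-off can be worked out with simple arithmetic. The sketch below compares G.711 (64 kbps, uncompressed) with G.729 (8 kbps, compressed), assuming 20 ms packetization and 40 bytes of IPv4, UDP, and RTP headers per packet. G.729, the packet interval, and the function name are illustrative assumptions, and link-layer overhead is not counted.

```python
IP_UDP_RTP_HEADER_BYTES = 40  # 20 (IPv4) + 8 (UDP) + 12 (RTP)


def bandwidth_kbps(codec_bitrate_kbps, packet_interval_ms=20):
    """Per-direction IP bandwidth for one call, in kbps."""
    # Bytes of encoded audio carried in each packet.
    payload_bytes = codec_bitrate_kbps * 1000 / 8 * packet_interval_ms / 1000
    packets_per_second = 1000 / packet_interval_ms
    total_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return total_bytes * 8 * packets_per_second / 1000


print(bandwidth_kbps(64))  # G.711: prints 80.0
print(bandwidth_kbps(8))   # G.729: prints 24.0
```

Under these assumptions the compressed codec needs less than a third of the bandwidth, and the saving comes at the cost of some voice quality, as described above.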

MOS is a critical resource for VoIP providers and clients, ensuring high-quality service and providing insights for enhancing voice quality. Whether subjective or objective, MOS measures voice clarity and guides improvements in VoIP call quality.

Relationship Between MOS Score and VoIP Call Quality

Originally developed to assess the quality of service (QoS) for traditional voice calls, MOS has been adapted for Voice over IP (VoIP) calls. The International Telecommunication Union (ITU-T) has standardized MOS scores for VoIP, providing guidelines for calculating them based on factors such as the codec used.

Different VoIP codecs work in distinct ways: some are uncompressed and prioritize quality, while others compress audio to conserve bandwidth. These variations impact the overall VoIP call quality and are reflected in the calculated MOS scores.

Assessing VoIP Performance Using MOS

1. Overview of MOS Measurement

MOS scores, ranging from 1 (poor) to 5 (excellent), gauge the perceived quality of voice calls. Initially derived from expert surveys, modern MOS scores often employ Objective Measurement Methods.

ITU-T Recommendation P.862 (PESQ) standardizes objective MOS estimation for VoIP, and the achievable score depends on variables like codec choice. For instance, the G.711 codec, widely used in VoIP, can achieve a maximum MOS of 4.4.

2. VoIP MOS Testing Process

During MOS testing, participants rate voice quality on a 1 to 5 scale based on pre-recorded samples or live calls. Factors considered include clarity, loudness, delay, jitter, packet loss, and background noise.

Averaging participants' ratings generates an overall MOS score, categorizing call quality from 1 (unacceptable) to 5 (excellent).

3. Importance of VoIP MOS Testing

Mean Opinion Score testing provides actionable insights for VoIP service providers and network administrators. By understanding user perceptions, providers can address call quality issues and ensure a satisfactory user experience.

Regular Mean Opinion Score testing allows providers to optimize VoIP networks, delivering consistent, high-quality voice communication.

Leveraging Network Monitoring for MOS Score Assessment

In addition to Device Monitoring, employing a comprehensive end-to-end Network Monitoring tool enables a holistic understanding of VoIP Quality from the end-user perspective.

Synthetic traffic monitoring every 500ms, facilitated by tools like HeadSpin's Network Monitoring solution, enhances proactive network issue detection without requiring packet capture.

This approach facilitates a thorough network assessment, helping to:

  1. Identify Network Problems: Pinpoint issues such as packet loss, jitter, and bandwidth constraints that can impact VoIP performance.
  2. Determine Problem Locations: Locate VoIP issues within the network infrastructure across various locations, guiding troubleshooting efforts effectively.
  3. Identify Responsible Parties: Assign responsibility for addressing VoIP Quality issues to relevant stakeholders, whether users, applications, network administrators, or ISPs.
  4. Develop Solutions: Armed with comprehensive data, devise efficient solutions to resolve VoIP Quality issues promptly and effectively.

By leveraging network monitoring tools and employing a systematic approach to problem-solving, businesses can ensure optimal VoIP performance and enhance user experience.


How HeadSpin Effectively Evaluates MOS Performance

HeadSpin’s Waterfall UI

Video content is significant in various mobile applications, from live-streaming events to interactive gaming experiences. Ensuring a smooth and high-quality video experience is crucial for enhancing user satisfaction.


Performance Session Link:
https://ui.headspin.io/sessions/de8d3768-c57d-11e9-bcde-f01898ea5299/waterfall

HeadSpin's AI Engine MOS Score

HeadSpin offers a sophisticated AI Engine that generates Mean Opinion Score (MOS) time series for videos captured directly on the HeadSpin Platform or supplied through an API. This algorithm estimates the MOS for each frame in the video, ranging from 1 (Very Poor) to 5 (Excellent) quality.

MOS Score Range

  • MOS 1: Very Poor
  • MOS 2: Poor
  • MOS 3: Fair
  • MOS 4: Good
  • MOS 5: Excellent
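A per-frame MOS time series is easier to act on once it is summarized. The sketch below is a hypothetical post-processing step, not HeadSpin's API: it assumes the scores have already been exported as a plain list of floats, maps each score to the nearest label in the range above, and reports how much of the video falls below a chosen threshold.

```python
from math import floor
from statistics import mean

LABELS = {1: "Very Poor", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def label_for(score):
    """Map a MOS value to the nearest label, clamped to the 1-5 range."""
    nearest = floor(score + 0.5)  # round half up
    return LABELS[min(5, max(1, nearest))]


def summarize(series, threshold=3.0):
    """Summarize a per-frame MOS series (threshold is an assumed cutoff)."""
    below = [score for score in series if score < threshold]
    return {
        "mean": mean(series),
        "min": min(series),
        "share_below_threshold": len(below) / len(series),
    }


# Hypothetical per-frame scores for a short clip.
frames = [4.2, 4.0, 3.6, 2.1, 1.8, 3.9]
print(label_for(3.54))
print(summarize(frames))
```

The share of frames below the threshold is often more telling than the mean, since a brief stretch of severe blockiness can ruin a viewing session without moving the average much.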

iPhone X Video Examples

1. MOS Score: 3.54 [YouTube Video]


Comments: The Chinese characters and the QR code could be sharper, but there are no visible blockiness issues.

2. MOS Score: 2 [NBA Video with Blur and Blockiness]


Comments: Noticeable blockiness affects the player on the video screen capture, while the scoreboard appears blurry.

3. MOS Score: 0.57 [YouTube Video with Severe Blockiness]


Comments: Severe blockiness renders the face of the person indistinguishable, impacting overall video clarity.

Summing Up

Various real-world factors, including device and OS specifications, network bandwidth, latency, jitter, packet losses, CDN misconfiguration, and mobile application client issues, influence the delivery of video content through mobile applications.

HeadSpin's platform offers a comprehensive solution for evaluating video Mean Opinion Score (MOS) and capturing network performance. By leveraging these tools, developers and QA teams can identify and address issues promptly, enhancing the mobile video viewing experience and ensuring customer satisfaction.


FAQs

Q1. What role does the content delivery network (CDN) play in video streaming?

Ans: The content delivery network (CDN) is essential for video streaming as it brings content closer to viewers than the origin server. This proximity reduces round-trip time (RTT), enhancing streaming efficiency. Moreover, utilizing a CDN minimizes the risk of bandwidth-related delays, ensuring smoother live streams for viewers.

Q2. What methods are used for video quality testing?

Ans: Video quality testing employs various approaches:

  1. Frame-Level Analysis: Utilized for Set-Top Boxes (STBs), it yields specific Key Performance Indicators (KPIs) regardless of codecs or protocols. Leveraging the Media Processing Unit (MPU), it records and processes live or on-demand video using a non-referenced model.
  2. IP-Level Analysis: KPI generation relies on bitstream analysis, compatible with encrypted and unencrypted transmission.
  3. Application-Level Analysis: This method involves high-level KPIs and user-defined scripts, offering insights into the end-user experience provided by a video application.