Learning Based Objective QoE Models Across Interactive and Immersive Media
HAMIDI, MOHAMMADALI
2026
Abstract
Delivering high-quality multimedia experiences at scale requires objective Quality of Experience (QoE) models that reliably approximate human perception across heterogeneous content, devices, and network conditions. Subjective user studies remain the reference standard for assessing perceived quality, but they are costly, time-consuming, and impractical for continuous or large-scale monitoring. Consequently, service providers and system designers increasingly rely on objective, learning-based models to predict and manage QoE in real time. Yet existing approaches often fail to generalize across content types, devices, and operating conditions, leaving a persistent gap between measurable system parameters and what users actually experience. This dissertation addresses the general problem of defining, training, and deploying learning-based QoE models. It develops a two-layer conceptual framework encompassing both service-level QoE, which models user experience as a function of network and application behavior, and content-level QoE, which assesses the perceptual quality of the visual media itself. Across these complementary layers, the thesis demonstrates how data-driven methods can extract task-relevant features, learn perceptual dependencies, and achieve resource-efficient deployment. Together, the constituent studies outline a methodological paradigm for learning-based QoE estimation applicable to interactive and immersive media services across diverse multimedia contexts, including conversational, streaming, immersive, and biometric applications. The proposed methodology demonstrates how QoE modeling principles can be adapted to different media types, learning paradigms, and deployment settings. Each study corresponds to a peer-reviewed publication; collectively they form a systematic progression of research toward a consistent methodology for perception-aware QoE prediction.
The proposed framework is built around five pillars: (i) signal design, identifying task-relevant cues from application, network, and content while removing noise and redundancy; (ii) temporal modeling, capturing sequential dependencies and dynamic effects that shape perceived quality over time; (iii) multi-view fusion, enabling learning by integrating heterogeneous or partially shared data sources; (iv) multi-projection fusion, enabling learning from complementary feature spaces, including multi-projection representations for 3D volumetric media; and (v) computational efficiency, ensuring the resulting models are lightweight, scalable, and suitable for real-time or edge deployment. Using this framework, the dissertation develops several specialized models: an application-telemetry-driven QoE predictor for real-time audiovisual Web Real-Time Communication (WebRTC) conversations; a transformer-based estimator for adaptive video streaming that encodes start-up delays, quality switches, and stalls; a collaborative multi-view learning approach that supports privacy-preserving QoE modeling under distributed data constraints; and a no-reference point cloud quality assessment (NR-PCQA) model that employs multi-projection features and adaptive view weighting to evaluate volumetric content without pristine references. A final case study on face image quality assessment (FIQA) demonstrates the adaptability of these design principles, highlighting the generality and deployability of the framework. Across these chapters, the models show strong correlation with subjective judgments, robustness to variations in content and devices, and favorable computational profiles for real-time operation. They establish consistent evaluation protocols and feature analysis procedures that promote interpretability, reproducibility, and cross-context reliability.
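To make the service-level pipeline concrete, the following is a minimal illustrative sketch, not the thesis's actual models: it aggregates streaming-session telemetry events (start-up delay, stalls, quality switches) into session-level features and maps them to a mean-opinion-score estimate with a toy linear rule. The event schema, feature names, and weights are hypothetical placeholders; in the dissertation the mapping is learned from subjective ratings rather than hand-set.

```python
# Hypothetical sketch of a telemetry-driven QoE estimator.
# Event schema, feature names, and weights are illustrative only.

def extract_features(events):
    """Aggregate a list of playback events into session-level features.

    Each event is a dict such as {"type": "stall", "duration": 2.0},
    {"type": "startup", "duration": 1.0}, or {"type": "switch", ...}.
    """
    stall_time = sum(e["duration"] for e in events if e["type"] == "stall")
    n_stalls = sum(1 for e in events if e["type"] == "stall")
    n_switches = sum(1 for e in events if e["type"] == "switch")
    startup = next((e["duration"] for e in events if e["type"] == "startup"), 0.0)
    return {"stall_time": stall_time, "n_stalls": n_stalls,
            "n_switches": n_switches, "startup": startup}

def predict_mos(features):
    """Map features to a 1-5 quality estimate with a toy linear model;
    in practice the weights would be learned from subjective scores."""
    score = (5.0
             - 0.5 * features["stall_time"]   # long stalls hurt most
             - 0.3 * features["n_stalls"]     # each interruption penalized
             - 0.1 * features["n_switches"]   # quality switching penalty
             - 0.2 * features["startup"])     # start-up delay penalty
    return max(1.0, min(5.0, score))          # clamp to the MOS scale

session = [{"type": "startup", "duration": 1.0},
           {"type": "stall", "duration": 2.0},
           {"type": "switch", "from_kbps": 3000, "to_kbps": 1500}]
mos = predict_mos(extract_features(session))  # 5 - 1.0 - 0.3 - 0.1 - 0.2 = 3.4
```

The same feature-then-regress pattern underlies the learned models in the thesis; temporal modeling replaces the hand-set linear rule with sequence-aware architectures such as transformers.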
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/356189
URN:NBN:IT:UNICA-356189