Soft Sensors and Machine Learning for Automatic Water Quality Assessment

CARDIA, MARCO
2025

Abstract

Water quality monitoring is critical to environmental protection, industrial safety, and public health. Traditional laboratory-based methods, though accurate, are often slow, expensive, and labor-intensive, making them unsuitable for real-time decision-making in rapidly changing environments. This thesis presents a novel methodology for water quality assessment that combines Ultraviolet-Visible (UV-Vis) spectroscopy with machine learning to develop soft sensing systems for real-time monitoring. The research focuses on two main applications: industrial wastewater from the highly polluting leather industry, and drinking water, where ensuring safety and regulatory compliance is fundamental. In the industrial context, the aim is to predict key water quality indicators such as Chemical Oxygen Demand (COD), Total Suspended Solids (TSS), and chlorides in real time. In the drinking water context, the focus is on parameters such as Total Organic Carbon (TOC), volatile organic compounds, metals, anions, cations, and microbiological parameters. Key innovations of this research include robust preprocessing techniques that enhance data integrity and optimize model performance, and feature extraction methods that combine statistical measures, peak-based features, slope-based features, and Area Under the Curve (AUC) calculations to capture meaningful spectral information. The core of the work is the development of soft sensors that integrate UV-Vis spectroscopic data with machine learning models. A significant challenge addressed by this research is the limited availability of high-quality training data, particularly in highly polluted industrial environments. This issue is tackled using Conditional Generative Adversarial Networks (CGANs) for data augmentation. The results show significant improvements in predictive performance when synthetic data are used, demonstrating the potential of CGANs to supplement real datasets effectively. Furthermore, the research develops time series prediction models based on one-dimensional Convolutional Neural Networks (1D-CNNs) and Echo State Networks (ESNs) to forecast water quality indicators, enabling proactive monitoring. To promote transparency and stakeholder adoption, techniques such as random forest feature importance and SHapley Additive exPlanations (SHAP) are employed to improve the interpretability of the machine learning models, providing insight into which spectral features are most important for predicting specific water quality parameters. Overall, this research confirms the viability of soft sensing technologies combined with machine learning for automated, real-time water quality monitoring.
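As an illustration of the feature families named in the abstract (statistical measures, peak-based features, slope-based features, and AUC), the following is a minimal sketch of how such features might be computed from a single UV-Vis spectrum. It is not the thesis's implementation: the function name `extract_spectral_features`, the choice of features, and the synthetic Gaussian spectrum are all assumptions made for this example.

```python
import numpy as np


def extract_spectral_features(wavelengths, absorbance):
    """Summarise one spectrum with a few hand-crafted features (hypothetical helper)."""
    wl = np.asarray(wavelengths, dtype=float)
    ab = np.asarray(absorbance, dtype=float)

    # Peak-based features: location and height of the global maximum.
    peak_idx = int(np.argmax(ab))

    # Slope-based features: first derivative of absorbance w.r.t. wavelength.
    slopes = np.gradient(ab, wl)

    # Area under the curve via the trapezoidal rule, written out explicitly.
    auc = float(np.sum((ab[1:] + ab[:-1]) * np.diff(wl)) / 2.0)

    return {
        "mean": float(ab.mean()),            # statistical measures
        "std": float(ab.std()),
        "peak_wavelength": float(wl[peak_idx]),
        "peak_absorbance": float(ab[peak_idx]),
        "mean_slope": float(slopes.mean()),
        "max_abs_slope": float(np.abs(slopes).max()),
        "auc": auc,
    }


# Synthetic spectrum (assumed for illustration): one Gaussian band centred at 280 nm.
wavelengths = np.linspace(200.0, 700.0, 501)
absorbance = np.exp(-((wavelengths - 280.0) / 25.0) ** 2)

features = extract_spectral_features(wavelengths, absorbance)
print(features)
```

A feature vector of this kind could then serve as input to a regression model that predicts an indicator such as COD or TOC; which model and which features the thesis actually uses is not stated in the abstract.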
30 March 2025
Italian
Artificial Intelligence
Data Augmentation
Generative Adversarial Network
Machine Learning
Soft Sensing
Spectroscopy
UV-Vis spectroscopy
Wastewater Analysis
Water Quality Monitoring
Chessa, Stefano
Micheli, Alessio
Gambineri, Francesca
Files in this item:
phd_thesis_def_etd.pdf (open access): 5.72 MB, Adobe PDF
report_activities.pdf (not available): 46.17 kB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/216557
The NBN code of this thesis is URN:NBN:IT:UNIPI-216557