Exploring the Potential of Multimodal (Deep) Learning

TORTORA, MATTEO
2024

Abstract

Multimodal learning involves integrating heterogeneous data from multiple sources, all acquired by observing the same phenomenon. By leveraging these multimodal data sources, we can potentially derive a more comprehensive and enriched representation that enhances the robustness and performance of AI algorithms compared to using a single modality. Although a formal proof is still lacking, this intuitive approach has yielded promising results in numerous applications. Hence, in this doctoral thesis, we investigate the potential of multimodal learning by examining real-world applications in three diverse sectors: healthcare, well-being, and energy. First, we present a multimodal approach for predicting treatment outcomes in patients with non-small-cell lung cancer undergoing radiotherapy; the approach integrates abstract representations computed from radiological images, whole-slide images, and tabular data. Second, we introduce a method for early stress detection that combines multimodal deep learning and deep reinforcement learning (DRL), operating on time series of physiological data collected by a wearable device. Third, we focus on smart energy communities, introducing a novel multimodal architecture for forecasting photovoltaic (PV) power generation; it combines data-driven learning with physical knowledge of PV power generation, feeding the architecture with historical PV data together with historical and forecast weather data. A global take-home message arises from these three real-world applications: multimodal learning can improve outcomes in various critical fields thanks to its inherent ability to merge data representations from different sources and to learn decision boundaries in the resulting abstract spaces, using both traditional machine learning and deep learning algorithms. Specifically, in predicting radiotherapy outcomes within a multimodal framework, the concurrent fusion of the three modalities significantly enhances performance compared to using each data flow in isolation. In the well-being application, the synergy between the dynamic nature of the DRL paradigm and the multimodal data collected by wearable devices yields strong results in early stress detection. Lastly, the integration of historical PV data with historical and forecast weather data enables our model to consistently outperform the current state of the art, even under challenging weather conditions.
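To make the fusion idea concrete, the following is a minimal late-fusion sketch in PyTorch: three modality-specific encoders project radiological images, whole-slide image patches, and tabular data into a shared abstract space, and their concatenated embeddings feed a joint classification head. The encoder designs, input sizes, and dimensions are illustrative assumptions only, not the architecture actually developed in the thesis.

```python
# Illustrative late-fusion sketch (assumed design, not the thesis's network):
# three encoders map their modality into a shared embedding space; the
# concatenated embeddings feed a joint classification head.
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    def __init__(self, tab_dim=32, embed_dim=128, n_classes=2):
        super().__init__()
        # Radiological image encoder (tiny CNN stand-in)
        self.ct_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Whole-slide image patch encoder (tiny CNN stand-in)
        self.wsi_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Tabular (clinical) data encoder
        self.tab_encoder = nn.Sequential(
            nn.Linear(tab_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )
        # Joint head over the concatenated abstract representations
        self.head = nn.Sequential(
            nn.Linear(3 * embed_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, ct, wsi, tab):
        z = torch.cat([self.ct_encoder(ct),
                       self.wsi_encoder(wsi),
                       self.tab_encoder(tab)], dim=1)
        return self.head(z)

# Usage with random stand-in tensors (batch of 4)
model = LateFusionNet()
logits = model(torch.randn(4, 1, 64, 64),   # radiological images
               torch.randn(4, 3, 64, 64),   # WSI patches
               torch.randn(4, 32))          # tabular features
print(logits.shape)  # torch.Size([4, 2])
```

A late-fusion design of this kind keeps each encoder tailored to its own modality while letting the head learn decision boundaries in the joint abstract space, which is the mechanism the take-home message above points to.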
29 January 2024
English
SODA, PAOLO
IANNELLO, GIULIO
Università Campus Bio-Medico
Files in this item:
final_manuscript_mtortora.pdf (open access, 3.46 MB, Adobe PDF)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/122849
The NBN code of this thesis is URN:NBN:IT:UNICAMPUS-122849