Exploring the Potential of Multimodal (Deep) Learning
TORTORA, MATTEO
2024
Abstract
Multimodal learning involves integrating heterogeneous data from multiple sources, all gained from observing the same phenomenon. By leveraging these data sources, we can potentially derive a more comprehensive and enriched representation, enhancing the robustness and performance of AI algorithms compared to the use of a single modality. While formal proof is lacking, this intuitive approach has yielded promising results in numerous applications. Hence, in this doctoral thesis, we aim to investigate the potential of multimodal learning by examining its real-world applications in three diverse industries: healthcare, well-being, and energy. In this respect, we first present a multimodal approach for predicting treatment outcomes in patients with non-small-cell lung cancer undergoing radiotherapy; the approach integrates abstract representations computed from radiological images, whole-slide images, and tabular data. Second, we introduce an innovative method for early stress detection that combines multimodal deep learning and deep reinforcement learning (DRL), operating on time-series physiological data collected by a wearable device. As a third industry, we focus on smart energy communities, introducing a novel multimodal architecture for forecasting photovoltaic (PV) power generation. It combines the artificial intelligence paradigm with physical knowledge of PV power generation, feeding the architecture with historical PV data and with historical and forecast weather data. A global take-home message emerges from these three real-world applications: multimodal learning has the potential to improve outcomes in various critical fields, thanks to its inherent ability to merge data representations from different sources and to learn decision boundaries in these abstract spaces using both traditional machine learning and deep learning algorithms. Specifically, in the context of predicting radiotherapy outcomes with a multimodal framework, the concurrent fusion of the three modalities significantly enhances performance compared to any standalone data flow. In the well-being application, the synergy between the dynamic nature of the DRL paradigm and multimodal data collected from wearable devices yields impressive results in early stress detection. Lastly, the integration of historical PV data with historical and forecast weather data enables our model to consistently outperform the current state of the art, even in challenging weather conditions.
File | Size | Format
---|---|---
final_manuscript_mtortora.pdf (open access) | 3.46 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/122849
URN:NBN:IT:UNICAMPUS-122849