ADAPTIVE FRAMEWORKS FOR KNOWLEDGE EXTRACTION IN HETEROGENEOUS DATA ENVIRONMENTS
PICASCIA, SERGIO
2025
Abstract
The proliferation of unstructured, multimodal data poses a significant challenge for knowledge extraction, owing to the heterogeneity of such data and the difficulty of extracting meaningful patterns from environments that mix diverse data types. This thesis proposes SHIFT, the first seed-guided hierarchical topic modelling framework designed specifically for heterogeneous data environments. SHIFT combines unsupervised information extraction with representation learning techniques and incorporates external knowledge bases to enhance semantic understanding. Its modular architecture allows it to adapt to different data modalities and domain requirements. The framework’s adaptability is demonstrated through applications in distinct domains: legal text analysis, scientific literature understanding, and digital humanities research. Beyond the core SHIFT framework, the thesis presents the development and application of complementary approaches tailored to domain-specific challenges. An extensive evaluation validates the technical effectiveness and practical utility of these frameworks for real-world knowledge extraction tasks. This work advances multimodal topic modelling and demonstrates the importance of adaptive, modular approaches for handling the complexity and diversity of contemporary unstructured data across academic and professional contexts.

| File | Size | Format | Access | License |
|---|---|---|---|---|
| phd_unimi_R13753.pdf | 6.08 MB | Adobe PDF | Open access | All rights reserved |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/353113
URN:NBN:IT:UNIMI-353113