A MACHINE LEARNING ENHANCED FRAMEWORK FOR NAVIGATING THE DIGITAL SPACE: RESULTS FROM THE CASE OF FINFLUENCERS

De Matteo, Francesco

This thesis demonstrates how machine learning-based web data extraction can serve as both a methodological foundation and a research enabler for computational social science, particularly in domains shaped by multimodal, high-velocity content. Building on a modular and reproducible architecture, the work integrates automated web scraping, transformer-based language models, and advanced object detection to collect and analyze large-scale social media data. It argues for the broader adoption of these tools within the social sciences, highlighting their growing accessibility and alignment with ethical research standards. Using this infrastructure, the thesis turns to the case of financial influencers (finfluencers), an emergent and under-theorized category of digital actors who communicate financial content on platforms such as Instagram. A comprehensive desk review maps the definitional contours, content strategies, monetization models, regulatory tensions, and ethical risks associated with finfluencer activity. This review establishes the conceptual grounding required for empirical analysis and identifies key research gaps in the literature. The final section presents two interlinked experiments. First, a large-scale thematic analysis of 22,854 Instagram captions, conducted with an LLM-assisted coding approach, reveals 36 dominant content themes. Second, a predictive model based on random forest classifiers tests whether visual cues from post images, extracted with the YOLOv11 object detection model, can predict the thematic category of the corresponding captions. Results indicate a strong correspondence between visual and textual elements, offering new pathways for analyzing digital content where text is limited or absent.

2026

Degree programme	COMMUNICATION, MARKETS AND SOCIETY
Publication date	18 March 2026
Language	English
Supervisor, Advisor or Tutor	MASSARA, FRANCESCO
Collection	Libera Università di Lingue e Comunicazione (IULM)

Files in this item:

File	Access	License	Size	Format
Francesco De Matteo THESIS REVIEWED.pdf	Open access	All rights reserved	1.41 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/361617

The NBN code of this thesis is URN:NBN:IT:IULM-361617