Machine Learning and Cryptocurrency Markets: Methods and Evidence

Pennella, Luca

Cryptocurrency markets and blockchain-based financial infrastructures generate data at unprecedented scale and granularity, while also introducing novel risks, market microstructures, and fast-evolving regulatory debates. In parallel, machine learning (ML) delivers strong predictive performance but is frequently criticized for limited transparency, an issue that becomes central when model outputs can affect economic, legal, or policy decisions. This thesis develops reproducible, interpretable, and domain-specific methodologies at the intersection of explainable ML and digital-asset economics, organized around three complementary objectives: (i) designing explainable ML pipelines for complex socio-economic phenomena, (ii) characterizing investor heterogeneity and regulatory attitudes in crypto markets using international survey data, and (iii) measuring micro-level token circulation and systematizing derivatives protocols in decentralized finance (DeFi).
On the methodological side, the thesis proposes explainability workflows for risk-sensitive classification that preserve predictive quality while enabling credible interpretation at both local and global levels. It introduces X-SPIDE, an explainable pipeline for detecting smart Ponzi contracts on Ethereum, and demonstrates how interpretability-first principles can be transferred to voting-intention prediction using survey data and SHAP diagnostics. It further studies the interaction between class imbalance, class overlap, and oversampling, showing via controlled simulations that oversampling effectiveness depends on data geometry rather than imbalance alone. The thesis also proposes Decision Predicate Graphs, a model-specific global interpretability method for tree ensembles that supports structure-aware inspection when rule-based summaries become impractical.
On the empirical side, the thesis leverages international survey evidence to profile crypto users and link beliefs to behaviors. It documents that memecoin holders constitute a distinct segment with recognizable demographic and psychological patterns, and it maps heterogeneous support for regulatory domains to perceived market illegitimacy and to individual exposure to crypto wealth.
At the DeFi protocol level, the thesis introduces a micro-velocity methodology tailored to liquid staking tokens and provides evidence on token circulation intensity, concentration of turnover, and a progressive shift toward wstETH consistent with composability. It also systematizes DeFi derivatives through a unified representation of actors, flows, and design principles, operationalized via a tuple-based formalism and a reproducible simulation environment for comparative analysis.

2026

	Degree program
	
				APPLIED DATA SCIENCE AND ARTIFICIAL INTELLIGENCE
			
	Publication date
	
				24-Mar-2026
			
	Language
	
				English
			
	Keywords
	
				Machine Learning; Blockchain; Cryptocurrencies; DeFi; Survey Data
			
	Supervisor, Advisor, or Tutor
	
				BIASIOL FRANCESCO
				GALLETTA LETTERIO
				TORELLI, Nicola
			
	Publisher
	
				Università degli Studi di Trieste
			
	Collection
	
				Università degli Studi di Trieste

Files in this item:

File	Access	License	Size	Format
PhD_Thesis_ADSAI_Pennella_final.pdf	Open access	All rights reserved	9.21 MB	Adobe PDF
PhD_Thesis_ADSAI_Pennella_final_1.pdf	Open access	All rights reserved	9.21 MB	Adobe PDF
PhD_Thesis_ADSAI_Pennella_final_2.pdf	Open access	All rights reserved	9.21 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/363718

The NBN code of this thesis is URN:NBN:IT:UNITS-363718