From Pipeline Optimization to Problem-oriented AutoML: Advancing Clustering Automation

Camilo Da Silva, Matheus

Automated Machine Learning (AutoML) aims to lower the entry barrier of machine learning by automating the design of pipelines, including the selection of techniques, algorithms and their parameters. While substantial progress has been made in supervised learning, unsupervised learning remains challenging due to the absence of universal goals such as accuracy. In this context, meta-learning plays a crucial role by leveraging prior knowledge to recommend algorithms or configurations based on dataset characteristics. Yet clustering is inherently subjective: success often depends on user goals. Since AutoML's mission is to place the user at the center, this thesis explores how AutoML and meta-learning can be unified to automatically provide users with problem-oriented clustering pipelines.

We first investigate pipeline synthesis by extending evolutionary optimisation methods from supervised learning to clustering. Benchmarking across diverse datasets shows that optimising for individual clustering validity indices or their ensembles is insufficient. These results motivate the use of meta-objectives and surrogate models to flexibly guide search in alignment with user intent.

Next, we study what is required to build robust meta-spaces and meta-objectives. Through a systematic review of AutoClustering literature, we propose a taxonomy of datasets and meta-features, analyse their influence, and show how meta-models can be simplified without substantial performance loss.

Finally, we integrate these insights into the Problem-oriented AutoML in Clustering (PoAC) framework, which aligns meta-features, objectives, and optimisation strategies with problem-specific requirements, enabling adaptive, algorithm-agnostic clustering automation.
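The abstract describes meta-learning as recommending algorithms "based on dataset characteristics". A minimal sketch of that general idea, assuming a simple nearest-neighbour recommender, is shown below. It is not the thesis's method or the PoAC framework. The three meta-features, the knowledge-base entries, and the function names `meta_features` and `recommend` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def meta_features(X):
    """Illustrative meta-features of a numeric dataset X (list of rows).

    Hypothetical choice: log10 of #instances, log10 of #features,
    and the mean coefficient of variation across columns.
    """
    n, d = len(X), len(X[0])
    cvs = []
    for col in zip(*X):
        m = mean(col)
        # Coefficient of variation; 0.0 for zero-mean columns to avoid division by zero.
        cvs.append(pstdev(col) / abs(m) if m != 0 else 0.0)
    return [math.log10(n), math.log10(d), mean(cvs)]


def recommend(query, knowledge_base, k=1):
    """Return the algorithm most frequently best among the k prior datasets
    whose meta-feature vectors are closest (Euclidean) to `query`."""
    ranked = sorted(knowledge_base, key=lambda entry: math.dist(query, entry[0]))
    votes = Counter(algo for _, algo in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical knowledge base: (meta-features of a past dataset, best algorithm on it).
kb = [
    ([2.0, 0.3, 0.5], "k-means"),
    ([4.0, 1.5, 2.0], "DBSCAN"),
    ([3.0, 1.0, 0.2], "agglomerative"),
]

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(recommend(meta_features(X), kb))  # → k-means
```

In this sketch, similarity is measured on raw meta-feature values. A practical meta-learner would typically normalise the meta-features and evaluate the recommendations against the user's objective. That objective-alignment issue is the one the abstract raises for clustering.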


2026



Degree programme: APPLIED DATA SCIENCE AND ARTIFICIAL INTELLIGENCE
Publication date: 22-Jan-2026
Language: English
Keywords: AutoML; Meta-learning; Clustering; Pipeline Synthesis; Unsupervised
Supervisors: Medvet, Eric; Barbon Junior, Sylvio
Publisher: Università degli Studi di Trieste
Collection: Università degli Studi di Trieste

Files in this record:

File	Size	Format	Access	License
Thesis_Matheus_review.pdf	2.72 MB	Adobe PDF	Open access	All rights reserved
Thesis_Matheus_review_1.pdf	2.72 MB	Adobe PDF	Open access	All rights reserved

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/355116

The NBN code of this thesis is URN:NBN:IT:UNITS-355116.