From Pipeline Optimization to Problem-oriented AutoML: Advancing Clustering Automation

CAMILO DA SILVA, MATHEUS
2026

Abstract

Automated Machine Learning (AutoML) aims to lower the entry barrier of machine learning by automating the design of pipelines, including the selection of techniques, algorithms and their parameters. While substantial progress has been made in supervised learning, unsupervised learning remains challenging due to the absence of universal goals such as accuracy. In this context, meta-learning plays a crucial role by leveraging prior knowledge to recommend algorithms or configurations based on dataset characteristics. Yet clustering is inherently subjective: success often depends on user goals. Since AutoML’s mission is to place the user at the center, this thesis explores how AutoML and meta-learning can be unified to automatically provide users with problem-oriented clustering pipelines. We first investigate pipeline synthesis by extending evolutionary optimisation methods from supervised learning to clustering. Benchmarking across diverse datasets shows that optimising for individual clustering validity indices or their ensembles is insufficient. These results motivate the use of meta-objectives and surrogate models to flexibly guide search in alignment with user intent. Next, we study what is required to build robust meta-spaces and meta-objectives. Through a systematic review of AutoClustering literature, we propose a taxonomy of datasets and meta-features, analyse their influence, and show how meta-models can be simplified without substantial performance loss. Finally, we integrate these insights into the Problem-oriented AutoML in Clustering (PoAC) framework, which aligns meta-features, objectives, and optimisation strategies with problem-specific requirements, enabling adaptive, algorithm-agnostic clustering automation.
22 January 2026
English
AutoML; Meta-learning; Clustering; Pipeline Synthesis; Unsupervised
MEDVET, Eric
Barbon Junior, Sylvio
Università degli Studi di Trieste
Files in this record:
Thesis_Matheus_review.pdf (open access; license: all rights reserved; 2.72 MB; Adobe PDF)
Thesis_Matheus_review_1.pdf (open access; license: all rights reserved; 2.72 MB; Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/355116
The NBN code of this thesis is URN:NBN:IT:UNITS-355116