From Pipeline Optimization to Problem-oriented AutoML: Advancing Clustering Automation
CAMILO DA SILVA, MATHEUS
2026
Abstract
Automated Machine Learning (AutoML) aims to lower the entry barrier of machine learning by automating the design of pipelines, including the selection of techniques, algorithms and their parameters. While substantial progress has been made in supervised learning, unsupervised learning remains challenging due to the absence of universal goals such as accuracy. In this context, meta-learning plays a crucial role by leveraging prior knowledge to recommend algorithms or configurations based on dataset characteristics. Yet clustering is inherently subjective: success often depends on user goals. Since AutoML’s mission is to place the user at the center, this thesis explores how AutoML and meta-learning can be unified to automatically provide users with problem-oriented clustering pipelines.

We first investigate pipeline synthesis by extending evolutionary optimisation methods from supervised learning to clustering. Benchmarking across diverse datasets shows that optimising for individual clustering validity indices or their ensembles is insufficient. These results motivate the use of meta-objectives and surrogate models to flexibly guide search in alignment with user intent.

Next, we study what is required to build robust meta-spaces and meta-objectives. Through a systematic review of AutoClustering literature, we propose a taxonomy of datasets and meta-features, analyse their influence, and show how meta-models can be simplified without substantial performance loss.

Finally, we integrate these insights into the Problem-oriented AutoML in Clustering (PoAC) framework, which aligns meta-features, objectives, and optimisation strategies with problem-specific requirements, enabling adaptive, algorithm-agnostic clustering automation.

| File | Access | License | Size | Format |
|---|---|---|---|---|
| Thesis_Matheus_review.pdf | Open access | All rights reserved | 2.72 MB | Adobe PDF |
| Thesis_Matheus_review_1.pdf | Open access | All rights reserved | 2.72 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/355116
URN:NBN:IT:UNITS-355116