Rethinking Core Architectural Blocks for Advancing Deep Learning Performance

CAROLLO, Matteo
2026

Abstract

Recent progress in deep learning has largely been achieved by scaling model capacity and training data, leveraging ever-increasing computational resources to obtain incremental performance gains. In parallel with this trend, a complementary direction is opened by structurally improving the atomic components that constitute deep architectures, so that each local enhancement can propagate through the entire network and yield global benefits in efficiency, stability, and accuracy. In this thesis, attention is directed to two fundamental classes of architectural blocks: pooling operators and convolutional modules. First, a family of trainable pooling functions inspired by fuzzy aggregation theory is introduced, in which feature aggregation is adaptively modulated rather than fixed. When these operators are embedded in autoencoder-style architectures for both images and time series, more informative latent representations are obtained and reconstruction error is reduced with respect to traditional and contemporary pooling strategies, without a substantial increase in parameter count or computational cost. Second, the role of convolutional layers in audio source separation is reconsidered by designing an encoder–decoder architecture based on multi-path residual processing. Parallel convolutional pathways operating at multiple temporal scales are employed, whose outputs are combined through learnable fusion mechanisms that emphasize context-relevant information. This design leads to more stable training dynamics and improved separation quality of human voices from complex acoustic backgrounds when compared under controlled conditions to both convolutional and transformer-based baselines.
Overall, the thesis argues that systematic innovation at the level of these microscopic architectural primitives provides a powerful and scalable route for advancing deep learning performance, complementing large-scale model and dataset growth and enabling more efficient exploitation of available computational budgets across diverse domains.
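
The abstract does not say which fuzzy aggregation operators the thesis uses, so the following is only an illustrative sketch of what a trainable pooling function in that spirit could look like. It assumes an ordered weighted averaging (OWA) operator, a standard fuzzy aggregation function, applied over 1-D windows; the names `softmax` and `owa_pool_1d`, the softmax parametrization of the weights, and the window layout are hypothetical choices made for this example, not the author's method.

```python
import math


def softmax(logits):
    """Map unconstrained logits to positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def owa_pool_1d(signal, logits, stride=None):
    """Ordered weighted averaging over sliding windows (hypothetical sketch).

    Each window of len(logits) values is sorted in descending order and
    combined with softmax(logits). In a real network the logits would be
    trainable parameters; here they are passed in as plain numbers.
    """
    k = len(logits)
    stride = stride or k  # default: non-overlapping windows
    weights = softmax(logits)
    pooled = []
    for start in range(0, len(signal) - k + 1, stride):
        window = sorted(signal[start:start + k], reverse=True)
        pooled.append(sum(w * v for w, v in zip(weights, window)))
    return pooled


signal = [1.0, 3.0, 2.0, 8.0, 5.0, 4.0]

# Equal logits give uniform weights, which reduces to average pooling.
print(owa_pool_1d(signal, [0.0, 0.0]))

# A dominant first logit puts almost all weight on the largest value,
# which approximates max pooling.
print(owa_pool_1d(signal, [50.0, 0.0]))
```

Under these assumptions a single weight vector interpolates between max, average, and min pooling, which is one way aggregation can be "adaptively modulated rather than fixed" at the cost of only a handful of extra parameters per pooling layer.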
25 March 2026
TAGLIAFERRI, Roberto
Università degli Studi di Salerno
Files in this item:

File: Abstract.pdf
Embargo: until 24 March 2028
License: All rights reserved
Size: 71.25 kB
Format: Adobe PDF

File: Tesi Elettronica.pdf
Embargo: until 24 March 2028
License: All rights reserved
Size: 1.41 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/362517
The NBN code of this thesis is URN:NBN:IT:UNISA-362517