SUSTAINABLE COMPUTING IN THE AI ERA: ENERGY PROFILING AND RECONFIGURABLE HIGH-PERFORMANCE COMPUTING

LEON VEGA, LUIS GERARDO
2026

Abstract

The surge of Artificial Intelligence (AI) and Deep Learning (DL) workloads has transformed high-performance computing (HPC), increasing demands for both computational power and energy efficiency. This dissertation addresses two key challenges in sustainable computing: energy accounting in next-generation supercomputers and the design of energy-efficient AI accelerators based on Field Programmable Gate Arrays (FPGAs). First, a comprehensive methodology for fine-grained, process-level energy consumption estimation is proposed. A novel tool, EfiMon, is introduced to monitor system-wide and process-specific energy metrics without requiring execution isolation, enabling accurate energy profiling on CPUs and GPUs. New analytical models are developed, achieving a relative error below 2% for CPU-based measurements and below 10% for GPU-based measurements, and providing valuable insights into energy usage in shared-resource environments. Second, this work presents the design and evaluation of a Flexible Accelerator Library (FAL), which enables the automatic generation of parameterised FPGA-based AI accelerators. The library supports customising operand size, numerical precision, approximate arithmetic injection, and accelerator reuse. Experimental validation explores standard, Strassen, and Winograd matrix multiplication approaches, assessing trade-offs among resource consumption, performance, and error resilience. Furthermore, approximate computing techniques are incorporated to reduce FPGA resource usage with minimal impact on model accuracy. For MobileNet v2, resource usage was reduced by approximately 20%, accompanied by an accuracy improvement of 16.6% attributed to benign numerical perturbations at the softmax layer; for LeNet-5, the corresponding figures were 18.93% and 9.6%. The thesis extends its impact by exploring FPGA acceleration for Large Language Models (LLMs), proposing architectures optimised for LLM inference at the edge, and discussing pathways for future AI computing architectures that prioritise energy efficiency, scalability, and reconfigurability. The proposed accelerators achieve speedups ranging from 1.37x to 10.98x over two 64-core AMD EPYC 7H12 CPUs, while an NVIDIA Tesla V100 GPU remains 1.66x faster. Lastly, this work highlights heterogeneous systems equipped with CPUs, GPUs, ASICs, and reconfigurable devices to accelerate high-arithmetic-intensity (compute-bound) tasks, together with the integration of compute units into memory modules to address low-arithmetic-intensity (memory-bound) workloads, yielding systems that can evolve and adapt to the rapidly changing requirements of AI.
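As an illustration of the matrix multiplication variants the abstract compares, the sketch below implements Strassen's seven-multiplication recursion in plain NumPy. It is a minimal, illustrative sketch only: the function name strassen, the leaf cut-off parameter, and the NumPy fallback are choices made here for clarity, not part of the thesis's FPGA-based FAL implementation. It shows the algorithmic trade-off being evaluated (seven recursive products instead of eight, at the cost of extra additions, which affects both resource usage and numerical error).

import numpy as np

def strassen(A, B, leaf=64):
    """Multiply square matrices whose side is a power of two (sketch only)."""
    n = A.shape[0]
    if n <= leaf:                      # fall back to the standard algorithm
        return A @ B
    h = n // 2
    A11, A12 = A[:h, :h], A[:h, h:]
    A21, A22 = A[h:, :h], A[h:, h:]
    B11, B12 = B[:h, :h], B[:h, h:]
    B21, B22 = B[h:, :h], B[h:, h:]
    # Strassen's seven products replace the eight of the standard scheme.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    # Recombine the quadrants of the result.
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

# Quick check against NumPy's exact product.
A = np.random.rand(128, 128); B = np.random.rand(128, 128)
assert np.allclose(strassen(A, B), A @ B)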
20 Jan 2026
English
approximate computing; hardware acceleration; FPGA; machine learning; reconfigurable computing
COZZINI, STEFANO
Università degli Studi di Trieste
Files in this item:
PhD_Thesis-3.pdf
Access: open access
Licence: All rights reserved
Size: 5.64 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/354568
The NBN code of this thesis is URN:NBN:IT:UNITS-354568.