SUSTAINABLE COMPUTING IN THE AI ERA: ENERGY PROFILING AND RECONFIGURABLE HIGH-PERFORMANCE COMPUTING

LEON VEGA, LUIS GERARDO
2026

Abstract

The surge of Artificial Intelligence (AI) and Deep Learning (DL) workloads has transformed high-performance computing (HPC), increasing demands for both computational power and energy efficiency. This dissertation addresses two key challenges in sustainable computing: energy accounting in next-generation supercomputers and the design of energy-efficient AI accelerators based on Field Programmable Gate Arrays (FPGAs). First, a comprehensive methodology for fine-grained, process-level energy consumption estimation is proposed. A novel tool, EfiMon, is introduced to monitor system-wide and process-specific energy metrics without requiring execution isolation, enabling accurate energy profiling on CPUs and GPUs. New analytical models are developed, achieving a relative error below 2% for CPU-based measurements and below 10% for GPU-based measurements, and providing valuable insights into energy usage in shared-resource environments. Second, this work presents the design and evaluation of a Flexible Accelerator Library (FAL), which enables the automatic generation of parameterised FPGA-based AI accelerators. The library supports customising operand size, numerical precision, approximate arithmetic injection, and accelerator reuse. Experimental validation explores standard, Strassen, and Winograd matrix multiplication approaches, assessing trade-offs among resource consumption, performance, and error resilience. Furthermore, approximate computing techniques are incorporated to reduce FPGA resource usage with minimal impact on model accuracy. For MobileNet v2, resource usage was reduced by approximately 20%, accompanied by an accuracy improvement of 16.6% attributed to benign numerical perturbations at the softmax layer; for LeNet-5, the corresponding figures were 18.93% and 9.6%. The thesis extends its impact by exploring FPGA acceleration for Large Language Models (LLMs), proposing architectures optimised for LLM inference at the edge, and discussing pathways for future AI computing architectures that prioritise energy efficiency, scalability, and reconfigurability. The proposed accelerators achieve speedups ranging from 1.37x to 10.98x over two 64-core AMD EPYC 7H12 CPUs, while an NVIDIA Tesla V100 GPU remains 1.66x faster. Lastly, this work highlights heterogeneous systems equipped with CPUs, GPUs, ASICs, and reconfigurable devices to accelerate high-arithmetic-intensity (compute-bound) tasks, together with the integration of compute units into memory modules to address low-arithmetic-intensity (memory-bound) workloads, yielding systems that can evolve and adapt to the rapidly changing requirements of AI.
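As an illustration of the matrix multiplication variants the abstract compares, the sketch below implements Strassen's seven-multiplication recursion in plain NumPy. It is a minimal, illustrative sketch only: the function name strassen, the leaf cut-off parameter, and the NumPy fallback are choices made here for clarity, not part of the thesis's FPGA-based FAL implementation. It shows the algorithmic trade-off being evaluated (seven recursive products instead of eight, at the cost of extra additions, which affects both resource usage and numerical error).

import numpy as np

def strassen(A, B, leaf=64):
    """Multiply square matrices whose side is a power of two (sketch only)."""
    n = A.shape[0]
    if n <= leaf:                      # fall back to the standard algorithm
        return A @ B
    h = n // 2
    A11, A12 = A[:h, :h], A[:h, h:]
    A21, A22 = A[h:, :h], A[h:, h:]
    B11, B12 = B[:h, :h], B[:h, h:]
    B21, B22 = B[h:, :h], B[h:, h:]
    # Strassen's seven products replace the eight of the standard scheme.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    # Recombine the quadrants of the result.
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

# Quick check against NumPy's exact product.
A = np.random.rand(128, 128); B = np.random.rand(128, 128)
assert np.allclose(strassen(A, B), A @ B)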
20 Jan 2026
English
approximate computing; hardware acceleration; FPGA; machine learning; reconfigurable computing
COZZINI, STEFANO
Università degli Studi di Trieste
Files in this item:
PhD_Thesis-3.pdf
Access: open access
Licence: All rights reserved
Size: 5.64 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/354568
The NBN code of this thesis is URN:NBN:IT:UNITS-354568.