A multi-FPGA high performance computing system for 3D FFT-based numerical simulations

Ammendola, Roberto

In the field of High Performance Computing (HPC), communication among processes is a typical bottleneck for massively parallel scientific applications. The objective of this research is the development of a network interface card with specific offloading capabilities that could benefit large-scale simulations in terms of communication latency and scalability with the number of computing elements. Until the early 2000s, general-purpose single-core CPU-based systems were the processing systems of choice for HPC applications. They replaced exotic supercomputing architectures because they were inexpensive and their performance scaled with frequency in line with Moore’s Law. After the mid-2000s, the multi-core era began as the only viable way to keep up with the predicted performance scaling. Around 2010, CPU-based systems augmented with hardware accelerators as co-processors began to emerge as an alternative to CPU-only systems. This opened up opportunities for accelerators, mainly General-Purpose Graphics Processing Units (GPGPUs), to advance HPC to previously unattainable performance levels [1]. Since then, programmable device technology (namely Field Programmable Gate Arrays, or FPGAs), despite sharing the same silicon complexity as a GPGPU, has struggled to emerge as a real competitor among accelerators, mainly due to (i) the lack of well-established high-level synthesis tools, (ii) higher costs and longer lead times, and (iii) poor actual results in terms of time-to-solution [2]. As these drawbacks have been mitigated, FPGA technology has recently become better known and more widespread by leveraging its distinctive features: the reconfigurable computing approach and high power efficiency [3] [4].
Moreover, thanks to the variety of their embedded on-chip resources, today's FPGAs allow the offloading of increasingly complex tasks: not only the purely computational part of an algorithm, but also the communication part in distributed parallel systems [5]. In particular, this thesis addresses a specific computational task, the three-dimensional Fast Fourier Transform (3D FFT), which places a particularly heavy load on the interconnection network when parallel systems are involved. The main goal of this study is to find an effective way to move part of the computational load closer to the network, in order to exploit the peculiarities of the communication patterns and possibly take advantage of data reuse during transmission.
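To make the network load of a distributed 3D FFT concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the NIC or FPGA design developed in the thesis. It assumes a slab decomposition of an N x N x N grid over P ranks, simulated in a single process: each rank first performs 1D FFTs along its two local axes, then an all-to-all transpose redistributes the data so that the third axis becomes local. In that transpose every rank sends N^3 (P-1)/P^2 elements, i.e. N^3 (P-1)/P in total, so almost the whole dataset crosses the network once per transform. The function names `fft1d`, `fft3d_slab` and `dft3d_naive`, and the choice of slab (rather than pencil) decomposition, are hypothetical.

```python
import cmath

# Hypothetical illustration: slab-decomposed 3D FFT simulated in one process.
# Not the design from the thesis; it only shows where the all-to-all traffic arises.


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft1d(x[0::2]), fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d_slab(a, n, p):
    """Slab-decomposed 3D FFT of an n*n*n array a[x][y][z] over p simulated ranks.

    Returns (spectrum indexed [kx][ky][kz], number of elements sent between ranks).
    """
    assert n % p == 0, "slab decomposition assumes p divides n"
    s = n // p
    # Phase 1 (local): rank r owns x in [r*s, (r+1)*s) and transforms along z, then y.
    slabs = []
    for r in range(p):
        slab = [[fft1d(a[x][y]) for y in range(n)] for x in range(r * s, (r + 1) * s)]
        for lx in range(s):
            for kz in range(n):
                col = fft1d([slab[lx][y][kz] for y in range(n)])
                for ky in range(n):
                    slab[lx][ky][kz] = col[ky]
        slabs.append(slab)
    # Phase 2 (network): all-to-all transpose; rank q receives ky in [q*s, (q+1)*s) for every x.
    recv = [[[None] * n for _ in range(s)] for _ in range(p)]
    sent = 0
    for r in range(p):
        for q in range(p):
            for lx in range(s):
                for ly in range(s):
                    recv[q][ly][r * s + lx] = slabs[r][lx][q * s + ly]
                    if q != r:
                        sent += n  # a pencil of n kz values leaves rank r for rank q
    # Phase 3 (local): each rank now holds complete x lines and transforms along x.
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for q in range(p):
        for ly in range(s):
            for kz in range(n):
                col = fft1d([recv[q][ly][x][kz] for x in range(n)])
                for kx in range(n):
                    out[kx][q * s + ly][kz] = col[kx]
    return out, sent


def dft3d_naive(a, n):
    """Direct O(n^6) 3D DFT, used only as a correctness reference."""
    w = [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]
    return [[[sum(a[x][y][z] * w[(kx * x) % n] * w[(ky * y) % n] * w[(kz * z) % n]
                  for x in range(n) for y in range(n) for z in range(n))
              for kz in range(n)] for ky in range(n)] for kx in range(n)]


if __name__ == "__main__":
    n, p = 8, 4
    a = [[[complex((7 * x + 3 * y + z) % 5, x - z) for z in range(n)]
          for y in range(n)] for x in range(n)]
    out, sent = fft3d_slab(a, n, p)
    ref = dft3d_naive(a, n)
    err = max(abs(out[i][j][k] - ref[i][j][k])
              for i in range(n) for j in range(n) for k in range(n))
    print(f"max error vs naive DFT: {err:.2e}")
    print(f"elements exchanged in transpose: {sent} of {n ** 3}")  # 384 of 512
```

Running it prints the deviation from a direct DFT and the exchanged volume (384 of 512 elements for N = 8, P = 4). A pencil decomposition would replace the single all-to-all with two smaller ones among subgroups of ranks, allowing up to N^2 ranks instead of N; that variant is not shown.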

Degree programme: Electronic Engineering
Publication date: 2018
Language: English
Co-supervisor, second examiner, co-tutor or coordinators: LORETI, PIERPAOLO
Publisher: Università degli Studi di Roma "Tor Vergata"
Collection: Università degli Studi di Roma Tor Vergata

Files in this record:

File	Size	Format
tesi_ammendola.pdf (access only from BNCF and BNCR, the National Central Libraries of Florence and Rome)	3.62 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/214476

The NBN code of this thesis is URN:NBN:IT:UNIROMA2-214476