This thesis discuss the design and the implementation of an FPGA-Based Network Processor for scientific computing, like Lattice Quantum ChromoDinamycs (LQCD) and fluid-dynamics applications based on Lattice Boltzmann Methods (LBM). State-of-the-art programs in this (and other similar) applications have a large degree of available parallelism, that can be easily exploited on massively parallel systems, provided the underlying communication network has not only high-bandwidth but also low-latency. I have designed in details, built and tested in hardware, firmware and software an implementation of a Network Processor, tailored for the most recent families of multi-core processors. The implementation has been developed on an FPGA device to easily interface the logic of NWP with the CPU I/O sub-system. In this work I have assessed several ways to move data between the main memory of the CPU and the I/O sub-system to exploit high data throughput and low latency, enabling the use of “Programmed Input Output” (PIO), “Direct Memory Access” (DMA) and “Write Combining” memory-settings. On the software side, I developed and test a device driver for the Linux operating system to access the NWP device, as well as a system library to efficiently access the network device from user-applications. This thesis demonstrates the feasibility of a network infrastructure that saturates the maximum bandwidth of the I/O sub-systems available on recent CPUs, and reduces communication latencies to values very close to those needed by the processor to move data across the chip boundary.
A Scalable Parallel Architecture with FPGA-Based Network Processor for Scientific Computing
2012
Abstract
This thesis discuss the design and the implementation of an FPGA-Based Network Processor for scientific computing, like Lattice Quantum ChromoDinamycs (LQCD) and fluid-dynamics applications based on Lattice Boltzmann Methods (LBM). State-of-the-art programs in this (and other similar) applications have a large degree of available parallelism, that can be easily exploited on massively parallel systems, provided the underlying communication network has not only high-bandwidth but also low-latency. I have designed in details, built and tested in hardware, firmware and software an implementation of a Network Processor, tailored for the most recent families of multi-core processors. The implementation has been developed on an FPGA device to easily interface the logic of NWP with the CPU I/O sub-system. In this work I have assessed several ways to move data between the main memory of the CPU and the I/O sub-system to exploit high data throughput and low latency, enabling the use of “Programmed Input Output” (PIO), “Direct Memory Access” (DMA) and “Write Combining” memory-settings. On the software side, I developed and test a device driver for the Linux operating system to access the NWP device, as well as a system library to efficiently access the network device from user-applications. This thesis demonstrates the feasibility of a network infrastructure that saturates the maximum bandwidth of the I/O sub-systems available on recent CPUs, and reduces communication latencies to values very close to those needed by the processor to move data across the chip boundary.I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/20.500.14242/147625
URN:NBN:IT:UNIFE-147625