Fast Machine Learning for the Phase-2 CMS Global Trigger and FERoCE RDMA transmitter for the next generation of Trigger and DAQ systems
BORTOLATO, GABRIELE
2025
Abstract
The High-Luminosity LHC (HL-LHC) will confront the CMS experiment with unprecedented instantaneous luminosity and pileup, requiring a fundamental redesign of the real-time selection chain. This thesis advances along two complementary axes: the design and hardware realization of fast machine-learning (ML) conditions in the Phase-2 CMS Global Trigger (GT), and the development of FERoCE, a high-throughput, low-latency RDMA-based data path tailored to future Trigger/DAQ needs. The thesis presents a scalable, non-time-multiplexed Phase-2 GT architecture and a Final-OR stage provisioned for O(10³) algorithms, with per-algorithm pre-scales, preview pre-scales, dead-time-aware monitoring, and trigger-type mapping. Within this framework, ML-based Level-1 conditions are implemented directly in the GT FPGA fabric, including all pre-processing (normalization and invariant-mass computation) and fixed-latency re-timing. Across representative topologies (vector-boson-fusion Higgs and di-Higgs final states: H → bb̄, H → τ⁺τ⁻, H → invisible, HH → 4b, HH → 2b2τ), BDTs and DNNs achieve comparable discrimination at a fixed L1 output rate. DNNs are modestly more resource-intensive, whereas BDTs remain notably compact (no DSP usage) with sub-100 ns inference. Including pre-processing, all designs fit comfortably within the GT latency budget and use only a small fraction of the FPGA resources, so multiple ML conditions can run concurrently. End-to-end "slice tests" validate functionality and timing from the upstream subsystems through the GT, including with collision data in a Drift Tube slice at Point 5. On the data-movement side, the thesis introduces FERoCE (Front-End RoCE), an FPGA RoCEv2 stack for DAQ that sustains line-rate throughput up to 400 Gbps, integrates GPUDirect for FPGA-to-GPU transfers, and provides congestion control through Layer-2 flow control.
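The following Python sketch illustrates the kind of pre-processing described above. It is a floating-point reference, not the thesis firmware. It computes the invariant mass of two objects in the massless approximation, m² = 2 pT,1 pT,2 (cosh Δη − cos Δφ). It then min-max normalizes a feature vector and truncates it to a fixed-point integer code. Several choices are hypothetical assumptions made only for illustration: the feature set (two jet pT values, |Δη| and m_jj, as in a VBF-like selection), the `FEATURE_RANGES` values, and the 8 fractional bits. The hardware implementation may instead use look-up tables and different precisions.

```python
import math

# Hypothetical (min, max) ranges for the features built in preprocess();
# a real condition would derive these from its training dataset.
FEATURE_RANGES = [(0.0, 500.0), (0.0, 500.0), (0.0, 10.0), (0.0, 2000.0)]


def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two objects in the massless approximation."""
    # m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi)); cos() is 2*pi periodic,
    # so d_phi needs no explicit wrapping into [-pi, pi].
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def normalize(x, lo, hi):
    """Min-max scale x to [0, 1], clipping values outside [lo, hi]."""
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def to_fixed(x, frac_bits=8):
    """Integer code of x with frac_bits fractional bits, truncating toward -inf.

    The code for 1.0 is 2**frac_bits, so the hardware type needs at least
    one integer bit to represent the top of the normalized range.
    """
    return math.floor(x * (1 << frac_bits))


def preprocess(jet1, jet2, ranges=FEATURE_RANGES, frac_bits=8):
    """Build a fixed-point feature vector from two (pt, eta, phi) jets.

    Hypothetical VBF-like features (jet1 assumed leading):
    leading pT, subleading pT, |d_eta|, m_jj.
    """
    mjj = invariant_mass(*jet1, *jet2)
    raw = [jet1[0], jet2[0], abs(jet1[1] - jet2[1]), mjj]
    return [to_fixed(normalize(x, lo, hi), frac_bits)
            for x, (lo, hi) in zip(raw, ranges)]
```

Consider two back-to-back 50 GeV jets at equal η. Their invariant mass is m_jj = 100 GeV, and `preprocess((50, 0.0, 0.0), (50, 0.0, math.pi))` returns `[25, 25, 0, 12]` under the assumed ranges. Clipping in `normalize` keeps out-of-range inputs inside the fixed-point range a downstream model would expect.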
Benchmarks and system tests demonstrate ≳99% link utilization, the expected scaling of latency with payload size, and fairness under multi-endpoint load. Together these establish RDMA as a viable, scalable option for moving detector data and trigger objects closer to heterogeneous accelerators. Taken together, these results show that ML conditions in hardware, coupled with RDMA-class transport, can raise physics efficiency at fixed bandwidth (or reduce rate at fixed efficiency). They also provide a practical path toward accelerator-aware Trigger/DAQ pipelines for the HL-LHC era, with lower latency and higher throughput.
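One way to read the latency scaling with payload is a simple first-order model. Each transfer costs a fixed overhead plus the serialization time, payload × 8 / line rate. The Python sketch below assumes a 400 Gbps link and a hypothetical 1 µs fixed overhead. Neither the overhead value nor the model itself is taken from the thesis measurements, and protocol headers are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """First-order transfer model: fixed overhead + serialization time."""

    line_rate_bps: float    # link speed in bit/s, e.g. 400e9 for 400 Gbps
    fixed_latency_s: float  # hypothetical per-transfer overhead (NIC, PCIe, switch)

    def transfer_time(self, payload_bytes: int) -> float:
        """Seconds to deliver payload_bytes, ignoring protocol headers."""
        return self.fixed_latency_s + payload_bytes * 8 / self.line_rate_bps

    def throughput_bps(self, payload_bytes: int) -> float:
        """Effective payload throughput for a single transfer of this size."""
        return payload_bytes * 8 / self.transfer_time(payload_bytes)


if __name__ == "__main__":
    link = LinkModel(line_rate_bps=400e9, fixed_latency_s=1e-6)
    for size in (64, 4096, 1 << 20, 1 << 30):
        t = link.transfer_time(size)
        util = link.throughput_bps(size) / link.line_rate_bps
        print(f"{size:>12} B  latency {t * 1e6:10.3f} us  utilization {util:6.2%}")
```

Under these assumptions, the fixed overhead dominates small messages, and throughput approaches the line rate only for large payloads.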
Full text: tesi_Gabriele_Bortolato.pdf (Adobe PDF, 17.64 MB), open access. License: all rights reserved.
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/354499
URN:NBN:IT:UNIPD-354499