Fast Machine Learning for the Phase-2 CMS Global Trigger and FERoCE RDMA transmitter for the next generation of Trigger and DAQ systems

BORTOLATO, GABRIELE
2025

Abstract

The High-Luminosity LHC (HL-LHC) will confront the CMS experiment with unprecedented instantaneous luminosity and pileup, requiring a fundamental redesign of the real-time selection chain. This thesis advances along two complementary axes: the design and hardware realization of fast machine-learning (ML) conditions in the Phase-2 CMS Global Trigger (GT), and the development of FERoCE, a high-throughput, low-latency RDMA-based data path tailored to future Trigger/DAQ needs. The thesis presents a scalable, non-time-multiplexed Phase-2 GT architecture and a Final-OR stage provisioned for O(10^3) algorithms, with per-algorithm pre-scales, preview pre-scales, dead-time-aware monitoring, and trigger-type mapping. Within this framework, ML-based Level-1 (L1) conditions are implemented directly in the GT FPGA fabric, including all pre-processing (normalization and invariant-mass computation) and fixed-latency re-timing. Across representative topologies (vector-boson fusion Higgs and di-Higgs final states: H → bb̄, H → τ⁺τ⁻, H → inv., HH → 4b, HH → 2b2τ), BDTs and DNNs achieve comparable discrimination at fixed L1 output rate; DNNs are modestly heavier in resources, whereas BDTs remain notably compact (no DSP usage) with sub-100 ns inference. Including pre-processing, all designs fit comfortably within the GT latency budget and use only a small share of the FPGA resources, so that multiple ML conditions can run concurrently. End-to-end "slice tests" validate functionality and timing from upstream subsystems through the GT, and with collision data in a Drift-Tube slice at Point-5. On the data-movement side, the thesis introduces FERoCE (Front-End RoCE), an FPGA RoCEv2 stack for DAQ that sustains line-rate throughput up to 400 Gbps, integrates GPUDirect for FPGA-to-GPU transfers, and offers congestion control through Layer 2 flow control. Benchmarks and system tests demonstrate ≳ 99% link utilization, latency that scales as expected with payload size, and fairness under multi-endpoint load, establishing RDMA as a viable, scalable option for moving detector data and trigger objects closer to heterogeneous accelerators. Taken together, these results show that ML conditions in hardware, coupled with RDMA-class transport, can raise physics efficiency at fixed bandwidth (or reduce rate at fixed efficiency) and provide a practical path to accelerator-aware Trigger/DAQ pipelines for the HL-LHC era, with reduced latency and increased throughput.
16-Dec-2025
English
TRIOSSI, ANDREA
Università degli studi di Padova
Files in this item:
File Size Format
tesi_Gabriele_Bortolato.pdf

open access

License: All rights reserved
Size: 17.64 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/354499
The NBN code of this thesis is URN:NBN:IT:UNIPD-354499