Boosting Higgs Boson Physics at High Momenta and a New End-to-End Approach to Muon Tracking at the LHC

RAMBELLI, LUCREZIA
2026

Abstract

In 2012, the ATLAS and CMS collaborations discovered the Higgs boson at the Large Hadron Collider (LHC), the largest particle accelerator ever built; it was the last piece of the Standard Model to be discovered. With a measured mass of approximately 125 GeV and spin zero, the Higgs boson is the only known fundamental scalar particle. Its unique role originates from the structure of its scalar potential, whose characteristic symmetry-breaking shape leads to a non-zero vacuum expectation value, resulting in electroweak symmetry breaking and enabling mass generation for elementary particles while maintaining Lorentz invariance. After the discovery, the ATLAS and CMS collaborations continued their analyses with the growing datasets provided by the LHC, entering a precision era of Higgs-boson property measurements. Among these properties, the couplings of the Higgs boson to other particles are being studied with high precision, since even small deviations from the Standard Model predictions could provide hints of new physics. The decay of the Higgs boson into a beauty-quark pair, with a branching fraction of approximately 58%, has a dominant influence on the total Higgs width, and precise measurements of the Yukawa coupling of the Higgs boson to b-quarks provide one of the most sensitive probes of deviations from the Standard Model predictions. In this context, a particularly suitable channel to study this coupling is the production of the Higgs boson in association with a vector boson, VH (with V = W or Z), followed by the H → bb̄ decay. Leptonic decay channels of the accompanying vector boson provide clean experimental signatures, which allow for a strong suppression of the otherwise overwhelming background processes.
Because potential deviations from the Standard Model are enhanced at high energy scales, measurements in the boosted Higgs regime provide increased sensitivity to modifications of Higgs interactions, where higher-order effective operators introduce momentum-dependent corrections. In this regime, the cross-section hierarchy changes with respect to the inclusive one, in which gluon-gluon fusion accounts for nearly 90% of the total: the contribution of VH production becomes increasingly significant, while also offering clean leptonic signatures that improve the selection of VH, H → bb̄ events. Since a correct classification of the final state is fundamental for high-precision measurements, several so-called flavour-tagging algorithms, i.e. algorithms able to identify the flavour of the quark that produced a final-state jet, have been developed over the years. In the boosted regime the two b-quarks in the final state become highly collimated, leading to two overlapping jet signatures, so a dedicated flavour-tagging procedure is needed to accurately identify and classify the flavour of the resulting jets. While at low energies the two final-state jets are reconstructed and classified separately, in the boosted case the final state is reconstructed as a single jet that contains the products of both b-quark hadronizations. The most recent algorithm developed by the ATLAS Collaboration is the GN2X tagger, a transformer-based model that relies exclusively on track-level information within the reconstructed large-radius jet to classify the flavour of the full reconstructed jet. Thanks to its superior performance compared to previous approaches, and even to state-of-the-art algorithms used in regimes where the two jets can be resolved separately, its use in analyses targeting H → bb̄ events can significantly enhance signal sensitivity, leading to more precise measurements of the Higgs coupling to b-quarks.
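As a rough illustration of why the two b-quarks merge into a single jet, a common rule of thumb for a two-body decay is ΔR ≈ 2m/pT, where m is the mass of the decaying particle and pT its transverse momentum. The minimal sketch below evaluates this approximation for a 125 GeV Higgs boson. The function name and the pT values are illustrative assumptions and are not taken from the thesis.

```python
def approx_opening_angle(mass_gev: float, pt_gev: float) -> float:
    """Rule-of-thumb angular separation (Delta R) between the two decay
    products of a boosted resonance: Delta R ~ 2 * m / pT.

    This is an approximation, not an exact kinematic calculation.
    """
    if pt_gev <= 0:
        raise ValueError("pt_gev must be positive")
    return 2.0 * mass_gev / pt_gev


HIGGS_MASS_GEV = 125.0  # approximate measured Higgs-boson mass

# Illustrative transverse momenta (assumed values, in GeV)
for pt in (250.0, 500.0, 1000.0):
    delta_r = approx_opening_angle(HIGGS_MASS_GEV, pt)
    print(f"pT = {pt:6.0f} GeV -> Delta R ~ {delta_r:.2f}")
```

Under this approximation the separation is about 1.0 at pT = 250 GeV and halves each time the momentum doubles, which is why the two b-quark showers end up inside one large-radius jet at high momenta.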
In this context, this thesis focuses on the identification of processes in which the Higgs boson decays into a pair of b-quarks in the boosted regime. During my PhD, my work was dedicated first to the calibration of the GN2X tagger and subsequently to its integration into physics analyses targeting boosted VH, H → bb̄ final states. Because the tagger is developed and tested on Monte Carlo (MC) samples, where its behaviour may differ from that in real data, it must be calibrated before use in physics analyses so that its performance in simulation matches that observed in data. Two calibration strategies were developed and implemented in this thesis as a preliminary study. The first strategy is based purely on MC simulations: it compares the tagger performance across different MC samples to which systematic variations of the input variables, driven by auxiliary measurements of tracking variables, are applied. This method proves particularly valuable for calibrating the tagger on processes such as multijet backgrounds, where the simulated flavour-fraction compositions suffer from large uncertainties. The second calibration approach exploits the data collected by the ATLAS experiment between 2015 and 2018, focusing on Z → bb̄ + jets events as a high-statistics proxy for H → bb̄ processes. This choice is well motivated: the Z boson is a colour-singlet resonance with a mass close to that of the Higgs boson, so the b-jet kinematics of Z → bb̄ are very similar to those of the H → bb̄ signal, while the available statistics are much larger. This data-driven approach allows a direct comparison of the tagger efficiency between simulation and data, yielding an independent set of scale factors that account for residual modelling uncertainties. After obtaining these correction factors and their associated systematic uncertainties, I worked on integrating the GN2X tagger into the physics analysis targeting boosted decays of the Higgs boson in VH, H → bb̄ processes.
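The idea of a data-to-simulation scale factor can be sketched as the ratio of the tagging efficiency measured in data to the one measured in simulation. The example below is a deliberately simplified counting version with binomial statistical uncertainties only; the function names and the event counts are hypothetical, and the calibration described in the thesis may rely on fits, kinematic binning, and systematic uncertainties that are not modelled here.

```python
import math


def efficiency(n_pass: int, n_total: int) -> tuple[float, float]:
    """Tagging efficiency and its binomial statistical uncertainty."""
    if n_total <= 0 or not 0 < n_pass <= n_total:
        raise ValueError("need 0 < n_pass <= n_total")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


def scale_factor(data_pass: int, data_total: int,
                 mc_pass: int, mc_total: int) -> tuple[float, float]:
    """Scale factor SF = eff_data / eff_MC, with the two relative
    uncertainties added in quadrature (assumes independent samples)."""
    eff_data, err_data = efficiency(data_pass, data_total)
    eff_mc, err_mc = efficiency(mc_pass, mc_total)
    sf = eff_data / eff_mc
    rel_err = math.sqrt((err_data / eff_data) ** 2 + (err_mc / eff_mc) ** 2)
    return sf, sf * rel_err


# Hypothetical counts in one kinematic bin (not values from the thesis)
sf, sf_err = scale_factor(data_pass=450, data_total=1000,
                          mc_pass=500, mc_total=1000)
print(f"SF = {sf:.3f} +/- {sf_err:.3f}")
```

A scale factor below one in this toy setup would mean that the simulation overestimates the tagging efficiency, so simulated events passing the tagger would be weighted down accordingly.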
I developed the analysis framework in the boosted regime in order to incorporate the GN2X tagger into the signal event preselection, together with the corresponding correction factors derived from the calibration studies. A multivariate analysis was then performed to discriminate signal from background events using boosted decision trees (BDTs), whose output score distributions constitute the primary inputs to the final statistical fit and therefore play a central role in determining the analysis sensitivity. While the full statistical inference was not performed within the scope of this work, the expected sensitivity was evaluated by comparing the BDT output score distributions obtained when including the GN2X tagger in the event preselection with those from previous ATLAS results. This comparison demonstrates a substantial improvement with respect to earlier analyses, with the enhanced sensitivity driven by the tagger’s superior background rejection at high signal efficiency across the relevant kinematic phase space. Motivated by the strong impact of the tagger at the event-selection level, additional studies were carried out to investigate the potential benefit of using GN2X-related information not only in the preselection but also directly as per-event input variables to the BDT. These studies were designed to assess whether a tighter integration of the tagger information within the multivariate framework could further enhance the analysis performance, providing indications of possible improvements beyond the current approach. The final part of the thesis presents a research project that explores a differentiable end-to-end machine-learning approach to charged-particle track reconstruction, intended as a proof of concept demonstrating the potential of this paradigm.
In particular, the proposed approach adopts differentiable programming techniques to enable the joint training of the entire reconstruction pipeline, allowing physics-motivated priors to be incorporated directly into the training process. This approach differs from conventional tracking strategies. In standard reconstruction pipelines, including those that incorporate machine-learning techniques, the tracking problem is typically factorized into multiple sequential steps, such as hit selection, seeding, pattern recognition, and track fitting. In contrast, the model that I developed formulates each stage of the reconstruction procedure in a differentiable way, so that gradients propagate through the successive steps of the workflow during training and the different components of the model can share parameters and be optimized jointly with respect to the global objective. As a result, the training process is aware of the role and impact of each step in the reconstruction chain, enabling a truly end-to-end optimization. The results show that this jointly optimized approach can outperform a more traditional factorized strategy, highlighting the advantages of treating track reconstruction as a fully differentiable learning problem.
22 May 2026
English
Francesco Armando Di Bello
COCCARO, ANDREA
TOSI, SILVANO
Università degli studi di Genova
Files in this item:
File Size Format
phdunige_5526161.pdf

open access

License: All rights reserved
Size 64.99 MB
Format Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/372721
The NBN code of this thesis is URN:NBN:IT:UNIGE-372721