AI-DRIVEN ATRIAL ARRHYTHMIA DETECTION: DEVELOPMENT, CROSS-COMPARISON AND UNCERTAINTY QUANTIFICATION OF ALGORITHMS FOR CLINICAL CONTINUOUS ECGS
RAHMAN, MD MOKLESUR
2024
Abstract
Background: Atrial arrhythmias, particularly atrial fibrillation (AF), are prevalent cardiovascular disorders characterized by irregular heart rhythms originating in the atria, affecting approximately 2-3% of the global population. These conditions are associated with increased risks of stroke, heart failure, and other severe complications. Traditional detection methods, which rely primarily on clinicians analyzing electrocardiograms (ECGs), are often time-consuming and prone to human error, especially for long-term monitoring (Holter recordings) and for subtle, intermittent atrial arrhythmias. Recently, artificial intelligence (AI)-based methods have garnered significant attention for automated AF detection from ECGs.

Challenges: Developing AI-based models, particularly deep learning (DL) models, that accurately detect atrial arrhythmias presents several significant challenges. First, extracting representations of these arrhythmias that are invariant across subjects is complex and requires high-quality annotated data and a substantial cohort of patients to ensure robust model training. Second, ECG datasets are typically imbalanced because abnormal cases are scarce, which complicates model training and evaluation and can lead to bias and reduced performance in detecting rare arrhythmic events. Third, current methods are often validated on small patient cohorts of Holter recordings, which limits their clinical applicability and generalization and restricts their effectiveness in diverse real-world settings. Finally, despite the promising performance of DL models in arrhythmia detection, their susceptibility to overfitting necessitates the exploration of uncertainty quantification to ensure safe integration into clinical practice.

Objectives: This thesis aims to design and develop DL models applied to ECG data for the automatic detection of AF from Holter recordings. Furthermore, it seeks to compare the proposed model with state-of-the-art models and commercial software solutions using a large, retrospective cohort of clinical data. Another key objective is to quantify the uncertainty in AF detection to assess the model's prediction confidence and improve its clinical reliability.

Methods: We obtained 1,346 Holter recordings from 1,346 distinct patients with diverse cardiac conditions at Groupe Hospitalier Ambroise Paré in Paris, France. We developed a DL model for arrhythmia detection, focusing on residual attention models, and cross-compared it with state-of-the-art models and two rule-based algorithms: (1) ABILE, a commercial software by AMPS LLC, New York, and (2) CBR, a research-based solution developed at the Center for Biological Research at UCSF, San Francisco. To enhance model performance, we systematically reviewed and applied various data augmentation techniques to improve the diversity and robustness of the training data. Furthermore, we investigated the impact of annotation errors (noisy labels) on model accuracy and implemented strategies to mitigate their effects. Additionally, we quantified the uncertainty in our DL model to assess prediction confidence and benchmarked 11 uncertainty quantification (UQ) methods for robust AF detection.

Results: The proposed DL model achieved 92.8% sensitivity and 91.5% specificity, outperforming state-of-the-art DL models. In the comparison with ABILE, the proposed model achieved 95.1% sensitivity and 96.3% specificity, a substantially higher specificity than ABILE's 48.9%, though with slightly lower sensitivity than ABILE's 98.4%. In comparison with CBR, which obtained 44.2% sensitivity and 99.9% specificity, the proposed model delivered a more balanced performance. Data augmentation techniques may improve model generalization and accuracy; in this context, however, they yielded limited performance gains. The study of noisy labels provided insights into model resilience: the model remained resilient when up to 40% of label annotations were randomly changed. Finally, integrating UQ improved the model's prediction confidence.

Conclusions: This research advances atrial arrhythmia detection through DL, offering potential improvements in clinical diagnostics and patient monitoring using Holter recordings. The methodologies and insights presented in this thesis lay a foundation for future research in cardiac arrhythmia detection using DL, addressing key challenges and enhancing the applicability of these models in clinical settings.

File | Size | Format | |
---|---|---|---|
phd_unimi_R13397.pdf (open access) | 747 kB | Adobe PDF | |
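
As an illustration of two quantities the abstract reports, the following minimal Python sketch shows how sensitivity and specificity are computed from binary AF labels, and how a fixed fraction of annotations (for example 40%) could be flipped at random to simulate noisy labels. It is a sketch under stated assumptions, not the thesis's actual protocol: the function names `flip_labels` and `sensitivity_specificity`, the toy label list, and the choice to flip an exact fraction of labels uniformly at random are all hypothetical.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary labels (0/1) with a fixed fraction flipped.

    Assumption: noise is symmetric and applied uniformly at random;
    the thesis may have used a different noise model.
    """
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels, 1 = AF."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): 20 AF segments and 80 non-AF segments.
clean_labels = [1] * 20 + [0] * 80

# Simulate annotation errors by flipping 40% of the labels.
noisy_labels = flip_labels(clean_labels, noise_rate=0.40)

# Measure how far the noisy annotations drift from the clean ones.
sens, spec = sensitivity_specificity(clean_labels, noisy_labels)
```

In a noisy-label study of this kind, a model would typically be trained on the corrupted labels and evaluated against clean ones, so that any drop in sensitivity or specificity can be attributed to the injected noise.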
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/187962
URN:NBN:IT:UNIMI-187962