Natural language processing for regulatory science: aggregating field safety notices to support post-market surveillance of medical devices
YIJUN, REN
2025
Abstract
The European Union (EU) Medical Device Regulation (MDR) 2017/745, in effect since 2021, introduced more stringent requirements for high-risk medical devices (MDs) and emphasizes the need for proactive post-market surveillance (PMS) based on real-world data sources such as Field Safety Notices (FSNs). FSNs report device problems; they are communicated by manufacturers and published on the websites of national competent authorities, and could play a fundamental role in deriving trends that flag devices, or categories of devices, at higher risk of problems once on the market. However, EU-specific challenges (jurisdictional complexity, fragmented data, different languages and nomenclatures across Member States, and delays and uncertainties in the availability of the European Database on Medical Devices, EUDAMED) hinder efficient retrieval and use of the information embedded in FSNs for enhanced PMS analysis. Within the EU-funded CORE-MD project, our aim was to address these barriers by designing and developing a novel IT tool that automatically retrieves publicly accessible FSNs, structures them, re-classifies them according to the European Medical Device Nomenclature (EMDN), and displays them in aggregated form. The EMDN is a hierarchical classification of device categories with up to 7 levels of increasing detail. To this end, country-specific web scraping techniques were developed to collect FSNs from both EU and non-EU countries; only 16 of the 27 EU countries were found to update their FSNs consistently. Natural Language Processing (NLP) techniques were then used to transform the unstructured FSN text into structured data. As a first step, when the manufacturer and device name were not provided as structured fields, they had to be recognized within the retrieved text using Named Entity Recognition techniques.
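To make the EMDN hierarchy mentioned above more concrete, the following minimal sketch truncates a code to a chosen level. It assumes the code layout published for the EMDN (one category letter followed by two-digit pairs, so that level k spans 1 + 2(k-1) characters). The function names and the sample code strings are illustrative assumptions and are not taken from the CORE-MD tool.

```python
from typing import Optional

MAX_LEVEL = 7  # the EMDN tree has up to 7 levels


def emdn_depth(code: str) -> int:
    """Number of levels encoded in a code (assumed layout: letter + digit pairs)."""
    return 1 + (len(code) - 1) // 2


def emdn_prefix(code: str, level: int) -> Optional[str]:
    """Return the code truncated to `level`, or None if the code is shallower."""
    if not 1 <= level <= MAX_LEVEL:
        raise ValueError("level must be between 1 and 7")
    code = code.strip().upper()
    if level > emdn_depth(code):
        return None
    return code[: 1 + 2 * (level - 1)]


def same_at_level(code_a: str, code_b: str, level: int) -> bool:
    """True when both codes are defined at `level` and share the same prefix."""
    prefix_a = emdn_prefix(code_a, level)
    prefix_b = emdn_prefix(code_b, level)
    return prefix_a is not None and prefix_a == prefix_b


# Illustrative code string with four levels (not taken from the dataset).
sample = "P090803"
levels = [emdn_prefix(sample, k) for k in range(1, 5)]
print(levels)
```

A comparison such as `same_at_level(assigned, reference, 4)` is one way agreement between two codes could be checked at a given depth of the hierarchy.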
Since EMDN codes were rarely included in FSNs, except for about one-third of the Italian ones, Entity Resolution was required to assign the appropriate EMDN codes across all FSNs, including those from countries lacking a device dataset with EMDN classifications. Applying this framework, 65,036 FSNs published up to 31/12/2023 were retrieved from 16 EU countries, of which 40,212 (61.83%) were successfully assigned the appropriate EMDN code. The framework's performance was tested on the Italian FSNs for which the EMDN code was publicly provided, with accuracies ranging from 87.34% to 98.71% at EMDN level 1 and from 64.15% to 85.71% at level 4. Similarly, 71,180 FSNs published up to 31/12/2023 were retrieved from non-EU countries (Australia, Brazil, Canada, the UK and the USA), of which 36,597 (51.41%) were assigned the appropriate EMDN codes. The database of all retrieved FSNs is now updated monthly by automatically checking for newly published FSNs, ensuring a consistent and reliable dataset for ongoing analysis. To enhance data quality, a structured methodological framework combining NLP, vision transformers, and graph community detection algorithms was developed to identify FSNs duplicated across countries. Vision transformers, a deep learning architecture designed for image recognition and analysis, were used to extract image-based features from raw PDF and scanned files, while text-based features were also derived for each document. These features were combined to compute an adjacency matrix capturing both image and text similarities between documents. From this matrix, a graph was built in which each document, and consequently each FSN, is a node; community detection algorithms were then applied to identify clusters corresponding to potential groups of duplicated FSNs.
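The duplicate-detection idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy vectors stand in for real text embeddings and vision-transformer features, the weight `alpha` and the `threshold` are hypothetical parameters, and plain connected components are used as the simplest stand-in for the community detection algorithms of the actual framework.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_adjacency(text_vecs, image_vecs, alpha=0.5, threshold=0.8):
    """Weighted mix of text and image similarity; edges below threshold are dropped."""
    n = len(text_vecs)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = (alpha * cosine(text_vecs[i], text_vecs[j])
                     + (1 - alpha) * cosine(image_vecs[i], image_vecs[j]))
            if score >= threshold:
                adj[i][j] = adj[j][i] = score
    return adj


def duplicate_groups(adj):
    """Connected components with more than one node (stand-in for community detection)."""
    seen, groups = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour, weight in enumerate(adj[node]):
                if weight > 0 and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) > 1:
            groups.append(sorted(component))
    return groups


# Toy features for four documents: 0 and 1 look alike, as do 2 and 3.
text_vecs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
image_vecs = [[0.2, 0.8, 0.0], [0.25, 0.75, 0.0], [0.9, 0.0, 0.1], [0.85, 0.05, 0.1]]
adj = combined_adjacency(text_vecs, image_vecs)
groups = duplicate_groups(adj)
print(groups)
```

Mixing the two similarities before thresholding means that a translated notice with a near-identical page layout can still be linked to its original even when the text similarity alone is low, which is the motivation for balancing the two signals.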
By balancing image and text similarity, the duplicate-detection approach effectively addressed the challenges posed by language and format variations in the underlying PDF files. To facilitate retrieval and exploration of the resulting centralized database, an interactive and user-friendly desktop application, the CORE-MD PMS Tool, was developed with Flutter (a cross-platform framework for building responsive interfaces) as the front-end and MongoDB (a NoSQL database designed for flexible and scalable data management) as the back-end. Users can customize queries by different criteria (e.g., country, manufacturer, device, EMDN level, time interval) and view the tailored results in real time. The usability of the application was preliminarily tested with six potential users through a quantitative assessment based on the System Usability Scale, reaching a score of 82.92; further evaluations by Notified Bodies are underway. To enhance risk assessment, we proposed redefined versions of pharmacovigilance indices, such as the Proportional Reporting Ratio and the Reporting Odds Ratio, and applied them to a subset of the centralized FSN dataset focusing on orthopaedic prostheses. This helped to identify device subcategories and manufacturers with higher risk profiles within the same EMDN level. Comparative analyses with other real-world data sources, such as registries and scientific literature, showed that the sources are complementary, as each can identify unique safety concerns. For instance, 55% of total knee implants were identified by both FSNs and registry data, while each source also detected unique safety signals. Furthermore, 70% of a randomly selected group of hip and knee prostheses had at least one safety signal across the different sources, each providing distinct insights. In a different application scenario focused on implantable pacemakers, literature and FSNs identified overlapping issues as well as source-specific problems.
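For readers unfamiliar with the disproportionality indices named above, the sketch below computes the Proportional Reporting Ratio and the Reporting Odds Ratio from a 2x2 table in their standard pharmacovigilance form. These are the textbook definitions, not the redefined versions proposed in the thesis; the mapping of the table cells to FSN counts and the numbers used are hypothetical.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio: [a / (a + b)] / [c / (c + d)]."""
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting Odds Ratio: (a / b) / (c / d) = (a * d) / (b * c)."""
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined for this table")
    return (a * d) / (b * c)


# Hypothetical counts (assumed mapping to FSNs, for illustration only):
# a: notices for the subcategory of interest reporting the problem of interest
# b: notices for the subcategory of interest reporting other problems
# c: notices for all other subcategories reporting the problem of interest
# d: notices for all other subcategories reporting other problems
a, b, c, d = 20, 80, 50, 850
prr_value = prr(a, b, c, d)
ror_value = ror(a, b, c, d)
print(round(prr_value, 2), round(ror_value, 2))
```

Values well above 1 indicate that the problem is reported disproportionately often for the subcategory of interest compared with the rest of the dataset; in practice such ratios are usually read together with a confidence interval and a minimum case count.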
By leveraging NLP, we demonstrated the potential of IT tools in the medical device regulatory field by developing the scalable and efficient CORE-MD PMS Tool, a unified platform with practical utility for real-world insights. The tool is designed to support key stakeholders, including manufacturers and Expert Panels, by enabling trend visualization and efficient data retrieval. By integrating the derived information with other real-world data sources, such as registries and literature, the CORE-MD PMS Tool can enhance risk assessment, improve early detection of safety signals, and support informed regulatory decision-making. This multi-source, data-driven approach creates a solid foundation for proactive PMS, as well as for future advances toward evidence-based practice and informed decision-making in regulatory science.

File | Size | Format | |
---|---|---|---|
2025_03_Ren.pdf (access only from BNCF and BNCR) | 20.99 MB | Adobe PDF | |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/220071
URN:NBN:IT:POLIMI-220071