Building a Biomedical Knowledge Graph from PubMed Central articles

BELLOMO, Lorenzo
2024

Abstract

Natural Language Processing, Knowledge Graphs, and Artificial Intelligence have deeply impacted multiple disciplines, with biomedicine standing out due to its significant societal impact and overall research effort. Crucially, the vastness of the biomedical field, evident from the overwhelming publication rates on repositories like PubMed, makes it challenging to manually access and synthesize up-to-date research. In response to this growing issue, researchers are increasingly leveraging AI-driven computational tools to navigate biomedical knowledge repositories efficiently.

In this thesis, we focus on the study, design, implementation, and evaluation of platforms aimed at constructing comprehensive and accurate biomedical Knowledge Graphs (KGs). The initial effort culminated in the creation of BioTagME, a large biomedical KG whose nodes represent bio-entities and whose edges represent the strength of their relationships, sourced from ontologies and PubMed abstracts. While BioTagME showed promising results, it also exposed pitfalls and limitations in entity retrieval and in the lack of labels on its edges.

We then introduced OntoTagME, a tailor-made biomedical entity linker designed to overcome some of BioTagME’s limitations. OntoTagME performs high-quality filtering of biomedical entities from Wikidata and then merges its annotations with those of another state-of-the-art tool, PubTator. This integration improves F1 scores over the individual tools when evaluated on two ground-truth datasets of genes and diseases.

We finally combined OntoTagME with state-of-the-art edge extraction and labeling techniques to design NetME, a novel, publicly available platform offering on-the-fly biomedical network construction. Notably, NetME enriches its KG with a Retrieval Augmented Generation (RAG) tool that allows users to generate succinct, contextually relevant, linguistically well-formed summaries about the entities modeled in that graph, thus extending the explainability of the knowledge distilled by our tool. In several evaluations on gold-standard datasets, NetME outperformed known biomedical KGs on the task of gene-disease association discovery.

We believe that this work represents a step forward in the design of AI-driven computational tools for the efficient and easy navigation of the biomedical literature.
22 April 2024
English
Scuola Normale Superiore
Anonymous experts
Files in this item:
Tesi.pdf
Open Access since 23/04/2024
License: All rights reserved
Size: 4.69 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/304281
The NBN code of this thesis is URN:NBN:IT:SNS-304281