Building a Biomedical Knowledge Graph from PubMed Central articles
BELLOMO, Lorenzo
2024
Abstract
Natural Language Processing, Knowledge Graphs, and Artificial Intelligence have deeply impacted multiple disciplines, with biomedicine standing out due to its significant societal impact and overall research effort. Crucially, the vastness of the biomedical field, evident from the overwhelming publication rates on repositories like PubMed, presents challenges in manually accessing and synthesizing up-to-date research. Due to this growing issue, researchers are increasingly leveraging AI-driven computational tools for efficient navigation through various biomedical knowledge repositories.

In this thesis, we focus on the study, design, implementation, and evaluation of platforms aimed at constructing comprehensive and accurate biomedical Knowledge Graphs (KGs). The initial effort culminated in the creation of BioTagME, a large biomedical KG containing nodes representing bio-entities, and edges representing the strength of their relationships, sourced from ontologies and PubMed abstracts. While BioTagME showcased promising results, it highlighted pitfalls and limitations related to the task of entity retrieval and to the lack of labels on its edges.

We then introduced OntoTagME, a tailor-made biomedical entity linker, whose goal was to surpass some of BioTagME's limitations. OntoTagME performs high-quality filtering of biomedical entities from Wikidata and then merges its annotations with those of another state-of-the-art tool, PubTator. This integration obtains a boost in F1 score over each of the individual tools when evaluated on two ground-truth datasets of genes and diseases.

We finally deployed OntoTagME with state-of-the-art edge extraction and labeling techniques to design a novel, sophisticated, publicly available platform offering on-the-fly biomedical network construction capabilities, named NetME. Notably, NetME enriches its KG with a Retrieval Augmented Generation (RAG) tool that allows users to generate succinct, contextually relevant, linguistically well-formed summaries about the entities modeled in that graph, thus extending the explainability capabilities of the knowledge distilled by our tool. Through several rigorous evaluations on gold standard datasets, NetME demonstrated superior performance against known biomedical KGs on the task of gene-disease association discovery.

We believe that this work represents a step forward in the design of AI-driven computational tools for the efficient and easy navigation of the biomedical literature.

File | Size | Format | |
---|---|---|---|
Tesi.pdf (Open Access since 23/04/2024; License: All rights reserved) | 4.69 MB | Adobe PDF | |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/304281
URN:NBN:IT:SNS-304281