Towards the usage of Large Language Models in Information Retrieval and Question Answering task
SIRAGUSA, Irene
2025
Abstract
The Natural Language Processing (NLP) field has benefited greatly from recent advances in Deep Learning (DL), mainly involving transformers and the attention mechanism. These technologies were crucial to achieving outstanding performance in Information Retrieval (IR) and Question Answering (QA) tasks. Encoder-only transformers have obtained promising results in traditional NLP tasks, such as Relation Extraction (RE) and text classification. Language Models (LMs), when combined with suitable networks, have demonstrated superior performance on these tasks through the generation of contextual embeddings. On the other hand, Large Language Models (LLMs) have shown impressive text generation capabilities that have led to a redefinition of NLP tasks. QA with LLMs benefits from the Retrieval Augmented Generation (RAG) approach, through which models can provide complete answers using the given query and its associated context. Despite these promising results, significant problems arise in both the fine-tuning of LLMs and the evaluation of their generated text. Fine-tuning such models becomes impractical due to their size, and novel strategies, such as Low-Rank Adaptation (LoRA) and prompt engineering, were developed to overcome the significant computational resources required. On the other hand, evaluating LLMs is challenging in scenarios where a gold answer is not available. The development of effective evaluation strategies, including automatic methods that employ a superior LLM as a judge, is still an open question in the scientific community. Moreover, progress in LLM systems has led to the development of multimodal models, capable of jointly processing textual and visual inputs. On this basis, even more complex tasks can be performed, such as the textual classification of provided images, obtained through an analysis of the generated text.
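As a minimal illustration of the Low-Rank Adaptation idea mentioned above, a frozen weight matrix W is augmented with a trainable low-rank product B·A, so only r·(d+k) parameters are trained instead of d·k. All dimensions and names below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Hypothetical dimensions: a frozen d x k weight matrix, rank-r adapter.
d, k, r = 512, 512, 8       # r << d, k: the low-rank bottleneck
alpha = 16                  # LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weights
A = rng.standard_normal((r, k)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                     # trainable, initialized to zero

# Effective weights: W stays frozen, only A and B receive gradients.
W_eff = W + (alpha / r) * (B @ A)

# At initialization B = 0, so the adapted model equals the base model.
assert np.allclose(W_eff, W)

# Parameter savings: trainable params vs full fine-tuning.
full = d * k
lora = r * (d + k)
print(f"trainable fraction: {lora / full:.3f}")  # 0.031 for r = 8
```

Because B starts at zero, training begins exactly at the pretrained model and the adapter only gradually perturbs it, which is part of why the method is stable despite the tiny parameter budget.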
This work focuses on the following principal Research Question (RQ): how can NLP techniques, based on transformer models, be used to extract relevant semantic knowledge from unstructured text? Initially, the research objective concerned the use of encoder-only models for classical NLP tasks such as RE and semantic classification. In this context, the main interest was the development of simple architectures based on neural models, in which the capabilities of LMs as feature extractors were investigated and stressed. For the RE task, English-only models were applied to a classical data set for the target task, while the Italian language was the focus of semantic classification in the context of the EVALITA competition for the detection of homotransphobic and hateful content on social networks. As a result of the increasing popularity of generative models in the scientific community, the RQ was redefined accordingly. In particular, an interest in decoder-only models emerged, concerning the use and performance of LLMs in complex tasks involving IR. In this context, tasks such as QA in a target domain using the RAG approach were investigated. In more detail, considering the educational domain, LLMs' capabilities were tested in both stand-alone and chatbot setups, using an ad hoc data set. In these applications, the selected LLM was queried with a custom instruction, the actual query, and the relevant context, so that a correct answer can be generated and the risk of hallucinations is reduced. As multimodal models started being developed, QA and classification applications were built in a multimodal setting for the medical domain. After properly arranging a multimodal data set of clinical cases, a RAG-based pipeline was built for a multimodal setup. In this application, the retrieved context serves to build a suitable prompt that exploits the in-context learning capabilities of the chosen LLM.
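The RAG-style prompt assembly described above — a custom instruction, the retrieved context, and the actual query concatenated before generation — can be sketched as follows; the function name, prompt layout, and example passages are illustrative assumptions, not the thesis's actual templates.

```python
def build_rag_prompt(instruction: str, query: str, contexts: list[str]) -> str:
    """Assemble an instruction + retrieved-context + query prompt for an LLM."""
    # Number each retrieved passage so the model can ground its answer.
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        f"{instruction}\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Example usage with a toy retrieved passage.
prompt = build_rag_prompt(
    instruction="Answer using only the context below.",
    query="What is the capital of Sicily?",
    contexts=["Palermo is the capital of the Italian region of Sicily."],
)
print(prompt)
```

Constraining the instruction to the retrieved passages is what reduces the hallucination risk: the model is asked to ground its answer in the supplied context rather than in parametric memory alone.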
The objective is to generate the correct prediction for the scanning modality and the body part shown in the given clinical case. In addition, a Knowledge Graph (KG) was automatically created using an LLM prompted to, essentially, perform an RE task in a generative manner. The obtained KG serves as an external knowledge source, and its use is explored in the context of a hybrid retrieval approach within the previously built RAG-based architecture. In this case, the model was asked to generate a disease hypothesis for the given clinical case. To summarize the research objective presented in this work, the main problem of semantic IR in the NLP field was explored from two points of view: with traditional NLP techniques and models, such as encoder-only LMs, and with the most recently developed LLMs. The use of such models, after the release of ChatGPT, cannot be ignored, since it marked a paradigm shift in the NLP field.
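The generative relation-extraction step used to build the KG can be sketched as parsing LLM-emitted (subject, relation, object) triples into a simple adjacency-list graph; the triple line format, the function names, and the toy clinical output below are assumptions for illustration, not the thesis's actual schema.

```python
from collections import defaultdict

def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse lines of the form 'subject | relation | object' emitted by an LLM."""
    triples = []
    for line in llm_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):  # skip malformed generations
            triples.append(tuple(parts))
    return triples

def build_kg(triples: list[tuple[str, str, str]]) -> dict:
    """Adjacency-list knowledge graph: subject -> [(relation, object), ...]."""
    kg = defaultdict(list)
    for s, r, o in triples:
        kg[s].append((r, o))
    return dict(kg)

# Toy LLM output for a clinical case.
output = "pneumonia | affects | lung\nCT scan | images | lung"
kg = build_kg(parse_triples(output))
print(kg)  # {'pneumonia': [('affects', 'lung')], 'CT scan': [('images', 'lung')]}
```

In a hybrid retrieval setup, such a graph can be queried alongside dense passage retrieval, with the matched triples merged into the prompt as additional structured context.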
File | Size | Format
---|---|---
Tesi_PhD_Siragusa_compressed.pdf (open access) | 9.61 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/215218
URN:NBN:IT:UNIPA-215218