Using Large Language Models in Low-Data Domains: An Industrial Perspective

Ul Haq, Muhammad Uzair

In recent years, there has been a significant shift toward online job postings and recruitment portals, which let candidates easily upload their data and documents, such as resumes and CVs, for specific job vacancies. These platforms have streamlined the application process for candidates but have also made screening more time-consuming and labor-intensive for recruiters: for a single job advertisement, the Human Resources (HR) department may receive a large number of applications. Natural Language Processing (NLP) tools have the potential to alleviate this burden, saving valuable HR resources by automating parts of the recruitment process. Specifically, automatic information extraction from text can speed up recruiters’ work by rapidly identifying relevant candidate information, such as personal details, work experience, and education. Soft skills are another critical component that recruiters assess when screening candidates for a particular job profile. A job profile typically includes desired attributes such as educational background, technical and soft skills, and past roles held. By leveraging NLP, recruiters can more effectively match candidates against these attributes to find the ideal candidate for each position.

The primary objective of this thesis is to investigate and improve existing NLP approaches to meet the needs of the industry. We identify several key challenges in this domain, including scarcity of training data, complex information extraction requirements, and a lack of standardized approaches.

To address these challenges, we have enhanced data augmentation techniques for complex information scenarios, particularly in low-resource contexts within Human Resource Management. By leveraging advanced NLP techniques, we generated synthetic data that mirrors the structure and nuances of real-world HR datasets, enriching the data where actual examples are limited. This approach not only increased the diversity and volume of training data but also improved the robustness of downstream models in handling complex and varied inputs. Additionally, we used recent Large Language Models (LLMs), such as GPT, to automate data annotation tasks, enabling faster and more accurate labeling of HR-specific information. This integration of LLMs has streamlined the annotation process, providing high-quality labeled datasets with minimal manual effort and making it feasible to train more sophisticated models in low-resource domains.

Degree programme: BRAIN, MIND AND COMPUTER SCIENCE
Publication date: 26 March 2025
Language: English
Supervisor, Advisor, or Tutor: SPERDUTI, ALESSANDRO
Publisher: Università degli Studi di Padova
Collection: Università degli Studi di Padova

Files in this record:

File	Access	License	Size	Format
PhD_Dissertation.pdf	Open access	All rights reserved	5.39 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/202136

The NBN code of this thesis is URN:NBN:IT:UNIPD-202136