Classification of cancer pathology reports with Deep Learning methods

Martina, Stefano

Natural Language Processing (NLP) is a discipline concerned with the design of methods that process text. Deep learning, and Machine Learning (ML) in general, is the discipline that studies and implements methods that learn to make predictions from data. In recent years, many different ML methods have been presented in the context of NLP. In this work we focused in particular on text classification methods. Cancer registries collect pathology reports from clinical data sources and combine them with administrative data sources to identify cancer diagnoses in a specific area. Here we present a large-scale study of deep learning methods applied to cancer pathology reports written in Italian. In this study we developed several classifiers to predict topography and morphology ICD-O codes. We compared classic machine learning approaches, i.e. Support Vector Machine (SVM), with recent deep learning techniques, i.e. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Furthermore, we compared recent attention-based and hierarchical techniques, e.g. Bidirectional Encoder Representations from Transformers (BERT), with a simpler hard attention method, showing that the latter is sufficient to perform slightly better in this specific domain.

2020

Publication date: 2020
Language: English
Supervisor, Advisor or Tutor: Paolo Frasconi
Publisher: Università degli Studi di Firenze
Collection: Università degli Studi di Firenze

Files in this record:

There are no files associated with this record.

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/153117

The NBN code of this thesis is URN:NBN:IT:UNIFI-153117