SAPIENT: Semantic and Automatic Processing of Information about Environment

Ammaturo, Eleonora

The aim of the project, carried out in collaboration with the company Expert.ai, is to implement a new system called Sapient, which stands for ‘Semantic and Automatic Processing of Information about Environment’. The project grew out of the need to process large numbers of complex documents, i.e. documents that convey information through multiple kinds of objects such as graphs and tables. To this end, complex documents must be segmented into homogeneous areas, i.e. the different parts that make up a document must be identified together with their locations. Such a system is one component of a broader pipeline for recognising the characters of a document, known as Optical Character Recognition (OCR). Analysing the structure of a document by classifying it into its components, such as title, figures, tables and main text, is the main objective of the project; in the literature this task is known as Document Layout Analysis (DLA). The project belongs to computer vision, and specifically to pattern recognition, inasmuch as documents are generally in PDF format and are therefore closer to images than to text documents. The objective is not only to classify and locate the components of a document, but also to segment each component so that it can be extracted in an orderly manner. Semantic segmentation therefore appears to be the most suitable approach: the problem is not just one of object detection, i.e. identifying and localising the document components within an image, but also one of classifying the image pixel by pixel. The classification pipeline is initially divided into two consecutive steps: layout analysis and text-only analysis.
For the first step, an end-to-end Convolutional Neural Network (CNN) with dilated convolutions is used; for the second, an end-to-end multiscale CNN is used, and a heuristic based on mathematical morphology is also defined for the same purpose. Finally, all classes are segmented simultaneously by means of a further end-to-end CNN model. The final classification segments both the text and the non-text parts of the document: layout analysis separates text regions, tables and images, while text-only analysis distinguishes title, authors, abstract, paragraphs and their titles, headers, footers, notes, captions and lists. The same classes are used in the simultaneous segmentation of text and non-text components. A comparison with the extensive literature available shows how this system offers an alternative overall model for DLA.
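
The distinction the abstract draws between object detection and pixel-by-pixel classification can be illustrated with a toy example. The sketch below is not taken from the thesis: the class ids, the helper names `bounding_boxes` and `pixel_counts`, and the tiny hand-written label map are all assumptions made for illustration, and the code assumes a single region per class. It shows that a per-class bounding box (the kind of output a detector gives) can swallow a neighbouring component, whereas a pixel-wise label map keeps the two apart.

```python
# Hypothetical class ids; the thesis does not specify this encoding.
BACKGROUND, TEXT, TABLE = 0, 1, 2


def bounding_boxes(label_map):
    """Return {class_id: (top, left, bottom, right)} for each non-background class.

    label_map is a list of rows of per-pixel class ids, i.e. the kind of
    output a semantic segmentation model produces. One box per class is
    computed, which assumes each class forms a single region.
    """
    boxes = {}
    for r, row in enumerate(label_map):
        for c, cls in enumerate(row):
            if cls == BACKGROUND:
                continue
            if cls not in boxes:
                boxes[cls] = (r, c, r, c)
            else:
                top, left, bottom, right = boxes[cls]
                boxes[cls] = (min(top, r), min(left, c),
                              max(bottom, r), max(right, c))
    return boxes


def pixel_counts(label_map):
    """Count how many pixels carry each class id."""
    counts = {}
    for row in label_map:
        for cls in row:
            counts[cls] = counts.get(cls, 0) + 1
    return counts


# Toy 4x6 "page": an L-shaped text block wrapped around a table.
page = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 2, 2, 2, 0],
    [1, 1, 2, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
]

boxes = bounding_boxes(page)
counts = pixel_counts(page)

top, left, bottom, right = boxes[TEXT]
text_box_area = (bottom - top + 1) * (right - left + 1)
print("text box:", boxes[TEXT], "area:", text_box_area)
print("table box:", boxes[TABLE])
print("text pixels in mask:", counts[TEXT])
```

In this toy page the text bounding box spans 15 pixels and fully contains the table box, while the mask assigns only 9 pixels to text. Cropping by box would therefore mix table content into the text region; cropping by mask would not.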

2025


Faculty/Department: DIPARTIMENTO DI SCIENZE DI BASE ED APPLICATE PER L'INGEGNERIA
Course of study: Modelli matematici per l'ingegneria, elettromagnetismo e nanoscienze
Publication date: 25 Sep 2025
Language: English
Supervisor, Advisor or Tutor: Vitulano, Domenico
Co-supervisor, Co-tutor or Coordinators: GIACOMELLI, Lorenzo
Publisher: Università degli Studi di Roma "La Sapienza"
Number of pages: 97
Collection: Università degli Studi di Roma La Sapienza

Files in this item:

File	Size	Format
Tesi_dottorato_Ammaturo.pdf (open access; licence: all rights reserved)	10.45 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/312569

The NBN code of this thesis is URN:NBN:IT:UNIROMA1-312569