META LEARNING IN PROCESS MINING: TOWARD A SYSTEMATIC APPROACH TO DESIGN DATA ANALYTICS PIPELINES WITH EVENT LOGS

MARQUES TAVARES, Gabriel

With the democratization of computational resources, organizations pay close attention to recording the execution of internal procedures in order to improve the quality of their services. Modern information systems track and record in-depth data on the activities performed within business processes. This event data describes actual process performance, along with several possible attributes. Process mining is a set of techniques that extract insights from event data. Organizations can therefore employ process mining-based approaches to understand process behavior, increase the value of services, save resources, and improve execution time. Given the multitude of tasks within process mining and the plethora of algorithms and solutions, deciding which methods to apply is a complex effort. This holds even though process mining techniques have now reached the maturity needed to cover the entire data science pipeline, from raw data to decisions. On the one hand, stakeholders hold business and domain knowledge. On the other hand, they often lack the technical expertise to guide these choices. Moreover, many tasks require a combination of several steps, i.e., a pipeline. Designing a suitable pipeline thus becomes a complex task, made harder by the fact that domain experts and technical experts are often not the same people. In this thesis, we propose a task-agnostic framework to automate the design of process mining pipelines. Considering that no single pipeline is optimal for every observable phenomenon, we start from the hypothesis that process behavior might indicate which steps or algorithms are better suited. To this end, we rely on a meta-learning approach that maps the relationships between event data and suitable solutions. Applying the proposed framework yields two main contributions.
First, given a business process (event log) and a task (e.g., process discovery, trace clustering, anomaly detection), a user can retrieve a pipeline recommendation that best matches the underlying process behavior. Second, the framework produces a systematic mapping of the relationship between event log characteristics and optimal pipelines. This mapping gives experts and data analysts a solid foundation for understanding the task at hand; that is, it sheds light on the correlation between the problem space (event data), the algorithm space (process mining pipelines), and the performance space (quality criteria). We instantiated the framework in three different process mining tasks. Results indicate that there is indeed a relationship between the different spaces, since guided recommendations outperform the current baselines. This shows the importance of investigating guided solutions and suggests that mapping the spaces can be of interest to organizations. Moreover, we investigate which process features are most decisive for each problem. The presented solution is also suitable for users with different levels of expertise: an inexperienced user receives data-based pipeline recommendations, whereas an expert is provided with a quantitative mapping that can be used to deepen knowledge of the process task.
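The general idea of recommending a pipeline from event log characteristics can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the three meta-features, the nearest-neighbor lookup, the toy logs, and the labels `pipeline_A` and `pipeline_B` are all assumptions made purely for illustration.

```python
from collections import Counter
from math import dist

# Assumption: an event log is a list of traces, each trace a list of activity names.


def meta_features(log):
    """Describe a log with three illustrative (hypothetical) meta-features."""
    n_traces = len(log)
    activities = {activity for trace in log for activity in trace}
    variants = {tuple(trace) for trace in log}
    avg_length = sum(len(trace) for trace in log) / n_traces
    # (number of activities, ratio of distinct variants, average trace length)
    return (len(activities), len(variants) / n_traces, avg_length)


def recommend(log, meta_db, k=1):
    """Return the pipeline label most common among the k nearest known logs.

    meta_db is a list of (meta_features, best_pipeline_label) pairs.
    Features are left unscaled here to keep the sketch short.
    """
    target = meta_features(log)
    nearest = sorted(meta_db, key=lambda entry: dist(entry[0], target))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy logs whose "best" pipelines are assumed to be known from prior experiments.
structured_log = [["a", "b", "c"]] * 4
chaotic_log = [
    ["a", "b", "c", "d"],
    ["d", "c", "a"],
    ["b", "a", "d", "c", "b"],
    ["c", "d"],
]
meta_db = [
    (meta_features(structured_log), "pipeline_A"),  # hypothetical label
    (meta_features(chaotic_log), "pipeline_B"),     # hypothetical label
]

# A new, mostly regular log is matched to the structured log's pipeline.
new_log = [["a", "b", "c"], ["a", "b", "c"], ["a", "c", "b"]]
recommendation = recommend(new_log, meta_db)
```

In this sketch the meta-database plays the role of the mapping between the problem space (log features) and the algorithm space (pipeline labels); the performance space is only implicit in how the stored labels were chosen.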

2023


Faculty/Department: Dipartimento di Informatica Giovanni Degli Antoni
Degree program: Informatica (Computer Science)
Publication date: 27 April 2023
Language: English
Supervisor, Advisor, or Tutor: CERAVOLO, PAOLO
Co-Supervisor, Co-Tutor, or Coordinator: SASSI, ROBERTO
Publisher: Università degli Studi di Milano
Collection: Università degli Studi di Milano

Files in this item:

File	Size	Format
phd_unimi_R12636.pdf (open access)	4.03 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/82105

The NBN code of this thesis is URN:NBN:IT:UNIMI-82105