Machine learning for Network Intrusion Detection

Abaimov, Stanislav

Rapidly advancing cyber technologies have assisted threat actors in offensive cyber operations since the creation of computers, computer networks and computerized control systems. Exponentially evolving infiltration techniques and publicly available hacking tools facilitate the implementation of attacks and increase their variability. Even when AI-empowered, modern cyber defence software does not provide ultimate protection, and innovative multidisciplinary solutions are required to enhance cyber safety and security globally, especially for strategic Chemical, Biological, Radiological, Nuclear and explosives (CBRNe) infrastructure. The thesis proposes innovative solutions to enhance network security by strengthening network intrusion detection systems. The area of study covers network engineering, cyber security, secure programming, artificial neural networks, and industrial control systems. The research methodology uses deep neural networks as the primary tool for the analysis and detection of malicious network attacks. The first chapter presents a survey that explores the existing knowledge and selected practices in data preprocessing in the context of Network Intrusion Detection using a Machine Learning approach. Preprocessing of training data is predominantly used to reduce the complexity of the input data and to improve the accuracy and precision of the final system by aiding the neural network. We review the difference between data pretreatment and preparation for training, on the one hand, and data preprocessing for improving the training process and output accuracy, on the other. To visually present the findings and the suggested classification of data preprocessing methods, the research explores code injection attacks as one specific case in network security. The second chapter focuses on Code Injection attacks, such as SQL Injection and Cross-Site Scripting (XSS), which are among the major threats to today's web applications and systems.
The thesis proposes Code-injection Detection with Deep Learning (CODDLE), a Deep Learning-based intrusion detection system against web-based code injection attacks. CODDLE's main novelty consists in adopting a Convolutional Deep Neural Network (CNN) and in improving its effectiveness via a tailored preprocessing stage which encodes SQL/XSS-related symbols into type/value pairs. Numerical experiments performed on real-world datasets for both SQL and XSS attacks show that, with identical training and the same neural network shape, CODDLE's type/value encoding improves the detection rate from a baseline of about 75% up to 95% accuracy, 99% precision, and 92% recall. The third chapter is dedicated to the training of Deep Neural Networks (DNNs). We study the problem of memory degradation in an intrusion detection implementation using the CICIDS2017 dataset. We observe different neural network training configurations and statistics. The research reveals that with any new training the effect of minor memory degradation persists. It also reveals that, out of numerous attack vectors, a few specific types of attacks cause regression faults. The fourth chapter investigates ways to use multiple DNNs in order to transcend the limitations of a single DNN. A single neural network may not always be sufficient, may not train beyond a certain accuracy, may be uncertain about many patterns, or may simply take too much computational power to train. Using multiple neural networks to identify gradually more complex tasks, while reducing the input data set and changing the threshold of the post-processing identification values, allows for cyber attack detection. We test the method for intrusion detection using the CICIDS2017 dataset. Our method achieves 99.1-99.6% accuracy with an ensemble of five DNNs, each below 96% accuracy. The research revealed a few limitations of the suggested methods.
Specifically, the CODDLE preprocessing is irreversible in most cases and produces collisions if the injection patterns are similar. The Memory Regression Fault cannot be addressed by removing or adding samples of a specific attack type or by changing the shape of the classifier. Neural Network Ensembles require additional computational resources for training and may not be optimal for specific applications due to the time invested in training additional neural networks. The research encounters the issue of dataset scarcity, which is addressed with different types of preprocessing and with neural network ensembling. The thesis also suggests selected directions for potential future research, e.g. vulnerable code detection using DNNs, the vulnerability of preprocessing algorithms to code injection, and various types of DNNs in ensembles for Network Intrusion Detection against specific types of attacks. Additional research on generative adversarial approaches to dataset expansion in the context of network security could contribute to the body of knowledge. The research concludes that Machine Learning methods make a high contribution to the area of network intrusion detection.
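To make the idea of encoding SQL/XSS-related symbols into type/value pairs more concrete, the following is a minimal Python sketch. It is not the CODDLE implementation: the abstract does not give the actual token classes or value tables, so the keyword lists, the `TYPE_*` constants, the tokenizing regular expression and the `encode` function are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical vocabularies and type codes (assumptions, not taken from the thesis).
SQL_KEYWORDS = ["select", "union", "insert", "drop", "or", "and", "from", "where"]
XSS_KEYWORDS = ["script", "onerror", "onload", "alert", "iframe"]
TYPE_SQL, TYPE_XSS, TYPE_PUNCT, TYPE_NUMBER, TYPE_TEXT = 1, 2, 3, 4, 5

# Split a payload into words, digit runs, and single non-space symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]+|\d+|\S")


def encode(payload: str) -> list[tuple[int, int]]:
    """Map each token of a request string to a (type, value) pair."""
    pairs = []
    for token in TOKEN_RE.findall(payload.lower()):
        if token in SQL_KEYWORDS:
            pairs.append((TYPE_SQL, SQL_KEYWORDS.index(token)))
        elif token in XSS_KEYWORDS:
            pairs.append((TYPE_XSS, XSS_KEYWORDS.index(token)))
        elif token.isdigit():
            # All numbers collapse to one value: the concrete digits are lost.
            pairs.append((TYPE_NUMBER, 0))
        elif token.isalpha() or "_" in token:
            # Any other word collapses to a generic text token.
            pairs.append((TYPE_TEXT, 0))
        else:
            # Punctuation keeps its code point as the value.
            pairs.append((TYPE_PUNCT, ord(token)))
    return pairs


if __name__ == "__main__":
    print(encode("' OR 1=1 --"))
    # Two different payloads with a similar pattern get the same encoding.
    print(encode("' OR 1=1 --") == encode("' or 2=2 --"))
```

In this sketch the collapse of numbers and ordinary words into single values is a deliberate simplification. It also illustrates, under these assumptions, why an encoding of this kind can be irreversible and can produce collisions for similar injection patterns, as noted among the limitations above.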

2020


Degree programme: Electronic Engineering
Publication date: 2020
Language: English
Co-supervisor, co-examiner, co-tutor or coordinators: BIANCHI, GIUSEPPE
Publisher: Università degli Studi di Roma "Tor Vergata"
Collection: Università degli Studi di Roma Tor Vergata

Files in this item:

File	Size	Format
ABAIMOV_Stanislav_2020_03_10_Thesis.pdf (access only from BNCF and BNCR)	1.4 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/199463

The NBN code of this thesis is URN:NBN:IT:UNIROMA2-199463