Application of language models on code analysis
CONSOLE, FRANCESCA
2024
Abstract
Code analysis is central to improving software quality and efficiency, and it becomes even more important when securing code against potential cyber-attacks. However, manual analysis of code, especially binary code, is complicated and error-prone. The investigation of new automatic techniques for code analysis is therefore a research topic of great interest. According to the "naturalness hypothesis", code exhibits statistical properties similar to those of natural languages. As a consequence, techniques used for natural language processing can also be applied to analyze source and binary code. For this reason, recent research applies neural language models to code analysis, achieving significant results. In line with this research trend, the two contributions of this thesis focus on the application of deep learning to the analysis of code written in high-level and low-level programming languages. The first contribution introduces a benchmark designed to evaluate models for binary code representation. The benchmark can be used to test and compare the performance of these models on various binary function tasks. The second contribution focuses on the application of neural networks to source code analysis. It investigates the use of neural language models for detecting code smells, which represent poor design choices that may affect code quality.

File | Size | Format | |
---|---|---|---|
Tesi_dottorato_Console.pdf (open access) | 3.08 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/164463
URN:NBN:IT:UNIROMA1-164463