Mining Git based Software Repositories

ROVEDA, GIANLUCA
2018

Abstract

This thesis examines methods for analyzing Git-based software repositories, with a focus on mining repositories hosted on GitHub. The introduction summarizes the history and usage of version control systems (VCS) and details the interactions between users, Git, and GitHub. The “related works” chapter presents the methods other researchers have used to mine Git data. Four datasets are used: the historical MSR14 dataset and three others collected ad hoc. The datasets are analyzed and compared, both to gain a deeper understanding of the data and to validate the three “new” datasets against the already well-studied MSR14. The findings include considerations on how to identify the author of a commit. The main objective of the thesis is to present a graphical approach to analyzing the interactions between different repositories through their users. The results are shown to the researcher as an animated network graph in Gephi. Several examples are used to investigate the performance and capabilities of the approach, and their results are compared with expert knowledge of the repositories.
1 March 2018
English
FACCHINETTI, TULLIO
Università degli studi di Pavia
Files in this item:
PhD thesis Roveda_revised.pdf (open access, 6.24 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/84432
The NBN code of this thesis is URN:NBN:IT:UNIPV-84432