The Web as a Historical Corpus: Collecting, Analysing and Selecting Sources on the Recent Past of Academic Institutions

2017

Abstract

The goal of this thesis is to understand the impact that the transition from analogue to born-digital sources will have on the way historians collect, analyse and select primary evidence. In particular, the thesis addresses the simultaneous scarcity and abundance of digital materials, and deals with both issues by combining the historical method with methodologies from the fields of internet studies and natural language processing. The case study focuses on collecting sources on the recent past of Italian academic institutions, with specific attention to the University of Bologna. The dissertation is organised in three main parts. Part I offers an extensive overview of the academic background in which this thesis is situated. Next, the so-called scarcity issue is addressed by considering university websites as primary sources for the study of the recent past of academic institutions. Part II presents how the digital past of the University of Bologna has been reconstructed by combining traditional sources and methods with solutions from the field of internet studies. The collected resources made it possible to address the second issue, namely the abundance of born-digital sources. Part III focuses on collecting, analysing and selecting materials from large collections of academic publications. In particular, it stresses the importance of adopting methods from the field of natural language processing in a highly critical way. This point is illustrated by a case study focused on identifying interdisciplinary collaborations through the analysis of a corpus of Ph.D. dissertations. Based on the case studies presented, the final part of the dissertation describes how this work intends to contribute to research in both digital humanities and historiography.
it
Files in this record:
File Size Format
nanni_federico_tesi.pdf

access only from BNCF and BNCR

Type: Other attached material
License: All rights reserved
Size: 13.26 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/322391
The NBN code of this thesis is URN:NBN:IT:BNCF-322391