Classification of land cover in urban and periurban regions using supervised machine learning algorithms

Cecili, Giulia

Accurate classification of different land cover (LC) types is considered a prerequisite for effective and well-informed land monitoring. This process plays an essential role in understanding environmental dynamics and analysing potential anthropogenic disturbances to ecosystems. In particular, urban and peri-urban areas require detailed and timely monitoring, as they are critical nodes of human activity. Despite considerable progress in the acquisition and processing of near-real-time remote sensing imagery, production times for land cover products are still long, which highlights the urgent need for more timely and efficient solutions. The current technological environment, driven by significant advances in Machine Learning (ML), has opened up new possibilities for producing LC data at high spatial resolution and at a higher temporal cadence than in the past. The automation of machine learning is emerging as a promising avenue, enabling in-depth analysis with significant savings in time and energy, although the initial resources required can be substantial. The research aims to develop an efficient methodology for processing large volumes of data and generating information layers with high spatial resolution and a high update rate. It is intended to demonstrate that the latest ML techniques, particularly Deep Learning, can be highly effective tools for delivering the speed, accuracy, and processing power that LC data require, while keeping these resources accessible to a wide audience. In the first phase of the research, an in-depth analysis of the existing land cover and land use products for the Italian territory was carried out. At national and European level, the products of ISPRA and the Copernicus Land Monitoring Service stand out, each focusing on different themes. This work contributes significantly to a deeper understanding of spatial dynamics and related environmental challenges. 
In contrast, the global products of Google and ESRI were introduced only recently and are characterised by their ML-based development. While they show considerable potential, they also have several limitations, including resource requirements that are demanding in several respects. The overall analysis revealed a wide range of techniques in use, underlining the complexity and richness of the context studied. In parallel, a comprehensive review of the main ML models for remote sensing applications was carried out, highlighting the considerable variety of approaches and the dynamic evolution of the field; the basic characteristics, advantages and limitations of these models were analysed in detail. Because these technologies have only recently been introduced in the field of land cover, it is difficult to identify a common thread among existing studies, which reveals the lack of a standardised approach. In addition to traditional algorithms such as Random Forest, Support Vector Machine and K-Nearest Neighbour, more efficient models have recently emerged. The most notable of these are variants of artificial neural networks, with Convolutional Neural Networks (CNNs) as the leading example in image analysis. To address this lack of standardisation, a versatile, cost-effective, and computationally efficient methodology was developed. It was built gradually, through the exploration of different CNNs and data processing techniques, and is designed to align with both national and European activities. Initially, the method adopted a classification system based on the semantics defined by ISPRA for the study of land consumption. An extended version was subsequently implemented to include a greater variety of LC classes, following the European specifications of the EAGLE group. These classes include artificial abiotic surfaces, natural abiotic surfaces, woody vegetation, herbaceous vegetation, and water surfaces. 
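Adapting a reference map to such a legend is essentially a lookup from the source map's codes to the target classes. The following is a minimal Python sketch under stated assumptions: the five target classes are the ones listed above, while the source codes (`111`, `310`, and so on) and the names `SOURCE_TO_LC`, `NODATA` and `reclassify` are hypothetical placeholders, not the actual ISPRA or Urban Atlas legend.

```python
# The five LC classes listed in the text.
LC_CLASSES = {
    1: "artificial abiotic surfaces",
    2: "natural abiotic surfaces",
    3: "woody vegetation",
    4: "herbaceous vegetation",
    5: "water surfaces",
}

# HYPOTHETICAL lookup: placeholder source-map codes, NOT the real ISPRA or
# Urban Atlas legend. Replace with the codes of the reference map in use.
SOURCE_TO_LC = {
    111: 1,  # e.g. buildings
    112: 1,  # e.g. paved surfaces
    210: 2,  # e.g. bare soil or rock
    310: 3,  # e.g. trees and shrubs
    320: 4,  # e.g. grassland
    410: 5,  # e.g. water bodies
}

NODATA = 0  # assigned to codes with no mapping, so they can be excluded from training


def reclassify(labels):
    """Map every source code in a 2-D label grid (list of rows) to a target LC class."""
    return [[SOURCE_TO_LC.get(code, NODATA) for code in row] for row in labels]
```

For example, `reclassify([[111, 310], [999, 410]])` returns `[[1, 3], [0, 5]]`: the unmapped code `999` becomes `NODATA` rather than being forced into a class.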
A distinctive feature of the research is its use of freely available data for training, i.e. for the reference dataset from which the CNNs learn to identify the various land cover types. For this purpose, widely recognised maps such as the ISPRA Land Consumption Map and the Copernicus Urban Atlas were employed. The methodology is described in detail in two articles published in the journal 'Land', while a third article has been submitted to the same journal; the content of these articles has been integrated into the thesis. In addition, part of the study is documented in the various editions of the ISPRA reports entitled ‘Land Consumption, territorial dynamics and ecosystem services’. The first case study mapped consumed land in some areas of the Tuscany region, in the areas of Siena, Arezzo and Florence. This analysis applied the CNN model ResNet50, selected on the basis of a careful literature review, to high-resolution aerial images. Both binary and multi-class experiments were carried out, and the binary approach clearly performed better, achieving accuracies of up to 98%. The method showed considerable potential, in some cases identifying permanently consumed land more effectively than the ISPRA National Land Consumption Map, especially in urban and peri-urban areas. This preliminary study was the subject of a first publication in the MDPI journal Land and established the basic criteria for further development. The next phase of the project aimed to classify Copernicus Sentinel-2 satellite imagery to produce land cover maps for the cities of Rome and Pescara. Both single-date and multi-temporal approaches were used, taking advantage of the short revisit time of the Sentinel-2 satellites. The LC map of Rome was the focus of a paper published in the journal 'Land'. For this study, only minimal pre-processing of the data was carried out. 
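The accuracy figures quoted in this abstract are overall accuracies, i.e. the share of validation samples whose predicted class matches the reference class. A minimal pure-Python sketch of how this figure, together with per-class producer's accuracy, can be derived from a confusion matrix; the function names and toy labels are illustrative and not taken from the thesis.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Build an n x n matrix: rows are reference classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def producers_accuracy(cm):
    """Per-class accuracy with respect to the reference: diagonal count / row total."""
    return [cm[i][i] / sum(cm[i]) if sum(cm[i]) else float("nan")
            for i in range(len(cm))]
```

With binary labels (0 = not consumed, 1 = consumed), `y_true = [1, 1, 0, 0, 1, 0, 1, 0]` and `y_pred = [1, 1, 0, 0, 1, 0, 0, 0]` give `cm = [[4, 0], [1, 3]]`, an overall accuracy of 0.875 and producer's accuracies of `[1.0, 0.75]`. The per-class values show what a single overall figure can hide.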
The satellite images were subjected to principal component analysis (PCA), and the previously reclassified ISPRA Land Consumption Map for the city of Rome was used as training data. In the mapping phase, three sophisticated and widely studied CNN architectures were compared: ResNet50, chosen to ensure continuity and to provide a term of comparison with previous work, together with VGG16 and DenseNet121. VGG16 proved the most promising model with both single-date and multi-temporal images. The trained model achieved an overall accuracy of up to 76% and was used to automatically generate an EAGLE-compliant land cover map of Rome for the year 2019. The results are promising overall, especially in vegetated areas, suggesting significant potential for monitoring urban green spaces. Subsequent experiments were conducted on Pescara, adding the NDVI and NDWI spectral indices to the input data. This new methodology took a different approach from the previous one: a CNN with a simple structure was used, and special attention was given to data fitting, selection and cleaning techniques. The improvements included a stepwise selection of Sentinel-2 bands, the removal of uncertain and noisy elements, and class-weight balancing. In addition, the training dataset was generated by merging the ISPRA Land Consumption Map with the Urban Atlas. The 2021 results for the city of Pescara confirm the accuracy achieved in the Rome study, with an overall accuracy of 75%, and show improved identification of artificial abiotic surfaces. The results of the first two studies show that advanced algorithms can perform well if their parameters are appropriately optimised, reducing the need for intensive pre-processing of input data. However, neural networks with complex structures require considerable computational resources, which may not be sustainable. 
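The two spectral indices and the PCA step can be sketched with NumPy as follows. This is a minimal illustration, not the thesis code: it assumes Sentinel-2 bands B3 (green), B4 (red) and B8 (near infrared) as co-registered reflectance arrays, uses the McFeeters form of NDWI (green vs. NIR), which may differ from the variant actually used, and computes PCA by eigen-decomposition of the band covariance matrix. `normalized_difference`, `spectral_indices` and `pca_bands` are hypothetical names.

```python
import numpy as np


def normalized_difference(a, b):
    """Pixel-wise (a - b) / (a + b); pixels where a + b == 0 are set to 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    num = a - b
    den = a + b
    return np.divide(num, den, out=np.zeros_like(num), where=den != 0)


def spectral_indices(b3_green, b4_red, b8_nir):
    """Return (NDVI, NDWI) from Sentinel-2 reflectance bands B3, B4, B8."""
    ndvi = normalized_difference(b8_nir, b4_red)
    # Assumption: McFeeters NDWI (green vs NIR); other variants use NIR vs SWIR (B11).
    ndwi = normalized_difference(b3_green, b8_nir)
    return ndvi, ndwi


def pca_bands(stack, n_components):
    """Project a (bands, rows, cols) stack onto its leading principal components.

    Each pixel is one observation with `bands` features; the output has
    shape (n_components, rows, cols), ordered by decreasing explained variance.
    """
    n_bands, rows, cols = stack.shape
    X = stack.reshape(n_bands, -1).T.astype(np.float64)  # (pixels, bands)
    X -= X.mean(axis=0)                                   # centre each band
    cov = np.cov(X, rowvar=False)                         # (bands, bands) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]      # keep the largest ones
    scores = X @ eigvecs[:, order]                        # (pixels, n_components)
    return scores.T.reshape(n_components, rows, cols)
```

For a multi-temporal input, single-date band stacks of identical shape can be concatenated along the band axis, e.g. `np.concatenate([stack_t1, stack_t2], axis=0)`, before calling `pca_bands`; the index layers can be appended to the stack in the same way.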
The Pescara study, on the other hand, represents a significant advance, highlighting the importance of paying particular attention to the data used. Careful data handling plays a crucial role in training a neural network accurately, even one with a simple structure, and yields good results not only in accuracy but also in training time and in the efficient use of computational and energy resources. Overall, the results of all three studies are promising, but further experiments will be needed to confirm and refine them.
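Class-weight balancing, one of the data-oriented improvements mentioned for the Pescara study, is commonly implemented with inverse-frequency weights. The sketch below uses the same formula as scikit-learn's `class_weight="balanced"` heuristic, n_samples / (n_classes * count); whether the thesis used exactly this weighting is an assumption, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count_of_class).

    Rare classes get larger weights, so every class contributes roughly
    equally to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}
```

For example, `balanced_class_weights([1]*6 + [3]*2 + [5]*4)` gives weights of about 0.67, 2.0 and 1.0 for classes 1, 3 and 5. A dictionary of this form, keyed by class index, is what the `class_weight` argument of Keras `Model.fit` expects.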



Department: DIPARTIMENTO DI BIOSCIENZE E TERRITORIO
Degree programme: Dottorato di Ricerca in Ecologia e Territorio (PhD programme in Ecology and Territory)
Publication date: 6 June 2024
Language: English
Supervisor: MARCHETTI, Marco
Co-supervisor, co-tutor or coordinator: LASSERRE, Bruno
Publisher: Università degli studi del Molise
Collection: Università degli Studi del Molise

Files in this record:

File	Size	Format
Tesi_G_Cecili.pdf (open access)	6.35 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/189272

The NBN code of this thesis is URN:NBN:IT:UNIMOL-189272