RELIABLE EVENT DISSEMINATION FOR TIME-SENSIBLE APPLICATIONS OVER WIDE-AREA NETWORKS

Esposito, Christiancarmine

Introduction

Context

In recent decades we have witnessed a massive proliferation of the Internet, which has pervaded our daily activities and been adopted throughout the entire world. The emergence of the Internet as a general communication channel is considerably affecting the scale of current software systems and deeply transforming the architecture of future critical systems. A report produced by Carnegie Mellon University's Software Engineering Institute (SEI) in June 2006 envisioned how future software systems are going to be architected, introducing the so-called Ultra Large Scale (ULS) systems, defined as federations of heterogeneous and independent systems glued together by a middleware solution. Such systems are characterized by (i) billions of lines of code, (ii) many users, (iii) large amounts of data stored, accessed, manipulated, and refined, (iv) many connections and interdependencies, and (v) extremely high geographic distribution. Traditionally, a critical system has a monolithic, "closed world" architecture, i.e., several computing nodes interconnected by a dedicated network with limited or no connectivity towards the outside world. An example of such a traditional architecture is Supervisory Control And Data Acquisition (SCADA), which is used in several current critical systems such as the control rooms of power plants or air traffic control systems. However, future critical systems will shift to an innovative federated, "open world" architecture, namely the Large scale Complex Critical Infrastructure (LCCI), which belongs to this class of ULS systems.
Specifically, an LCCI consists of a dynamic, Internet-scale hierarchy or constellation of interacting heterogeneous, inconsistent, and changing systems, which cooperate to perform critical functionalities. Many of the ideas behind LCCIs are increasingly "in the air" in several current projects that aim to develop innovative critical systems. For example, EUROCONTROL has funded a project, called Single European Sky ATM Research (SESAR), to devise the novel framework for Air Traffic Management (ATM) in Europe. The current European airspace is fragmented into several areas, each one managed by a single control system. This traditional ATM approach has been shown to be unsuitable for handling future air traffic, so it is going to be replaced by a more integrated approach. SESAR aims to develop a seamless infrastructure that allows control systems to cooperate with each other in order to have a wider vision of the airspace, one that is no longer limited to their assigned fragment. As previously stated, traditional critical systems have been characterized by the use of dedicated machines and networks, so hardware and software faults were considered the only threats to the reliability and effectiveness of the system, while communication failures were assumed to be very unlikely. Therefore, in the last decades research has devoted considerable effort to dealing with the former two kinds of faults, paying less attention to the treatment of communication failures. As evidence of this lack of attention, the main standardized and mature commercial middleware used in building critical systems either does not address them at all, as with Java Message Service (JMS), or provides only very basic mechanisms, as with the recent OMG standard called Data Distribution Service (DDS).
However, LCCIs cannot use dedicated networks due to their geographical extension; instead, they adopt wide-area networks that exhibit an availability between 95 percent and a little over 99 percent and do not provide any guarantees on the offered Quality-of-Service (QoS). So, when a federated architecture is adopted to devise critical systems, communication failures are highly likely to occur, even more likely than hardware and software failures, so guaranteeing efficient data distribution is the pivotal factor in accomplishing the mission of LCCIs. The aim of this thesis is to contribute significantly to addressing this issue, with the goal of enabling the definition of novel strategies to support effective communication among several critical systems interconnected over wide-area networks.

Problem Statement

Almost all critical systems fall within the wider class of Monitor and Control (M&C) systems, i.e., the environment is continuously monitored and the system responds appropriately to avoid threats that may lead to losses of human lives and/or money. For example, an Air Traffic Management (ATM) system keeps track of all the flights in a given portion of the airspace (i.e., the sensing part of the system) and may change the routes of those aircraft that risk colliding (i.e., the responding part of the system). Therefore, one of the main measures for assessing the effectiveness of a critical system is timeliness, i.e., a threat has to be detected in time to perform proper actions to avoid it. For example, a collision has to be detected a certain time before its likely occurrence so that the aircraft have time to change their routes and prevent the collision from happening.
So, critical operations treat the right answer delivered too late as the wrong answer, which means that the adopted middleware has to cope with timing failures and guarantee that deliveries occur within given deadlines, i.e., on-time information dissemination is required. For example, a radar scans a given area of the airspace a hundred times per second, and a control system usually combines the data received from several radars to view the position of all the aircraft in a given portion of the airspace. If a message produced by a radar reaches an Area Control Centre (ACC) more than 0.6 seconds after being produced, it is not usable, since the current state of the flights no longer matches the content of the received message, and the control system that receives it has an outdated view of the position of the aircraft. This can have disastrous consequences: when late radar data are fused with timely ones, several false positives and false negatives can be generated in the collision detection process. As previously asserted, message deliveries over wide-area networks exhibit non-negligible bursty loss patterns, i.e., a message has a considerable probability P of being lost during delivery, and runs of consecutive dropped messages have an average length ABL greater than two. The critical nature of LCCIs demands that messages be delivered to all destinations despite the faulty behaviour of the network, so the adopted middleware has to provide some means to tolerate the message losses imposed by the network in order to achieve reliable message distribution. However, the reliability gain is always achieved at the expense of the predictability of the delivery time, which leads to timing failures. Since LCCIs require that messages be delivered in a timely manner to all interested consumers despite the occurrence of several failures, a trade-off between the achievable degrees of reliability and timeliness is needed.
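The two loss parameters named above, P and ABL, can be made concrete with a small simulation. The following is a minimal sketch and not the model used in the thesis: it assumes a two-state Gilbert-style Markov channel, a common way to describe bursty loss that is chosen here purely for illustration. The function names and the example values P = 0.05 and ABL = 2.5 are hypothetical.

```python
import random


def gilbert_params(loss_prob, avg_burst_len):
    """Map (P, ABL) to the transition probabilities of a two-state channel.

    Assumption: every message sent in the 'bad' state is dropped and every
    message sent in the 'good' state is delivered.
    Returns (p, r): p = P(good -> bad), r = P(bad -> good).
    """
    if not 0.0 < loss_prob < 1.0:
        raise ValueError("loss_prob must be in (0, 1)")
    if avg_burst_len < 1.0:
        raise ValueError("avg_burst_len must be >= 1")
    r = 1.0 / avg_burst_len                # mean burst length is 1 / r
    p = loss_prob * r / (1.0 - loss_prob)  # stationary loss is p / (p + r)
    if p > 1.0:
        raise ValueError("this (P, ABL) pair is not reachable with the model")
    return p, r


def simulate_losses(n, loss_prob, avg_burst_len, seed=0):
    """Return a list of n booleans; True means the message was dropped."""
    p, r = gilbert_params(loss_prob, avg_burst_len)
    rng = random.Random(seed)
    bad = rng.random() < loss_prob  # start from the stationary distribution
    trace = []
    for _ in range(n):
        trace.append(bad)
        if bad:
            bad = rng.random() >= r  # stay in the bad state with prob. 1 - r
        else:
            bad = rng.random() < p   # enter the bad state with prob. p
    return trace


def measure(trace):
    """Return (observed loss rate, observed average burst length)."""
    bursts, run = [], 0
    for dropped in trace:
        if dropped:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    rate = sum(trace) / len(trace)
    abl = sum(bursts) / len(bursts) if bursts else 0.0
    return rate, abl


if __name__ == "__main__":
    observed_rate, observed_abl = measure(simulate_losses(200_000, 0.05, 2.5))
    print(f"loss rate ~ {observed_rate:.3f}, average burst length ~ {observed_abl:.2f}")
```

Under these assumptions, a larger ABL at the same P means fewer but longer outages, which is why a recovery scheme tuned for isolated losses may behave differently on a wide-area link.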
The ultra large scale of LCCIs worsens the already tough challenge of combining reliability and timeliness, since several solutions for tolerating message drops exhibit severe scalability limitations. In addition, since LCCIs are spread across several networking domains due to their geographic distribution, network conditions, i.e., propagation latency and loss pattern, are not uniform across the infrastructure; rather, the overall LCCI is composed of several portions, each characterized by a particular configuration of network behaviour. Therefore, a "one solution fits all" approach does not work for LCCIs; the adopted middleware has to autonomously choose the message delivery strategy suited to the experienced network conditions in order to support reliable and timely data distribution. Lastly, wide-area networks do not exhibit stable behaviour: network conditions change continuously.
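The idea of choosing a delivery strategy per network portion can be illustrated with a toy decision rule. This is only a sketch under stated assumptions and does not reproduce the strategy proposed in the thesis: the two strategy labels, the `DomainConditions` type, the retransmission timeout of one round trip, the target residual loss, and the treatment of losses as independent (which the bursty patterns described above would violate) are all hypothetical. Only the 0.6 second deadline comes from the radar example in the text.

```python
import math
from dataclasses import dataclass

DEADLINE_S = 0.6  # staleness bound taken from the radar example in the text


@dataclass(frozen=True)
class DomainConditions:
    """Measured conditions of one network portion (hypothetical type)."""
    name: str
    one_way_latency_s: float
    loss_prob: float


def retransmissions_within_deadline(latency_s, deadline_s=DEADLINE_S):
    """Count the retransmissions that still arrive before the deadline.

    Assumption: each retransmission costs one extra round trip (2 * latency).
    Returns -1 when even the first transmission misses the deadline.
    """
    if latency_s <= 0.0:
        raise ValueError("latency must be positive")
    if latency_s > deadline_s:
        return -1
    return math.floor((deadline_s - latency_s) / (2.0 * latency_s))


def choose_strategy(cond, target_loss=1e-3, deadline_s=DEADLINE_S):
    """Pick a delivery strategy label for one network portion."""
    k = retransmissions_within_deadline(cond.one_way_latency_s, deadline_s)
    if k < 0:
        return "infeasible"
    # Simplifying assumption: the k + 1 attempts are lost independently.
    residual_loss = cond.loss_prob ** (k + 1)
    if residual_loss <= target_loss:
        return "retransmission"
    return "proactive-redundancy"


if __name__ == "__main__":
    for cond in (
        DomainConditions("nearby", 0.05, 0.05),
        DomainConditions("distant", 0.25, 0.05),
    ):
        print(cond.name, choose_strategy(cond))
```

The point of the sketch is the shape of the decision, not its thresholds: the same loss probability leads to different choices once latency consumes most of the deadline budget, and the choice would need to be re-evaluated as measured conditions change.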

Publication date: 2009
Language: it
Collection: BNCF

Files in this item:

File	Size	Format
Esposito.pdf (access only from BNCF and BNCR; type: other attached material; license: all rights reserved)	18.88 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/331240

The NBN code of this thesis is URN:NBN:IT:BNCF-331240