H2F: a hierarchical Hadoop framework to process Big Data in geo-distributed contexts
CAVALLO, MARCO
2017
Abstract
The widespread adoption of IoT technologies has resulted in the generation of huge amounts of data, or Big Data, which have to be collected, stored, and processed with new techniques to produce value in the best possible way. Distributed computing frameworks such as Hadoop, based on the MapReduce paradigm, have been used to process such amounts of data by exploiting the computing power of many cluster nodes. Unfortunately, in many real big data applications the data to be processed reside in several computationally heterogeneous data centers distributed across different locations. In this context, Hadoop's performance collapses dramatically. To face this issue, we developed a Hierarchical Hadoop Framework (H2F) capable of scheduling and distributing tasks among geographically distant clusters in a way that minimizes the overall job execution time. Our experimental evaluations show that using H2F significantly improves the processing time for geo-distributed datasets with respect to a plain Hadoop system.

File | Size | Format
---|---|---
Tesi_Cavallo_Marco.pdf (open access) | 2.1 MB | Adobe PDF
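For readers unfamiliar with the MapReduce paradigm the abstract builds on, the model can be illustrated with a minimal word-count sketch. This is a hypothetical, single-process illustration of the map/shuffle/reduce flow, not code from the thesis or from Hadoop itself; the function names are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(record):
    """Map step: emit one (word, 1) pair per word in the input record."""
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle step: group intermediate values by key, as the
    framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: aggregate all values for a key (here, sum the counts)."""
    return key, sum(values)

def run_job(records):
    """Run the full map -> shuffle -> reduce pipeline over the input records."""
    pairs = (kv for record in records for kv in map_phase(record))
    groups = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())

counts = run_job(["Big Data needs new techniques", "big data at scale"])
print(counts["big"])  # 2
```

In a real Hadoop deployment the map and reduce tasks run in parallel across cluster nodes; H2F's contribution, per the abstract, is deciding how such tasks are distributed when the nodes span geographically distant, heterogeneous data centers.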
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/77500
URN:NBN:IT:UNICT-77500