Enabling modeling framework with surrogate modeling capabilities and complex networks

Serafin, Francesco
2019

Abstract

Conceptual and physically based environmental simulation models, as products of research environments, have grown into complex software over time in order to describe the behaviour of natural phenomena more accurately. Results from these models are considered accurate, but obtaining them often requires operating an entire system of modeling resources with dedicated knowledge, an extensive setup, and sometimes significant computational time. Model complexity limits wide model adoption among consultants, who typically have fewer technical resources and capabilities available. However, models should be readily usable in both research and consulting environments. This dissertation aims to address and alleviate two aspects of research model complexity: 1) for researchers, the model design complexity with respect to its internal software structure, and 2) for consultants, the model application complexity with respect to data and parameter setup, runtime requirements, and proper model infrastructure setup. The contribution for researchers supports model design and implementation by managing interacting modeling solutions as a Directed Acyclic Graph, while the contribution for consultants streamlines the creation of surrogate models of complex physical models. Both contributions are implemented within the OMS/CSIP modeling framework and infrastructure and were applied in various studies. First, a machine learning (ML)-based surrogate model approach is presented to respond to field application requirements for quick but “accurate enough” model results with limited input and limited a priori knowledge of the internal physical processes involved. The surrogate model aims to capture the behaviour of a physical model as an ensemble system of artificial neural networks (ANNs). Here, the NeuroEvolution of Augmenting Topologies (NEAT) technique was used because it integrates a genetic approach to build and evolve its ANNs during supervised training. 
Throughout the training phase, the design of the services facilitates seamless monitoring of the structural mutations of the artificial neural network and of its performance in emulating the response of the original model. This results in streamlined surrogate model generation. Furthermore, the stochasticity inherent in the evolutionary genetic algorithm, combined with a specially designed cross-validation approach, allows for a straightforward ensemble application: several slightly different artificial neural networks are trained concurrently. The ensemble system is built by selecting the best-performing surrogate models, which are used collectively to provide uncertainty-quantified results when applied to new data. Second, a Directed Acyclic Graph (DAG) modeling structure, NET3, was developed. NET3 provides appropriate data structures to represent interactions among modeling states as relationships based on network topologies. The inherent structure of the DAG dictates the execution order of modeling tasks, and NET3 implicitly manages parallel computation depending on the network topology. A node of a NET3 modeling structure encapsulates any sort of modeling solution, such as a system of ordinary differential equations, a set of statistical rules, or a system of partial differential equations. Each link connects these modeling solutions by handling their data flow. As a result, NET3 simplifies 1) the translation of physical and mathematical concepts into model components, and 2) the management of complex interactions among modeling solutions. NET3 also advances the idea of separating concerns between software architecture and the scientific model codebase. It manages aspects that relate to the architectural design of the graph modeling structure and lets research scientists focus on their model’s domain. NET3 improves encapsulation and reusability of scientific and mathematical concepts. 
NET3 avoids code duplication by allowing the same modeling solution to be adopted in different nodes and finely adapted to specific requirements. In summary, NET3 enables a new level of modeling flexibility by making it possible to quickly change model representations in order to explore new modeling solutions. The two contributions were integrated into the Object Modeling System/Cloud Services Integrated Platform (OMS/CSIP) environmental modeling framework (EMF). EMFs are standard practice in environmental modeling because they provide a software solution that separates the burden of managing software architectural design from scientific research. Here, OMS/CSIP has been identified as “advanced” in terms of EMF design. It offers high flexibility, low language invasiveness, fine and thorough architectural design, and an innovative cloud computing deployment infrastructure. These aspects make the OMS/CSIP infrastructure a suitable platform to host NEAT-based surrogate modeling and the NET3 extensions. Framework-enabled NEAT-based Surrogate modeling (FeNS) results from the full integration of the NEAT-based surrogate modeling approach with the OMS/CSIP platform. Here, the surrogate model approach was developed as CSIP services to help the transition from research models to “field models” by enabling the modeling framework to interact with CSIP services, ML libraries, and a NoSQL database to derive model surrogates for any modeling solution. OMS/CSIP was extended to harvest data from each model run and automatically derive the surrogate model at the modeling framework level. NET3 extends OMS modeling simulations to run as a graph network of interconnected modeling solutions. Furthermore, it extends the available OMS calibration algorithms into multi-site calibration procedures. OMS already provided implicit parallel computation of independent components within a modeling solution. NET3 now adds a further layer of implicit parallelism by concurrently running independent modeling solutions. 
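The idea that the graph topology alone decides both execution order and parallelism can be illustrated with a minimal sketch. It is not the NET3 or OMS API: it assumes Python's standard `graphlib` and `concurrent.futures` modules, and the names `run_dag`, `route`, `network`, and `local_runoff`, as well as the toy network and its values, are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(upstream, solve, max_workers=4):
    """Run solve(node, inputs) for every node, upstream nodes first.

    upstream maps each node to the list of nodes that feed into it.
    Nodes whose upstream results are all available run concurrently.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    results = {}
    pending = {}  # future -> node currently being computed
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every node whose predecessors have all finished.
            for node in sorter.get_ready():
                inputs = [results[u] for u in upstream.get(node, [])]
                pending[pool.submit(solve, node, inputs)] = node
            # Block until at least one running node finishes.
            finished, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in finished:
                node = pending.pop(future)
                results[node] = future.result()
                sorter.done(node)  # may unlock downstream nodes
    return results


# Hypothetical network: two headwater nodes feed a junction, which
# feeds the outlet. The two headwater nodes are independent.
network = {"junction": ["head_a", "head_b"], "outlet": ["junction"]}
local_runoff = {"head_a": 1.0, "head_b": 2.0, "junction": 0.5, "outlet": 0.25}


def route(node, inflows):
    # Toy modeling solution: local contribution plus all upstream inflows.
    return local_runoff[node] + sum(inflows)


discharge = run_dag(network, route)
```

In this sketch the caller never states which nodes may run in parallel; `head_a` and `head_b` are submitted together simply because neither depends on the other, which is the sense in which the parallelism is implicit.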
Two studies were carried out to develop and test FeNS, while three applications supported the development and testing of NET3. Surrogate models of the Revised Universal Soil Loss Equation, Version 2 (R2) were generated to scale up from simple test cases with a constrained input space to more generic applications including a larger variety of input parameters. The main goal of the surrogate model was to streamline and simplify access to the R2 model behaviour. We performed a sensitivity analysis of R2 to limit the input space to only relevant parameters (e.g. soil properties, climate parameters, field geometries, crop rotation description). The main study area was the State of Iowa, starting from a single county (Clay County) and extending to four counties (Buena Vista, Cherokee, Clay, and Wright). Clustering methodologies were applied to improve surrogate model accuracy and to accelerate the training process by reducing the dataset size. The overall “goodness-of-fit” against the testing dataset, estimated on the median of the uncertainty-quantified result of the surrogate model ensemble, was always above 0.95 Nash-Sutcliffe (NS), with root mean squared error (RMSE) between 0.13 and 0.36 and bias between -0.07 and 0.02. In many cases, the accuracy of the surrogate model with respect to the testing dataset was above 0.98 NS. Surrogate models of the AgroEcoSystem (AgES) model were generated to apply and test the FeNS methodology on a semi-distributed hydrologic model. The main goal of the surrogate model was to streamline and simplify access to the AgES model behaviour. Only relevant lumped parameters at the watershed centroid (e.g. precipitation, groundwater level, leaf area index (LAI), and potential evapotranspiration) were used to train the surrogate models, which limited the input space. The main study area was the South Fork Iowa River (SFIR) watershed in the State of Iowa, across Wright, Franklin, Hamilton, and Hardin counties. 
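The goodness-of-fit measures named above can be written out in a short sketch. It assumes the standard definitions of Nash-Sutcliffe efficiency and RMSE, takes bias as the mean of surrogate minus reference (the abstract does not state which bias definition was used), and scores the per-sample median of the ensemble. All function names and numbers are hypothetical illustration data, not results from the thesis.

```python
import math
import statistics


def ensemble_median(member_predictions):
    """Per-sample median across ensemble members.

    member_predictions holds one equally long list per surrogate model.
    """
    return [statistics.median(values) for values in zip(*member_predictions)]


def nash_sutcliffe(reference, simulated):
    # 1 - (sum of squared residuals) / (variance of the reference about its mean)
    mean_ref = statistics.fmean(reference)
    residual = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    spread = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - residual / spread


def rmse(reference, simulated):
    squared = [(s - r) ** 2 for r, s in zip(reference, simulated)]
    return math.sqrt(statistics.fmean(squared))


def bias(reference, simulated):
    # Assumed definition: mean of (surrogate - reference).
    return statistics.fmean([s - r for r, s in zip(reference, simulated)])


# Hypothetical reference model output and three ensemble members.
reference = [1.0, 2.0, 3.0, 4.0]
members = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 3.5, 4.5],
    [0.5, 2.0, 3.5, 4.0],
]

median_prediction = ensemble_median(members)
scores = {
    "NS": nash_sutcliffe(reference, median_prediction),
    "RMSE": rmse(reference, median_prediction),
    "bias": bias(reference, median_prediction),
}
```

An NS of 1 means the surrogate reproduces the reference model exactly, while an NS of 0 means it does no better than predicting the reference mean; RMSE and bias are in the units of the modeled variable.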
For the AgES surrogates, the overall “goodness-of-fit” against the testing dataset, estimated on the median of the uncertainty-quantified result of the surrogate model ensemble, was above 0.97 Nash-Sutcliffe (NS), with a root mean squared error (RMSE) of 2.24 and a bias of -0.0794. With respect to NET3, the first application is the real-time modeling of flood forecasting through the GEOframe system for the Civil Protection of Regione Basilicata, implemented by Dr. Bancheri. To scale the computation and finely tune calibration parameters, the Basilicata river basins were split into subcatchments, each represented by a different NET3 node. The second application was part of Mr. Dalla Torre’s master’s thesis, in which the computational core of the rainfall-runoff model of the Storm Water Management Model (SWMM, by EPA) was componentized. NET3 allows a concise and lightweight SWMM modeling core to be reimplemented and supports highly parallel model runs. The software architectural design of the rainfall-runoff, routing, and sewer pipe design components targeted the principles of separation of concerns, single responsibility, and encapsulation, which resulted in a clean and minimal code base. NET3 manages component connections and scalable computation by hosting the rainfall-runoff modeling solution in nodes separate from the routing and sewer pipe design modeling solution. It also enables each node of the modeling structure to 1) access a shared data structure (SWMMobject) to fetch input data from and push results to, and 2) internally analyze the upstream subtree in order to adjust sewer pipe design parameters. The third test case is the application of a “system of systems” of urban models, where each node of the graph modeling structure encapsulates a single-responsibility system of models. Because of the stochasticity involved in each system of models, the entire graph modeling solution had to be run several times to generate independent realizations. Hence, NET3 was extended to run a “graph of graphs” modeling structure.
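The upstream-subtree analysis mentioned for the SWMM application can be sketched as a plain graph traversal. This is an illustration under stated assumptions, not the thesis implementation: the names `upstream_nodes`, `design_flow`, `sewer`, and `peak` are hypothetical, and summing local peak flows stands in for whatever design rule the actual components apply.

```python
def upstream_nodes(upstream, node):
    """Return every node that drains, directly or indirectly, into node."""
    seen = set()
    stack = list(upstream.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Continue with the nodes that drain into the current one.
            stack.extend(upstream.get(current, []))
    return seen


def design_flow(upstream, local_peak_flow, node):
    # Assumed toy rule: a pipe must carry its own local peak flow plus
    # the local peak flow of everything upstream of it.
    contributing = upstream_nodes(upstream, node) | {node}
    return sum(local_peak_flow[n] for n in contributing)


# Hypothetical sewer network: pipes 1 and 2 join into pipe 3,
# which drains into pipe 4.
sewer = {"pipe_3": ["pipe_1", "pipe_2"], "pipe_4": ["pipe_3"]}
peak = {"pipe_1": 0.5, "pipe_2": 0.25, "pipe_3": 0.125, "pipe_4": 0.125}

flow_at_outlet = design_flow(sewer, peak, "pipe_4")
```

Because each node only needs the mapping of its upstream neighbours, the same traversal works for any node in the graph, and a headwater pipe simply gets an empty upstream set.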
English
Rigon, Riccardo
Università degli studi di Trento
TRENTO
220
Files in this item:

FrancescoSerafin_183280_finale.pdf (open access, 17 MB, Adobe PDF)
Disclaimer_Serafin.pdf (access only from BNCF and BNCR, 431.23 kB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/108626
The NBN code of this thesis is URN:NBN:IT:UNITN-108626