
PARALLEL LINEAR SOLVERS FOR RESERVOIR SIMULATIONS

MAVLIUTOV, ARTEM
2026

Abstract

Efficient and scalable linear solvers are critical for implicit reservoir simulation, where the linear solver can account for up to 90% of total runtime. In this work, we present an integrated software and algorithmic framework that (i) replaces the inherently sequential ILU stage of the popular CPR preconditioner with a highly parallel FSAI preconditioner enhanced by an augmented decoupling mechanism, and (ii) accelerates the AMG preconditioner with an SpGEMM library. On the CPR preconditioning side, the default global-system relaxation with ILU is replaced by the FSAI preconditioner, while AMG is retained for the local pressure correction. To improve FSAI in the presence of strong transport-induced couplings, a local block-diagonal decoupling is applied on small cell blocks. The resulting fully local decoupling approach combines quasi-IMPES scaling to reduce pressure–saturation couplings, a dynamic row summation to ensure solvability by AMG, and constrained pressure decoupling to improve the effectiveness of the FSAI preconditioner. This yields a highly effective and scalable CPR preconditioning framework. We implemented the preconditioned solver suite in C++/MPI and evaluated it with the OPM simulator on the Norne, SPE11C, and Sleipner benchmarks. The resulting framework, deco, matches or improves upon the default OPM solvers (DUNE and AMGCL) in a sequential setting (1 MPI rank) and delivers 2–4× speedups in strong-scaling tests with up to 16 MPI ranks. On the SpGEMM side, we developed a C++/MPI/CUDA library called nsp that builds on top of the single-GPU nsparse kernel and provides a multi-GPU extension employing a 1D row-wise partitioning to minimize inter-GPU communication and avoid host-mediated transfers. Each GPU executes its kernels independently (task parallelism across GPUs) while exploiting data parallelism internally.
The multi-GPU framework demonstrated strong scalability on the Leonardo supercomputer in tests employing up to 512 concurrent GPUs, processing matrices with up to ∼15 billion nonzeros and producing outputs with up to ∼52 billion nonzeros. Starting from the nsparse single-GPU kernel, we introduced several kernel-level improvements, namely a dynamic sparse accumulator with an optimized hash search and update, improved workload balancing for irregular sparsity, and specialized kernels for long rows, yielding several-fold speedups in square-product tests (A²) and coarse-level operator (RAP) tests, the latter corresponding to a double SpGEMM product. The proposed nsp library showed up to a 2× speedup with respect to the original nsparse library and up to a 6× speedup with respect to the cuSPARSE library.
Date: 3 February 2026
Language: English
Supervisor: JANNA, CARLO
Università degli studi di Padova
File: final_thesis_Artem_Mavliutov.pdf (2.93 MB, Adobe PDF)
Under embargo until 03/02/2027
License: All rights reserved

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/356943
The NBN code of this thesis is URN:NBN:IT:UNIPD-356943