An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs
April 10, 2023 ยท Declared Dead ยท ๐ IEEE International Parallel and Distributed Processing Symposium
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam
arXiv ID
2304.04876
Category
math.NA: Numerical Analysis
Cross-listed
cs.DC,
cs.MS
Citations
2
Venue
IEEE International Parallel and Distributed Processing Symposium
Last Checked
2 months ago
Abstract
The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial different equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both solver's computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by factors of about $2\times$ using GPUs, while the GPU acceleration of the numerical setup time depend on the solver options and the local matrix sizes.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Numerical Analysis
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations
R.I.P.
๐ป
Ghosted
PDE-Net: Learning PDEs from Data
R.I.P.
๐ป
Ghosted
Efficient tensor completion for color image and video recovery: Low-rank tensor train
R.I.P.
๐ป
Ghosted
Tensor Ring Decomposition
R.I.P.
๐ป
Ghosted
Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted