TotalEnergies is a broad energy company that produces and markets fuels, natural gas, and electricity. Task-based parallel programming is a promising approach to improving the performance of scientific applications. Our collaboration with TotalEnergies focuses on applications of, and research into, task-based parallel programming.
The project proceeds at two levels:
Application level: We explore the Minimod proxy application developed by Total. We implement its core kernel using OpenMP GPU offloading and OpenMP tasks, with a focus on understanding the application's performance characteristics and bottlenecks. In addition, we compare the distributed behavior of two implementations of our distributed task-based stencil numerical simulation: one uses MPI for inter-node parallelism, and the other uses Legion [1]. In both cases, the same CUDA kernels are used at the node level to facilitate the comparison. Overall, the results show that the task-based approach provided by Legion is on par with the traditional MPI approach in terms of performance and scalability.
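To give a flavor of what offloading a stencil kernel with OpenMP looks like, here is a minimal sketch in C. It is not Minimod's actual kernel: the function name `stencil_step`, the 7-point averaging stencil, and the flat array layout are all assumptions chosen for illustration. When compiled without OpenMP support, the pragma is ignored and the loop nest simply runs serially on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 7-point averaging stencil on an nx*ny*nz grid stored as a
 * flat array (x fastest). Illustrative only; not the Minimod kernel. */
void stencil_step(const double *in, double *out, int nx, int ny, int nz)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    const size_t sx = (size_t)nx;          /* stride between rows   */
    const size_t sxy = sx * (size_t)ny;    /* stride between planes */
    const size_t n = sxy * (size_t)nz;     /* total number of cells */
    assert(in && out && n >= 27);

    /* Offload the interior update to the default device. `out` is mapped
     * tofrom so that the untouched boundary cells keep their host values. */
    #pragma omp target teams distribute parallel for collapse(3) \
        map(to: in[0:n]) map(tofrom: out[0:n])
    for (int z = 1; z < nz - 1; ++z)
        for (int y = 1; y < ny - 1; ++y)
            for (int x = 1; x < nx - 1; ++x) {
                size_t i = (size_t)z * sxy + (size_t)y * sx + (size_t)x;
                out[i] = (in[i] + in[i - 1] + in[i + 1]
                          + in[i - sx] + in[i + sx]
                          + in[i - sxy] + in[i + sxy]) / 7.0;
            }
}
```

A time-stepping driver would typically call such a function repeatedly while swapping the `in` and `out` buffers; a production code would also keep the data resident on the device across steps (for example with a `target data` region) rather than mapping it on every call.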
Model level: We explore the implementation of remote OpenMP offloading [2] developed by researchers at Argonne National Laboratory. Remote OpenMP offloading is an OpenMP target plugin for programming distributed accelerator-based HPC systems with minimal changes to the application. We improved its performance through various runtime optimizations. Evaluating our optimized plugin with the Minimod program showed that the optimizations can reduce offloading latencies by up to 92% and increase application parallel efficiency by at least 25.2% when running on 16 GPUs. We are now refactoring remote OpenMP offloading on top of MPI to build a decentralized distributed tasking model. Compared with the original version, the MPI-based implementation performs better and is easier to use. Our goal is to implement OpenMP-based Global Tasking.
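The "minimal changes" idea can be illustrated with ordinary multi-device OpenMP code. The sketch below assumes that a remote offloading plugin exposes remote GPUs as additional OpenMP device numbers, so that a program already written against `omp_get_num_devices()` and the `device` clause needs no restructuring. That assumption, the helper `device_count`, and the function `scale_on_all_devices` are illustrative and not taken from the plugin itself. Without OpenMP support the code falls back to a single serial chunk on the host.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of target devices to spread work over (at least 1).
 * Hypothetical helper; with no devices, device 0 is the host. */
static int device_count(void)
{
#ifdef _OPENMP
    int n = omp_get_num_devices();
    return n > 0 ? n : 1;
#else
    return 1;
#endif
}

/* Scale a[0..n) by s, giving each device one contiguous chunk. */
void scale_on_all_devices(double *a, size_t n, double s)
{
    assert(a != NULL || n == 0);
    int ndev = device_count();
    size_t chunk = (n + (size_t)ndev - 1) / (size_t)ndev;

    for (int d = 0; d < ndev; ++d) {
        size_t lo = (size_t)d * chunk;
        if (lo > n) lo = n;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        size_t len = hi - lo;
        if (len == 0) continue;

        /* Each target region becomes a deferred task (nowait), so the
         * chunks on different devices can run concurrently. */
        #pragma omp target teams distribute parallel for device(d) \
            map(tofrom: a[lo:len]) nowait
        for (size_t i = lo; i < hi; ++i)
            a[i] *= s;
    }

    /* Wait for all outstanding target tasks before returning. */
    #pragma omp taskwait
}
```

The design choice here is that the application only ever refers to device numbers; whether a given number maps to a local or a remote GPU is left to the runtime.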
[1] Eric Raut, Jonathon Anderson, Mauricio Araya-Polo, and Jie Meng, “Evaluation of Distributed Tasks in Stencil-based Application on GPUs”, in 2021 IEEE/ACM 6th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), November 2021, St. Louis, MO, USA.
[2] Wenbin Lu, Baodi Shan, Eric Raut, Jie Meng, Mauricio Araya-Polo, Johannes Doerfert, Abid M. Malik, and Barbara Chapman, “Towards Efficient Remote OpenMP Offloading”, in International Workshop on OpenMP (IWOMP), September 27th–30th, 2022, Chattanooga, Tennessee, USA.