"Resource-Aware Dynamic Load Balancing for Heterogeneous, Hierarchical, and Non-Dedicated Computing Environments"

Abstract:

Scientific computation is now performed in a wide variety of parallel computing environments, from small clusters to large supercomputers. Such systems may comprise nodes with different numbers of processors, different processor speeds, and different memory capacities. Networks may be heterogeneous or hierarchical, and any part of the system may be shared with other users. A large base of software has been developed under the assumption of a distributed memory environment with uniform processing and communication resources. This talk first discusses the types of computations we consider: adaptive computations that require dynamic rebalancing of data even in a homogeneous environment. We then discuss traditional dynamic load balancing procedures and tools, focusing on Sandia National Laboratories' Zoltan Toolkit. Next, we consider heterogeneous, hierarchical, and non-dedicated computing environments, the complications they introduce, and some possible approaches that account for these complications. Finally, we present two tools we have developed to improve computational efficiency in such environments with little, if any, change to application code: the Dynamic Resource Utilization Model and the Zoltan Toolkit's Hierarchical Partitioning and Dynamic Load Balancing procedures. Examples of computations using each will be presented.
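As a minimal illustration of one way a partitioner can account for heterogeneous resources, the sketch below divides a fixed amount of work among nodes in proportion to a per-node capacity weight rather than evenly. It assumes each node has already been summarized by a single non-negative scalar (for example, relative processor speed scaled by the fraction of the processor available to this job); that weight and the function name `partition_sizes` are hypothetical and are not the Dynamic Resource Utilization Model's or Zoltan's API.

```python
def partition_sizes(total_work, powers):
    """Split total_work units across nodes in proportion to each node's weight.

    `powers` is a hypothetical per-node capacity weight (assumption, not a
    DRUM/Zoltan quantity). The largest-remainder method keeps the integer
    counts summing exactly to total_work.
    """
    if total_work < 0:
        raise ValueError("total_work must be non-negative")
    if not powers or any(p < 0 for p in powers) or sum(powers) == 0:
        raise ValueError("powers must be non-negative with a positive sum")

    total_power = sum(powers)
    # Ideal (fractional) share of the work for each node.
    ideal = [total_work * p / total_power for p in powers]
    # Start each node at the integer part of its ideal share.
    sizes = [int(x) for x in ideal]
    # Give the remaining units to the nodes with the largest fractional parts.
    leftover = total_work - sum(sizes)
    order = sorted(range(len(powers)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # A node twice as capable as each of its two peers receives twice the work.
    print(partition_sizes(100, [1.0, 1.0, 2.0]))
```

With uniform weights this reduces to ordinary even partitioning, so the same code path serves the homogeneous case; only the weights change when node speeds differ or nodes are shared with other users.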