14th Scheduling for Large Scale Systems Workshop

26-28 Jun 2019 Bordeaux (France)

sciencesconf.org:scheduling2019:278577

Power is a critical concern as the supercomputing community ventures toward exascale. Challenges to power management at the system, hardware and application levels make it difficult to schedule jobs with high throughput and utilization on large-scale supercomputers. In this talk, I will present two power-aware scheduling strategies being developed at LLNL. The first is an algorithm implemented within the popular resource manager SLURM; the second is a variation-aware scheduling algorithm implemented within Flux, LLNL's new hierarchical scheduling framework. Using real-world datasets from large supercomputers at LLNL and the University of Tokyo, I will demonstrate the effectiveness of these power-aware scheduling algorithms and discuss how other flow resources in HPC clusters (such as network and I/O) can leverage similar techniques.
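The abstract does not describe the internals of either LLNL algorithm. As a generic illustration of the extra constraint that power-aware scheduling introduces, here is a minimal sketch. It assumes a fixed system-wide power cap and a single per-node power estimate for each job; the `Job` class, `schedule_under_power_cap` and all numbers are hypothetical inputs for illustration. The sketch admits queued jobs first-come-first-served and skips any job that would exceed either the free nodes or the remaining power budget. It is not the SLURM or Flux implementation discussed in the talk.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    watts_per_node: float  # hypothetical per-node power estimate for this job


def schedule_under_power_cap(queue, free_nodes, power_cap_watts):
    """Greedy FCFS admission under both a node budget and a power budget.

    Jobs that do not fit are skipped (a simple backfill with no reservations),
    so later, smaller jobs may still start.
    """
    started = []
    used_power = 0.0
    for job in queue:
        job_power = job.nodes * job.watts_per_node
        fits_nodes = job.nodes <= free_nodes
        fits_power = used_power + job_power <= power_cap_watts
        if fits_nodes and fits_power:
            started.append(job.name)
            free_nodes -= job.nodes
            used_power += job_power
    return started, used_power
```

For example, with 16 free nodes, a 3000 W cap and the queue A (4 nodes at 300 W), B (8 nodes at 350 W), C (2 nodes at 250 W), job A starts (1200 W), B is skipped because 1200 W + 2800 W exceeds the cap even though enough nodes are free, and C starts, for a total of 1700 W. This sketch treats all nodes as identical; a variation-aware scheduler would additionally account for power differences between individual nodes.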

Subject: oral
Topics: Scheduling; Job Scheduling