26-28 Jun 2019 Bordeaux (France)
Adaptive scheduling of HPC applications using malleability and dynamic migration.
Alberto Cascajo 1, Jesus Carretero 1@, David E. Singh 2
1 : University Carlos III of Madrid
2 : University Carlos III of Madrid

In this talk we will present an HPC framework that provides new strategies for resource monitoring and job scheduling. 

This framework includes a scalable, lightweight monitoring tool that analyzes the platform's compute nodes and detects risks of contention between them. The monitoring tool is designed for large-scale systems. It can be mapped statically to the system topology, but it can also organize itself in a way that is transparent to system users. This self-organizing capacity, together with fault tolerance, makes our monitor a highly resilient tool.

Our framework also includes an application scheduler that can subscribe to monitor events, such as congestion thresholds being exceeded, and combine this information with application-level information to improve application execution by applying dynamic process migration and malleability.
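
The subscription mechanism described above can be illustrated with a minimal sketch. It is not the framework's actual API: the class names (Monitor, Scheduler), the methods (subscribe, report, on_congestion), the node and process identifiers, and the 0.8 threshold are all assumptions made for illustration. The sketch covers only the migration reaction; malleability (changing the number of processes of a running application) is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical callback type: receives the congested node and its load.
CongestionHandler = Callable[[str, float], None]


@dataclass
class Monitor:
    """Toy monitor: stores the last load per node and notifies subscribers."""
    loads: Dict[str, float] = field(default_factory=dict)
    subscribers: List[Tuple[float, CongestionHandler]] = field(default_factory=list)

    def subscribe(self, threshold: float, handler: CongestionHandler) -> None:
        # Register a handler to be called when a reported load exceeds threshold.
        self.subscribers.append((threshold, handler))

    def report(self, node: str, load: float) -> None:
        # Record the new sample, then fire every subscription it exceeds.
        self.loads[node] = load
        for threshold, handler in self.subscribers:
            if load > threshold:
                handler(node, load)


@dataclass
class Scheduler:
    """Toy scheduler: reacts to congestion by migrating one process."""
    monitor: Monitor
    placement: Dict[str, str]  # process id -> node (application-level info)
    migrations: List[Tuple[str, str, str]] = field(default_factory=list)

    def on_congestion(self, node: str, load: float) -> None:
        # Processes currently placed on the congested node.
        candidates = sorted(p for p, n in self.placement.items() if n == node)
        # Every other node the monitor has a load sample for.
        others = {n: l for n, l in self.monitor.loads.items() if n != node}
        if not candidates or not others:
            return  # nothing to move, or nowhere to move it
        target = min(others, key=others.get)  # least-loaded known node
        proc = candidates[0]
        self.placement[proc] = target
        self.migrations.append((proc, node, target))


monitor = Monitor()
scheduler = Scheduler(monitor, {"p0": "n0", "p1": "n0", "p2": "n1"})
monitor.subscribe(0.8, scheduler.on_congestion)  # assumed threshold

monitor.report("n1", 0.3)
monitor.report("n2", 0.1)
monitor.report("n0", 0.95)  # exceeds the threshold, triggers a migration
print(scheduler.migrations)
```

In this sketch the scheduler never polls the nodes. It acts only when the monitor reports a load above the subscribed threshold, and it picks the migration target from the loads the monitor already holds.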

The talk will describe the architecture of the framework and present a practical evaluation of the proposal.


