Keywords: distributed computing, safety scheduling, resource allocation, job execution, task execution, work, critical computing, resource co-allocation, simulation, resource management, job models
Safety scheduling strategies in distributed computing
In this paper, we present an approach to safety scheduling in distributed computing based on strategies of resource co-allocation for complex sets of tasks (jobs). Guaranteeing that jobs complete within their time limits requires taking into account the dynamics of the distributed environment, namely changes in the number of jobs to be serviced, volumes of computation, possible failures of processor nodes, etc. Consequently, in the general case, a set of scheduling and resource co-allocation versions, or a strategy, is required instead of a single version. Safety strategies are formed for structurally different job models with various levels of task granularity and data replication policies. We develop and consider scheduling strategies that combine fine-grain and coarse-grain computations, multiple data replicas, and constrained data movement. These strategies are evaluated in simulation studies against a variety of metrics.