Cluster Computing, cilt.24, ss.2927-2956, 2021 (SCI-Expanded)
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.Rapidly advancing technology brings a huge volume of data along the way that grows at a staggering pace and cannot be processed with traditional algorithms/hardware. Therefore, storing, processing, and analyzing this data in a timely manner requires distributed data clusters. One of the most critical problems facing these clusters is referred to as task scheduling. In this context, task scheduling is simply the name of the task-cluster node mapping process that will allow the last task to complete its execution as early as possible. Due to the NP-hard nature of the scheduling problem at hand, there is an inevitable need for metaheuristics to solve this problem in such a way that it can produce near-optimal (if possible optimal) solutions at reasonable times. In this study, a simulated annealing-based metaheuristic for cluster-based task scheduling is developed, and serial and parallel (shared memory) versions of the method are implemented in C++. The effectiveness of the proposed approach is demonstrated through twelve famous benchmarks from the Braun dataset. Both the serial and the parallel versions of the approach produce results that are much better than the best latency values ever reported in the literature for all benchmarks within the time constraint of 90 s. For example, the percentage of improvement of the serial version ranges from 0.01% to 0.49%. To decrease the execution time of the developed computer program and improve the quality of the scheduling solutions, different random number generation and perturbation techniques, data structures, early loop termination conditions, exploitation-exploration rates, and compiler effects are also analyzed in detail within the scope of this study.