FAULT TOLERANCE IN GRIDS USING JOB REPLICATION
DOI: https://doi.org/10.47839/ijc.11.2.556
Keywords: Job scheduling, Fault tolerant, Replication, Grid Computing.
Abstract
As grids consist of a large number of resources, fault tolerance is an important aspect of the scheduling process. In this paper, we address the problem of scheduling user jobs in grids so that job failures can be avoided in the presence of resource faults. We employ job replication as an effective mechanism to achieve an efficient, fault-tolerant scheduling system. Most existing replication-based algorithms use a fixed number of replicas for each job, which consumes more grid resources than necessary. We first propose an algorithm that adaptively determines the number of job replicas according to the grid failure history. We then propose an algorithm to schedule these replicas. The proposed algorithms have been evaluated through simulation and have shown better performance in terms of grid load, throughput, and failure tendency.
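The abstract does not state the formulas behind either algorithm, so the following is only a minimal sketch under assumptions of our own, not the authors' method. It assumes that replica failures are independent, that the grid failure history can be reduced to pooled counts of executed and failed jobs, and that the replica count is the smallest k for which the probability of all k replicas failing is at most 1 - target. Replicas are then placed on k distinct resources, preferring a low failure rate and, among equals, a low current load. All names (`Resource`, `grid_failure_rate`, `replica_count`, `schedule_replicas`, `target`, `max_replicas`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    """A grid resource with hypothetical failure-history counters."""
    name: str
    jobs_run: int      # jobs previously dispatched to this resource
    jobs_failed: int   # how many of those jobs failed
    load: int = 0      # replicas currently assigned to this resource

    def failure_rate(self) -> float:
        # Empirical failure rate; assumption: a resource with no history
        # is treated as reliable (rate 0.0).
        return self.jobs_failed / self.jobs_run if self.jobs_run else 0.0


def grid_failure_rate(resources: List[Resource]) -> float:
    """Grid-wide failure rate pooled over all recorded executions."""
    run = sum(r.jobs_run for r in resources)
    failed = sum(r.jobs_failed for r in resources)
    return failed / run if run else 0.0


def replica_count(p_fail: float, target: float, max_replicas: int) -> int:
    """Smallest k with p_fail**k <= 1 - target (the chance that all k
    independent replicas fail), capped at max_replicas."""
    k = 1
    while p_fail ** k > 1.0 - target and k < max_replicas:
        k += 1
    return k


def schedule_replicas(resources: List[Resource], k: int) -> List[str]:
    """Place up to k replicas on distinct resources, ordered by
    (failure rate, current load); sorted() is stable, so ties keep
    their input order. Chosen resources have their load incremented."""
    ranked = sorted(resources, key=lambda r: (r.failure_rate(), r.load))
    chosen = ranked[:k]
    for r in chosen:
        r.load += 1
    return [r.name for r in chosen]


if __name__ == "__main__":
    grid = [Resource("A", 100, 5), Resource("B", 100, 20),
            Resource("C", 50, 10), Resource("D", 0, 0)]
    p = grid_failure_rate(grid)
    k = replica_count(p, target=0.99, max_replicas=5)
    print(p, k, schedule_replicas(grid, k))
```

With a pooled failure rate of 0.14 and a target of 0.99, two replicas still leave a 1.96% chance that every copy fails, so the sketch chooses three replicas and places them on D, A and B. Unlike a fixed replication factor, k falls automatically as the recorded failure rate drops, which is the resource saving the abstract is aiming at.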
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.