ANALYTICALLY MODELING UNRELIABLE PARALLEL PROCESSING SYSTEMS WITH GENERAL TASK TIME DISTRIBUTIONS
DOI:
https://doi.org/10.47839/ijc.4.3.368
Keywords:
Performance and Dependability Modeling, Parallel & Distributed Systems, Queueing Theory, Heavy-Tails
Abstract
For many computing systems, failures are rare enough to be ignored. In other systems, failures are so common that the recovery procedure can significantly affect system performance. In this paper, assuming the computing system is unreliable, we discuss how heavy-tail or power-tail job completion time distributions can arise from an otherwise well-behaved task stream. This is an important consideration, since power tails are known to lead to unstable systems. We then show how to obtain performance and dependability measures, under different recovery policies, for a class of computing systems composed of P unreliable processors and a finite number N of tasks. Finally, we discuss the effect of checkpointing on the job completion time distribution.
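As an illustration of how a recovery procedure alone can turn a well-behaved task stream into a heavy-tailed one, the following is a minimal Monte Carlo sketch. It is not the paper's model: it assumes a single processor, a RESTART-style recovery policy (after a failure the task loses all progress and starts over with the same work requirement), exponentially distributed work T with rate mu, Poisson failures with rate lam, and zero repair time. The names `restart_completion_time`, `simulate`, and `tail_fraction` are hypothetical. Under these assumptions, conditioning on the first failure gives E[X | T = t] = (e^(lam*t) - 1)/lam, so E[X] = 1/(mu - lam) when lam < mu and E[X] is infinite when lam >= mu, even though E[T] = 1/mu is always finite.

```python
import random


def restart_completion_time(mu, lam, rng):
    """Sample one task under a RESTART recovery policy (illustrative assumption).

    Returns (work, total):
      work  -- failure-free task time T ~ Exp(mu), fixed across attempts
      total -- time until the task finally completes, X >= T
    Failures form a Poisson process with rate lam; each failure discards all
    progress and the task restarts immediately (zero repair time assumed).
    """
    work = rng.expovariate(mu)
    total = 0.0
    while True:
        time_to_failure = rng.expovariate(lam)
        if time_to_failure >= work:
            return work, total + work   # this attempt runs to completion
        total += time_to_failure        # attempt lost; start over from scratch


def simulate(mu, lam, n, seed=1):
    """Simulate n independent tasks; return (work_times, completion_times)."""
    rng = random.Random(seed)
    work_times, completion_times = [], []
    for _ in range(n):
        work, total = restart_completion_time(mu, lam, rng)
        work_times.append(work)
        completion_times.append(total)
    return work_times, completion_times


def tail_fraction(samples, x):
    """Empirical estimate of P(S > x) for a list of samples S."""
    return sum(s > x for s in samples) / len(samples)


if __name__ == "__main__":
    mu, lam = 1.0, 0.5  # hypothetical rates; keep lam < mu so E[X] is finite
    work, total = simulate(mu, lam, 100_000)
    print(f"mean T = {sum(work) / len(work):.3f}, "
          f"mean X = {sum(total) / len(total):.3f} "
          f"(theory 1/(mu-lam) = {1 / (mu - lam):.3f})")
    for x in (5, 10, 20, 40):
        print(f"x = {x:>2}:  P(T > x) = {tail_fraction(work, x):.2e}   "
              f"P(X > x) = {tail_fraction(total, x):.2e}")
```

With the hypothetical rates mu = 1 and lam = 0.5, the failure-free task times almost never exceed 20, while a visible fraction of the completion times do: the mean of X is still finite, but its tail decays roughly like a power law (about x^(-mu/lam)) rather than exponentially. Keep lam < mu when experimenting, because for lam >= mu the expected number of restarts is infinite and individual samples can run for a very long time.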
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.