ANALYTICALLY MODELING UNRELIABLE PARALLEL PROCESSING SYSTEMS WITH GENERAL TASK TIME DISTRIBUTIONS

Authors

  • Pierre M. Fiorini
  • Robert W. Rowan

DOI:

https://doi.org/10.47839/ijc.4.3.368

Keywords:

Performance and Dependability Modeling, Parallel & Distributed Systems, Queueing Theory, Heavy-Tails

Abstract

For many computing systems, failure is rare enough that it can be ignored. In others, failure is so common that the recovery procedure can significantly affect system performance. In this paper, assuming a computing system is unreliable, we discuss how heavy-tail (power-tail) job completion time distributions can appear in an otherwise well-behaved task stream. This is an important consideration, since power tails are known to lead to unstable systems. We then demonstrate how to obtain performance and dependability measures for a class of computing systems composed of P unreliable processors and a finite number of tasks, N, under different recovery policies. Finally, we discuss the effects of checkpointing on the job completion time distribution.
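The abstract does not spell out the recovery policies, so the following Python sketch is only an illustration under stated assumptions, not the authors' analytical model. It assumes a single processor, exponentially distributed times between failures, zero repair time, zero checkpoint cost, and a policy in which a failure discards all progress on the work since the last checkpoint. All function and parameter names are hypothetical.

```python
import random
import statistics


def time_with_restarts(work, failure_rate, rng):
    """Wall-clock time to finish `work` units when each failure discards
    all progress and the same work is restarted from the beginning."""
    elapsed = 0.0
    while True:
        uptime = rng.expovariate(failure_rate)  # time until the next failure
        if uptime >= work:                      # survived long enough to finish
            return elapsed + work
        elapsed += uptime                       # progress lost; try again


def time_with_checkpoints(work, failure_rate, segments, rng):
    """Same task split into equal segments; a failure only repeats the
    segment in progress (checkpoint cost assumed to be zero)."""
    piece = work / segments
    return sum(time_with_restarts(piece, failure_rate, rng)
               for _ in range(segments))


def simulate(n_jobs, mean_work, failure_rate, segments, seed=0):
    """Completion times for jobs whose work is exponentially distributed
    (an assumed, well-behaved task stream)."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_jobs):
        work = rng.expovariate(1.0 / mean_work)
        times.append(time_with_checkpoints(work, failure_rate, segments, rng))
    return times


if __name__ == "__main__":
    # Compare the median with a far-tail quantile, without and with checkpoints.
    for segments in (1, 4):
        t = sorted(simulate(20_000, mean_work=1.0, failure_rate=0.5,
                            segments=segments))
        print(segments, statistics.median(t), t[int(0.999 * len(t))])
```

Under these assumptions, a task of fixed length ℓ with failure rate β has mean completion time (e^{βℓ} − 1)/β, which grows exponentially in ℓ. Occasional long tasks can therefore dominate the tail even when the task lengths themselves are exponentially distributed, and splitting the work into checkpointed segments shortens the length that must survive without a failure.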

Published

2014-08-01

How to Cite

Fiorini, P. M., & Rowan, R. W. (2014). ANALYTICALLY MODELING UNRELIABLE PARALLEL PROCESSING SYSTEMS WITH GENERAL TASK TIME DISTRIBUTIONS. International Journal of Computing, 4(3), 91-101. https://doi.org/10.47839/ijc.4.3.368
