
MONALISA: A MONITORING FRAMEWORK FOR LARGE SCALE COMPUTING SYSTEMS

Ciprian Dobre, Ramiro Voicu, Iosif C. Legrand

Abstract


The MonALISA (Monitoring Agents in A Large Integrated Services Architecture) framework provides a set of distributed services for monitoring, control, management and global optimization of large-scale distributed systems. It is based on an ensemble of autonomous, multi-threaded, agent-based subsystems that are registered as dynamic services and can be automatically discovered and used by other services or clients. The distributed agents can cooperate in performing a wide range of management, control and global optimization tasks (such as network monitoring and resource accounting) using real-time monitoring information. MonALISA includes a coherent set of network management services that collect, in near real time, information about the network topology, the main data flows, traffic volume and the quality of connectivity. A set of dedicated modules was developed in the MonALISA framework to periodically perform network measurement tests between all sites. We developed global services that present, in near real time, the entire network topology used by a community. The time evolution of the global network topology is shown in a dedicated GUI. Changes in the global topology at this level occur quite frequently, and even small modifications in the connectivity map may significantly affect network performance. The global topology graphs are correlated with active end-to-end network performance measurements, performed between all sites using the Fast Data Transfer application. Access to both real-time and historical data, as provided by MonALISA, is also important for developing services that can predict usage patterns and thereby help allocate resources efficiently on a global scale. For resource accounting, MonALISA collects information on the amount of resources consumed by users, who represent virtual organizations in a large-scale distributed system.
Besides providing statistical information, an accounting system can also serve as the basis for managing distributed resources according to an economic model. In the MonALISA monitoring framework we developed modules that provide accounting facilities, collecting information from cluster managers such as Condor, PBS, LSF and SGE. The usage statistics are used for intelligent management of the resources.

Keywords


Monitoring; large scale networks; topology; accounting; MonALISA.


