OSG Document 684-v0

glideinWMS - A generic pilot-based Workload Management System

Document #: OSG-doc-684-v0
Document type: Presentations
Submitted by: Marcia Teckenbrock
Updated by: Marcia Teckenbrock
Document Created: 27 Aug 2007, 17:54
Contents Revised: 27 Aug 2007, 17:54
Metadata Revised: 27 Aug 2007, 17:54
Viewable by: Public document
Modifiable by:

Abstract:
Grids are making it possible for Virtual Organizations (VOs) to
run hundreds of thousands of jobs per day. However, the resources
are distributed among hundreds of independent Grid sites.
A higher-level Workload Management System (WMS) is thus necessary.

glideinWMS is a pilot-based WMS, inheriting several useful features:
1) Late binding: Pilots are sent to all suitable Grid sites.
Real jobs are selected for a resource only once a pilot has started there.
No forecasting is needed.
2) Reliability: Either a broken Grid site will kill the pilot jobs,
or the pilots will detect the problem at startup. Real jobs
only start on well-behaved resources.
3) Grid-wide fair share: The relative priorities between jobs of the
same VO are set inside the WMS. Grid sites only manage priorities
between different VOs.
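
The late binding and reliability properties listed above can be illustrated with a toy model. This is only a sketch and not glideinWMS code: the function name `run_pilots`, the site and job names, the health flag standing in for a pilot's startup check, and the numeric priorities are all assumptions made for illustration.

```python
import heapq

def run_pilots(sites, jobs):
    """Toy late binding: a job is chosen only when a pilot is running.

    sites: list of (site_name, healthy) pairs; `healthy` is a hypothetical
           stand-in for the outcome of the pilot's startup validation.
    jobs:  list of (job_name, priority) pairs; lower number = higher
           priority inside the VO (an assumption of this sketch).
    """
    # Min-heap keyed on (priority, submission order).
    queue = [(prio, seq, name) for seq, (name, prio) in enumerate(jobs)]
    heapq.heapify(queue)

    assignments = []
    for site, healthy in sites:
        if not healthy:
            # The pilot died or failed its check: no real job is ever sent here.
            continue
        if not queue:
            break
        # The job is picked now, at pilot start, not at submission time.
        _, _, job = heapq.heappop(queue)
        assignments.append((site, job))
    return assignments

sites = [("site_a", True), ("site_b", False), ("site_c", True)]
jobs = [("analysis_1", 5), ("production_1", 1), ("analysis_2", 5)]
assignments = run_pilots(sites, jobs)
```

In this model the broken site never receives a real job, and the order in which jobs run is decided by the VO-side queue rather than by any individual site.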

glideinWMS is based on the Condor glidein concept, i.e.
a regular Condor pool, with the Condor daemons (startd) being started by
pilot jobs. The real jobs are vanilla, standard or MPI universe jobs.

glideinWMS is composed of Glidein Factories and VO Frontends, communicating
using Condor ClassAds:
* Factories publish the available Grid sites.
* Frontends match the Grid attributes to job attributes
and publish a request for a stream of glideins to suitable Grid sites.
* Factories pick up the requests and submit the glideins.
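
The three-step exchange above can be sketched as follows. This is a simplified model under stated assumptions, not the actual glideinWMS implementation or the real ClassAd format: plain Python dictionaries stand in for ClassAds, and every attribute name (`site`, `arch`, `max_walltime_h`, `walltime_h`, `idle_glideins`) and function name is hypothetical.

```python
def factory_publish():
    """Step 1: the Factory advertises the Grid sites it can submit to."""
    return [
        {"site": "site_a", "arch": "x86_64", "max_walltime_h": 48},
        {"site": "site_b", "arch": "x86_64", "max_walltime_h": 8},
    ]

def frontend_request(site_ads, idle_jobs):
    """Step 2: the Frontend matches site attributes against job attributes
    and publishes one request per suitable site."""
    wanted = {}
    for job in idle_jobs:
        for ad in site_ads:
            fits = (ad["arch"] == job["arch"]
                    and ad["max_walltime_h"] >= job["walltime_h"])
            if fits:
                # A job counts toward every site that could run it,
                # since pilots go to all suitable sites.
                wanted[ad["site"]] = wanted.get(ad["site"], 0) + 1
    return [{"site": s, "idle_glideins": n} for s, n in sorted(wanted.items())]

def factory_submit(requests):
    """Step 3: the Factory picks up the requests and submits glideins.
    Here submission is only recorded as (site, count) pairs."""
    return [(r["site"], r["idle_glideins"]) for r in requests]

idle_jobs = [
    {"arch": "x86_64", "walltime_h": 24},
    {"arch": "x86_64", "walltime_h": 4},
]
requests = frontend_request(factory_publish(), idle_jobs)
submitted = factory_submit(requests)
```

The Factory and Frontend never call each other directly in this sketch; each only reads what the other has published, which mirrors the ClassAd-based communication described above.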

A detailed description of the system will be presented,
along with the currently deployed systems inside USCMS production and
user analysis frameworks. Integration with frameworks
of other VOs will also be presented, as well as the measured scalability limits.

Files in Document: None
Topics:
Associated with Events: CHEP'07 held on 02 Sep 2007 in Victoria, British Columbia, Canada

Supported by the National Science Foundation and the U.S. Department of Energy's Office of Science