OSG Document 912-v1

Use of Late-Binding Technology for Workload Management System in CMS

Document #:
OSG-doc-912-v1
Document type:
Paper
Submitted by:
Haifeng Pi
Updated by:
Haifeng Pi
Document Created:
24 Nov 2009, 16:32
Contents Revised:
24 Nov 2009, 16:32
DB Info Revised:
24 Nov 2009, 16:32
Viewable by:
  • Public document
Abstract:
A Condor glidein-based workload management system (glideinWMS) has been developed and integrated with the distributed physics analysis and Monte Carlo (MC) production systems of the Compact Muon Solenoid (CMS) experiment. Late binding between jobs and computing elements (CEs), together with validation of the WorkerNode (WN) environment, helps significantly reduce the failure rate of Grid jobs. For CPU-intensive MC data production, opportunistic Grid resources can be used effectively through an extended computing pool built on top of heterogeneous Grid resources. The Virtual Organization (VO) policy is embedded in the glideinWMS and pilot job configuration. GSI authentication and authorization, together with the interface to gLExec, allow a large user base to be supported and seamlessly integrated with the Grid computing infrastructure. Operation of glideinWMS at CMS shows that it is a highly available and stable system for a large VO with thousands of users running tens of thousands of jobs simultaneously. Enhanced monitoring allows system administrators and users to easily track system-level and job-level status.
Files in Document:
Authors:
Publication Information:
This paper was presented at IEEE/NSS 2009.

Supported by the National Science Foundation and the U.S. Department of Energy's Office of Science
