OSG Document 689-v0

DZero Data-Intensive Computing on the Open Science Grid

Submitted by:
Marcia Teckenbrock
Updated by:
Marcia Teckenbrock
Document Created:
28 Aug 2007, 10:57
Contents Revised:
28 Aug 2007, 10:57
Metadata Revised:
28 Aug 2007, 10:57
Viewable by:
  • Public document

High-energy physics experiments periodically reprocess data to take advantage
of improved understanding of the detector and of the data processing code.
Between February and May 2007, the DZero experiment will reprocess a substantial
fraction of its dataset. This consists of half a billion events, corresponding to
more than 100 TB of data, organized in 300,000 files.
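
As a rough illustration of the scale implied by these figures, the following sketch derives per-file and per-event averages. It assumes decimal units (1 TB = 10^12 bytes) and uses the rounded values quoted above; because the abstract says "more than 100 TB", the sizes are best read as approximate lower bounds, not measured values.

```python
# Back-of-envelope averages from the figures quoted in the abstract.
# Assumptions (not from the source): decimal units (1 TB = 10**12 bytes)
# and the rounded values 5e8 events, 100 TB, 3e5 files.
EVENTS = 500_000_000
DATA_BYTES = 100 * 10**12   # "more than 100 TB", so sizes are lower bounds
FILES = 300_000

events_per_file = EVENTS / FILES
bytes_per_file = DATA_BYTES / FILES
bytes_per_event = DATA_BYTES / EVENTS

print(f"events per file: ~{events_per_file:,.0f}")
print(f"file size:       ~{bytes_per_file / 10**6:,.0f} MB")
print(f"event size:      ~{bytes_per_event / 10**3:,.0f} kB")
```

Under these assumptions an average file holds on the order of a couple of thousand events and a few hundred megabytes.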

The activity utilizes resources from sites around the world, including a dozen sites
participating in the Open Science Grid (OSG) consortium. About 1,500 jobs are run
every day across the OSG, consuming and producing hundreds of gigabytes of data. OSG
computing and storage resources are coordinated by the SAM-Grid system. This system
organizes job access to a complex topology of data queues and job scheduling to
clusters, using a SAM-Grid to OSG job forwarding infrastructure.

For the first time in the lifetime of the experiment, a data-intensive production
activity is managed on a general-purpose grid such as OSG. This paper describes the
implications of using OSG, where all resources are granted following an opportunistic
model, the challenges of operating a data-intensive activity over such a large
computing infrastructure, and the lessons learned throughout the few months of the

Associated with Events:
CHEP'07 held on 02 Sep 2007 in Victoria, British Columbia, Canada
Supported by the National Science Foundation and the U.S. Department of Energy's Office of Science