OSG Document 542-v1

Scheduling Data-Intensive Workflows onto Storage-Constrained Distributed Resources

Document #:
OSG-doc-542-v1
Document type:
Scientific Journal/Publications
Submitted by:
Marcia Teckenbrock
Updated by:
Marcia Teckenbrock
Document Created:
09 Feb 2007, 11:02
Contents Revised:
09 Feb 2007, 11:02
Metadata Revised:
09 Feb 2007, 11:02
Viewable by:
  • Public document
Modifiable by:

Abstract:
In this paper we examine the issue of optimizing disk usage and of scheduling large-scale scientific workflows onto distributed resources, where the workflows are data-intensive, requiring large amounts of data storage, and where the resources have limited storage capacity. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer required, and we schedule the workflows in a way that ensures that the amount of data required and generated by the workflow fits onto the individual resources. For a workflow used by gravitational-wave physicists, we were able to reduce the amount of storage required by the workflow by up to 57%. We also designed an algorithm that can not only find feasible solutions for workflow task assignment to resources in disk-space-constrained environments, but can also improve the overall workflow performance.
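
The idea of removing data files at runtime once no remaining task needs them can be illustrated with a small sketch. This is not the algorithm from the paper: the function name `peak_storage`, the three-task workflow, and the file sizes are all hypothetical, and tasks are assumed to run one at a time on a single resource. The sketch only shows why deleting a file after its last reader finishes lowers the peak disk footprint.

```python
from typing import Dict, List, Set


def peak_storage(order: List[str],
                 inputs: Dict[str, List[str]],
                 outputs: Dict[str, List[str]],
                 size: Dict[str, int],
                 cleanup: bool) -> int:
    """Peak disk usage of a sequential run (illustrative assumption only)."""
    # Position in the execution order of the last task that reads each file.
    last_use: Dict[str, int] = {}
    for i, task in enumerate(order):
        for f in inputs[task]:
            last_use[f] = i

    on_disk: Set[str] = set()
    peak = 0
    for i, task in enumerate(order):
        # While the task runs, its inputs and outputs are both on disk.
        on_disk.update(inputs[task])
        on_disk.update(outputs[task])
        peak = max(peak, sum(size[f] for f in on_disk))
        if cleanup:
            # Delete files whose last reader was this task. Files that no
            # task reads (final products) have no entry and are kept.
            on_disk -= {f for f in on_disk if last_use.get(f, -1) == i}
    return peak


# Hypothetical three-task chain: raw -> x -> y -> out (sizes in GB).
order = ["a", "b", "c"]
inputs = {"a": ["raw"], "b": ["x"], "c": ["y"]}
outputs = {"a": ["x"], "b": ["y"], "c": ["out"]}
size = {"raw": 10, "x": 8, "y": 4, "out": 1}

without_cleanup = peak_storage(order, inputs, outputs, size, cleanup=False)
with_cleanup = peak_storage(order, inputs, outputs, size, cleanup=True)
```

In this toy case the peak falls from 23 GB to 18 GB. A scheduler in the spirit of the abstract could compare such a footprint against each resource's free space before assigning tasks, although the paper's actual placement algorithm is not reproduced here.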
Notes and Changes:
Published in the Seventh IEEE International Symposium on Cluster Computing and the Grid — CCGrid 2007.