OSG Document 911-v1

Hadoop Distributed File System for the Grid

Submitted by: Haifeng Pi
Document Created: 24 Nov 2009, 16:23
Contents Revised: 24 Nov 2009, 16:23
Metadata Revised: 24 Nov 2009, 16:28
Viewable by: Public document

Data distribution, storage, and access are essential to CPU-intensive and data-intensive high-performance Grid computing. A newly emerged file system, the Hadoop Distributed File System (HDFS), has been deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been made to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained, high-rate inter-DataNode data transfer can be achieved while the cluster is fully loaded with data-processing jobs. WAN transfer into HDFS, supported by BeStMan and tuned GridFTP servers, demonstrates the scalability and robustness of the system. The Hadoop client can be deployed on interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, as demonstrated at the Large Hadron Collider (LHC) computing centers. The operational simplicity of an HDFS-based SE significantly reduces the cost of ownership of petabyte-scale data storage compared with alternative solutions.
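The automatic replication highlighted above is controlled by the HDFS replication factor, set in hdfs-site.xml. A minimal configuration sketch follows; the value of 2 is an illustrative assumption (the paper does not specify the factor used at the LHC sites), while the property name dfs.replication is the standard HDFS setting:

```xml
<!-- hdfs-site.xml: replication settings (illustrative values, not from the paper) -->
<configuration>
  <property>
    <!-- number of copies HDFS keeps of each block; assumed value for illustration -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

With a factor of 2, the NameNode automatically re-replicates blocks when a DataNode fails, which is the property that makes precious data safe without operator intervention.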
Publication Information:
This paper was presented at IEEE/NSS 2009.

