HPSS:
The High Performance Storage System (HPSS) is IBM's highly scalable Hierarchical Storage Management (HSM) system. HPSS is intended for IBM's high-end HPC customers, whose storage requirements range from tens of millions to billions of files. HPSS can access hundreds of tapes concurrently for extremely high aggregate data transfer rates, meeting total storage bandwidth and capacity requirements that would otherwise be unachievable. HPSS can also stripe individual files across multiple tapes, which provides high-bandwidth transfers of very large files. HPSS provides stewardship of, and access to, many petabytes of data stored in robotic tape libraries.
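As a back-of-the-envelope illustration of why striping matters, the short Python sketch below estimates the aggregate rate of a file striped across several tape drives. The stripe width, drive rate, and file size are hypothetical numbers chosen for the example, not HPSS specifications.

    # Hypothetical numbers only; not HPSS specifications.
    stripe_width = 4          # tape drives the file is striped across
    drive_rate_mb_s = 160     # assumed sustained rate of one drive, in MB/s
    file_size_gb = 500        # size of the file being transferred

    aggregate_rate = stripe_width * drive_rate_mb_s          # MB/s across all drives
    transfer_seconds = file_size_gb * 1024 / aggregate_rate  # time to move the file

    print(f"Aggregate rate: {aggregate_rate} MB/s")
    print(f"Estimated transfer time: {transfer_seconds / 60:.1f} minutes")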
|
|
GPFS:
The IBM General Parallel File System (GPFS)
is a true distributed, clustered file system. Multiple servers are
used to manage the data and metadata of a single file system.
Individual files are broken into multiple blocks and striped across multiple disks and multiple servers, which eliminates bottlenecks. Information Lifecycle Management (ILM) policy scans are also distributed across multiple servers, which allows GPFS to quickly scan the entire file system and identify files that match a given set of criteria. Shortly after we showed the Billion File Demo at SC07, the international conference on high performance computing, the Almaden Research Center showed that a pre-GA version of GPFS was capable of scanning a single GPFS file system containing a billion files in less than 15 minutes!
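To give a feel for how a distributed scan works, here is a minimal Python sketch that divides a scan across worker processes and filters files by a simple age criterion. It only imitates the idea; the real GPFS ILM engine works from GPFS metadata and its own SQL-like policy rule language, and the scan root, worker count, and age threshold below are assumptions made for illustration.

    # A minimal sketch of a parallel "policy scan": divide directories among
    # worker processes and collect files that match a simple criterion.
    import os
    import time
    from concurrent.futures import ProcessPoolExecutor

    AGE_DAYS = 30  # assumed criterion: not accessed for 30 days

    def scan_directory(path):
        """Return files under one directory tree that match the criterion."""
        cutoff = time.time() - AGE_DAYS * 86400
        matches = []
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                try:
                    st = os.stat(full)
                except OSError:
                    continue  # file disappeared or is unreadable; skip it
                if st.st_atime < cutoff:
                    matches.append(full)
        return matches

    def policy_scan(top_level_dirs, workers=4):
        """Fan the scan out across worker processes, like scan nodes."""
        results = []
        with ProcessPoolExecutor(max_workers=workers) as pool:
            for matched in pool.map(scan_directory, top_level_dirs):
                results.extend(matched)
        return results

    if __name__ == "__main__":
        candidates = policy_scan(["/tmp"])  # hypothetical scan root
        print(f"{len(candidates)} candidate files found")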
|
|
GPFS/HPSS Interface (GHI):
|
|
GPFS + HPSS:
The ILM policy scan results are sent to the Scheduler. The Scheduler distributes the work to the I/O Managers (IOMs), and the GPFS data are copied to HPSS in parallel. For files that are no longer active, holes are then punched into the GPFS files to free GPFS disk resources. The continuous movement of GPFS files to HPSS tape and the freeing of GPFS disk resources are automated and transparent to the GPFS user. If a user accesses a file that resides only on HPSS tape, the file is automatically staged back to GPFS.
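The sketch below illustrates the migrate-then-free flow in Python, assuming a 64-bit Linux system: a "Scheduler" hands candidate files to "IOM" workers, each worker copies a file out (a local staging directory stands in for the HPSS transfer), and then releases the file's disk blocks by punching a hole. The names and the bookkeeping are illustrative only, not the GHI implementation.

    # A sketch of migrate-then-free: copy each file out, then punch a hole
    # so its disk blocks are released while the file (and its size) remain
    # visible in the file system.
    import ctypes
    import ctypes.util
    import os
    import shutil
    from concurrent.futures import ThreadPoolExecutor

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_long, ctypes.c_long]
    FALLOC_FL_KEEP_SIZE = 0x01
    FALLOC_FL_PUNCH_HOLE = 0x02

    def punch_hole(path):
        """Release the file's disk blocks but keep its size and metadata."""
        size = os.path.getsize(path)
        if size == 0:
            return
        fd = os.open(path, os.O_WRONLY)
        try:
            ret = libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                                 0, size)
            if ret != 0:
                raise OSError(ctypes.get_errno(), "fallocate failed: " + path)
        finally:
            os.close(fd)

    def migrate(path, archive_dir):
        """One 'IOM' task: copy the file out, then free its disk blocks."""
        shutil.copy2(path, archive_dir)  # stand-in for the copy to HPSS
        punch_hole(path)                 # data now lives only in the archive

    def schedule(candidates, archive_dir, io_managers=4):
        """The 'Scheduler': hand candidate files to IOM workers in parallel."""
        with ThreadPoolExecutor(max_workers=io_managers) as pool:
            for path in candidates:
                pool.submit(migrate, path, archive_dir)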
The GPFS/HPSS Interface continuously copies GPFS file
data to HPSS. When the time comes to perform a backup, only those files
that have not yet been copied to HPSS tape are migrated. Therefore, it
is NOT necessary to recapture all of the file data at each backup.
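A minimal sketch of that incremental idea follows: keep a record of what has already been copied and skip those files at backup time. The JSON state file and the mtime comparison are assumptions made for illustration, not how GHI actually tracks migrated files.

    # A sketch of incremental backup bookkeeping: only files that are new or
    # changed since the last copy are returned for migration.
    import json
    import os

    STATE_FILE = "migrated_state.json"  # hypothetical bookkeeping file

    def load_state():
        """Load the record of files already copied (path -> mtime)."""
        if os.path.exists(STATE_FILE):
            with open(STATE_FILE) as fh:
                return json.load(fh)
        return {}

    def files_needing_backup(candidates):
        """Return only the files that are new or changed since the last copy."""
        state = load_state()
        return [p for p in candidates
                if state.get(p) != os.path.getmtime(p)]

    def record_copied(paths):
        """Remember what was copied so the next backup can skip it."""
        state = load_state()
        for path in paths:
            state[path] = os.path.getmtime(path)
        with open(STATE_FILE, "w") as fh:
            json.dump(state, fh)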
|
|
Small File Aggregation:
Most GPFS file systems are made up of small files: about 90% of the files use only 10% of the disk resources. Traditionally, copying small files to tape diminishes tape drive performance. The
GPFS/HPSS Interface copies small files from GPFS to HPSS by grouping
many small files into much larger aggregates. Small file aggregation is
completely configurable, but 10,000 GPFS files were placed into each
HPSS aggregate at the SC07 Billion File Demo. Large aggregates allow data to stream to the tape drive, which yields higher tape transfer rates.
|
That's why we say...
GPFS + HPSS = Extreme Storage Scalability!