by P. Carns et al., Linux Conference 2000.
Abstract:
As Linux clusters have matured as platforms for low-cost, high-performance parallel computing, software packages to provide many key services have emerged, especially in areas such as message passing and networking. One area devoid of support, however, has been parallel file systems, which are critical for high-performance I/O on such clusters. We have developed a parallel file system for Linux clusters, called the Parallel Virtual File System (PVFS). PVFS is intended both as a high-performance parallel file system that anyone can download and use and as a tool for pursuing further research in parallel I/O and parallel file systems for Linux clusters.
In this paper, we describe the design and implementation of PVFS and present performance results on the Chiba City cluster at Argonne. We provide performance results for a workload of concurrent reads and writes for various numbers of compute nodes, I/O nodes, and I/O request sizes. We also present performance results for MPI-IO on PVFS, both for a concurrent read/write workload and for the BTIO benchmark. We compare the I/O performance when using a Myrinet network versus a fast-ethernet network for I/O-related communication in PVFS. We obtained read and write bandwidths as high as 700 Mbytes/sec with Myrinet and 225 Mbytes/sec with fast ethernet.
Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/pvfs.pdf
Since the manager daemon is the single point of contact for metadata, what happens if it crashes? Is there a mechanism to safeguard against that?
When portions of a file's data are striped across different I/O nodes, what happens if one I/O node fails? How does PVFS handle it?
What happens when an I/O daemon is down while a file is being deleted? The portions of the file held by the other daemons will be deleted, but not the part held by the daemon that is down. Is there any mechanism to overcome this?
What happens if a particular server has less space than the others? How is the data striped then?
My question applies to parallel file systems in general.
The file is transferred to the client in data packets. Over a single client-server connection, TCP takes care of sequencing the packets at the client side. When the same file is distributed among several machines and arrives over multiple connections, how is the ordering of the file's data handled at the receiving client?
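One way to think about the ordering question above: with round-robin striping, every byte of the file has a fixed position on a fixed I/O node, so a client can place each piece by its logical offset no matter which connection delivers it first. The sketch below is only an illustration of that idea, not PVFS code. The function names and the tiny stripe size and node count are assumptions chosen to keep the example readable.

```python
def locate(offset, stripe_size, num_io_nodes):
    """Map a logical file offset to (I/O node index, offset in that node's local file).

    Assumes plain round-robin striping starting at node 0 (hypothetical layout).
    """
    stripe_index = offset // stripe_size
    node = stripe_index % num_io_nodes
    local_offset = (stripe_index // num_io_nodes) * stripe_size + offset % stripe_size
    return node, local_offset


def split_request(offset, length, stripe_size, num_io_nodes):
    """Split a contiguous logical request into (node, local_offset, logical_offset, size) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        # Never cross a stripe boundary within one piece.
        size = min(stripe_size - offset % stripe_size, end - offset)
        node, local_offset = locate(offset, stripe_size, num_io_nodes)
        pieces.append((node, local_offset, offset, size))
        offset += size
    return pieces


def reassemble(replies, request_offset, length):
    """Place each (logical_offset, data) reply into the buffer by offset, ignoring arrival order."""
    buf = bytearray(length)
    for logical_offset, data in replies:
        start = logical_offset - request_offset
        buf[start:start + len(data)] = data
    return bytes(buf)


if __name__ == "__main__":
    # A request for 8 bytes at offset 2, with 4-byte stripes over 3 I/O nodes.
    for piece in split_request(2, 8, stripe_size=4, num_io_nodes=3):
        print(piece)
    # Replies arriving out of order still land in the right place.
    print(reassemble([(8, b"ij"), (2, b"cd"), (4, b"efgh")], request_offset=2, length=8))
```

TCP still orders the bytes within each connection; the offsets only decide where each connection's data goes in the client's buffer.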
Is there a way for PVFS to detect possible I/O node failures and recover from them?
Is there support for centralized data recovery or data redundancy in case of I/O node crashes? Or is it left to the individual I/O nodes?
Can you give an example of the use of logical file partitioning?
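As a rough illustration for the question above: a logical partition can be described by three numbers, an offset, a group size (gsize), and a stride, which is how I understand the paper's description of partitions. A process then sees only its own groups as one contiguous file. The sketch below does not call any PVFS API. The function name and the four-process layout are hypothetical, picked to show records interleaved among processes.

```python
def partition_to_file_offset(pos, offset, gsize, stride):
    """Translate a position inside a logical partition to an offset in the whole file.

    The partition consists of groups of `gsize` bytes; the first group starts at
    `offset` and each following group starts `stride` bytes after the previous one.
    """
    return offset + (pos // gsize) * stride + pos % gsize


if __name__ == "__main__":
    # Hypothetical layout: 4 processes share a file of 4-byte records that are
    # interleaved record by record, so each process's stride is 4 * 4 = 16 bytes.
    record_size = 4
    num_procs = 4
    for rank in range(num_procs):
        offsets = [
            partition_to_file_offset(
                p, offset=rank * record_size, gsize=record_size,
                stride=num_procs * record_size)
            for p in range(8)
        ]
        print(rank, offsets)
```

In this layout, process 1 reading the first 8 bytes of its partition touches file bytes 4 to 7 and 20 to 23, which is the kind of strided access that would otherwise need many separate seek and read calls.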