Monday, February 25, 2013

Shark: Scaling File Servers via Cooperative Caching

by S. Annapureddy et al., NSDI 2005.

Abstract:
Network file systems offer a powerful, transparent interface for accessing remote data. Unfortunately, in current network file systems like NFS, clients fetch data from a central file server, inherently limiting the system’s ability to scale to many clients. While recent distributed (peer-to-peer) systems have managed to eliminate this scalability bottleneck, they are often exceedingly complex and provide non-standard models for administration and accountability. We present Shark, a novel system that retains the best of both worlds—the scalability of distributed systems with the simplicity of central servers.
Shark is a distributed file system designed for large-scale, wide-area deployment, while also providing a drop-in replacement for local-area file systems. Shark introduces a novel cooperative-caching mechanism, in which mutually-distrustful clients can exploit each others’ file caches to reduce load on an origin file server. Using a distributed index, Shark clients find nearby copies of data, even when files originate from different servers. Performance results show that Shark can greatly reduce server load and improve client latency for read-heavy workloads both in the wide and local areas, while still remaining competitive for single clients in the local area. Thus, Shark enables modestly-provisioned file servers to scale to hundreds of read-mostly clients while retaining traditional usability, consistency, security, and accountability.
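
The read path described in the abstract can be sketched roughly as follows. This is a simplified illustration, not Shark's actual code; the index, peer_fetch, and origin objects are hypothetical stand-ins. The client hashes the requested chunk, asks the distributed index for nearby clients that already cache it, verifies whatever a peer returns against a hash from the trusted origin server, and falls back to the origin server only if no peer can help.

```python
import hashlib

class SharkLikeClient:
    """Illustrative read path: try cooperating peers before the origin server."""

    def __init__(self, index, origin, peer_fetch):
        self.index = index            # hypothetical distributed index (e.g. a DHT)
        self.origin = origin          # origin file server connection
        self.peer_fetch = peer_fetch  # callable(peer, chunk_id) -> bytes or None
        self.cache = {}               # local chunk cache

    def read_chunk(self, path, offset, data_hash):
        chunk_id = hashlib.sha1(f"{path}:{offset}".encode()).hexdigest()
        if chunk_id in self.cache:
            return self.cache[chunk_id]

        # Ask the distributed index for nearby clients caching this chunk.
        for peer in self.index.lookup(chunk_id):
            data = self.peer_fetch(peer, chunk_id)
            # Clients are mutually distrustful, so verify peer data against a
            # hash obtained from the trusted origin server.
            if data is not None and hashlib.sha1(data).hexdigest() == data_hash:
                self.cache[chunk_id] = data
                return data

        # Fall back to the origin server and announce the new cached copy.
        data = self.origin.read(path, offset)
        self.cache[chunk_id] = data
        self.index.announce(chunk_id)
        return data
```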

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/shark.pdf

OceanStore: An Architecture for Global-Scale Persistent Storage

by J. Kubiatowicz et al., ASPLOS 2000.

Abstract:
OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data. A prototype implementation is currently under development.
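
Because the servers are untrusted, data integrity has to come from cryptography rather than from trusting any one replica. A minimal sketch of that idea, assuming content-hash block names (the helper names are hypothetical, not OceanStore's API):

```python
import hashlib

def block_id(data: bytes) -> str:
    """Name a block by the hash of its contents (a self-verifying identifier)."""
    return hashlib.sha256(data).hexdigest()

def fetch_verified(replicas, wanted_id):
    """Try untrusted replicas until one returns data matching the requested id."""
    for replica in replicas:                        # replicas: callables returning bytes
        data = replica(wanted_id)
        if data is not None and block_id(data) == wanted_id:
            return data                             # cryptographically verified copy
    raise IOError(f"no replica could supply verified block {wanted_id}")
```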

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/oceanstore.pdf

Monday, February 18, 2013

Panache: A Parallel File System Cache for Global File Access

by M. Eshel et al., FAST 2010.

Abstract:
Cloud computing promises large-scale and seamless access to vast quantities of data across the globe. Applications will demand the reliability, consistency, and performance of a traditional cluster file system regardless of the physical distance between data centers.
Panache is a scalable, high-performance, clustered file system cache for parallel data-intensive applications that require wide area file access. Panache is the first file system cache to exploit parallelism in every aspect of its design—parallel applications can access and update the cache from multiple nodes while data and metadata is pulled into and pushed out of the cache in parallel. Data is cached and updated using pNFS, which performs parallel I/O between clients and servers, eliminating the single-server bottleneck of vanilla client-server file access protocols. Furthermore, Panache shields applications from fluctuating WAN latencies and outages and is easy to deploy as it relies on open standards for high-performance file serving and does not require any proprietary hardware or software to be installed at the remote cluster.
In this paper, we present the overall design and implementation of Panache and evaluate its key features with multiple workloads across local and wide area networks.
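
The parallel read path described above can be illustrated with a small sketch: ranges of a remote file are pulled into the local cache concurrently, the way a clustered cache can fan the work out across gateway nodes. This is a simplified illustration, not Panache's implementation; remote_read and cache_write are hypothetical I/O callbacks.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_fill_cache(remote_read, cache_write, file_size, chunk, nodes=4):
    """Illustrative read-through fill: pull byte ranges of a remote file into
    the local cache in parallel. remote_read(offset, length) returns bytes;
    cache_write(offset, data) stores them locally."""
    ranges = [(off, min(chunk, file_size - off)) for off in range(0, file_size, chunk)]

    def fill(r):
        off, length = r
        cache_write(off, remote_read(off, length))

    with ThreadPoolExecutor(max_workers=nodes) as pool:
        list(pool.map(fill, ranges))   # each worker stands in for a cache node
```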

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/panache.pdf

Nache: Design and Implementation of a Caching Proxy for NFSv4

by A. Gulati et al., FAST 2007.

Abstract:
In this paper, we present Nache, a caching proxy for NFSv4 that enables a consistent cache of a remote NFS server to be maintained and shared across multiple local NFS clients. Nache leverages the features of NFSv4 to improve the performance of file accesses in a wide-area distributed setting by bringing the data closer to the client. Conceptually, Nache acts as an NFS server to the local clients and as an NFS client to the remote server. To provide cache consistency, Nache exploits the read and write delegations support in NFSv4. Nache enables the cache and the delegation to be shared among a set of local clients, thereby reducing conflicts and improving performance. We have implemented Nache in the Linux 2.6 kernel. Using Filebench workloads and other benchmarks, we present the evaluation of Nache and show that it can reduce the NFS operations at the server by 10-50%.
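
Conceptually, the proxy sits between local clients and the remote server and keeps serving cached data for as long as it holds a read delegation. A rough sketch of that behavior (hypothetical method names, not the actual kernel implementation):

```python
class NacheLikeProxy:
    """Illustrative proxy: acts as a server to local clients and as a client
    to the remote NFS server, sharing one cached copy (and one delegation)
    among many local readers. `remote` is a hypothetical NFS client handle."""

    def __init__(self, remote):
        self.remote = remote
        self.cache = {}         # path -> file contents
        self.delegated = set()  # paths for which the remote granted a read delegation

    def read(self, path):
        # While the delegation is held, local reads never touch the remote server.
        if path in self.delegated and path in self.cache:
            return self.cache[path]
        data, delegation = self.remote.open_with_delegation(path)  # hypothetical call
        self.cache[path] = data
        if delegation:
            self.delegated.add(path)
        return data

    def recall(self, path):
        # The remote server recalls the delegation when another client writes.
        self.delegated.discard(path)
        self.cache.pop(path, None)
```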

Link to the full paper:

Wednesday, February 6, 2013

Scalable Performance of the Panasas Parallel File System

by B. Welch et al., FAST 2008.

Abstract:
The Panasas file system uses parallel and redundant access to object storage devices (OSDs), per-file RAID, distributed metadata management, consistent client caching, file locking services, and internal cluster management to provide a scalable, fault tolerant, high performance distributed file system. The clustered design of the storage system and the use of client-driven RAID provide scalable performance to many concurrent file system clients through parallel access to file data that is striped across OSD storage nodes. RAID recovery is performed in parallel by the cluster of metadata managers, and declustered data placement yields scalable RAID rebuild rates as the storage system grows larger. This paper presents performance measures of I/O, metadata, and recovery operations for storage clusters that range in size from 10 to 120 storage nodes, 1 to 12 metadata nodes, and with file system client counts ranging from 1 to 100 compute nodes. Production installations are as large as 500 storage nodes, 50 metadata managers, and 5000 clients.
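
The client-driven, per-file RAID idea can be illustrated with a short sketch: the client splits a file into chunks, groups them into stripes, and computes an XOR parity chunk per stripe so the stripe survives the loss of any single OSD. This is a simplified RAID-5-style illustration, not Panasas's actual layout code.

```python
def stripe_with_parity(data: bytes, chunk: int, osds: int):
    """Split a file into fixed-size chunks, group (osds - 1) chunks per stripe,
    and compute one XOR parity chunk so any single lost chunk can be rebuilt."""
    chunks = [data[i:i + chunk].ljust(chunk, b"\0") for i in range(0, len(data), chunk)]
    width = osds - 1                                # data chunks per stripe
    stripes = []
    for s in range(0, len(chunks), width):
        group = chunks[s:s + width]
        parity = bytearray(chunk)
        for c in group:
            parity = bytearray(p ^ b for p, b in zip(parity, c))
        stripes.append((group, bytes(parity)))      # each piece goes to a different OSD
    return stripes
```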

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/panasas.pdf

GPFS: A Shared-Disk File System for Large Computing Clusters

by F. Schmuck et al., FAST 2002.

Abstract:
GPFS is IBM’s parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
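
The distributed-locking idea mentioned above is often built on lock tokens: once a node holds a token for an object it can lock that object locally without further messages, and a conflicting request forces the token to be revoked from its current holders. A minimal sketch of that general mechanism (not GPFS's actual token protocol):

```python
class TokenManager:
    """Simplified sketch of distributed lock tokens. A node holding a token can
    lock the object locally with no further messages; a conflicting request
    makes the manager revoke the token from the current holders first."""

    def __init__(self):
        self.tokens = {}  # obj -> (mode, set of holder nodes)

    def acquire(self, node, obj, mode, revoke):
        """Grant `node` a token on `obj` in `mode` ('read' or 'write');
        `revoke(other, obj)` asks another node to give its token back."""
        mode_held, holders = self.tokens.get(obj, (None, set()))
        if not holders or (mode_held == "read" and mode == "read"):
            self.tokens[obj] = (mode, holders | {node})   # shared readers coexist
            return True
        for other in holders - {node}:                    # conflict: recall tokens
            revoke(other, obj)
        self.tokens[obj] = (mode, {node})
        return True
```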

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/gpfs.pdf

Tuesday, February 5, 2013

Lustre: A Scalable, High-Performance File System

by Cluster File Systems, Whitepaper 2002.

Abstract:
Today's network-oriented computing environments require high-performance, network-aware file systems that can satisfy both the data storage requirements of individual systems and the data sharing requirements of workgroups and clusters of cooperative systems. The Lustre File System, an open source, high-performance file system from Cluster File Systems, Inc., is a distributed file system that eliminates the performance, availability, and scalability problems that are present in many traditional distributed file systems. Lustre is a highly modular next generation storage architecture that combines established, open standards, the Linux operating system, and innovative protocols into a reliable, network-neutral data storage and retrieval solution. Lustre provides high I/O throughput in clusters and shared-data environments and also provides independence from the location of data on the physical storage, protection from single points of failure, and fast recovery from cluster reconfiguration and server or network outages.

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/lustre-whitepaper.pdf

PVFS: A Parallel File System for Linux Clusters

by P. Carns et al., Linux Conference 2000.

Abstract:
As Linux clusters have matured as platforms for low-cost, high-performance parallel computing, software packages to provide many key services have emerged, especially in areas such as message passing and networking. One area devoid of support, however, has been parallel file systems, which are critical for high-performance I/O on such clusters. We have developed a parallel file system for Linux clusters, called the Parallel Virtual File System (PVFS). PVFS is intended both as a high-performance parallel file system that anyone can download and use and as a tool for pursuing further research in parallel I/O and parallel file systems for Linux clusters.
In this paper, we describe the design and implementation of PVFS and present performance results on the Chiba City cluster at Argonne. We provide performance results for a workload of concurrent reads and writes for various numbers of compute nodes, I/O nodes, and I/O request sizes. We also present performance results for MPI-IO on PVFS, both for a concurrent read/write workload and for the BTIO benchmark. We compare the I/O performance when using a Myrinet network versus a fast-ethernet network for I/O-related communication in PVFS. We obtained read and write bandwidths as high as 700 Mbytes/sec with Myrinet and 225 Mbytes/sec with fast ethernet.
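
The core of a parallel file system like PVFS is striping file data across I/O nodes. A minimal sketch, assuming simple round-robin striping with a fixed strip size (illustrative only, not PVFS's actual layout code), shows how a logical file offset maps to an I/O node and a local offset:

```python
def locate(offset: int, strip_size: int, num_io_nodes: int):
    """Map a logical file offset to (I/O node, offset within that node's local
    file), assuming round-robin striping of fixed-size strips across nodes."""
    strip_index = offset // strip_size
    node = strip_index % num_io_nodes
    local_offset = (strip_index // num_io_nodes) * strip_size + (offset % strip_size)
    return node, local_offset

# Example: with 64 KiB strips on 8 I/O nodes, byte 1_000_000 falls in strip 15,
# so it lives on node 15 % 8 = 7 at local offset (15 // 8) * 65536 + 16960.
print(locate(1_000_000, 64 * 1024, 8))
```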