Monday, February 25, 2013

Shark: Scaling File Servers via Cooperative Caching

by S. Annapureddy et al., NSDI 2005.

Abstract:
Network file systems offer a powerful, transparent interface for accessing remote data. Unfortunately, in current network file systems like NFS, clients fetch data from a central file server, inherently limiting the system’s ability to scale to many clients. While recent distributed (peer-to-peer) systems have managed to eliminate this scalability bottleneck, they are often exceedingly complex and provide non-standard models for administration and accountability. We present Shark, a novel system that retains the best of both worlds: the scalability of distributed systems with the simplicity of central servers.
Shark is a distributed file system designed for large-scale, wide-area deployment, while also providing a drop-in replacement for local-area file systems. Shark introduces a novel cooperative-caching mechanism, in which mutually-distrustful clients can exploit each others’ file caches to reduce load on an origin file server. Using a distributed index, Shark clients find nearby copies of data, even when files originate from different servers. Performance results show that Shark can greatly reduce server load and improve client latency for read-heavy workloads both in the wide and local areas, while still remaining competitive for single clients in the local area. Thus, Shark enables modestly-provisioned file servers to scale to hundreds of read-mostly clients while retaining traditional usability, consistency, security, and accountability.
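
The cooperative read path described in the abstract can be summarized in a short sketch. The Python below is only an illustration under assumed interfaces (a lookup/announce distributed index, peers and an origin server exposing a fetch call, and content-hash chunk tokens); it is not Shark's actual protocol or API.

    import hashlib

    def chunk_token(data):
        # Content-derived token: lets a client verify data fetched from an
        # untrusted peer without contacting the origin server.
        return hashlib.sha256(data).hexdigest()

    def read_chunk(token, local_cache, index, origin_server):
        if token in local_cache:                      # 1. local cache hit
            return local_cache[token]

        data = None
        for peer in index.lookup(token):              # 2. try nearby peer copies
            candidate = peer.fetch(token)
            if candidate is not None and chunk_token(candidate) == token:
                data = candidate
                break

        if data is None:
            data = origin_server.fetch(token)         # 3. fall back to the origin

        local_cache[token] = data
        index.announce(token)                         # advertise our cached copy
        return data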

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/shark.pdf

OceanStore: An Architecture for Global-Scale Persistent Storage

by J. Kubiatowicz et al., ASPLOS 2000.

Abstract:
OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data. A prototype implementation is currently under development.
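
As a rough illustration of the combination of redundancy and cryptography mentioned above, the sketch below names blocks by their content hash, replicates them across several (possibly untrusted) servers, and verifies the hash on every read. The store/load interface is an assumption of the example; this is not OceanStore's actual design.

    import hashlib

    def put_block(data, servers):
        block_id = hashlib.sha256(data).hexdigest()   # self-verifying name
        for server in servers:                        # redundancy: many replicas
            server.store(block_id, data)
        return block_id

    def get_block(block_id, servers):
        for server in servers:
            data = server.load(block_id)
            # Cryptographic check: accept data only if it matches its name,
            # so a compromised server cannot return bogus content undetected.
            if data is not None and hashlib.sha256(data).hexdigest() == block_id:
                return data
        raise IOError("no replica returned a block matching its hash")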

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/oceanstore.pdf

Monday, February 18, 2013

Panache: A Parallel File System Cache for Global File Access


by M. Eshel et al., FAST 2010.

Abstract:
Cloud computing promises large-scale and seamless access to vast quantities of data across the globe. Applications will demand the reliability, consistency, and performance of a traditional cluster file system regardless of the physical distance between data centers.
Panache is a scalable, high-performance, clustered file system cache for parallel data-intensive applications that require wide area file access. Panache is the first file system cache to exploit parallelism in every aspect of its design: parallel applications can access and update the cache from multiple nodes while data and metadata is pulled into and pushed out of the cache in parallel. Data is cached and updated using pNFS, which performs parallel I/O between clients and servers, eliminating the single-server bottleneck of vanilla client-server file access protocols. Furthermore, Panache shields applications from fluctuating WAN latencies and outages and is easy to deploy as it relies on open standards for high-performance file serving and does not require any proprietary hardware or software to be installed at the remote cluster.
In this paper, we present the overall design and implementation of Panache and evaluate its key features with multiple workloads across local and wide area networks.
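
To make the "pull data into the cache in parallel" idea concrete, here is a small user-level sketch that fetches a file's byte ranges with a pool of workers and writes each one into a local cache. It only illustrates the concept; Panache itself does this inside the clustered file system over pNFS, and the remote_read/cache_write callables here are assumptions of the example.

    from concurrent.futures import ThreadPoolExecutor

    def fill_cache(remote_read, cache_write, file_size,
                   stripe_size=4 * 1024 * 1024, workers=8):
        # Split the file into stripes; each worker fetches one stripe and
        # lands it in the cache, so the WAN link is driven in parallel.
        ranges = [(off, min(stripe_size, file_size - off))
                  for off in range(0, file_size, stripe_size)]

        def fetch(rng):
            offset, length = rng
            data = remote_read(offset, length)   # e.g. one gateway node / pNFS data server
            cache_write(offset, data)
            return len(data)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(fetch, ranges))  # total bytes brought into the cache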

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/panache.pdf

Nache: Design and Implementation of a Caching Proxy for NFSv4


by A. Gulati et al., FAST 2007.

Abstract:
In this paper, we present Nache, a caching proxy for NFSv4 that enables a consistent cache of a remote NFS server to be maintained and shared across multiple local NFS clients. Nache leverages the features of NFSv4 to improve the performance of file accesses in a wide-area distributed setting by bringing the data closer to the client. Conceptually, Nache acts as an NFS server to the local clients and as an NFS client to the remote server. To provide cache consistency, Nache exploits the read and write delegations support in NFSv4. Nache enables the cache and the delegation to be shared among a set of local clients, thereby reducing conflicts and improving performance. We have implemented Nache in the Linux 2.6 kernel. Using Filebench workloads and other benchmarks, we present the evaluation of Nache and show that it can reduce the NFS operations at the server by 10-50%.
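
The proxy role described in the abstract (NFS server toward local clients, NFS client toward the remote server, with delegations providing consistency) can be sketched as below. This is a toy model, not the kernel implementation; read_with_delegation and recall are hypothetical names standing in for the NFSv4 open/delegation machinery.

    class CachingProxy:
        def __init__(self, remote):
            self.remote = remote        # proxy acts as an NFS client toward this server
            self.cache = {}             # path -> cached file data
            self.delegations = set()    # paths with a currently held read delegation

        def read(self, path):
            # Called by local clients; the proxy acts as their NFS server.
            if path in self.delegations and path in self.cache:
                return self.cache[path]               # consistent while delegation is held
            data, delegated = self.remote.read_with_delegation(path)
            self.cache[path] = data
            if delegated:
                self.delegations.add(path)            # one delegation shared by all local clients
            return data

        def recall(self, path):
            # The remote server recalls the delegation when another client writes.
            self.delegations.discard(path)
            self.cache.pop(path, None)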

Link to the full paper:

Wednesday, February 6, 2013

Scalable Performance of the Panasas Parallel File System

by B. Welch et al., FAST 2008.

Abstract:
The Panasas file system uses parallel and redundant access to object storage devices (OSDs), per-file RAID, distributed metadata management, consistent client caching, file locking services, and internal cluster management to provide a scalable, fault tolerant, high performance distributed file system. The clustered design of the storage system and the use of client-driven RAID provide scalable performance to many concurrent file system clients through parallel access to file data that is striped across OSD storage nodes. RAID recovery is performed in parallel by the cluster of metadata managers, and declustered data placement yields scalable RAID rebuild rates as the storage system grows larger. This paper presents performance measures of I/O, metadata, and recovery operations for storage clusters that range in size from 10 to 120 storage nodes, 1 to 12 metadata nodes, and with file system client counts ranging from 1 to 100 compute nodes. Production installations are as large as 500 storage nodes, 50 metadata managers, and 5000 clients.
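
The "client-driven, per-file RAID" idea can be illustrated with a short sketch: the client splits each stripe into data blocks, computes the parity itself, and writes each block to a different OSD. This is only a simplified RAID-5-style example, not the Panasas layout, wire protocol, or failure handling.

    def xor_parity(blocks):
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                parity[i] ^= byte
        return bytes(parity)

    def write_stripe(osds, file_id, stripe_no, data, block_size=64 * 1024):
        n_data = len(osds) - 1                            # last OSD holds parity
        data = data.ljust(n_data * block_size, b"\0")     # pad a short final stripe
        blocks = [data[i * block_size:(i + 1) * block_size] for i in range(n_data)]
        blocks.append(xor_parity(blocks))                 # client computes parity itself
        for osd, block in zip(osds, blocks):
            osd.write(file_id, stripe_no, block)          # issued in parallel in a real client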

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/panasas.pdf

GPFS: A Shared-Disk File System for Large Computing Clusters


by F. Schmuck et al., FAST 2002.

Abstract:
GPFS is IBM’s parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
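
As a minimal illustration of the kind of distributed locking the paper discusses, the sketch below uses a token-style scheme: a node asks a central manager for a token once, then locks and unlocks locally until another node's conflicting request forces a revoke. This is a toy model, not GPFS's actual token protocol.

    class TokenManager:
        def __init__(self):
            self.holder = {}                        # object -> node holding its token

        def acquire(self, node, obj):
            current = self.holder.get(obj)
            if current is not None and current is not node:
                current.revoke(obj)                 # pull the token back from the old holder
            self.holder[obj] = node

    class Node:
        def __init__(self, manager):
            self.manager = manager
            self.tokens = set()                     # tokens cached on this node

        def lock(self, obj):
            if obj not in self.tokens:              # contact the manager only on a miss
                self.manager.acquire(self, obj)
                self.tokens.add(obj)
            # subsequent locks/unlocks on obj are purely local until a revoke

        def revoke(self, obj):
            self.tokens.discard(obj)                # a real node would flush dirty state first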

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/gpfs.pdf