Wide Area Distributed File Systems: The Hadoop Distributed File System

Tuesday, April 2, 2013

The Hadoop Distributed File System

by K. Shvachko et al., MSST 2010.

Abstract:
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/hfs.pdf

2 comments:

Unknown, April 2, 2013 at 7:33 PM
Is there anything that prevents HDFS from being used outside the Hadoop framework?
Unknown, April 2, 2013 at 10:08 PM
If the backup node contains all the namespace data in its RAM, why can't it be used as a failover node?