Tuesday, April 9, 2013

Ceph: A Scalable, High-Performance Distributed File System

by S. Weil et al., OSDI 2006.

Abstract:
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs). We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/ceph.pdf
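
Before the questions, here is a toy sketch of the CRUSH-style placement the abstract refers to (and that comment 1 below asks about). It is not the real CRUSH algorithm, which walks a hierarchical cluster map with configurable bucket types; the PG count, OSD list, replica count, and rendezvous-style hashing below are illustrative assumptions only. The point it shows is that any party holding the same compact cluster map computes the same object location, so no per-file allocation table or lookup service is needed.

    # Toy CRUSH-style placement sketch (assumptions, not Ceph's real code):
    # an object name hashes to a placement group (PG), and the PG maps to an
    # ordered list of OSDs through a deterministic pseudo-random function.
    import hashlib

    PG_COUNT = 1024          # assumed number of placement groups
    OSDS = list(range(40))   # assumed flat cluster of 40 OSD ids
    REPLICAS = 3             # assumed replication factor

    def object_to_pg(name: str) -> int:
        """Hash an object name into a placement group id."""
        digest = hashlib.sha1(name.encode()).digest()
        return int.from_bytes(digest[:4], "big") % PG_COUNT

    def pg_to_osds(pgid: int, map_epoch: int) -> list:
        """Stand-in for CRUSH: deterministically rank OSDs for a PG."""
        def score(osd: int) -> int:
            h = hashlib.sha1(f"{pgid}:{map_epoch}:{osd}".encode()).digest()
            return int.from_bytes(h[:8], "big")
        ranked = sorted(OSDS, key=score, reverse=True)
        return ranked[:REPLICAS]   # first entry acts as the primary

    # A client can locate an object without asking a server where it lives:
    pg = object_to_pg("inode-123.00000004")   # hypothetical object name
    print(pg, pg_to_osds(pg, map_epoch=7))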

9 comments:

  1. With the CRUSH function (and the hashing it uses) on the client side, can the client directly determine the location of the placement groups, thereby reducing the dependency on the metadata servers, whose primary service would otherwise be to report the location of files on the OSDs?

  2. Heavily read directories are replicated across multiple nodes to distribute load. Also, clients accessing popular metadata are told that the metadata resides on different or multiple MDS nodes to reduce hot spots. How is a globally consistent state of metadata attained if clients make changes to different metadata replicas?

  3. How is the CRUSH function better than consistent hashing? Both provide distribution without file allocation tables, and consistent hashing is less complex.

  4. How does Ceph ensure data integrity across replicas?

  5. When directories become hot spots, they are hashed across multiple nodes. In that case, how is locality preserved?

  6. Are the replicas utilised for load balancing? If so, how is the right OSD identified from the PG?

  7. What is the significance of the global switch feature that Ceph supports?

  8. In Figure 4, does the client release the object lock after receipt of the "ack" message or the "commit" message? After the "ack", the system might accept other reads/writes, right? (A sketch of the ack/commit phases appears after the comments.)

  9. How does the client know the address of the metadata server where the information for the desired file is stored?

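Regarding the ack/commit question in comment 8, here is a minimal sketch, with invented class and method names, of the two-phase acknowledgement the paper describes: an "ack" once an update sits in the in-memory caches of all replicas, and a "commit" once it is safely on every replica's disk. It does not answer when the object lock is released; it only illustrates the two notifications and why the client buffers the write until commit.

    # Minimal sketch of the paper's two-phase write acknowledgement.
    # Class and method names are invented for illustration; this is not
    # Ceph's actual code path.

    class ReplicaOSD:
        def __init__(self):
            self.memory = {}   # in-memory buffer cache
            self.disk = {}     # stable storage

        def apply(self, oid, data):
            self.memory[oid] = data             # basis for the "ack"

        def flush(self, oid):
            self.disk[oid] = self.memory[oid]   # basis for the "commit"

    class PrimaryOSD:
        def __init__(self, replicas):
            self.replicas = replicas

        def write(self, oid, data, client):
            for r in self.replicas:   # forward the update to every replica
                r.apply(oid, data)
            client.on_ack(oid)        # all replicas hold it in memory
            for r in self.replicas:
                r.flush(oid)
            client.on_commit(oid)     # all replicas hold it on disk

    class Client:
        def __init__(self):
            self.buffered = set()     # kept for replay if OSDs lose power

        def write(self, primary, oid, data):
            self.buffered.add(oid)
            primary.write(oid, data, self)

        def on_ack(self, oid):
            pass                      # update visible, but not yet durable

        def on_commit(self, oid):
            self.buffered.discard(oid)   # safe to drop the local copy

    client = Client()
    client.write(PrimaryOSD([ReplicaOSD(), ReplicaOSD()]), "obj-1", b"hello")
    print(client.buffered)   # empty set: the write is committed everywhere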