by S. Weil et al., OSDI 2006.
Abstract:
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs). We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/ceph.pdf
With the CRUSH function (and the hashing it uses) on the client side, can the client directly compute the location of the placement groups, thereby reducing dependency on the metadata servers, whose primary service would otherwise be to report the location of files on the OSDs?
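The paper's two-step mapping (object name → placement group via hashing, placement group → OSDs via CRUSH) is what lets any client locate data without a per-object metadata lookup. A minimal sketch of that idea, with a rendezvous-hash stand-in for the real CRUSH (which descends a weighted hierarchy of buckets); the function names and parameters here are illustrative, not Ceph's API:

```python
import hashlib

def object_to_pg(object_name: str, pg_count: int) -> int:
    """Step 1: hash the object name onto one of pg_count placement groups."""
    h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
    return h % pg_count

def crush_like_map(pg_id: int, osd_ids: list, replicas: int) -> list:
    """Step 2 (toy stand-in for CRUSH): deterministically pick `replicas`
    distinct OSDs for a placement group. Real CRUSH walks a weighted
    bucket hierarchy; this just rendezvous-hashes a flat OSD list."""
    scored = sorted(
        osd_ids,
        key=lambda osd: hashlib.md5(f"{pg_id}:{osd}".encode()).hexdigest(),
    )
    return scored[:replicas]

# Any client holding the same cluster map computes the same placement,
# with no metadata-server round trip per object.
pg = object_to_pg("foo.txt.0", pg_count=128)
osds = crush_like_map(pg, osd_ids=list(range(10)), replicas=3)
```

The point of the sketch: placement is a pure function of (object name, cluster map), so the MDS cluster is only needed for namespace operations, not for data location.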
Heavily read directories are replicated across multiple nodes to distribute load. Also, clients accessing popular metadata are told that the metadata resides on different or multiple MDS nodes, to reduce hot spots. How is a globally consistent state of metadata attained if clients make changes to different metadata replicas?
How is the CRUSH function better than consistent hashing? Both provide distribution without file allocation tables, and consistent hashing is less complex.
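For contrast with CRUSH, here is a minimal consistent-hash ring (illustrative only; the class and parameters are my own, not from the paper). Consistent hashing spreads keys evenly and limits data movement when nodes change, but it has no native notion of per-device weights or failure domains; CRUSH's hierarchical, weighted bucket descent is what lets it place replicas across separate racks or rows:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (a sketch, not Ceph's CRUSH).
    Each node gets `vnodes` points on the ring; a key maps to the
    first node point at or after its own hash, wrapping around."""

    def __init__(self, nodes, vnodes=64):
        self.ring = sorted(
            (self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _h(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wraps to 0).
        i = bisect.bisect(self.points, self._h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["osd0", "osd1", "osd2", "osd3"])
owner = ring.lookup("pg:42")
```

One way to read the comparison: consistent hashing answers "which node owns this key?", while CRUSH additionally answers "which *set* of nodes, respecting weights and failure-domain separation, should hold the replicas?"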
How does Ceph ensure data integrity across replicas?
When directories become hot spots and are hashed across multiple nodes, how is locality preserved?
Are the replicas utilised for load balancing? If so, how is the right OSD identified from the PG?
What is the significance of the global switch feature that Ceph supports?
In Figure 4, does the client release the object lock after receipt of the "ack" message or the "commit" message? After "ack", the system might accept other reads/writes, right?
How does the client know the address of the metadata server where the information for the desired file is stored?