Tuesday, March 26, 2013

Ivy: A Read/Write Peer-to-Peer File System

by A. Muthitacharoen et al., OSDI 2002.


Abstract: Ivy is a multi-user read/write peer-to-peer file system. Ivy has no centralized or dedicated components, and it provides useful integrity properties without requiring users to fully trust either the underlying peer-to-peer storage system or the other users of the file system. 

An Ivy file system consists solely of a set of logs, one log per participant. Ivy stores its logs in the DHash distributed hash table. Each participant finds data by consulting all logs, but performs modifications by appending only to its own log. This arrangement allows Ivy to maintain meta-data consistency without locking. Ivy users can choose which other logs to trust, an appropriate arrangement in a semi-open peer-to-peer system.

Ivy presents applications with a conventional file system interface. When the underlying network is fully connected, Ivy provides NFS-like semantics, such as close-to-open consistency. Ivy detects conflicting modifications made during a partition, and provides relevant version information to application-specific conflict resolvers. Performance measurements on a wide-area network show that Ivy is two to three times slower than NFS.
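
The abstract's central rule is that each participant appends only to its own log, while readers consult every log they choose to trust. Below is a minimal in-memory model of that rule, assuming a simplified record type. `Record`, `LogSet`, and `records_for` are hypothetical names, and the model deliberately ignores DHash, signatures, and the ordering of records across logs.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical record shape: one file-system operation on one path.
    op: str
    path: str
    data: bytes = b""


@dataclass
class LogSet:
    # One append-only log per participant, keyed by participant id.
    logs: dict[str, list[Record]] = field(default_factory=dict)

    def append(self, writer: str, record: Record) -> None:
        # A participant only ever appends to its own log. A real system would
        # need something like per-log signatures to enforce this; the toy
        # model simply trusts the `writer` argument.
        self.logs.setdefault(writer, []).append(record)

    def records_for(self, path: str, trusted: set[str]) -> list[tuple[str, Record]]:
        # A reader scans every log it trusts and ignores all the others.
        return [
            (owner, rec)
            for owner, log in sorted(self.logs.items())
            if owner in trusted
            for rec in log
            if rec.path == path
        ]


fs = LogSet()
fs.append("alice", Record("write", "/notes.txt", b"draft"))
fs.append("mallory", Record("write", "/notes.txt", b"junk"))
print(fs.records_for("/notes.txt", trusted={"alice", "bob"}))  # only Alice's record
```

Because nobody ever writes into another participant's log, no locks are needed to append; the cost moves to the read side, which must combine several logs.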

Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/ivy.pdf

18 comments:

  1. Is a log entry written after the disk write? If the system crashes in between, will the log be inconsistent after recovery? How does Ivy handle this?

    Replies
    1. Logs are written before the data is written to disk. Hence, after recovery from a crash, the system's log is used to reconstruct the overall file system.
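
The reply describes write-ahead ordering. A minimal sketch of why ordering the two writes this way keeps a log consistent across a crash, under assumptions that go beyond this thread: each record is an immutable block keyed by its SHA-1 hash and pointing to its predecessor, and a per-log head pointer is moved only after the new record has been stored. `LogStore` and `crash_before_head` are hypothetical names.

```python
import hashlib


class LogStore:
    """Toy content-addressed block store plus one mutable head per log.

    Assumptions for illustration only: each record is an immutable block
    keyed by the SHA-1 of its bytes, each block embeds the key of the
    previous record, and a head pointer names the newest record.
    """

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.heads: dict[str, str] = {}

    def append(self, log_id: str, payload: bytes, crash_before_head: bool = False) -> None:
        prev = self.heads.get(log_id, "")
        block = prev.encode() + b"\n" + payload
        key = hashlib.sha1(block).hexdigest()
        # Step 1: store the new record first.
        self.blocks[key] = block
        if crash_before_head:
            # Simulated crash: the record exists, but nothing references it yet.
            return
        # Step 2: only then publish it by moving the head pointer.
        self.heads[log_id] = key

    def read(self, log_id: str) -> list[bytes]:
        # Follow back-pointers from the head, then return records oldest-first.
        out: list[bytes] = []
        key = self.heads.get(log_id, "")
        while key:
            prev, _, payload = self.blocks[key].partition(b"\n")
            out.append(payload)
            key = prev.decode()
        return out[::-1]


store = LogStore()
store.append("alice", b"create /a")
store.append("alice", b"write /a", crash_before_head=True)
print(store.read("alice"))  # [b'create /a']: the half-finished append is invisible
```

A crash between the two steps leaves an orphaned block rather than a head that points at missing data, so the log reachable from the head is always complete.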

  2. Given that Ivy is two to three times slower than NFS, is it still practical to use?

    Replies
    1. Ivy is a peer-to-peer system, and its slowness is mainly due to the larger number of network round trips and the cost of DHash table construction. It is useful when the peers need not trust each other and when a single central server is not needed or not available.

  3. In case of a network partition, what actions take place in DHash? Are logs migrated or distributed from one partition to another? How can a network partition be detected?

    Replies
    1. 1. During a network partition, no updates are made to DHash until the partition heals. The older logs are still preserved during the partition.
      2. There is no migration or distribution of logs. Each participant is responsible for its own log, and no other participant can take over handling another participant's log.
      3. Local logs are also maintained during the partition. After recovery, these logs can be replayed to determine when the partition occurred.

  4. Ivy does not reclaim log space. Isn't this expensive?

    Replies
    1. No. According to "Venti: a new approach to archival storage", retaining old logs indefinitely is not expensive.
      However, if space is a constraint, Elephant's approach, which allows logs to be edited up to a certain time limit after which they are no longer updated, can be used.

  5. If, as the paper says, the user can simply examine the versions of the file and merge them by hand in a text editor, how do we ensure that every user is satisfied with the merge?

    Replies
    1. Version vectors are usually used to detect conflicts and to guide the merging of file versions. I believe that after each manual merge, the version vectors are consulted to ensure causal ordering.
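
For readers unfamiliar with version vectors: each vector records, per log, how many of that log's records a version has seen. A minimal sketch of the standard comparison, assuming vectors are plain dicts with missing entries treated as 0; `compare` and `merge` are our own names, not Ivy's. A merge record whose vector is the element-wise maximum of its inputs (plus one for the merging writer) is ordered after both inputs, which is one way a manual merge could re-enter the causal order.

```python
VersionVector = dict[str, int]  # log id -> number of that log's records seen


def dominated_by(a: VersionVector, b: VersionVector) -> bool:
    # True if every entry of a is <= the matching entry of b (missing = 0).
    return all(a.get(k, 0) <= b.get(k, 0) for k in a.keys() | b.keys())


def compare(a: VersionVector, b: VersionVector) -> str:
    a_le_b, b_le_a = dominated_by(a, b), dominated_by(b, a)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither saw the other: a conflict that needs merging


def merge(a: VersionVector, b: VersionVector, writer: str) -> VersionVector:
    # Element-wise maximum, then count the merge itself as a new write.
    out = {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}
    out[writer] = out.get(writer, 0) + 1
    return out


alice = {"alice": 2, "bob": 1}
bob = {"alice": 1, "bob": 2}
print(compare(alice, bob))  # concurrent
m = merge(alice, bob, "alice")
print(compare(alice, m), compare(bob, m))  # before before
```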

  6. For updates made during a partition, how is an ordering established once the partition heals?

    Replies
    1. Once the partition heals, the local log is consulted and the log heads are used to compare version vectors. If concurrent (conflicting) records are found, the public keys of the two logs are compared to order them, and if necessary the lc tool is used to resolve the conflict.
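
The reply describes ordering by version vector, with the logs' public keys as a tie-breaker. A minimal sketch of one deterministic interleaving that follows that rule, assuming each log is already in causal order (oldest first). `Rec`, `log_key`, and `merge_logs` are hypothetical; `log_key` stands in for a public key, and the head-by-head selection is our own choice of algorithm rather than Ivy's documented traversal.

```python
from dataclasses import dataclass


@dataclass
class Rec:
    log_key: str        # stand-in for the log owner's public key (hypothetical)
    vv: dict[str, int]  # version vector: log key -> records seen from that log
    op: str


def before(a: dict[str, int], b: dict[str, int]) -> bool:
    # Strict happens-before: a <= b element-wise and the vectors differ.
    keys = a.keys() | b.keys()
    le = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    return le and not ge


def merge_logs(logs: dict[str, list[Rec]]) -> list[Rec]:
    """Deterministically interleave per-participant logs.

    Assumes each log is already in causal order (oldest first). A log's next
    unconsumed record is 'ready' if no other log's next record happens before
    it; among ready records, the smallest log key wins.
    """
    pos = {key: 0 for key in logs}
    merged: list[Rec] = []
    while True:
        heads = {k: logs[k][i] for k, i in pos.items() if i < len(logs[k])}
        if not heads:
            return merged
        ready = [k for k, r in heads.items()
                 if not any(before(s.vv, r.vv) for s in heads.values())]
        chosen = min(ready)  # tie-break between concurrent records by key
        merged.append(heads[chosen])
        pos[chosen] += 1


# Alice's unlink and Bob's rename were made on opposite sides of a partition.
logs = {
    "bob": [Rec("bob", {"alice": 1, "bob": 1}, "bob: rename /x /y")],
    "alice": [Rec("alice", {"alice": 1}, "alice: create /x"),
              Rec("alice", {"alice": 2}, "alice: unlink /x")],
}
print([r.op for r in merge_logs(logs)])
# ['alice: create /x', 'alice: unlink /x', 'bob: rename /x /y']
```

Causality always wins over the key tie-break, and every participant that runs this over the same logs computes the same order, regardless of the order in which it fetched them.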

  7. Chord doesn't necessarily ensure load balancing. Does Ivy have any specific mechanism to achieve that?

    Replies
    1. Chord uses consistent hashing with SHA-1 as its base hash function, and consistent hashing is known to balance load by spreading keys roughly evenly across the nodes of the ring.
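
How even the spread is depends on how many identifiers each node has. With one identifier per node the arcs between nodes can be noticeably uneven, and the Chord paper proposes running several virtual nodes per physical node to tighten this. A minimal sketch of Chord-style consistent hashing (each key goes to its successor on a SHA-1 identifier circle) with an optional number of virtual identifiers per node. `Ring`, `vnodes`, and the `node#i` naming are hypothetical, and whether DHash itself uses virtual nodes is not established here.

```python
import bisect
import hashlib
from collections import Counter


def sha1_id(name: str) -> int:
    # 160-bit identifier derived from SHA-1.
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class Ring:
    def __init__(self, nodes: list[str], vnodes: int = 1) -> None:
        # Each physical node gets `vnodes` points on the identifier circle.
        self.points = sorted(
            (sha1_id(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.ids = [point for point, _ in self.points]

    def successor(self, key: str) -> str:
        # A key belongs to the first point whose id is >= the key's id,
        # wrapping around to the smallest id past the top of the circle.
        i = bisect.bisect_left(self.ids, sha1_id(key)) % len(self.ids)
        return self.points[i][1]


nodes = [f"node{n}" for n in range(8)]
keys = [f"block{k}" for k in range(10_000)]
for v in (1, 32):
    load = Counter(Ring(nodes, vnodes=v).successor(k) for k in keys)
    # Most-loaded node's share relative to a perfectly even split.
    print(v, max(load.values()) / (len(keys) / len(nodes)))
```

Comparing the printed ratios for `vnodes=1` and `vnodes=32` shows how much the extra identifiers smooth out the load.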

  8. Why is a success status returned for both of two concurrent, conflicting updates (such as a rename and an unlink)?

    Replies
    1. This is one strange behavior of Ivy. Since logs are maintained independently by each participant and neither update's version vector dominates the other's in this case, both requests are honored as seen by the participant that issued them. However, all participants subsequently agree on the order of concurrent log records and thus agree on which update succeeded.

  9. Could you elaborate on how the lc tool is used for conflict resolution?

  10. The lc tool is used to resolve conflicts caused by concurrent updates during partitions. It scans the logs for records with concurrent version vectors, identifies the partition point, and classifies participants by partition. It then creates multiple historical views: one at the time of the partition and one for each participant just before the partition healed. These views are used to merge the versions, either manually or with application-specific resolvers. Beyond this, I could not find much information.
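
The scanning step described above can be sketched as a search for records with concurrent version vectors. A minimal version, assuming each record carries its log id, a path, and a version vector as a dict. `Rec` and `find_conflicts` are hypothetical, and the same-path rule is our simplification, not necessarily what `lc` does. The sketch covers only detection, not building the historical views or merging.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Rec:
    log: str            # which participant's log the record came from
    op: str
    path: str
    vv: dict[str, int]  # version vector: log -> records seen


def le(a: dict[str, int], b: dict[str, int]) -> bool:
    return all(a.get(k, 0) <= b.get(k, 0) for k in a.keys() | b.keys())


def concurrent(a: dict[str, int], b: dict[str, int]) -> bool:
    # Neither version saw the other, e.g. updates made on opposite sides of a partition.
    return not le(a, b) and not le(b, a)


def find_conflicts(records: list[Rec]) -> list[tuple[Rec, Rec]]:
    # Hypothetical rule: two records conflict if they come from different logs,
    # touch the same path, and have concurrent version vectors.
    return [
        (x, y)
        for x, y in combinations(records, 2)
        if x.log != y.log and x.path == y.path and concurrent(x.vv, y.vv)
    ]


records = [
    Rec("alice", "write", "/f", {"alice": 1}),
    Rec("alice", "write", "/f", {"alice": 2}),
    Rec("bob", "write", "/f", {"alice": 1, "bob": 1}),
    Rec("bob", "unlink", "/g", {"alice": 1, "bob": 2}),
]
for x, y in find_conflicts(records):
    print(f"{x.log}:{x.op} {x.path} <-> {y.log}:{y.op} {y.path}")
# prints: alice:write /f <-> bob:write /f
```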
