Abstract: Ivy is a multi-user read/write peer-to-peer file system. Ivy has no centralized or dedicated components, and it provides useful integrity properties without requiring users to fully trust either the underlying peer-to-peer storage system or the other users of the file system.
An Ivy file system consists solely of a set of logs, one log per participant. Ivy stores its logs in the DHash distributed hash table. Each participant finds data by consulting all logs, but performs modifications by appending only to its own log. This arrangement allows Ivy to maintain meta-data consistency without locking. Ivy users can choose which other logs to trust, an appropriate arrangement in a semi-open peer-to-peer system.
Ivy presents applications with a conventional file system interface. When the underlying network is fully connected, Ivy provides NFS-like semantics, such as close-to-open consistency. Ivy detects conflicting modifications made during a partition, and provides relevant version information to application-specific conflict resolvers. Performance measurements on a wide-area network show that Ivy is two to three times slower than NFS.
Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/ivy.pdf
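The one-log-per-participant idea from the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Participant` class, the `read` helper, and the single `seq` counter (a stand-in for Ivy's version vectors) are hypothetical names and are not taken from the paper or from any Ivy code.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One participant with an append-only log (hypothetical model, not Ivy's format)."""
    name: str
    log: list = field(default_factory=list)  # oldest record first

    def write(self, path, data, seq):
        # A participant only ever appends to its own log.
        self.log.append({"seq": seq, "path": path, "data": data})


def read(path, trusted):
    """Find the newest record for `path` by consulting every trusted log."""
    records = [r for p in trusted for r in p.log if r["path"] == path]
    if not records:
        return None
    # `seq` is an assumed global counter standing in for version vectors.
    return max(records, key=lambda r: r["seq"])["data"]


alice = Participant("alice")
bob = Participant("bob")
alice.write("/notes.txt", "draft", seq=1)
bob.write("/notes.txt", "final", seq=2)

print(read("/notes.txt", [alice, bob]))  # consults both logs
print(read("/notes.txt", [alice]))       # bob's log is not trusted here
```

The second call shows the effect of choosing which logs to trust: a reader that ignores bob's log never sees bob's update.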
Is a log record written after the disk write? If the system crashes in between, will the log be inconsistent after recovery? How does Ivy handle this?
Logs are written before writing to disk. Hence, after recovery from a crash, the participant's log is used to reconstruct the overall file system.
Given that Ivy is two to three times slower than NFS, is it still practical to use?
Ivy is a peer-to-peer system, and the slowdown is mainly due to the larger number of network round trips and the cost of DHash table construction. It is useful when peers do not fully trust each other and when no single central server is needed or available.
In case of a network partition, what happens to DHash? Will logs be migrated or redistributed from one partition to another? How can a network partition be detected?
1. In case of a network partition, no updates are made to DHash until the partition heals. The older logs are still preserved during the partition.
2. There won't be any migration or redistribution of logs. Each participant is responsible for its own log. No other participant can take over responsibility for another participant's log.
3. During a partition, the local logs are still maintained. Upon recovery, these logs can be replayed to determine when the partition occurred.
Ivy does not reclaim log space. Isn't this expensive?
No. Retaining old logs indefinitely is not expensive, according to "Venti: a new approach to archival storage".
However, Elephant's approach of allowing logs to be edited up to a certain time limit, after which updates cease, can be used if space is a constraint.
If the user can simply examine the versions of a file and merge them by hand in a text editor, how do we ensure that every user is satisfied with the merge?
Version vectors are usually used to resolve conflicts and merge versions of files. I believe that after each manual merge, version vectors are consulted to ensure causal ordering.
In case of partitioned updates, how is the ordering achieved once the partition heals?
Once the partition heals, the local log is examined and the log heads are used to compare version vectors. If conflicts are found, the public keys of the two logs are compared to order the concurrent records, and if necessary the 'lc' tool is used to resolve them.
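The comparison described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`pubkey`, `vv`, `op`), the function names `compare` and `ordered`, and the use of plain strings as public keys are all hypothetical and not Ivy's actual data structures.

```python
def compare(vv_a, vv_b):
    """Return 'before', 'after', 'equal', or 'concurrent' for two version vectors."""
    keys = set(vv_a) | set(vv_b)
    a_le_b = all(vv_a.get(k, 0) <= vv_b.get(k, 0) for k in keys)
    b_le_a = all(vv_b.get(k, 0) <= vv_a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def ordered(rec_a, rec_b):
    """Order two log records; ties are broken by public key (assumed comparable strings)."""
    rel = compare(rec_a["vv"], rec_b["vv"])
    if rel == "before":
        return (rec_a, rec_b)
    if rel == "after":
        return (rec_b, rec_a)
    # Equal or concurrent vectors: fall back to a deterministic tie-break
    # so that every participant computes the same order.
    return tuple(sorted((rec_a, rec_b), key=lambda r: r["pubkey"]))


# Two updates made on opposite sides of a partition (made-up values).
a = {"pubkey": "key-A", "vv": {"A": 3, "B": 1}, "op": "rename"}
b = {"pubkey": "key-B", "vv": {"A": 2, "B": 2}, "op": "unlink"}

print(compare(a["vv"], b["vv"]))
print([r["op"] for r in ordered(b, a)])
```

Because the tie-break depends only on the records themselves, the result is the same regardless of argument order, which is what lets all participants agree on which of two concurrent updates comes first.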
Chord doesn't necessarily ensure load balancing. Does Ivy have any specific mechanism to achieve that?
SHA-1 hashing underlies Chord's consistent hashing, which is known to balance load by distributing keys roughly evenly around the Chord ring.
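A small Python sketch can make the consistent-hashing argument concrete. It is a toy model, not Chord or DHash code: the node names, the `block-N` keys, and the helper names `ident` and `successor` are assumptions chosen for illustration, and the ring is just a sorted list rather than a distributed lookup.

```python
import hashlib
from bisect import bisect_left


def ident(value):
    """Map a string to a 160-bit integer identifier with SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")


def successor(key, node_ids):
    """Return the first node identifier at or after the key's identifier, wrapping around."""
    ring = sorted(node_ids)
    i = bisect_left(ring, ident(key))
    return ring[i % len(ring)]


# Eight hypothetical nodes placed on the ring by hashing their names.
nodes = {ident(f"node-{n}"): f"node-{n}" for n in range(8)}

# Count how many of 10,000 made-up block keys each node is responsible for.
counts = {}
for k in range(10000):
    owner = nodes[successor(f"block-{k}", nodes)]
    counts[owner] = counts.get(owner, 0) + 1

for name in sorted(counts):
    print(name, counts[name])
```

Running this shows that every key lands on exactly one node, but with only a handful of nodes the per-node counts can differ noticeably, which is the concern raised in the question. Placing each physical node at several points on the ring (virtual nodes) is a common way to even this out.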
Why is a success status returned for both of two concurrent updates that conflict with each other (such as rename and unlink)?
This is one strange behavior of Ivy. Since logs are maintained independently by each participant and the version vectors are equal in this case, both requests are honored from the point of view of the participant that issued them. However, all participants subsequently agree on the order of the concurrent log records and will thus agree on which update succeeded.
Could you elaborate on how the lc tool is used for conflict resolution?
The lc tool is used to resolve conflicts from concurrent updates made during partitions. It scans the logs for records with concurrent version vectors, identifies the partition point, and classifies participants by partition. It then creates multiple historic views: one at the time of the partition and one for each participant just before the partition healed. These views are used to merge either manually or with application-specific resolvers. Apart from this, I could not find much information.