Shark: Scaling File Servers via Cooperative Caching, by S. Annapureddy et al., NSDI 2005.
Abstract:
Network file systems offer a powerful, transparent interface for accessing remote data. Unfortunately, in current network file systems like NFS, clients fetch data from a central file server, inherently limiting the system’s ability to scale to many clients. While recent distributed (peer-to-peer) systems have managed to eliminate this scalability bottleneck, they are often exceedingly complex and provide non-standard models for administration and accountability. We present Shark, a novel system that retains the best of both worlds—the scalability of distributed systems with the simplicity of central servers.
Shark is a distributed file system designed for large-scale, wide-area deployment, while also providing a drop-in replacement for local-area file systems. Shark introduces a novel cooperative-caching mechanism, in which mutually-distrustful clients can exploit each others’ file caches to reduce load on an origin file server. Using a distributed index, Shark clients find nearby copies of data, even when files originate from different servers. Performance results show that Shark can greatly reduce server load and improve client latency for read-heavy workloads both in the wide and local areas, while still remaining competitive for single clients in the local area. Thus, Shark enables modestly-provisioned file servers to scale to hundreds of read-mostly clients while retaining traditional usability, consistency, security, and accountability.
Link to the full paper:
http://www.cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/shark.pdf
In Shark, when a node retrieves a file, it becomes a proxy for subsequent access requests for that file by other nodes. How does this single proxy node handle multiple access and write requests?
Wouldn't this proxy be saturated with multiple requests?
Once a second client fetches a copy of the chunk, it also registers itself as a proxy for that chunk. This new proxy is then ready to transfer the chunk to other clients. This is how the load is distributed among the clients.
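The registration step above can be sketched roughly as follows. This is a minimal illustration, assuming a simple in-memory index; names like `DistributedIndex` and `fetch_chunk` are hypothetical, not Shark's actual API.

```python
class DistributedIndex:
    """Maps a chunk token to the set of clients currently caching that chunk."""
    def __init__(self):
        self.entries = {}

    def put(self, token, client):
        self.entries.setdefault(token, set()).add(client)

    def get(self, token):
        return set(self.entries.get(token, set()))

index = DistributedIndex()

def fetch_chunk(client, token):
    proxies = index.get(token)
    # Fall back to the origin server only when no proxy caches the chunk.
    source = next(iter(proxies)) if proxies else "origin-server"
    # ... download the chunk from `source` ...
    index.put(token, client)   # the fetching client now serves this chunk too
    return source

fetch_chunk("client-A", "chunk-1")   # no proxies yet: served by the origin
fetch_chunk("client-B", "chunk-1")   # finds client-A registered as a proxy
```

Each fetch adds one more proxy for the chunk, so later requests spread across a growing set of sources instead of piling onto the first client.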
How does a Shark client ensure that it is fetching chunks from a nearby client?
Clients form overlay clusters characterized by RTT (round-trip time). A client's queries go to its own cluster first; if the chunk is found within that cluster, the client downloads it from those peers. This is how the system ensures locality awareness.
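The locality preference can be sketched like this. The RTT threshold, client names, and `nearest_sources` helper are illustrative assumptions, not values from the paper.

```python
RTT_THRESHOLD_MS = 50   # assumed cutoff for the "local" cluster

def nearest_sources(rtts, proxies):
    """Prefer proxies inside the low-RTT cluster; fall back to all proxies."""
    local = [p for p in proxies
             if rtts.get(p, float("inf")) <= RTT_THRESHOLD_MS]
    return local if local else list(proxies)

rtts = {"client-A": 12, "client-B": 180, "client-C": 35}
sources = nearest_sources(rtts, ["client-A", "client-B", "client-C"])
# client-B (180 ms) is filtered out; the download targets nearby peers
```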
If the nearest client has only some chunks of the file, while another, farther client has the complete file, how does Shark deal with this scenario?
The client downloads the file in parallel by spawning k threads, fetching it chunk by chunk from multiple clients at once. If a proxy also has the next chunks, the client reuses the connection to download them.
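A rough sketch of that parallel, chunk-by-chunk download, using Python's standard thread pool. The `download_chunk` stand-in and the round-robin source choice are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

K = 4   # number of parallel download threads

def download_chunk(source, i):
    # Stand-in for a real network transfer from `source`.
    return f"data-{i}"

def parallel_fetch(sources, num_chunks):
    """Fetch all chunks with K workers, spreading requests over the sources."""
    with ThreadPoolExecutor(max_workers=K) as pool:
        futures = [pool.submit(download_chunk, sources[i % len(sources)], i)
                   for i in range(num_chunks)]
        return [f.result() for f in futures]

file_data = parallel_fetch(["client-A", "client-B"], 8)
```

Reusing an already-open connection for the next chunk, as the answer notes, amortizes connection setup cost across chunks from the same proxy.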
If several different clients ask for a lease on the same file, what will the file server do?
The lease offered by the server is a promise to notify the client about modifications for a given time, e.g., 5 minutes. The server grants a lease to each request, even if the file is already leased to another client.
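A minimal sketch of that behavior, assuming a 5-minute lease and an in-memory server; `FileServer`, `grant_lease`, and `on_modify` are hypothetical names, not Shark's interface.

```python
import time

LEASE_SECONDS = 300   # 5 minutes, as in the example above

class FileServer:
    def __init__(self):
        self.leases = {}   # filename -> {client: lease expiry timestamp}

    def grant_lease(self, filename, client, now=None):
        """Grant a read lease regardless of existing leases on the file."""
        now = time.time() if now is None else now
        self.leases.setdefault(filename, {})[client] = now + LEASE_SECONDS
        return LEASE_SECONDS

    def on_modify(self, filename, now=None):
        """On a write, notify every client whose lease is still valid."""
        now = time.time() if now is None else now
        holders = self.leases.get(filename, {})
        return [c for c, expiry in holders.items() if expiry > now]

server = FileServer()
server.grant_lease("/shared/data", "client-A", now=0)
server.grant_lease("/shared/data", "client-B", now=0)   # both hold leases
```

Because read leases are not exclusive, concurrent readers never block each other; the server just keeps track of whom to notify if the file changes.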
Why does Shark use a sloppy DHT (several values for the same key) instead of a regular DHT for its distributed index?
There is a one-to-many relationship between a chunk and its proxies. A DSHT provides a similar interface to a DHT, except that a key may have multiple values: put(key, value) stores a value under key, and get(key) need only return some subset of the values stored. This is suitable for Shark, as many proxies register for the same chunk.
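The put/get contract described above can be sketched as follows. This is an illustrative toy, not Coral's implementation; the `max_returned` cap is an assumed stand-in for "some subset".

```python
import random

class SloppyDHT:
    """Toy DSHT: a key holds many values; get() returns only a subset."""
    def __init__(self, max_returned=2):
        self.store = {}
        self.max_returned = max_returned

    def put(self, key, value):
        self.store.setdefault(key, []).append(value)

    def get(self, key):
        values = self.store.get(key, [])
        k = min(self.max_returned, len(values))
        return random.sample(values, k)   # any subset satisfies the contract

dsht = SloppyDHT()
for proxy in ["client-A", "client-B", "client-C"]:
    dsht.put("chunk-1", proxy)           # three proxies under one key

subset = dsht.get("chunk-1")             # some two of the three proxies
```

Returning only a subset is enough here: a client needs a few live proxies for a chunk, not every one of them, which avoids hot spots on popular keys.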
In what order should we fetch the chunks of a file? It is unclear to me when to use random fetch and when to use sequential fetch.
The paper describes an experiment the authors conducted, which found that random fetch order is more suitable for parallel downloads and increases throughput.