Tuesday, January 15, 2013

Scale and Performance in a Distributed File System (AFS)



by J. Howard et al., ACM ToCS 1988

Abstract:
The Andrew File System is a location-transparent distributed file system that will eventually span more than 5000 workstations at Carnegie Mellon University. Large scale affects performance and complicates system operation. In this paper we present observations of a prototype implementation, motivate changes in the areas of cache validation, server process structure, name translation, and low-level storage representation, and quantitatively demonstrate Andrew’s ability to scale gracefully. We establish the importance of whole-file transfer and caching in Andrew by comparing its performance with that of Sun Microsystem’s NFS file system. We also show how the aggregation of files into volumes improves the operability of the system.

Link to the full paper:

10 comments:

  1. It has been mentioned that even with the use of "Callbacks", there is still a potential for inconsistency if the callback states being maintained by Venus and the server go out of sync. Is there any mechanism present to detect this inconsistency and bring Venus and the server in sync again?

  2. This comment has been removed by the author.

  3. The use of a dedicated process per client on each server caused critical resource limits to be exceeded on a number of occasions. It also resulted in excessive context-switching overhead and high virtual memory paging demands. Are there any improvements in AFS that make a client less prone to crashing?

  4. If a client acquires a lock on a file on the server and then crashes, how is this situation handled?

  5. In AFS, files are arranged into volumes to enable scalability, but each user is associated with a separate volume. Why is this needed in a system with multiple users? Is this not an overhead? Couldn't we instead have a tree-traversal algorithm that depends on each user's permission level?

  6. What algorithm do the servers use to decide which callback to break, and how do they deal with the storage to be reclaimed? Is it just left there without any handling?

  7. How does AFS handle file system consistency issues and checks on AFS servers? The OpenAFS documentation says to use an AFS-specific fsck to check for file system inconsistencies.

  8. What happens when the callback packets are lost?

  9. How are concurrent changes to a file handled when multiple clients are accessing the same file?

  10. When multiple clients try to modify a file concurrently, is there any protection mechanism, and how does AFS ensure security?
