Assignment 2: Pushmi-Pullyu (32 marks)

Due: Tuesday, March 4, 2014 (11:59pm)

Learning Objectives

The purpose of this assignment is to learn about some of the tradeoffs between TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) as transport-layer protocols. In particular, you will write a C or C++ program that implements a simple peer-assisted file sharing protocol, using either a pull-based or push-based design.

Background

In the late 1990's, the nature of the Internet changed dramatically with the emergence of napster as the first prominent peer-to-peer (P2P) file-sharing application. Napster was apparently designed and implemented by a college freshman (i.e., first-year student). It had a centralized server to keep track of files and participating peers, but used P2P transfers for the actual exchange of files, with a single large FTP-like transfer for each file. Many other P2P file sharing applications have emerged since napster was shut down in 2000, with some of the more prominent ones being Gnutella, KaZaA, eDonkey, eMule, and BitTorrent. BitTorrent in particular has many interesting design features, including chunk-based transfers of pieces of files, rather than entire files. These chunks can be obtained concurrently from different participating peers, which provides a very scalable "bandwidth harvesting" mechanism for high performance data transfers.

Technical Requirements

In this assignment, you will design and implement a simple peer-assisted file sharing application, and test it on a network of your own choosing. Your application will make use of a central server, which provides a repository for storing files and keeping track of participating peers. The server will have a single file available. Each peer contacts the central server to find out what other peers are participating, and what pieces of the file each peer has. Most of the time, peers use P2P exchange of pieces to advance their progress in downloading the file. If no peers have useful pieces, then a piece can be obtained from the server. This approach is called peer-assisted file sharing.

Your application can use either UDP (i.e., datagram sockets) or TCP (i.e., stream sockets) for its underlying transport-layer protocol. This choice is entirely up to you, but you will need to provide justification for your design in the documentation that you submit. With TCP, reliable data transfer will be provided automatically, but there will be a lot of transport-layer connections to keep track of. With UDP, the transport-layer endpoints will be simpler to manage, but you may need to add some reliable data transfer mechanisms (e.g., timeouts, retransmissions) in order to make your protocol work.

Your application can use either a pull-based design or a push-based design (your choice). In a pull-based design, a participating peer A who needs piece i would consult the server to find a peer B who has piece i, and then contact peer B to download piece i (i.e., similar to a GET in FTP). In a push-based design, a participating peer A who has piece j would consult the server to find a peer B who needs piece j, and then contact peer B to upload piece j (i.e., similar to a PUT in FTP). Since either design approach is possible, we will call this file sharing application Pushmi-Pullyu, named after one of Doctor Dolittle's legendary creatures. (Here is a rare photo, courtesty of BBC News.)

There are several important configuration parameters in Pushmi-Pullyu (PP for short):

N is an integer parameter specifying the maximum number of participating peers that the server can handle. We will limit this to at most N = 8. Any additional peers beyond N that try to connect to the server are turned away (i.e., rejected).
P is an integer parameter specifying the maximum number of connections that a peer can have in the file sharing protocol. We will limit this to 1 <= P <= N. Note that one connection to the server is used to query scoreboard information, and the other P-1 connections can be used to talk to other peers. (There is clearly no need for a peer to talk to itself.) Multiple concurrent connections to the same peer are permitted, though this obviously reduces the number of other peers that can be contacted.
M is an integer parameter specifying the number of pieces in the file that is being shared. For initial testing, we will use M = 8, but later experiments will use M = 32. The M pieces should be retrieved in random order.
C is an integer parameter specifying the chunk size for file pieces. For initial testing, we will use C = 1 byte, but later experiments will use C = 1024 bytes. (Note that the product of M and C is the total file size.)

Grading Rubric

The grading scheme for the assignment is as follows:

12 marks for the design and implementation of a UDP-based or TCP-based solution to this problem, capable of handling the simple case of N = 2, P = 2, C = 1, and M = 8, where the single file being shared has the data content "Dolittle". You may use either a pull-based design or a push-based design. Your central server must be able to keep track of the participating peers and the pieces that each peer has. Your peer nodes must be able to communicate with the server, and participate in the upload and download of file pieces with other peers. Your implementation should include proper use of UDP or TCP socket programming in C or C++, and reasonably commented code. Test your solution either on a single machine using the loopback interface, or using a real network of your own choosing.
8 marks for extending your UDP-based or TCP-based solution to this problem so that it can successfully handle N = 8, P = 8, C = 1024, and M = 32 for a single 32 KB ASCII test file. Think carefully about the design of a non-blocking (i.e., multi-threaded or multi-process) implementation that handles concurrency well. Test your solution either on a single machine using the loopback interface, or using a real network of your own choosing.
6 marks for a clear and concise user manual (at most 1 page) that describes how to compile, configure, and use your application. Make sure to indicate (and justify) your design choice (i.e., push-based or pull-based), and describe any special features that are supported. Please include a paragraph or two about your choice of either UDP or TCP as a transport-layer protocol, and the implications of this choice (e.g., size/complexity of code, scalability, range of parameters supported, portability of code across different networks). Last but not least, describe where and how your testing was done (e.g., home, university, office), what works, and what does not. Be honest!
6 marks for a suitable demonstration of your Pushmi-Pullyu protocol to your TA in your tutorial section, or to your professor at a mutually convenient time. All demos will be done during the week of March 10.

Bonus (4 marks): Extend your solution as necessary so that it can successfully handle N = 8, P = 8, C = 8192, and M = 32 for a single 256 KB ASCII test file.

When you are finished, please submit your assignment solution to your TA via D2L. Your submission should include your source code and user manual documentation.

Tips

This assignment is extremely challenging, so please get started early!
You may want to start with the UDP-based approach as a warmup exercise, to help build your intuition about how peer-assisted file sharing works. Carey's "word server" might provide a useful starting point for this.
Here is debugging output from a run of Carey's prototype UDP solution that he showed in class: server, client-peer1, and client-peer2.
You can tackle a TCP-based version once you have a better understanding of the dynamics of PP and its communication requirements. This one will definitely drive you crazy at some point!!
You may find Wireshark particularly useful to look at the network packets being sent and received by your application (i.e., ports, size, data content).
When debugging your application, you may want your peers to put (random or deterministic) delays between their piece requests. You can always remove these later.
Get the single peer case working first, and then extend it to two peers. Once the latter is working, extending to larger P should be relatively straightforward. At least, one would hope so.
Stick with plain-text ASCII files for your testing. Start small, and then work upwards to larger files for your final experiments.
Work with very small files and pieces (less than 1 KB) to start with, so that you can put verbose debugging in to test the functionality of your protocol. Once these are working right, you can scale up to larger files (4 KB or more) to tackle any new challenges that arise.
You do not have to make your server non-blocking, but you can do so if you wish. It is advisable to do so, but not strictly required. Depends a bit on whether you are using TCP or UDP.
If you are still totally baffled on how to get started, then click here for even more blatant hints and suggestions.

CPSC 441: Computer Communications