Assignment 2: TCP Traffic Analysis (25 marks)
Due: Thursday, February 16, 2012 (11:59pm)
The purpose of this assignment is to learn about the Transmission Control Protocol (TCP). In particular, you will write a C or C++ program to analyze a specially formatted network traffic trace file, in order to assess and understand the TCP/IP protocol, including its handshaking behaviour and its protocol states.
The file trace.txt (210 KB ASCII text file) shows some TCP/IP packet traffic collected using a network traffic analyzer on a research network at the University of Calgary. This trace contains 2,168 TCP/IP packets, and lasts about 3.5 minutes. During the period traced, a single Web client was downloading Web pages from different Web sites on the Internet. This trace is to be used for your TCP traffic analysis, and for answering the questions given below.
Each line of data in the trace file represents one TCP/IP packet. There are multiple columns of data on each line, separated by spaces. The columns, from left to right, represent:
- the timestamp (in seconds) at which the packet was seen
- the IP source address in the packet
- the IP destination address in the packet
- the size of the IP packet (including the IP header and the TCP header)
- the protocol type in the packet (always TCP in this trace)
- the source port specified in the packet
- the destination port specified in the packet
- the TCP sequence numbers carried in the packet, written as two values separated by a colon: the sequence number of the first byte of data, and the number just beyond the last byte of data. Subtracting the first value from the second gives the number of TCP payload bytes in the packet (zero if the packet carries no data).
- the TCP acknowledgement number carried in the packet
- the receive window advertised by the packet's sender (shown after the label "win:")
- the TCP flags carried in the packet, if any. These flags are encoded in the trace as 'S' for a SYN packet (i.e., handshake to open a connection), 'F' for a FIN packet (i.e., handshake to close a connection), 'P' for the PUSH bit, 'A' to indicate a valid acknowledgement number, and 'R' to reset a connection due to a protocol error. These are the only possibilities in this trace.
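One way to read a line in this format is a single sscanf call. The sketch below is only an illustration, not the approach taken in tcpreader.c: the struct packet type and the parse_line function are hypothetical names, and the code assumes every line follows the layout of the example line shown below (a literal "->" between the addresses, " : " between the two sequence values, "win:" before the window), with any flags written together as one token such as PA.

```c
#include <stdio.h>
#include <string.h>

/* One parsed trace line (hypothetical layout, field order as in the list above). */
struct packet {
    double ts;               /* timestamp in seconds */
    char src_ip[64];
    char dst_ip[64];
    int ip_len;              /* IP packet size, including IP and TCP headers */
    char proto[16];          /* always "TCP" in this trace */
    int src_port;
    int dst_port;
    unsigned long seq_start; /* sequence number of the first data byte */
    unsigned long seq_end;   /* number just beyond the last data byte */
    unsigned long ack;
    unsigned long win;
    char flags[8];           /* e.g. "S" or "PA"; empty string if no flags */
};

/* Parses one line into *p. Returns 1 on success, 0 on a malformed line.
   Assumes the flags, if present, are the last whitespace-separated token. */
int parse_line(const char *line, struct packet *p)
{
    int n = sscanf(line,
                   "%lf %63s -> %63s %d %15s %d %d %lu : %lu %lu win: %lu %7s",
                   &p->ts, p->src_ip, p->dst_ip, &p->ip_len, p->proto,
                   &p->src_port, &p->dst_port, &p->seq_start, &p->seq_end,
                   &p->ack, &p->win, p->flags);
    if (n == 11)             /* every field except the optional flags was read */
        p->flags[0] = '\0';
    return n == 11 || n == 12;
}
```

A caller would typically read the trace with fgets() and pass each line to parse_line(), skipping any line for which it returns 0.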
An example line from this trace is:
1916.911715 192.168.1.9 -> 216.239.39.99 44 TCP 1026 80 20948 : 20948 0 win: 32768 S
This TCP packet traveled from IP source address 192.168.1.9 (port 1026)
to IP destination address 216.239.39.99 (port 80) at time 1916.911715 sec.
It was a SYN packet of size 44 bytes (including TCP/IP protocol headers).
The proposed starting TCP sequence number was 20948.
This packet carried no actual TCP data bytes.
The acknowledgement number was not valid (the 'A' flag was not set) and was simply 0.
The flow control window size advertised was 32 KB.
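The arithmetic in this walkthrough (payload bytes are the second sequence value minus the first, and flags are read letter by letter) can be captured in two small helpers. This is a sketch; payload_bytes and has_flag are hypothetical names. Doing the subtraction in unsigned 32-bit arithmetic is a defensive choice in case a packet's sequence range wraps past 2^32 - 1, not something this trace is known to contain.

```c
#include <stdint.h>
#include <string.h>

/* TCP payload bytes from the "start : end" sequence pair.
   Unsigned 32-bit subtraction wraps modulo 2^32, so the result stays
   correct even if the sequence space wraps inside the packet. */
uint32_t payload_bytes(uint32_t seq_start, uint32_t seq_end)
{
    return seq_end - seq_start;
}

/* Returns 1 if flag letter f (S, F, P, A or R) appears in the flags field. */
int has_flag(const char *flags, char f)
{
    /* Guard against '\0', which strchr would always "find" at the end. */
    return f != '\0' && strchr(flags, f) != NULL;
}
```

For the example SYN packet, payload_bytes(20948, 20948) is 0 and has_flag("S", 'S') is 1, matching the description above.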
You need to write a program (20 marks) for parsing and processing trace files in this format, and tracking TCP connection state information. In particular, the program processes the trace file and computes summary information about TCP connections. Note that a TCP connection is identified by a 4-tuple (IP source address, source port, IP destination address, destination port), and packets can flow in both directions on a connection (i.e., from host A to host B, and from host B to host A). Also note that the packets from different connections can be arbitrarily interleaved with each other in time, so your program will need to extract packets and associate them with the correct connection. Your program should be written in C or C++.
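Because a connection's packets can travel in either direction, a lookup has to try the 4-tuple both as written and with source and destination swapped. A minimal sketch, assuming a simple array of known connections; struct conn_key, match_direction and find_conn are hypothetical names. A linear search is adequate for a trace of a few thousand packets, while a hash table keyed on a canonical ordering of the 4-tuple would scale better.

```c
#include <string.h>

/* Hypothetical connection key: the 4-tuple as seen in its first packet (A -> B). */
struct conn_key {
    char ip_a[64];
    int port_a;
    char ip_b[64];
    int port_b;
};

/* Returns 1 if the packet travels A -> B on this connection,
   2 if it travels B -> A, and 0 if it belongs to a different connection. */
int match_direction(const struct conn_key *k,
                    const char *src, int sport, const char *dst, int dport)
{
    if (strcmp(k->ip_a, src) == 0 && k->port_a == sport &&
        strcmp(k->ip_b, dst) == 0 && k->port_b == dport)
        return 1;
    if (strcmp(k->ip_a, dst) == 0 && k->port_a == dport &&
        strcmp(k->ip_b, src) == 0 && k->port_b == sport)
        return 2;
    return 0;
}

/* Searches n known connections. Returns the matching index and stores the
   direction in *dir, or returns -1 so the caller can add a new connection. */
int find_conn(const struct conn_key *conns, int n,
              const char *src, int sport, const char *dst, int dport,
              int *dir)
{
    for (int i = 0; i < n; i++) {
        int d = match_direction(&conns[i], src, sport, dst, dport);
        if (d != 0) {
            if (dir != NULL)
                *dir = d;
            return i;
        }
    }
    return -1;
}
```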
The summary information to be computed for each TCP connection includes:
- the state of the connection. The states of a connection can be expressed in a concise numerical form as SiFj, where i and j are small integers indicating the number of SYN and FIN packets observed for the connection, respectively. For example, S1F1 means one SYN and one FIN, while S0F2 means no SYN and two FINs. We also use a special state R to indicate a connection that was reset due to a protocol error. Getting the TCP state information correct is an important part of your program. We are especially interested in the complete TCP connections, for which at least one SYN and at least one FIN are observed (and no resets!). For these complete connections, you can report the additional information listed below.
- the starting time, ending time, and total elapsed time duration of each complete connection
- the number of packets sent in each direction on each complete connection, as well as the total packets observed for the connection
- the number of data bytes sent in each direction on each complete connection, as well as the total bytes exchanged by the connection. This byte count is for data bytes (i.e., excluding the TCP and IP protocol headers).
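The state and per-connection counters listed above can be accumulated packet by packet. The sketch below is one possible design under stated assumptions: struct conn_stats, conn_update, conn_complete and conn_state are hypothetical names; the caller supplies a direction index of 0 or 1; the start and end times are the timestamps of the first and last packets observed (other conventions, such as ending at the last FIN, are possible); and any packet with an R flag marks the whole connection as reset. With this counting, a normal three-way handshake yields S2, since both the SYN and the SYN-ACK carry the S flag.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-connection summary; dir 0 and dir 1 are the two directions. */
struct conn_stats {
    int syn, fin, rst;   /* flag counts used for the SiFj / R state */
    double start, end;   /* timestamps of first and last packets seen */
    long packets[2];     /* packets per direction */
    long bytes[2];       /* TCP payload bytes per direction (headers excluded) */
};

/* Folds one packet into the summary. */
void conn_update(struct conn_stats *c, double ts, int dir,
                 const char *flags, long payload)
{
    if (c->packets[0] + c->packets[1] == 0)
        c->start = ts;   /* first packet of this connection */
    c->end = ts;
    c->packets[dir]++;
    c->bytes[dir] += payload;
    if (strchr(flags, 'S') != NULL) c->syn++;
    if (strchr(flags, 'F') != NULL) c->fin++;
    if (strchr(flags, 'R') != NULL) c->rst++;
}

/* Complete = at least one SYN, at least one FIN, and no reset. */
int conn_complete(const struct conn_stats *c)
{
    return c->syn >= 1 && c->fin >= 1 && c->rst == 0;
}

/* Writes "R" for a reset connection, otherwise "SiFj". */
void conn_state(const struct conn_stats *c, char *buf, size_t len)
{
    if (c->rst > 0)
        snprintf(buf, len, "R");
    else
        snprintf(buf, len, "S%dF%d", c->syn, c->fin);
}
```

Totals for a connection are then packets[0] + packets[1] and bytes[0] + bytes[1], and the duration is end - start.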
Use your program, and the trace file, to answer as many of the following questions as you can (1 mark each, total of 5 marks):
- How many complete TCP connections are observed in the trace?
- What are the minimum, mean, and maximum time durations of the complete TCP connections that you observed?
- What are the minimum, mean, and maximum number of packets sent on the complete TCP connections that you observed?
- What are the minimum, mean, and maximum number of data bytes sent on the complete TCP connections that you observed?
- How many reset TCP connections are observed in the trace?
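Three of these questions ask for the same statistics (minimum, mean, maximum) over the complete connections, so one helper can serve all of them. A sketch with a hypothetical min_mean_max function; it assumes the values (durations, packet totals, or byte totals) have already been collected into an array of doubles with at least one element.

```c
/* Minimum, mean and maximum of n values; requires n > 0. */
void min_mean_max(const double *v, int n,
                  double *mn, double *mean, double *mx)
{
    double sum = 0.0;
    *mn = *mx = v[0];
    for (int i = 0; i < n; i++) {
        if (v[i] < *mn) *mn = v[i];
        if (v[i] > *mx) *mx = v[i];
        sum += v[i];
    }
    *mean = sum / n;
}
```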
Bonus (3 marks): Find in the trace the complete TCP connection that downloaded the most TCP data bytes from the server. What is the IP address of this Web server? Approximately how many objects were downloaded? What was the average throughput (in bits per second) for the entire connection?
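For the throughput part, average throughput is the number of data bytes multiplied by 8 (to convert to bits) divided by the connection's duration in seconds. A sketch with a hypothetical throughput_bps helper; which bytes to count (server-to-client only, or both directions) and which start and end times to use are choices you should state with your answer.

```c
/* Average throughput in bits per second: data_bytes * 8 / (end - start).
   Returns 0 for a non-positive duration to avoid dividing by zero. */
double throughput_bps(long data_bytes, double start, double end)
{
    double duration = end - start;
    if (duration <= 0.0)
        return 0.0;
    return (double)data_bytes * 8.0 / duration;
}
```

With made-up numbers, 125,000 data bytes over a 2-second connection gives 125000 × 8 / 2 = 500,000 bits per second.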
When you are finished, please submit your assignment solution as a single file in electronic form to your TA on or before the stated deadline. Make sure your name and identification are on everything that you submit. Your submission should include your source code, the output produced by your program on the trace file, and a text file with your answers to the questions above. Make sure to show your work for any calculations.
TIPS:
Parsing the input file might look a bit intimidating, but it is doable. If you need some help getting started, take a look at tcpreader.c and try it out.
For testing your program, it is best to work with some small example traces. Here are a few that you might find useful. The trace example1.txt (17 TCP packets) contains a single complete TCP connection. The trace example2.txt (12 TCP packets) contains a single TCP connection that is reset. The trace example3.txt (100 TCP packets) contains 8 TCP connections (5 complete, 2 reset, and 1 still in progress when the trace ended, as evidenced in the example3 sample output). When you have your program working properly, you can run it on the large trace file for this assignment.