CPSC 441: Computer Communications

Professor Carey Williamson

Winter 2014

Assignment 1: The "Odd Duck" Web Proxy (32 marks)

Due: Tuesday, February 4, 2014 (11:59pm)

Learning Objectives

The purpose of this assignment is to learn about the HyperText Transfer Protocol (HTTP) used by the World Wide Web. In particular, you will design and implement an HTTP proxy (i.e., Web proxy server) with functionality that demonstrates both the simplicity and the power of HTTP as an application-layer protocol. Along the way, you will also learn a lot about client-server socket programming, TCP/IP, network debugging, and more.

Background

A Web proxy is a software entity that functions as an intermediary between a Web client (browser) and a Web server. The Web proxy intercepts Web requests from clients and reformulates the requests for transmission to a Web server. When a response is received from the Web server, the proxy sends the response back to the client. From the server's point of view, the proxy is the client. Similarly, from the client's point of view, the proxy is the server. A Web proxy thus provides a single point of control to regulate Internet access between clients and servers. Many Calgary schools use Web proxies to limit the types of Web sites that students are allowed to access. Commercially available Web proxies, such as Net Nanny or Barracuda, are examples of this, as is the open-source proxy Squid, which also provides Web object caching.

Technical Requirements

In this assignment, you will implement and test your very own Web proxy, in either C or C++. The goals of the assignment are to build a properly functioning Web proxy for simple Web pages, and then implement some access control mechanisms to limit the type of content that can be viewed. In particular, you will use your proxy to block "odd" content, where we will define "odd" in an odd way, based on the number of bytes in a Web object (i.e., odd or even). On average, your proxy will block about half of the content on the World Wide Web, which will make for some rather unpredictable browsing!
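The odd/even decision can be sketched as follows. This is only an illustration, not a required design: the helper names find_content_length and is_odd_sized are hypothetical, and the sketch assumes the server reports the object size in a Content-Length header. When that header is absent, a proxy would have to count the body bytes itself.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: scans a NUL-terminated block of response headers
 * for a Content-Length header (header names are case-insensitive).
 * Returns true and stores the value in *len if one is found. */
static bool find_content_length(const char *headers, long *len)
{
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof name - 1;

    for (const char *line = headers; line != NULL && *line != '\0'; ) {
        if (strncasecmp(line, name, name_len) == 0) {
            char *end;
            long value = strtol(line + name_len, &end, 10);
            if (end == line + name_len || value < 0)
                return false;          /* header present but malformed */
            *len = value;
            return true;
        }
        const char *newline = strchr(line, '\n');
        line = newline ? newline + 1 : NULL;   /* advance to next header line */
    }
    return false;
}

/* "Odd" in the assignment's sense: an odd number of bytes. */
static bool is_odd_sized(long size_bytes)
{
    return size_bytes % 2 != 0;
}
```

A proxy built this way would call find_content_length on the headers it receives from the server and block the object when is_odd_sized returns true.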

There are two main pieces of functionality needed in your proxy. The first is the ability to prevent your browser from downloading and rendering odd-sized base HTML pages, replacing these instead with a suitably-worded error indication (in an even-sized HTML page, of course!). The second is the ability to prevent your browser from downloading and rendering odd-sized embedded objects (e.g., images such as .jpg, .gif, and .png), perhaps replacing these with other objects (even-sized, of course). There is no requirement for Web object caching in your proxy at all.
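One way to guarantee that the replacement page is itself even-sized is to pad it after formatting. The sketch below is an assumption about how this might be done; build_block_page and its wording are hypothetical, and only the HTML body is produced (the HTTP response headers would be added separately).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a replacement HTML body into `buf` and
 * appends one newline if needed so that the body length is even.
 * Returns the body length in bytes, or 0 if `cap` is too small.
 * Note: a real proxy should HTML-escape `url` before embedding it. */
static size_t build_block_page(char *buf, size_t cap, const char *url)
{
    int n = snprintf(buf, cap,
                     "<html><body><h1>Blocked</h1>"
                     "<p>The object at %s has an odd number of bytes.</p>"
                     "</body></html>\n",
                     url);

    /* Require room for one possible padding byte plus the terminating NUL. */
    if (n < 0 || (size_t)n + 1 >= cap)
        return 0;

    size_t len = (size_t)n;
    if (len % 2 != 0) {
        buf[len++] = '\n';   /* trailing whitespace does not change rendering */
        buf[len] = '\0';
    }
    return len;
}
```

The returned length is the value the proxy would place in the Content-Length header of its own response.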

The most important HTTP command for your Web proxy to handle is the "GET" request, which specifies the URL for an object to be retrieved. In the basic operation of your proxy, it should be able to parse, understand, and forward to the Web server a (possibly modified) version of the client request. Similarly, the proxy should be able to parse, understand, and return to the client a (possibly modified) version of the response that the Web server provided to the proxy. Your proxy should be able to handle response codes such as 200 (OK) and 404 (Not Found) correctly, notifying the client as appropriate. HTTP request redirection (302) should also be supported. Reasonable handling of browser-initiated Conditional GET requests and 304 (Not Modified) responses from the server is also desirable, but not mandatory.
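As an illustration of the parsing step, a browser configured to use a proxy sends the full URL in the request line (for example, "GET http://host/path HTTP/1.1"), so the proxy must extract the host, port, and path before it can contact the server. The following is a minimal sketch under that assumption; struct target and parse_get_line are hypothetical names, and only plain http:// URLs are handled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the pieces of a proxied request URL. */
struct target {
    char host[256];
    int  port;
    char path[1024];
};

/* Parses "GET http://host[:port][/path] HTTP/1.x".
 * Returns 0 on success, -1 if the line is not a GET for an http:// URL. */
static int parse_get_line(const char *line, struct target *t)
{
    char url[1200];

    if (strncmp(line, "GET ", 4) != 0)
        return -1;
    if (sscanf(line + 4, "%1199s", url) != 1)
        return -1;
    if (strncmp(url, "http://", 7) != 0)
        return -1;

    const char *p = url + 7;
    size_t host_len = strcspn(p, ":/");      /* host ends at ':' or '/' */
    if (host_len == 0 || host_len >= sizeof t->host)
        return -1;
    memcpy(t->host, p, host_len);
    t->host[host_len] = '\0';
    p += host_len;

    t->port = 80;                            /* default HTTP port */
    if (*p == ':') {
        char *end;
        long port = strtol(p + 1, &end, 10);
        if (end == p + 1 || port < 1 || port > 65535)
            return -1;
        t->port = (int)port;
        p = end;
    }

    if (*p == '\0') {
        strcpy(t->path, "/");                /* no path given: request the root */
    } else if (*p == '/') {
        if (strlen(p) >= sizeof t->path)
            return -1;
        strcpy(t->path, p);
    } else {
        return -1;
    }
    return 0;
}
```

With the pieces separated, the proxy can open a connection to host:port and forward a request line of the form "GET path HTTP/1.x" along with a Host header.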

You will need at least one TCP socket (i.e., SOCK_STREAM) for client-proxy communication, and at least one additional TCP socket for each Web server you are talking to for proxy-server communication. If you want your proxy to support multiple concurrent HTTP transactions (recommended), you will need to fork child processes (or create threads) for request handling as well. Each child process or thread will use its own socket instances for its communications with the client and with the server.
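The client-facing side of this arrangement might look like the sketch below, which uses one child process per connection. It is one possible structure, not the required one: make_listener, serve_forever, and the handle_client callback are hypothetical names, error handling is kept minimal, and the SIGCHLD setting assumes a POSIX system such as Linux where ignored children are reaped automatically.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a TCP listening socket on `port` (0 lets the OS choose one).
 * Returns the socket descriptor, or -1 on failure. */
static int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;   /* allow quick restarts while debugging */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept loop: forks one child per client; never returns. */
static void serve_forever(int listen_fd, void (*handle_client)(int client_fd))
{
    signal(SIGCHLD, SIG_IGN);   /* assumption: kernel reaps exited children */

    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;           /* e.g., interrupted; try again */

        pid_t pid = fork();
        if (pid == 0) {         /* child: serve this client only */
            close(listen_fd);
            handle_client(client_fd);
            close(client_fd);
            _exit(0);
        }
        close(client_fd);       /* parent (or failed fork) drops its copy */
    }
}
```

Inside handle_client, the child would open its own second socket to the Web server, so that no socket is shared between concurrent transactions.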

When implementing your proxy, feel free to compile and run your Web proxy on any suitable department machine, or even your home machine or laptop, but please be aware that you will ultimately have to demo your proxy to your TA on campus at some point. You should be able to use your proxy from a modern Web browser (e.g., Internet Explorer, Mozilla Firefox, Chrome, Safari), and from any machine (either on campus or at home). To test the proxy, you will have to configure your Web browser to use your specific Web proxy (e.g., look for menu selections like Tools, Internet Options, Proxies, Advanced, LAN Settings).

As you design and build your Web proxy, give careful consideration to how you will debug and test it. For example, you may want to print out information about requests and responses received, processed, forwarded, redirected, or altered. Once you become confident with the basic operation of your Web proxy, you can toggle off the verbose debugging output. If you are testing on your home network, you can also use tools like Wireshark or tcpdump to collect network packet traces. By studying the HTTP messages and TCP/IP packets going to and from your proxy, you should be able to figure out what is wrong, or convince yourself (and others) that it is working properly.
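A simple way to make the debugging output easy to toggle is to route it through a single function guarded by a flag. The sketch below is one hypothetical arrangement; g_verbose and debug_log are illustrative names, and the flag could equally be set from a command-line option.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical global switch: nonzero enables debugging output. */
static int g_verbose = 1;

/* printf-style logging to stderr, active only when g_verbose is set.
 * Returns the number of characters written (0 when logging is off). */
static int debug_log(const char *fmt, ...)
{
    if (!g_verbose)
        return 0;

    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

For example, a call such as debug_log("forwarding GET for %s\n", host) prints while testing and becomes silent once g_verbose is set to 0.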

When you are finished, please submit your solution in electronic form to your TA. Your submission should include the source code for your Web proxy, a brief user manual describing how to compile and use your proxy, and a description of the testing done with your proxy. Please remember that assignments are to be done individually, and submitted to your assigned TA on time. You should also plan to give a brief demo of your proxy to your TA in early February, either just before or just after your submission.

Testing

The primary test of correctness for your proxy will be a simple visual test. That is, for even-sized Web pages, such as this even-sized HTML test page, or this even-sized HTML test page with an even-sized embedded image, the content displayed by your Web browser should look the same regardless of whether you are using your Web proxy or retrieving content directly from the Web server. The only differences appear when you try to access Web objects that are odd in size, such as this odd-sized HTML test page, or this even-sized HTML page with an odd-sized embedded image. For the first case, you should get some sort of error message page, while for the latter, you could render it either by suppressing the image or replacing the image with an even-sized one that is less odd, or has a suitable warning message. Once you have these simple cases working, you can be ambitious and try your proxy on real-world Web pages. Have fun!!

Grading Rubric

The grading scheme for the assignment is as follows:

Up to 4 bonus marks will be awarded for proper design of a non-blocking (i.e., multi-threaded or multi-process) proxy that can handle complicated Web sites with ease.

Tips