CPSC 441: Computer Communications

Professor Carey Williamson

Winter 2013

Assignment 1: Subliminal Web Proxy (24 marks)

Due: Tuesday, February 5, 2013 (11:59pm)

The purpose of this assignment is to learn about the World Wide Web and the HyperText Transfer Protocol (HTTP). Along the way, you will also learn a bit about client-server socket programming, Web proxies, network debugging, and more.

With the arrival of January 1, many of the people you know and love have made New Year's resolutions to improve themselves during 2013, with dieting and weight loss among the most prevalent of these resolutions. You, however, have concocted a sinister plan to thwart their self-improvement efforts. You will do so by designing and configuring a Web proxy that will alter their Web browsing experience. In particular, it will bombard them with subliminal messages that make them hungry and make them crave fast food. How, you ask? Read on.

A Web proxy is a software entity that functions as an intermediary between a Web client (browser) and a Web server. The Web proxy intercepts Web requests from clients and reformulates the requests for transmission to a Web server. When a response is received from the Web server, the proxy sends the response back to the client. From the server's point of view, the proxy is the client. Similarly, from the client's point of view, the proxy is the server. A Web proxy thus provides a single point of control to regulate Internet access between clients and servers. This is also a natural point to implement content manipulation, if so desired.

In this assignment, you will implement and test a simple Web proxy in either C or C++. The goals of the assignment are to build a properly functioning Web proxy for simple Web pages, and then to apply some simple content manipulation techniques to alter some of the content being displayed. Your evil plan involves two particular content alteration techniques. First, you are to replace any occurrences of the word "Happy" in the textual content of a Web page with the corresponding word "Yummy". For example, a Web page that contains the wording "Happy New Year!" would have its text rendered as "Yummy New Year!". Second, your proxy should replace any embedded JPEG images (i.e., .jpg) with some other suitable food-related images (e.g., hamburger, hot dog, fast-food corporate logo).
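The text replacement described above can be sketched as follows. This is only one possible approach, not a required design: the function names replace_happy and is_jpeg_path are hypothetical, the match is case-sensitive, and the sketch assumes the whole text body is already in one buffer (a word split across two recv() calls would be missed). Because "Happy" and "Yummy" are both five bytes long, the replacement can be done in place without changing the length of the response body.

```c
#include <stddef.h>
#include <string.h>

/* Replace every "Happy" with "Yummy" in buf[0..len) and return the
 * number of replacements. Both words are 5 bytes, so the buffer
 * length (and any Content-Length header) stays valid. */
size_t replace_happy(char *buf, size_t len)
{
    static const char from[] = "Happy";
    static const char to[]   = "Yummy";
    const size_t n = sizeof(from) - 1;   /* 5, excluding the '\0' */
    size_t count = 0;

    for (size_t i = 0; i + n <= len; ) {
        if (memcmp(buf + i, from, n) == 0) {
            memcpy(buf + i, to, n);
            count++;
            i += n;                      /* skip past the replaced word */
        } else {
            i++;
        }
    }
    return count;
}

/* Return nonzero if a request path names a .jpg object, ignoring any
 * query string. Hypothetical helper: a proxy could use it to decide
 * which requests to redirect to a substitute image. */
int is_jpeg_path(const char *path)
{
    size_t len = strcspn(path, "?");     /* length up to the query string */
    return len >= 4 && memcmp(path + len - 4, ".jpg", 4) == 0;
}
```

A proxy would normally apply replace_happy only to textual responses (for example, those with a text/html content type), and not to image data.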

The most important HTTP command for your Web proxy to handle is the "GET" request, which specifies the URL for an object to be retrieved. In the basic operation of your proxy, it should be able to parse, understand, and forward to the Web server a (possibly modified) version of the client request. Similarly, the proxy should be able to parse, understand, and return to the client a (possibly modified) version of the response that the Web server provided to the proxy. Your proxy should be able to handle response codes such as 200 (OK) and 404 (Not Found) correctly, notifying the client as appropriate. HTTP request redirection (302) should also be supported. Reasonable handling of browser-initiated Conditional GET requests and 304 (Not Modified) responses is also desirable, but not mandatory.
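As a rough illustration of the parsing step, the sketch below splits a request line into its parts and extracts the status code from a response line. It assumes the browser sends the absolute-URL form that is usual when a proxy is configured (for example, "GET http://host/path HTTP/1.1"); the struct proxy_request type and both function names are hypothetical, and the buffer sizes are arbitrary.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parts of a proxied request line. */
struct proxy_request {
    char method[16];
    char host[256];
    char port[8];
    char path[1024];
};

/* Parse "METHOD http://host[:port][/path] VERSION".
 * Returns 0 on success, -1 if the line is not in that form. */
int parse_request_line(const char *line, struct proxy_request *req)
{
    char url[1400];
    char version[16];
    char hostport[256];

    if (sscanf(line, "%15s %1399s %15s", req->method, url, version) != 3)
        return -1;
    if (strncmp(url, "http://", 7) != 0)
        return -1;

    const char *rest = url + 7;              /* "host[:port][/path]" */
    size_t hplen = strcspn(rest, "/");       /* length of "host[:port]" */
    if (hplen == 0 || hplen >= sizeof(hostport))
        return -1;
    memcpy(hostport, rest, hplen);
    hostport[hplen] = '\0';

    const char *path = rest[hplen] ? rest + hplen : "/";
    if (strlen(path) >= sizeof(req->path))
        return -1;
    strcpy(req->path, path);

    const char *port = "80";                 /* default HTTP port */
    char *colon = strchr(hostport, ':');
    if (colon) {
        *colon = '\0';
        port = colon + 1;
    }
    if (hostport[0] == '\0' || port[0] == '\0' ||
        strlen(port) >= sizeof(req->port))
        return -1;
    strcpy(req->host, hostport);
    strcpy(req->port, port);
    return 0;
}

/* Extract the numeric code from a status line such as
 * "HTTP/1.1 404 Not Found". Returns -1 if the line is malformed. */
int parse_status_code(const char *line)
{
    int code = 0;
    if (sscanf(line, "HTTP/%*d.%*d %d", &code) != 1)
        return -1;
    return (code >= 100 && code <= 599) ? code : -1;
}
```

With the host, port, and path separated, the proxy can open a connection to the right server and send it a request line that contains only the path.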

You will need at least one TCP (stream) socket for client-proxy communication, and at least one additional TCP (stream) socket for proxy-server communication. If you want your proxy to support multiple concurrent HTTP transactions (recommended), you will need to fork child processes (or create threads) for request handling as well. Each child process or thread will use its own socket instances for its communications with the client and with the server.
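One way to structure the client-facing side is sketched below, using one forked child per accepted connection. The names make_listener and serve_forever and the handle_client callback are hypothetical, the backlog of 16 is arbitrary, and ignoring SIGCHLD to avoid zombie processes is an assumption that holds on Linux and other POSIX.1-2001 systems. Threads would work equally well.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a TCP socket listening on the given port (0 lets the OS
 * pick a free port). Returns the socket descriptor, or -1 on error. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;   /* allow quick restarts while debugging */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept clients forever, forking one child per connection.
 * handle_client is a caller-supplied function that runs in the child. */
void serve_forever(int listen_fd, void (*handle_client)(int client_fd))
{
    signal(SIGCHLD, SIG_IGN);   /* assumption: kernel reaps children */

    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;

        pid_t pid = fork();
        if (pid == 0) {             /* child: owns this connection */
            close(listen_fd);
            handle_client(client_fd);
            close(client_fd);
            _exit(0);
        }
        /* parent (or failed fork): drop its copy and keep accepting */
        close(client_fd);
    }
}
```

Note that the parent closes its copy of the client socket right after forking, so each connection is fully closed once the child handling it exits.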

You should be able to compile and run your Web proxy on any department machine, or even your home machine. You should be able to use your proxy from a modern Web browser (e.g., Internet Explorer, Mozilla Firefox, Chrome, Safari), and from any machine (either on campus or at home). To test the proxy, you will have to configure your Web browser to use your specific Web proxy (e.g., look for menu selections like Tools, Internet Options, Proxies).

As you design and build your Web proxy, give careful consideration to how you will debug and test it. For example, you may want to print out information about requests and responses received, processed, forwarded, redirected, or altered. Once you become confident with the basic operation of your Web proxy, you can toggle off the verbose debugging output. If you are testing on your home network, you can also use tools like Wireshark or tcpdump to collect network packet traces. By studying the HTTP/TCP packets going to and from your proxy, you can convince yourself (and perhaps your TA) that it is working properly.
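A simple way to make debugging output easy to toggle is to route it through one function guarded by a flag. The sketch below is only a suggestion; the names g_verbose and debug_log and the "[proxy]" prefix are hypothetical, and the flag could just as well be set from a command-line option.

```c
#include <stdarg.h>
#include <stdio.h>

/* Set to 0 to silence all debugging output. */
static int g_verbose = 1;

/* printf-style logging to stderr. Returns the number of characters
 * written, or 0 when verbose output is switched off. */
int debug_log(const char *fmt, ...)
{
    if (!g_verbose)
        return 0;

    va_list ap;
    va_start(ap, fmt);
    int n = fprintf(stderr, "[proxy] ");
    n += vfprintf(stderr, fmt, ap);
    va_end(ap);
    n += fprintf(stderr, "\n");
    return n;
}
```

A call such as debug_log("GET %s -> %d", path, status) then records each transaction, and setting g_verbose to 0 turns all of the output off at once.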

The primary test of correctness for your proxy is a simple visual test. That is, for most Web pages, the content displayed by your Web browser should look the same regardless of whether you are using your Web proxy or retrieving content directly from the Web server. The only differences appear when you access Web pages that contain the special word "Happy" (like this simple text test file) or embedded JPEG images (like this simple image test file). In these cases, the subliminal messaging occurs. Woot!

When you are finished, please email your solution in electronic form to your TA, attaching a single gzipped tar file. Your submission should include the source code for your Web proxy, a brief user manual describing how to compile and use your proxy, and a description of the testing done with your proxy.

The proposed grading scheme for the assignment is as follows:

Please remember that assignments are to be done individually, and submitted to your assigned TA on time. Have fun!

TIPS

If you have never done socket programming in C before, you should make sure to get to your tutorial on January 21.

Focus on the basic HTTP proxy functionality first, by simply forwarding everything that you receive from the client directly to the server, and everything you receive from the server directly back to the client. Then add more functionality, such as text parsing, text replacement, and HTTP redirection.
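The plain forwarding step might look like the sketch below, which copies bytes from one descriptor to another until the sender closes its end. The names write_all and relay are hypothetical. Relaying until end-of-file is an assumption that suits responses where the server closes the connection when it is done; with persistent connections, the proxy would instead have to use the Content-Length header to know where a response ends.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything readable from from_fd to to_fd until end-of-file.
 * Returns the number of bytes relayed, or -1 on error. */
long relay(int from_fd, int to_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof(buf));
        if (n == 0)
            break;                  /* sender closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(to_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return total;
}
```

Once this pass-through works, the text and image manipulation can be added between the read and the write.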

BIG HINT: Your proxy will need one socket for talking to the client, and another socket for talking to the server. Make sure to keep track of which one is which!

Additional Hint: Your proxy will actually need to dynamically create a socket for every new server that it talks to. Make sure to manage these properly.
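Creating a socket for each new server can be sketched with getaddrinfo(), which resolves the host name and supplies the address to connect to. The function name connect_to_server is hypothetical, and the host and port strings are assumed to come from the parsed client request.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes getaddrinfo() under strict -std=c99 */
#endif
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host and open a TCP connection to it on the given port
 * (e.g. "80"). Returns a connected socket, or -1 on failure.
 * The caller is responsible for closing the socket when done. */
int connect_to_server(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */

    int rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    /* Try each returned address until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                     /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Closing each server socket as soon as its response has been relayed keeps the proxy from running out of file descriptors on pages with many embedded objects.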

Small Hint: Start with very simple HTML files, such as the simple text test file and the simple image test file. Once you have these working, then you can try more complicated Web pages with lots of embedded objects.