Your assignment is to construct a small command-line web browser. The program is to retrieve an HTTP URL and store the body of the response in a local file. The program must accept, as a command-line argument, the HTTP URL to retrieve. Informative error messages should be displayed for HTTP errors. Upon successful retrieval, the program must display the number of bytes downloaded and the HTTP headers returned by the server. The program should then parse the headers and display as much information about the content as possible (for example its length, type, encoding, and last-modification date). The program should work for files of any length.
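Because the file can be of any length, the program should not assume the body fits in one fixed-size buffer. A minimal Python sketch of one way to handle this, assuming the body is read from a binary file-like object (for example one obtained with `socket.makefile("rb")`) and written to an open local file: copy fixed-size chunks until end of file, counting bytes along the way. The name `save_body` and the 4096-byte chunk size are illustrative choices, not part of the assignment.

```python
import io


def save_body(src, dst, chunk_size=4096):
    """Copy bytes from src to dst until end of file and return the count.

    src and dst are binary file-like objects. Reading fixed-size chunks
    keeps memory use constant no matter how large the download is.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of file (for a socket: server closed it)
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-in for a socket stream: 10,000 bytes held in memory.
    body = io.BytesIO(b"x" * 10000)
    out = io.BytesIO()
    print(save_body(body, out), "bytes")  # prints: 10000 bytes
```

With HTTP 1.0 and `Connection: close`, end of file is simply the server closing the connection, so a loop like this does not depend on the server sending a Content-Length header.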
The HTTP URL will be of the form:
http://machinename[:portnum]/pathname
The part in [] is optional. The machinename part may be a hostname
or a numeric dotted-decimal address, such as 157.182.194.28. The
pathname portion may be any pathname. Invalid URLs that do not match
this description should produce an informative error message and
should not be processed. The following example should retrieve the
file SRL.gif and display information about it; sample output from
the program is also given.
$ webbrowse http://naur.csee.wvu.edu/~tmont/SRL.gif
Connected to 157.182.194.28:80, asking for /~tmont/SRL.gif
Retrieving SRL.gif to local directory:
Headers from server:
Date: Tue, 18 Jan 2000 14:37:20 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.10
Accept-Ranges: bytes
Connection: close
Content Information:
Length: 21209 bytes
Type: Image (GIF)
Encoding: none
Last-Modified: Mon, 15 Jul 1996 20:41:39 GMT
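The URL check can be done with a single regular expression. The Python sketch below assumes a strict reading of the form given above: the scheme must be http://, machinename is a run of letters, digits, dots and hyphens (which covers both hostnames and dotted-decimal addresses), the optional port is numeric, and the path must begin with /. When :portnum is omitted it falls back to 80, the standard HTTP port, which matches the :80 in the sample output. The names `parse_url`, `URLError` and `local_filename`, and the `index.html` fallback for an empty file name, are hypothetical.

```python
import re

# http://machinename[:portnum]/pathname, read strictly:
#   machinename: letters, digits, dots, hyphens (hostname or dotted decimal)
#   portnum:     optional, digits only
#   pathname:    must start with "/", no whitespace
URL_RE = re.compile(r"http://([A-Za-z0-9.-]+)(?::([0-9]+))?(/\S*)", re.IGNORECASE)

DEFAULT_PORT = 80  # standard HTTP port, used when :portnum is omitted


class URLError(ValueError):
    """Raised for a URL that does not match the expected form."""


def parse_url(url):
    """Split an HTTP URL into (host, port, path), or raise URLError."""
    m = URL_RE.fullmatch(url)
    if m is None:
        raise URLError(f"expected http://machinename[:portnum]/pathname, got {url!r}")
    host, port_text, path = m.groups()
    port = int(port_text) if port_text else DEFAULT_PORT
    if not 0 < port < 65536:
        raise URLError(f"port number out of range: {port}")
    return host, port, path


def local_filename(path, default="index.html"):
    """Last path component (e.g. SRL.gif); the default name is an assumption."""
    return path.rsplit("/", 1)[-1] or default
```

The host returned can be passed to `socket.gethostbyname`, which accepts either a hostname or a dotted-decimal string, to obtain the address printed on the "Connected to" line. A more lenient version could also accept http://machinename with no path by treating the path as /.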
The HTTP 1.0 specification is available at http://www.w3.org/pub/WWW/Protocols/HTTP/1.0/spec.html
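Under HTTP 1.0 the client sends a request line such as GET /~tmont/SRL.gif HTTP/1.0 followed by a blank line, and the server replies with a status line, header lines, a blank line, and then the body; header field names are case-insensitive. The Python sketch below builds the request and splits a response into those parts, so that the status code can drive the error messages and headers such as Content-Length and Content-Type can be reported. It assumes CRLF line endings and that the complete header block has already been read into memory. The function names are hypothetical, and treating every non-2xx status as an error is a simplification (a fuller client might follow 3xx redirects).

```python
def build_request(host, path):
    """Return the bytes of an HTTP/1.0 GET request for path on host."""
    # Host is defined by HTTP/1.1, not HTTP/1.0, but many servers that serve
    # several sites from one address rely on it. The URL is assumed ASCII.
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def split_response(data):
    """Split raw response bytes into (code, reason, headers, body_prefix).

    headers is a list of (name, value) pairs in the order the server sent
    them; body_prefix is whatever part of the body arrived with the headers.
    """
    end = data.find(b"\r\n\r\n")  # blank line that ends the header block
    if end < 0:
        raise ValueError("response header block is incomplete")
    lines = data[:end].decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)  # e.g. ["HTTP/1.0", "200", "OK"]
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {lines[0]!r}")
    code = int(parts[1])
    reason = parts[2] if len(parts) == 3 else ""
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            headers.append((name.strip(), value.strip()))
    return code, reason, headers, data[end + 4:]


def header(headers, name):
    """Look up a header value; field names are compared case-insensitively."""
    for key, value in headers:
        if key.lower() == name.lower():
            return value
    return None


def status_error(code, reason):
    """Return an error message for a non-2xx status, or None on success."""
    if 200 <= code < 300:
        return None
    return f"HTTP error {code}: {reason}"
```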
For extra credit [3%], make the program start a suitable program to
display the content once the download completes. Text files should
be displayed with cat, HTML files with lynx, and so on.
If the program comes across a file type it does not know, it should
ask the user which program to execute.
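One way to organize the viewer support, together with the Type: line in the sample output, is a table keyed by the media type from the Content-Type header. In the Python sketch below only cat for plain text and lynx for HTML come from the assignment; the table name, the other descriptions, and the choice to ask the user whenever no viewer is listed (including for images) are illustrative assumptions. The `ask` parameter defaults to `input` so the prompt can be replaced when testing.

```python
import shlex

# Hypothetical table: media type -> (description, viewer program or None).
CONTENT_TYPES = {
    "text/plain": ("Text (plain)", "cat"),
    "text/html": ("Text (HTML)", "lynx"),
    "image/gif": ("Image (GIF)", None),
    "image/jpeg": ("Image (JPEG)", None),
}


def media_type(content_type):
    """Drop parameters such as '; charset=...' and lower-case the rest."""
    return content_type.split(";", 1)[0].strip().lower()


def describe(content_type):
    """Human-readable description for the Content Information output."""
    entry = CONTENT_TYPES.get(media_type(content_type))
    return entry[0] if entry else f"Unknown ({content_type})"


def viewer_command(content_type, filename, ask=input):
    """Return an argument list that displays filename, or None if the user declines.

    If the table lists no viewer for the type, the user is asked for one.
    """
    entry = CONTENT_TYPES.get(media_type(content_type))
    program = entry[1] if entry else None
    if program is None:
        program = ask(f"No viewer known for {content_type}; program to run: ").strip()
        if not program:
            return None
    return shlex.split(program) + [filename]
```

A non-None result can be passed directly to `subprocess.run`; passing an argument list rather than a shell string keeps file names containing spaces intact.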
Alternatively, or in addition to starting a viewer, add support for
parsing HTML files to list the URL links embedded in them [4%].
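For the link-listing option, Python's standard `html.parser` module can do the tokenizing. The sketch below collects the href attribute of every a tag and, if given the page's own URL, resolves relative links with `urllib.parse.urljoin`. Restricting it to a href (and not, for example, img src) is an assumption about what counts as a link, and the names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Record the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list
        # of (name, value) pairs, with value None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text, base_url=None):
    """Return the links in html_text, resolved against base_url if given."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    if base_url is None:
        return parser.links
    return [urljoin(base_url, link) for link in parser.links]
```

Since the downloaded file is saved as raw bytes, it has to be decoded to text before it is fed to the parser; ISO-8859-1 is a safe fallback because it maps every byte value to a character.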