Report on CNS 2000 programming project (C WWW browser/server)

Note: This is the original report of the programming project at University that this software emerged from. Note that the Tcl/Tk part is not included and that the link to its documentation also doesn't work. Please also note that Liuben Tonev's email is not current, I'm afraid I do not have a current one. Even though I praise it below, I switched from emacs which I used years ago to VIM ;-)

Group 18, Liuben Tonev and Christian Gosch

This report consists of two parts. The first is the one you are looking at now, the second one is the code documentation and how-to-use manual (quick start and how to compile) in server/doc/html and client/doc/html directories.
For understanding the code, we recommend reading both parts.
If you want, go on reading about the browser or the server directly.

Getting it running quickly

Client / Browser

Server

  • Just follow the compile instructions in the code documentation
  • Work contributions from each group member

    Group members were Liuben Tonev and Christian Gosch.

    Liuben Tonev

    Christian Gosch

    Code Documentation and User Manuals

    Used Software

    1. Operating systems: Linux 2.2.13 and Solaris
      The C part of the software was written entirely under Linux.
    2. GNU C compiler (gcc and egcs-2.91.66)
    3. GNU make 3.77
    4. GNU autoconf 2.13 / autoheader
    5. flex 2.5.4 lexical analyser generator
    6. Doxygen 0.49-991003 by Dimitri van Heesch
    7. GNU Emacs 20.4.1 (there can be only one)
    8. Tcl/Tk 8.2

    Browser

    Browser diagram
    The browser consists of two parts: The GUI written in Tcl/Tk and the actual client program written in C. The GUI takes commands from the user and uses the client program to retrieve html files from the URI given by the user. The advanced version downloads all images referenced in the html file and corrects the downloaded html file, in the first version using the 'correct' program. Correct is basically a lexical analyser generated by flex.
    Correct is no longer used, this work is now done by the image analyser in client/src/img.lex.
    In the following description, all functions can be looked up in the code documentation in doc/html.

    Tcl browser documentation and user manual

    The client program

    Basic version

    The main function (main.c) reads the command line parameters and hands them over to w3ClientDocToFile(). This function takes the URI, a filename of the file to write the received data to, and a port number (for details, refer to the code documentation in doc/html). w3ClientDocToFile() uses w3ParseURI() to get a w3URI structure out of the URI string. Then, w3ClientGetDocument() is used to retrieve the document from the server. There are facilities to retrieve documents multi-threaded, but, since the GUI calls a client program for any requested document, these facilities are not used. w3ClientGetDocument() uses the lower level functions w3SendData() and w3ReceiveData() to request and receive the document. The connection is established using w3ClientConnect() and deleted by w3ClientDisconnect(). These lower level functions use the known C socket interface to the network.

    Standard version

    Changes were made only to the Tcl code.

    Advanced version

    The main change to the previous version is the handling of downloaded files. A page is not simply stored but also analysed to find image tags and automatically download the images. The images are stored in a simple cache structure.
    Directly downloaded files are stored in a cache directory (standard is cache) and image files subsequently downloaded are stored in subdirectories of cache which resemble the URI path of the images.
    For example, www.animals.org/dog/cat/mouse.html is stored in the given destination file (still part of the command line to client) AND in the cache subdirectory (which is specified in client/src/w3clientconfig.h). The downloaded file is scanned for image tags and the corresponding images are downloaded and stored in the cache directory, in this example that could be
    cache/www.animals.org/pictures/dog.gif
    The directly downloaded files are stored flatly in cache/, e.g. as
    cache/www.animals.org_index.html
    The slashes are replaced by underscores. This CAN lead to name clashes but that is neglected for this program. In a release version, another mechanism would have to be used; since this program is free and works fine for me, I am happy with it. It is, however, easy to change this behaviour when needed.
    This behaviour can be switched off by defining or undefining W3CLIENT_CREATE_CACHE. If this variable is undefined, only the downloaded images are stored in the way described above.
    All this caching is done by w3ClientDocToFile() and the related functions in w3client.c.

    Handling of HREF Tags in Downloaded HTML Documents in the Advanced Version

    The file is scanned for HREF tags and relative tags (like href="otherfile.html"...) are replaced by absolute ones (like href="http://www.server.com/otherfile.html"...).
    This is done to enable the user interface to make use of hyperlinks which would otherwise be handles as local files.

    The Analyser Code

    The replacement of img and href tags is done by an analyser which is created using flex. The source code for flex can be found in advanced/client/src/img.lex.
    In case it becomes necessary to change and rebuild the analyser, call
    flex -i img.lex
    in the source directory. Of course you need the flex package to do this.

    Sending Mails With client

    The advanced version of client is capable of sending emails through an SMTP server. The code added for sending emails can be found in w3smtp.c.
    The functions used to send mails are w3SendMail() and w3SendMailFile(). The latter is used by the main() client function to send mails. w3SendMailFile() reads the needed data from a file and calls w3SendMail().
    The file given to the client program MUST have the following structure:


    First line: Recipient
    Second line: Sender
    Third line: mailer
    Fourth line: port number on which the mailer accepts connections
    The following lines contain the mail body, including ALL headers the sender wants to be sent (including sender and recipient headers).


    This description can also be found in the code documentation for w3SendMailFile().

    Server

    Server diagram

    Standard version

    The server operates in multiple threads. The current version runs six accepting threads in parallel, but this number can be changed at will in the file w3server.c in function w3StartServer() simply by changing the loop which starts the threads (how to do this is obvious and needs no further description).
    The main () function calls the function w3StartServer() which sets up everything to run the server:
    All send and receive operations are wrapped by convenience functions, which are explained in the code documentation.

    Advanced version

    Standard and advanced version are the same program except for the ability of the advanced version to execute files on the server machine. No changes were needed in order to make the server to send any file requested (other than .html).

    Executing binaries on the serverside

    Thinking of CGI, the implementation of something similar was started (but not yet finished).
    The server is capable of executing files in the binary directory, which is defined in w3server.h as HTDOC_BIN_BASE. When a file in this directory is requested, it is instead executed and the standard output of that executable (be it a binary or a shell script) is sent to the client.
    If you set the default HTDOCS directory to the htdocs directory delivered with this package, and you are getting this documentation from our server, this should give an example.
    Program arguments are yet to be implemented (using the "prgname?arg1=..." syntax). All the server can do with this feature yet is executing files without arguments.
    See also the code documentation for the advanced server, function w3ServerSendFile().

    Error handling

    The server handles errors in client requests by responding with an error page. The code for the creation of the error pages can be found in w3html.c (w3HTMLGenerateError()).
    An example error page can be seen here.

    Problems encountered (C-part and Tcl/Tk-part)