Data Structures (cs206) Haverford College

Lab 4: HTML Tag Checker

HTML tags generally balance --- in other words, after the starting tag for an unordered list (i.e, a < UL >), there must be an ending tag (i.e, < /UL >). Furthermore, sets of balanced tags should be properly nested --- if an ordered list is started inside an unordered list, it should also end before the unordered list ends.

In this lab, you will determine whether or not tags are properly balanced and nested. You may assume that all tags are simple --- they do not include attributes or any other information inside the tag definition (so, unfortunately, you will not be able to handle links). You may also create a set of unmatched tags (such as < p > and < li > for this document), and assume that these will never be matched, whereas all other tags must be matched.

Your function should return True iff the tags are properly balanced and nested. For files with errors, you should also print an error message. The starter files for the lab contain functions to read in an html file and split it into "tokens" (individual words or tags).

Check out the lab "HTML_validator" -- you will have modules for a set object, as well as a stack and a queue. Review these classes and their interface methods as you will need them to complete the lab.

In addition, add to the file README.txt to provide instructions for a user who wished to run your program, and add to the file README-DESIGN.txt to provide instructions for other programmers who wish to understand how your program works --- this document, together with your comments in the code, should provide a clear description of how your program works.

For extra credit, consider improving the range of the checker to remove limits, such as:

  1. extend the set of unmatched attributes
  2. produce more than one error message if there is more than one error in the file
  3. allow tags that are sometimes, but not always, matched, such as < p >
  4. consider attributes like COLOR in the FONT tag or HREF in an A tag.