Introduction to Data Structures CS106 Spring 2013

### Lab 3: Efficient Representation of Sets

Check out the sets_sorted project files for Lab 3. This contains an additional file, sets_no_dups_key.py which is the answer key for Lab 2, with information about the complexities of the operations. In the file set.py you will be modifying class Set again, this time creating a set representation based on a sorted vector (Python list) that still contains no duplicates. The very idea of sorting depends on being able to have a consistent ordering of elements, so of course this representation will limit uses of the set class to only sets of elements that can be compared with < and > (as well as ==, which was of course needed for the previous representation, and the very idea of a set). While we usually try to ensure that changes to representation don't affect the use of a type, you are allowed to violate this rule in this way for this lab. (You are invited to think about how you would create a set type that checked, when each item is added, to see whether or not it is compatible with the types of items in the sorted array, and if not, simply revert to the unsorted representation for that particular let this is, however, beyond what you should actually try to program in this course).

The specific requirements for this lab are:

• Make sure that the vector of elements used to represent any set is kept in a sorted order (e.g., lowest-to-highest).
• You should add a representation invariant, giving it a name like __rep_inv__. Since this is not part of standard Python, you must also use it in all the appropriate places. Since this will help you catch errors in your other functions, you should probably do this near the start in fact, you can do it first, and then confirm that it fails by building a set for which the old algorithms would produce an vector that's not in order.
• You should provide all the functions that were required by the previous lab, including equality testing (==) and an abstraction function (__repr__).
• You may only re-use the old algorithms in method bodies if they are still appropriate for the new representation, i.e., if they really are the best way to get the given answer when using a sorted vector as the representation.
• While it is typically appropriate to use library functions whenever possible, for this lab you should not use the built-in Python sort function.
• You should ensure that the complexity comments reflect the actual complexities of the algorithms you use.
• Except for requiring that all inserted items be comparable with < and >, the axioms defining sets are, of course, unchanged. If you add new operations, you should provide axioms for them.
• Thanks to the above point, none of your modifications should break any correct program using the old class for sets containing only integers or only strings.

When these are done, add a README.txt file discussing the advantages of the new representation, from the point of a user (when would the new version work better? when (if ever) would the old one have been preferable?), and a README-DESIGN.txt file to serve as a guide for other programmers working on your class. Then use Team->Commit to hand in your work (and, of course, you can also use commit to make backups also at each significant milestone, e.g., at the end of each bullet item above).