CS190 Refactoring Assignment
For Task 1, simply provide a new header file with an explanation of what you
changed, and why.
For each of the other designs described below (Tasks 2-5), answer the following questions:
- Levelize (assign level numbers to) this design. For this, you may print out
  the diagrams below and write on them.
- How testable is this design? Is there a clear order in which to build
  components? Is there anything which might limit testability unnecessarily?
- Use some refactoring techniques from Lakos to improve the design. Draw
  (by hand, regardless of what the TA writing this assignment may have done) a
  new component diagram. Please do not print the diagrams below and draw
  indications of what has changed; provide a completely new diagram.
- Explain what problem(s) you saw, which Lakos technique(s) you used, and
  why. If you're not sure you used one of Lakos's techniques exactly,
  say which you believe is the closest, and what is the same and different
  between that approach and what you did.
- Levelize (assign level numbers to) the new component diagram.
Please make a directory named by your login in the handin directory, and hand in
electronic versions of your Task 1 header file, and the testability and
refactoring descriptions for the other tasks. Bring hardcopies of these
things, as well as the levelized component diagrams (before and after), to class.
Designs:
The designs below are each designs for an actual piece of software, either open
source, or in use within the department. Some of the designs are somewhat
simplified from a few of the intricacies of their actual environment (for
example, the pools framework required reworking portions of existing code to
work with the new framework). Some are also presented with slightly more
naive designs than the systems actually use, since if we gave you the real
designs, it might be much harder to find anything to fix. In some cases,
the diagrams indicate the way the code is laid out in files, but not the way it
is logically organized. If you like, you can go look at said source, but
don't expect it to help you much with the design questions - they're all fairly
sizeable code bases, and may not be written in programming languages you
know. Arrows indicate functional dependencies.
Task 1: Independent Testability
For this problem, you need not provide a diagram; provide a modified .h
file. This header file, as-is, makes the RobotMaid class difficult to test
without a full implementation of the MaidController and MotherBrain.
Explain why, and use a method from Lakos to make these modules more
independently testable. (Virtual) Bonus points if you can also explain why
this change also makes the code a little bit faster.
class RobotMaid {
    MaidController controller;
    MotherBrain mother;
    // ...
public:
    RobotMaid(MaidController control, MotherBrain brain);
    int getControllerID(); // Extracts info from the Maid's controller
                           // and returns it. Usually called by a
                           // MotherBrain.
    // ...
};
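As a generic, hedged illustration of the flavor of technique Lakos suggests (using hypothetical classes Widget and Gauge, deliberately not the RobotMaid answer), a class can depend on a peer "in name only" through a forward declaration and a pointer, so the peer's full definition is never needed to compile or test it:

```cpp
// Hypothetical classes illustrating one Lakos-style insulation technique:
// depend on a peer through a forward declaration and a pointer, so clients
// (and tests) of Widget never need Gauge's full definition.
class Gauge; // forward declaration only; no header included

class Widget {
    Gauge* gauge_; // pointer member: the compiler needs only the name Gauge
public:
    explicit Widget(Gauge* g) : gauge_(g) {}
    bool hasGauge() const { return gauge_ != nullptr; }
    // ...
};
```

Note that a Widget can be constructed and exercised in a test without any Gauge implementation existing at all.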
Task 2: The Department's Grade-tracking Software: Eatgrades:
Source in the department: /contrib/projects/evalpig/current/
Evalpig is the department's home-grown grade-tracking solution, which has grown
by accretion over the past decade. It has two main user-facing components,
the eatgrades program and the report program.
The eatgrades program manages a course grade database. It allows users
(TAs) to:
- add and remove students
- create, alter, and delete assignments
- make a distribution of students to TAs for an individual assignment, so each
  TA grades a roughly equal number of assignments, taking into account a
  blacklist (for each TA, there is a list of students that TA should not grade
  for whatever reason) in a course configuration file
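The distribution step above is a small constrained load-balancing problem. As a hedged sketch (hypothetical names and types; the real eatgrades logic may well differ), a greedy version assigns each student to the least-loaded TA who is allowed to grade them:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Assign each student to the least-loaded TA not blacklisted for them.
// Hypothetical signature; not the actual eatgrades interface.
std::map<std::string, std::string> distribute(
        const std::vector<std::string>& students,
        const std::vector<std::string>& tas,
        const std::map<std::string, std::set<std::string>>& blacklist) {
    std::map<std::string, std::string> assignment;
    std::map<std::string, int> load; // assignments handed to each TA so far
    for (const auto& student : students) {
        const std::string* best = nullptr;
        for (const auto& ta : tas) {
            auto it = blacklist.find(ta);
            if (it != blacklist.end() && it->second.count(student))
                continue; // this TA may not grade this student
            if (!best || load[ta] < load[*best])
                best = &ta;
        }
        if (best) {
            assignment[student] = *best;
            ++load[*best];
        }
    }
    return assignment;
}
```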
It can run in two modes: as an interactive shell, or it can be given a file of
commands to perform.
The report program pulls data out of the course database, and performs various
useful tasks including:
- list assignments, students, etc.
- print a report of all of a student's grades, or all grades for some
  assignment
- print a histogram or various other statistics about grades for a given
  assignment
- send grade reports of all recorded grades to each student
- mail a summary of grades (statistics and histogram) to a class mailing list
Both report and eatgrades depend on a configuration file (actually a very simple
Python module) in a specific location for each course, and dictate a particular
layout for the course directory (ever wonder why almost every course's course
directory has handin and admin subdirectories?). This configuration file
specifies the location of the database itself, the set of TAs, and the blacklist
for each TA.
Refactoring Notes: You cannot move the Configuration module, the per-course
configuration module (think of this like a preferences file in your home
directory, but for a course). At a minimum, the database location must be
stored there.

The OpenSolaris Resource Pools Framework:
Kernel portion explorable from:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/pool.c
Userland portion explorable from:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libpool/common/
The pools framework is a subsystem in OpenSolaris which is used to group
portions of various resources together into "pools" of resources to provide availability guarantees to
processes (essentially operating systems parlance for programs). This is useful
for doing things like guaranteeing that MySQL will always have 4 cores to itself
on your big 32-core machine. The framework can handle grouping CPUs, portions
of RAM, or other things, but we'll only consider creating groups of CPUs for
now.
The framework is split into two levels. There is the part inside the kernel,
which directly manages representations of processes, pools, and related things.
There is also a significant portion outside/above the kernel, consisting of a
few command line utilities and a shared library used for both managing stored
XML configurations (which say what pools exist, which processors are in which
pools, etc.), and the current state of the machine (by talking to the kernel).
All this split really means for you is that there are essentially two
sub-designs, which you may consider independently. There is an interface
between the two parts, greyed out in the diagram, which you may not modify - all
it does is translate requests from the library into a format the kernel can
understand, and then translate responses back. For those interested, there
is a fake device driver for a fake device called /dev/pool, which is how the
shared library talks to the kernel part of the system. All this driver does is
basically act as a kernel/userland translator (moving data into and out of the
kernel is quite complex; for a good explanation, take CSCI1670/90).
Task 3: The Userland Pools Design
The userland portion of the system is composed of a number of command line
utilities which all depend on a library, libpool. There are two main
commands:
- pooladm: Manages pool configurations: what pools exist, and what
  processors are in each pool (for example, processors 1-4 might be in the
  default pool, and all the others might be in the MySQL pool).
  Configurations can either be stored as XML in files, or can be the setup
  currently being used by the system. To manage XML files, this component
  calls into libpool requesting that it perform an action on an XML file. To
  manage the current setup, it calls into libpool requesting that the library
  forward the request to the kernel.
- poolbind: Takes some identifier for a pool, and some way of identifying one
  or more processes (for example, a process ID number). It then binds the
  specified processes to the specified pool, in the running configuration or
  in an XML file.
If you want to look at their documentation, you can use /course/cs167/bin/sunman
to read their man pages. It is important to note that there is a distinct
separation between the handling of the active configuration, and the handling of
stored XML configurations.
Refactoring Notes: Assume that libxml, a generic XML library, already
exists, and works properly. How might you change the userland design such that
more people could work on it, and more pieces of functionality might be tested
more independently?
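One way to think about the "distinct separation" noted above, as a hedged sketch (hypothetical interface; the real libpool API is different): both kinds of configuration could sit behind one abstract interface, so the command-line logic and its tests need not care which backend they are driving:

```cpp
#include <set>
#include <string>

// Hypothetical abstraction over the two configuration backends.
class PoolConfig {
public:
    virtual ~PoolConfig() = default;
    virtual void createPool(const std::string& name) = 0;
    virtual bool hasPool(const std::string& name) const = 0;
};

// In reality there would be an XML-file backend (calling libxml) and a
// kernel backend (talking through /dev/pool). For testing, an in-memory
// fake is enough to exercise command logic with no kernel and no files:
class FakePoolConfig : public PoolConfig {
    std::set<std::string> pools_;
public:
    void createPool(const std::string& name) override {
        pools_.insert(name);
    }
    bool hasPool(const std::string& name) const override {
        return pools_.count(name) != 0;
    }
};
```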
The diagram for this task is below the next task, as the two are separate
portions of the same system.
Task 4: The Kernel Pools Design
Inside the kernel is where real pools are handled. There is a main control
module, which accepts requests from the libpool library on behalf of the
commands. It is responsible for carrying out tasks such as creating new pools,
deleting pools, creating new groups of processors (separate from pools -
remember, this system really handles other groups as well, but we're
simplifying), etc. Other than creating partitions for various pools to manage,
the pools control module does not use the partitions directly. Its primary role
is to act as a dispatcher of user requests to pools.
There is also a pools structure. The pools control module takes requests from
userland, and turns them into requests on the appropriate pool for each request.
The pool structure then takes this request and transforms it into the correct
action on one of the resource partitions (for example, translating the request to add
a CPU to that pool into a request to add a CPU to the CPU partition managed by
that pool). Remember that even though the diagram only shows one resource type,
in reality there would be a number of them, and equivalent calls from the pools
structure to the appropriate resource, and from that resource to the process
structure. The pool structure is also a dispatcher, from requests for actions
on a certain pool to actions on a particular resource partition.
There is always a default pool, with at least one processor in it. This makes
sure that newly launched programs are attached to a pool with a processor (it
isn't very useful to have a program which can't be put on a processor!). Some
of the cleanup on program exit differs depending on whether or not the process
is in the default pool, so the process management code needs to be able to check
if a process is in the default pool or not.
There is also support for managing simple groups of processors, putting
processors in these groups, associating programs with these groups, etc. This
is the CPU partition module (you should think of this as if there were other
modules which supported partitions of memory, network bandwidth, etc.). This
module includes an explicit representation of these CPU groups, and code for
managing the representation.
There are two functions shown which call from a pool structure to some
resource partition's support. add_cpu_to_cpu_part() moves the specified CPU
into that CPU partition. There would be similar calls for portions of RAM,
network bandwidth, etc. attach_process_to_cpu_part() tells the CPU partition
module that it needs to attach some process to that partition. There would
again be similar calls to tie processes to other resource partitions.
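The two-level dispatch described above can be sketched as follows. The function names add_cpu_to_cpu_part() and attach_process_to_cpu_part() come from the text; the structures and signatures here are assumptions, not the real OpenSolaris code:

```cpp
#include <set>

// Hypothetical stand-in for a CPU partition: one group of processors.
struct CpuPart {
    std::set<int> cpus; // CPU ids currently in this partition
};

// From the text: moves the specified CPU into that CPU partition.
// (Signature assumed for illustration.)
void add_cpu_to_cpu_part(CpuPart& part, int cpu_id) {
    part.cpus.insert(cpu_id);
}

// Hypothetical pool structure. It is a dispatcher: a generic "add a CPU
// to this pool" request becomes a specific action on the pool's CPU
// partition. With more resource types there would be one partition, and
// one such forwarding call, per resource.
struct Pool {
    CpuPart cpu_part;
    void add_cpu(int cpu_id) { add_cpu_to_cpu_part(cpu_part, cpu_id); }
};
```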
Refactoring Notes: Look at how the dependencies flow. Remember that a box in a
component diagram does not necessarily correspond to a single class.

Task 5: The MLkit Standard ML Compiler:
Source online: http://mlkit.svn.sourceforge.net/viewvc/mlkit/trunk/kit/src/
This is a compiler for a superset of Standard ML. We'll consider its
general model, ignoring many of its more advanced features.
The compiler essentially takes a list of .sml files to compile. As one might expect
of any compiler, it first parses the code into a set of structures to manipulate
(the abstract syntax tree, a.k.a. AST). Following this, it type-checks the
program, performs any of a number of optimizations, and generates object
code. The compiler can target a number of hardware platforms, including X86 and
HP's PA-RISC processor (okay, this one's been out of commission for a while). Programs written in SML can also make
their own calls down to C code by defining functions in terms of the "prim"
operator:
fun myMLdoubler (x : int) : int = prim ("myCdoubler", (x))
to interface with the C function:
int myCdoubler(int x) {...}
This is called a foreign function interface (talking to a foreign programming
language). There is an FFI module which is responsible for providing type
information to the type checker to ensure foreign functions are used correctly.
Separately, it also generates the code for the actual platform-specific function
calls themselves (i.e., given an identifier for some hardware platform, it will
return a unique template for making a foreign function call in assembly on that
platform).
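That second responsibility amounts to a lookup from platform identifier to call template. A minimal sketch, with entirely placeholder platform names and template strings (the real FFI module does not look like this):

```cpp
#include <map>
#include <string>

// Hedged sketch: map a hardware-platform identifier to an assembly
// calling-sequence template for foreign function calls. The template
// strings here are placeholders, not real assembly.
std::string ffiCallTemplate(const std::string& platform) {
    static const std::map<std::string, std::string> templates = {
        {"x86",     "push args; call %s; pop args"},
        {"pa-risc", "ldil/ble calling sequence for %s"},
    };
    auto it = templates.find(platform);
    return it == templates.end() ? "" : it->second; // empty if unsupported
}
```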
Refactoring Notes: Take a good look at the FFI module. Also, it's likely
that no matter what you do, everything will depend on the AST (this is a
compiler, after all).
