CS190 Refactoring Assignment
For Task 1, simply provide a new header file with an explanation of what you
changed, and why.
For each of the other designs described below (Tasks 2-5), answer the following questions:
- Levelize (assign level numbers to) this design. For this, you may print out
  the diagrams below and write on them.
- How testable is this design? Is there a clear order in which to build
  components? Is there anything which might limit testability unnecessarily?
- Use some refactoring techniques from Lakos to improve the design. Draw
  (by hand, regardless of what the TA writing this assignment may have done) a
  new component diagram. Please do not print the diagrams below and draw
  indications of what has changed; provide a completely new diagram.
- Explain what problem(s) you saw, which Lakos technique(s) you used, and
  why. If you're not sure you used one of Lakos's techniques exactly,
  say which you believe is the closest, and what is the same and different
  between that approach and what you did.
- Levelize (assign level numbers to) the new component diagram.
Please make a directory named by your login in the handin directory, and hand in
electronic versions of your Task 1 header file, and the testability and
refactoring descriptions for the other tasks. Bring hardcopies of these
things, as well as the levelized component diagrams (before and after), to class.
Designs:
The designs below are each designs for an actual piece of software, either open
source, or in use within the department. Some of the designs are somewhat
simplified from a few of the intricacies of their actual environment (for
example, the pools framework required reworking portions of existing code to
work with the new framework). Some are also presented with slightly more
naive designs than the systems actually use, since if we gave you the real
designs, it might be much harder to find anything to fix. In some cases,
the diagrams indicate the way the code is laid out in files, but not the way it
is logically organized. If you like, you can go look at said source, but
don't expect it to help you much with the design questions - they're all fairly
sizeable code bases, and may not be written in programming languages you
know. Arrows indicate functional dependencies.
Task 1: Independent Testability
For this problem, you need not provide a diagram; provide a modified .h
file. This header file, as-is, makes the RobotMaid class difficult to test
without a full implementation of the MaidController and MotherBrain.
Explain why, and use a method from Lakos to make these modules more
independently testable. (Virtual) Bonus points if you can also explain why
this change also makes the code a little bit faster.
class RobotMaid {
    MaidController controller;
    MotherBrain mother;
    // ...
public:
    RobotMaid(MaidController control, MotherBrain brain);
    int getControllerID(); // Extracts info from the Maid's controller
                           // and returns it. Usually called by a
                           // MotherBrain.
    // ...
};
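As a generic, hedged illustration of the flavor of technique Lakos suggests (using hypothetical classes Widget and Gauge, deliberately not the RobotMaid answer), a class can depend on a peer "in name only" through a forward declaration and a pointer, so the peer's full definition is never needed to compile or test it:

```cpp
// Hypothetical classes illustrating one Lakos-style insulation technique:
// depend on a peer through a forward declaration and a pointer, so clients
// (and tests) of Widget never need Gauge's full definition.
class Gauge; // forward declaration only; no header included

class Widget {
    Gauge* gauge_; // pointer member: the compiler needs only the name Gauge
public:
    explicit Widget(Gauge* g) : gauge_(g) {}
    bool hasGauge() const { return gauge_ != nullptr; }
    // ...
};
```

Note that a Widget can be constructed and exercised in a test without any Gauge implementation existing at all.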
Task 2: The Department's Grade-tracking Software: Eatgrades:
Source in the department: /contrib/projects/evalpig/current/
Evalpig is the department's home-grown grade-tracking solution, which has grown
by accretion over the past decade. It has two main user-facing components,
the eatgrades program and the report program.
The eatgrades program manages a course grade database. It allows users
(TAs) to:
- add and remove students
- create, alter, and delete assignments
- make a distribution of students to TAs for an individual assignment, so each
  TA grades a roughly equal number of assignments, taking into account a
  blacklist (for each TA, there is a list of students that TA should not grade
  for whatever reason) in a course configuration file
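The distribution step above is a small constrained load-balancing problem. As a hedged sketch (hypothetical names and types; the real eatgrades logic may well differ), a greedy version assigns each student to the least-loaded TA who is allowed to grade them:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Assign each student to the least-loaded TA not blacklisted for them.
// Hypothetical signature; not the actual eatgrades interface.
std::map<std::string, std::string> distribute(
        const std::vector<std::string>& students,
        const std::vector<std::string>& tas,
        const std::map<std::string, std::set<std::string>>& blacklist) {
    std::map<std::string, std::string> assignment;
    std::map<std::string, int> load; // assignments handed to each TA so far
    for (const auto& student : students) {
        const std::string* best = nullptr;
        for (const auto& ta : tas) {
            auto it = blacklist.find(ta);
            if (it != blacklist.end() && it->second.count(student))
                continue; // this TA may not grade this student
            if (!best || load[ta] < load[*best])
                best = &ta;
        }
        if (best) {
            assignment[student] = *best;
            ++load[*best];
        }
    }
    return assignment;
}
```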
It can run in two modes: as an interactive shell, or it can be given a file of
commands to perform.
The report program pulls data out of the course database, and performs various
useful tasks including:
- list assignments, students, etc.
- print a report of all of a student's grades, or all grades for some
  assignment
- print a histogram or various other statistics about grades for a given
  assignment
- send grade reports of all recorded grades to each student
- mail a summary of grades (statistics and histogram) to a class mailing list
Both report and eatgrades depend on a configuration file (actually a very simple
Python module) in a specific location for each course, and dictate a particular
layout for the course directory (ever wonder why almost every course's course
directory has handin and admin subdirectories?). This configuration file
specifies the location of the database itself, the set of TAs, and the blacklist
for each TA.
Refactoring Notes: You cannot move the Configuration module, the per-course
configuration module (think of this like a preferences file in your home
directory, but for a course). At a minimum, the database location must be
stored there.

The OpenSolaris Resource Pools Framework:
Kernel portion explorable from:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/pool.c
Userland portion explorable from:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libpool/common/
The pools framework is a subsystem in OpenSolaris which is used to group
portions of various resources together into "pools" of resources to provide availability guarantees to
processes (essentially operating systems parlance for programs). This is useful
for doing things like guaranteeing that MySQL will always have 4 cores to itself
on your big 32-core machine. The framework can handle grouping CPUs, portions
of RAM, or other things, but we'll only consider creating groups of CPUs for
now.
The framework is split into two levels. There is the part inside the kernel,
which directly manages representations of processes, pools, and related things.
There is also a significant portion outside/above the kernel, consisting of a
few command line utilities and a shared library used for both managing stored
XML configurations (which say what pools exist, which processors are in which
pools, etc.), and the current state of the machine (by talking to the kernel).
All this split really means for you is that there are essentially two
sub-designs, which you may consider independently. There is an interface
between the two parts, greyed out in the diagram, which you may not modify - all
it does is translate requests from the library into a format the kernel can
understand, and then translate responses back. For those interested, there
is a fake device driver for a fake device called /dev/pool, which is how the
shared library talks to the kernel part of the system. All this driver does is
basically act as a kernel/userland translator (moving data into and out of the
kernel is quite complex; for a good explanation, take CSCI1670/90).
Task 3: The Userland Pools Design
The userland portion of the system is composed of a number of command line
utilities which all depend on a library, libpool. There are two main
commands:
- pooladm: Manages pool configurations: what pools exist, and what
  processors are in each pool (for example, processors 1-4 might be in the
  default pool, and all the others might be in the MySQL pool).
  Configurations can either be stored as XML in files, or can be the setup
  currently being used by the system. To manage XML files, this component
  calls into libpool requesting that it perform an action on an XML file. To
  manage the current setup, it calls into libpool requesting that the library
  forward the request to the kernel.
- poolbind: Takes some identifier for a pool, and some way of identifying one
  or more processes (for example, a process ID number). It then binds the
  specified processes to the specified pool, in the running configuration or
  in an XML file.
If you want to look at their documentation, you can use /course/cs167/bin/sunman
to read their man pages. It is important to note that there is a distinct
separation between the handling of the active configuration, and the handling of
stored XML configurations.
Refactoring Notes: Assume that libxml, a generic XML library, already
exists, and works properly. How might you change the userland design such that
more people could work on it, and more pieces of functionality might be tested
more independently?
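One way to think about the "distinct separation" noted above, as a hedged sketch (hypothetical interface; the real libpool API is different): both kinds of configuration could sit behind one abstract interface, so the command-line logic and its tests need not care which backend they are driving:

```cpp
#include <set>
#include <string>

// Hypothetical abstraction over the two configuration backends.
class PoolConfig {
public:
    virtual ~PoolConfig() = default;
    virtual void createPool(const std::string& name) = 0;
    virtual bool hasPool(const std::string& name) const = 0;
};

// In reality there would be an XML-file backend (calling libxml) and a
// kernel backend (talking through /dev/pool). For testing, an in-memory
// fake is enough to exercise command logic with no kernel and no files:
class FakePoolConfig : public PoolConfig {
    std::set<std::string> pools_;
public:
    void createPool(const std::string& name) override {
        pools_.insert(name);
    }
    bool hasPool(const std::string& name) const override {
        return pools_.count(name) != 0;
    }
};
```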
The diagram for this task is below the next task, as the two are separate
portions of the same system.
Task 4: The Kernel Pools Design
Inside the kernel is where real pools are handled. There is a main control
module, which accepts requests from the libpool library on behalf of the
commands. It is responsible for carrying out tasks such as creating new pools,
deleting pools, creating new groups of processors (separate from pools -
remember, this system really handles other groups as well, but we're
simplifying), etc. Other than creating partitions for various pools to manage,
the pools control module does not use the partitions directly. Its primary role
is to act as a dispatcher of user requests to pools.
There is also a pools structure. The pools control module takes requests from
userland, and turns them into requests on the appropriate pool for each request.
The pool structure then takes this request and transforms it into the correct
action on one of the resource partitions (for example, translating the request to add
a CPU to that pool into a request to add a CPU to the CPU partition managed by
that pool). Remember that even though the diagram only shows one resource type,
in reality there would be a number of them, and equivalent calls from the pools
structure to the appropriate resource, and from that resource to the process
structure. The pool structure is also a dispatcher, from requests for actions
on a certain pool to actions on a particular resource partition.
There is always a default pool, with at least one processor in it. This makes
sure that newly launched programs are attached to a pool with a processor (it
isn't very useful to have a program which can't be put on a processor!). Some
of the cleanup on program exit differs depending on whether or not the process
is in the default pool, so the process management code needs to be able to check
if a process is in the default pool or not.
There is also support for managing simple groups of processors, putting
processors in these groups, associating programs with these groups, etc. This
is the CPU partition module (you should think of this as if there were other
modules which supported partitions of memory, network bandwidth, etc.). This
module includes an explicit representation of these CPU groups, and code for
managing the representation.
There are two functions shown which call from a pool structure to some
resource partition's support. add_cpu_to_cpu_part() moves the specified CPU
into that CPU partition. There would be similar calls for portions of RAM,
network bandwidth, etc. attach_process_to_cpu_part() tells the CPU partition
module that it needs to attach some process to that partition. There would
again be similar calls to tie processes to other resource partitions.
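The two-level dispatch described above can be sketched as follows. The function names add_cpu_to_cpu_part() and attach_process_to_cpu_part() come from the text; the structures and signatures here are assumptions, not the real OpenSolaris code:

```cpp
#include <set>

// Hypothetical stand-in for a CPU partition: one group of processors.
struct CpuPart {
    std::set<int> cpus; // CPU ids currently in this partition
};

// From the text: moves the specified CPU into that CPU partition.
// (Signature assumed for illustration.)
void add_cpu_to_cpu_part(CpuPart& part, int cpu_id) {
    part.cpus.insert(cpu_id);
}

// Hypothetical pool structure. It is a dispatcher: a generic "add a CPU
// to this pool" request becomes a specific action on the pool's CPU
// partition. With more resource types there would be one partition, and
// one such forwarding call, per resource.
struct Pool {
    CpuPart cpu_part;
    void add_cpu(int cpu_id) { add_cpu_to_cpu_part(cpu_part, cpu_id); }
};
```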
Refactoring Notes: Look at how the dependencies flow. Remember that a box in a
component diagram does not necessarily correspond to a single class.

Task 5: The MLkit Standard ML Compiler:
Source online: http://mlkit.svn.sourceforge.net/viewvc/mlkit/trunk/kit/src/
This is a compiler for a superset of Standard ML. We'll consider its
general model, ignoring many of its more advanced features.
The compiler essentially takes a list of .sml files to compile. As one might expect
of any compiler, it first parses the code into a set of structures to manipulate
(the abstract syntax tree, a.k.a. AST). Following this, it type-checks the
program, performs any of a number of optimizations, and generates object
code. The compiler can target a number of hardware platforms, including X86 and
HP's PA-RISC processor (okay, this one's been out of commission for a while). Programs written in SML can also make
their own calls down to C code by defining functions in terms of the "prim"
operator:
fun myMLdoubler (x : int) : int = prim ("myCdoubler", (x))
to interface with the C function:
int myCdoubler(int x) {...}
This is called a foreign function interface (talking to a foreign programming
language). There is an FFI module which is responsible for providing type
information to the type checker to ensure foreign functions are used correctly.
Separately, it also generates the code for the actual platform-specific function
calls themselves (i.e., given an identifier for some hardware platform, it will
return a unique template for making a foreign function call in assembly on that
platform).
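That second responsibility amounts to a lookup from platform identifier to call template. A minimal sketch, with entirely placeholder platform names and template strings (the real FFI module does not look like this):

```cpp
#include <map>
#include <string>

// Hedged sketch: map a hardware-platform identifier to an assembly
// calling-sequence template for foreign function calls. The template
// strings here are placeholders, not real assembly.
std::string ffiCallTemplate(const std::string& platform) {
    static const std::map<std::string, std::string> templates = {
        {"x86",     "push args; call %s; pop args"},
        {"pa-risc", "ldil/ble calling sequence for %s"},
    };
    auto it = templates.find(platform);
    return it == templates.end() ? "" : it->second; // empty if unsupported
}
```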
Refactoring Notes: Take a good look at the FFI module. Also, it's likely
that no matter what you do, everything will depend on the AST (this is a
compiler, after all).
