What makes a program “good”?

We started the class by asking “What makes a program good?” Some answers students came up with:

readability
organization
performance

This semester, we’ll learn how to write programs that are “better” in all of these ways. This lecture introduces two of them: organization and performance.

Margaret Hamilton, Software Engineering, and the Moon

Margaret Hamilton was the Director of the Software Engineering Division at MIT’s Draper Laboratory in the ’60s. Among other things, she was in charge of the team that developed the in-flight software that ran on the Apollo 11 mission, which took humans to the Moon. She was also one of the people who coined the term “software engineering.” More on that in a second.

How many lines of code would you guess were in the Apollo in-flight system? The answer is about 145,000.

In CSCI 0111, you mostly wrote programs of fewer than 100 lines of code (usually far fewer). When writing a program of that size, you might be able to hold the whole thing in your head at once; the whole thing might even fit on your screen! When you’re writing 145,000 lines of code, that’s probably not possible. The Apollo 11 team had to be able to think about particular pieces of code in isolation, without worrying about the details of other parts of the code.

Software engineering is the study of building reliable software, and techniques for developing components in isolation are its bread and butter. An example of this kind of technique: functions! When you write a function that does some particular task, the code that calls that function doesn’t need to worry about the function’s body. We’ll be learning several other such techniques in this course.
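As a small illustration of that idea, here is a minimal sketch in Python. The function names (average and report) are made up for this example and are not part of the course materials. The point is that report relies only on what average promises to return, not on how its body computes it.

```python
def average(numbers: list) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)


def report(scores: list) -> str:
    """Build a one-line summary of a list of scores."""
    # The caller only needs to know what average() returns,
    # not how its body computes the result.
    return f"{len(scores)} scores, average {average(scores):.1f}"


print(report([90, 80, 100]))  # prints: 3 scores, average 90.0
```

If we later rewrote the body of average (say, to handle empty lists differently), report would not need to change as long as average kept returning the mean.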

By the way, here’s a chunk of the Apollo 11 code I grabbed at random (the source code is publicly available online):

SETWO           TC      WOZERO          # GO SET WORD ORDER CODE TO ZERO.
        +1      CA      DNECADR         # RELOAD A WITH THE DNADR.
        +2      AD      MINB1314        # IS THIS A REGULAR DNADR?
                EXTEND
                BZMF    FETCH2WD        # YES.  (A MUST NEVER BE ZERO)
                AD      MINB12          # NO.  IS IT A POINTER (DNPTR) OR A
                EXTEND                  #       CHANNEL(DNCHAN)
                BZMF    DODNPTR         # IT'S A POINTER.  (A MUST NEVER BE ZERO)

DODNCHAN        TC      6               # (EXECUTED AS EXTEND)  IT'S A CHANNEL
                INDEX   DNECADR
                INDEX   0       -4000   # (EXECUTED AS READ)
                TS      L
                TC      6               # (EXECUTED AS EXTEND)
                INDEX   DNECADR
                INDEX   0       -4001   # (EXECUTED AS READ)
                TS      DNECADR         # SET DNECADR
                CA      NEGONE          #       TO MINUS
                XCH     DNECADR         #               WHILE PRESERVING A.
                TCF     DNTMEXIT        # GO SEND CHANNELS

What do we notice about this code? A couple of the things I noticed:

Almost every line of code has a comment explaining what it’s doing! (The comments are the bits after the # on each line.)
Nevertheless, the code is pretty hard to read. It’s written in a very old, very low-level programming language called AGC assembly language.

Luckily, in this class you’ll be working in Python, not AGC assembly language. Python has built-in support for the kinds of software engineering techniques and concepts we’ll learn in the course (like functions!). I want to emphasize, however, that clean code with well-isolated components can be written in any programming language!

As an aside: Apollo 11 had 145,000 lines of code. Google has about 2 billion lines of code. Part of what makes that possible is that Google’s code is mostly written in higher-level languages like Python, not in AGC assembly language.

Program performance

Another topic we’ll visit and re-visit in CSCI 0112 is analyzing how fast our programs run. On one level, this is easy: just run your program while looking at your watch! That kind of performance analysis (sometimes called benchmarking) is useful, and it’s often an important step in testing a program. It does have its limitations, though. Programs run at different speeds on different computers (for instance, the computer that ran the Apollo 11 code above was about 60,000 times slower than the computer in my cell phone). Programs also take different amounts of time depending on the data they are processing. Think about writing a program to find duplicate elements in a list and then running it on [1, 2, 3, 4] vs. running it on every word from every article on Wikipedia.
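Here is a rough sketch of what that could look like in Python. The names has_duplicates and seconds_to_run are hypothetical, the duplicate check is just one simple way to write such a program, and a list of 2,000 numbers is only a stand-in for a much bigger input like Wikipedia. The times it prints will differ from one computer to the next, which is exactly the limitation described above.

```python
import time


def has_duplicates(items: list) -> bool:
    """Return True if any element appears more than once (hypothetical example)."""
    # Compare every element against every later element.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def seconds_to_run(items: list) -> float:
    """Time a single call to has_duplicates, in seconds."""
    start = time.perf_counter()
    has_duplicates(items)
    return time.perf_counter() - start


small = [1, 2, 3, 4]
large = list(range(2000))  # stand-in for a much bigger input, no duplicates

print(seconds_to_run(small))
print(seconds_to_run(large))
```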

To see why this is hard, let’s take a look at these two (admittedly strange) Python programs:

def strange_function_one(l: list):
    # Eight separate passes over l, each adding every element to total.
    total = 0
    for x in l:
        total += x
    for x in l:
        total += x
    for x in l:
        total += x
    for x in l:
        total += x
    for x in l:
        total += x
    for x in l:
        total += x
    for x in l:
        total += x
    for x in l:
        total += x
    return total

def strange_function_two(l: list):
    # Nested passes over l: for each element of l, add every element twice.
    total = 0
    for _ in l:
        for x in l:
            total += x
            total += x
    return total

Which of these is faster?

We can simulate the “look at your watch” approach in Python like this. The timeit.timeit function returns the total time, in seconds, that it takes to execute the code one million times (its default number of runs):

>>> from timeit import timeit
>>> timeit('strange_function_one([1,2,3])', globals=globals())
0.9604734549999989
>>> timeit('strange_function_two([1,2,3])', globals=globals())
0.6956890860000158
>>> timeit('strange_function_one([1,2,3,4,5,6])', globals=globals())
1.632538684999986
>>> timeit('strange_function_two([1,2,3,4,5,6])', globals=globals())
2.024607646999982

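It looks like the answer depends on the data that are passed in! strange_function_two is faster on the three-element list, but strange_function_one is faster on the six-element list.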

Later in the class we’re going to develop a way to reason about the runtime of programs without having to look at our watches. We’ll learn that, in a meaningful way, strange_function_one is faster than strange_function_two, and we’ll learn how to describe this distinction.
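One rough way to preview that idea, without a watch, is to count how many additions each function performs on a list of length n. The helper names below (additions_in_one and additions_in_two) are invented for this sketch, and counting additions is only an informal stand-in for the analysis we’ll develop later in the course.

```python
def additions_in_one(n: int) -> int:
    """Additions done by strange_function_one on a list of length n."""
    # Eight separate loops, each adding once per element.
    return 8 * n


def additions_in_two(n: int) -> int:
    """Additions done by strange_function_two on a list of length n."""
    # Two nested loops (n * n iterations), each adding twice.
    return 2 * n * n


for n in [3, 6, 100]:
    print(n, additions_in_one(n), additions_in_two(n))
# prints:
# 3 24 18
# 6 48 72
# 100 800 20000
```

For short lists the second function does less work, which is consistent with the timings above. Once the list has more than four elements, the first function does less work, and the gap grows quickly as the list gets longer.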

Logistics

For now, the big thing is to check out the course website. There’s a lot of information there about course policies, plans for assignments, etc. If you have any questions, please email Doug.