
SRC Project: Software Preservation

Part 1 recommended completion time: April 6

All parts due April 15 at 6:00pm (EDT)


Introduction

Each of us produces immense amounts of digital data today, and we all use software in doing so. In particular, if you use a note-taking app, that app might store your notes in its own proprietary file format, and the app itself will assume that it runs on a current operating system (e.g., iOS, Android, macOS, or Windows) and widely-used hardware architecture (e.g., x86-64).

But what happens if you, old and wise, a few decades from now, decide to look at your old CS 300 notes to relive your college days? The app may be long since defunct, your laptop or smartphone from your college days may have moved on to greener pastures, and you’ll probably be using devices that run a different operating system on a different hardware architecture (e.g., ARM64 or one of its successors).

If your college notes aren’t exciting enough, consider NASA’s lost Apollo guidance software and its hunt for old processors on eBay: despite enabling some of humanity’s greatest technological achievements, NASA later had difficulty maintaining its space shuttles, which relied on antiquated software.

The preservation of digital artifacts and the software needed to access them is already an acute challenge, and will only grow in importance as more critical infrastructure comes to depend on software. In this assignment, you’ll explore some of the motivations behind software preservation, and do a hands-on exploration of the challenges of software preservation.

Learning Objectives and Answer Expectations

This project will help you:

Answer Expectations: A strong answer to the written questions in this assignment will be primarily characterized by a stringent argument, rather than length. Making such an argument will usually require writing between a few sentences and one or two short paragraphs.


Assignment installation

Ensure that your project repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci0300/cs300-s22-projects.git

Then run:

$ git pull
$ git pull handout master

This will merge our os-src folder with your repository.

Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be doing all your work for this project inside the os-src directory in the working copy of your projects repository.


Part 1: Reading Old Files

You, a budding digital preservationist, start browsing digital collections (as one does), and stumble upon an archive with an interesting file:

Task: Download the archive.

Eager to contribute your share to history, you take a peek at what it is… and are met with almost complete gibberish. Even though it’s a .DOC file, none of your word processing apps seem to work. What’s going on?


The problem you’re facing is one all too common in the digital preservation world. Until recently, much of the focus has been on data preservation; but preserving data alone can often miss key components of the environment that are necessary to interpret the data meaningfully. Be it notes or 3D models, data formats are often tightly integrated with the software used to create, view, and manage them, which renders the data useless without that software. Because of this, and because software is increasingly pervasive in our daily lives, the need to preserve software has become more pronounced. Digital preservationists have increased their focus on the preservation of software, not only as a tool to aid in data preservation, but also as an important part of our modern cultural heritage.

After some research, you discover that the file is actually a WordPerfect file. Since the file doesn’t seem to be working with your current software, you deduce that it’s from an earlier time: the DOS era (mid-1980s to mid-1990s).
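
One way to reach that conclusion without any word processor is to inspect the first few bytes of the file, since many formats begin with a fixed signature. The sketch below assumes the file was saved by WordPerfect 5.x or later, whose documents start with the bytes FF 57 50 43 (\xffWPC); the function names and the short signature table are illustrative and are not part of the assignment stencil.

```python
# Hypothetical helper: guess a file's real format from its leading "magic" bytes.
# The signature table is a small illustrative subset, not an exhaustive registry.
MAGIC_SIGNATURES = [
    (b"\xffWPC", "WordPerfect document (5.x or later)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g., Word 97-2003 .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g., .docx)"),
    (b"%PDF-", "PDF document"),
]


def sniff_format(header: bytes) -> str:
    """Return a best-guess format name for the first bytes of a file."""
    for magic, name in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown format"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

Calling sniff_file on the downloaded document should tell a WordPerfect file apart from a Microsoft Word file even though both may carry the .DOC extension.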

Assignment

How do we run a program that was written for MS-DOS in the 1980s on today’s computers? First of all, we need a processor and hardware that can run the program. Fortunately, WordPerfect ran on x86 machines (amongst other architectures in use at the time), so as long as we can get a copy of a WordPerfect executable, we should be good, right? Not so fast.

The WordPerfect executable, like all programs, makes assumptions about the syscalls available and the kernel that runs underneath. So, we also need a DOS kernel and the ability to run it on modern hardware! In this instance, we’ll emulate a DOS kernel, rather than actually installing an ancient operating system on your computer.
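
To make the “assumptions about syscalls” concrete: DOS programs request kernel services by loading a function number into the AH register and executing the INT 21h software interrupt. The toy model below handles three such functions (02h, 09h, and 4Ch). The ToyDosKernel class and its flat memory array are hypothetical simplifications for illustration; this is not how DOSBox is actually implemented.

```python
# Toy model of a DOS "kernel" servicing INT 21h requests from a program.
# The three function numbers follow the DOS convention; the class itself and
# its flat `memory` model are hypothetical simplifications.

class ToyDosKernel:
    def __init__(self, memory: bytearray):
        self.memory = memory      # flat byte array standing in for the data segment
        self.output = []          # text "printed" to the emulated screen
        self.exit_code = None     # set once the program asks to terminate

    def int21h(self, ah: int, dx: int = 0, al: int = 0) -> None:
        """Dispatch one INT 21h call based on the function number in AH."""
        if ah == 0x02:            # write the character in DL (low byte of DX)
            self.output.append(chr(dx & 0xFF))
        elif ah == 0x09:          # write the '$'-terminated string at offset DX
            end = self.memory.index(ord("$"), dx)
            self.output.append(self.memory[dx:end].decode("ascii"))
        elif ah == 0x4C:          # terminate with the return code in AL
            self.exit_code = al
        else:
            raise NotImplementedError(f"INT 21h function {ah:#04x} not emulated")


memory = bytearray(64)
message = b"Hello from DOS$"
memory[16:16 + len(message)] = message

kernel = ToyDosKernel(memory)
kernel.int21h(ah=0x09, dx=16)          # print the string stored at offset 16
kernel.int21h(ah=0x02, dx=ord("!"))    # print a single character
kernel.int21h(ah=0x4C, al=0)           # exit with code 0
screen = "".join(kernel.output)
```

A program compiled for DOS expects exactly this calling convention, so a kernel that does not answer INT 21h (such as Linux, macOS, or modern Windows) cannot run it unmodified.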

Emulation is one of the most common methods of interacting with legacy software. To run WordPerfect and open our files, we’ll use an emulator, DOSBox, to create a DOS environment. (For this and all following installation steps, you should not use the course container.)

  1. Install DOSBox: follow the installation instructions for your operating system (Mac OS X, Windows).
    (If you want the full retro experience, press Alt-Enter (Opt-Enter on Mac) to enable fullscreen when running DOSBox.)
  2. Create a folder on your host computer (e.g., on Mac, ~/DOSBox), then start DOSBox. You’ll see an MS-DOS command prompt (Z:\>). Mount the folder you created as the C: drive by running Z:\> MOUNT C <FOLDER> inside DOSBox (for instance, Z:\> MOUNT C ~/DOSBox). Note that this mounting functionality is something the emulator provides; it wasn’t available in the original DOS running on a physical computer.
  3. Now, any file within that folder will automatically appear in the DOSBox emulated environment. Switch to the C: drive with Z:\> C:.
  4. Locate and download a copy of WordPerfect 5.1 online from a digital archive[1]. Create a directory for the installation files, and move them into your mounted folder.
    Many copies of WordPerfect come as floppy disk images (.img files). The correct way to work with them is to mount each disk image as its own floppy disk into DOS; but fortunately there are easier options for installing WordPerfect.
    a. If you’re on macOS, you can open all of the .img files (mounting them as a drive in macOS) and copy the files contained into your mounted folder.
    b. If you’re on Windows, check out EdStem #1325.
  5. Additionally, move the cs300-files into the mounted folder.
  6. DOSBox does not reflect changes to your mounted folder immediately in some situations. Restart DOSBox at this point, mounting the folder again.
  7. Inside DOSBox, navigate to your WordPerfect directory, and run INSTALL. This will start the WordPerfect installation process. Press y or <Enter> when prompted. You only need to install the core program, so you can say n to many of the questions about installing extra features (help documents, printer drivers, etc.).

macOS WordPerfect Setup

WordPerfect makes extensive use of the Ctrl Key, especially for Ctrl-Fn combinations; however, DOSBox rebinds the Ctrl-Fn shortcuts to special actions, such as Screenshot, Record Video/Audio, etc. These are helpful for DOSBox’s configurations, but unfortunately interfere with our programs. You will need to remap these special keys to a separate key sequence:

  1. Open the DOSBox Keymapper with Ctrl-F1 (or Ctrl-Shift-F1 / Ctrl-Cmd-F1).

  2. You should see a screen like this:

    In the bottom right section (ShutDown, Mapper, etc.), remap these Special Keys to a different configuration. We suggest enabling both the Mod1 and Mod2 keys; do this by clicking on a key, then enabling both keys in the bottom left corner:

    By default, Mod1 and Mod2 map to the Ctrl and Alt (Option) keys respectively; so, in order to activate the Keymapper/Screenshot/etc, you’ll now need to press Ctrl-Alt-Fn, rather than just Ctrl-Fn.

  3. Save and exit out of the Keymapper; your Ctrl key should behave normally now.

If your Ctrl-Fn still doesn’t work or behaves strangely (e.g. Ctrl instead functions as Alt), consider mapping another key to Ctrl; for example, to have your Cmd key function as Ctrl, click on the Ctrl key, press Add (center bottom), then press Cmd.

Task: WordPerfect should now be installed.

  1. Copy CS300REF.DOC into your WordPerfect directory, WP51 (your file names may be mangled as a result of mounting; you can rename them with REN <OLD NAME> <NEW NAME>). Run WordPerfect on a file by typing WP <File> at the C:\WP51> prompt.
  2. Familiarize yourself with WordPerfect’s interface. Then, scroll down to the bottom of the CS300REF.DOC document, and complete the tasks listed there.
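
The name mangling mentioned in step 1 comes from DOS’s 8.3 naming rule: at most eight characters for the name and three for the extension, in upper case. The function below is a simplified sketch of how a long host file name might be shortened; to_8_3 is a hypothetical helper, and DOSBox’s actual algorithm may produce different names.

```python
import re


def _clean(part: str) -> str:
    """Upper-case `part` and drop characters outside a conservative safe set."""
    return re.sub(r"[^A-Z0-9_\-]", "", part.upper())


def to_8_3(name: str, index: int = 1) -> str:
    """Shorten `name` to a DOS-style 8.3 name (simplified illustration)."""
    base, dot, ext = name.rpartition(".")
    if not dot:                        # no extension at all
        base, ext = name, ""
    base, ext = _clean(base), _clean(ext)[:3]
    if len(base) > 8:                  # too long: truncate and add a "~N" suffix
        suffix = f"~{index}"
        base = base[: 8 - len(suffix)] + suffix
    return f"{base}.{ext}" if ext else base


short_name = to_8_3("cs300-reference.doc")   # becomes "CS300-~1.DOC"
```

If a file shows up under an unexpected name inside DOSBox, running DIR in the mounted directory shows the names DOS actually sees, which you can then change with REN.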

Instructions to submit are inside the document. You should copy any files produced into the os-src directory of your CS 300 projects repo. Enjoy!

Reflection

Wahoo! We did it; the old documents were saved from disappearing forever into the void, and you now have a working DOS environment to experiment with more legacy software. (If you’re interested, websites like DOSGames and WinWorld host a large collection of DOS-era software and games; try one out, and experience what technology felt like in the 90s[2].) Now, if someone discovers a stash of George R. R. Martin’s unfinished novels, you’ll be adequately equipped to handle his WordStar files[3].

The process you went through, albeit somewhat simplified, is a real problem digital archivists face every day when handling decades-old data formats. In the next part of the assignment, as we explore additional challenges to and alternative methods of software preservation, keep in mind some of the difficulties you faced while setting up preservation software and interacting with legacy applications.

Part 2: Hardware Emulation

In Part 1, we explored a purely software emulator, DOSBox. Although we used DOSBox to emulate a DOS kernel, emulation was a convenience rather than a necessity: it is certainly possible to boot DOS directly on modern x86 machines, as DOS environments (e.g., MS-DOS or IBM PC DOS) are compatible with today’s Intel x86 CPUs.

However, it is not always so straightforward. For example, Library of Congress archivists who worked to preserve the digital data of molecular biologist Nina Fedoroff faced some serious challenges: her data was created with the MacDraw Plus and HyperCard programs, which require an Apple Mac OS 9 environment. Mac OS 9 ran only on PowerPC processors, and software built for them is incompatible with modern x86 or ARM64 computers. Indeed, with new hardware like Apple’s M1 chips and the accompanying transition to the ARM64 architecture, it is entirely possible that in a few years or decades, software applications written for our current x86 architectures will be incompatible with the hardware of the day. Apple M1 Mac users are already acutely aware of this issue[4].

Of course, one option to access applications for specific architectures is to purchase a physical retro-computer; for instance, old PowerPC Macintoshes can be found on eBay. However, this approach is neither durable nor scalable: physical hardware eventually breaks, and rare retro-hardware may be difficult for digital archivists to obtain (even NASA has had difficulty). What if, as with the DOS kernel, we could emulate the missing piece, but this time emulate the hardware itself (i.e., a different CPU architecture)?

Enter hardware emulation. In this part, we’ll explore one such example: the Atari 2600! Released in 1977, the Atari 2600 quickly dominated the market, becoming synonymous with video games and sparking the growth of the entire industry; following its decline, its games have become favorites in retro gaming communities (and have even found use in a rather surprising modern application: deep reinforcement learning).

Assignment

The Atari 2600 has no operating system to speak of; its games run directly on a MOS Technology 6507, a cut-down variant of the 6502 whose instruction set no modern PC or phone processor can execute. Thus, we’ll use the Stella emulator.
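
At its core, an emulator like Stella repeatedly fetches an instruction byte, decodes it, and updates emulated registers and memory. The following sketch shows that loop for five real 6502 opcodes. The Tiny6502 class, its flat 64 KiB memory, and the choice to treat BRK as “halt” are assumptions made for brevity; status flags, cycle timing, and the Atari’s video and sound hardware are ignored entirely.

```python
# Minimal fetch-decode-execute loop for a five-opcode subset of the 6502.
# Opcode values are the real 6502 encodings; everything else is simplified.

class Tiny6502:
    def __init__(self, program: bytes, origin: int = 0x1000):
        self.mem = bytearray(0x10000)                  # flat 64 KiB address space
        self.mem[origin:origin + len(program)] = program
        self.pc = origin                               # program counter
        self.a = 0                                     # accumulator
        self.x = 0                                     # X index register
        self.halted = False

    def fetch(self) -> int:
        """Read the byte at PC and advance PC."""
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    def step(self) -> None:
        """Execute exactly one instruction."""
        opcode = self.fetch()
        if opcode == 0xA9:        # LDA #imm: load accumulator with a constant
            self.a = self.fetch()
        elif opcode == 0xA2:      # LDX #imm: load X with a constant
            self.x = self.fetch()
        elif opcode == 0xE8:      # INX: increment X, wrapping at 8 bits
            self.x = (self.x + 1) & 0xFF
        elif opcode == 0x85:      # STA zp: store accumulator into zero page
            self.mem[self.fetch()] = self.a
        elif opcode == 0x00:      # BRK: treated as "halt" in this sketch
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} not emulated")

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            if self.halted:
                break
            self.step()


# LDA #$2A ; STA $80 ; LDX #$FF ; INX ; BRK
cpu = Tiny6502(bytes([0xA9, 0x2A, 0x85, 0x80, 0xA2, 0xFF, 0xE8, 0x00]))
cpu.run()
```

After the run, address $80 holds $2A and X has wrapped around from $FF to $00. A real emulator covers every opcode and addressing mode and also models the rest of the console.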

Apple M1/ARM64 notes

The Stella emulator for this part of the assignment is not officially compatible with M1 machines, and the download page says that it is “Intel only”.

However, our testing on M1 devices suggests that it works just fine. The reason for this is that Apple provides a software translation layer for x86-64 programs on M1 Macs (“Rosetta 2”). Rosetta 2 translates the machine code of x86-64 executables into ARM64 machine code, mostly ahead of time when an application is first launched, and dynamically for code that is generated at runtime (though this does come with some slowdown). So, if you have an M1, you’re running one emulator (Rosetta 2) to run another emulator (Stella), and there are no fewer than three architectures involved: ARM64 (hardware), x86-64 (translated by Rosetta 2), and MOS 6502 (emulated by Stella)!
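
The translate-once-then-reuse idea behind binary translation can be sketched in a few lines. This is a toy under loud assumptions: the two-instruction guest “ISA”, the Translator class, and the use of Python closures as stand-ins for host machine code are all invented for illustration and bear no relation to Rosetta 2’s internals.

```python
# Toy binary translator: guest "instructions" are tuples, and each straight-line
# block is translated once into a Python function, then cached for reuse.
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int]            # hypothetical guest ISA: ("add", n) or ("mul", n)


def translate_block(block: List[Instr]) -> Callable[[int], int]:
    """Translate a guest block into a host-side callable."""
    steps = []
    for op, arg in block:
        if op == "add":
            steps.append(lambda acc, n=arg: acc + n)
        elif op == "mul":
            steps.append(lambda acc, n=arg: acc * n)
        else:
            raise ValueError(f"unknown guest instruction {op!r}")

    def host_code(acc: int) -> int:
        for step in steps:
            acc = step(acc)
        return acc

    return host_code


class Translator:
    def __init__(self, guest_program: Dict[int, List[Instr]]):
        self.guest_program = guest_program    # block address -> guest instructions
        self.cache: Dict[int, Callable[[int], int]] = {}
        self.translations = 0                 # number of blocks translated so far

    def run_block(self, address: int, acc: int) -> int:
        if address not in self.cache:         # first visit: translate and remember
            self.cache[address] = translate_block(self.guest_program[address])
            self.translations += 1
        return self.cache[address](acc)       # later visits reuse the translation


program = {0x400: [("add", 2), ("mul", 10)]}
translator = Translator(program)
results = [translator.run_block(0x400, acc) for acc in (1, 2, 3)]
```

The block is executed three times but translated only once, which is why translation tends to be faster than re-decoding every instruction on every execution, as the interpreter-style loop of a simple emulator does.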

Task:

  1. Download the emulator for your operating system.
  2. Acquire some ROMs for Atari 2600 systems (Stella provides some guidelines here).
  3. Choose any game, and play it for a bit.

Now answer the following question:

Q1: In the README.md file in os-src in your project repository, describe your experience with the game. Here are some guiding questions (you don’t need to follow them strictly, but do demonstrate that you’ve explored a game):

Hardware emulation provides a complete infrastructure for digital preservation, but it is also technically difficult to execute: all technical specifications and digital logic, down to precise clock cycles and analog elements, must be recreated by a hardware emulator. Moreover, hardware emulation can be quite fragile.
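
To see why clock cycles matter, consider the timing budget of an NTSC Atari 2600, using commonly cited figures (treat them as assumptions rather than a specification): the video chip runs off a 3.579545 MHz color clock, the CPU executes one cycle per three color clocks, and each scanline lasts 228 color clocks.

```python
# Back-of-the-envelope timing budget for an NTSC Atari 2600.
# All constants are commonly cited figures, assumed here for illustration.
COLOR_CLOCK_HZ = 3_579_545          # NTSC color clock driving the video chip
COLOR_CLOCKS_PER_CPU_CYCLE = 3      # the CPU runs at one third of that rate
COLOR_CLOCKS_PER_SCANLINE = 228
SCANLINES_PER_FRAME = 262           # conventional NTSC frame; games can deviate

cpu_hz = COLOR_CLOCK_HZ / COLOR_CLOCKS_PER_CPU_CYCLE
cycles_per_scanline = COLOR_CLOCKS_PER_SCANLINE // COLOR_CLOCKS_PER_CPU_CYCLE
cycles_per_frame = cycles_per_scanline * SCANLINES_PER_FRAME
frames_per_second = cpu_hz / cycles_per_frame

print(f"CPU clock:           {cpu_hz / 1e6:.3f} MHz")
print(f"Cycles per scanline: {cycles_per_scanline}")
print(f"Cycles per frame:    {cycles_per_frame}")
print(f"Frames per second:   {frames_per_second:.2f}")
```

Under these assumptions, a game has only 76 CPU cycles to prepare each scanline. Because games were written to count those cycles exactly, an emulator whose instruction timing is off by even one cycle can draw graphics in the wrong place.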

Task: Answer the following question (again, in os-src/README.md).

Q2: What assumptions do hardware emulators themselves make about hardware? If new hardware architectures come along, how should emulators adapt?

Answer Expectations: A strong answer to this question (and other following written questions in this assignment) will be primarily characterized by a stringent argument, rather than length. Making such an argument will usually require writing between a few sentences and one or two short paragraphs.

Part 3: Social Context

Society has plenty of experience with preserving valuable historical artifacts outside of the digital realm: whether art, architecture, film, or archaeological conservation, the importance and process of preserving our cultural heritage is well established. But preservation in the digital context is less well understood.

Task: First, familiarize yourself with approaches to preserving our cultural heritage.

  1. Read this article about ethical considerations surrounding preservation; then, read this case study about modern historical preservation techniques, priorities, and cultural and financial shortcomings.

Next, consider the efforts and cultural implications in the digital domain.

  1. Read pages 12-22 of this Library of Congress report (An Executable Past: The Case for a National Software Registry). As you’re reading, consider the justifications for and approaches to software preservation, and compare them with your notions of traditional preservation.

In the assignment, you explored two common ways emulation assists software preservation; while not the only option, emulation has seen the most success as a preservation tool. As interest grows and we continue to improve our toolbox, our ability to tackle technical challenges as a software preservation community increases. However, technical difficulties are just one aspect to consider when dealing with preservation.

Representation

As we reckon with our digital legacy, it’s important to consider what we choose to represent.

Task: Answer the following questions (again, in os-src/README.md).

Q3: Take a look at who made DOSBox and Stella. Why did the developers make the emulation software? What was their motivation for doing this work?

Q4: How should software preservation compare to other forms of preservation that society already engages in today? What standards should we apply, and how do they compare to the standards used for other types of preservation? For example, you could consider the preservation of art or architecture in your answer.

Q5: Who should get to decide what digital content is preserved?

Legality

Another challenge in software preservation is legality. If you look into it, WordPerfect and the Atari ROMs were commercial software at the time when they were produced; to use them, you must obtain a license and/or purchase the software. Is it okay to just make them freely available on the internet? Digital archivists who host such artifacts on the internet face legal uncertainties every day.

Archivists frequently operate with abandonware — software that, while technically still proprietary and protected by copyright, has been ignored by a potentially defunct manufacturer. Some manufacturers (if they’re still around) actively help abandonware sites, or at least tolerate them. However, this is not always the case; some manufacturers, like Microsoft or Nintendo, have pursued legal challenges against digital archives, which resulted in some major sites shutting down.

Task: Answer the following questions (again, in os-src/README.md).

Q6: WordPerfect 5.1 for DOS can cost hundreds of dollars, and Atari ROM images can be similarly expensive. Do you think you have an obligation to pay for legacy software or games?

Q7: Should digital libraries and preservationists receive legal protections, and what might they look like?

The open source software movement, which flourished in the 1980s and 1990s, advocates for licenses that make copying and modifying software easy and free (legally and in cost terms); software like Git or the Linux operating system can be used and developed by anyone with few constraints. One argument its advocates make in favor of open-source software is that it is much easier to preserve and continuously maintain it than closed-source software, which depends on the original owner to take active steps to aid preservation, or to at least tolerate it (with the attendant legal risk for archivists).

Task: Answer the following questions (again, in os-src/README.md).

Q8: Do you think open-sourcing proprietary software solves the challenge of software preservation?

Q9: What challenges, if any, might persist even after releasing software under an open source license?

Tying it Together

So far, companies and government institutions have been rather uninvolved in software preservation efforts. As software proliferates and grows in complexity, and more of it moves to web services that have proprietary server-side code, it will become increasingly difficult for individuals and non-profit organizations to preserve digital artifacts on their own.

Task: For the remaining question, discuss your responses with another classmate, then write up some key points in your discussion. Please tell us in your answer who you discussed the questions with (again, in os-src/README.md).

Q10: What responsibility should the government or public bodies (e.g., libraries) have, and why? What role do companies have?

Here are some dimensions to consider (although feel free to explore different directions):

Final Remarks

Congratulations 🎉! You’ve completed the SRC assignment for the OS part of CS 300! We hope you’ve developed an appreciation of the importance of software preservation, as well as some common techniques and considerations, and how they relate to the technical operating systems and hardware topics we discuss in the course.

We’ve only scratched the surface of some of the immense challenges facing software preservation efforts right now. In Project 5, you will see that open source software, while it may help with preservation, can also introduce substantial risks into the software development process.

Handing In

Please hand in the files and answers for this assignment via Git in your cs300-s22-projects-YOURNAME repository. Put your answers into the README.md file in the os-src/ subdirectory of your project repository, and also put all other files from this assignment into that directory.

By 6:00pm on April 15th, you must have filled in the file README.md in the os-src directory in your projects repo, and pushed the files produced by Part 1 of this assignment.

Grading breakdown

This assignment is worth 3% of your total course grade (i.e., it does not contribute to your WeensyOS grade, but constitutes a separate grade). This 3% comes out of the 16% originally allotted to the midterm; the midterm will now be worth 13%.


This assignment was created for CS 300.


  1. If you can’t find it, post privately on Edstem and we’ll give a hint!

  2. One application that we found particularly interesting was Sid Meier’s Civilization 1; even in the 90s, games were truly quite sophisticated!

  3. WordStar was the first word processor that offered textual WYSIWYG functionality; it preceded WordPerfect, and dominated the market until WordPerfect eventually took over.

  4. Indeed, our course Docker container originated in part to support the new M1 Macs; the old virtualization software, VirtualBox, worked only on x86 machines, and was incompatible with ARM.