Partner Choice Due Friday, April 7th at 11:59 PM (EST)
Part 1 Due Friday, April 14th at 6:00 PM (EST)
All Parts Due Friday, April 21st at 6:00 PM (EST)
Each of us produces immense amounts of digital data today, and we all use software in doing so. In particular, if you use a note-taking app, that app might store your notes in its own proprietary file format, and the app itself will assume that it runs on a current operating system (e.g., iOS, Android, macOS, or Windows) and widely-used hardware architecture (e.g., x86-64).
But what happens if you, old and wise, a few decades from now, decide to look at your old CS 300 notes to relive your college days? The app may be long since defunct, your laptop or smartphone from your college days may have moved on to greener pastures, and you’ll probably be using devices that run a different operating system on a different hardware architecture (e.g., ARM64 or one of its successors).
If your college notes aren’t exciting enough, consider NASA’s lost Apollo guidance software and their hunt for old processors on eBay: despite enabling some of humanity’s greatest technological achievements, NASA faced difficulty maintaining their space shuttles with antiquated software.
The preservation of digital artifacts and the software needed to access them is already an acute challenge, and will only grow in importance as more critical infrastructure comes to depend on software. In this assignment, you’ll explore some of the motivations behind software preservation, and do a hands-on exploration of the challenges of software preservation.
You will choose a partner for the written portion of this assignment. Alternatively, if you would prefer to be randomly assigned a partner, we will pair you with a classmate. To help us determine groups, everyone should fill out this form by 11:59pm on Friday, March 24! Within the form, you will either write down your chosen partner’s CS login, or opt into random assignment.
This project will help you:
Ensure that your project repository has a handout remote. Type:
$ git remote show handout
If this reports an error, run:
$ git remote add handout https://github.com/csci0300/cs300-s23-projects.git
$ git pull
$ git pull handout main
This will merge our timemachine folder with your repository.
Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be handing in your work for this project to the timemachine directory in the working copy of your projects repository.
You, a budding digital preservationist, start browsing digital collections (as one does), and stumble upon an archive with an interesting file:
Task: Download the archive.
Eager to make your mark on history, you take a peek at what it is… and are met with almost complete gibberish. What’s going on?
The problem you’re facing is one all too common in the digital preservation world. Until recently, much of the focus has been on data preservation; but preserving data alone can often miss key components of the environment that are necessary to interpret the data meaningfully. Be it notes or 3D modeling, data formats are often integrated with the software used to create, view, and manage them, rendering them useless without the necessary software. Digital preservationists have increased their focus on software to address this problem.
After some research, you discover that the file is actually a WordPerfect file. Since the file doesn’t seem to be working with your current software, you deduce that it’s from an earlier time: the DOS era (mid-1980s to mid-1990s).
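One way to go from "gibberish" to "this is a WordPerfect file" is to look at the first few bytes of the file, since many formats start with a fixed signature. The sketch below assumes that WordPerfect 5.x and later documents begin with the four bytes 0xFF 'W' 'P' 'C'. The function names and the sample bytes are made up for illustration and are not part of the assignment.

```python
# Assumed signature for WordPerfect 5.x+ documents: 0xFF followed by "WPC".
WORDPERFECT_MAGIC = b"\xffWPC"


def looks_like_wordperfect(header: bytes) -> bool:
    """Return True if the given leading bytes start with the assumed signature."""
    return header.startswith(WORDPERFECT_MAGIC)


def printable_preview(data: bytes, limit: int = 32) -> str:
    """Render printable ASCII as-is and every other byte as '.',
    similar to the text column of a hex viewer."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data[:limit])


# Made-up header bytes for illustration; a real file would be read with
# open(path, "rb").read(32), where path is wherever you saved the download.
sample = b"\xffWPC\x10\x02\x00\x00\x01\x0a\x00\x01"

print(looks_like_wordperfect(sample))  # True
print(printable_preview(sample))       # .WPC........
```

Identifying the format this way only tells you what the data is; you still need software that understands the format to read it.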
How do we run a program that was written for MS-DOS in the 1980s on today’s computers? First of all, we need a processor and hardware that can run the program. Fortunately, WordPerfect ran on x86 machines (amongst other architectures in use at the time), so as long as we can get a copy of a WordPerfect executable, we should be good, right? Not so fast.
The WordPerfect executable, like all programs, makes assumptions about the syscalls available and the kernel that runs underneath. So, we also need a DOS kernel and the ability to run it on modern hardware! In this instance, we’ll emulate a DOS kernel, rather than actually installing an ancient operating system on your computer.
Emulators are one of the most common methods of interacting with legacy software. Essentially, an emulator recreates the original environment needed by the software, allowing it to run on a modern computer. In our case, we will use DOSBox to emulate a DOS environment. This will give us a platform to run WordPerfect and open our file.
Note: For this, and all following installation steps, you should not use the course container.
Download and install DOSBox. (Press Alt-Enter (Opt-Enter on Mac) to enable fullscreen when running DOSBox.)
Create a folder for DOSBox (e.g., ~/DOSBOX), then start DOSBox. You’ll see an MS-DOS command prompt (Z:\>). Mount the folder you created as the C: drive with the command Z:\> MOUNT C <FULL-PATH-TO-FOLDER> inside DOSBox. (For instance, Z:\> MOUNT C ~/cs300/DOSBOX.) Note that this mounting functionality is something the emulator provides; it wasn’t available in the original DOS running on a physical computer.
.dmg file containing your download of DOSBox. Then, type in cd dosbox.app, then cd Contents, and then cd MacOS. From here, you can reset your KeyMapper by entering
C:\Program Files (x86)\DOSBox-0.74-3). Then, run the Reset KeyMapper.bat script by double-clicking it.
secret-files you downloaded earlier into your DOSBOX (not your
INSTALL. This will start the WordPerfect installation process. Press <Enter> when prompted. You only need to install the core program and the help files. Make sure you respond y to features described as essential (including help/utility files), and n to all questions about installing extra features (printer drivers, graphics, etc.).
CS300REF.WPD into this folder.
Success! WordPerfect should now be installed.
cd WP51 from within your mounted folder. You can run WordPerfect on a file by typing WP <file-name> at the prompt.
Your instructions are inside the document. You should copy any files produced into the timemachine directory of your CS 300 projects repo. Enjoy!
Wahoo! We did it; the old documents were saved from disappearing forever into the void, and you now have a working DOS environment to experiment with more legacy software. (If you’re interested, websites like DOSGames and WinWorld contain numerous DOS-era software and games; try one out, and experience what technology felt like in the 90s.) Now, if someone discovers a stash of George R. R. Martin’s unfinished novels, you’ll be adequately equipped to handle his WordStar files.
The process you went through, albeit very simplified, is a real problem digital archivists face every day when handling decades-old data formats. In the next part of the assignment, as we explore additional challenges to and alternative methods of software preservation, keep in mind some of the difficulties you faced while setting up preservation software and interacting with legacy applications.
In Part 1, we explored a software-only emulator, DOSBox. Although DOSBox emulates a DOS kernel on an x86 machine, it is also possible to boot DOS directly on modern machines, since DOS environments (e.g., MS-DOS or IBM PC DOS) are compatible with today’s Intel x86 CPUs.
However, it is not always so straightforward. For example, Library of Congress archivists who worked to preserve the digital data of molecular biologist Nina Fedoroff faced some serious challenges: her data was created with the MacDraw Plus and HyperCard programs, which require an Apple Mac OS 9 environment. Mac computers prior to Mac OS X used the PowerPC instruction set, and software for them is incompatible with modern x86 or ARM64 computers. Indeed, with new hardware like Apple’s M1 chips and the accompanying transition to the ARM64 architecture, it is entirely possible that in a few years or decades, software applications written for our current x86 architectures will be incompatible with the hardware of the day. Apple M1 Mac users are already acutely aware of this issue.
Of course, one option to access applications for specific architectures is to purchase a physical retro-computer; for instance, old PowerPC Macintoshes can be found on eBay. However, this is clearly neither durable nor feasible; physical hardware eventually breaks, and rare retro-hardware may be difficult for digital archivists to access (even NASA has had difficulty). What if, as with software, we could use emulation, but this time to recreate different hardware components (i.e., different CPU architectures)?
Enter hardware emulation. In this part, we’ll explore one such example: Atari 2600! Released in 1977, the Atari 2600 quickly dominated the market, becoming synonymous with video games and sparking the growth of the entire industry; following its decline, its games have become favorites in retro gaming communities (and have even found use in a rather surprising modern application: deep reinforcement learning).
The Atari 2600 used the MOS Technology 6502 instruction set, which was designed for a long-since defunct CPU; thus, to run its games today, we’ll use the Stella emulator.
The Stella emulator for this part of the assignment is not officially compatible with M1 machines, and the download page says that it is “Intel only”.
However, our testing on M1 devices suggests that it works just fine. The reason is that Apple provides an x86-64 translation layer for M1 Macs (“Rosetta 2”). Rosetta 2 translates the machine code of x86-64 executables into ARM64 machine code so that the M1 can run them (though this does come with some slowdown). So, if you have an M1, you’re running one emulator (Rosetta 2) to run another emulator (Stella), and there are no fewer than three architectures involved: ARM64 (hardware), x86-64 (emulated by Rosetta 2), and MOS-6502 (emulated by Stella)!
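If you are curious which of these layers a given program sits on, the following is a minimal sketch that asks the operating system what architecture the current process sees. It assumes that, on macOS, the sysctl key sysctl.proc_translated reports 1 for a process running under Rosetta 2; the function name describe_host is made up for illustration. On other systems it simply reports the architecture.

```python
import platform
import subprocess


def describe_host():
    """Return (architecture seen by this process, whether Rosetta 2 translates it)."""
    machine = platform.machine()  # e.g. "arm64", "x86_64", or "AMD64"
    translated = False
    if platform.system() == "Darwin":
        try:
            # Assumption: this key is 1 when the process runs under Rosetta 2,
            # 0 when it runs natively, and missing on Macs without Rosetta.
            result = subprocess.run(
                ["sysctl", "-n", "sysctl.proc_translated"],
                capture_output=True, text=True, check=False,
            )
            translated = result.stdout.strip() == "1"
        except OSError:
            translated = False  # sysctl not available; assume native execution
    return machine, translated


machine, translated = describe_host()
print(f"architecture reported to this process: {machine}")
print(f"translated by Rosetta 2: {translated}")
```

A Python interpreter built for x86-64 and run on an M1 would be expected to report an x86-64 architecture even though the hardware is ARM64, which is exactly the illusion that a translation layer is meant to maintain.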
Now answer the following question:
Q1: In the README.md file in timemachine in your project repository, describe your experience with the game. Here are some guiding questions (you don’t need to follow them strictly, but do demonstrate that you’ve explored a game):
Hardware emulation provides a complete infrastructure for digital preservation, but it is also technically difficult to execute: all technical specifications and digital logic, down to precise clock cycles and analog elements, must be recreated by a hardware emulator. Moreover, hardware is constantly changing.
For example, Apple switched from Intel processors to Apple Silicon in 2020. If you have an M1 or M2 MacBook, your computer uses this new architecture! This means that many emulators designed for Intel hardware will not work on your device.
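To make the idea of recreating a processor in software more concrete, here is a toy sketch of the fetch-decode-execute loop at the heart of a CPU emulator. It handles only five 6502 opcodes, ignores status flags and interrupts, and treats BRK as "halt", so it is an illustration under simplifying assumptions rather than a description of how Stella works. The class name ToyCPU, the load address, and the sample program are made up for this example.

```python
# Cycle counts of the five supported opcodes on a real 6502.
CYCLES = {0xA9: 2, 0xAA: 2, 0xE8: 2, 0x86: 3, 0x00: 7}


class ToyCPU:
    def __init__(self, program, load_address=0x0600):
        self.memory = bytearray(65536)  # flat 64 KiB address space
        code = bytes(program)
        self.memory[load_address:load_address + len(code)] = code
        self.pc = load_address  # program counter
        self.a = 0              # accumulator
        self.x = 0              # X index register
        self.cycles = 0         # clock cycles consumed so far
        self.halted = False

    def fetch(self):
        """Read the byte at the program counter and advance it."""
        byte = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    def step(self):
        """Fetch, decode, and execute a single instruction."""
        opcode = self.fetch()
        if opcode == 0xA9:    # LDA #imm: load a constant into A
            self.a = self.fetch()
        elif opcode == 0xAA:  # TAX: copy A into X
            self.x = self.a
        elif opcode == 0xE8:  # INX: increment X, wrapping at 8 bits
            self.x = (self.x + 1) & 0xFF
        elif opcode == 0x86:  # STX zp: store X at a zero-page address
            self.memory[self.fetch()] = self.x
        elif opcode == 0x00:  # BRK: treated as "stop" in this toy
            self.halted = True
        else:
            raise ValueError(f"unimplemented opcode {opcode:#04x}")
        self.cycles += CYCLES[opcode]

    def run(self):
        while not self.halted:
            self.step()


# LDA #5; TAX; INX; STX $10; BRK
cpu = ToyCPU([0xA9, 0x05, 0xAA, 0xE8, 0x86, 0x10, 0x00])
cpu.run()
print(cpu.a, cpu.x, cpu.memory[0x10], cpu.cycles)  # 5 6 6 16
```

Even this tiny example has to track how many clock cycles each instruction takes. A faithful emulator must do the same for every instruction and for the video and audio hardware around the CPU, which is part of why hardware emulation is so laborious.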
Task: Answer the following question (again, in README.md):
Q2: What happens to hardware emulators when new architectures come along? How does this affect the feasibility of hardware emulation as a method of software preservation?
Society has plenty of experience with preserving valuable historical artifacts outside of the digital realm: whether art, architecture, film, or archaeological conservation, the importance and process of preserving our cultural heritage is well established. But preservation in the digital context is less well understood.
Task: First, familiarize yourselves with approaches to preserving our cultural heritage.
Next, consider the efforts and cultural implications in the digital domain.
In the assignment, you explored two common ways emulation assists software preservation; while not the only option, emulation has seen the most success as a preservation tool. As interest grows and we continue to improve our toolbox, our ability to tackle technical challenges as a software preservation community increases. However, technical difficulties are just one aspect to consider when dealing with preservation.
As we reckon with our digital legacy, it’s important to consider what we choose to represent.
Task: With your partner, discuss the following and write up some key points from your conversation (again, in README.md).
Q3: How does software preservation compare to other forms of preservation that society already engages in today? What standards should we apply, and how does this compare to the standards used for other types of preservation? For example, you could consider the preservation of art or architecture in your answer.
Another challenge in software preservation is legality. If you look into it, WordPerfect and the Atari ROMs were commercial software at the time when they were produced; to use them, you must obtain a license and/or purchase the software. Yet, preservation efforts have made these programs freely available on the internet. Digital archivists who host such artifacts face legal uncertainties every day.
Archivists frequently operate with abandonware — software that, while technically still proprietary and protected by copyright, has been ignored by a potentially defunct manufacturer. Some manufacturers (if they’re still around) actively help abandonware sites, or at least tolerate them. However, this is not always the case; some manufacturers, like Microsoft or Nintendo, have pursued legal challenges against digital archives, which resulted in some major sites shutting down.
Task: For the following question, coordinate with your partner to take on opposing, or at least conflicting views. Then, respond individually in timemachine/README.md. When you are done, come together and discuss your responses! Together, write up 3-4 bullet points from your conversation (also in timemachine/README.md). If you are able to come to a consensus, make sure to include your conclusion and explain how you reached it. If not, explain where your positions clashed.
Q4: Should digital libraries and preservationists receive legal protections, and if so, what might this look like? When supporting your answer, be sure to consider:
So far, companies and government institutions have been rather uninvolved in software preservation efforts. As software proliferates and grows in complexity, and more of it moves to web services that have proprietary server-side code, it will become increasingly difficult for individuals and non-profit organizations to preserve digital artifacts on their own.
Task: For this activity, each partner should play the role of a different stakeholder in software preservation (e.g., the government, a company/software developer, an individual/non-profit, or a consumer). Your response should go in README.md.
Q5: Take turns sharing what you think “your” responsibility towards creating and preserving digital content should be, and what your partner’s should be. Then, see if you are able to come to a consensus. If you are, write your conclusions about what each stakeholder’s responsibility should be. If not, write a summary of your conversation, and explain where your positions clash.
Here are some potential dimensions to consider (although feel free to explore different directions):
Maintaining legacy software is an expensive, laborious, and ever-growing task, and the financial incentive for effective preservation is not always high. Furthermore, thoroughly maintaining all of our old software requires substantial use of online resources and programming ability. As such, it is worth discussing the true importance of software preservation, weighed against these high costs.
One way to look at this conversation is to define where software preservation falls between public good and expensive taste. In Michael J. Rushton’s paper on Expensive Tastes and Public Funding for the Arts, he defines expensive taste as:
“…those held by a person who, compared with the general population, in order to achieve a given level of welfare, needs to have available for consumption a good (or a few goods) that is only available at a high price. Suppose, for example, George only enjoys an art form that is expensive to experience, when most of the population is satisfied by cultural offerings more cheaply obtained, and further assume that this art deeply matters to George in terms of his wellbeing and capability for enjoying a fully satisfying life.”
Meanwhile, public goods are commodities that benefit all citizens, and should therefore be made publicly available. Services that qualify vary by country, and might include public education, national defense, and healthcare. Furthermore, once an item is a public good, it will be:
“… made available to all members of a society. Typically, these services are administered by governments and paid for collectively through taxation. …The two main criteria that distinguish a public good are that it must be non-rivalrous and non-excludable. Non-rivalrous means that the goods do not dwindle in supply as more people consume them; non-excludability means that the good is available to all citizens.” (Investopedia)
Task: As with Question 4, coordinate with your partner to take on opposing, or at least conflicting views. After you respond individually in timemachine/README.md, come together and share what you wrote! You should then write a few sentences responding to your partner’s position, and include that in your README.md as well.
Q6: Is legacy software an expensive taste or a public good? Should this impact our approach to software preservation? Be sure to explain your reasoning.
Congratulations 🎉 you’ve completed the SRC assignment for CS 300! We hope you’ve developed an appreciation for the difficulty and importance of software preservation, as well as some common techniques and considerations, and how they relate to the technical operating systems and hardware topics we discuss in the course.
Please hand in the files and answers for this assignment via Git in your cs300-s23-projects-YOURNAME repository. Put your answers into the README.md file in the timemachine/ subdirectory of your project repository, and also put all other files from this assignment into that directory.
By 6:00 PM on Friday, April 21st, you must have filled in the file README.md in the timemachine directory in your projects repo, and pushed the files produced by Part 1 of this assignment.
This assignment is worth 3% of your total course grade.
This assignment was created for CS 300.
Indeed, our course Docker container originated in part to support the new M1 Macs; the old virtualization software, VirtualBox, worked only on x86 machines, and was incompatible with ARM.