Testing ParselTongue
You are the proud CEO of BuildHub, a company that provides 3D printing solutions for any device on the market. Unfortunately, every device on the market has a slightly different implementation of the scripting language ParselTongue, which is the high-level language of choice for describing 3D designs. In order to write software for all these devices, you need to figure out how the implementations on different machines differ.
Your task is clear: you need to design a test suite of ParselTongue programs that can find the bugs in these implementations. You have at your disposal:
- A tutorial of the ParselTongue language (read the rest of the instructions first, then start here);
- A specification of the ParselTongue language;
- An executable that contains 25 broken ParselTongue interpreters and a correct interpreter;
- A (tiny) start to a test suite that contains some example programs.
Getting Started
Go to the links for the executable downloads and find the one that matches your platform. Unzip it and find the executable at one of the following paths (depending on which one you downloaded):
win32-dist/assignment1-win32.exe
osx-dist/bin/assignment1-osx
debian-dist/bin/assignment1-debian
The program takes a few commands:
--interp
Run standard input as a ParselTongue program, using the correct
implementation.
> echo "+(40, 2)" | ./debian-dist/bin/assignment1-debian --interp
42
--test-interps <directory>
Run the tests in <directory> against all the broken interpreters.
This will yield an error if the tests don't pass the standard
implementation. It will yield output that describes which
interpreters your test suite catches (if any), and those that it
doesn't.
> .\win32-dist\assignment1-win32.exe --test-interps path\to\my\tests
--brief <directory>
Run the tests, as with --test-interps, but provide much briefer output
(just one line of pass/fail for each broken interpreter).
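For example, run the same way as --test-interps (the OS X executable is shown here; the other platforms work identically):
> ./osx-dist/bin/assignment1-osx --brief path/to/my/tests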
--single <interp-name> <directory>
The output from --test-interps has nicknames for the broken
interpreters. With the --single option, only that interpreter is
run. E.g.
> ./osx-dist/bin/assignment1-osx --single objects1 path/to/my/tests
This also gives much more detailed feedback on how the output from that
interpreter differed from what your tests indicate was expected.
--report <directory>
(described in the "Handing In" section)
To get started, download and unzip the sample test suite, and run the executable on it with the --test-interps option. (Part of) the output should look like this:
if-then-else1:
Bug not found!
if-then-else2:
Differences in:
/Users/Jonah/Documents/fall2012/cs173ta/repo/parseltongue-lang/../doc/assignment1/sample-test-suite/if1.psl
This indicates that your test 'if1.psl' found an error in the broken
interpreter named 'if-then-else2', but none of your tests identified the
problem with 'if-then-else1'. The labels if-then-elseN simply
indicate that the problem with the interpreter is somewhere in the
if-then ParselTongue construct. There are interpreters that have
incorrect behavior on objects, functions, variables, scope, and more,
and these labels should guide you towards which features you need to
test more.
NOTE: Due to a mistake, there are 2 different broken interpreters both named operators2. They have different bugs to find; they just happen to share a name. Since they are different interpreters, you have to catch both of them.
Writing your own tests
The best way to get a feel for writing tests is to examine the existing tests in our sample test suite. Every test case consists of up to three files:
<testname>.psl - the ParselTongue program to interpret
<testname>.psl.expected - the expected result of interpreting the program (output to stdout)
<testname>.psl.error - the expected error output of interpreting the program (output to stderr)
An omitted .expected or .error file is treated as an empty string; e.g. if you create a test with only a .psl and .expected file, you are telling the testing script that you expect no error output at all.
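For example (the name 'add-simple' below is purely illustrative), a minimal test built around the arithmetic program from the Getting Started section could consist of just two files:
add-simple.psl - contains the program +(40, 2)
add-simple.psl.expected - contains 42
Leaving out add-simple.psl.error tells the testing script that this program should produce no error output.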
Caveat regarding timeouts: If your test causes the interpreter to run for more than about 3 seconds, it will be stopped. Since there is no way to tell what output should have occurred by then (because of file buffering and general OS nondeterminism), the testing script flags the run as a timeout and assumes empty standard output and empty standard error. The upshot is that the testing script treats timeouts as different from non-timeouts, so if your test, for example, runs forever on the correct interpreter but not on a broken interpreter, you will get credit for detecting that interpreter's bug. Conversely, if it times out on both, you don't detect the bug.
You can generate the expected and error outputs by running the correct interpreter on the test program like so:
> ./debian-dist/bin/assignment1-debian --interp < mytest.psl 1> mytest.psl.expected 2> mytest.psl.error
On Windows, the redirection works a little differently. The type command will output the file you want to run, which you can then pipe to the interpreter command. Inside a command prompt:
> type mytest.psl | .\win32-dist\assignment1-win32.exe --interp > mytest.psl.expected 2> mytest.psl.error
Windows seems finicky about redirecting output; we've had to run the output redirection a few times to get it to work in some cases. Let us (and the course community) know if you run into any problems.
Each test should target a specific feature of the language. A good test suite consists of many small test cases that each exercise a single facet of the language using as few other language constructs as possible. Explain what each test is testing in a comment at the top of the test file.
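For instance, a suite built in that spirit might look something like this (the file names are purely illustrative):
my-tests/
  plus-basic.psl
  plus-basic.psl.expected
  if-true-branch.psl
  if-true-branch.psl.expected
  scope-shadowing.psl
  scope-shadowing.psl.expected
  scope-shadowing.psl.error
  ...
Each .psl file exercises one construct and starts with a short comment saying what it checks.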
All of your tests should pass our definitionally correct interpreter. No matter how legitimate you think your test is, if it fails on our interpreter, it is wrong. Like any language in the real world (think JavaScript), it is the implementation of ParselTongue that you need to worry about when running your code, not the specification. The spec should function as an extremely detailed guide for your testing.
How to Proceed
Start with the tutorial; it will give you a tour through ParselTongue's features and syntax. It also has a ton of helpful hints about particularly interesting features to test. If you try all the recommended examples in the tutorial, you'll be well on your way.
Armed with the tutorial, you should be able to start building up your own test suite, using the instructions above. You should be able to catch a number of the broken interpreters with fairly straightforward programs. Others will seem much harder.
When you've gotten to some interpreters that don't seem easy to detect, use their labels to figure out what kind of feature is broken, and go to the spec. If it's functions, for example, check the spec for a detailed look at when, how, and where arguments are evaluated. We'll be honest; some are quite devious, but we promise that we have a test suite that catches each and every one of them.
Handing in
Before you hand in your assignment, you'll need to grade it using the grade report option built into the assignment executable:
--report <directory>
Run the tests in <directory> against all the broken interpreters,
and generate a grade report. This report is what you'll hand in
when you're ready to submit your assignment, and contains your
grade and a signature of its authenticity.
> ./osx-dist/bin/assignment1-osx --report path/to/my/tests > my-grade-report.txt
Once you've generated your grade report, upload it here with:
- Your grade report file in txt format (you must include the .txt extension)
- A zip file containing your test suite (and nothing more)
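If you're unsure how to produce the zip file, something along these lines works on OS X and Linux (the directory and archive names are just examples; adjust to wherever your tests live):
> cd path/to/my
> zip -r my-test-suite.zip tests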