Programming with
Data Structures and Algorithms


You and your friends are creating Nile, which you hope will be the next big thing in on-line bookstores. You know that you can save money by anticipating what books people will buy; your brainstorm is that you will pass these savings on to your users by offering a discount if they buy books that Nile recommends.

To do this, you offer an incentive for people to upload their lists of recommended books. From their lists, you can establish suggested pairs. A pair of books is a suggested pair if both books appear on one person's recommendation list. Of course, some suggested pairs are far more popular than others; and for a given book, it is paired with some books much more frequently than with others.

But that's not all! You are also offering a cool GUI so that users can enter a location containing lists of recommended books and get the feedback right in the same window. Unfortunately, your friends have already done some of the fun design work - your job is to meet the specification by fulfilling the contract they've provided you.

Program Specification

You need to organize the list of recommended books to support two tasks:

These two tasks will be accomplished by you by implementing the Librarian interface. In addition you will also implement a method that validates a directory (see the documentation for the specification of what constitutes 'valid'). You MUST implement this interface; we will be testing your code based on this interface.

The recommendation lists your program needs as input will be in files stored in a directory specified by the first command-line argument (accessible as args[0] in your main method). Each list will be stored in its own file, and every file in the directory will be an input list. Inside the lists every line will contain the description of a single book. Each book has a unique and unambiguous description; that is, if two lines in two different input lists are identical, they refer to the same book. Otherwise they refer to different books. Input lists will always contain at least two books, and they will never contain duplicates.

To retrieve the input lists, you should use the NileIO class we provide. It has two public static methods:

GUI Specification

If your program is executed without any command-line arguments it should intialize the GUI. You are welcome to make the GUI as fancy as you like, and to use any method to create it you like, but we recommend Swing, and require you to have the following:

Support Code

We provide you the support code in a jar format, along with the javadoc.

We do not provide you with the source code (.java file) for two reasons. First, because we do not want you to concern yourself with the implementation of the classes we've written for you. Secondly, because you're responsible for implementing the interfaces we provide, without being able to change them (even though we know your implementation would of course be better, faster, stronger).

You will need to make sure the NileSupport jar is in your classpath. Due to the abundance of Java environments, you will have to figure out how to do this in the environment of your choice. We recommend NetBeans or Eclipse, both of which have numerous support sites online.

Part 1

Write a program in Java that performs the above tasks. Your program should consist of a file,, containing a class with a main method, along with any other files you may need. Your program will be run in one of three ways:

  1. When run with one argument (the directory path), your program should print a list of the most popular pairs of books. The count should be first, followed by a newline, and each pair should be printed on its own line with only a plus sign separating the two books in the pair. For example,
              Lewis Carroll, Alice in Wonderland+Douglas Hofstadter, Godel, Escher, Bach
              Frank Herbert, Dune+Arthur C. Clarke, Childhood's End
              Jane Austen, Emma+Emily Bronte, Wuthering Heights
    Each pair of books should only appear once, and order is irrelevant. Note that
              Lewis Carroll, Alice in Wonderland+Douglas Hofstadter, Godel, Escher, Bach
              Douglas Hofstadter, Godel, Escher, Bach+Lewis Carroll, Alice in Wonderland
    count as the same pair.
  2. Your program may also be run with two arguments, accessable as args[0] and args[1] in your main method. The first argument will once again be the directory path and the second argument will be a book description string in the same format as the input lists. Your program should print a list of books most frequently paired with it, preceded by the count as above. For example,
              Lewis Carroll, Alice in Wonderland
              Larry Niven, Ringworld
              N.K. Stouffer, The Legend of Rah and the Muggles
    If the given book description cannot be found in any input list, your program should print an error message and exit.
  3. Your program may also be run with no arguments. In this situation your main method should invoke your GUI interface, as described above.

No matter which way your program is run, it should print an error message and exit if the given directory is inaccessible or empty. If one of the input list files in the directory is inaccessible, your program should ignore that file and continue.

Note: Java automatically splits command line arguments at spaces, so if you gave as arguments

     /some/path Edwin A. Abbott, Flatland
you would get the args array
    ["/some/path", "Edwin", "A.", "Abbott,", "Flatland"]
To avoid this, you must put quotes around your book descriptions, like so:
    /some/path "Edwin A. Abbott, Flatland"
resulting in the correct args array
    ["/some/math", "Edwin A. Abbott, Flatland"]

Your goal in this assignment is to produce a well-designed, working implementation. You do not need to focus on efficiency. As with all code you write, make sure to include comments.

You may use any of the Java standard libraries, and the NileIO class we provide, but no outside code.

Part 2 – The README

Included in your handin we want a README file, which will include several sections.

Most importantly, we want you to informally analyze the big-O time complexity of your implementation of the two operations. You do not need to rigorously prove your bounds, but you do need to explain your reasoning.

Next, suppose you wanted your program to be interactive and process several requests in a short period of time. In particular, if your program is part of the back-end of a popular server with lots of data, it should respond in as close to constant time as possible. How would you modify your solutions to Part 1 to obtain this? You don't need to provide code; just discuss the algorithms and data structures you would use, and how you might modify the design overall.

Normally, a README is also where you would include system tests and directions to invoke unit tests. You do not need to write rigorous tests for this project (though you should!), but we'd like you to at least discuss in your README what sorts of tests would prove the correctness of your program.

Part 3 – Honesty Policy

Before we will grade this assignment or any of your other assignments, you must read the honesty policy and sign off that you understand and agree to abide by it. To do this, you must type the text in the following image into a plain text file and hand it in with your project.

Honesty Statement

What to turn in: