Homework 10: Book Bandit (Part Two)

Note: Hashtables and dictionaries refer to the same to the same data structure in Python.

The Assignment

Now that the book club members have been using your website to access contraband books, they have been complaining at how inefficient and clunky the website is, and several of your tech savvy cowboy friends decided to investigate your code under the hood (or under the saddle, we might say), and have noticed that you use a ridiculous number of lists, they suggest that there is a more efficient way of storing book data: hashtables!

Getting Started

Download the support code here!

It will contain all of the support code needed to run the website (same as HW9), testlight.py, hw10_test.py (where you will write your tests), and the two new files described below. Please make sure the project you’re working in has all of these files.

You might notice that we don’t provide a hw10_code.py stencil file! We’ll be providing this on Canvas, since it contains our solutions to some of the functions from HW9. This assignment will be building off of these implementations.

If you haven’t yet turned in hw9, please don’t get the hw10 stencil file from Canvas before you do (unless you aren’t going to turn in hw9 at all). Bear in mind that Canvas can show us when people access files.

The Stencil

We’ll be providing the hw10_code.py stencil on Canvas. Please copy this file into the same folder where you downloaded the hw10 source code, so that the stencil, along with any files that were in hw10.zip (including the subfolders) all live in the same folder.

In the stencil, we’ve clarified which functions you should edit with TODOs, and have provided our implementations of the list-versions of all of the functions. Because the main goal of this homework is to edit these functions, please make sure to understand them before getting started, and feel free to ask at hours/Campuswire if any of them are unclear!

New Files

books.csv: books_csv is a much, much larger version of the small_books.csv that you encountered in Homework 9. It also contains more than five genres!
genres.txt: Now that there are so many genres, it is a little unwieldy of us to keep them all in a list inside our code file. This time, we’re keeping track of all of our genres in this text file. This is also convenient because it allows us to add/remove genres if we need to without having to edit our code!
To read in the content of this file, we’ve added the following code block in our stencil:

    # loads list of genres to display on main page into list of strings
    display_genres = list(map(lambda line: line[0], csv.reader(f, delimiter="\n

This will read in all our genres from genres.txt as a list of strings called display_genres.

New Global Variables

You should have three hashtables. We’ve defined these at the top of the stencil:

id_to_book_dict Hashtable that maps a ID number to a Book
Key: ID
Value: Book
genre_to_books_dict Hashtable that maps a genre (a string) to a list of all the books in that genre
Key: Genre
Value: A list of all the books that are in that genre
Note: This global variable is replacing the functionality of the list created by HW9’s make_genre_list() function. We will be filling it out in setup(), just like the id_to_book dict. For more details, read about the setup() function below.
book_to_score_dict Hashtable that maps a Book to its recommendation score
Key: Book
Value: Recommendation score for that book

Note: In this class, in order to tell Python that we can use the Book data type as a key for our hashmap, we need to make sure we can’t alter a Book’s fields once its created. To do so, we’re adding the frozen=True annotation to our dataclass. This change is present in the stencil.

Note: Data structures dealing with carts and purchases will stay as lists. Reference the tasks below for the things you do need to change.

Coding Tasks

Your primary task will be updating our old implementation to use hashtables instead of lists for various features.

Instead of using lists to keep track of our books, genres, and recommendations, we’ll be using hash tables!

Functions to Change from HW 9 (required for passing)

The stencil code from Canvas will provide detailed TODOs in the comments, so be sure to check those out.

setup():
Reads through the full books.csv CSV file and populates a global book_dict, which is a hashtable with integer book IDs as keys and Books (the dataclass) as values.
Another important change: We’ll also be using this function to populate the global genre_to_books_dict. This is different from HW9, in which we used the make_genre_list() function to organize our books by genre. We no longer have this function, so we will do this in setup().
Note: books.csv contains 32 different genres, while genres.txt contains 25! genre_to_books_dict should contain all of the books from books.csv.
get_book(ident: int) -> Book
Returns the book with ID ident. If a book with the IDident is not found, Python will throw an error for you (since we’re looking in a hashtable), so no worries about raising your own error here.
search_books(query: str) -> list:
Given a query, return up to 50 of the books that contain the query in the title, author, or genre (case-insensitive in both directions). If there are fewer than 50 books that match, return all of those books. (We no longer have a master list of books to search through, so think about how you can access the books from id_to_book_dict)

New Function to (Re)Implement (required for passing)

The functions below will be ones we made entirely for you in HW9! To account for our de-listifying of HW9, they’re going to need a few changes.

get_book_dict() -> dict: will return a hashtable mapping each genre in genres.txt to a list of 25 randomly selected books in that genre. To do this, we’ll need a list of genres to display, which we stored in display_genres.
- Hint: Your function will much different from HW9’s get_book_dict! It will be way more concise and will allow for more genres other than the 6 we specified in HW9.
- Note: In HW9, this function sampled 10 books for each genre, but you will be sampling 25, since there is more data to work with!
- Note: genres.txt does not contain all of the genres in books.csv! This is okay. You are only responsible for displaying the books in the genres in genres.txt.

Recommendation system (for full credit; not needed for passing):

For Homework 10, you’ll be taking charge of Homework 9’s recommendation scoring system! In HW9, we implemented the recommendation system by using the dataclass BookScorePair, and keeping a list of those pairs. For this homework, you’ll be going into the functions we wrote and editing them as necessary so that they only use the book_to_score hashtable. In other words, we shouldn’t need the BookScorePair dataclass anymore, or any of the functions that work with them (i.e., find_book_score_pair)! It might be a good idea to look back at our list-heavy implementations of these functions, understand how the code is working, and then get to de-listifying. If you have any questions about what our code was doing, feel free to ask at hours/Campuswire!

Here are the rules for the recommendation system, and the functions that apply those rules.

Initially, there should be no books recommended for the user. This means that there are no keys and values in the book_to_score hashmap. When
(1) the user buys a book (which calls buy_books_in_cart),
(2) clicks on a book (which will call get_book), or
(3) searches for something (which calls search_books),

the recommendation scores for books should be updated:

(1) and (2):

All other books of the same genre receive one “point”.
All other books with the same author receive three additional points.
The book being purchased (and the books that have been purchased in general) will never be recommended again.
This update of recommendation points happens when the functions get_book or buy_books_in_cart get called.

Function to implement for rules 1 and 2:
update_recs(book: book):
This function will be called when get_book or buy_books_in_cart is called. It will update the recommendation score of the book passed in accordingly. You will be editing update_recs to work with the recommendations hashtable, instead of relying on a list of BookScorePairs.

Correction to Stencil: In buy_books_in_cart, the purchases.append(book) should happen before update_recs(book) in the loop.

(3):

For each book, you should look at each word in the query – and for every word that is contained in the book’s title, author or genre (case-insensitive), you would add 1 to that book’s recommendation score.
- For example: If the query is 'Harry Potter and the Sorcerer's Stone is searched, it will look at the words, one at a time. The system should loop through['Harry', 'Potter', 'and', 'the', 'Sorcerer's, 'Stone'] to check if each word in this list is in that book’s title, author, or genre or not, and update the recommendation score for that book accordingly. If "Harry" is found in the title, author or genre, it will update the score by one. And if "Potter" is also found in the title, author or genre it will update the existing score by one more, and so on and so forth. "Harry Potter and the Sorcerer's Stone" would receive six points (one for each word in the query), while "Harry Potter and the Chamber of Secrets" would receive four points (for the first four matching words).
- This updating of recommendation points occurs when search_books is called.

Function to implement for rules 3:
update_recs_given_query_words(queries: list): will be called in the function search_books. You will take in a query in a list form and update the books’ scores accordingly based on each element in queries. Similar to update_recs, you will be editing our implementation to work with the recommendations hashtable.

Note: Even if the entire search query does not match a book, if part of it does, the book’s recommendation will still be updated accordingly. For example, if "Harry Potter and the Chamber" is searched, the recommendation score for "Harry Potter and the Sorcerer's Stone" will be updated because it partially matches (though it will not be returned in the search results because "Harry Potter and the Chamber" is not part of "Harry Potter and the Sorcerer's Stone").

Clarification: the only keys that should live in the book_to_score hashmap are books that are recommended: in other words, books with scores greater than zero. This is currently how the listified versions of update_recs and update_recs_given_query_words work. In other words, books are only added as keys to the dictionary when they become a recommended book.

Note: Books that have been purchased will not be recommended again.

get_recommendations:
Last but not least, re-implement/edit get_recommendations() which returns a list of books should be sorted by their scores. It should contain no more than 50 books.

Running the code

It’s the same as Homework 9, so refer to that handout for more detailed instructions.

Written Portion

For this homework, you will also be submitting a written reflection on what you have learned. In a block comment underneath your code in hw10_test.py, submit roughly a paragraph on each of the following questions:

Contrast working with lists vs dictionaries from the programming perspective. When would you recommend using one vs the other? Which one do you prefer to use if given a choice?
What are some of the main lessons you are taking away from the course about how to write working programs? What steps or habits have you found most helpful in doing programming assignments?

You will get credit for the written portion for (a) taking it seriously, and (b) making accurate statements about lists and dictionaries for the first part of question 1 (the last question in 1—about your preference—has no correct answer. You’ll get credit for that just for providing an answer).

Testing

Test the following functions in hw10_test.py:

get_book() (Note that you no longer have to testValueError.)
get_recommendations()
update_recs()
update_recs_given_query()
search_books(), but you do not need to test the edge case of having to return only 50 books when more than 50 books match the query.
Any helpers you write.

You do not have to test setup(), get_book_dict(), or any of the functions dealing with carts/purchases.

Grading Requirements

Minimum Functionality:

Re-implementing all functions that rely on a list of books/list of genres so that they now use the id-to-book hashtable and genre-to-book hashtable. In other words, we should have a working implementation without the book_master_list or make_genre_list().
Completing the written portion.

Note: you will have to change portions of the recommendation functions to get the associated data from corresponding book in the data structures that you created, but you will not need to change the implementation details (i.e. you can still use the BookScorePair dataclass that is provided if you’re aiming for minimum functionality.)

Full Functionality:

All of the above.
Storing book to recommendation score in a hashtables.
Implementing the recommendation scoring system, meaning changing the functions of update_recs, update_recs_given_query_words, and get_recommendations() so that it doesn’t use the BookScorePair dataclass.

Correctly implementing minimum functionality will warrant a passing grade, assuming some attempt at testing was made. This is a reminder that your code does not have to be functional to get testing points, so make sure to write tests for your code!

Yee-haw! Once you’ve completed this homework, you’ve successfully implemented what was previously Project 3. Also, congratulations on finishing your last CS111 homework assignment! It’s been one heck of a rodeo :)

Theme Song

The Last Rodeo by Kix Brooks