Homework 2-7

Interested Topics Survey

Please respond to this survey to help us select topics for the course’s final lectures and activities.

Comparing Texts with Python

In this homework, you’ll compare the lyrics of various musical artists. Click these links to download text files for the lyrics of five artists: Adele, the Beatles, Beyonce, Coldplay, and Drake.

All of the lyrics have been cleaned, meaning that punctuation has been removed, all words are lowercase, and stopwords (e.g. “the,” “it,” “and”) have been eliminated.
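Because the files are already cleaned, splitting on whitespace should be enough to turn each one into a list of words, which Counter can then tally. This is a minimal sketch, assuming each file is UTF-8 plain text with words separated by spaces or newlines; the function names `count_words` and `load_counter` are hypothetical, not required by the assignment.

```python
from collections import Counter


def count_words(text):
    """Return a Counter mapping each whitespace-separated word in text to its count."""
    # str.split() with no argument splits on any run of spaces, tabs, or newlines
    return Counter(text.split())


def load_counter(path):
    """Read a cleaned lyrics file and return a Counter of its words."""
    # Assumes the file is UTF-8 encoded plain text
    with open(path, encoding="utf-8") as f:
        return count_words(f.read())


# Example with an inline string instead of a file
print(count_words("hello hello goodbye\nhello"))  # → Counter({'hello': 3, 'goodbye': 1})
```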

Your Program

Your program should compare two artists’ lyrics. You’ll specify which two via command-line arguments. Feel free to try out whichever pairs you’re interested in!

Your program should have three functions (in addition to main) that do the following:

1. Find the 20 most common words in each text file. Then print the words that appear among the 20 most common words of both texts.

Hint: You’ll need to use Counter (from the collections module) to find the most common words. To find the overlapping words, extract just the words (i.e. the “keys”) and add them to a set. The following code creates such a set from a Counter:


# counter is a collections.Counter of the words in one text file
most_common = counter.most_common(20)
word_dict = dict(most_common)
most_common_words = set(word_dict.keys())

An explanation of how the code above works:
  • most_common is a list of the 20 most common (word, count) pairs, stored as tuples
  • word_dict is a dictionary built from those 20 pairs, mapping each word to its count
  • most_common_words is a set containing just the 20 most common words (the keys)
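Putting the hint together, function #1 might look like the sketch below. It assumes each text has already been turned into a collections.Counter of its words; the names `top_words` and `common_top_words` are hypothetical. The `&` operator returns the intersection of two sets, i.e. the words present in both. One caveat: when several words tie for 20th place, most_common keeps the ones encountered first, so borderline words can depend on word order in the file.

```python
from collections import Counter


def top_words(counter, n=20):
    """Return the set of the n most common words in counter."""
    most_common = counter.most_common(n)    # list of (word, count) tuples
    word_dict = dict(most_common)           # {word: count, ...}
    return set(word_dict.keys())            # just the words


def common_top_words(counter_one, counter_two, n=20):
    """Print and return the words in the top n of both counters."""
    # Set intersection: words that appear in both top-n sets
    overlap = top_words(counter_one, n) & top_words(counter_two, n)
    for word in sorted(overlap):
        print(word)
    return overlap


# Tiny made-up counters, using n=2 so the result is easy to check by hand
a = Counter({"love": 5, "baby": 4, "night": 1})
b = Counter({"love": 7, "night": 3, "baby": 2})
common_top_words(a, b, n=2)   # prints: love
```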

2. Find the 20 most common words in each text file. Then print the words that are among the 20 most common in the first text but not among the 20 most common in the second.

Hint: You’ll need to use Counter and sets, following a process similar to function #1.
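Function #2 mainly differs from #1 in the set operation: `-` (set difference) keeps the words in the first set that are absent from the second. This sketch assumes each text has already been turned into a collections.Counter; `top_words` and `unique_top_words` are hypothetical names. Note that the result depends on argument order, so swapping the two artists gives a different answer.

```python
from collections import Counter


def top_words(counter, n=20):
    """Return the set of the n most common words in counter."""
    return set(dict(counter.most_common(n)).keys())


def unique_top_words(counter_one, counter_two, n=20):
    """Print and return words in the top n of counter_one but not of counter_two."""
    # Set difference: elements of the first set that are not in the second
    only_first = top_words(counter_one, n) - top_words(counter_two, n)
    for word in sorted(only_first):
        print(word)
    return only_first


a = Counter({"love": 5, "baby": 4, "night": 1})
b = Counter({"love": 7, "night": 3, "baby": 2})
unique_top_words(a, b, n=2)   # prints: baby
```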

3. Print out how frequently a user-specified word appears in each text.

Note: This word should be passed as a command-line argument, not hard-coded in your Python program.
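A sketch of function #3, assuming each text is already a collections.Counter and the word has been read from the command line; `word_frequency` is a hypothetical name. Indexing a Counter with a word it has never seen returns 0 rather than raising KeyError, so no special case is needed for missing words. This version reports raw counts; if “how frequently” should mean a rate, dividing each count by sum(counter.values()) gives the word’s share of all words in that text.

```python
from collections import Counter


def word_frequency(counter_one, counter_two, word):
    """Print and return how many times word appears in each Counter."""
    # A missing key in a Counter yields 0 instead of raising KeyError
    count_one = counter_one[word]
    count_two = counter_two[word]
    print(f"'{word}' appears {count_one} times in the first text")
    print(f"'{word}' appears {count_two} times in the second text")
    return count_one, count_two


a = Counter({"love": 5, "baby": 4})
b = Counter({"love": 7, "night": 3})
word_frequency(a, b, "baby")   # counts are 4 and 0
```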

Each time you run your program, only one of the three above functions should execute. Specify which one (i.e. your “mode”) via a command-line argument.

When you run your program from the command line, it should look something like this:
python3 your_file.py mode lyricsone.txt lyricstwo.txt word_to_count
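One way main could read these arguments from sys.argv and run a single mode is sketched below. The mode names `common`, `unique`, and `count` are hypothetical (the assignment leaves them up to you), and the helpers are condensed so the block runs on its own. Keeping the mode logic in `run_mode`, separate from file reading, makes each mode easy to check with small made-up Counters.

```python
import sys
from collections import Counter


def load_counter(path):
    """Read a cleaned lyrics file and return a Counter of its words."""
    with open(path, encoding="utf-8") as f:
        return Counter(f.read().split())


def top_words(counter, n=20):
    """Return the set of the n most common words in counter."""
    return set(dict(counter.most_common(n)).keys())


def run_mode(mode, counter_one, counter_two, word=None):
    """Return the result for one mode (mode names here are hypothetical)."""
    if mode == "common":
        return sorted(top_words(counter_one) & top_words(counter_two))
    if mode == "unique":
        return sorted(top_words(counter_one) - top_words(counter_two))
    if mode == "count":
        if word is None:
            raise ValueError("count mode needs a word_to_count argument")
        return counter_one[word], counter_two[word]
    raise ValueError(f"unknown mode: {mode}")


def main(args):
    """Parse [mode, file_one, file_two, optional word] and print the result."""
    if len(args) < 3:
        print("usage: python3 your_file.py mode lyricsone.txt lyricstwo.txt [word_to_count]")
        return
    mode, path_one, path_two = args[:3]
    word = args[3] if len(args) > 3 else None
    print(run_mode(mode, load_counter(path_one), load_counter(path_two), word))


if __name__ == "__main__":
    # sys.argv[0] is the script name, so the real arguments start at index 1
    main(sys.argv[1:])
```

With made-up file names, python3 FirstLast_HW2-7.py count adele.txt drake.txt love would print a tuple holding the count of “love” in each file. The standard-library argparse module is an alternative if you want automatic help messages.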

Make sure that all of your functions include docstrings. Feel free to include additional comments throughout your code.


Extra Credit

As with every homework, you are welcome to add to the functionality of your program. We will be awarding points based on complexity and effort.


Final Task

At the top of your file, add three separate comment lines stating the number of hours you worked on this assignment, any collaborators you worked with, and whether you went to TA hours for this assignment.


Handin

Once you're done, email your Python script to cs0030handin@gmail.com by midnight, 3/20.

Make sure your submission has your name in the filename: FirstLast_HW2-7.py. Replace “FirstLast” with your first and last name, or we will take off points. Make sure every task has been completed.