Homework 2-8

Readability Algorithms

Introduction


Master Shifu has a lot of scrolls containing legendary stories about Kung Fu history. He wants to assign some of these scrolls to his students to read, but is worried that some of these texts may be too hard to decipher. Fortunately, Master Shifu was able to convert these ancient scrolls into .txt files on his computer and was able to 'clean' the data. Your task is to write an algorithm that scans these .txt files and assigns a grade-level readability score to the file.


During the presidential election, one criticism of Hillary Clinton's campaign was that she did not appeal effectively to less educated voters. One reason given for Donald Trump's success was that he supposedly spoke at a less complicated level. In this assignment you will try to determine whether this was the case.

You will implement two readability algorithms: the Automated Readability Index and the Coleman-Liau Readability Index. Your program should report both readability scores (grade level or age level, to one decimal place) for each of the two .txt files we provide.

In your submission, attach a few sentences about your implementation and your results. We highly recommend that you complete the in-class activity before starting your implementations.

Note: The two algorithms will generally produce different scores for the same text.

Text Files:
  1. Clinton's responses
  2. Trump's responses

Program Specifications

Both algorithms should be included in one file. The algorithm to run and the text file to test should both be specified on the command line.

From the command line:
Input: python3 file.py algorithm text-file
Output: Index
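
One possible way to read the two command-line arguments is sketched below. This is only an illustration: the algorithm names "ari" and "coleman-liau" and the helper names parse_args and format_index are assumptions, since the handout does not fix the exact strings your program must accept.

```python
# Hypothetical algorithm names; the handout does not specify the exact strings.
ALGORITHMS = ("ari", "coleman-liau")


def parse_args(argv):
    """Return (algorithm, path) from an argv-style list, or raise ValueError."""
    if len(argv) != 3:
        raise ValueError("usage: python3 file.py algorithm text-file")
    algorithm, path = argv[1].lower(), argv[2]
    if algorithm not in ALGORITHMS:
        raise ValueError("unknown algorithm: " + argv[1])
    return algorithm, path


def format_index(value):
    """Format a score to one decimal place, as the assignment asks."""
    return f"{value:.1f}"
```

In a full script you would import sys, call parse_args(sys.argv), read the file at the returned path, and print format_index of the score from the chosen algorithm.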

Algorithm 1: Automated Readability Index

The automated readability index computes the number of characters, the number of words, and the number of sentences in a text. It uses these three variables to compute an index. This index corresponds to a grade level.

Hint: You can use some of the code you have written from hw2-6 to count sentences and words!

Formula:
ARI = 4.71 x (characters/words) + 0.5 x (words/sentences) - 21.43
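
A minimal sketch of the formula above follows. The counting rules are assumptions, not requirements: "characters" is taken to mean letters and digits, words are split on whitespace, and a sentence is any run of '.', '!' or '?'. Your counting code from hw2-6 may reasonably differ, which will change the score slightly.

```python
import re


def count_stats(text):
    """Return (characters, words, sentences) under simple assumptions."""
    words = text.split()
    # Assumption: "characters" means letters and digits only.
    characters = sum(ch.isalnum() for ch in text)
    # Assumption: a sentence ends at a run of '.', '!' or '?'.
    sentences = len(re.findall(r"[.!?]+", text))
    # Treat text with no end punctuation as a single sentence.
    return characters, len(words), max(sentences, 1)


def ari(text):
    """Automated Readability Index, using the formula from the handout."""
    characters, words, sentences = count_stats(text)
    if words == 0:
        raise ValueError("text contains no words")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
```

For the short text "The cat sat. The dog ran!" this counts 18 characters, 6 words, and 2 sentences. Note that the simple sentence rule also counts abbreviations such as "Mr." as sentence endings.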

Algorithm 2: Coleman-Liau Readability Index

The Coleman-Liau Readability Index first breaks the text into 100-word chunks. For each chunk, it computes the number of letters per 100 words and the number of sentences per 100 words. These per-chunk values are then averaged across all chunks.

L = the average number of letters per 100 words
S = the average number of sentences per 100 words

Formula:
Index = 0.0588 x L - 0.296 x S - 15.8
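
The chunked computation of L and S can be sketched as follows. Several choices here are assumptions rather than part of the specification: words are split on whitespace, only alphabetic characters count as letters, a word ending in '.', '!' or '?' is taken to close a sentence, and a final chunk shorter than 100 words is scaled up to a per-100-words rate before averaging.

```python
def coleman_liau(text, chunk_size=100):
    """Coleman-Liau index computed over fixed-size word chunks."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    letters_per_100 = []
    sentences_per_100 = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        letters = sum(ch.isalpha() for word in chunk for ch in word)
        # Assumption: a word ending in '.', '!' or '?' (ignoring a
        # trailing quote or parenthesis) closes a sentence.
        sentences = sum(
            word.rstrip("\"')").endswith((".", "!", "?")) for word in chunk
        )
        # Assumption: scale a short final chunk to a per-100-words rate.
        scale = 100 / len(chunk)
        letters_per_100.append(letters * scale)
        sentences_per_100.append(sentences * scale)
    L = sum(letters_per_100) / len(letters_per_100)
    S = sum(sentences_per_100) / len(sentences_per_100)
    return 0.0588 * L - 0.296 * S - 15.8
```

Because every chunk is weighted equally in the average, a very short final chunk can shift the result noticeably. Dropping it or weighting chunks by length are reasonable alternatives, and either choice is worth mentioning in your write-up.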

Conclusion

Include this section in a Google Doc and share it with us when you turn in the assignment.

  1. Run both algorithms on both of the .txt files we have provided, and create a table of your readability scores.
  2. Write a few sentences explaining your results.
  3. Include the number of hours you spent on this assignment, the names of anyone you collaborated with, and whether or not you went to TA hours.

Extra Credit

As with any homework assignment, you have the opportunity to expand on the assignment above and add more. For example, try running these algorithms on other texts and see whether you can evaluate the results.

Handin

Once you're done, email your Python script to cs0030handin@gmail.com by midnight, 3/24, and share the Google Doc with the handin email.

Make sure your submission has your name in the filename: FirstLast_HW2-8.py. Replace “FirstLast” with your first and last name; otherwise we will take off points. Make sure every task has been completed.