Activity 2-11

Syllable Counter v2.0

In this lab, you will once again be creating a syllable counter. As a reminder, the syllable counter works by counting vowels according to the following rules:

  1. A vowel (a, e, i, o, u, y) counts as one syllable.
  2. However, a sequence of adjacent vowels counts as only one syllable.
  3. If the word ends with the vowel e, that final e does not count as a separate syllable.
  4. If the algorithm yields a total syllable count of zero, change the syllable count to one.
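
As a refresher, the four rules can be sketched as a plain loop. This is only one possible reading of the rules, not necessarily the code you wrote for ACT2-9. It assumes lowercase alphabetical words and assumes that rule 3 only applies when the final e stands alone (so the "ee" in "tree" still counts once). The names VOWELS and prev_was_vowel are illustrative.

```python
VOWELS = "aeiouy"

def count_syllables(word):
    """Count syllables by scanning for groups of adjacent vowels."""
    word = word.lower()
    count = 0
    prev_was_vowel = False
    for ch in word:
        is_vowel = ch in VOWELS
        # Rules 1 and 2: count only the first vowel of each vowel group.
        if is_vowel and not prev_was_vowel:
            count += 1
        prev_was_vowel = is_vowel
    # Rule 3 (assumed reading): a final e that is not part of a longer
    # vowel group does not count as its own syllable.
    if word.endswith("e") and (len(word) == 1 or word[-2] not in VOWELS):
        count -= 1
    # Rule 4: every word has at least one syllable.
    return max(count, 1)
```

With this reading, "hello" counts as 2 syllables, while "the" and "tree" each count as 1.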

This time, you will implement the counter using regex.


Task One

We recommend that you use RegExr to help write and test your regex. You can use this data for testing by copying it into the Text area of the site. Each part of the word that matches the regex will be highlighted so you can check that your regex is matching the correct number of syllables for each word.

You should first copy your code from ACT2-9, including your word_syllable_counts dictionary and the for loop where you check your syllable count for each word against the true syllable count. Replace the call to the count_syllables(word) function with count_syllables_regex(word), which is the new function you will be writing.


Task Two

At the top of your code, import the re module.

You will design a regular expression to count the number of syllables. To fulfill the requirements of the rules, you will break the problem into three separate cases.

Your regex will end up being structured like this: (expression 1)|(expression 2)|(expression 3) for the three cases. The vertical bar | means "or" in regex. We want to count part of a word as a syllable if it matches expression 1 or expression 2 or expression 3.

As you construct your regex, you can use the re.findall() function to find all the parts of the word that match the regex. This returns a list of the parts of the word that match your regex. Then you can simply return the length of that list to get the number of syllables.
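
To see how the "or" bar and re.findall() fit together, here is a small sketch that uses a toy pattern (the letters "ab" or "cd"), not the syllable regex itself. The name count_matches is made up for this example. Note that when the pattern contains more than one parenthesized group, re.findall() returns a list of tuples rather than a list of strings, but the length of the list is still the number of matches.

```python
import re

# Toy pattern with two alternatives: match "ab" or match "cd".
TOY_PATTERN = r"(ab)|(cd)"

def count_matches(word):
    """Return how many non-overlapping matches of TOY_PATTERN are in word."""
    matches = re.findall(TOY_PATTERN, word)
    return len(matches)

# With two groups, each match is a tuple with one slot per group.
grouped = re.findall(TOY_PATTERN, "abxcd")   # [('ab', ''), ('', 'cd')]

# Without parentheses, each match is a plain string.
plain = re.findall(r"ab|cd", "abxcd")        # ['ab', 'cd']

print(count_matches("abxcd"))
```

Either way of writing the pattern gives the same count, so you can keep or drop the parentheses as you prefer.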


Task Three

You will be constructing the first expression in the regex here.

The first case occurs when a vowel (excluding e) is present. This vowel may be followed by zero or more adjacent vowels (including e).

Matching the whole run of vowels at once ensures that a group of consecutive vowels is counted as one syllable.

Hint: you can match one of multiple possible letters by including them in square brackets. If you write the regex [abc], it will match any single letter that is "a" or "b" or "c". You can match a letter zero or more times by using an asterisk. So if you did something like: [abc][abc]*, this would match the strings "a", "aa", "aaaaa", "bc", and "baaacba", for example.
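
If you want to try the hint in Python rather than on RegExr, a short sketch like the following shows what [abc][abc]* picks out. It uses the toy letters from the hint, not the vowels, and the variable names are only for illustration.

```python
import re

STAR_PATTERN = r"[abc][abc]*"

# One letter from the set is enough, because * allows zero repeats.
single = re.findall(STAR_PATTERN, "a")            # ['a']

# A whole run of letters from the set is returned as one match.
run = re.findall(STAR_PATTERN, "xbaaacbax")       # ['baaacba']

# Runs separated by other characters are returned as separate matches.
separate = re.findall(STAR_PATTERN, "a-bc-d")     # ['a', 'bc']

print(single, run, separate)
```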

Now run your code. Your output should be like this:


Task Four

You will be constructing the second expression in the regex here.

The second case occurs when the vowel e is present, followed by one or more adjacent vowels (including e). This ensures that the e is not the last letter in a word.

Hint: this will be similar to what you did for the first expression. You can match a letter one or more times with the plus symbol. So if you did something like: [abc][abc]+, this would match the strings "aa", "aaaaa", "bc", and "baaacba", but not "a".
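
The difference between the asterisk and the plus can be checked with a short sketch. Again this uses the toy letters from the hint, and the variable names are only for illustration.

```python
import re

PLUS_PATTERN = r"[abc][abc]+"

# A single letter is not enough, because + needs at least one repeat.
too_short = re.findall(PLUS_PATTERN, "a")                 # []

# Two or more letters from the set form one match.
pair = re.findall(PLUS_PATTERN, "aa")                     # ['aa']

# The lone "a" is skipped, while the longer runs are matched.
mixed = re.findall(PLUS_PATTERN, "a-bc-baaacba")          # ['bc', 'baaacba']

print(too_short, pair, mixed)
```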

Your regex should now look like this: (expression 1)|(expression 2)

Now run your code. Your output should be the same as in Task Three.


Task Five

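The third case occurs when the vowel e is followed by exactly one alphabetical letter. For simplicity, you can use a-z instead of listing out all the non-vowel letters. This is safe as long as the third expression comes last: the alternatives are tried from left to right, so an e followed by a vowel is already matched by the second expression. Again, this ensures that e is not the last letter in a word.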

Hint: this will be similar to what you did for the other expressions.
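
To see what a range like a-z does inside square brackets, you can try the following sketch on its own. It assumes lowercase words, and the variable names are only for illustration.

```python
import re

E_LETTER_PATTERN = r"e[a-z]"

# An e in the middle of a word is matched together with the next letter.
middle = re.findall(E_LETTER_PATTERN, "hello")    # ['el']

# An e at the very end has no letter after it, so nothing matches.
final = re.findall(E_LETTER_PATTERN, "the")       # []

# Each e followed by a letter gives its own match.
twice = re.findall(E_LETTER_PATTERN, "regex")     # ['eg', 'ex']

# Because a-z includes vowels, "ee" matches here as well.
double = re.findall(E_LETTER_PATTERN, "tree")     # ['ee']

print(middle, final, twice, double)
```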

Now combine all three expressions and run your code. Your output should be like this:


Task Six

Remember to account for monosyllabic words where the regex indicates that the number of syllables is zero. Change the syllable count to be one in that case.

Now run your program to make sure that there is no output and that your code is counting the number of syllables correctly.
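
For reference, here is a minimal sketch of how the pieces from the tasks above could fit together. It is one possible reading of the three cases, not the only correct answer. It assumes lowercase alphabetical words, leaves out the parentheses around each expression, and precompiles the pattern with re.compile(); the name SYLLABLE_PATTERN is made up for this example.

```python
import re

# Case 1: a vowel other than e, then zero or more vowels.
# Case 2: an e, then one or more vowels.
# Case 3: an e, then exactly one letter.
SYLLABLE_PATTERN = re.compile(r"[aiouy][aeiouy]*|e[aeiouy]+|e[a-z]")

def count_syllables_regex(word):
    """Count syllables as the number of regex matches, with a minimum of one."""
    matches = SYLLABLE_PATTERN.findall(word.lower())
    count = len(matches)
    # Words such as "the" produce no matches, so fall back to one syllable.
    if count == 0:
        count = 1
    return count
```

With this sketch, "hello" counts as 2 syllables and "the" counts as 1, because its only vowel is a final e and the fallback applies.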


Task Seven

We will be comparing the time it takes to run the regex version of your syllable counter versus the iterative version from ACT2-9.

At the top of your code, import the time module.

In your main function, add in this code:

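# Record the time just before the loop starts
start = time.time()
# Repeat the whole test 10,000 times so the total is large enough to measure
for i in range(10000):
    for word, true_count in word_syllable_counts.items():
        my_count = count_syllables_regex(word)
# Record the time just after the loop ends and print the difference in seconds
end = time.time()
print("Time {}".format(end - start))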

Now copy that code again, changing count_syllables_regex(word) to count_syllables(word). Here, we are running the for loop 10,000 times for both the regex syllable counter from this lab and the iterative counter from ACT2-9, keeping track of the amount of time that it takes.

Run this code and see which version of the syllable counter is faster.
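
If you would rather not copy the timing loop twice, one option is to wrap it in a helper that takes the counter function as a parameter. The sketch below is only a suggestion: the helper name time_counter, the sample dictionary, and the use of len as a stand-in counter are all made up so that the example runs on its own. It uses time.perf_counter(), which is a clock in the time module intended for measuring short durations.

```python
import time

def time_counter(counter, word_counts, repeats=10000):
    """Return the seconds taken to call counter on every word, repeats times."""
    start = time.perf_counter()
    for _ in range(repeats):
        for word in word_counts:
            counter(word)
    return time.perf_counter() - start

# Hypothetical stand-in data and counter, used only to show the call.
sample_counts = {"hello": 2, "the": 1, "tree": 1}
elapsed = time_counter(len, sample_counts, repeats=1000)
print("Time {:.4f} seconds".format(elapsed))
```

In your own program you would pass count_syllables_regex and count_syllables in place of len, along with your word_syllable_counts dictionary.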


Once you're done, please check off your lab with a TA or share your file with cs0030handin@gmail.com by midnight, 4/6.