Activity 3-1

March 31, 2015

Task 1: String Matching & Regular Expressions

The website http://regexpal.com/ lets you test regular expressions on sample text, and we will use it in the steps below. Enter regular expressions in the top box and sample text in the bottom box.

  1. Copy the following text into the bottom box:

    Ahab was born in 1802. Starbuck, in '98 --- 1998, that is. And (as everyone knows), Ishmael was born in 2036 but died in 1879 (making him [1879-2036=]-57 years old at the time of his death, or 48, depending on how you count). And the white whale was immortal.

    Am I telling this story right? Anyhow, each of them had access to a time machine, a harpoon, and 20+ sheep for barter. Including the whale. As was common in his era, Starbuck had his own name tattooed on the back of his arm: "*$", it read, "Star-buck", a kind of pun.

    Our story begins in interstellar space, in the year 2000 BC...

  2. Enter each of the following expressions into the top box, one at a time. What do you think brackets do?
    • n
    • fn
    • [fn]
    • [aeiou]
  3. Enter each of the following expressions into the top box, one at a time. What do you think \w does? What do you think the + sign does?
    • f
    • f\w
    • f\w\w
    • f\w+
  4. Enter each of the following expressions into the top box, one at a time. What do you think \s does?
    • \s
    • \sm
  5. Find an expression that matches all the words that start with b, f, or m.
  6. Remember that \n is a special character that denotes a line break. Find an expression that matches all the words that appear at the end of a line.
  7. Add some numbers to the bottom box. Test what \d does.
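
To repeat these experiments outside the browser, the sketch below uses Python's built-in re module. The sample sentence and the try_patterns helper are made up for illustration and are not part of the activity files. For plain English text, the simple patterns on this page should behave the same way in Python as they do on the website. The patterns are written as raw strings (r'...') so that Python passes backslashes such as \w through to the re module unchanged.

```python
import re

# Made-up sample text (not the worksheet text), with one line break in it.
SAMPLE = "Ahab found five fish.\nThe whale saw 20 men."

def try_patterns(patterns, text):
    """Return a dict mapping each pattern to the list of its matches in text."""
    return {pattern: re.findall(pattern, text) for pattern in patterns}

# Try a few of the expressions from the steps above and print what each matches.
results = try_patterns([r'fn', r'[fn]', r'f\w+', r'\d'], SAMPLE)
for pattern, matches in results.items():
    print(pattern, '->', matches)
```

Compare the printed lists with what the website highlights for the same expressions and text.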

Task 2: More Examples of Regular Expressions

Download ACT3-1.py and poem.txt into the same directory, then open ACT3-1.py with IDLE. (Do you remember why poem.txt needs to be in the same directory as the Python program?) Press F5 to run the program.
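
As a reminder of how file lookup works, here is a minimal sketch. It assumes that the program opens poem.txt by its bare name. The contents of ACT3-1.py are not shown on this sheet, so read_poem below is a hypothetical stand-in for readShel, not the actual function. A bare filename is looked up in the current working directory, which is usually the folder of the program you are running.

```python
import os

def read_poem(filename='poem.txt', folder='.'):
    """Hypothetical stand-in for readShel(): return the file's text as one string."""
    path = os.path.join(folder, filename)
    with open(path) as poem_file:
        return poem_file.read()

# A bare name like 'poem.txt' is looked up in the current working directory.
print('Looking for poem.txt in:', os.getcwd())
print('Found it?', os.path.exists('poem.txt'))
```

If the second line prints False, the file is somewhere else and calling read_poem() would raise a FileNotFoundError.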

  1. Run the following statements and figure out what each line does.
    myStr = readShel()
    
    printRegex('g\w+', myStr)
    printRegex('\sg\w+', myStr)
    printRegex('\s[gG]\w+', myStr)
    
    printRegex('out', myStr)
    printRegex('\wout', myStr)
    printRegex('\w+out', myStr)
    printRegex('[\s\w]out', myStr)
    
  2. Design regular expressions that print the following. Don't worry about capitalization; matching only the lowercase instances is fine.
    1. Print all occurrences of the substring it.
    2. Print all occurrences of the word it. A word should be surrounded by whitespace. There are five instances of the word it.
    3. Print all words that contain at least one letter, followed by it, followed by at least one other letter. There are four such words.
    4. Print all words that end in ings. Hint: There are two such words that are followed by punctuation.
    5. Print all phrases surrounded by double quotes (all occurrences of speech). There is only one phrase.
    6. Print all contractions (words with single quotes). Remember that single quotes are “special”: inside a single-quoted Python string, a single quote must be written as \'. There are two such words in the poem (She'd and I'll), but write the expression to return any contraction.
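
If you are curious how a function like printRegex might work, here is a minimal sketch built on re.findall. It is an assumption about the general idea, not the code in ACT3-1.py: the name print_regex and the sample sentence are invented for illustration. The patterns containing \w or \s are written as raw strings (r'...'), which keeps the backslash intact, while the last call shows the \' escape for a single quote inside a single-quoted string.

```python
import re

def print_regex(pattern, text):
    """Hypothetical stand-in for printRegex: print and return every match."""
    matches = re.findall(pattern, text)
    for match in matches:
        print(repr(match))  # repr() makes leading whitespace visible
    return matches

# Made-up sentence, not a line from poem.txt.
line = "I'd go out without a doubt"

print_regex('out', line)      # every occurrence of the substring out
print_regex(r'\wout', line)   # out with one word character in front of it
print_regex(r'\sout', line)   # out with one whitespace character in front of it
print_regex('\'', line)       # \' puts a single quote inside a single-quoted string
```

Try changing the sentence and predicting the output before you run each call.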