Homework 3-2

Due April 7, 2015, 9:00 am

Reminders

For the following problems, you may discuss the concepts that will help you solve them with classmates and course staff. You are never allowed to copy down your classmates' answers, as that is a violation of the collaboration policy.

Advice

In this homework, you'll be getting more practice with regular expressions.

Task 1

Download HW3-2.py and casey.txt to the same directory. Open HW3-2.py and make sure you understand the code we've provided. Run the file. In this task, you'll design regular expressions that match certain patterns.

Look at the following example of a regular expression:

longRegExExample = r'Casey\s+\w*bat'

The r that appears before the string pattern makes it a "raw string". It tells Python to leave backslashes alone instead of treating sequences such as \n or \b as its own special string characters, so the regular expression (which has its own set of special characters) reaches the regex engine unchanged. We'll use this r with regular expressions from now on as a safety measure.
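A quick way to see why the r matters, using \b as an illustration (\b is not discussed in this homework: inside a regular expression it marks a word boundary, but in an ordinary Python string it is a single backspace character). This is a small interpreter experiment, not part of the provided file:

```python
import re

# Without r, Python converts '\b' into one backspace character
# before the regex engine ever sees it; with r, the two characters \ and b survive.
print(len('\b'), len(r'\b'))  # 1 2

# The non-raw pattern searches for literal backspace characters, so it finds nothing.
print(re.search('\bbat\b', 'wombat bat'))  # None

# The raw pattern uses \b as a word boundary: it skips the "bat" inside "wombat"
# and matches the standalone word.
print(re.search(r'\bbat\b', 'wombat bat').group())  # bat
```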

Look at the pattern (inside the single quotes) and figure out what kinds of strings it can match. Remember that * usually means "zero or more of the preceding thing", so to match * as an ordinary character (rather than a special character), put a \ in front of it. Other special regex characters include: . (wildcard: any character except a newline), + (one or more of the preceding thing), | ("or"), \s (any whitespace character, such as a space, tab, or newline), \w (a word character: a letter, digit, or underscore), and $ (end of the string or line). Refer to the links in the "Advice" section of this page for more information about these characters and others.
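One way to check your reading of longRegExExample is to test it against a few made-up strings in the interpreter. The sample strings below are invented for illustration; they are not taken from casey.txt:

```python
import re

longRegExExample = r'Casey\s+\w*bat'

# Hypothetical test strings.
samples = ['Casey  bat', 'Casey\nwombat', 'Casey acrobat', 'Caseybat', 'Casey at the bat']
for s in samples:
    # re.search returns a match object if the pattern occurs anywhere, else None.
    print(repr(s), re.search(longRegExExample, s) is not None)
# Prints True, True, True, False, False (in that order).

# Escaping: a bare * is a quantifier, but \* matches a literal asterisk.
print(re.findall(r'\*\w+\*', 'a *bold* word'))  # ['*bold*']
```

Two results are worth a second look. 'Caseybat' fails because \s+ needs at least one whitespace character. 'Casey at the bat' also fails: \w*bat must be a single run of word characters ending in "bat" immediately after the whitespace, and "at the bat" contains spaces.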

  1. In the interpreter, design regular expressions for printRegexInCasey() that print the following:
    1. Print all occurrences of the name Casey
    2. Print all words that end with "at"
    3. Print all the words that are followed by a semicolon (;)
    4. Print all words that contain ll, tt, or dd somewhere in the word. Hint: This would be a good time to use the | character!
    5. Print all the words that are followed by a period. Hint: Remember that periods are special characters!
    6. Print all occurrences of speech that contain at least one exclamation mark (phrases surrounded by double quotes that contain an exclamation mark)
  2. Record your answers in the homework file and rename the file FirstLast_HW3-2.py. Do not write your calls as comments; each call should run when we run your file.
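To experiment with patterns outside of printRegexInCasey(), a small helper such as the hypothetical find_matches() below (it is not part of the provided HW3-2.py) collects every match of a pattern in a string. It is a sketch that uses made-up sample text instead of casey.txt, and its patterns deliberately differ from the ones this task asks for:

```python
import re

def find_matches(pattern, text):
    """Hypothetical helper: return every non-overlapping match of pattern in text."""
    # finditer yields match objects; group(0) is the whole matched substring,
    # even when the pattern contains parenthesized groups.
    return [m.group(0) for m in re.finditer(pattern, text)]

sample = 'The sun was shining, the crowd was roaring; "Hooray!" they cheered.'

print(find_matches(r'\w+ing', sample))         # ['shining', 'roaring']
print(find_matches(r'\w*(oo|ee)\w*', sample))  # ['Hooray', 'cheered']
print(find_matches(r'\w+\.', sample))          # ['cheered.']  (\. is a literal period)
```

One pitfall to watch for: when a pattern contains a group, re.findall returns only the group contents, so re.findall(r'\w*(oo|ee)\w*', sample) gives ['oo', 'ee'] rather than whole words. Using finditer with group(0), as above, or a non-capturing group such as (?:oo|ee) avoids this.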

Task 2

Remember how you learned to import data from a website into a Google spreadsheet? Well, you're about to learn how to do the same thing in Python!

  1. The steps for reading data from a website are very similar to the steps for reading data from a file (i.e., you open it, read it, and then close it). Look at importURL(). This function imports the data from an HTML web page and returns it as a string. Go to this website. We're going to import its data. Run the HW3-2.py file. In the interpreter, call importURL() with the website's URL (provided as a string) as the input. What do you see? This is the page's raw HTML code. Experiment with importURL() using a couple of other URLs, and include a couple of examples of your calls in the HW file (as comments).

    HTML pages generally have the same structure: all the content is surrounded by tags. There are many different tags, but nearly all HTML pages have two tags in common: <html> (which says that the page contains HTML code) and <body> (which holds almost all of the significant content of the page). We'll explore this further soon.
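The provided importURL() may be written differently (for example, for an older version of Python), so the following is only a minimal Python 3 sketch of the same open, read, close idea, using the standard library's urllib.request.urlopen. The name import_url and the assumption that pages are UTF-8 encoded are illustrative choices, not part of the homework file:

```python
from urllib.request import urlopen

def import_url(url):
    """Hypothetical stand-in for importURL(): return the page at url as a string."""
    # Same steps as reading a file: open it, read it, then close it.
    response = urlopen(url)
    try:
        raw_bytes = response.read()
    finally:
        response.close()
    # Assumption: the page is UTF-8; undecodable bytes are replaced instead of raising.
    return raw_bytes.decode('utf-8', errors='replace')
```

For example, html = import_url('https://www.example.com') followed by print(html[:300]) should show the beginning of that page's HTML, including its opening tags.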

Task 3

You can pass an extra parameter to re.search and re.match that makes the function behave in a case-insensitive way. Compare the results of re.match(r'A\w+a', 'abracadabra') and re.match(r'A\w+a', 'abracadabra', re.IGNORECASE), and note that the pattern matches only the second time.
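The comparison above can be run directly in the interpreter, along with the same flag on re.search. Keep in mind that re.match only looks for a match at the start of the string, while re.search scans the whole string; both accept re.IGNORECASE as a third argument:

```python
import re

# Case-sensitive: 'A' does not match the lowercase 'a' at the start.
print(re.match(r'A\w+a', 'abracadabra'))                         # None
# Case-insensitive: the whole word matches.
print(re.match(r'A\w+a', 'abracadabra', re.IGNORECASE).group())  # abracadabra

# re.search finds the match anywhere in the string...
print(re.search(r'CASEY', 'Mighty Casey', re.IGNORECASE).group())  # Casey
# ...but re.match fails because the string does not start with it.
print(re.match(r'CASEY', 'Mighty Casey', re.IGNORECASE))           # None
```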

Write a function that, given two strings and a text, returns the text with everything up to the occurrence of the first string removed, and everything after the occurrence of the second string removed. The search should be case-insensitive (use re.IGNORECASE, described above).
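One possible sketch of this function, under assumptions the task does not pin down: the name trim_to_markers() is hypothetical, both marker strings are kept in the result, the second string is looked for only after the first, and None is returned when either one is missing. re.escape() is used so that the two strings are matched literally even if they contain regex special characters such as . or *:

```python
import re

def trim_to_markers(start, end, text):
    """Hypothetical sketch: return text from the first case-insensitive
    occurrence of start through the first occurrence of end after it,
    keeping both markers. Returns None if either marker is missing."""
    # re.escape makes the markers match literally instead of as regex syntax.
    # .*? is non-greedy, so it stops at the first end marker after start.
    pattern = re.escape(start) + r'.*?' + re.escape(end)
    # IGNORECASE makes the search case-insensitive; DOTALL lets . match
    # newlines, since web pages usually span many lines.
    match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    return match.group(0)

page = '<HTML>\n<Body>\nCasey at the bat\n</BODY>\n</html>'
print(trim_to_markers('<body>', '</body>', page))
```

If the markers themselves should be dropped instead, one option is to put the middle part in a group, re.escape(start) + r'(.*?)' + re.escape(end), and return match.group(1).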

Handin

Rename your program FirstLast_HW3-2.py and share it with cs0931handin2015@gmail.com.

Note: Before you turn in your Python files, make sure they run properly (save your Python file, then select Run > Run Module or press F5 on your keyboard)! If nothing appears in the Shell, don't worry, as long as no red error messages appear. If a file doesn't run (i.e., if red error messages start appearing in the Shell), points will be taken off!