In this homework, you'll be getting more practice with regular expressions.
Download HW3-1 and casey.txt to the same directory. Open HW3-1.py and make sure you understand the code we've provided. Run the file. In the following task, you'll design regular expressions that match certain patterns.
Look at the following example of a regular expression:
longRegExExample = r'Casey\s+\w*bat'
The r that appears before the string pattern marks it as a "raw string". It tells Python not to interpret backslashes as Python's own escape sequences (such as \n for a newline), so the regular expression, which has its own set of special characters, reaches the regex engine unchanged. From now on, we'll put this r in front of every regular expression as a safety measure.
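One way to check your reading of the example pattern is to run it against a few strings. This is a minimal sketch: the test strings are made up (not taken from casey.txt), and find_match is a hypothetical helper, not part of the provided HW3-1 code. The last line shows what the r prefix changes for a backslash sequence such as \b.

```python
import re

# The example pattern: "Casey", one or more whitespace characters,
# zero or more word characters (letters, digits, underscore), then "bat".
longRegExExample = r'Casey\s+\w*bat'


def find_match(pattern, text):
    """Hypothetical helper: return the first matching substring, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


# Made-up test strings, not taken from casey.txt.
for s in ['Casey bat', 'Casey   combat', 'Caseybat', 'Casey at bat']:
    print(repr(s), '->', repr(find_match(longRegExExample, s)))

# The r prefix: in a normal string '\b' is one backspace character,
# while r'\b' keeps both characters (backslash and b), which the regex
# engine reads as a word boundary.
print(len('\b'), len(r'\b'))  # 1 2
```

Note that 'Casey at bat' does not match: \w* cannot cross the space between "at" and "bat". 'Caseybat' fails because \s+ needs at least one whitespace character.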
Look at the pattern (surrounded by single quotes) and figure out what kind of strings it can match. Remember that * usually means "zero or more of the preceding thing", so to treat * as an ordinary character rather than a special one, put a \ in front of it (\*). Other special regex characters include . (wildcard: any single character except a newline), + (one or more of the preceding thing), | ("or"), \s (any whitespace character, such as a space or a tab), and $ (end of the string or line). Refer to the links in the "Advice" section of this page for more information about these characters and others.
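A quick way to get a feel for the special characters listed above is to try each one on a tiny string. The sketch below calls re.search through a small hypothetical helper named matches; all example strings are made up for illustration.

```python
import re


def matches(pattern, text):
    """Hypothetical helper: True if pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None


# .   any single character except a newline
print(matches(r'c.t', 'cat'), matches(r'c.t', 'c\nt'))      # True False
# +   one or more of the preceding thing
print(matches(r'ba+d', 'baaad'), matches(r'ba+d', 'bd'))    # True False
# *   zero or more of the preceding thing; \* is a literal asterisk
print(matches(r'ba*d', 'bd'), matches(r'2\*3', '2*3'))      # True True
# |   "or": either side may match
print(matches(r'cat|dog', 'hotdog'))                         # True
# \s  any whitespace character (space, tab, newline, ...)
print(matches(r'a\sb', 'a b'), matches(r'a\sb', 'a\tb'))    # True True
# $   end of the string
print(matches(r'bat$', 'Casey at bat'), matches(r'bat$', 'bats'))  # True False
```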
Casey
at
;
ll, tt, or dd somewhere in the word. Hint: This would be a good time to use the | character!
FirstLast_HW3-2.py. Do not write your calls as comments; they should each run when we run your file.
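The hint about the | character can be illustrated with a different pair of letters ("oo" or "ee"), leaving the exercise itself to you. This is a minimal sketch with a made-up word list and sentence. It also shows a common pitfall: | has the lowest precedence of any regex operator, so alternatives that sit inside a longer pattern need parentheses.

```python
import re

# Made-up word list; the pattern matches "oo" OR "ee" anywhere in a word.
words = ['moon', 'tree', 'cat', 'bookkeeper', 'open']
double_vowel = r'oo|ee'
hits = [w for w in words if re.search(double_vowel, w)]
print(hits)  # ['moon', 'tree', 'bookkeeper']

# Without parentheses, r'\w*oo|ee\w*' means "\w*oo" OR "ee\w*",
# so findall returns word fragments instead of whole words.
print(re.findall(r'\w*oo|ee\w*', 'moon tree'))  # ['moo', 'ee']

# Parentheses group the alternatives so the whole word is matched.
text = 'the moon over the tree'
whole_words = [m.group() for m in re.finditer(r'\w*(oo|ee)\w*', text)]
print(whole_words)  # ['moon', 'tree']
```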
Remember how you learned to import data from a website in Google Sheets? Now you're going to learn how to do the same thing in Python!
importURL(). This function imports data from a web page and outputs it as a string. Go to this website; we're going to import its data. Run the HW3-2 file. In the interpreter, call importURL() with the website's URL (provided as a string) as the input. What do you see? This is the page's raw HTML code. Experiment with importURL() using a couple of other URLs, and include a couple of examples of your calls in the HW file (as comments). Notice the tags <html> (which indicates that the page contains HTML code) and <body> (which holds nearly all of the page's significant content). We'll explore this further soon.

You can pass an extra parameter to re.search
and re.match that makes the function case-insensitive. Compare the results of re.match(r'A\w+a', 'abracadabra') and re.match(r'A\w+a', 'abracadabra', re.IGNORECASE), and note that the pattern matches only the second time.
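The comparison with re.IGNORECASE can be run directly. This sketch also applies the same flag to a made-up HTML string standing in for what importURL() might return; the regex for the <body> section is an assumption for illustration, and regular expressions are only a rough tool for real HTML.

```python
import re

# Without the flag, 'A' matches only an uppercase A, so this prints None.
print(re.match(r'A\w+a', 'abracadabra'))  # None

# With re.IGNORECASE, 'A' also matches 'a', so the whole word matches.
m = re.match(r'A\w+a', 'abracadabra', re.IGNORECASE)
print(m.group())  # abracadabra

# HTML tag names are case-insensitive, so the flag is also handy when
# scanning page source. This sample string is made up; a real page
# would come from a call such as importURL().
page = ('<HTML><head><title>Demo</title></head>\n'
        '<Body class="main">\n<p>Casey at the bat</p>\n</Body></HTML>')

# [^>]* skips any attributes inside the opening tag; re.DOTALL lets .
# match newlines so the body can span several lines. Flags combine with |.
body = re.search(r'<body[^>]*>(.*)</body>', page, re.IGNORECASE | re.DOTALL)
print(body.group(1).strip())
```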
Rename your program FirstLast_HW3-2.py and share it with .
Note: Before you turn in your Python files, make sure they run properly: save each file, then select Run > Run Module or press F5 on your keyboard. If nothing appears in the Shell, don't worry, as long as no red error messages appear. If a file doesn't run (that is, if red error text appears in the Shell), points will be taken off!