If you want to know whether a text contains a match for a particular regular expression, you can use re.search(regex, text). The re.search function takes two arguments: regex, a string representing a regular expression (e.g. "[a-z]" or "\w+"), and text, a string representing the text you wish to search (e.g. "The quick brown fox jumps over the lazy dog."). You can use re.search to find words or patterns like this:
    import re

    match = re.search("[0-9]", "This sentence contains zero digits.")
    if match:
        print("Found a digit!")
    else:
        print("No digits found.")  # the text contains no digits, so this is printed
As you can see in the example above, the result of re.search behaves like True in an if statement when the regular expression is found in the text, and like False otherwise. Open ACT3-2.py and use re.search to determine whether the following patterns can be found in exampleText:
There might be situations where you'll want to check whether a certain string matches a regular expression. For instance, the regular expression "d\w+" would match any word starting with "d". The re.match function works similarly to re.search, but with a slightly different meaning. The re.match function takes two arguments: pattern, a string representing a regular expression, and string, a string to check for adherence to the regular expression. The re.match function succeeds only if the beginning of the string matches the pattern. In other words, you are not looking for occurrences of a pattern anywhere inside a string; you are checking whether the string (or an initial prefix of the string) matches the pattern. If it does, the result will behave like True in an if statement; otherwise it will behave like False. Here's an example of re.match:
    import re

    # A raw string (r"...") keeps Python from interpreting the backslash itself
    match = re.match(r"d\w+", "potato donut")
    if match:
        print("'potato donut' starts with the letter 'd'!?!")
    else:
        print("'potato donut' does not start with 'd'!")  # correct answer
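To see how re.match differs from re.search, it can help to run both on the same string. The following is only a minimal sketch and not part of the course files; the variable names are made up for illustration.

```python
import re

text = "potato donut"
pattern = r"d\w+"

# re.search scans the whole string, so it finds "donut" in the middle.
search_result = re.search(pattern, text)

# re.match only tries at the very start, where the text begins with "p".
match_result = re.match(pattern, text)

print(search_result is not None)  # True
print(match_result is not None)   # False
```

The same pattern succeeds with one function and fails with the other, purely because of where each function is allowed to look.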
Try out re.match in the provided Python file. Check each string in myList to see if it rhymes with "ping pong." There are more detailed instructions in the Python file.
So far we've been treating the output of re.search and re.match as a boolean value, True or False. Actually, the output is a bit more useful than a boolean: when a match is found it is technically a MatchObject (named re.Match in the current Python 3 documentation), and when no match is found it is None. There are certain methods you can call on a MatchObject.
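If you're curious what these calls really return, a short sketch like the one below prints the raw results. The sample strings are invented for illustration.

```python
import re

found = re.search("[0-9]", "Room 7 is open.")
missing = re.search("[0-9]", "No numbers here.")

print(found)          # a match object describing the "7"
print(missing)        # None
print(bool(found))    # True
print(bool(missing))  # False
```

This is why using the result directly in an if statement works: a match object counts as true, and None counts as false.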
The method that will probably be most useful for this course is the group method. For group to be particularly useful, you'll need to organize your regular expression into "groups". This is done with parentheses, like this: "(d\w+)\s(d\w+)". That regular expression has two groups (each group must still match individually, as usual) separated by a whitespace character. Modify your code for Task 2 so that your regular expression has two groups (i.e. one group for each word). Now, print out each match's groups like this:
    import re

    # regex and string are the pattern and the text from your Task 2 code
    match = re.match(regex, string)
    if match:
        print('Whole match: ', match.group(0))
        print('First sub-match: ', match.group(1))
        print('Second sub-match: ', match.group(2))
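For a concrete picture of what group returns, here is a small self-contained sketch that uses the two-group regular expression from the text. The input "dizzy ducks" is an invented example, not one of the course strings.

```python
import re

# Two groups, one per word, separated by a single whitespace character.
match = re.match(r"(d\w+)\s(d\w+)", "dizzy ducks")
if match:
    whole = match.group(0)   # everything the pattern matched
    first = match.group(1)   # text matched by the first pair of parentheses
    second = match.group(2)  # text matched by the second pair of parentheses
    print(whole)   # dizzy ducks
    print(first)   # dizzy
    print(second)  # ducks
```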
What if you wanted to find all the matches in a particular string? The function re.finditer is exactly what you need. It takes the same arguments as re.search and re.match and returns an iterator over all the non-overlapping matches in the string, from left to right. Here's an example of how to use re.finditer to find all the words beginning with "qu":
    import re

    sentence = "I quickly ate my quiche."
    matches = re.finditer(r"qu\w+", sentence)
    for match in matches:
        print(sentence[match.start():match.end()])
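Notice the start and end methods we call on the resulting MatchObject to find the position of the match in the original string. start returns the index of the first character in the match, and end returns the index just past the last character, which is why the slice sentence[match.start():match.end()] yields exactly the matched text.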
You can also obtain the whole match by using match.group(0) for each match.
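To check how start, end, and group(0) relate to one another, you could try a sketch along these lines. The results list is a hypothetical helper used only to collect the values for printing.

```python
import re

sentence = "I quickly ate my quiche."
results = []
for match in re.finditer(r"qu\w+", sentence):
    # The slice from start() up to (but not including) end() equals group(0).
    assert sentence[match.start():match.end()] == match.group(0)
    results.append((match.start(), match.end(), match.group(0)))

print(results)  # [(2, 9, 'quickly'), (17, 23, 'quiche')]
```

Note that "quickly" ends at index 8, while end() reports 9, one past the last character.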
Now, try out re.finditer by following the instructions in the provided template script.
If you are interested in replacing parts of strings that match a certain regular expression, re.sub is your friend. It takes three arguments: the pattern to be matched, the replacement for all occurrences of that pattern, and the string you are operating on. It returns the modified string and leaves the original string unchanged.
    import re

    sentence = "I quickly ate my quiche."
    modifiedSentence = re.sub("qu", "kw", sentence)
    print(modifiedSentence)  # prints: I kwickly ate my kwiche.
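The pattern passed to re.sub doesn't have to be a fixed piece of text; it can be any regular expression. Here is a minimal sketch using a character class, with an invented phrase and replacement chosen purely for illustration.

```python
import re

phrase = "regular expression"
# Replace every lowercase vowel with an asterisk.
masked = re.sub(r"[aeiou]", "*", phrase)

print(masked)  # r*g*l*r *xpr*ss**n
print(phrase)  # regular expression (the original string is unchanged)
```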
Now, let's try out re.sub:
"Telephone (401) 555-1234" with the character "#".
"Another cat phrase" with underline characters.
To split strings using regular expressions, use re.split. It takes two arguments: the pattern used to split the string, and the string you are operating on. It returns a list of the pieces. Let's try out re.split:
Split "first,second,third" along the commas.
Split "fir1st sec2ond thi3rd" along digits or whitespace characters.

Download the file pip.py and design regular expressions that match the following:
\"
.
re.finditer by reusing/modifying the code in pip.py for each regular expression. Also, do the following task:
re.split. Note that paragraphs are separated by two newline characters.
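As a reminder of how re.split behaves when the separator is itself a pattern rather than a single fixed character, here is a minimal sketch. The sample string and separator are invented for illustration and are not taken from pip.py.

```python
import re

record = "alpha ; beta;gamma  ;delta"
# Split on a semicolon together with any whitespace around it.
fields = re.split(r"\s*;\s*", record)

print(fields)  # ['alpha', 'beta', 'gamma', 'delta']
```

Because the whitespace is part of the separator pattern, it is discarded along with the semicolons instead of being left attached to the pieces.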