Basics of Python – Regular Expression Module

Regular expression module

A regular expression (also called an RE, regex, or regex pattern) is a small, specialized pattern language, available in Python through the re module. With it, programmers specify rules for the set of possible strings they want to match; this set might contain English sentences, e-mail addresses, or anything else. REs can also be used to modify a string or to split it apart in various ways.
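As a rough illustration of the three uses mentioned above (matching, modifying, and splitting), the following sketch uses the standard re module. The sample strings and the simplified e-mail pattern are assumptions made up for this example; the pattern is not a complete e-mail validator.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Contact: alice@example.com, bob@example.org"

# Simplified e-mail pattern (an assumption; real addresses can be more complex).
email = r"[a-z]+@[a-z]+\.[a-z]+"

# Matching: find every substring that fits the pattern.
found = re.findall(email, text)

# Modifying: replace each match with a placeholder.
hidden = re.sub(email, "<hidden>", text)

# Splitting: break a string apart wherever the pattern matches.
parts = re.split(r",\s*", "red, green,blue")

print(found)   # ['alice@example.com', 'bob@example.org']
print(hidden)  # Contact: <hidden>, <hidden>
print(parts)   # ['red', 'green', 'blue']
```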

Meta characters

Most letters and characters will simply match themselves. For example, the regular expression test will match the string test exactly. There are exceptions to this rule; some characters are special “meta characters”, and do not match themselves. Instead, they signal that some out-of-the-ordinary thing should be matched, or they affect other portions of the RE by repeating them or changing their meaning. Some of the meta characters are discussed below:

| Meta character | Description | Example |
|---|---|---|
| [ ] | Used to match a set of characters. | [time] matches any one of the characters t, i, m, or e. [a-z] matches any one lowercase letter from a to z. |
| ^ (inside [ ]) | Used as the first character of a set to complement that set. | [^time] matches any one character other than t, i, m, or e. |
| $ | Used to match at the end of the string only. | time$ matches time in ontime, but does not match time in timetable. |
| * | Used to specify that the previous character can be matched zero or more times. | tim*e matches strings like tie, time, timme, and so on. |
| + | Used to specify that the previous character can be matched one or more times. | tim+e matches strings like time, timme, timmme, and so on. |
| ? | Used to specify that the previous character can be matched either once or zero times. | tim?e matches only tie or time. |
| { } | The curly brackets accept two integer values. The first value specifies the minimum number of occurrences of the previous character and the second value specifies the maximum number of occurrences. | tim{1,4}e matches only time, timme, timmme, or timmmme. |
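The meta characters above can be checked with a short script. This is a minimal sketch: the helper name matches_whole is hypothetical, and it relies on re.fullmatch so that the whole test string must fit the pattern.

```python
import re

def matches_whole(pattern, text):
    """Return True if the entire text fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# [ ] and [^ ]: a set of characters and its complement
print(matches_whole('[time]', 'm'))    # True
print(matches_whole('[^time]', 'm'))   # False
print(matches_whole('[^time]', 'x'))   # True

# $: the pattern must end where the string ends
print(re.search('time$', 'ontime') is not None)     # True
print(re.search('time$', 'timetable') is not None)  # False

# * + ? { }: repetition of the previous character
print(matches_whole('tim*e', 'tie'))          # True  (zero m is allowed)
print(matches_whole('tim+e', 'tie'))          # False (at least one m is needed)
print(matches_whole('tim?e', 'timme'))        # False (at most one m is allowed)
print(matches_whole('tim{1,4}e', 'timmmme'))  # True  (four m is within 1 to 4)
```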

Regular expression module functions

Some of the functions of the re module, and the related methods of pattern and match objects, are discussed below:

re.compile(pattern)
The function compiles a regular expression pattern into a regular expression object, which can be used for matching through its match() and search() methods, discussed below.

>>> import re
>>> p = re.compile('tim*e')
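A compiled pattern object is convenient when the same expression is applied to several strings. The sketch below assumes a made-up list of words and keeps only those whose beginning fits the pattern.

```python
import re

# Compile once, reuse many times.
p = re.compile('tim*e')

# Hypothetical sample words, made up for this illustration.
words = ['tie', 'time', 'timme', 'tame']

# p.match() returns a match object or None, so the result can be tested directly.
matched = [w for w in words if p.match(w)]
print(matched)  # ['tie', 'time', 'timme']
```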

re.match(pattern, string)
If zero or more characters at the beginning of the string match the regular expression pattern, match() returns a corresponding match object. The function returns None if the beginning of the string does not match the pattern.

group()
This method of the match object returns the string matched by the RE.

>>> m = re.match('tim*e', 'timme pass time')
>>> m.group()
'timme'

The above piece of code can also be written as:

>>> p = re.compile('tim*e')
>>> m = p.match('timme pass time')
>>> m.group()
'timme'
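Because match() returns None when the beginning of the string does not fit the pattern, calling group() on the result without a check raises an AttributeError. A minimal sketch of the usual guard, using made-up strings:

```python
import re

p = re.compile('tim*e')

results = {}
for text in ['timme pass time', 'pass time']:
    m = p.match(text)
    # 'pass time' contains "time", but not at the beginning, so m is None there.
    results[text] = m.group() if m is not None else None

print(results)  # {'timme pass time': 'timme', 'pass time': None}
```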

re.search(pattern, string)
The function scans through the string looking for the first location where the regular expression pattern produces a match, and returns a corresponding match object. The function returns None if no position in the string matches the pattern.

>>> m = re.search('tim*e', 'no passtimmmeee')
>>> m.group()
'timmme'

The above piece of code can also be written as:

>>> p = re.compile('tim*e')
>>> m = p.search('no passtimmmeee')
>>> m.group()
'timmme'
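The difference between match() and search() is easiest to see on the same input. In this short sketch, match() fails because the string does not start with the pattern, while search() finds it further along.

```python
import re

text = 'no passtimmmeee'

# match() only tries the pattern at position 0.
at_start = re.match('tim*e', text)
print(at_start)  # None

# search() tries successive positions until the pattern fits.
anywhere = re.search('tim*e', text)
print(anywhere.group())  # timmme
```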

start()
This method of the match object returns the starting position of the match.

end()
This method of the match object returns the end position of the match, that is, the index just past the last matched character.

span()
This method of the match object returns a tuple containing the (start, end) indexes of the match.

>>> m = re.search('tim*e', 'no passtimmmeee')
>>> m.start()
7
>>> m.end()
13
>>> m.span()
(7, 13)

The above piece of code can also be written as:

>>> p = re.compile('tim*e')
>>> m = p.search('no passtimmmeee')
>>> m.start()
7
>>> m.end()
13
>>> m.span()
(7, 13)
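The positions reported by span() are ordinary slice indexes, so they can be used to cut the matched text, and the text around it, out of the original string. A minimal sketch; the variable names are illustrative:

```python
import re

text = 'no passtimmmeee'
m = re.search('tim*e', text)

# span() returns (start, end); end is one past the last matched character.
start, end = m.span()

matched = text[start:end]
before = text[:start]
after = text[end:]

print(matched)               # timmme
print(matched == m.group())  # True
print(repr(before), repr(after))  # 'no pass' 'ee'
```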

re.findall(pattern, string)
The function returns all non-overlapping matches of the pattern in the string, as a list of strings. The string is scanned left to right, and matches are returned in the order found.

>>> m = re.findall('tim*e', 'timeee break no pass timmmeee')
>>> m
['time', 'timmme']

The above piece of code can also be written as:

>>> p = re.compile('tim*e')
>>> m = p.findall('timeee break no pass timmmeee')
>>> m
['time', 'timmme']
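Since findall() returns a plain list, the usual list operations apply to the result. The sketch below counts the matches and shows that an input without any match gives an empty list rather than None; the second input string is made up for this illustration.

```python
import re

text = 'timeee break no pass timmmeee'

matches = re.findall('tim*e', text)
print(matches)       # ['time', 'timmme']
print(len(matches))  # 2

# With no match at all, findall() returns an empty list.
nothing = re.findall('tim*e', 'no pass')
print(nothing)       # []
```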

re.finditer(pattern, string)
The function returns an iterator yielding match objects over all non-overlapping matches for the RE pattern in the string. The string is scanned left to right, and matches are returned in the order found.

>>> m = re.finditer('tim*e', 'timeee break no pass timmmeee')
>>> for match in m:
...     print(match.group())
...     print(match.span())
...
time
(0, 4)
timmme
(21, 27)

The above piece of code can also be written as:

>>> p = re.compile('tim*e')
>>> m = p.finditer('timeee break no pass timmmeee')
>>> for match in m:
...     print(match.group())
...     print(match.span())
...
time
(0, 4)
timmme
(21, 27)
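Unlike findall(), finditer() gives access to the position of every match as well as its text. A minimal sketch that collects both into a list of tuples; the tuple layout is just one possible choice:

```python
import re

p = re.compile('tim*e')
text = 'timeee break no pass timmmeee'

# Collect (matched text, start, end) for every match, in the order found.
found = [(m.group(), m.start(), m.end()) for m in p.finditer(text)]
print(found)  # [('time', 0, 4), ('timmme', 21, 27)]
```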