Regular expressions in Python

A regular expression (regex) is a sequence of characters that defines a search pattern, written in a specialized syntax and used to match or find a string or a set of strings. Python's built-in re module supports string searching and manipulation with regular expressions. One common use is extracting data during web scraping. Import the module in your program:

import re

Quick guide to regex: Python

  • + = 1 or more of the preceding expression {For ex – [0-9]+ will match 2, 14, 543 etc.}.
  • * = 0 or more of the preceding expression {the match succeeds even if the preceding expression is absent}.
  • . = matches any single character except the newline {For ex – “.en” will match hen, ten, men}.
  • ^ = matches the expression if at the start of the string {For ex – “^.en” would match hen, ten if located at the start of the string }.
  • [] = matches the single character within the bracket {For ex – “[th]en” will match ten, hen}.
  • [^] = matches a single character NOT contained within the bracket {For ex – “[^w]hen” will match then, but not when }.
  • \w = matches a word character: [A-Za-z0-9_].
  • \W = matches a non-word character: [^A-Za-z0-9_].
  • \d= it matches the digit: [0-9].
  • \D= it matches a non-digit: [^0-9].
  • \s = matches a whitespace character: [ \t\r\n\f\v].
  • \S = matches a non-whitespace character: [^ \t\r\n\f\v].
  • \A = matches beginning of the string.
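  • \z = matches the end of the string in several other regex engines; in Python, prefer \Z, since many Python versions do not accept \z.
  • re{n,} = n or more occurrences.
  • \Z = matches only at the very end of the string. In Python it does not stop before a trailing newline ($ does that).
  • \G = matches the point where the last match finished in engines that support it; Python's re module does not support it.
  • $ = matches at the end of the string, or just before a newline at the end of the string {For ex – “.at$” would match cat, hat, sat if located at the end of the string}.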
  • a|b = matches either a or b.
  • re? = matches 0 or 1 occurrence of the preceding expression {For ex – “sleepy?” would match sleep or sleepy. Here y is optional}.
  • re{n} = matches exactly n occurrences.
  • re{m,n} = at least m and at most n occurrences (write it without a space after the comma).
  • ( = marks where a group (the part to be extracted) starts.
  • ) = marks where the group ends.
  • re.I = perform case insensitive matching.
  • re.M = makes $ match the end of a line and makes ^ match the start of any line.
  • re.S = makes a .(dot) match any character, including a newline.
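
A few of the entries above can be tried out directly. The following is a minimal sketch; the sample strings are made up for illustration and are not part of the original tutorial.

```python
import re

# Quantifiers and character classes
digits = re.findall(r"[0-9]+", "2 apples, 14 pears, 543 plums")
en_words = re.findall(r"[th]en", "hen ten men")        # 'men' does not match
optional_y = re.findall(r"sleepy?", "sleep sleepy")    # the y is optional
two_to_three = re.findall(r"\d{2,3}", "1 22 333 4444")

# Flags: re.M changes ^ and $, re.I ignores case, re.S lets . match a newline
text = "first line\nSecond line"
start_default = re.findall(r"^\w+", text)
start_multiline = re.findall(r"^\w+", text, re.M)
ignore_case = re.findall(r"second", text, re.I)
dot_default = re.search(r"line.Second", text)
dot_all = re.search(r"line.Second", text, re.S)

print(digits, en_words, optional_y, two_to_three)
print(start_default, start_multiline, ignore_case)
print(dot_default is None, dot_all is not None)
```

Without re.S the dot cannot cross the newline between the two lines, so dot_default is None while dot_all is a match object.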

Basic methods that are used in regex: Python

match() method:

The match() method looks for the pattern only at the start of the string.
In mat1, the pattern (‘i understand’) is at the start of the string, so the output is "match found".
In mat2, the pattern (‘understand’) is not at the start of the string, so the output is "No match found".

Note: group() returns the text matched by the whole pattern (or by one numbered subgroup, e.g. group(1)). groups() returns a tuple of all matched subgroups (empty if the pattern has no groups).

import re
# Avoid naming the variable "str": that would shadow the built-in type.
text = "i understand the concept of regular expression"
mat1 = re.match(r'i understand', text)
mat2 = re.match(r'understand', text)
if mat1:
    print("match found: " + mat1.group())
else:
    print("No match found for mat1")
if mat2:
    print("match found: " + mat2.group())
else:
    print("No match found for mat2")

Output:

match found: i understand
No match found for mat2
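
To make the difference between group() and groups() concrete, here is a small sketch. The pattern with two capturing groups and the sample string are assumptions chosen for illustration.

```python
import re

# Two capturing groups, each matching one word
m = re.match(r"(\w+) (\w+)", "regular expression in python")

whole = m.group()     # text matched by the whole pattern
first = m.group(1)    # text matched by the first group
second = m.group(2)   # text matched by the second group
both = m.groups()     # tuple of all groups

print(whole)
print(first, second)
print(both)

# match() returns None when the pattern is not at the start of the string
not_at_start = re.match(r"expression", "regular expression in python")
print(not_at_start)
```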

search() method:

The search() method scans the whole string for the pattern (‘Apoorva’ in the example). If the pattern is found anywhere in the string, the program prints "search found", otherwise "search not found".

import re
text = " My name is Apoorva Gupta "
search_result = re.search(r'Apoorva', text)
if search_result:
    # print() already puts a space between its arguments
    print("search found:", search_result.group())
else:
    print("search not found")

Output:

search found: Apoorva
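
The contrast between match() and search() can be sketched on a single string. This example reuses the sentence from the match() section; the variable names are illustrative only.

```python
import re

sentence = "i understand the concept of regular expression"

# match() only looks at the start of the string
at_start = re.match(r"understand", sentence)

# search() scans the whole string and reports where the match was found
anywhere = re.search(r"understand", sentence)

print(at_start)            # None
print(anywhere.group())
print(anywhere.span())     # (start, end) indices of the match
```

The match object returned by search() also exposes start() and end(), which is useful when the position of the match matters and not just its text.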

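sub() method:

The sub() method replaces every non-overlapping occurrence of the pattern in the string with the repl string and returns the new string. The original string is not modified.
Syntax: re.sub(pattern, repl, string).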

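import re
house_addr = "48A- Aptitude Apartment,Civil Lines, Delhi"

# Replace everything from the hyphen onwards with a description.
pat1 = re.sub(r'-.\D+', " # 48 number house in block A", house_addr)
print("Apartment number : ",pat1)

# Drop everything from the first comma onwards...
pat2 = re.sub(r',.+', "", house_addr)

# ...then strip the leading "48A-" to leave the apartment name.
pat= re.sub(r'\d+\D-', "", pat2)
print("Apartment name : ",pat)

# Remove the house number and the apartment name, keeping the locality.
pat = re.sub(r'\d+\D- [A-Z][a-z]+ [A-Z][a-z]+,',"",house_addr)
print("New house_addr : ",pat)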

Output:

Apartment number :  48A # 48 number house in block A
Apartment name :   Aptitude Apartment
New house_addr :  Civil Lines, Delhi
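
Beyond plain replacement, sub() also accepts backreferences in repl and an optional count argument, and re.subn() reports how many replacements were made. The sketch below uses a made-up date string to show these.

```python
import re

date = "2024-06-14"  # hypothetical sample value

# \1, \2, \3 in repl refer to the captured groups of the pattern
swapped = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", date)

# count limits the number of replacements
first_only = re.sub(r"-", "/", date, count=1)

# subn() returns the new string together with the number of replacements
replaced, how_many = re.subn(r"-", "/", date)

print(swapped)
print(first_only)
print(replaced, how_many)
```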

findall() method:

The findall() method returns a list of all non-overlapping matches of the pattern in the string. The length of that list tells you how many times the pattern occurred.

import re
string="Each one is different one."
word=re.findall('one',string)
print(word[0])
print(word[1])

Output:

one
one
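
If the number of occurrences or their positions are needed, the list from findall() can be counted, and re.finditer() yields match objects with positions. A minimal sketch on the same sentence:

```python
import re

string = "Each one is different one."

# findall() returns the matched text; len() gives the number of occurrences
matches = re.findall(r"one", string)
count = len(matches)

# finditer() yields match objects, so the start index of each match is available
positions = [m.start() for m in re.finditer(r"one", string)]

print(matches)
print(count)
print(positions)
```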

A program to understand regular expression: Python

  1. Import the re module and define the string in which you want to find the patterns.
  2. [0-9]+ finds all the numbers in the string.
  3. \S+@\S+ matches an @ with one or more non-whitespace characters attached on both of its sides.
  4. [A-Z][a-z]+[^0-9] matches a capital letter, one or more lowercase letters, and then one non-digit character. In June14 the digit forces the engine to backtrack, so the last letter of the month fills the [^0-9] position and the match is June.
  5. [A-Z][a-z]+\d+ matches a capital letter, one or more lowercase letters, and then one or more digits.

import re

string= 'my favourite 3 numbers are 7 , 8 and 6; my email id is [email protected] and [email protected] .; today is June14, June04 nice day, Dec12.'

# Extract the numbers from the string with a regular expression.
li = re.findall(r'[0-9]+', string)
print(li)

# Extract the email addresses from the string with a regular expression.
lst = re.findall(r'\S+@\S+', string)
print(lst)

# To get the month of each date we can use the following pattern
regex = r"[A-Z][a-z]+[^0-9]"
matches = re.findall(regex, string)
for match in matches:
    print("Match month:", match)

# To get the months together with the dates
regex = r"[A-Z][a-z]+\d+"
matches = re.findall(regex, string)
for match in matches:
    print("Full match:", match)

Output:

['3', '7', '8', '6', '02', '0698', '14', '04', '12']
['[email protected]', '[email protected]']
Match month: June
Match month: June
Match month: Dec
Full match: June14
Full match: June04
Full match: Dec12
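
As a possible variation on the program above, capturing groups can split each date into month and day in one pass, and a lookahead can pick out the month without relying on backtracking. This is only a sketch: the shortened date string and the example.com / example.org addresses are made up for illustration.

```python
import re

dates = "today is June14, June04 nice day, Dec12."

# With two groups, findall() returns a list of (month, day) tuples
pairs = re.findall(r"([A-Z][a-z]+)(\d+)", dates)

# (?=\d) requires a digit after the month without including it in the match
months = re.findall(r"[A-Z][a-z]+(?=\d)", dates)

# Hypothetical addresses, used only to show the \S+@\S+ pattern
mail_text = "write to alice@example.com and bob@example.org today"
emails = re.findall(r"\S+@\S+", mail_text)

print(pairs)
print(months)
print(emails)
```

Note that \S+@\S+ is a loose pattern: it also picks up punctuation that touches an address, so it suits quick extraction more than validation.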
