Introduction to regular expressions and a sample problem in Python

This post is for those who want to learn and get started with regular expressions (REs or regex) in the Python programming language.

In simple words, a regular expression is a sequence of characters that describes a pattern to match within a string. Regular expressions have many practical applications, and the best known one is the find-and-replace function in our text editors.

Before starting, remember that we first import the re module, which implements regex in Python.

import re
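
As a first taste of the module, the find-and-replace use mentioned above can be sketched with re.sub(). This is only a minimal illustration: the sample text and the replacement are made up, and the re.IGNORECASE flag is an optional extra.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Colour is spelled colour in British English."

# Replace every occurrence of "colour" (ignoring case) with "color".
result = re.sub(r"colour", "color", text, flags=re.IGNORECASE)
print(result)
```

Note that the replacement string is inserted as written, so the capitalised "Colour" also becomes lowercase "color".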

Regular Expressions in Python

Let’s look at some of the basic metacharacters used in regular expressions and what they do:

  1. . – It will match any character except the newline character.
  2. \ – It escapes a metacharacter or introduces a special sequence.
    Example: “.” matches any character except the newline character, while
    “\.” will match only a literal dot (.).
  3. \d – It will match any digit character from 0 to 9.
  4. \D – It is the complement of \d, any character except a digit.
  5. \w – Used for matching alpha-numeric characters and the underscore (_).
  6. \W – It will match any character other than alpha-numeric characters and underscore.
  7. \s – It will match any whitespace character.
  8. \S – It will match any character other than the whitespace characters.
  9. [ ] – The character class matches only one character out of the several characters placed inside it.
  10. [^ ] – This character class will match any character other than the characters placed inside it.
  11. ^ – It anchors the pattern to the start of the string.
  12. $ – It anchors the pattern to the end of the string.
  13. ( ) – It is used to group a pattern and also to capture a match.
  14. | – It works as an OR operation, i.e. it matches any one of the given patterns.
  15. { } – Repeats the preceding element the specified number of times, as per the arguments (written without spaces inside the braces):
    • {x} – Exactly x times.
    • {a,} – a or more times.
    • {a,b} – Between a and b times, inclusive.
  16. * – The asterisk matches the preceding element zero or more times.
  17. + – It matches the preceding element one or more times.
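
To see a few of these metacharacters in action, here is a small sketch using re.findall() and re.search(). The sample strings are invented for illustration, and re.findall() is used simply because it returns every match as a list, which makes the effect of each pattern easy to inspect.

```python
import re

# Made-up sample string, used only to exercise the metacharacters.
sample = "cat bat rat 2024"

print(re.findall(r"\d", sample))         # each digit on its own
print(re.findall(r"\d+", sample))        # one or more digits in a row
print(re.findall(r"\d{2}", sample))      # exactly two digits at a time
print(re.findall(r"[cb]at", sample))     # "c" or "b", followed by "at"
print(re.findall(r"[^cb]at", sample))    # any character except "c" or "b", then "at"
print(re.findall(r"cat|rat", sample))    # either of the two alternatives
print(re.findall(r"\w+", "snake_case words"))  # runs of word characters

print(re.findall(r"a.b", "a.b axb"))     # unescaped dot matches any character
print(re.findall(r"a\.b", "a.b axb"))    # escaped dot matches only a literal "."

print(bool(re.search(r"^cat", sample)))  # "cat" is at the start of the string
print(bool(re.search(r"^bat", sample)))  # "bat" is not at the start
print(bool(re.search(r"2024$", sample))) # "2024" is at the end of the string

# Parentheses capture the part of the match they enclose.
print(re.search(r"(\w)at", sample).group(1))
```

The patterns are written as raw strings (the r prefix) so that backslashes reach the regex engine unchanged.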

search() and match() functions

Let’s understand them with the help of a simple code.

import re
test_input = input()
re_pattern = "xyz"
print(re.search(re_pattern, test_input))  # search() method
print(re.match(re_pattern, test_input))   # match() method

Output

xyz
<re.Match object; span=(0, 3), match='xyz'>
<re.Match object; span=(0, 3), match='xyz'>
w xyz
<re.Match object; span=(2, 5), match='xyz'>
None
w
None
None

From the above examples, we can conclude that both search() and match() return a match object describing the matched text, or None if there is no match.
They differ in where they look: search() scans the whole of test_input for re_pattern, whereas match() only tries to match re_pattern at the beginning of test_input.
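
The same comparison can be run without typing inputs by hand. The sketch below loops over the three inputs used above; the helper name compare() is hypothetical and exists only to keep the example short.

```python
import re


def compare(pattern, text):
    """Return the spans found by search() and match(), or None for no match."""
    found = re.search(pattern, text)     # looks anywhere in the text
    at_start = re.match(pattern, text)   # looks only at the beginning
    return (
        found.span() if found else None,
        at_start.span() if at_start else None,
    )


for test_input in ["xyz", "w xyz", "w"]:
    search_span, match_span = compare("xyz", test_input)
    print(repr(test_input), "search:", search_span, "match:", match_span)
```

For "w xyz", search() reports the span (2, 5) while match() gives None, because the pattern does not begin at position 0.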

Date Validation

import re

input1 = input()            # expected input format: dd/mm/yy
# Raw string so that \d reaches the regex engine unchanged
re_pattern = r"^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/(\d\d)$"

match = re.match(re_pattern, input1)
if match:
    print("Valid")
else:
    print("Invalid")

Output

21/1/12
Invalid

32/1/23
Invalid

1/1/12
Invalid

01/01/20
Valid
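
Because the pattern uses parentheses, the day, month and year can also be read back from the match. The sketch below wraps the same pattern in a function; the names DATE_PATTERN and parse_date() are hypothetical, and the test inputs are made up.

```python
import re

# Same pattern as above, compiled once; written as a raw string.
DATE_PATTERN = re.compile(r"^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/(\d\d)$")


def parse_date(text):
    """Return (day, month, year) as strings, or None if the format is wrong."""
    match = DATE_PATTERN.match(text)
    if match is None:
        return None
    return match.groups()  # one string per pair of parentheses


for candidate in ["01/01/20", "1/1/12", "32/01/23", "31/02/20"]:
    print(candidate, "->", parse_date(candidate))
```

Note that the pattern only checks the format, so a date such as 31/02/20 is accepted even though February never has 31 days. Checking the calendar itself would need logic beyond this regex.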

Hope this proved to be helpful. A good problem to try next is validating an IPv4 address with a regex.
