Introduction to regular expressions and a sample problem in Python
This post is for anyone who wants to get started with regular expressions (REs, or regex) in the Python programming language.
In simple terms, a regular expression is a sequence of characters that describes a pattern to match within a string. Regular expressions have many practical applications; the best known is the find-and-replace function in text editors.
Before starting, remember that we first import the re module, which provides regex support in Python:
import re
Regular Expressions in Python
Let’s look at some of the basic metacharacters used in regular expressions and what they do:
- . – It will match anything except the newline character.
- \ – It escapes a metacharacter or introduces a special sequence.
Example: “.” matches any character except a newline, whereas “\.” matches only a literal dot (.).
- \d – It will match any digit character from 0 to 9.
- \D – It is the complement of \d, any character except a digit.
- \w – Used for matching alphanumeric characters and the underscore (_).
- \W – It will match any character other than alpha-numeric characters and underscore.
- \s – It will match any whitespace character.
- \S – It will match any character other than the whitespace characters.
- [ ] – The character class matches only one character out of the several characters placed inside it.
- [^ ] – This character class will match any character other than the characters placed inside it.
- ^ – It anchors the pattern at the start of the string.
- $ – It anchors the pattern at the end of the string.
- ( ) – It is used to group a pattern and also to capture a match.
- | – It works as an OR operator, i.e. it matches any one of the given alternatives.
- { } – Matches the preceding element the specified number of times, depending on the arguments:
  - { x } – Exactly ‘x’ times.
  - { a, } – ‘a’ or more times.
  - { a, b } – Between ‘a’ and ‘b’ times, inclusive.
- * – The asterisk matches the preceding element zero or more times.
- + – It matches the preceding element one or more times.
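To make the list above more concrete, here is a minimal sketch that exercises several of these metacharacters. The sample text and variable names are made up for illustration, and it relies on re.findall() and re.split(), two functions from the re module that this post does not otherwise cover.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped on 2021-03-04. Total: 19.99"

# + : runs of one or more digits
digits = re.findall(r"\d+", text)
# \. : literal dots only
dots = re.findall(r"\.", text)
# {x} : exactly four digits, then two, then two
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group()
# ^, ( ) and | : capture whichever alternative starts the string
first_word = re.match(r"^(Order|Invoice)", text).group(1)
# $ : two digits right at the end of the string
ends_with_digits = re.search(r"\d\d$", text) is not None
# [ ] and [^ ] : one of the listed characters / anything not listed
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
# * : 'a' followed by zero or more 'b'
ab_runs = re.findall(r"ab*", "a ab abbb")
# {a, b} : between two and three digits
two_or_three = re.findall(r"\d{2,3}", "1 22 333 4444")
# \s : split on any run of whitespace
words = re.split(r"\s+", "a  b\tc")

print(digits)        # ['66', '2021', '03', '04', '19', '99']
print(date)          # 2021-03-04
print(first_word)    # Order
print(two_or_three)  # ['22', '333', '444']
```

The patterns are written as raw strings (r"...") so that backslashes such as \d reach the regex engine unchanged.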
search() and match() functions
Let’s understand them with the help of a simple piece of code.
import re

test_input = input()
re_pattern = "xyz"
print(re.search(re_pattern, test_input))  # search() method
print(re.match(re_pattern, test_input))   # match() method
Output
xyz
<re.Match object; span=(0, 3), match='xyz'>
<re.Match object; span=(0, 3), match='xyz'>

w xyz
<re.Match object; span=(2, 5), match='xyz'>
None

w
None
None
From the above examples, we can conclude that both search() and match() return a match object when the pattern is found, and None otherwise.
They differ in where they look: search() scans the whole of test_input for re_pattern, whereas match() only tries to match re_pattern at the very start of test_input.
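The same comparison can be sketched without typing the input by hand. The snippet below is only an illustration: it loops over the three inputs used above and records whether each function found the pattern. The names results, found_anywhere and found_at_start are made up for this sketch.

```python
import re

re_pattern = "xyz"
results = []

for test_input in ["xyz", "w xyz", "w"]:
    # search() looks for the pattern anywhere in the string
    found_anywhere = re.search(re_pattern, test_input) is not None
    # match() only succeeds if the pattern starts at position 0
    found_at_start = re.match(re_pattern, test_input) is not None
    results.append((test_input, found_anywhere, found_at_start))
    print(repr(test_input), found_anywhere, found_at_start)

# 'xyz' True True
# 'w xyz' True False
# 'w' False False
```

Only the second input separates the two functions: the pattern is present, but not at the start of the string.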
Date Validation
import re

input1 = input()
# expected input format: dd/mm/yy
# raw string, so that \d reaches the regex engine unchanged
re_pattern = r"^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/(\d\d)$"
match = re.match(re_pattern, input1)
if match:
    print("Valid")
else:
    print("Invalid")
Output
21/1/12
Invalid

32/1/23
Invalid

1/1/12
Invalid

01/01/20
Valid
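Because the pattern uses ( ) groups, the day, month and year can also be pulled out of a successful match. The following is a minimal sketch, not a complete validator: DATE_PATTERN and parse_date are hypothetical names introduced here, and the pattern only checks the dd/mm/yy format, so an impossible calendar date such as 31/02/20 still passes.

```python
import re

# Same pattern as in the date validation code, compiled once as a raw string.
DATE_PATTERN = re.compile(r"^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/(\d\d)$")


def parse_date(text):
    """Return (day, month, year) as strings if text looks like dd/mm/yy, else None."""
    match = DATE_PATTERN.match(text)
    # groups() returns the text captured by each ( ) group, in order
    return match.groups() if match else None


print(parse_date("01/01/20"))  # ('01', '01', '20')
print(parse_date("1/1/12"))    # None: day and month need two digits
print(parse_date("32/01/23"))  # None: day 32 is out of range
print(parse_date("31/02/20"))  # ('31', '02', '20'): format only, not the calendar
```

If real calendar validity matters, the captured values would need a further check beyond the regex.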
Hope this proved to be helpful. Check out here for validating the IPv4 Regex problem.