re.DOTALL in Python

In this tutorial, we will learn about re.DOTALL in Python. The re.DOTALL flag comes in handy when working with multi-line strings.
If you are not familiar with regular expressions, please go through the following tutorial first:
Regular expression in python

re.DOTALL

By default, the ‘.’ special character in a Python regular expression matches any character except a newline. The DOTALL flag extends this behaviour.
With the DOTALL flag set, ‘.’ matches any character, including a newline.
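
As a minimal sketch of the difference, the snippet below runs the same pattern with and without the flag. The sample string text is made up for illustration. The snippet also uses re.S, the short alias for re.DOTALL, and the inline form (?s), both of which belong to the standard re module.

```python
import re

# Hypothetical sample string with one embedded newline.
text = "first\nsecond"

# Default: '.' stops at the newline, so '.+' matches each line separately.
default_matches = re.findall(r".+", text)

# With re.DOTALL, '.' also matches '\n', so '.+' consumes the whole string.
dotall_matches = re.findall(r".+", text, re.DOTALL)

# re.S is an alias for re.DOTALL; (?s) switches on the same behaviour inline.
alias_matches = re.findall(r".+", text, re.S)
inline_matches = re.findall(r"(?s).+", text)

print(default_matches)  # ['first', 'second']
print(dotall_matches)   # ['first\nsecond']
```

All three DOTALL spellings should give the same result here, so which one to use is mostly a matter of readability.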

When to use it?

In real-life projects, we often have to process multi-line strings (lines separated by the newline character ‘\n’). When a pattern must match across those line breaks, we use re.DOTALL.

Example

Suppose we want to display the contents of the paragraph tag from the following HTML snippet. We cannot do it using the ‘.’ character alone, because by default ‘.’ does not match newline characters.

<!DOCTYPE html>
<html>
<head>
<title>Title of the document</title>
</head>
<p>
This tutorial is provided by CodeSpeedy.
Hope you like this.
</p>
</html>

When we try to match the HTML paragraph using the ‘.’ character only, no matches are found:
import re
txt = '''<!DOCTYPE html>
<html>
<head>
<title>Title of the document</title>
</head>
<p>
This tutorial is provided by CodeSpeedy.
Hope you like this.
</p>
</html>'''
# Without a flag, '.' does not match '\n', so the pattern cannot span lines.
x = re.findall("<p>.*</p>", txt)
print(x)

OUTPUT

[]

No matches are found, so re.findall() returns an empty list.

Let’s see how we can overcome this limitation.

import re
txt = '''<!DOCTYPE html>
<html>
<head>
<title>Title of the document</title>
</head>
<p>
This tutorial is provided by CodeSpeedy.
Hope you like this.
</p>
</html>'''
# re.DOTALL lets '.' match '\n', so '.*' can span the whole paragraph.
x = re.findall("<p>.*</p>", txt, re.DOTALL)
print(x)

OUTPUT

['<p>\nThis tutorial is provided by CodeSpeedy.\nHope you like this.\n</p>']

With DOTALL, the ‘.’ character matches newlines as well, which it cannot do by default.
Therefore, using the re.DOTALL flag, we can match patterns that span multiple lines.
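
One point worth keeping in mind: once ‘.’ can cross newlines, a greedy ‘.*’ may match more than intended. The sketch below is a possible refinement, not part of the original example. It assumes a made-up snippet with two paragraphs, and uses the non-greedy ‘.*?’ with a capturing group to pull out each paragraph's text separately.

```python
import re

# Hypothetical snippet with two paragraphs (made up for this sketch).
html = """<p>
First paragraph.
</p>
<p>
Second paragraph.
</p>"""

# Greedy '.*' with DOTALL runs from the first <p> to the last </p>.
greedy = re.findall(r"<p>.*</p>", html, re.DOTALL)

# Non-greedy '.*?' stops at the nearest </p>; the group keeps only the inner text.
paragraphs = [m.strip() for m in re.findall(r"<p>(.*?)</p>", html, re.DOTALL)]

print(len(greedy))  # 1
print(paragraphs)   # ['First paragraph.', 'Second paragraph.']
```

This works for small, well-formed snippets like the one assumed here. For arbitrary HTML, a dedicated parser is usually the more robust choice.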
