SequenceMatcher in Python

This tutorial covers the SequenceMatcher class from Python's difflib module.

Introduction:

Strings are an interesting topic in programming. Python offers many methods and built-in functions for working with them, and the SequenceMatcher class is one of these tools. SequenceMatcher compares two strings and reports their similarity as a ratio between 0 and 1. It lives in the standard-library module “difflib”: we import the class from there and pass both strings to it.

SequenceMatcher compares exactly two sequences at a time, string one (a) and string two (b), and ratio() tells us how similar they are: 1.0 means the strings are identical and 0.0 means they have nothing in common. This lets us compare two strings in a few lines of code. The idea behind it is to find the longest contiguous matching block, then repeat the search on the pieces to the left and right of that block. The ratio is 2.0 * M / T, where M is the number of matched characters and T is the total length of both strings.

# import the class
from difflib import SequenceMatcher

s1 = "gun"
s2 = "run"
sequence = SequenceMatcher(a=s1, b=s2)  # compare both strings
print(sequence.ratio())

output: 0.6666666666666666
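To see where that number comes from, the ratio can be recomputed by hand. The following is a minimal sketch that uses get_matching_blocks(), a real SequenceMatcher method; the variable names matched, total and manual_ratio are only illustrative.

```python
from difflib import SequenceMatcher

s1 = "gun"
s2 = "run"
matcher = SequenceMatcher(a=s1, b=s2)

# Each block is (start in a, start in b, size).
# The last block is always a zero-length sentinel, so it adds nothing to the sum.
blocks = matcher.get_matching_blocks()

matched = sum(block.size for block in blocks)  # M: matched characters ("un")
total = len(s1) + len(s2)                      # T: combined length of both strings

manual_ratio = 2.0 * matched / total
print(matched, total)   # 2 6
print(manual_ratio)     # same value as matcher.ratio()
```

Here only "un" matches, so M is 2 and T is 6, which gives 2.0 * 2 / 6.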

The “difflib” module also provides some extra features. Two of the most commonly used are the get_close_matches function and the Differ class.

get_close_matches compares a given string against a list of candidate strings and returns the candidates whose similarity ratio is at least the given cutoff, best match first. The code below shows how it works.

from difflib import get_close_matches

s1 = "abcdefg"
list_one = ["abcdefghi", "abcdef", "htyudjh", "abcxyzg"]
# return at most 2 candidates whose ratio with s1 is at least 0.6
match = get_close_matches(s1, list_one, n=2, cutoff=0.6)
print(match)

output:

['abcdef', 'abcdefghi']

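In the get_close_matches function I am passing four things:

s1: The string we want to find matches for.

list_one: The list of candidate strings to search in.

n: The maximum number of matches to return. It can be any integer greater than 0 (the default is 3). If fewer candidates pass the cutoff, fewer are returned.

cutoff: The minimum similarity ratio a candidate must reach to be included. It is a float between 0 and 1 (the default is 0.6).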
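To get a feel for n and cutoff, here is a small sketch that runs the same lookup with different settings. The variable names word, candidates, best_one, strict and loose are made up for this illustration.

```python
from difflib import get_close_matches

word = "abcdefg"
candidates = ["abcdefghi", "abcdef", "htyudjh", "abcxyzg"]

# n=1 keeps only the single best match.
best_one = get_close_matches(word, candidates, n=1, cutoff=0.6)

# A higher cutoff is stricter: fewer candidates pass, even though n allows 3.
strict = get_close_matches(word, candidates, n=3, cutoff=0.9)

# A lower cutoff is more forgiving: weaker matches are included too.
loose = get_close_matches(word, candidates, n=3, cutoff=0.5)

print(best_one)  # ['abcdef']
print(strict)    # ['abcdef']
print(loose)     # ['abcdef', 'abcdefghi', 'abcxyzg']
```

"abcxyzg" has a ratio of about 0.57 with "abcdefg", so it is left out at the cutoff of 0.6 and only shows up once the cutoff drops to 0.5.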

Another important part of this module is the Differ class.

Differ compares two texts line by line and produces a delta that shows which lines are common to both texts and which lines appear in only one of them. Let me explain in the code.

from difflib import Differ
from pprint import pprint

# splitlines() turns each text into a list of lines;
# the first element is empty because the text starts with a newline
text1 = '''
hello world!
i like python and code in it.'''.splitlines()
text2 = '''
hello world!
i like java and coding'''.splitlines()

dif = Differ()
df = list(dif.compare(text1, text2))
pprint(df)

output:

['  ',
 '  hello world!',
 '- i like python and code in it.',
 '+ i like java and coding']

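In the output, every line starts with a two-character code. Lines that start with two spaces, such as “hello world!”, are common to both texts, so they are printed only once. The line that starts with “- ” appears only in text1, and the line that starts with “+ ” appears only in text2. The first entry is the empty line at the start of both texts.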
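Because of these prefixes, the delta is easy to split into groups. The sketch below is one possible way to do it with plain list comprehensions; the names common, only_first and only_second are chosen for this example and are not part of difflib.

```python
from difflib import Differ

text1 = ["hello world!", "i like python and code in it."]
text2 = ["hello world!", "i like java and coding"]

delta = list(Differ().compare(text1, text2))

# Every delta line starts with a two-character code:
# "  " common line, "- " only in the first text,
# "+ " only in the second text, "? " hint line (ignored here).
common = [line[2:] for line in delta if line.startswith("  ")]
only_first = [line[2:] for line in delta if line.startswith("- ")]
only_second = [line[2:] for line in delta if line.startswith("+ ")]

print(common)       # ['hello world!']
print(only_first)   # ['i like python and code in it.']
print(only_second)  # ['i like java and coding']
```

Slicing with line[2:] drops the code and leaves the original line.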

The SequenceMatcher class is mostly used for comparing two strings, a task that comes up in many programming challenges. It saves us from writing the comparison logic by hand and keeps the code short.
