How to implement KMP String Matching algorithm in Python

This Python tutorial helps you to understand what is the KMP String Matching algorithm and how Python implements this algorithm. First, we will learn what is string matching then we will go for KMP string matching in Python with example.

KMP stands for Knuth Morris Pratt.

KMP String Matching in Python

Definition :

For a given string ‘S’, string matching algorithm determines whether a pattern ‘p’ occurs in the given string ‘S’.

Example:-

Input :

String = "Welcome to CodeSpeedy"

Pattern = "Code"

Output :

Pattern found at index 11.

Here, the pattern ‘Code’ found in the string at index number 11 where the index starts from number 0.

The disadvantage of a naive string matching algorithm is that this algorithm runs very slow. That means the time complexity of this algorithm is very high. To solve this problem, the KMP string matching algorithm comes into existence. It improves the time complexity of a normal string matching algorithm to O(n), linear time.

How KMP String Matching works

The working idea behind this algorithm is that whenever a mismatch is detected after some matches we know some of the characters in the given string of the next shift. This information is useful in avoiding the matching characters.

Example:-

String = “AAAAABAAAAAAAC”

Pattern = “AAAAC”

Here the pattern first checks with the string. At index 4 there will be a mismatch. Now the pattern shifts one position. That means, now the pattern starts checking from index 1. Here KMP String Matching algorithms optimizes over Normal String Matching. According to Normal String Matching algorithm, the pattern starts checking from string ‘A’, that means index 0 in pattern to the end of the pattern. Even though similar strings are present in both the pattern and in the given string from index 0 to index 3, Normal String Matching algorithm checks from the starting of the pattern.

But, KMP String Matching algorithm starts checking from index 4 of letter ‘C’ because we know first four characters will anyway match, we skipped matching first four characters. This is how optimization is done in this algorithm.

Implementation of KMP String Matching in Python

Source Code: Python program KMP string matching

def KMP_String(pattern, text):
    a = len(text)
    b = len(pattern)

    prefix_arr = get_prefix_arr(pattern, b)

  
    initial_point = []

    m = 0
    n = 0

  
    while m != a:
       
        if text[m] == pattern[n]:
            m += 1
            n += 1
      
        else:
            n = prefix_arr[n-1]

       
        if n == b:
            initial_point.append(m-n)
            n = prefix_arr[n-1]
        elif n == 0:
            m += 1
   
    return initial_point



def get_prefix_arr(pattern, b):
    prefix_arr = [0] * b
    n = 0
    m = 1

    while m != b:
        if pattern[m] == pattern[n]:
            n += 1
            prefix_arr[m] = n
            m += 1
        elif n != 0:
                n = prefix_arr[n-1]
        else:
            prefix_arr[m] = 0
            m += 1

    return prefix_arr

string = "ABABDABACDABABCABABCABAB"
pat = "ABABCABAB"

initial_index = KMP_String(pat, string)

for i in initial_index:
    print('Pattern is found in the string at index number',i)

Output :

Pattern is found in the string at index number 10

Pattern is found in the string at index number 15
You can also read,

One response to “How to implement KMP String Matching algorithm in Python”

  1. freddy says:

    This algorithm is correct in 90% cases, but in some special cases, it goes wrong.
    e.g. string = ‘ababc’, pattern = ‘abc’
    then it returns an empty initial_index list.

Leave a Reply

Your email address will not be published. Required fields are marked *