Knuth-Morris-Pratt (KMP) Algorithm in C++

In this tutorial, we are going to learn about the KMP algorithm in C++ with code implementation. There are other algorithms like Naive Algorithm and Rabin Karp Algorithm which are also used for pattern matching. If we compare the algorithms then Naive Algorithm and Rabin Karp takes O((n-m)*m) time complexity but Rabin Karp performs better than Naive algorithm in general. On the other hand, KMP algorithms work with O(n) time complexity and O(m) extra space. Here m and n are the length of pattern string and text string respectively.

**Note:-  Understanding of LPS array is a must for learning the KMP algorithm. If you want to learn about LPS array please click here

Let us see an example :
We are given two strings pattern string and text string and we have to print the position of occurrences of the pattern in the text.
Input-
text – “ababcdabcb”
pattern – “abc”

Output-
2 6
The pattern “abc” occurs at index 2 and index 6.

Understanding the algorithm

KMP algorithm is just like the naive algorithm when there is a match, but whenever there is a mismatch KMP algorithm uses LPS array of the pattern to decide what to do ahead.

Let us take an example,

text – “aaebcaaeaaebcaadaa”

pattern – “aaebcaadaa”

Output – “8”

The LPS array for the pattern is {0, 1, 0, 0, 0, 1, 2, 0, 1, 2}

“aaebcaaeaaebcaadaa”=>(text)
“aaebcaadaa”———— =>(pattern)

At the start, the pattern starts from the beginning. The algorithm starts comparing each character in pattern and text. Till i=7, j=7 it runs like the naive algorithm(where i = text index and j = pattern index) but at i=6, j=6 there is a mismatch between ‘e’ and ‘d’. At this point, the algorithm uses LPS array. Whenever there is a mismatch,the algorithm updates index j of the pattern to LPS of last matched character. Therefore, in our example, the last matched character is ‘a'(index j =6) hence we update j as LPS[j-1 = 6]. And again the algorithm starts comparing text[i] and pattern[j]. So after updating the value of j we get j = 2 and algorithm starts comparing text[i=7] and pattern[j=6].

Note that the algorithm essentially did shifts the pattern all the way to index i = 5 from i=0 as shown below,

“aaebcaaeaaebcaadaa”
——–“aaebcaadaa”—-

Hence we have saved the computations for the text – “aaebcaa” which would not be possible if we use the naive algorithm. This is where the KMP algorithm gives better results. Let us continue with our example. The algorithm compares until another mismatch which occurs for i=8 between ‘a’ and ‘b’. As a result, the algorithm refers to LPS array and since the last matched character is ‘e’, the index j gets updated to LPS[2] which is equal to 0. Thus the pattern shifts ahead as shown below,

“aaebcaaeaaebcaadaa”
————-“aaebcaadaa”

The algorithm compares all characters and outputs 8.

Let us see the code implementation of the algorithm.

Code

Below is our C++ code for Knuth-Morris-Pratt (KMP) Algorithm:

#include <iostream>
#include <bits/stdc++.h> 
using namespace std;

/**************************************
              LPS function
***************************************/          
void lps_func(string txt, vector<int>&Lps){
    Lps[0] = 0;                   
    int len = 0;
    int i=1;
    while (i<txt.length()){
        if(txt[i]==txt[len]){   
            len++;
            Lps[i] = len;
            i++;
            continue;
        }
        else{                   
            if(len==0){         
                Lps[i] = 0;
                i++;
                continue;
            }
            else{              
                len = Lps[len-1];
                continue;
            }
        }
    }
}

/**************************************
              KMP Function
***************************************/  
void KMP(string pattern,string text){
    int n = text.length();
    int m = pattern.length();
    vector<int>Lps(m);
    
    lps_func(pattern,Lps); // This function constructs the Lps array.
    
    int i=0,j=0;
    while(i<n){
        if(pattern[j]==text[i]){i++;j++;} // If there is a match continue.

        if (j == m) { 
            cout<<i - m <<' ';    // if j==m it is confirmed that we have found the pattern and we output the index
                                  // and update j as Lps of last matched character.
            j = Lps[j - 1]; 
        } 
        else if (i < n && pattern[j] != text[i]) {  // If there is a mismatch
            if (j == 0)          // if j becomes 0 then simply increment the index i
                i++;
            else
                j = Lps[j - 1];  //Update j as Lps of last matched character
        }
    }
}

int main()
{
    string text = "ababcdabcb";
    string pattern = "abc";
    KMP(pattern, text);
    
    return 0; 
}

Output:-

2 6

Hope you understood the algorithm. Personally, I would recommend to dry run the code on at least two examples of your own to get a good grasp on the algorithm.

Thank You!

Leave a Reply

Your email address will not be published. Required fields are marked *