Rabin Karp algorithm for pattern matching in C++

Post Views: 1,367

In this tutorial, we are going to learn about the Rabin Karp algorithm in C++ with code implementation. Rabin Karp algorithm is an optimization of the naive algorithm which is O(n*m) where

n is the text string length
m is the pattern string length.

Rabin Karp performs same as naive algorithm in worst case but it works better in general. It is highly recommended to learn the naive algorithm before proceeding ahead. If you want to learn about the naive algorithm click here.

Let us see an example :
We are given two strings pattern string and text string and we have to print the position of occurrences of the pattern in the text.
Input-
text – “ababcdabcb”
pattern – “abc”

Output-
2 6
The pattern “abc” ocuurs at index 2 and index 6.

Key Characteristics of Rabin Karp Algorithm

Like naive algorithm we slide the pattern over the string one by one and compare every character in pattern with the text.
To reduce the number of comparisons, we use hashing. We compare the hash values of the pattern and the current text window if the hash value match, then only we proceed to compare individual characters of pattern and the text window.
To calculate the hash value of the current window we use the concept of rolling hash. In rolling hash we compute hash value of current window using the hash value of the previous window.

Computing Hash Values

We use weighted hash function to calculate hash value. This also ensures to avoid spurious hit(cases where hash values are same despite the text being different).
Suppose the pattern is “abc”, we create hash values as following
hash = [ASCII of ‘a’] * t^2 + [ASCII of ‘b’] * t^1 + [ASCII of ‘c’] * t^0
where t = number of characters in universal set. [at maximum 256 if every character is present]

In general, for pattern of length m,
hash = [ASCII of ‘pat[0]’] * t^(m-1) + [ASCII of ‘pat[1]’] * t^(m-2) + …….. + [ASCII of ‘pat[m-1]’] * t^(0).

We take modulo w (%w) where w is a large prime number because hash calculated in the above equation can become very large due to powers of t. Hence to fit hash in int variable we do modulo w.

NOTE – If we use small value of w then there will be many spurious hit.

Rolling Hash

Let us see an example to understand rolling hash,
text – “abdabc”
pattern – “abc”

For the sake of example instead of using ASCII value, I am using the following
a : 1
b : 2
c : 3
d : 4

text – “a b d a b c”
m = pattern length = 3
t = 4

hash_0 = 1 * 4^2 + 2 * 4^1 + 4 * 4^0 = 28
hash_1 = 4 * {hash_0 – 1 * 4^2} + 1 = 49
hash_2 = 4 * {hash_1 – 2 * 4^2} + 2 = 70
hash_3 = 4 * {hash_2 – 4 * 4^2} + 3 = 27

Hence in general
hash_i+1 = t * {hash_i – text[i] * t^(m-1)} + text[i+m]

Now that we have learnt all the basics let us move on to see the code impelentation.

#include <bits/stdc++.h> 
using namespace std; 
const int t = 256;
const int w = 1283;

void Rabin_Karp_Algorithm(string text,string pattern){
    
    //Length of text string.
    int tlen = text.length();
    
    //Length of pattern string.
    int plen = pattern.length();
    
    int flag;
    int c=1,i=0;
    
    //Calculate hash_p(hash value of pattern) and hash_0
    int hash_p=0,hash=0;
    while(i<plen){
        hash_p=(hash_p*t + pattern[i])%w;
        hash=(hash*t + text[i])%w;
        i++;
    }
    
     //Calculate (t^(plen-1))%w
    for(int i=1;i<=plen-1;i++)
        c=(c*t)%w;
    
    i=0;
    while(i<=tlen-plen){
       if(hash_p==hash){
            flag=1;
            for(int j=0;j<plen;j++){
                if(pattern[j]==text[i+j]){
                    continue;
                }
                else{
                    flag = 0;
                    break;
                }
           }
            if(flag==1)cout<<i<<" ";
       }
       
       
       //Calculate hash value of next window
       //hash_i+1 = t * {hash_i - text[i] * t^(plen-1)} + text[i+plen]
       if(i<tlen-plen){
           hash=((t*(hash-text[i]*c))+text[i+plen])%w;
            if(hash<0){
                hash = hash + w;
            }
       }
       i++;
    }
}

          
 
int main() 
{ 
    string text = "ababcdabcb"; 
    string pattern="abc";
    Rabin_Karp_Algorithm(text,pattern);
    return 0; 
}

Output

2 6

Hope this tutorial helped you. Thank You!

Rabin Karp algorithm for pattern matching in C++

Key Characteristics of Rabin Karp Algorithm

Computing Hash Values

Rolling Hash

Leave a Reply Cancel reply