Longest Proper Prefix Suffix Array in C++ efficient approach(precursor to KMP algorithm)

In this tutorial, we are going to learn about how to construct the longest proper prefix suffix array(LPS array) in C++. In this blog, we are going to discuss an efficient solution. If you want, read about naive approach first then click here. The efficient algorithm has Θ(n) time complexity.

For a string “VTRDF” the prefix are – “”, “V”, “VT”, “VTR”, “VTRD”, “VTRDF”
but proper prefix are – “”, “V”, “VT”, “VTR”, “VTRD” (length of proper prefix is less than length of given string)
similarly the suffix are – “”, “F”, “DF”, “RDF”, “TRDF”, “VTRDF”.

Let us see an example:

We are given a string text as input and we need to construct an LPS array.
Input : “abacabad”

Output : LPS array is  0 0 1 0 1 2 3 0
LPS[i] stores the length of the longest proper prefix suffix of text[0…i]
for text = “pqprpqps”
LPS[0] = 0 (because length of proper prefix of any single character string is always 0)
LPS[1] = 0 (for “pq” there is no prefix which is also a suffix.)
LPS[2] = 1 (for “pqp” longest prefix which is also a suffix =”p”)
LPS[3] = 0 (for “pqpr” there is no prefix which is also a suffix)
LPS[4] = 1 (for “pqprp” longest prefix which is also a suffix =”p”)
LPS[5] = 2 (for “pqprpq” longest prefix which is also a suffix =”pq”)
LPS[6] = 3 (for “pqprpqp” longest prefix which is also a suffix =”pqp”)
LPS[7] = 0 (for “pqprpqps” there is no prefix which is also a suffix)

Basic Idea: Longest Proper Prefix Suffix Array in C++

There are two cases for the approach

For any index i in the input string text let us consider len = Lps at index (i-1) = Lps[i-1]

Case 1: Now if text[len] and text[i] are same then Lps[i] = len + 1

Case 2:  If text[len] and text[i] are not same

Subcase a – If len =0 then Lps[i] = 0
Subcase b – If len != 0 then recursively call the function for len = lps[len-1] and compare text[i] and text[len].

Let us see the code implementation.

#include <iostream>
#include <bits/stdc++.h> 
using namespace std;
void lps_func(string txt, vector<int>&Lps){
    Lps[0] = 0;                   //Lps[0] is always 0
    int len = 0;
    int i=1;
    while (i<txt.length()){
        if(txt[i]==txt[len]){    //Case 1
            len++;
            Lps[i] = len;
            i++;
            continue;
        }
        else{                   //Case 2
            if(len==0){         //Subcase 1
                Lps[i] = 0;
                i++;
                continue;
            }
            else{              //Subcase 2
                len = Lps[len-1];
                continue;
            }
        }
    }
}

int main()
{
    string text = "pqprpqps";
    vector<int>Lps(text.length());
    lps_func(text,Lps);
    cout << "LPS array is ";
    for(int i=0;i<text.length();i++){
        cout<<Lps[i]<<" ";
    }
    
    return 0; 
}

Output –

LPS array is 0 0 1 0 1 2 3 0

Understanding the Algorithm

Let the input text be “pqprpqps

We explicitly put Lps[0] = 0, initialize len=0 and run a loop from i=1 to i=length of text -1.

i=1 => String being considered is “pq“. We compare text[len] (len=0) and text[i], and we fall into case 2 (subcase-a) hence we put Lps[1] = 0 and move on to next character at i=2.

i=2 => String being considered is “pqp“. We compare text[len] which is text[0] and text[2], and we fall into case 1. Therefore Lps[2] = 1, and update len = 1.

i=3 => The concerned string is “pqpr“. Compare text[len] (len=1) which is text[1] and text[3], and we fall into case 2 (subcase-b). Now new len = Lps[len-1] = Lps[0] = 0.  Again we run through the while loop and compare text[len] which is text[0] and text[3], and we fall into case 2 (subcase-a). Thus Lps[3] = 0 and len =0. Note that here we are going through while loop twice for a single character.

i=4 => For i =4 string is “pqprp” and len = 0. Compare text[0] and text[4], it falls to case 1 thus Lps[4] = 1 and len=1.

i=5 =>Here the string becomes “pqprpq” and len =1. Compare text[1] and text[5], and we fall into case 1 thus Lps[5] =2 and len = 2.

i=6 => The concerned string is “pqprpqp” and len =2. Compare text[2] and text[6], and we fall into case 1 thus Lps[6] =3 and len = 3.

i=7 => String becomes “pqprpqps” and len =3. Comparing text[3] and text[7], we see that it falls into case 2 (subcase-b). Now we update new len = Lps[len-1] = Lps[2] = 1. Going through while loop again and comparing text[1] and text[7] we fall into case 2(subcase-b). Thus we update new len = Lps[len-1] = Lps[0] = 0. Once more we traverse through while loop and compare text[0] and text[7] and go into case 1. Thus Lps[7] = 0 and we terminate the loop.

Note

For i = 3 and i = 7 we fall into case 2(subcase-b). For such cases, we iterate through while loop again till we fall into one of either case 1 or case 2 (subcase-a). And it may seem that the time complexity should be O(n^2) but if we look carefully whenever we fall into case 2 (subcase-b) we are going to do at most len matches again. Hence we might have to travel at most len characters again. Thus overall the time complexity is Θ(n).

Hope this tutorial helped you. Thank You!

Also read: How to Sort an Array contains 0s,1s ,2s in C++

Leave a Reply

Your email address will not be published. Required fields are marked *