Longest Unique Substring using Sliding Window in C++

In this tutorial, we will learn about the concept of Sliding Window and apply it to solve the Longest Unique Substring problem.

Given a string, the task is to find the longest unique substring in it. A unique substring of a string is one where there is no repeated character, that is, there is no character with a frequency greater than one. There may be multiple longest unique substrings in a string, we will find out any one of them.

Examples

  1. Input: “abba”
    Output: ab
    Explanation:    There are two unique substrings of length = 2. They are “ab”, “ba”.
                            There is no unique substring of length more than 3.
  2. Input: “ababc”.
    Output: abc
    Explanation:    There is only one unique substring of length = 3. It is “abc”.                                   
                            There is no unique substring of length more than 3.
  3. Input: “abcbdacbb”
    Output: cbda
    Explanation:    There are three unique substrings of length = 4.
                            They are “cbda”, “bdac”, “dacb”.
                            There are no unique substrings of length more than 4.

Terminologies

Window: – A window is a range of elements in an ordered and iterable data-structure, satisfying some given constraints. It has two main attributes – opening index (inclusive), closing index (exclusive). Window size is another attribute that can be derived from the main attributes.

                                    Window Size = Closing Index – Opening Index

The window is represented as [Opening Index, Closing Index).

Sliding Window: – A sliding window is an abstract problem-solving technique used to solve problems on ordered and iterable data-structures like arrays, strings, lists, etc. It can be used to solve problems based on sub-arrays or sub-strings.

The technique is to maintain a window by enforcing some constraints as per the problem. Whenever a constraint is violated, the algorithm “slides” the window (in one direction only) by changing the opening and closing indexes, such that the new window satisfies all the constraints.

The fact that the change is unidirectional (opening and closing indexes are always either increased or decreased) makes the time complexity of this technique O (N), N being the size of the string.

Idea

Now, let us come to the Longest Unique Substring problem.

Here, we will be applying the Sliding Window technique.

The constraint enforced on the window will be that the window will NOT contain any repeated characters.

We will maintain this constraint throughout the process and update our longest unique substring attributes as per the size of the windows.

Algorithm

  • Take care of the edge case. If the string is empty, return an empty string.

  • Since there are at most 256 characters (ASCII = 0 to 255), maintain a list of 256 indexes to store the index of the previous occurrence of a character while traversing the string. Initialize all elements in the list with -1.

  • Initialize the current window and largest window as [0, 1).

  • Update the previous occurrence of the first character as 0.

  • Traverse the string and in each iteration and do the following: –

    • If the next character in the string is already present in the current window, that is, the previous occurrence of the character is greater than or equal to the opening index of the current window, then update the opening index of the current window.

                      Opening Index = 1 + Previous occurrence of the character

    • Update the previous occurrence of the character.

    • Previous occurrence of the character = Closing Index

    • Update the closing index of the current window.

    • Update the largest window, if required.

C++ Implementation of the above Algorithm

Below is our C++ code:

#include <bits/stdc++.h>
using namespace std;

// Returns the longest unique substring in the string.
string longest_unique_substring (string str)
{
    int n = str.size();
    // Length of the string.
   
    // If the string is empty, return an empty string.
    if (n == 0) 
    {
        return "";
    }
       
    int opening_index, closing_index, window_size;
    // To store the opening index (included), closing index (excluded) and the size of the window.

    int max_length, largest_opening_index, largest_closing_index;
    // To store the largest window attributes.
    
    // Current window = [0..0]
    opening_index = 0, closing_index = 1;
    
    // Initialize largest window with current window.
    largest_opening_index = 0, largest_closing_index = 1;
    max_length = 1;
    
    vector <int> prev_occurence (256, -1);
    // Stores the previous occurence of a character while traversing.
        
    prev_occurence[str[0]] = 0;

    // Computing length of longest unique substring using sliding window concept.
    while (closing_index < n)
    {
        /*
            If the next character (say, c) is already present in the current window, that is, 
            the previous occurence of c is greater than or equal to the opening index of the 
            current window, then update the opening index of the current window.
        */
        if (prev_occurence[str[closing_index]] >= opening_index) 
        {
            opening_index = prev_occurence[str[closing_index]] + 1;
        }
        
        // Update the previous occurence of the character.
        prev_occurence[str[closing_index]] = closing_index;

        // Update closing index of the current window for the next iteration.
        closing_index += 1;

        // Compute the new window size.  
        window_size = closing_index - opening_index;
            
        // Update the largest window, if required.
        if (window_size > max_length)
        {
            max_length = window_size;
            largest_opening_index = opening_index;
            largest_closing_index = closing_index;
        }
    }
    
    // Extract the longest unique substring from the string. 
    string l (str, largest_opening_index, max_length);

    return l;
}

// Driver Code.
int main ()
{
    // Taking a string to test the code.
    string s = "abcbdacbb";
    
    // Computing the longest unique substring.
    string l = longest_unique_substring (s);
    
    cout << "Longest unique substring = " << l << "\n";
    
    cout << "Length of the longest unique substring = " << l.size();
   
    return 0;
}

Output

Longest unique substring = cbda
Length of the longest unique substring = 4

Complexity Analysis

The Sliding Window technique is unidirectional.
Therefore, the time complexity of the above code is O (N) [N being the size of the string].

We are using a vector of size 256 to store the previous occurrence of a character while traversing the string. But this size is independent of the size of the string.
Therefore, the space complexity of the above code is O (1).

Leave a Reply

Your email address will not be published. Required fields are marked *