Construct a String from another String using Suffix Trie in Python

In this tutorial, we are going to take a look at the task of making a string from substrings of another string using a data structure known as Suffix Trie in Python. Specifically, we’ll look to build the string with a minimum number of substrings of the other string. 

Let’s take a look at an example to understand our task better:

Consider these two strings: 

String 1: ‘abbcba’

String 2: ‘bbacbcba’

And the task here is to construct String 1 from substrings of string 2 with a minimum number of substrings of string 2.

Although, there are a lot of possible ways to build the String 1 using substrings from String 2, the minimum number of strings would be used in the following solution:

‘a’, ‘bb’, ‘cba’ as the number of substrings used is 3 which is minimal.

 

Now, let’s briefly talk about what is a suffix trie?

What is a Suffix Trie?

A suffix trie is a retrieval data structure for strings. The suffix trie is a tree-like structure that stores all the suffixes of a given string. For example, the string ‘match’ would have the following strings stored in its suffix trie:

  • match
  • atch
  • tch
  • ch
  • h

 

The suffix trie data structure for a string is useful for different kinds of query processing on a string such as finding a substring, counting the number of occurrences of a certain substring in the string, etc.

How to construct a string from substrings of another string using Suffix trie?

For this task, we’ll take the following steps:

  1. Build the suffix trie of the string from which we need to find the substrings (String 2). 
  2. Start with the first letter of the string that needs to be built (String 1). Mark this as substring.
  3. Keep increasing the substring by 1 character at a time from string 1 until the substring is not found in the suffix trie of string 2.
  4. Remove the last letter of the substring due to which the substring was not found in String 2 and append this substring (if length>=1) (without the last letter) to our answer. But if the length of this substring is 0, then we can straightaway return False as this means that String 1 cannot be built with substrings from string 2.
  5. Check whether we’ve reached the end of the string 1. If reached the end then return the answer list and length of answer list (this is the number of substrings).
  6. Mark the removed letter from the previous string as start and repeat steps 2-5.

 

Implementing the code in Python

Below is the given Python program to construct a String from another String using Suffix Trie:

# Implementing Trie using Trie and TrieNode classes
class TrieNode: 
      
    # Trie node class 
    def __init__(self): 
        self.children = [None]*26
  
        # property for representing the end of a word in the trie 
        self.isEndOfWord = False
  
class Trie: 
      
    # Trie data structure class 
    def __init__(self): 
        self.root = self.getNode() 
  
    def getNode(self): 
      
        # Returns new trie node with Null values
        return TrieNode() 
  
    def _charToIndex(self,ch): 
          
        """ private helper function 
            Converts key current character into index 
            only chracters a-z allowed in lowercase
        """
          
        return ord(ch)-ord('a') 
  
  
    def insert(self,key): 
          
        """ If word is already present in trie,
            just marks the leaf node to indicate
            If word isn't present then it creates the word in the trie
        """
        word = self.root 
        length = len(key) 
        for level in range(length): 
            index = self._charToIndex(key[level]) 
  
            # if character is not present in trie 
            if not word.children[index]: 
                word.children[index] = self.getNode() 
            word = word.children[index] 
  
        # mark last node as leaf 
        word.isEndOfWord = True
  
    def search(self, key): 
          
        """ Search substring in the trie 
            Returns true if substring is present 
            in trie, else false
        """
        word = self.root 
        length = len(key)
        level=0
        while level<length: 
            index = self._charToIndex(key[level]) 
            if not word.children[index]: 
                return False
            word = word.children[index]
            level+=1

        if level==length:
            return True
        else:
            return False
       

#Task1

def build_from_substrings(S,T):
    """
        Input: Two strings, S and T. The strings
        will consist only of lowercase a-z characters,
        and both will be non-empty, and can be of any size.
        Output: The smallest set of substrings of S which, when
        concatenated, equal T. In the form of a list of tuples.
        Complexity: O(N^2+M) where:
            • N is the number of characters in S
            • M is the number of characters in T
    """
    # handling when length of S is 1
    if len(S)==1:
        for i in range(len(T)):
            if T[i]!=S:
                return False
        return [(0,0)]*len(T)
    else:
        # creating suffix trie
        x=Trie()
        for i in range(len(S)):
            x.insert(S[i:])

        start_pos=0
        substrings=[]
        y=True
        k=1

        # searching substrings in the trie
        while k<=len(T):
            y=x.search(T[start_pos:k])
            if y==False:
                # when search is unsuccessful for a
                # single lettered substring then we know
                # that it doesn't exist in the word
                # so we return false
                if k==start_pos+1:
                    return False
                
                elif k!=start_pos+1:
                    # when search fails for a substring
                    # greater than length =1
                    # then we use the completed search of
                    # the substring before the last character
                    # was added to the search
                    # and append that substring to our result
                    sub=T[start_pos:k-1]
                    lt=len(sub)
                    m=S.find(sub)
                    substrings.append((m,m+lt-1))
                    start_pos=k-1
                    k=k-1
                    y=True
            elif y==True and k==len(T):
                # for the last substring of our word
                # we check whether we have reached the
                # last letter
                sub=T[start_pos:]
                lt=len(sub)
                m=S.find(sub)
                substrings.append((m,m+lt-1))
            k=k+1

        if y==True and substrings==[]:
            # handling the case when whole string exists
            # in the given word
            return [(S.find(T),len(T)-1)]
        else:
            return substrings

 

Let’s run the example given in the first section of this article:

build_from_substrings('bbacbcba','abbcba')

Output:

[(2, 2), (0, 1), (5, 7)]

The output here is the required solution of the given task in the form of tuples where each tuple represents a substring and consists of a start index and an end index of the substring that it represents.

  • (2,2) represents ‘a’
  • (0,1) represents ‘bb’
  • (5,7) represents ‘cba’

 

Thank you for sparing your valuable time to read this article. You can check out other articles as well:

 

Leave a Reply

Your email address will not be published. Required fields are marked *