Construct a String from another String using Suffix Trie in Python
In this tutorial, we are going to take a look at the task of making a string from substrings of another string using a data structure known as Suffix Trie in Python. Specifically, we’ll look to build the string with a minimum number of substrings of the other string.
Let’s take a look at an example to understand our task better:
Consider these two strings:
String 1: ‘abbcba’
String 2: ‘bbacbcba’
And the task here is to construct String 1 from substrings of string 2 with a minimum number of substrings of string 2.
Although, there are a lot of possible ways to build the String 1 using substrings from String 2, the minimum number of strings would be used in the following solution:
‘a’, ‘bb’, ‘cba’ as the number of substrings used is 3 which is minimal.
Now, let’s briefly talk about what is a suffix trie?
What is a Suffix Trie?
A suffix trie is a retrieval data structure for strings. The suffix trie is a tree-like structure that stores all the suffixes of a given string. For example, the string ‘match’ would have the following strings stored in its suffix trie:
The suffix trie data structure for a string is useful for different kinds of query processing on a string such as finding a substring, counting the number of occurrences of a certain substring in the string, etc.
How to construct a string from substrings of another string using Suffix trie?
For this task, we’ll take the following steps:
- Build the suffix trie of the string from which we need to find the substrings (String 2).
- Start with the first letter of the string that needs to be built (String 1). Mark this as substring.
- Keep increasing the substring by 1 character at a time from string 1 until the substring is not found in the suffix trie of string 2.
- Remove the last letter of the substring due to which the substring was not found in String 2 and append this substring (if length>=1) (without the last letter) to our answer. But if the length of this substring is 0, then we can straightaway return False as this means that String 1 cannot be built with substrings from string 2.
- Check whether we’ve reached the end of the string 1. If reached the end then return the answer list and length of answer list (this is the number of substrings).
- Mark the removed letter from the previous string as start and repeat steps 2-5.
Implementing the code in Python
Below is the given Python program to construct a String from another String using Suffix Trie:
# Implementing Trie using Trie and TrieNode classes class TrieNode: # Trie node class def __init__(self): self.children = [None]*26 # property for representing the end of a word in the trie self.isEndOfWord = False class Trie: # Trie data structure class def __init__(self): self.root = self.getNode() def getNode(self): # Returns new trie node with Null values return TrieNode() def _charToIndex(self,ch): """ private helper function Converts key current character into index only chracters a-z allowed in lowercase """ return ord(ch)-ord('a') def insert(self,key): """ If word is already present in trie, just marks the leaf node to indicate If word isn't present then it creates the word in the trie """ word = self.root length = len(key) for level in range(length): index = self._charToIndex(key[level]) # if character is not present in trie if not word.children[index]: word.children[index] = self.getNode() word = word.children[index] # mark last node as leaf word.isEndOfWord = True def search(self, key): """ Search substring in the trie Returns true if substring is present in trie, else false """ word = self.root length = len(key) level=0 while level<length: index = self._charToIndex(key[level]) if not word.children[index]: return False word = word.children[index] level+=1 if level==length: return True else: return False #Task1 def build_from_substrings(S,T): """ Input: Two strings, S and T. The strings will consist only of lowercase a-z characters, and both will be non-empty, and can be of any size. Output: The smallest set of substrings of S which, when concatenated, equal T. In the form of a list of tuples. Complexity: O(N^2+M) where: • N is the number of characters in S • M is the number of characters in T """ # handling when length of S is 1 if len(S)==1: for i in range(len(T)): if T[i]!=S: return False return [(0,0)]*len(T) else: # creating suffix trie x=Trie() for i in range(len(S)): x.insert(S[i:]) start_pos=0 substrings= y=True k=1 # searching substrings in the trie while k<=len(T): y=x.search(T[start_pos:k]) if y==False: # when search is unsuccessful for a # single lettered substring then we know # that it doesn't exist in the word # so we return false if k==start_pos+1: return False elif k!=start_pos+1: # when search fails for a substring # greater than length =1 # then we use the completed search of # the substring before the last character # was added to the search # and append that substring to our result sub=T[start_pos:k-1] lt=len(sub) m=S.find(sub) substrings.append((m,m+lt-1)) start_pos=k-1 k=k-1 y=True elif y==True and k==len(T): # for the last substring of our word # we check whether we have reached the # last letter sub=T[start_pos:] lt=len(sub) m=S.find(sub) substrings.append((m,m+lt-1)) k=k+1 if y==True and substrings==: # handling the case when whole string exists # in the given word return [(S.find(T),len(T)-1)] else: return substrings
Let’s run the example given in the first section of this article:
[(2, 2), (0, 1), (5, 7)]
The output here is the required solution of the given task in the form of tuples where each tuple represents a substring and consists of a start index and an end index of the substring that it represents.
- (2,2) represents ‘a’
- (0,1) represents ‘bb’
- (5,7) represents ‘cba’
Thank you for sparing your valuable time to read this article. You can check out other articles as well: