Construct a String from another String using Suffix Trie in Python
In this tutorial, we are going to take a look at the task of making a string from substrings of another string using a data structure known as Suffix Trie in Python. Specifically, we’ll look to build the string with a minimum number of substrings of the other string.
Let’s take a look at an example to understand our task better:
Consider these two strings:
String 1: ‘abbcba’
String 2: ‘bbacbcba’
And the task here is to construct String 1 from substrings of string 2 with a minimum number of substrings of string 2.
Although, there are a lot of possible ways to build the String 1 using substrings from String 2, the minimum number of strings would be used in the following solution:
‘a’, ‘bb’, ‘cba’ as the number of substrings used is 3 which is minimal.
Now, let’s briefly talk about what is a suffix trie?
What is a Suffix Trie?
A suffix trie is a retrieval data structure for strings. The suffix trie is a tree-like structure that stores all the suffixes of a given string. For example, the string ‘match’ would have the following strings stored in its suffix trie:
- match
- atch
- tch
- ch
- h
The suffix trie data structure for a string is useful for different kinds of query processing on a string such as finding a substring, counting the number of occurrences of a certain substring in the string, etc.
How to construct a string from substrings of another string using Suffix trie?
For this task, we’ll take the following steps:
- Build the suffix trie of the string from which we need to find the substrings (String 2).
- Start with the first letter of the string that needs to be built (String 1). Mark this as substring.
- Keep increasing the substring by 1 character at a time from string 1 until the substring is not found in the suffix trie of string 2.
- Remove the last letter of the substring due to which the substring was not found in String 2 and append this substring (if length>=1) (without the last letter) to our answer. But if the length of this substring is 0, then we can straightaway return False as this means that String 1 cannot be built with substrings from string 2.
- Check whether we’ve reached the end of the string 1. If reached the end then return the answer list and length of answer list (this is the number of substrings).
- Mark the removed letter from the previous string as start and repeat steps 2-5.
Implementing the code in Python
Below is the given Python program to construct a String from another String using Suffix Trie:
# Implementing Trie using Trie and TrieNode classes
class TrieNode:
# Trie node class
def __init__(self):
self.children = [None]*26
# property for representing the end of a word in the trie
self.isEndOfWord = False
class Trie:
# Trie data structure class
def __init__(self):
self.root = self.getNode()
def getNode(self):
# Returns new trie node with Null values
return TrieNode()
def _charToIndex(self,ch):
""" private helper function
Converts key current character into index
only chracters a-z allowed in lowercase
"""
return ord(ch)-ord('a')
def insert(self,key):
""" If word is already present in trie,
just marks the leaf node to indicate
If word isn't present then it creates the word in the trie
"""
word = self.root
length = len(key)
for level in range(length):
index = self._charToIndex(key[level])
# if character is not present in trie
if not word.children[index]:
word.children[index] = self.getNode()
word = word.children[index]
# mark last node as leaf
word.isEndOfWord = True
def search(self, key):
""" Search substring in the trie
Returns true if substring is present
in trie, else false
"""
word = self.root
length = len(key)
level=0
while level<length:
index = self._charToIndex(key[level])
if not word.children[index]:
return False
word = word.children[index]
level+=1
if level==length:
return True
else:
return False
#Task1
def build_from_substrings(S,T):
"""
Input: Two strings, S and T. The strings
will consist only of lowercase a-z characters,
and both will be non-empty, and can be of any size.
Output: The smallest set of substrings of S which, when
concatenated, equal T. In the form of a list of tuples.
Complexity: O(N^2+M) where:
• N is the number of characters in S
• M is the number of characters in T
"""
# handling when length of S is 1
if len(S)==1:
for i in range(len(T)):
if T[i]!=S:
return False
return [(0,0)]*len(T)
else:
# creating suffix trie
x=Trie()
for i in range(len(S)):
x.insert(S[i:])
start_pos=0
substrings=[]
y=True
k=1
# searching substrings in the trie
while k<=len(T):
y=x.search(T[start_pos:k])
if y==False:
# when search is unsuccessful for a
# single lettered substring then we know
# that it doesn't exist in the word
# so we return false
if k==start_pos+1:
return False
elif k!=start_pos+1:
# when search fails for a substring
# greater than length =1
# then we use the completed search of
# the substring before the last character
# was added to the search
# and append that substring to our result
sub=T[start_pos:k-1]
lt=len(sub)
m=S.find(sub)
substrings.append((m,m+lt-1))
start_pos=k-1
k=k-1
y=True
elif y==True and k==len(T):
# for the last substring of our word
# we check whether we have reached the
# last letter
sub=T[start_pos:]
lt=len(sub)
m=S.find(sub)
substrings.append((m,m+lt-1))
k=k+1
if y==True and substrings==[]:
# handling the case when whole string exists
# in the given word
return [(S.find(T),len(T)-1)]
else:
return substrings
Let’s run the example given in the first section of this article:
build_from_substrings('bbacbcba','abbcba')Output:
[(2, 2), (0, 1), (5, 7)]
The output here is the required solution of the given task in the form of tuples where each tuple represents a substring and consists of a start index and an end index of the substring that it represents.
- (2,2) represents ‘a’
- (0,1) represents ‘bb’
- (5,7) represents ‘cba’
Thank you for sparing your valuable time to read this article. You can check out other articles as well:
Leave a Reply