Print all the Longest Common Subsequences in Lexicographical order in Python

In this tutorial, we first give a short description of what a subsequence and a longest common subsequence are, and then go straight to the code. In the code section, we first learn how to find the length of the longest common subsequence using recursion and dynamic programming. Then we see how to print all the longest common subsequences in lexicographical order in Python.

 

Subsequence: A subsequence of a string is a new string formed by deleting zero or more characters from the original string without changing the relative order of the remaining characters. For example:
Original string = “ABCDVWXYZ”
Valid subsequences = “ACDW”, “BYZ”, “ACWXYZ”, “ABCDVWXYZ”
Invalid subsequences = “VAYZ”, “DYAZ”, “XBACW”
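
The definition above can be checked mechanically. The following is a minimal sketch, not part of the original tutorial; the helper name is_subsequence is an assumption introduced here for illustration. It walks through the original string once and confirms that the characters of the candidate appear in the same relative order.

```python
def is_subsequence(candidate, original):
    """Return True if candidate can be formed by deleting characters from original."""
    remaining = iter(original)
    # `ch in remaining` consumes the iterator until it finds ch,
    # so every match has to come after the previous match.
    return all(ch in remaining for ch in candidate)


original = "ABCDVWXYZ"
for candidate in ["ACDW", "BYZ", "ACWXYZ", "ABCDVWXYZ", "VAYZ", "DYAZ", "XBACW"]:
    print(candidate, is_subsequence(candidate, original))
```

The first four candidates are reported as subsequences and the last three are not, because at least one of their characters appears out of order.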

 

Longest Common Subsequence (LCS): Given two or more sequences, the longest common subsequence problem asks for the longest subsequence that is common to all of them. The solution is not necessarily unique; several different common subsequences can share the maximum length.
For example:
Sequence1 = “BAHJDGSTAH”
Sequence2 = “HDSABTGHD”
Sequence3 = “ABTH”
Length of LCS = 3
LCS = “ATH”, “BTH”
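
For inputs as short as these, the example can be confirmed by brute force. The sketch below is only an illustration under stated assumptions: the names is_subsequence and brute_force_lcs are hypothetical, and the approach enumerates every subsequence of the shortest input, so it is exponential and only suitable for tiny strings.

```python
from itertools import combinations


def is_subsequence(candidate, original):
    remaining = iter(original)
    return all(ch in remaining for ch in candidate)


def brute_force_lcs(sequences):
    """Return (length, sorted list of all LCSs). Exponential; short inputs only."""
    shortest = min(sequences, key=len)
    # Try candidate lengths from longest to shortest and stop at the first hit.
    for length in range(len(shortest), 0, -1):
        found = set()
        for positions in combinations(range(len(shortest)), length):
            candidate = "".join(shortest[p] for p in positions)
            if all(is_subsequence(candidate, seq) for seq in sequences):
                found.add(candidate)
        if found:
            return length, sorted(found)
    return 0, [""]


length, all_lcs = brute_force_lcs(["BAHJDGSTAH", "HDSABTGHD", "ABTH"])
print(length, all_lcs)  # 3 ['ATH', 'BTH']
```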

Now we will see how to code the problem of the Longest Common Subsequence.

 

Length of the longest common subsequence in Python

To find the length of the longest common subsequence, two popular techniques are –

 

1. Recursion

In recursion, we start comparing the strings from the end, one character at a time. Let lcs be the function that finds the length of the longest subsequence common to two strings, where m and n are the lengths of the prefixes still under consideration. Two cases are possible:

  1.  Characters are the same: add 1 to the result of a recursive call in which the last character is removed from both strings, i.e. 1 + lcs(str1, str2, m-1, n-1).
  2.  Characters are different: take the maximum of two recursive calls, one with the last character of string 1 removed and one with the last character of string 2 removed.
def lcs(str1, str2, m, n):
    
    # Base case: one of the strings has no characters left
    if m == 0 or n == 0:
        return 0 
    
    # If the last characters are same
    elif str1[m-1] == str2[n-1]: 
        return 1+lcs(str1, str2, m-1, n-1) 
    
    # If the last characters are different
    else: 
        return max(lcs(str1, str2, m-1, n),lcs(str1, str2, m,n-1))
    
str1 = input("Enter first string: ")
str2 = input("Enter second string: ")

lcs_length = lcs(str1, str2, len(str1), len(str2))

print("length of LCS is : {}".format(lcs_length))

Output:

Enter first string: BCECBEC

Enter second string: CEEBC

length of LCS is : 4

A string of length n has 2^n possible subsequences. In the worst case, when the two strings have no characters in common and the length of the LCS is 0, every call branches into two more calls, so the running time grows exponentially and is bounded by O(2^(m+n)) for strings of lengths m and n. The recursion also computes many subproblems again and again, which wastes resources. To avoid this, we use dynamic programming.
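
One way to avoid the repeated subproblems while keeping the recursive structure is to cache the result of each (m, n) pair. The following is a minimal sketch, not the tutorial's own code: the name lcs_memo is an assumption, and the caching relies on functools.lru_cache from the standard library. Each pair is solved at most once, so the number of distinct calls is bounded by (m+1)(n+1).

```python
from functools import lru_cache


def lcs_memo(str1, str2):
    """Length of the LCS of str1 and str2 using top-down recursion with caching."""

    @lru_cache(maxsize=None)
    def solve(m, n):
        # Base case: one of the prefixes is empty
        if m == 0 or n == 0:
            return 0
        # Last characters match: extend the LCS of the shorter prefixes
        if str1[m - 1] == str2[n - 1]:
            return 1 + solve(m - 1, n - 1)
        # Last characters differ: drop one character from either string
        return max(solve(m - 1, n), solve(m, n - 1))

    return solve(len(str1), len(str2))


print(lcs_memo("BCECBEC", "CEEBC"))  # 4
```

Because this version is still recursive, very long strings can exceed Python's default recursion limit; the table-based approach described next does not have that restriction.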

 

2. Dynamic Programming

This technique follows the bottom-up approach. The solutions to the subproblems are stored in a matrix for future use. Filling the matrix bottom-up in this way is known as tabulation (caching the results of recursive calls, the top-down counterpart, is called memoization). If the lengths of the two strings are m and n respectively, then the time complexity of dynamic programming is O(mn), which is much less than that of plain recursion. The last element of the matrix holds the length of the LCS.

def lcs(str1 , str2):
    m = len(str1)
    n = len(str2)
    
    # matrix for storing solutions of the subproblems
    matrix = [[0]*(n+1) for i in range(m+1)] 
    
    for i in range(m+1):
        for j in range(n+1):
            
            if i==0 or j==0:
                matrix[i][j] = 0
                
            elif str1[i-1] == str2[j-1]:
                matrix[i][j] = 1 + matrix[i-1][j-1]
                
            else:
                matrix[i][j] = max(matrix[i-1][j] , matrix[i][j-1])
        
        # To see all the stages of the matrix formation                
        print(matrix)
        print(" ")
        
    return matrix[-1][-1]

str1 = input("Enter first string: ")
str2 = input("Enter second string: ")

lcs_length = lcs(str1, str2)

print("Length of LCS is : {}".format(lcs_length))

Output:

Enter first string: BCECBEC

Enter second string: CEEBC
[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2], [0, 1, 2, 2, 2, 2], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2], [0, 1, 2, 2, 2, 2], [0, 1, 2, 2, 2, 3], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2], [0, 1, 2, 2, 2, 2], [0, 1, 2, 2, 2, 3], [0, 1, 2, 2, 3, 3], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2], [0, 1, 2, 2, 2, 2], [0, 1, 2, 2, 2, 3], [0, 1, 2, 2, 3, 3], [0, 1, 2, 3, 3, 3], [0, 0, 0, 0, 0, 0]]

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2], [0, 1, 2, 2, 2, 2], [0, 1, 2, 2, 2, 3], [0, 1, 2, 2, 3, 3], [0, 1, 2, 3, 3, 3], [0, 1, 2, 3, 3, 4]]

Length of LCS is : 4

 

Finally, we have come to the last part of the tutorial. Now we will see how to print all the longest common subsequences in lexicographical order. Construct the 2D array as shown in the previous code and traverse it starting from the bottom-right cell. The function below reads this array through the global variable matrix and collects the results in a set, which is sorted at the end to obtain lexicographical order. Again, two cases are possible:

  1.  Last characters of both strings are the same: remove the last character from both strings, make a recursive call with the shortened strings, and append that character to every LCS returned by the call.
  2.  Last characters are different: an LCS can be constructed from the cell above the current cell, from the cell to its left, or from both. We move in the direction of the greater value, or in both directions if the two values are equal.
def printLcs(str1, str2, m, n): 
  
    # set to store all the possible LCS 
    s = set()  
    # Base case
    if m == 0 or n == 0: 
        s.add("") 
        return s 
  
    # If the last characters are same 
    if str1[m - 1] == str2[n - 1]: 
  
        # recurse with m-1 and n-1 in the matrix
        tmp = printLcs(str1, str2, m - 1, n - 1) 
  
        # append current character to all possible LCS of the two strings 
        for i in tmp: 
            s.add(i + str1[m - 1]) 
  
    # If the last characters are not same 
    else: 
  
        # If LCS can be constructed from top side of matrix
        if matrix[m - 1][n] >= matrix[m][n - 1]: 
            s = printLcs(str1, str2, m - 1, n) 
  
        # If LCS can be constructed from left side of matrix 
        if matrix[m][n - 1] >= matrix[m - 1][n]: 
            tmp = printLcs(str1, str2, m, n - 1) 
  
            # Merge the two sets if matrix[m-1][n] == matrix[m][n-1] 
            # s is still empty here if matrix[m-1][n] < matrix[m][n-1] 
            for i in tmp: 
                s.add(i) 
    return s 
  
# Build the DP matrix; its last cell holds the length of the LCS 
def lengthOfLcs(str1, str2): 
    m = len(str1)
    n = len(str2)
    matrix = [[0]*(n+1) for i in range(m+1)]
    
    for i in range(m + 1): 
        for j in range(n + 1): 
            
            if i == 0 or j == 0: 
                matrix[i][j] = 0
                
            elif str1[i - 1] == str2[j - 1]: 
                matrix[i][j] = matrix[i - 1][j - 1] + 1
                
            else: 
                matrix[i][j] = max(matrix[i - 1][j], 
                              matrix[i][j - 1]) 
    return matrix

str1 = input("Enter first string: ")
str2 = input("Enter second string: ")

matrix = lengthOfLcs(str1,str2) 
lcs = printLcs(str1,str2,len(str1),len(str2))
lcs  = sorted(lcs)

print("\nLength of LCS: {}".format(matrix[-1][-1]))
print("All the possible LCSs are: {}".format(lcs))

Output:

Enter first string: BCECBEC

Enter second string: CEEBC

Length of LCS: 4
All the possible LCSs are: ['CEBC', 'CEEC']
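
The function above reads matrix as a global variable and may revisit the same cell many times. As a hedged alternative, the sketch below bundles the table and the traversal into a single function and caches the set of LCSs for each cell. The names all_lcs, table and collect are assumptions introduced for illustration; the traversal rules are the same two cases described earlier.

```python
from functools import lru_cache


def all_lcs(str1, str2):
    """Return every LCS of str1 and str2 as a lexicographically sorted list."""
    m, n = len(str1), len(str2)

    # Build the table of LCS lengths for all pairs of prefixes
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    @lru_cache(maxsize=None)
    def collect(i, j):
        # Base case: an empty prefix has only the empty subsequence
        if i == 0 or j == 0:
            return frozenset({""})
        # Same last character: append it to every LCS of the shorter prefixes
        if str1[i - 1] == str2[j - 1]:
            return frozenset(p + str1[i - 1] for p in collect(i - 1, j - 1))
        # Different last characters: follow the larger neighbour, or both on a tie
        result = set()
        if table[i - 1][j] >= table[i][j - 1]:
            result |= collect(i - 1, j)
        if table[i][j - 1] >= table[i - 1][j]:
            result |= collect(i, j - 1)
        return frozenset(result)

    return sorted(collect(m, n))


print(all_lcs("BCECBEC", "CEEBC"))  # ['CEBC', 'CEEC']
```

Caching limits the number of cells visited, but the number of distinct LCSs can itself grow exponentially with the input length, so the output size remains the limiting factor.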

 

