Burrows Wheeler Transform in Python

In this tutorial, we will learn about burrows wheeler transform (BWT) in Python. As scary as this algorithm may look, it is simple as pie if we actually check it out.

What is BWT?

Invented in 1994 by Michael Burrows and David Wheeler, BWT is the transformation which structures the data in such a way that helps in efficient compression of data. It simply rearranges similar characters together. Hence, it is used in many compression algorithms. Let us solve BWT step by step by taking the example: codespeedy$

Step 1: Rotate the string by incrementing the character position by 1. Creating a table will make it easy.

codespeedy$
odespeedy$c
despeedy$co
espeedy$cod
speedy$code
peedy$codes
eedy$codesp
edy$codespe
dy$codespee
y$codespeed
$codespeedy

Step 2: Arrange the rows alphabetically (lexicographically-dictionary order). NOTE– Special characters have the first priority. So our next table would look something like this:

$codespeedy
codespeedy$
despeedy$co
dy$codespee
edy$codespe
eedy$codesp
espeedy$cod
odespeedy$c
peedy$codes
speedy$code
y$codespeed

Step 3: Finally, extract the characters of the last columns only.

y$oeepdcsed

And here we are with our transformed data, ready to be compressed. Easy peasy!

BWT – Burrows Wheeler Transform in Python

Let us now implement the same in Python. The code given below has been made in the simplest form for better understanding. We will follow the same basic steps as above. Here is the algorithm:

  1. Take input from the user.
  2. Convert the input string into list. This will help in rearranging the characters of the string.
  3. Create an empty list.
  4. Using for loop, rotate characters of the string cyclically and append them in the empty list.
  5. Sort the list alphabetically/lexicographically.
  6. Finally, take the last character of each element of the list. That will be our transformed data.

It will be much clearer when we implement the code part by part.

Part 1: The pre-requisites

a = input()
words = list(a)
list = []

Here, the input is taken from the user and converted that string into list. This will separate each characters and rearrangement of them will be easier. Also, an empty list is created.

Part 2: Rotation of the string

for i in range(len(words)):
    word = a[-1] + a[:-1]
    new = ''.join(word)
    a = new
    list.append(new)
    i += 1
print(list)

We’ve used the for loop which will iterate till the length of the list of the user input, i.e. the length of the string. a[-1] will give you the last character of the string. Whereas, a[:-1] will give all other characters except last one. Adding both of them will increment the position of each character by 1. Next, we use join() method to join all the characters. This new string is replaced by original input a. Further, it appended into the empty list using the append() method. Empty list was created in part 1.

Part 3: Sorting elements alphabetically/lexicographically

sort = sorted(list)
print(sort)

Simply, just use the sorted() function. Doing so will automatically return the elements of the list, sorted.

Part 4: Extracting last characters

for i in range(len(words)):
    element = sort[i]
    last = element[-1]
    i = i + 1
    print(last)

Here again, we use the for loop in range of the length of the input string. sort[i] will pick up one element of the sort list. Storing that element in a variable, element[-1] will give the last character of that element. And done!

The whole code:

a = input("Enter a string:")
words = list(a)
list = []

for i in range(len(words)):
    word = a[-1] + a[:-1]
    new = ''.join(word)
    a = new
    list.append(new)
    i += 1
print(list)

sort = sorted(list)
print(sort)

for i in range(len(words)):
    element = sort[i]
    last = element[- 1]
    i = i + 1
    print(last)

Output:

Enter a string:>? codespeedy$
['$codespeedy', 'y$codespeed', 'dy$codespee', 'edy$codespe', 'eedy$codesp', 'peedy$codes', 'speedy$code', 'espeedy$cod', 'despeedy$co', 'odespeedy$c', 'codespeedy$']
['$codespeedy', 'codespeedy$', 'despeedy$co', 'dy$codespee', 'edy$codespe', 'eedy$codesp', 'espeedy$cod', 'odespeedy$c', 'peedy$codes', 'speedy$code', 'y$codespeed']
y
$
o
e
e
p
d
c
s
e
d

Finally, we’ve successfully implemented Burrows Wheeler Transform in Python. There may be other ways to solve this problem.

Also read:

Leave a Reply

Your email address will not be published. Required fields are marked *