Language Translator (RNN Bidirectional LSTMs and Attention) in Python

Hey ML enthusiasts, I hope you are safe and healthy. Do you know how Google Translate works? Let's find out.

In this article, we are going to create a language translator using recurrent Bidirectional LSTMs and the attention mechanism in Python. We will build a translator that can translate from English to Hindi.

You can download the dataset and notebook from my GitHub repo.

Encoder-Decoder Sequence to Sequence Model

For the language translator, we will use a Sequence-to-Sequence model, which consists of two recurrent neural networks known as the encoder and the decoder: we first encode the input sentence, and then, by passing the encoder's final cell states to the decoder, we decode the output sentence. On top of this, we use Bidirectional LSTMs and the attention mechanism, techniques that Google also uses.

LSTM Encoder-Decoder Sequence-to-Sequence Model

Requirements:

  • Tensorflow
  • Keras
  • Python 3.6

Code Overview and Explanation:

First, we import the required Python libraries.

import numpy as np
import pandas as pd
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.layers import Dense, TimeDistributed, Embedding
from tensorflow.keras.layers import Bidirectional, Concatenate, Attention
from sklearn.model_selection import train_test_split
from string import digits
import nltk
import re
import string
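
The preprocessing below works on a DataFrame named lines. The notebook loads it from the dataset linked above; as a minimal sketch, assuming a CSV file with english_sentence and hindi_sentence columns (the file name here is an assumption), the loading step might look like this:

# Load the parallel corpus into a DataFrame named `lines`.
# The file name is an assumption; the column names match the preprocessing below.
lines = pd.read_csv('Hindi_English_Truncated_Corpus.csv', encoding='utf-8')
lines = lines[['english_sentence', 'hindi_sentence']].dropna()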

The main task for any text-based model is text preprocessing. The Python code is given below:

# Lowercase all characters
lines['english_sentence']=lines['english_sentence'].apply(lambda x: x.lower())
lines['hindi_sentence']=lines['hindi_sentence'].apply(lambda x: x.lower())

# Remove quotes
lines['english_sentence']=lines['english_sentence'].apply(lambda x: re.sub("'", '', x))
lines['hindi_sentence']=lines['hindi_sentence'].apply(lambda x: re.sub("'", '', x))

# Remove all the special characters
exclude = set(string.punctuation) # Set of all special characters
lines['english_sentence']=lines['english_sentence'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
lines['hindi_sentence']=lines['hindi_sentence'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))

# Remove digits (this is where the `digits` import is used)
remove_digits = str.maketrans('', '', digits)
lines['english_sentence']=lines['english_sentence'].apply(lambda x: x.translate(remove_digits))
lines['hindi_sentence']=lines['hindi_sentence'].apply(lambda x: x.translate(remove_digits))

Now, we will create the vocabulary for the English and Hindi languages.

### Get English and Hindi Vocabulary
all_eng_words = set()
for eng in lines['english_sentence']:
    all_eng_words.update(eng.split())   # a set ignores duplicate words automatically

all_hindi_words = set()
for hin in lines['hindi_sentence']:
    all_hindi_words.update(hin.split())

Now, we create dictionaries that assign each word a number for model training. Indexes start at 1, leaving 0 free (typically for padding).

# Sorted vocabulary lists, built from the sets above
input_words = sorted(list(all_eng_words))
target_words = sorted(list(all_hindi_words))

input_token_index = dict([(word, i+1) for i, word in enumerate(input_words)])
target_token_index = dict([(word, i+1) for i, word in enumerate(target_words)])

In the code above, input_token_index is the word-to-index dictionary for English, and target_token_index is the one for Hindi.
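
When turning predictions back into words, it also helps to keep the reverse lookups around (a small sketch; these exact names are an assumption):

# Reverse lookups: index -> word, used to turn predictions back into text
reverse_input_token_index = dict((i, word) for word, i in input_token_index.items())
reverse_target_token_index = dict((i, word) for word, i in target_token_index.items())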

The Architecture of the Language Translator Model

As discussed, the model combines two sub-models in one: an encoder and a decoder. In the encoder, we use 3 Bidirectional LSTM layers, and in the decoder, we use 1 LSTM layer. This choice is not fixed; you have to experiment to get a good accuracy score.
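
The architecture code below relies on a few values that are worth pinning down first. This is a sketch consistent with the rest of the post: the vocabulary sizes come from the sets built earlier, and latent_dim must be 300 so that the concatenated bidirectional states match the decoder's 600 units.

# Assumed setup, consistent with the rest of the post:
num_encoder_tokens = len(all_eng_words) + 1   # +1 for the padding index 0
num_decoder_tokens = len(all_hindi_words) + 1
latent_dim = 300  # per direction; concatenated states are 2 * 300 = 600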

encoder_inputs = Input(shape=(25,))  # input sequences padded to a fixed length of 25 tokens

# Embedding Layer
embedding_1 = Embedding(num_encoder_tokens,128)
embedding_1 = embedding_1(encoder_inputs)

# Adding 1st Bidirectional Layer
encoder_1 = Bidirectional(LSTM(latent_dim,return_state=True,return_sequences=True))
encoder_1_output_1,forward_h1,forward_c1,backward_h1,backward_c1 = encoder_1(embedding_1)

# Adding 2nd Bidirectional Layer
encoder_2 = Bidirectional(LSTM(latent_dim,return_state=True,return_sequences=True))
encoder_2_output_2,forward_h2,forward_c2,backward_h2,backward_c2 = encoder_2(encoder_1_output_1)

# Adding 3rd Bidirectional Layer
encoder_3 = Bidirectional(LSTM(latent_dim,return_state=True,return_sequences=True))
encoder_3_output_3,forward_h3,forward_c3,backward_h3,backward_c3 = encoder_3(encoder_2_output_2)

# Concatenating the final forward and backward states
state_h = Concatenate()([forward_h3,backward_h3])
state_c = Concatenate()([forward_c3,backward_c3])

encoder_states = [state_h,state_c]

Embedding Layer: it turns positive integers (word indexes) into dense vectors of a fixed size. You can read about it in detail here.
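
As a quick standalone illustration of what an Embedding layer does (a toy example, not part of the translator itself):

import numpy as np
from tensorflow.keras.layers import Embedding

# Map integer indexes 0..9 to 4-dimensional dense vectors
emb = Embedding(input_dim=10, output_dim=4)
print(emb(np.array([[1, 2, 3]])).shape)  # (1, 3, 4): one vector per index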

The Bidirectional LSTM layers are stacked so that each layer's output sequence feeds the next; the last layer provides the hidden and cell states that are concatenated and passed to the decoder model, as we discussed above.

Now, let's see the decoder model.

# Decoder
decoder_inputs = Input(shape=(None,))
embedding_2 = Embedding(num_decoder_tokens,128)

dec_emb = embedding_2(decoder_inputs)
decoder_lstm = LSTM(600, return_sequences=True, return_state=True)  # 600 = 2 * latent_dim, matching encoder_states
decoder_lstm_output, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

# Attention expects [query, value]: the decoder outputs attend over the encoder outputs
attention = Attention()([decoder_lstm_output, encoder_3_output_3])
decoder_concat_output = Concatenate()([decoder_lstm_output, attention])
decoder_outputs = TimeDistributed(Dense(num_decoder_tokens, activation='softmax'))(decoder_concat_output)

We have introduced an attention layer that helps the decoder focus on the most relevant input words at each decoding step, since the meaning of a sentence is usually carried by a few key words.
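
To see the [query, value] convention of the Keras Attention layer in isolation (a toy example; the shapes are arbitrary):

import numpy as np
from tensorflow.keras.layers import Attention

query = np.random.rand(1, 5, 8).astype('float32')   # e.g. 5 decoder steps
value = np.random.rand(1, 25, 8).astype('float32')  # e.g. 25 encoder steps
context = Attention()([query, value])
print(context.shape)  # (1, 5, 8): one context vector per decoder step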

Now we train our model for 100 epochs, and voila, we have achieved an accuracy of 70%.
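
The training call itself is not shown here; a minimal sketch, assuming a batch generator train_gen (the same one used for the prediction below) and assumed train_samples and batch_size values, might look like this:

# Assemble the full trainable model from the encoder and decoder graphs above
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',  # assumes one-hot decoder targets
              metrics=['accuracy'])

# train_gen, train_samples and batch_size are assumptions, not shown in the post
model.fit(train_gen,
          steps_per_epoch=train_samples // batch_size,
          epochs=100)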

Let's see a prediction:

# decode_sequence (greedy decoding with the trained model) and train_gen
# are defined in the full notebook linked above
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])
print('Predicted Hindi Translation:', decoded_sentence[:-4])
Output:

Input English sentence: deep shade of white mausoleum could clearly be seen in the lake
Actual Hindi Translation: श्वेत मकबरे की गहरी छाया को स्पष्ट देखा जा सकता था उस सरोवर में।
Predicted Hindi Translation: श्वेत मकबरे की गहरी छाया को स्पष्ट देखा जा सकत


Now, you can experiment with the model to reach a higher accuracy, as experimenting is the only way to improve it. If you have any doubts, please share your feedback in the comments box!

Also, read: Real time object detection using TensorFlow in Python
