Language Translator (RNN BiDirectional LSTMs and Attention) in Python
Hey ML Enthusiasts, I hope you are safe and healthy. Have you ever wondered how Google Translate works? Here we go.
In this article, we are going to create a language translator in Python using recurrent bidirectional LSTMs and an attention mechanism. Our translator will translate from English to Hindi.
Encoder-Decoder Sequence to Sequence Model
For the language translator, we will use a sequence-to-sequence model, which consists of two recurrent neural networks known as the encoder and the decoder. The encoder first encodes the input sentence, and its final hidden and cell states are passed to the decoder, which then decodes (generates) the translated sentence. On top of this we use bidirectional LSTMs and an attention mechanism, techniques that Google's translation systems have also relied on.
Code Overview and Explanation:
First, we import the required Python libraries.
import numpy as np
import pandas as pd
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.layers import Dense, TimeDistributed, Embedding
from tensorflow.keras.layers import Bidirectional, Concatenate, Attention
from sklearn.model_selection import train_test_split
from string import digits
import nltk
import re
import string
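The snippets below assume a pandas DataFrame named lines with english_sentence and hindi_sentence columns. The loading step isn't shown in the article, so here is a minimal sketch; the file name is hypothetical and depends on the parallel corpus you use:

# Hypothetical file: a CSV with one English and one Hindi sentence per row
lines = pd.read_csv('english_hindi_corpus.csv', encoding='utf-8')
lines = lines[['english_sentence', 'hindi_sentence']].dropna()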
The first major task for any text-based model, translation included, is text preprocessing. The Python code below lowercases the sentences, removes quotes, and strips punctuation:
# Lowercase all characters
lines['english_sentence'] = lines['english_sentence'].apply(lambda x: x.lower())
lines['hindi_sentence'] = lines['hindi_sentence'].apply(lambda x: x.lower())

# Remove quotes
lines['english_sentence'] = lines['english_sentence'].apply(lambda x: re.sub("'", '', x))
lines['hindi_sentence'] = lines['hindi_sentence'].apply(lambda x: re.sub("'", '', x))

# Remove all the special characters
exclude = set(string.punctuation)  # Set of all special characters
lines['english_sentence'] = lines['english_sentence'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
lines['hindi_sentence'] = lines['hindi_sentence'].apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
Now, we will create the vocabularies for the English and Hindi languages.
### Get English and Hindi Vocabulary
all_eng_words = set()
for eng in lines['english_sentence']:
    for word in eng.split():
        if word not in all_eng_words:
            all_eng_words.add(word)

all_hindi_words = set()
for hin in lines['hindi_sentence']:
    for word in hin.split():
        if word not in all_hindi_words:
            all_hindi_words.add(word)
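The later snippets also use input_words, target_words, num_encoder_tokens and num_decoder_tokens, which are never defined in the article. A plausible sketch, assuming index 0 is reserved for padding (which is why the token indexes below start at 1):

# Sorted vocabularies, so the word-to-index mapping is reproducible
input_words = sorted(list(all_eng_words))
target_words = sorted(list(all_hindi_words))

# +1 because index 0 is reserved for padding
num_encoder_tokens = len(all_eng_words) + 1
num_decoder_tokens = len(all_hindi_words) + 1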
Now, we have to create dictionaries in which every word is assigned a number for model training.
input_token_index = dict([(word, i + 1) for i, word in enumerate(input_words)])
target_token_index = dict([(word, i + 1) for i, word in enumerate(target_words)])
In the code above, input_token_index is the word-to-index dictionary for the English language, and target_token_index is the one for the Hindi language.
The Architecture of the Language Translator Model
As we discussed, the model is really two models in one: an encoder and a decoder. In the encoder we will use 3 bidirectional LSTM layers, and in the decoder we will use 1 LSTM layer. These numbers are not fixed; you have to experiment to get a good accuracy score.
# latent_dim is assumed to be 300, so the concatenated forward/backward
# states (2 * latent_dim = 600) match the decoder LSTM size used below
latent_dim = 300

encoder_inputs = Input(shape=(25,))

# Embedding layer
embedding_1 = Embedding(num_encoder_tokens, 128)
enc_emb = embedding_1(encoder_inputs)

# Adding 1st Bidirectional layer
encoder_1 = Bidirectional(LSTM(latent_dim, return_state=True, return_sequences=True))
encoder_1_output_1, forward_h1, forward_c1, backward_h1, backward_c1 = encoder_1(enc_emb)

# Adding 2nd Bidirectional layer
encoder_2 = Bidirectional(LSTM(latent_dim, return_state=True, return_sequences=True))
encoder_2_output_2, forward_h2, forward_c2, backward_h2, backward_c2 = encoder_2(encoder_1_output_1)

# Adding 3rd Bidirectional layer
encoder_3 = Bidirectional(LSTM(latent_dim, return_state=True, return_sequences=True))
encoder_3_output_3, forward_h3, forward_c3, backward_h3, backward_c3 = encoder_3(encoder_2_output_2)

# Concatenation layers: join the forward and backward states of the last layer
state_h = Concatenate()([forward_h3, backward_h3])
state_c = Concatenate()([forward_c3, backward_c3])
encoder_states = [state_h, state_c]
Embedding layer: it turns positive integers (word indexes) into dense vectors of a fixed size. You can read about it in detail in the Keras documentation.
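A quick standalone illustration (with made-up sizes) of what the embedding layer does; the vectors are random at initialization and are learned during training:

import numpy as np
from tensorflow.keras.layers import Embedding

emb = Embedding(input_dim=100, output_dim=8)   # vocabulary of 100 words, 8-dim vectors
vectors = emb(np.array([[1, 5, 9]]))           # one sequence of three word indexes
print(vectors.shape)                           # (1, 3, 8): one vector per index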
The bidirectional LSTM layers are stacked, each feeding its output sequence into the next. The last layer provides the hidden and cell states, which are concatenated and connected to the decoder model as its initial state, as we discussed above.
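If you want to convince yourself of the wiring, the tensor shapes (assuming latent_dim = 300 as above) line up like this:

print(encoder_3_output_3.shape)  # (None, 25, 600): per-timestep encoder outputs
print(state_h.shape)             # (None, 600): concatenated forward/backward hidden state
print(state_c.shape)             # (None, 600): concatenated forward/backward cell state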
Now, let's see the decoder model.
# Decoder
decoder_inputs = Input(shape=(None,))
embedding_2 = Embedding(num_decoder_tokens, 128)
dec_emb = embedding_2(decoder_inputs)

decoder_lstm = LSTM(600, return_sequences=True, return_state=True)
decoder_lstm_output, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

# Keras' Attention layer takes [query, value]: the decoder output
# attends over the encoder output
attention = Attention()([decoder_lstm_output, encoder_3_output_3])
decoder_concat_output = Concatenate()([decoder_lstm_output, attention])

# Kept in a variable so the layer can be reused for inference later
decoder_dense = TimeDistributed(Dense(num_decoder_tokens, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_output)
We have introduced an attention layer, which lets the decoder focus on the most relevant words of the input at each step, since usually only a few source words matter for predicting each translated word.
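Under the hood, Keras' default Attention layer computes dot-product (Luong-style) attention. Here is a minimal numpy sketch of the idea for a single example, just to make the mechanism concrete:

import numpy as np

def dot_product_attention(query, value):
    # query: (T_dec, d) decoder outputs, value: (T_enc, d) encoder outputs
    scores = query @ value.T                        # similarity of each decoder step to each encoder step
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over encoder steps
    return weights @ value                          # weighted sum of encoder outputs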
Now we will train our model for 100 epochs, and voila, we achieve an accuracy of about 70%.
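The training wiring itself isn't shown in the article, so here is a minimal, hedged sketch of how the model above can be assembled and trained. train_gen (a generator yielding ([encoder_batch, decoder_batch], target_batch)) and steps_per_epoch are assumptions, not code from the original:

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Assumed: train_gen yields ([encoder_input, decoder_input], decoder_target)
# batches, and steps_per_epoch = number_of_training_samples // batch_size
model.fit(train_gen, steps_per_epoch=steps_per_epoch, epochs=100)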
Let's see the prediction:
# k is assumed to be an index into the training set, picked beforehand
(input_seq, actual_output), _ = next(train_gen)
decoded_sentence = decode_sequence(input_seq)
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Hindi Translation:', y_train[k:k+1].values[0][6:-4])  # strip the START_ / _END markers
print('Predicted Hindi Translation:', decoded_sentence[:-4])
Input English sentence: deep shade of white mausoleum could clearly be seen in the lake
Actual Hindi Translation: श्वेत मकबरे की गहरी छाया को स्पष्ट देखा जा सकता था उस सरोवर में।
Predicted Hindi Translation: श्वेत मकबरे की गहरी छाया को स्पष्ट देखा जा सकत
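The decode_sequence function used above is not shown in the article; here is a minimal, hedged sketch of what such an inference routine can look like for this architecture. It assumes the decoder_dense variable from the decoder snippet, that target sentences were wrapped in START_ / _END tokens during preprocessing, and a maximum output length of 50 words:

# Encoder inference model: returns the encoder outputs (for attention) and states
encoder_model = Model(encoder_inputs, [encoder_3_output_3] + encoder_states)

# Decoder inference model: runs one step at a time, fed its own previous states
dec_state_h = Input(shape=(600,))
dec_state_c = Input(shape=(600,))
enc_out_in = Input(shape=(25, 600))

dec_emb2 = embedding_2(decoder_inputs)
dec_out2, h2, c2 = decoder_lstm(dec_emb2, initial_state=[dec_state_h, dec_state_c])
att2 = Attention()([dec_out2, enc_out_in])  # dot-product attention has no weights to share
out2 = decoder_dense(Concatenate()([dec_out2, att2]))

decoder_model = Model([decoder_inputs, enc_out_in, dec_state_h, dec_state_c],
                      [out2, h2, c2])

reverse_target_token_index = {i: w for w, i in target_token_index.items()}

def decode_sequence(input_seq):
    enc_out, h, c = encoder_model.predict(input_seq)
    target_seq = np.array([[target_token_index['START_']]])  # assumed start token
    decoded_sentence = ''
    for _ in range(50):                                      # assumed max target length
        output_tokens, h, c = decoder_model.predict([target_seq, enc_out, h, c])
        token_index = int(np.argmax(output_tokens[0, -1, :]))
        word = reverse_target_token_index.get(token_index, '')
        decoded_sentence += ' ' + word
        if word == '_END':                                   # assumed end token
            break
        target_seq = np.array([[token_index]])               # feed the prediction back in
    return decoded_sentence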
Now it's your turn to experiment with the model to reach a higher accuracy, since experimenting is the only way to improve it. If you have any doubts, please share your feedback in the comments box!