Predict Next Sequence using Deep Learning in Python
In this tutorial, we’ll learn about predicting the next value of a sequence using deep learning in Python.
Next-sequence prediction means predicting the next value of a given input sequence.
For example, if the input sequence contains the values [0, 0.1, 0.2, 0.3], then the next predicted value should be 0.4.
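As a minimal sketch of this framing (make_windows is a hypothetical helper name, not part of the tutorial’s code), a sequence can be split into (input window, next value) pairs like this:

import numpy as np

def make_windows(seq, timesteps):
    # Split a 1-D sequence into (input window, next value) pairs
    X, y = [], []
    for i in range(timesteps, len(seq)):
        X.append(seq[i - timesteps:i])
        y.append(seq[i])
    return np.array(X), np.array(y)

X, y = make_windows([0, 0.1, 0.2, 0.3, 0.4], timesteps=4)
print(X)  # [[0.  0.1 0.2 0.3]]
print(y)  # [0.4]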
To better understand this topic, we’ll work through a real-life example: predicting stock market data, namely the daily turnover of the NIFTY50 index. For this, we’ll use an LSTM (Long Short-Term Memory) network.
We’ll work on NIFTY50 data from 19/06/18 to 18/06/19, which is available on www.nseindia.com. It consists of the columns “Date”, “Open”, “High”, “Low”, “Close”, “Shares Traded”, and “Turnover (Rs. Cr)”.
First, import the required Python packages (Pandas, NumPy, Matplotlib, Keras, and scikit-learn) as shown below:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import r2_score
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, Embedding, Dropout, TimeDistributed, Input
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
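If you are on TensorFlow 2, the standalone keras package can be replaced by the bundled tensorflow.keras namespace; the equivalent imports would look like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam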
Now, we’ll read the data from the CSV file using Pandas.
df = pd.read_csv('nse50_data.csv')
print(df[:5])
          Date      Open      High  ...     Close  Shares Traded  Turnover (Rs. Cr)
0  19-Jun-2018  10789.45  10789.45  ...  10710.45      231382790           12290.16
1  20-Jun-2018  10734.65  10781.80  ...  10772.05      199467082           10858.35
2  21-Jun-2018  10808.45  10809.60  ...  10741.10      230507383           12211.18
3  22-Jun-2018  10742.70  10837.00  ...  10821.85      236898415           13520.01
4  25-Jun-2018  10822.90  10831.05  ...  10762.45      236693278           12012.41
A graphical representation of the turnover (in crores) is shown below.
data = df.iloc[:, 6].values
plt.figure(figsize=(10, 6))
plt.xlabel('Days')
plt.ylabel('Turnover (in crores)')
plt.plot(data)
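Here df.iloc[:, 6] selects the seventh column by position. Selecting it by name is equivalent and slightly more robust to column reordering (assuming the column header matches the printout above):

# Equivalent name-based selection of the turnover column
data = df['Turnover (Rs. Cr)'].values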
We’ll use the turnover (in crores) data from 19/06/18 to 18/04/19 as training data and the data from 19/04/19 to 18/06/19 as test data. Note that in the code below the test slice starts 30 rows before the train/test boundary, so that the first test sample has a full 30-day input window.
df['Date'] = pd.to_datetime(df['Date'])
mask = (df['Date'] == '2019-4-18')
print(df.loc[mask])  # index of the row for 18-Apr-2019
print('--------------------------------------------')
train = data[:205]  # rows up to the train/test boundary found above
test = data[175:]   # starts 30 rows earlier so the first test sample has a full window
          Date      Open      High  ...    Close  Shares Traded  Turnover (Rs. Cr)
205 2019-04-18  11856.15  11856.15  ...  11752.8      339653709           18271.27

[1 rows x 7 columns]
--------------------------------------------
Now, we’ll normalize the train and test data using a min-max scaler. Note that the scaler is fitted on the training data only and merely applied to the test data, so no information from the test set leaks into training.
sc = MinMaxScaler(feature_range=(0, 1))
train = sc.fit_transform(train.reshape(-1, 1))
test = sc.transform(test.reshape(-1, 1))
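For reference, MinMaxScaler with feature_range=(0, 1) simply maps each value x to (x - min) / (max - min), where min and max come from the data it was fitted on. A quick sanity check against a manual computation:

# Manual min-max transform on the raw training slice (should match sc's output)
raw = data[:205].reshape(-1, 1)
manual = (raw - raw.min()) / (raw.max() - raw.min())
print(np.allclose(train, manual))  # True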
We’ll take timesteps = 30, i.e., take the previous 30 days of data as input to predict the turnover on the 31st day. Create X_train using 30 timesteps for each sample:
X_train = []
y_train = []
for i in range(30, train.shape[0]):
    X_train.append(train[i-30:i, 0])
    y_train.append(train[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
print(X_train.shape, y_train.shape)
print(X_train)
print(y_train[:2])
(175, 30) (175,)
[[0.32014897 0.27753191 0.31779817 ... 0.59711237 0.40685077 0.39237244]
 [0.27753191 0.31779817 0.35675479 ... 0.40685077 0.39237244 0.40965785]
 [0.31779817 0.35675479 0.31188189 ... 0.39237244 0.40965785 0.38402232]
 ...
 [0.49944087 0.76165063 0.40110533 ... 0.43010574 0.61685008 0.38092919]
 [0.76165063 0.40110533 0.48890961 ... 0.61685008 0.38092919 0.35909428]
 [0.40110533 0.48890961 0.48566231 ... 0.38092919 0.35909428 0.41972985]]
[0.40965785 0.38402232]
We’ll now design the model. We’ll use a single LSTM layer with 16 units followed by four dense layers with 8, 4, 2, and 1 neurons, respectively. We’ll use the Adam optimizer and mean squared error as the loss function.
# Training the LSTM model
# Reshape the input to (samples, timesteps, features), the 3-D shape the LSTM layer expects
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

model = Sequential()
# LSTM layer
model.add(LSTM(16, input_shape=(X_train.shape[1], 1), activation='relu', kernel_initializer='lecun_uniform'))
# Dense layers
model.add(Dense(8))
model.add(Dense(4))
model.add(Dense(2))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=45, batch_size=4)
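Note that EarlyStopping and Adam were imported earlier but never used. As an optional refinement (a sketch, not part of the original run; the patience value is illustrative), you could hold out a validation split and stop training once the validation loss stops improving:

# Optional: stop training early when the validation loss plateaus
es = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.fit(X_train, y_train, epochs=100, batch_size=4,
          validation_split=0.1, callbacks=[es])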
Now, we’ll create X_test using 30 timesteps for each sample.
X_test = []
y_test = []
for i in range(30, test.shape[0]):
    X_test.append(test[i-30:i, 0])
    y_test.append(test[i, 0])
X_test, y_test = np.array(X_test), np.array(y_test)
print(X_test.shape)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
print(X_train.shape)
(40, 30)
(175, 30, 1)
Now, we’ll plot the predictions vs. the actual turnover on the training set.
predicted = model.predict(X_train)
predicted = sc.inverse_transform(predicted)
plt.plot(sc.inverse_transform(train[-175:]), color='blue', label='Turnover')
plt.plot(predicted, color='yellow', label='Predicted Turnover')
plt.title('NIFTY50 Turnover')
plt.xlabel('Time')
plt.ylabel('Turnover')
plt.legend()
plt.show()
The result is as follows:
Now, we’ll plot the predictions vs. the actual turnover on the test set.
predicted = model.predict(X_test)
predicted = sc.inverse_transform(predicted)
# test[-40:] matches the 40 predicted points (the first 30 rows are only input context)
plt.plot(sc.inverse_transform(test[-40:]), color='blue', label='Turnover')
plt.plot(predicted, color='yellow', label='Predicted Turnover')
plt.title('NIFTY50 Turnover')
plt.xlabel('Time')
plt.ylabel('Turnover')
plt.legend()
plt.show()
The result is as follows:
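Since r2_score was imported at the top but never used, we can also put a number on the test-set fit; a minimal sketch, scoring in the scaled space (R² is unchanged by inverse-scaling, because the same affine transform is applied to both series):

# R-squared between the actual and predicted test targets (scaled space)
predicted_scaled = model.predict(X_test).ravel()
print('Test R^2:', r2_score(y_test, predicted_scaled))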
I hope you enjoyed this tutorial.
As an exercise, try applying the same approach to a gene sequence of a genome.