How to choose number of epochs to train a neural network in Keras

One of the difficulties we face while training a neural network is determining the optimal number of epochs. Too many epochs can cause the model to overfit, i.e. your model will perform quite well on the training data but will have high error rates on the test data.

On the other hand, too few epochs will cause the model to underfit, i.e. your model will have large errors on both the training and test data. This article will help you determine the optimal number of epochs to train a neural network in Keras so that it performs well on both the training and validation data.

Determining the optimal number of epochs

In terms of artificial neural networks, an epoch is one complete pass through the entire training dataset. The number of epochs determines how many times the weights in the neural network get updated. The model should be trained for an optimal number of epochs to maximize its capacity to generalize. There is no fixed number of epochs that will improve every model's performance; the number of epochs actually matters less than the training and validation loss (i.e. the error). As long as these two losses continue to decrease, training should continue.
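
A common way to see this in practice is to plot both losses per epoch and look for the point where the validation loss stops falling. Below is a minimal sketch, assuming a compiled Keras model and train/test arrays like the ones built later in this article (the variable names are illustrative):

import matplotlib.pyplot as plt

# fit() returns a History object containing the per-epoch metrics
history = model.fit(X_train, y_train,
                    validation_data = (X_test, y_test),
                    epochs = 100, batch_size = 15)

# Plot training vs. validation loss to see where overfitting begins
plt.plot(history.history['loss'], label = 'training loss')
plt.plot(history.history['val_loss'], label = 'validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()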

EarlyStopping

EarlyStopping is a technique that allows us to specify an arbitrarily large number of training epochs and stops the training once the model's performance stops improving on the validation data. It requires validation data to be passed to the fit() method when fitting our model (i.e. the ANN) to the training data. Let us try to understand it better with the help of an example.

Code

Dataset

The dataset used in this code can be obtained from Kaggle. It has a total of 10000 rows and 14 columns, of which we'll take only the first 1000 instances to reduce the time required for training. The target variable, labeled ‘Exited’, is a binary variable with values 0 and 1. Our task will be to find the optimal number of epochs to train the ANN that we'll fit to this dataset.

# Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Importing the dataset
dataset = pd.read_csv('datasets_156197_358170_Churn_Modelling.csv')

X = dataset.iloc[:1000, 3:-1].values
y = dataset.iloc[:1000, 13:14].values

Here, ‘X’ is our set of independent variables and ‘y’ the target variable.
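
As a quick optional sanity check (the column names below assume the standard Churn_Modelling layout, so treat them as an assumption), we can confirm which columns end up in ‘X’ and how the target is distributed:

# Optional sanity check (assumes the standard Churn_Modelling column layout)
print(dataset.columns[3:-1].tolist())    # feature columns used in X
print(dataset['Exited'].value_counts())  # class balance of the target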

Data Preprocessing

We first split our data into training and test (validation) sets, encode the categorical columns of ‘X’, and finally standardize the values in the dataset.

# Splitting dataset into Train and Test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Encoding the categorical columns of X
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [1, 2])], remainder = 'passthrough')
X_train = ct.fit_transform(X_train)
X_test = ct.transform(X_test)

# Standardizing the dataset values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Defining the architecture of the ANN

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

# Initialising the ANN
model = Sequential()

# Adding the input layer and the first hidden layer
model.add(Dense(units = 7, kernel_initializer = 'uniform', activation = 'relu', input_dim = 13))

# Adding the second hidden layer
model.add(Dense(units = 7, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Applying EarlyStopping

To apply EarlyStopping to our model training, we create an object of the EarlyStopping class from the keras.callbacks module.

from keras.callbacks import EarlyStopping
es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 1)

‘monitor’ is the quantity that the callback watches; it is typically ‘val_loss’ or ‘val_accuracy’. Training will stop when the chosen performance measure, i.e. the ‘monitor’, stops improving. ‘mode’ indicates whether you want to minimize or maximize the ‘monitor’; by default it is set to ‘auto’, which infers the direction from the metric's name (minimize a loss, maximize an accuracy). Setting the verbose parameter to 1 makes the callback print the epoch at which training was terminated.
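
For instance, to watch validation accuracy instead (a small variant of the line above; note that older Keras versions log this metric as ‘val_acc’ rather than ‘val_accuracy’):

# Stop when validation accuracy stops improving; mode = 'max' because higher is better
es_acc = EarlyStopping(monitor = 'val_accuracy', mode = 'max', verbose = 1)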

Fitting the ANN to the Dataset

model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = 100, batch_size = 15, callbacks = [es])

Once we execute the above lines of code, the callback will print the epoch number on which the training stopped.

OUTPUT :

Epoch 00017: early stopping

This indicates that at the 17th epoch the validation loss stopped improving (it had started to increase), so training was halted to prevent the model from overfitting.

# Evaluating model performance

train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

print(f'Train accuracy: {train_acc*100:.3f} % || Test accuracy: {test_acc*100:.3f} %')
print(f'Train loss: {train_loss:.3f} || Test loss: {test_loss:.3f}')

OUTPUT :

Train accuracy: 78.900 % || Test accuracy: 82.500 %
Train loss: 0.413 || Test loss: 0.390

Above is the model accuracy and loss on the training and test data when the training was terminated at the 17th epoch.

The patience parameter

The first sign of no improvement may not always be the best time to stop training, because the model's performance may temporarily deteriorate before it improves again. We can account for this by adding a delay using the patience parameter of EarlyStopping.

# Using EarlyStopping with patience

es = EarlyStopping(monitor = 'val_loss', patience = 20, verbose = 1)

In this case, training is allowed to continue for up to 20 additional epochs after the last improvement in the validation loss; if the loss does not improve within that window, training stops.

model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = 100, batch_size = 15, callbacks = [es])

train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

print(f'Train accuracy: {train_acc*100:.3f} % || Test accuracy: {test_acc*100:.3f} %')
print(f'Train loss: {train_loss:.3f} || Test loss: {test_loss:.3f}')

OUTPUT :

Epoch 00084: early stopping

Train accuracy: 85.375 % || Test accuracy: 83.000 %
Train loss: 0.374 || Test loss: 0.387

As we can see, training stopped much later, and both the model's accuracy and loss improved.

A problem associated with the patience parameter

Suppose patience = 10. If the validation loss does not improve within ten additional epochs, training stops, but the weights we are left with are those from ten epochs after the best model, not the best model itself. Hence, an additional callback is required to save the best model observed during training for later use: the ModelCheckpoint callback.

from keras.callbacks import ModelCheckpoint

es = EarlyStopping(monitor = 'val_loss', patience = 20, verbose = 1)
mc = ModelCheckpoint('best_model.h5', monitor = 'val_loss', verbose = 1, save_best_only = True)

model.fit(X_train, y_train, validation_data = (X_test, y_test),
          epochs = 100, batch_size = 15, callbacks = [es, mc])

# Loading the saved model
from keras.models import load_model
saved_model = load_model('best_model.h5')

train_loss, train_acc = saved_model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = saved_model.evaluate(X_test, y_test, verbose=0)

print('Accuracy and loss of the best model:')
print(f'Train accuracy: {train_acc*100:.3f} % || Test accuracy: {test_acc*100:.3f} %')
print(f'Train loss: {train_loss:.3f} || Test loss: {test_loss:.3f}')

OUTPUT :

Epoch 00076: early stopping

Accuracy and loss of the best model:
Train accuracy: 85.625 % || Test accuracy: 83.500 %
Train loss: 0.346 || Test loss: 0.354

The best model obtained during training was saved as ‘best_model.h5’. It was then loaded back with the load_model() function and evaluated on the training and test sets.
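
As a closing note, newer versions of Keras also let EarlyStopping itself restore the weights from the best epoch via its restore_best_weights argument, which can replace the ModelCheckpoint/load_model round trip in simple cases. A minimal sketch, assuming a sufficiently recent Keras installation:

# Hedged alternative for newer Keras: keep the best weights in memory
# instead of writing them to disk with ModelCheckpoint
es = EarlyStopping(monitor = 'val_loss', patience = 20, verbose = 1,
                   restore_best_weights = True)
model.fit(X_train, y_train, validation_data = (X_test, y_test),
          epochs = 100, batch_size = 15, callbacks = [es])
# 'model' now holds the weights from the epoch with the lowest validation loss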
