How to choose number of epochs to train a neural network in Keras
One of the difficulties we face while training a neural network is determining the optimal number of epochs. Too many epochs can cause the model to overfit, i.e. it performs well on the training data but has high error rates on the test data.
On the other hand, too few epochs cause the model to underfit, i.e. it has large errors on both the training and the test data. This article will help you determine the optimal number of epochs to train a neural network in Keras so that it gives good results on both the training and the validation data.
Determining the optimal number of epochs
In terms of Artificial Neural Networks, an epoch is one complete pass through the entire training dataset. Together with the batch size, the number of epochs determines how many times the weights of the network are updated: with mini-batch training, the weights are updated once per batch, so every epoch performs as many updates as there are batches. The model should be trained for a suitable number of epochs to maximize its ability to generalize, but there is no single number of epochs that works for every model and dataset. What matters more than the epoch count itself is how the training and validation losses (i.e. the errors) evolve: as long as both continue to decrease, training should continue.
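To make the link between epochs, batch size and weight updates concrete, here is a small arithmetic sketch. It assumes mini-batch training in which the last, smaller batch still triggers an update (as Keras's fit() does), and it uses the 800-sample training set (80% of 1000 rows) and batch_size = 15 from the example later in this article. The function names are hypothetical helpers, not part of Keras.

```python
import math


def updates_per_epoch(n_samples: int, batch_size: int) -> int:
    """Weight updates in one epoch: one per (possibly partial) batch."""
    return math.ceil(n_samples / batch_size)


def total_updates(n_samples: int, batch_size: int, epochs: int) -> int:
    """Total weight updates over the whole training run."""
    return updates_per_epoch(n_samples, batch_size) * epochs


# 800 training samples, batch_size = 15
print(updates_per_epoch(800, 15))     # 54 (53 full batches + 1 batch of 5)
print(total_updates(800, 15, 100))    # 5400
```

Doubling the batch size roughly halves the number of updates per epoch, which is one reason the "right" number of epochs depends on the rest of the training setup.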
EarlyStopping
EarlyStopping is a technique that lets us set an arbitrarily large number of epochs and stops training as soon as the model's performance on the validation data stops improving. It requires validation data to be passed to the fit() method when fitting our model (i.e. the ANN) to the training data. Let us understand it better with the help of an example.
Code
Dataset
The dataset used in this code can be obtained from Kaggle. It has 10000 rows and 14 columns, of which we'll take only the first 1000 rows to reduce the time required for training. The target variable, labeled ‘Exited’, is binary with values 0 and 1. Our task is to find the optimal number of epochs for training an ANN on this dataset.
# Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Importing the dataset
dataset = pd.read_csv('datasets_156197_358170_Churn_Modelling.csv')
X = dataset.iloc[:1000, 3:-1].values
y = dataset.iloc[:1000, 13:14].values
Here, ‘X’ is our set of independent variables and ‘y’ is the target variable.
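The slice bounds in iloc[:1000, 3:-1] and iloc[:1000, 13:14] are easier to read against the column names. The sketch below assumes the column order of Kaggle's Churn_Modelling.csv as it is commonly distributed (an assumption; print dataset.columns to confirm it on your copy) and uses plain list slicing, which follows the same start/stop rules as positional slicing with iloc.

```python
# Assumed column order of Churn_Modelling.csv (14 columns)
columns = [
    "RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
    "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
    "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited",
]

# 3:-1 drops the three identifier columns and the target column
feature_columns = columns[3:-1]
# 13:14 keeps only the last column, as a one-column slice (so y stays 2-D)
target_columns = columns[13:14]

print(len(feature_columns))    # 10
print(feature_columns[1:3])    # ['Geography', 'Gender']
print(target_columns)          # ['Exited']
```

Under this assumed layout, positions 1 and 2 of ‘X’ are Geography and Gender, which matches the columns [1, 2] passed to the OneHotEncoder during preprocessing.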
Data Preprocessing
We first split our data into training and test (validation) sets, then encode the categorical columns of ‘X’ and finally standardize the values.
# Splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Encoding the categorical columns of X (columns 1 and 2)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [1, 2])], remainder = 'passthrough')
X_train = ct.fit_transform(X_train)  # learn the categories from the training set only
X_test = ct.transform(X_test)

# Standardizing the dataset values (mean and std are computed on the training set only)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
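The number of input features the network expects (13) follows from this preprocessing. The sketch below assumes that Geography takes three values and Gender two; the category lists are hypothetical placeholders rather than values read from the file, so check X_train.shape[1] after running the code above to confirm the width on your data.

```python
# Hypothetical category lists for the two encoded columns (assumed, not read from the file)
categories = {
    "Geography": ["France", "Germany", "Spain"],
    "Gender": ["Female", "Male"],
}
n_features_before = 10                                # columns in X before encoding
n_passthrough = n_features_before - len(categories)  # 8 columns passed through unchanged

# OneHotEncoder (default settings) replaces each categorical column
# with one 0/1 column per category; StandardScaler does not change the width
n_encoded = sum(len(values) for values in categories.values())  # 3 + 2 = 5
input_dim = n_encoded + n_passthrough
print(input_dim)  # 13
```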
Defining the architecture of the ANN
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

# Initialising the ANN
model = Sequential()

# Input layer: 13 features after one-hot encoding
model.add(keras.Input(shape = (13,)))

# Adding the first hidden layer
# (units / kernel_initializer replace the Keras 1 arguments output_dim / init)
model.add(Dense(units = 7, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the second hidden layer
model.add(Dense(units = 7, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer (sigmoid gives the probability of Exited = 1)
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
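As a sanity check on this architecture, each Dense layer has (inputs × units) weights plus one bias per unit. The sketch below applies that formula to the 13-7-7-1 network; the helper name dense_params is hypothetical, and the total should agree with the trainable parameter count that model.summary() reports.

```python
def dense_params(n_inputs: int, units: int) -> int:
    """Trainable parameters of a Dense layer: weight matrix plus bias vector."""
    return n_inputs * units + units


layer_sizes = [13, 7, 7, 1]  # input features, hidden 1, hidden 2, output
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [98, 56, 8]
print(sum(per_layer))  # 162
```

With only 162 parameters and 800 training rows, the network is small, so the number of epochs rather than the model size is the main lever against over- or underfitting here.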
Applying EarlyStopping
To apply EarlyStopping to our model training, we have to create an object of the EarlyStopping class from the keras.callbacks module.
from keras.callbacks import EarlyStopping
es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 1)
‘monitor’ is the quantity the callback watches, for example the validation loss (‘val_loss’) or the validation accuracy (‘val_accuracy’, named ‘val_acc’ in older Keras versions). Training stops when the monitored quantity stops improving. ‘mode’ indicates whether the monitored quantity should be minimized (‘min’) or maximized (‘max’). By default, ‘mode’ is set to ‘auto’, in which case Keras infers the direction from the quantity being monitored, minimizing losses and maximizing accuracy. Setting verbose to 1 makes the callback print the epoch at which training was stopped.
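The monitor/mode pair boils down to comparing the current value with the best value seen so far. The functions below are a simplified, hypothetical re-creation of that check, not Keras's source code. The rule used for mode = ‘auto’ (maximize anything whose name contains "acc", minimize everything else) is a simplification of Keras's behaviour, and min_delta mirrors the real EarlyStopping parameter of the same name (default 0), the smallest change that counts as an improvement.

```python
def resolve_mode(monitor: str, mode: str = "auto") -> str:
    """Return 'min' or 'max'; 'auto' guesses from the metric name (simplified)."""
    if mode in ("min", "max"):
        return mode
    return "max" if "acc" in monitor else "min"


def is_improvement(current: float, best: float, monitor: str,
                   mode: str = "auto", min_delta: float = 0.0) -> bool:
    """True if `current` beats `best` by more than `min_delta` in the right direction."""
    if resolve_mode(monitor, mode) == "min":
        return current < best - min_delta
    return current > best + min_delta


print(is_improvement(0.39, 0.41, "val_loss"))                    # True: loss went down
print(is_improvement(0.80, 0.82, "val_accuracy"))                # False: accuracy went down
print(is_improvement(0.409, 0.41, "val_loss", min_delta=0.01))   # False: change too small
```

Passing mode = ‘min’ explicitly, as in the code above, simply skips the guessing step.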
Fitting the ANN to the Dataset
model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = 100, batch_size = 15, callbacks = [es])
Once we execute the above lines of code, the callback will print the epoch number on which the training stopped.
OUTPUT :
Epoch 00017: early stopping
It indicates that at the 17th epoch the validation loss did not improve on the best value seen so far, and hence training was stopped to prevent the model from overfitting.
# Evaluating model performance
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Train accuracy: {train_acc*100:.3f} % || Test accuracy: {test_acc*100:.3f} %')
print(f'Train loss: {train_loss:.3f} || Test loss: {test_loss:.3f}')
OUTPUT :
Train accuracy: 0.789 || Test accuracy: 0.825
Train loss: 0.413 || Test loss: 0.390
Above is the model accuracy and loss on the training and test data when the training was terminated at the 17th epoch.
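model.evaluate() returns the compiled loss followed by each metric, here binary cross-entropy and accuracy. The NumPy sketch below computes both by hand for a few made-up labels and predicted probabilities. It assumes the 0.5 threshold that Keras's accuracy uses for a single sigmoid output, and the clipping constant 1e-7 is an assumption chosen to match Keras's default epsilon; the function names are hypothetical.

```python
import numpy as np


def binary_crossentropy(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def binary_accuracy(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction equals the label."""
    y_pred = (y_prob > threshold).astype(y_true.dtype)
    return float(np.mean(y_pred == y_true))


# Made-up example: 4 customers, the third one is misclassified
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.2, 0.4, 0.3])
print(round(binary_crossentropy(y_true, y_prob), 3))  # 0.4
print(binary_accuracy(y_true, y_prob))                # 0.75
```

Because the loss uses the probabilities while accuracy only uses the thresholded predictions, the two can move somewhat independently from one epoch to the next, which is why EarlyStopping needs to be told which one to monitor.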
The patience parameter
The first epoch without improvement is not always the best time to stop training, because validation performance can stall or get slightly worse for a few epochs before improving again. We can account for this by adding a delay using the patience parameter of EarlyStopping.
# Using EarlyStopping with patience
es = EarlyStopping(monitor = 'val_loss', patience = 20, verbose = 1)
In this case, training stops only after the validation loss has failed to improve for 20 consecutive epochs. In other words, once the validation loss stops improving (indicating that model performance is no longer getting better), training is allowed to continue for up to 20 more epochs, and this count starts over whenever a new best validation loss is reached.
# Note: this call continues training the model fitted above; rebuild and recompile it first for a fresh run
model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = 100, batch_size = 15, callbacks = [es])

# Evaluating model performance
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Train accuracy: {train_acc*100:.3f} % || Test accuracy: {test_acc*100:.3f} %')
print(f'Train loss: {train_loss:.3f} || Test loss: {test_loss:.3f}')
OUTPUT :
Epoch 00084: early stopping
Train accuracy: 85.375 % || Test accuracy: 83.000 %
Train loss: 0.374 || Test loss: 0.387
As we can see, the training stopped much later, and also the model accuracy and loss improved.
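The effect of patience can be reproduced without training anything. The function below is a simplified, hypothetical re-implementation of the stopping rule (mode = ‘min’, min_delta = 0), run on a made-up validation-loss curve that dips, rises briefly and then improves again. It counts consecutive epochs without improvement and stops once that count reaches patience, which is how Keras's patience parameter behaves; epochs are numbered from 1, as in the "Epoch 000NN: early stopping" message.

```python
from typing import Optional, Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float],
                         patience: int) -> Tuple[int, Optional[int]]:
    """Return (best_epoch, stop_epoch), 1-based; stop_epoch is None if training never stops early."""
    best = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:            # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:                      # no improvement: count it
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Made-up curve: improves, bumps up at epochs 4-5, then improves again until epoch 8
losses = [0.60, 0.50, 0.45, 0.47, 0.46, 0.42, 0.40, 0.38, 0.39, 0.40, 0.41, 0.42]
print(early_stopping_epoch(losses, patience=0))   # (3, 4): stops at the first bump
print(early_stopping_epoch(losses, patience=3))   # (8, 11)
print(early_stopping_epoch(losses, patience=20))  # (8, None): runs to the end
```

With patience = 3, training gets past the temporary bump and reaches a lower loss, but it ends at epoch 11 while the best loss was seen at epoch 8, so the weights left in memory are not the best ones.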
A problem associated with the patience parameter
Suppose patience = 10. Once the validation loss has not improved for ten additional epochs, training stops, and the model we are left with is not the best model but the model from ten epochs after the best one. Hence, an additional callback is required to save the best model observed during training for later use. This is the ModelCheckpoint callback.
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.models import load_model

es = EarlyStopping(monitor = 'val_loss', patience = 20, verbose = 1)
# Write the model to disk only when val_loss improves on the best value so far
mc = ModelCheckpoint('best_model.h5', monitor = 'val_loss', verbose = 1, save_best_only = True)
model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = 100, batch_size = 15, callbacks = [es, mc])

# Loading the saved (best) model and evaluating it
saved_model = load_model('best_model.h5')
train_loss, train_acc = saved_model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = saved_model.evaluate(X_test, y_test, verbose=0)
print('Accuracy and loss of the best model : ')
print(f'Train accuracy: {train_acc*100:.3f} % || Test accuracy: {test_acc*100:.3f} %')
print(f'Train loss: {train_loss:.3f} || Test loss: {test_loss:.3f}')
OUTPUT :
Epoch 00076: early stopping
Accuracy and loss of the best model : 
Train accuracy: 85.625 % || Test accuracy: 83.500 %
Train loss: 0.346 || Test loss: 0.354
The best model obtained during training was saved as ‘best_model.h5’. It was then loaded with the load_model() function and evaluated.
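With save_best_only = True, ModelCheckpoint overwrites the file only when the monitored value improves, so after training the file holds the best model rather than the last one. The pure-Python sketch below is a hypothetical illustration (not Keras code; mode = ‘min’) that lists the epochs at which such a checkpoint would be written for a made-up validation-loss curve.

```python
from typing import List, Sequence


def checkpoint_epochs(val_losses: Sequence[float]) -> List[int]:
    """1-based epochs at which a save_best_only checkpoint (mode='min') is written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # only a strictly better value overwrites the file
            best = loss
            saved.append(epoch)
    return saved


losses = [0.60, 0.50, 0.45, 0.47, 0.46, 0.42, 0.40, 0.38, 0.39, 0.40, 0.41, 0.42]
saved = checkpoint_epochs(losses)
print(saved)      # [1, 2, 3, 6, 7, 8]
print(saved[-1])  # 8: the epoch whose model ends up in the file
```

If you only need the best weights in memory rather than on disk, Keras's EarlyStopping also accepts restore_best_weights = True, which rolls the model back to the weights of the best epoch when early stopping triggers.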