Checkpoint in Keras in machine learning

In this tutorial, we will learn how to create a checkpoint in Keras. A checkpoint saves the state of a model during training, so that we can return to it if something goes wrong later. This makes it safer to experiment with our code, because we can always go back to a checkpoint we saved earlier.

Creating Checkpoint in Keras

A checkpoint lets us decide which weights are saved, under what circumstances they are saved, and how the checkpoint file is named. In Keras this is done with the ModelCheckpoint callback, which is passed to the fit() function so that it runs during training. In this session, we will create a deep neural network and then create some checkpoints for it.

First, make sure to download the dataset that we will use from this link. Keep in mind that we use roughly two-thirds of this data for training and hold out the remaining one-third (validation_split=0.33) for validation.
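To make the two-thirds / one-third split concrete, here is a minimal sketch of how a fractional hold-out divides the rows. It assumes the CSV has 768 rows (the usual size of the Pima Indians diabetes data) and that the held-out rows are the last ones in the file, which is how Keras documents validation_split. The helper name split_sizes is made up for this illustration.

```python
import math

def split_sizes(num_samples, validation_split):
    """Return (train_count, validation_count) for a fractional hold-out.

    The training part is the first floor(n * (1 - split)) rows and the
    validation part is everything after it.
    """
    split_at = int(math.floor(num_samples * (1.0 - validation_split)))
    return split_at, num_samples - split_at

# Assumption: 768 rows in the dataset file.
train_count, val_count = split_sizes(768, 0.33)
print(train_count, val_count)  # 514 254
```

With 254 validation rows, every reported validation accuracy is a multiple of 1/254, which is why values such as 0.51969 (132 correct out of 254) appear in the training output.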

Let’s now get to the coding part:

There are two parts to it: the first is creating a checkpoint, and the second is fetching it.

Creating a checkpoint:

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
import numpy

# Seed NumPy's random number generator
numpy.random.seed(10)

# Load the dataset; adjust the path to wherever you saved the file
dataset = numpy.loadtxt("/home/sumit/pima-indians-diabetes.data.csv", delimiter=",")

# The first 8 columns are the input features, the last column is the label
X = dataset[:,0:8]
Y = dataset[:,8]

# Define the network
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Save a checkpoint whenever the validation accuracy improves
filepath="weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

# Train the model, holding out 33% of the data for validation
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, callbacks=callbacks_list, verbose=0)

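In the above code, we train for 150 epochs. Whenever the validation accuracy improves, the model is saved to an .hdf5 file in the current working directory, and the epoch number and validation accuracy become part of the file name.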
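The placeholders in the file name are ordinary Python format fields. The sketch below fills the same template by hand to show what name a given epoch produces. The helper checkpoint_name is hypothetical and only mimics the formatting step; it does not call Keras.

```python
# The same template string that is passed to ModelCheckpoint above.
filepath = "weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5"

def checkpoint_name(template, epoch, logs):
    """Fill the template with the epoch number and the logged metrics."""
    return template.format(epoch=epoch, **logs)

# Values taken from epoch 7 of the training output in this tutorial.
print(checkpoint_name(filepath, 7, {"val_accuracy": 0.65748}))
# weights-improvement-07-0.66.hdf5
```

Note that {epoch:02d} pads to at least two digits, so epoch 120 simply becomes 120, and {val_accuracy:.2f} rounds the metric to two decimals.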

Output:

Using TensorFlow backend.

Epoch 00001: val_accuracy improved from -inf to 0.51969, saving model to weights-improvement-01-0.52.hdf5

Epoch 00002: val_accuracy did not improve from 0.51969

Epoch 00003: val_accuracy did not improve from 0.51969

Epoch 00004: val_accuracy did not improve from 0.51969

Epoch 00005: val_accuracy did not improve from 0.51969

Epoch 00006: val_accuracy did not improve from 0.51969

Epoch 00007: val_accuracy improved from 0.51969 to 0.65748, saving model to weights-improvement-07-0.66.hdf5

Epoch 00008: val_accuracy did not improve from 0.65748

Epoch 00009: val_accuracy improved from 0.65748 to 0.66535, saving model to weights-improvement-09-0.67.hdf5

Epoch 00010: val_accuracy did not improve from 0.66535

Epoch 00011: val_accuracy did not improve from 0.66535

Epoch 00012: val_accuracy improved from 0.66535 to 0.68110, saving model to weights-improvement-12-0.68.hdf5

Epoch 00013: val_accuracy did not improve from 0.68110

Epoch 00014: val_accuracy did not improve from 0.68110

Epoch 00015: val_accuracy did not improve from 0.68110

Epoch 00016: val_accuracy did not improve from 0.68110

Epoch 00017: val_accuracy did not improve from 0.68110

Epoch 00018: val_accuracy did not improve from 0.68110

Epoch 00019: val_accuracy did not improve from 0.68110

Epoch 00020: val_accuracy did not improve from 0.68110

Epoch 00021: val_accuracy did not improve from 0.68110

Epoch 00022: val_accuracy did not improve from 0.68110

Epoch 00023: val_accuracy did not improve from 0.68110

Epoch 00024: val_accuracy did not improve from 0.68110

Epoch 00025: val_accuracy did not improve from 0.68110

Epoch 00026: val_accuracy improved from 0.68110 to 0.68898, saving model to weights-improvement-26-0.69.hdf5

Epoch 00027: val_accuracy did not improve from 0.68898

Epoch 00028: val_accuracy did not improve from 0.68898

Epoch 00029: val_accuracy did not improve from 0.68898

Epoch 00030: val_accuracy did not improve from 0.68898

Epoch 00031: val_accuracy did not improve from 0.68898

Epoch 00032: val_accuracy did not improve from 0.68898

Epoch 00033: val_accuracy did not improve from 0.68898

Epoch 00034: val_accuracy did not improve from 0.68898

Epoch 00035: val_accuracy did not improve from 0.68898

Epoch 00036: val_accuracy did not improve from 0.68898

Epoch 00037: val_accuracy did not improve from 0.68898

Epoch 00038: val_accuracy did not improve from 0.68898

Epoch 00039: val_accuracy did not improve from 0.68898

Epoch 00040: val_accuracy did not improve from 0.68898

Epoch 00041: val_accuracy did not improve from 0.68898

Epoch 00042: val_accuracy did not improve from 0.68898

Epoch 00043: val_accuracy did not improve from 0.68898

Epoch 00044: val_accuracy did not improve from 0.68898

Epoch 00045: val_accuracy did not improve from 0.68898

Epoch 00046: val_accuracy did not improve from 0.68898

Epoch 00047: val_accuracy improved from 0.68898 to 0.69291, saving model to weights-improvement-47-0.69.hdf5

Epoch 00048: val_accuracy did not improve from 0.69291

Epoch 00049: val_accuracy improved from 0.69291 to 0.69685, saving model to weights-improvement-49-0.70.hdf5

Epoch 00050: val_accuracy did not improve from 0.69685

Epoch 00051: val_accuracy did not improve from 0.69685

Epoch 00052: val_accuracy did not improve from 0.69685

Epoch 00053: val_accuracy did not improve from 0.69685

Epoch 00054: val_accuracy did not improve from 0.69685

Epoch 00055: val_accuracy did not improve from 0.69685

Epoch 00056: val_accuracy did not improve from 0.69685

Epoch 00057: val_accuracy did not improve from 0.69685

Epoch 00058: val_accuracy did not improve from 0.69685

Epoch 00059: val_accuracy did not improve from 0.69685

Epoch 00060: val_accuracy did not improve from 0.69685

Epoch 00061: val_accuracy improved from 0.69685 to 0.71260, saving model to weights-improvement-61-0.71.hdf5

Epoch 00062: val_accuracy did not improve from 0.71260

Epoch 00063: val_accuracy did not improve from 0.71260

Epoch 00064: val_accuracy did not improve from 0.71260

Epoch 00065: val_accuracy did not improve from 0.71260

Epoch 00066: val_accuracy did not improve from 0.71260

Epoch 00067: val_accuracy did not improve from 0.71260

Epoch 00068: val_accuracy did not improve from 0.71260

Epoch 00069: val_accuracy did not improve from 0.71260

Epoch 00070: val_accuracy did not improve from 0.71260

Epoch 00071: val_accuracy did not improve from 0.71260

Epoch 00072: val_accuracy did not improve from 0.71260

Epoch 00073: val_accuracy did not improve from 0.71260

Epoch 00074: val_accuracy did not improve from 0.71260

Epoch 00075: val_accuracy did not improve from 0.71260

Epoch 00076: val_accuracy did not improve from 0.71260

Epoch 00077: val_accuracy did not improve from 0.71260

Epoch 00078: val_accuracy did not improve from 0.71260

Epoch 00079: val_accuracy did not improve from 0.71260

Epoch 00080: val_accuracy improved from 0.71260 to 0.71654, saving model to weights-improvement-80-0.72.hdf5

Epoch 00081: val_accuracy improved from 0.71654 to 0.72047, saving model to weights-improvement-81-0.72.hdf5

Epoch 00082: val_accuracy did not improve from 0.72047

Epoch 00083: val_accuracy did not improve from 0.72047

Epoch 00084: val_accuracy did not improve from 0.72047

Epoch 00085: val_accuracy did not improve from 0.72047

Epoch 00086: val_accuracy did not improve from 0.72047

Epoch 00087: val_accuracy did not improve from 0.72047

Epoch 00088: val_accuracy did not improve from 0.72047

Epoch 00089: val_accuracy did not improve from 0.72047

Epoch 00090: val_accuracy did not improve from 0.72047

Epoch 00091: val_accuracy did not improve from 0.72047

Epoch 00092: val_accuracy did not improve from 0.72047

Epoch 00093: val_accuracy did not improve from 0.72047

Epoch 00094: val_accuracy did not improve from 0.72047

Epoch 00095: val_accuracy did not improve from 0.72047

Epoch 00096: val_accuracy did not improve from 0.72047

Epoch 00097: val_accuracy did not improve from 0.72047

Epoch 00098: val_accuracy did not improve from 0.72047

Epoch 00099: val_accuracy did not improve from 0.72047

Epoch 00100: val_accuracy did not improve from 0.72047

Epoch 00101: val_accuracy did not improve from 0.72047

Epoch 00102: val_accuracy did not improve from 0.72047

Epoch 00103: val_accuracy did not improve from 0.72047

Epoch 00104: val_accuracy did not improve from 0.72047

Epoch 00105: val_accuracy did not improve from 0.72047

Epoch 00106: val_accuracy did not improve from 0.72047

Epoch 00107: val_accuracy did not improve from 0.72047

Epoch 00108: val_accuracy did not improve from 0.72047

Epoch 00109: val_accuracy did not improve from 0.72047

Epoch 00110: val_accuracy did not improve from 0.72047

Epoch 00111: val_accuracy did not improve from 0.72047

Epoch 00112: val_accuracy did not improve from 0.72047

Epoch 00113: val_accuracy did not improve from 0.72047

Epoch 00114: val_accuracy did not improve from 0.72047

Epoch 00115: val_accuracy did not improve from 0.72047

Epoch 00116: val_accuracy did not improve from 0.72047

Epoch 00117: val_accuracy did not improve from 0.72047

Epoch 00118: val_accuracy did not improve from 0.72047

Epoch 00119: val_accuracy did not improve from 0.72047

Epoch 00120: val_accuracy improved from 0.72047 to 0.73228, saving model to weights-improvement-120-0.73.hdf5

Epoch 00121: val_accuracy did not improve from 0.73228

Epoch 00122: val_accuracy did not improve from 0.73228

Epoch 00123: val_accuracy did not improve from 0.73228

Epoch 00124: val_accuracy did not improve from 0.73228

Epoch 00125: val_accuracy did not improve from 0.73228

Epoch 00126: val_accuracy did not improve from 0.73228

Epoch 00127: val_accuracy did not improve from 0.73228

Epoch 00128: val_accuracy did not improve from 0.73228

Epoch 00129: val_accuracy did not improve from 0.73228

Epoch 00130: val_accuracy did not improve from 0.73228

Epoch 00131: val_accuracy did not improve from 0.73228

Epoch 00132: val_accuracy did not improve from 0.73228

Epoch 00133: val_accuracy did not improve from 0.73228

Epoch 00134: val_accuracy did not improve from 0.73228

Epoch 00135: val_accuracy did not improve from 0.73228

Epoch 00136: val_accuracy did not improve from 0.73228

Epoch 00137: val_accuracy did not improve from 0.73228

Epoch 00138: val_accuracy did not improve from 0.73228

Epoch 00139: val_accuracy did not improve from 0.73228

Epoch 00140: val_accuracy did not improve from 0.73228

Epoch 00141: val_accuracy did not improve from 0.73228

Epoch 00142: val_accuracy did not improve from 0.73228

Epoch 00143: val_accuracy did not improve from 0.73228

Epoch 00144: val_accuracy did not improve from 0.73228

Epoch 00145: val_accuracy did not improve from 0.73228

Epoch 00146: val_accuracy did not improve from 0.73228

Epoch 00147: val_accuracy did not improve from 0.73228

Epoch 00148: val_accuracy did not improve from 0.73228

Epoch 00149: val_accuracy did not improve from 0.73228

Epoch 00150: val_accuracy did not improve from 0.73228

This creates several weights-improvement-*.hdf5 files in the working directory, one for each epoch in which the validation accuracy improved. Some of these checkpoint files may feel unnecessary, but it is a good start.
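With many checkpoint files on disk, you will usually want the most recent one, which is also the best one because a file is written only on improvement. The following is a small sketch, not part of Keras, that picks it out from a list of file names. The function name latest_checkpoint and the regular expression are assumptions made for this example.

```python
import re

# Matches names produced by the template
# "weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5".
PATTERN = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")

def latest_checkpoint(filenames):
    """Return the matching file name with the highest epoch number, or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = PATTERN.search(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name

# A few of the file names from the training output, plus an unrelated file.
names = [
    "weights-improvement-26-0.69.hdf5",
    "weights-improvement-120-0.73.hdf5",
    "weights-improvement-47-0.69.hdf5",
    "notes.txt",
]
print(latest_checkpoint(names))  # weights-improvement-120-0.73.hdf5
```

The sketch compares epoch numbers rather than the accuracy in the name, because the accuracy is rounded to two decimals and ties can occur (epochs 26 and 47 both show 0.69). In practice you could pass it os.listdir(".") for the working directory.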

The next thing we can do is keep only the best model: a single checkpoint file that is written only if the validation accuracy improves. This needs just a slight change in the same code: we use a fixed file name this time, so every improvement, if and when it is found, overwrites the previously saved file.

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt
import numpy

dataset = numpy.loadtxt("/home/sumit/pima-indians-diabetes.data.csv", delimiter=",")

X = dataset[:,0:8]
Y = dataset[:,8]

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# A fixed file name: each improvement overwrites the previous checkpoint
filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, callbacks=callbacks_list, verbose=0)

On execution, this code creates a file named weights.best.hdf5 in the working directory. We have now created a single checkpoint file for our model.

Output:

Using TensorFlow backend.

Epoch 00001: val_accuracy improved from -inf to 0.48425, saving model to weights.best.hdf5

Epoch 00002: val_accuracy improved from 0.48425 to 0.58661, saving model to weights.best.hdf5

Epoch 00003: val_accuracy did not improve from 0.58661

Epoch 00004: val_accuracy improved from 0.58661 to 0.61024, saving model to weights.best.hdf5

Epoch 00005: val_accuracy did not improve from 0.61024

Epoch 00006: val_accuracy improved from 0.61024 to 0.67717, saving model to weights.best.hdf5

Epoch 00007: val_accuracy did not improve from 0.67717

Epoch 00008: val_accuracy did not improve from 0.67717

Epoch 00009: val_accuracy improved from 0.67717 to 0.70079, saving model to weights.best.hdf5

Epoch 00010: val_accuracy did not improve from 0.70079

Epoch 00011: val_accuracy did not improve from 0.70079

Epoch 00012: val_accuracy did not improve from 0.70079

Epoch 00013: val_accuracy did not improve from 0.70079

Epoch 00014: val_accuracy did not improve from 0.70079

Epoch 00015: val_accuracy did not improve from 0.70079

Epoch 00016: val_accuracy did not improve from 0.70079

Epoch 00017: val_accuracy did not improve from 0.70079

Epoch 00018: val_accuracy did not improve from 0.70079

Epoch 00019: val_accuracy did not improve from 0.70079

Epoch 00020: val_accuracy did not improve from 0.70079

Epoch 00021: val_accuracy did not improve from 0.70079

Epoch 00022: val_accuracy did not improve from 0.70079

Epoch 00023: val_accuracy did not improve from 0.70079

Epoch 00024: val_accuracy did not improve from 0.70079

Epoch 00025: val_accuracy did not improve from 0.70079

Epoch 00026: val_accuracy did not improve from 0.70079

Epoch 00027: val_accuracy did not improve from 0.70079

Epoch 00028: val_accuracy did not improve from 0.70079

Epoch 00029: val_accuracy did not improve from 0.70079

Epoch 00030: val_accuracy improved from 0.70079 to 0.71654, saving model to weights.best.hdf5

Epoch 00031: val_accuracy did not improve from 0.71654

Epoch 00032: val_accuracy did not improve from 0.71654

Epoch 00033: val_accuracy did not improve from 0.71654

Epoch 00034: val_accuracy did not improve from 0.71654

Epoch 00035: val_accuracy did not improve from 0.71654

Epoch 00036: val_accuracy did not improve from 0.71654

Epoch 00037: val_accuracy did not improve from 0.71654

Epoch 00038: val_accuracy did not improve from 0.71654

Epoch 00039: val_accuracy did not improve from 0.71654

Epoch 00040: val_accuracy did not improve from 0.71654

Epoch 00041: val_accuracy did not improve from 0.71654

Epoch 00042: val_accuracy did not improve from 0.71654

Epoch 00043: val_accuracy did not improve from 0.71654

Epoch 00044: val_accuracy did not improve from 0.71654

Epoch 00045: val_accuracy did not improve from 0.71654

Epoch 00046: val_accuracy did not improve from 0.71654

Epoch 00047: val_accuracy did not improve from 0.71654

Epoch 00048: val_accuracy did not improve from 0.71654

Epoch 00049: val_accuracy did not improve from 0.71654

Epoch 00050: val_accuracy did not improve from 0.71654

Epoch 00051: val_accuracy did not improve from 0.71654

Epoch 00052: val_accuracy did not improve from 0.71654

Epoch 00053: val_accuracy did not improve from 0.71654

Epoch 00054: val_accuracy did not improve from 0.71654

Epoch 00055: val_accuracy improved from 0.71654 to 0.72441, saving model to weights.best.hdf5

Epoch 00056: val_accuracy did not improve from 0.72441

Epoch 00057: val_accuracy did not improve from 0.72441

Epoch 00058: val_accuracy did not improve from 0.72441

Epoch 00059: val_accuracy did not improve from 0.72441

Epoch 00060: val_accuracy did not improve from 0.72441

Epoch 00061: val_accuracy did not improve from 0.72441

Epoch 00062: val_accuracy did not improve from 0.72441

Epoch 00063: val_accuracy did not improve from 0.72441

Epoch 00064: val_accuracy did not improve from 0.72441

Epoch 00065: val_accuracy did not improve from 0.72441

Epoch 00066: val_accuracy did not improve from 0.72441

Epoch 00067: val_accuracy did not improve from 0.72441

Epoch 00068: val_accuracy did not improve from 0.72441

Epoch 00069: val_accuracy did not improve from 0.72441

Epoch 00070: val_accuracy did not improve from 0.72441

Epoch 00071: val_accuracy did not improve from 0.72441

Epoch 00072: val_accuracy did not improve from 0.72441

Epoch 00073: val_accuracy did not improve from 0.72441

Epoch 00074: val_accuracy did not improve from 0.72441

Epoch 00075: val_accuracy did not improve from 0.72441

Epoch 00076: val_accuracy did not improve from 0.72441

Epoch 00077: val_accuracy did not improve from 0.72441

Epoch 00078: val_accuracy did not improve from 0.72441

Epoch 00079: val_accuracy did not improve from 0.72441

Epoch 00080: val_accuracy did not improve from 0.72441

Epoch 00081: val_accuracy did not improve from 0.72441

Epoch 00082: val_accuracy did not improve from 0.72441

Epoch 00083: val_accuracy did not improve from 0.72441

Epoch 00084: val_accuracy did not improve from 0.72441

Epoch 00085: val_accuracy improved from 0.72441 to 0.72835, saving model to weights.best.hdf5

Epoch 00086: val_accuracy did not improve from 0.72835

Epoch 00087: val_accuracy did not improve from 0.72835

Epoch 00088: val_accuracy did not improve from 0.72835

Epoch 00089: val_accuracy improved from 0.72835 to 0.73228, saving model to weights.best.hdf5

Epoch 00090: val_accuracy did not improve from 0.73228

Epoch 00091: val_accuracy did not improve from 0.73228

Epoch 00092: val_accuracy did not improve from 0.73228

Epoch 00093: val_accuracy did not improve from 0.73228

Epoch 00094: val_accuracy improved from 0.73228 to 0.73622, saving model to weights.best.hdf5

Epoch 00095: val_accuracy did not improve from 0.73622

Epoch 00096: val_accuracy did not improve from 0.73622

Epoch 00097: val_accuracy did not improve from 0.73622

Epoch 00098: val_accuracy did not improve from 0.73622

Epoch 00099: val_accuracy did not improve from 0.73622

Epoch 00100: val_accuracy did not improve from 0.73622

Epoch 00101: val_accuracy did not improve from 0.73622

Epoch 00102: val_accuracy did not improve from 0.73622

Epoch 00103: val_accuracy did not improve from 0.73622

Epoch 00104: val_accuracy did not improve from 0.73622

Epoch 00105: val_accuracy improved from 0.73622 to 0.75197, saving model to weights.best.hdf5

Epoch 00106: val_accuracy did not improve from 0.75197

Epoch 00107: val_accuracy did not improve from 0.75197

Epoch 00108: val_accuracy did not improve from 0.75197

Epoch 00109: val_accuracy did not improve from 0.75197

Epoch 00110: val_accuracy did not improve from 0.75197

Epoch 00111: val_accuracy did not improve from 0.75197

Epoch 00112: val_accuracy did not improve from 0.75197

Epoch 00113: val_accuracy did not improve from 0.75197

Epoch 00114: val_accuracy did not improve from 0.75197

Epoch 00115: val_accuracy did not improve from 0.75197

Epoch 00116: val_accuracy did not improve from 0.75197

Epoch 00117: val_accuracy did not improve from 0.75197

Epoch 00118: val_accuracy did not improve from 0.75197

Epoch 00119: val_accuracy did not improve from 0.75197

Epoch 00120: val_accuracy did not improve from 0.75197

Epoch 00121: val_accuracy did not improve from 0.75197

Epoch 00122: val_accuracy did not improve from 0.75197

Epoch 00123: val_accuracy did not improve from 0.75197

Epoch 00124: val_accuracy did not improve from 0.75197

Epoch 00125: val_accuracy did not improve from 0.75197

Epoch 00126: val_accuracy did not improve from 0.75197

Epoch 00127: val_accuracy did not improve from 0.75197

Epoch 00128: val_accuracy did not improve from 0.75197

Epoch 00129: val_accuracy did not improve from 0.75197

Epoch 00130: val_accuracy did not improve from 0.75197

Epoch 00131: val_accuracy did not improve from 0.75197

Epoch 00132: val_accuracy did not improve from 0.75197

Epoch 00133: val_accuracy improved from 0.75197 to 0.75591, saving model to weights.best.hdf5

Epoch 00134: val_accuracy did not improve from 0.75591

Epoch 00135: val_accuracy did not improve from 0.75591

Epoch 00136: val_accuracy did not improve from 0.75591

Epoch 00137: val_accuracy did not improve from 0.75591

Epoch 00138: val_accuracy did not improve from 0.75591

Epoch 00139: val_accuracy did not improve from 0.75591

Epoch 00140: val_accuracy did not improve from 0.75591

Epoch 00141: val_accuracy did not improve from 0.75591

Epoch 00142: val_accuracy did not improve from 0.75591

Epoch 00143: val_accuracy did not improve from 0.75591

Epoch 00144: val_accuracy did not improve from 0.75591

Epoch 00145: val_accuracy did not improve from 0.75591

Epoch 00146: val_accuracy did not improve from 0.75591

Epoch 00147: val_accuracy did not improve from 0.75591

Epoch 00148: val_accuracy did not improve from 0.75591

Epoch 00149: val_accuracy did not improve from 0.75591

Epoch 00150: val_accuracy did not improve from 0.75591

You can use either of the two ways of creating a checkpoint file described above. Both have their perks. The first creates many checkpoint files, which may be difficult to manage but gives more options to return to. The second keeps just a single file, which always holds the best model seen so far.
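In both approaches, a file is written only when the monitored value beats the best value seen so far. The sketch below imitates that decision in plain Python for mode='max'. It is an illustration of the rule, not the Keras implementation, and the accuracy values are made up for the example.

```python
def epochs_that_save(val_accuracies):
    """Return the 1-based epochs at which a best-only checkpoint is written."""
    best = float("-inf")  # corresponds to "improved from -inf" in the output
    saved = []
    for epoch, value in enumerate(val_accuracies, start=1):
        if value > best:  # strictly better than anything seen so far
            best = value
            saved.append(epoch)
    return saved

# Illustrative values only, not taken from a real run.
print(epochs_that_save([0.48, 0.59, 0.55, 0.61, 0.60, 0.68]))  # [1, 2, 4, 6]
```

The only difference between the two approaches is what happens at those epochs: a templated file name gives a new file each time, while a fixed file name overwrites the same file.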

Fetching/Loading the created checkpoints:

Now we shall learn to access the created checkpoints so that we can use them whenever required. To do so, you must know the network structure, because the code below rebuilds the same model before loading the weights into it. For this particular example, we will load the previously created weights.best.hdf5 file from the directory it was stored in.

import numpy
from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.load_weights("weights.best.hdf5")

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print("Created model and loaded weights from file")

dataset = numpy.loadtxt("/home/sumit/pima-indians-diabetes.data.csv", delimiter=",")

X = dataset[:,0:8]
Y = dataset[:,8]

scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

Output:

Using TensorFlow backend.
Created model and loaded weights from file
accuracy: 76.04%

So we have successfully loaded the file and then evaluated the model with it. The checkpoint let us go straight to evaluation, since the training was already completed and stored in the file by the previous code. Note that the model is evaluated here on the whole dataset, including the rows it was trained on.

I hope you now know how to create checkpoints in your code and how to load them as and when required, and that you will use this method in your upcoming machine learning models.
This was a basic tutorial on checkpoints in Keras; I hope you enjoyed it. Have a good day and happy learning.

 
