Using bfloat16 with TensorFlow models in Python
Fellow coders, in this tutorial we are going to learn how to use ‘bfloat16’ with TensorFlow models in Python. Using bfloat16 instead of 32-bit floating point often proves to be a good choice: many models reach the same accuracy with bfloat16 as with 32-bit, and some models even show improved converged accuracy.
What is bfloat16:
bfloat16 is Google’s custom 16-bit “brain floating point” format. It offers several performance advantages, which we will discuss below.
TensorFlow stores all variables in 32-bit floating point by default. We will change this and use bfloat16 for the activations and gradients. The advantage of using bfloat16 over 32-bit is that it decreases the device step time and reduces memory usage.
The standard 16-bit floating-point format (IEEE half precision, float16) is:
- 1 sign bit
- 5 exponent bits
- 10 fraction bits
But, in bfloat16 we use a different format:
- 1 sign bit
- 8 exponent bits
- 7 fraction bits
Using bfloat16 reduces the size of data in memory, allowing larger models to fit in the same amount of memory. It can also reduce rematerialization. A remarkable speedup can often be observed after switching to bfloat16.
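The bit layouts above can be demonstrated in plain Python: a bfloat16 value is simply the upper 16 bits of the corresponding IEEE-754 float32, so we can convert by truncating the lower 16 fraction bits. This is a simplified sketch (real hardware typically uses round-to-nearest-even rather than truncation); the helper names are illustrative:

```python
import struct

def float32_to_bfloat16_bits(x):
    # Reinterpret the float32 as a 32-bit integer and keep the top 16 bits:
    # 1 sign bit + 8 exponent bits + 7 fraction bits survive the truncation.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(bits):
    # Pad the 16-bit pattern back to 32 bits with zero fraction bits.
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x

# 1.0 needs no fraction bits, so it survives exactly.
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(1.0)))      # 1.0
# 3.14159 needs more than 7 fraction bits, so it loses precision.
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159)))  # 3.140625
```

Note that bfloat16 keeps the same 8 exponent bits as float32, so it covers the same range of magnitudes as float32 and only gives up precision, whereas float16 shrinks the exponent to 5 bits and can overflow or underflow where float32 would not.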
Steps to implement bfloat16 with TensorFlow:
Now that we have looked at the benefits of using bfloat16 with TensorFlow, let’s look at the steps involved in converting a model to bfloat16:
- Run the model in floating-point 32.
- Cast the input to bfloat16. Doing this will convert all the activations and gradients in the model to bfloat16.
image = tf.cast(image, tf.bfloat16)
- Cast the outputs of the model to float32.
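The steps above can be sketched with a toy “model” consisting of a single matmul; the variable names and the model itself are illustrative assumptions, not from any real codebase:

```python
import tensorflow as tf

# Illustrative float32 "model": one matmul with float32 weights.
w = tf.Variable(tf.random.normal([8, 4]))  # variables stay in float32

def model(x):
    # Cast the weights to the activation dtype so the matmul runs in bfloat16.
    return tf.matmul(x, tf.cast(w, x.dtype))

image = tf.random.normal([1, 8])       # step 1: the input starts in float32
image = tf.cast(image, tf.bfloat16)    # step 2: cast the input to bfloat16
logits = model(image)                  # activations are now bfloat16
logits = tf.cast(logits, tf.float32)   # step 3: cast the outputs to float32
```

In recent versions of Keras, this pattern can also be automated with the mixed-precision API, e.g. tf.keras.mixed_precision.set_global_policy('mixed_bfloat16'), which keeps variables in float32 while running computations in bfloat16.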
Now we have successfully implemented bfloat16 with TensorFlow with all its advantages. Thank you for reading this tutorial. Keep Coding!