Neural style transfer in TensorFlow – Python

In this tutorial, we will learn about neural style transfer in TensorFlow. Neural style transfer takes two images, a content image and a style image, and produces a generated image that keeps the content of the first while adopting the style of the second. It is commonly used for art generation. Unlike ordinary training, the algorithm does not update network weights: it minimizes a loss function with respect to the pixel values of the generated image. The method relies on transfer learning, which builds on top of a previously trained model and is especially useful when very little data is available for the problem at hand. In this tutorial, we will use a pre-trained VGG-19 network to build the model.

IMPORTING LIBRARIES

import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
from nst_utils import *
import numpy as np
import tensorflow as tf
import pprint
%matplotlib inline

Next, let us load the pre-trained VGG-19 weights into our model.

pp = pprint.PrettyPrinter(indent=4)
model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
pp.pprint(model)

Output:

{   'avgpool1': <tf.Tensor 'AvgPool:0' shape=(1, 150, 200, 64) dtype=float32>,
    'avgpool2': <tf.Tensor 'AvgPool_1:0' shape=(1, 75, 100, 128) dtype=float32>,
    'avgpool3': <tf.Tensor 'AvgPool_2:0' shape=(1, 38, 50, 256) dtype=float32>,
    'avgpool4': <tf.Tensor 'AvgPool_3:0' shape=(1, 19, 25, 512) dtype=float32>,
    'avgpool5': <tf.Tensor 'AvgPool_4:0' shape=(1, 10, 13, 512) dtype=float32>,
    'conv1_1': <tf.Tensor 'Relu:0' shape=(1, 300, 400, 64) dtype=float32>,
    'conv1_2': <tf.Tensor 'Relu_1:0' shape=(1, 300, 400, 64) dtype=float32>,
    'conv2_1': <tf.Tensor 'Relu_2:0' shape=(1, 150, 200, 128) dtype=float32>,
    'conv2_2': <tf.Tensor 'Relu_3:0' shape=(1, 150, 200, 128) dtype=float32>,
    'conv3_1': <tf.Tensor 'Relu_4:0' shape=(1, 75, 100, 256) dtype=float32>,
    'conv3_2': <tf.Tensor 'Relu_5:0' shape=(1, 75, 100, 256) dtype=float32>,
    'conv3_3': <tf.Tensor 'Relu_6:0' shape=(1, 75, 100, 256) dtype=float32>,
    'conv3_4': <tf.Tensor 'Relu_7:0' shape=(1, 75, 100, 256) dtype=float32>,
    'conv4_1': <tf.Tensor 'Relu_8:0' shape=(1, 38, 50, 512) dtype=float32>,
    'conv4_2': <tf.Tensor 'Relu_9:0' shape=(1, 38, 50, 512) dtype=float32>,
    'conv4_3': <tf.Tensor 'Relu_10:0' shape=(1, 38, 50, 512) dtype=float32>,
    'conv4_4': <tf.Tensor 'Relu_11:0' shape=(1, 38, 50, 512) dtype=float32>,
    'conv5_1': <tf.Tensor 'Relu_12:0' shape=(1, 19, 25, 512) dtype=float32>,
    'conv5_2': <tf.Tensor 'Relu_13:0' shape=(1, 19, 25, 512) dtype=float32>,
    'conv5_3': <tf.Tensor 'Relu_14:0' shape=(1, 19, 25, 512) dtype=float32>,
    'conv5_4': <tf.Tensor 'Relu_15:0' shape=(1, 19, 25, 512) dtype=float32>,
    'input': <tf.Variable 'Variable:0' shape=(1, 300, 400, 3) dtype=float32_ref>}

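The output is a dictionary that maps each layer name to its output tensor, so the activations of any layer can be retrieved by name, for example model['conv4_2'].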
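The spatial sizes of the pooling layers in the output above can be checked with a little arithmetic. The sketch below assumes each pooling step uses a stride of 2 with 'SAME' padding, so every dimension is halved and rounded up; this is inferred from the printed shapes, not read from the model file. The helper name pooled_shape is made up for illustration.

```python
import math

def pooled_shape(height, width):
    """Spatial size after one stride-2 pooling step, assuming 'SAME' padding."""
    return math.ceil(height / 2), math.ceil(width / 2)

shape = (300, 400)  # spatial size of the 'input' tensor printed above
shapes = []
for _ in range(5):  # avgpool1 ... avgpool5
    shape = pooled_shape(*shape)
    shapes.append(shape)

print(shapes)
# [(150, 200), (75, 100), (38, 50), (19, 25), (10, 13)]
```

These values match the height and width of avgpool1 through avgpool5 in the printed dictionary.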

This model has two cost functions, the style cost function and the content cost function, which we will combine into the final cost function to minimize. First, let us look at the content cost function.

CONTENT COST FUNCTION

An image of the Louvre Museum in Paris is used as the content image. This is the base image that will be styled by the end of this tutorial, and the generated image should match its content. So let’s import it.

content = scipy.misc.imread("images/louvre.jpg")
imshow(content);

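Now that we have imported the image, we will define the content cost function. It is defined as follows:

J_{content}(C,G) = \frac{1}{4 \, n_H \, n_W \, n_C} \sum_{\text{all entries}} \left(a^{(C)} - a^{(G)}\right)^2

Here a^{(C)} and a^{(G)} are the hidden-layer activations for the content image and the generated image, and n_H, n_W and n_C are the height, width and number of channels of that layer.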

def compute_content_cost(a_C, a_G):
    # Retrieve dimensions from a_G 
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
   
    a_C_unrolled = tf.reshape(a_C, shape=[m, n_H * n_W, n_C])
    a_G_unrolled = tf.reshape(a_G, shape=[m, n_H * n_W, n_C])
    
    # compute the cost with TensorFlow
    J_content = (1/(4*n_H*n_W*n_C))*tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled)))
    return J_content

tf.reset_default_graph()

with tf.Session() as test:
    tf.set_random_seed(1)
    a_C = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_content = compute_content_cost(a_C, a_G)
    print("J_content = " + str(J_content.eval()))

Output:

J_content = 6.76559

This is the value of the content cost J_content for the random test activations. The content cost is computed from the activations of a single hidden layer and measures how different the content image and the generated image are at that layer. Minimizing it later ensures that the generated image G has the same content as the content image C.
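To see what the content cost measures without starting a TensorFlow session, here is a minimal NumPy sketch of the same formula. The function name content_cost_np and the toy activation values are made up for illustration.

```python
import numpy as np

def content_cost_np(a_C, a_G):
    """NumPy version of the content cost for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)

a_C = np.ones((1, 2, 2, 3))   # pretend content activations
a_G_same = a_C.copy()         # generated image with identical activations
a_G_shift = a_C + 2.0         # every activation is off by 2

cost_same = content_cost_np(a_C, a_G_same)
cost_shift = content_cost_np(a_C, a_G_shift)
print(cost_same, cost_shift)
# 0.0 1.0
```

Identical activations give a cost of zero. In the second case, 12 entries each contribute a squared difference of 4, and 48 / (4 * 2 * 2 * 3) = 1.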

STYLE COST FUNCTION

We will import the style image that we will use here.

style_image = scipy.misc.imread("images/monet.jpg")
imshow(style_image);

Style Matrix:

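  • It is also called the Gram matrix.
  • Each entry measures how strongly two filters (channels) of a layer are activated together across the image, which captures texture and style rather than spatial layout.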

 

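In neural style transfer, the Gram matrix is computed using the following formula:

G_A = A A^T

Here A is the matrix of unrolled activations with one row per channel, so entry (i, j) of G_A is the dot product of the activations of channels i and j.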

def gram_matrix(A):
    GA = tf.matmul(A,tf.transpose(A))
    return GA

tf.reset_default_graph()

with tf.Session() as test:
    tf.set_random_seed(1)
    A = tf.random_normal([3, 2*1], mean=1, stddev=4)
    GA = gram_matrix(A)
    
    print("GA = \n" + str(GA.eval()))

Output:

GA = 
[[  6.42230511  -4.42912197  -2.09668207]
 [ -4.42912197  19.46583748  19.56387138]
 [ -2.09668207  19.56387138  20.6864624 ]]
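The random values above make the structure of the Gram matrix hard to see, so here is a small hand-checkable NumPy sketch. The three "channels" are made-up values chosen so that two of them fire together and the third fires elsewhere. The name gram_matrix_np is only for illustration.

```python
import numpy as np

def gram_matrix_np(A):
    """Gram matrix of A, whose rows are unrolled channels: shape (n_C, n_H * n_W)."""
    return A @ A.T

# Three channels, each unrolled over four spatial positions (made-up values).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],   # fires together with channel 0
              [0.0, 1.0, 0.0, 1.0]])  # fires where channel 0 is silent

G = gram_matrix_np(A)
print(G)
# [[2. 2. 0.]
#  [2. 2. 0.]
#  [0. 0. 2.]]
```

Channels 0 and 1 are active at the same positions, so their entry is large. Channel 2 never overlaps with them, so those entries are zero. The matrix is symmetric, and its diagonal holds each channel's squared norm.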

This is the Gram matrix of the random test input. Next, we derive the style cost from the Gram matrix. For a single hidden layer, the style cost function is as follows:

J_{style}^{[l]}(S,G) = \frac{1}{4 \, n_C^2 \, (n_H \, n_W)^2} \sum_{i=1}^{n_C} \sum_{j=1}^{n_C} \left(G^{(S)}_{ij} - G^{(G)}_{ij}\right)^2

G^{(S)} is the Gram matrix of the style image and G^{(G)} is the Gram matrix of the generated image. Let us convert this into a function.

def compute_style_cost_layer(a_S, a_G):
    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll the activations to shape (n_C, n_H * n_W): one row per channel
    a_S = tf.transpose(tf.reshape(a_S, [n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Gram matrices of the style and generated activations
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Normalized sum of squared differences between the two Gram matrices
    J_style_layer = 1. / (4 * n_C**2 * (n_H * n_W)**2) * tf.reduce_sum(tf.pow(GS - GG, 2))
    return J_style_layer

tf.reset_default_graph()

with tf.Session() as test:
    tf.set_random_seed(1)
    a_S = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_style_layer = compute_style_cost_layer(a_S, a_G)

    print("J_style_layer = " + str(J_style_layer.eval()))

Output:

J_style_layer = 9.19028
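One useful property of the style cost is that it ignores where in the image a pattern occurs. The NumPy sketch below illustrates this on random activations. The function name layer_style_cost_np, the seed and the shuffling experiment are assumptions added for illustration, not part of the original code.

```python
import numpy as np

def layer_style_cost_np(a_S, a_G):
    """NumPy version of the single-layer style cost for (1, n_H, n_W, n_C) activations."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll to (n_C, n_H * n_W): one row per channel.
    S = a_S.reshape(n_H * n_W, n_C).T
    G = a_G.reshape(n_H * n_W, n_C).T
    GS, GG = S @ S.T, G @ G.T
    return np.sum((GS - GG) ** 2) / (4 * n_C**2 * (n_H * n_W) ** 2)

rng = np.random.default_rng(0)
a_S = rng.normal(size=(1, 4, 4, 3))

# Shuffle the 16 spatial positions, keeping each position's 3 channel values together.
perm = rng.permutation(16)
a_shuffled = a_S.reshape(16, 3)[perm].reshape(1, 4, 4, 3)

style_cost_shuffled = layer_style_cost_np(a_S, a_shuffled)
squared_difference = np.sum((a_S - a_shuffled) ** 2)
print(style_cost_shuffled, squared_difference)
```

The style cost between the original and the shuffled activations is zero up to floating-point rounding, although the activations differ entry by entry. This is why the Gram matrix captures texture rather than layout.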

So far we have computed the style cost from only one layer. The result is usually better if we combine the costs of several layers. We assign a weight to each layer to control how much it contributes. Let’s first set the weights.

STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

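We can combine the style costs of the different layers as follows:

J_{style}(S,G) = \sum_{l} \lambda^{[l]} \, J_{style}^{[l]}(S,G)

where J_{style}^{[l]} is the style cost of layer l and \lambda^{[l]} is the weight given to that layer in STYLE_LAYERS. Let’s write a function for the same.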

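def compute_style(model, STYLE_LAYERS):
    # Note: relies on the global session `sess` created further below
    J_style = 0

    for layer_name, coeff in STYLE_LAYERS:
        # Output tensor of the currently selected layer
        output = model[layer_name]

        # Style activations: evaluated now, with the style image assigned to the model input
        a_S = sess.run(output)

        # Generated-image activations: left as a tensor and evaluated later,
        # once the generated image is assigned to the model input
        a_G = output

        # Style cost for the current layer
        J_style_layer = compute_style_cost_layer(a_S, a_G)

        # Add the weighted layer cost to the overall style cost
        J_style += coeff * J_style_layer

    return J_style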
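The weighted sum itself is simple arithmetic. The sketch below uses made-up per-layer costs in place of real compute_style_cost_layer outputs, to show how the weights in STYLE_LAYERS combine them.

```python
# Hypothetical per-layer style costs (illustrative numbers only).
layer_costs = {
    'conv1_1': 5.0,
    'conv2_1': 10.0,
    'conv3_1': 15.0,
    'conv4_1': 20.0,
    'conv5_1': 50.0,
}

STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

# Weighted sum over the layers, mirroring the loop in compute_style.
J_style = sum(coeff * layer_costs[name] for name, coeff in STYLE_LAYERS)
weight_sum = sum(coeff for _, coeff in STYLE_LAYERS)
print(J_style, weight_sum)
```

With five equal weights of 0.2, the combined cost is the mean of the layer costs, which is 20 for these numbers. Raising the weight of the early layers would emphasize fine textures, while raising the weight of the deeper layers would emphasize larger-scale patterns.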

 

COST OPTIMIZATION

The total cost that we will minimize is a weighted sum of the content cost and the style cost, J = alpha * J_content + beta * J_style:

def total_cost(J_content, J_style, alpha = 10, beta = 40):
    J = alpha*J_content + beta*J_style
    return J

tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(3)
    J_content = np.random.randn()    
    J_style = np.random.randn()
    J = total_cost(J_content, J_style)
    print(str(J))

Output:

35.34667875478276
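The role of alpha and beta is easier to see with round numbers. This sketch repeats the total_cost function so that it runs on its own. The cost values are made up for illustration.

```python
def total_cost(J_content, J_style, alpha=10, beta=40):
    """Weighted sum of the content and style costs."""
    return alpha * J_content + beta * J_style

J_content, J_style = 2.0, 0.5  # made-up cost values

J_default = total_cost(J_content, J_style)                       # 10 * 2.0 + 40 * 0.5
J_scaled = total_cost(J_content, J_style, alpha=100, beta=400)   # both weights times 10
print(J_default, J_scaled)
# 40.0 400.0
```

Multiplying both weights by the same factor only rescales J, so the balance between content and style depends on the ratio beta / alpha. A larger ratio pushes the result toward the style image.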

Now we will put all the functions above together to give us the results.

tf.reset_default_graph()

sess = tf.InteractiveSession()

Load the content and style images, then reshape and normalize them.

content_image = scipy.misc.imread("images/louvre_small.jpg")
content_image = reshape_and_normalize_image(content_image)

style_image = scipy.misc.imread("images/monet.jpg")
style_image = reshape_and_normalize_image(style_image)
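The helper reshape_and_normalize_image comes from nst_utils and is not shown in this tutorial. As a rough idea of what such a helper typically does, the sketch below adds a batch dimension and subtracts per-channel means, then undoes both steps. The function names and the mean values (the RGB means commonly used with VGG) are assumptions, and the real helper may differ.

```python
import numpy as np

# Assumed per-channel RGB means; the values used in nst_utils may differ.
MEANS = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def reshape_and_normalize_sketch(image):
    """Add a batch dimension and subtract the channel means (hypothetical helper)."""
    batch = image[np.newaxis, ...].astype(np.float64)
    return batch - MEANS

def denormalize_sketch(batch):
    """Undo the normalization and convert back to an 8-bit image (hypothetical helper)."""
    image = batch[0] + MEANS[0]
    return np.clip(np.rint(image), 0, 255).astype(np.uint8)

# Tiny made-up 2x2 RGB image with values between 0 and 220.
image = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3) * 20

normalized = reshape_and_normalize_sketch(image)
restored = denormalize_sketch(normalized)
print(normalized.shape, np.array_equal(restored, image))
```

The normalized array has shape (1, 2, 2, 3), matching the leading batch dimension of the model's 'input' tensor, and the round trip recovers the original pixels.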

We will now create the initial generated image. It is mostly noise but is still slightly correlated with the content image.

generated_image = generate_noise_image(content_image)
imshow(generated_image[0]);
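The helper generate_noise_image also lives in nst_utils. One plausible way to build such an image is to blend uniform noise with the content image, as in the sketch below. The function name, the noise range, the default noise_ratio of 0.6 and the seed are all assumptions for illustration and may not match the real helper.

```python
import numpy as np

def generate_noise_image_sketch(content_image, noise_ratio=0.6, seed=0):
    """Blend uniform noise with the content image (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-20, 20, size=content_image.shape)
    return noise * noise_ratio + content_image * (1 - noise_ratio)

# Made-up "normalized content image" with a batch dimension.
content = np.random.default_rng(1).uniform(-100, 100, size=(1, 8, 8, 3))

generated = generate_noise_image_sketch(content)
correlation = np.corrcoef(content.ravel(), generated.ravel())[0, 1]
print(generated.shape, correlation)
```

Starting from an image that is already correlated with the content image tends to make the optimization match the content faster than starting from pure noise.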


Let us now call the content cost and style cost functions defined above in this session.

  
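# Reload VGG-19, since the default graph was reset above
model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")

# Assign the content image as the model input
sess.run(model['input'].assign(content_image))

# Select the output tensor of layer conv4_2
output = model['conv4_2']

# Content activations, evaluated with the content image as input
a_C = sess.run(output)

# Generated-image activations: left as a tensor, evaluated during optimization
a_G = output

# Content cost
J_content = compute_content_cost(a_C, a_G)

# Assign the style image as the model input
sess.run(model['input'].assign(style_image))

# Style cost
J_style = compute_style(model, STYLE_LAYERS)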

Now compute the total cost with alpha = 10 and beta = 40.

J = total_cost(J_content,J_style,alpha=10,beta=40)

We will use the Adam optimizer with a learning rate of 2.0 for our model.

# define optimizer
optimizer = tf.train.AdamOptimizer(2.0)


train_step = optimizer.minimize(J)
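In neural style transfer the optimizer changes the pixel values of the input, not the network weights. For intuition, here is a minimal NumPy sketch of the standard Adam update applied to a toy problem: three "pixels" are pulled toward target values by minimizing a squared difference. The function name adam_minimize, the toy loss and the hyperparameters are assumptions for illustration, and this is not how tf.train.AdamOptimizer is implemented internally.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, steps=500, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimal Adam loop: returns the optimized copy of x0."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)  # running mean of gradients
    v = np.zeros_like(x)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

target = np.array([1.0, -2.0, 3.0])  # toy "content" values

def loss(x):
    return np.sum((x - target) ** 2)

def grad(x):
    return 2 * (x - target)

start = np.zeros(3)  # toy "generated image"
result = adam_minimize(grad, start)
print(loss(start), loss(result))
```

The loss drops from 14 at the starting point to a value close to zero. In the tutorial, the same kind of update is applied to every pixel of model['input'], with the gradient of J computed by TensorFlow.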

Let us now define and run the optimization loop.

def model_nn(sess, input_image, num_iterations = 200):
    # Initialize global variables
    sess.run(tf.global_variables_initializer())

    # Feed the noisy initial image to the model input
    sess.run(model["input"].assign(input_image))

    for i in range(num_iterations):
        # Run one optimization step on the pixel values
        sess.run(train_step)

        # Read back the current generated image
        generated_image = sess.run(model['input'])

        # Print the costs and save the image every 20 iterations
        if i%20 == 0:
            Jt, Jc, Js = sess.run([J, J_content, J_style])
            print("Iteration " + str(i) + " :")
            print("total cost = " + str(Jt))
            print("content cost = " + str(Jc))
            print("style cost = " + str(Js))

            save_image("output/" + str(i) + ".png", generated_image)

    # Save the final generated image
    save_image('output/generated_image.jpg', generated_image)

    return generated_image

model_nn(sess, generated_image)

Output (last logged iteration):

Iteration 180 :
total cost = 9.7414e+07
content cost = 18489.9
style cost = 2.43073e+06

 

[Figure: the generated image]

In conclusion, we have learned how neural style transfer uses transfer learning together with content and style cost functions to generate a stylized image. The same approach can be used in many applications, such as generating artwork from photographs.
