Binary Classification using Neural Networks

This article will help you understand binary classification using neural networks, implemented from scratch using only Python and NumPy, with no machine-learning libraries.

Neural Network

Definition: A neural network is a computer system modeled on the human brain and nervous system.
Read this interesting article on Wikipedia – Neural Network

Binary Classification

Binary classification is the task of classifying the elements of a given set into two groups on the basis of a classification rule, for example, separating images of humans from images of animals.
It is a kind of supervised learning in which there are only two labels. Although binary classification may seem basic, it has wide applications in industry:

  • Spam detection
  • Credit card fraud detection
  • Medical diagnosis (e.g. whether a patient has cancer or not)

Logistic Regression

Logistic regression is used when the dependent variable (target) is categorical. Despite its name, it is a classification algorithm, not a regression algorithm.
The logistic function, also known as the sigmoid function, is the inverse of the logit function.
Mathematically,                                 Φ(z) = 1/(1 + exp(-z))
where                                                  z = w·x + b
Z Score

import numpy as np

def z_score(w, x, b):
        # Linear combination of the inputs: z = w.x + b
        return np.dot(w, x) + b
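As a quick sanity check, the z score of a single input can be passed through the sigmoid function defined above. The weight, input, and bias values below are made up purely for illustration.

w = np.array([[0.5, -0.25]])   # 1 x 2 weight row
x = np.array([[2.0], [4.0]])   # 2 x 1 input column
b = np.array([[0.1]])

z = z_score(w, x, b)           # -> [[0.1]]
phi = 1 / (1 + np.exp(-z))     # sigmoid(0.1) is about 0.525
print(z, phi)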

Weights and Biases

The parameters w and b used while calculating the z score are the weight and the bias respectively. For example, suppose our task is to hit a six in a cricket match. Here the output becomes 'the ball crosses the boundary without touching the ground' and the input becomes 'hitting the ball with the bat'. Whether the ball crosses the fence depends on the 'strength' and the 'timing' of the shot. Agree? This 'strength' plays the role of the 'weight' and the 'timing' plays the role of the 'bias' term in the z score. Read an interesting thread on Stack Exchange about weights and biases.

Initializing Weights and Biases
Let's see how to initialize weights and biases in Python. Keep in mind that when all the weights are initialized to zero, every neuron in a layer computes the same output and receives the same gradient, so the network cannot learn distinct features and training gets stuck. So we initialize the weights with small random values.

def init_params(n_x, n_h):
        ''' It is supposed that you have imported numpy as np.
        We want to create a weight matrix of size n_h x n_x
        with small random values (Xavier initialization),
        and a bias matrix of size n_h x 1, also initialized
        randomly. '''
        w = np.random.randn(n_h, n_x)*np.sqrt(1/n_x)  # Xavier initialization
        b = np.random.randn(n_h, 1)
        return w, b
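The forward propagation function further below expects a dictionary holding w1, b1, w2, b2, … for every layer. Here is a minimal sketch of how such a dictionary could be assembled from a list of layer sizes; the helper name init_all_params and its layer_dims argument are my own choices, not part of the original code.

def init_all_params(layer_dims):
        # layer_dims = [n_x, n_h1, ..., n_y], e.g. [2, 4, 1]
        parameters = {}
        for i in range(1, len(layer_dims)):
            w, b = init_params(layer_dims[i-1], layer_dims[i])
            parameters['w'+str(i)] = w
            parameters['b'+str(i)] = b
        return parameters

params = init_all_params([2, 4, 1])   # 2 inputs, 4 hidden units, 1 output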

Activation Functions

Definition: The activation function of a node defines the output of that node given an input or a set of inputs. This output is then used as the input for the next node, and so on, until a desired solution to the original problem is found. The sigmoid function given above is one kind of activation function. There are many types of activation functions, for example sigmoid, tanh, ReLU, softmax, softplus, etc. We can define the ReLU function as Φ(z) = max(z, 0). There are different variants of ReLU; one that is widely used is Leaky ReLU, defined as Φ(z) = max(z, αz), where α is a small constant (for example 0.01).
Calculating Activations

def activation(z, fn='linear'):
        # Each entry is a callable so that only the requested
        # activation is actually evaluated.
        act_fn = {'linear': lambda z: z,
                  'relu': lambda z: np.maximum(z, 0),
                  'tanh': lambda z: np.tanh(z),
                  'sigmoid': lambda z: 1/(1 + np.exp(-z)),
                  'softmax': lambda z: np.exp(z)/np.sum(np.exp(z), axis=0, keepdims=True)}
        return act_fn[fn](z)
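The Leaky ReLU variant mentioned above is not included in the dictionary. A minimal sketch of how it could be written is shown below; the slope value 0.01 is just an assumption, pick whatever small constant you prefer.

def leaky_relu(z, alpha=0.01):
        # Leaky ReLU: pass z through unchanged when positive,
        # scale it by a small slope alpha when negative.
        return np.maximum(z, alpha*z)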

Forward Propagation

The input X is the initial information we have, and forward propagation is how we carry it towards the final output. It is the sequential calculation of z scores and activations, layer by layer, where the result of the previous layer acts as the input for the next one. This process carries and transforms the initial information step by step and concludes with a prediction.
Implementing Forward Propagation

def forward_prop(x, parameters):
        # parameters holds w1, b1, ..., wL, bL, so the number
        # of layers is half the number of entries.
        L = len(parameters)//2
        z_scores = {}
        activations = {'a0': x}
        for i in range(1, L+1):
            z_scores['z'+str(i)] = z_score(parameters['w'+str(i)], activations['a'+str(i-1)], parameters['b'+str(i)])
            z = z_scores['z'+str(i)]
            # ReLU for the hidden layers, sigmoid for the output layer
            # (this matches the derivatives used in backprop below).
            fn = 'sigmoid' if i == L else 'relu'
            activations['a'+str(i)] = activation(z, fn=fn)

        return z_scores, activations
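Putting the pieces so far together on random data gives something like the snippet below; the shapes are illustrative only, and init_all_params is the helper sketched earlier, not part of the original code.

np.random.seed(0)
x = np.random.randn(2, 5)            # 2 features, 5 samples
params = init_all_params([2, 4, 1])  # 2-4-1 network
z_scores, activations = forward_prop(x, params)
print(activations['a2'].shape)       # (1, 5) -> one probability per sample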

Cost and Loss Functions
Definition from Wikipedia: A loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event. An optimization problem seeks to minimize a loss function. There are many types of loss functions used in Artificial Neural Network. For Example : Mean Squared Error (MSE), Mean Absolute Error (MAE), Cross-Entropy Loss, etc.
We will discuss cross-entropy loss for the task we have selected, i.e. binary classification.
We can define the cross-entropy loss as L(y, a) = -y log(a) - (1 - y) log(1 - a),
and the cost function as J(y, a) = (-1/m) * ∑ L(y, a), where m = number of samples.
Implementing Cost Function

def compute_cost(y, y_hat):
        # y and y_hat are row vectors of shape (1, m).
        m = y.shape[1]
        epsilon = 1e-7  # avoids taking log(0)
        cost = (-1/m)*(np.dot(y, np.log(y_hat.T + epsilon)) + np.dot(1-y, np.log(1-y_hat.T + epsilon)))
        return np.squeeze(cost)
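A quick check with made-up labels and predictions (the values are illustrative only): the cost should be small when the predictions match the labels and large when they do not.

y    = np.array([[1, 0, 1]])
good = np.array([[0.9, 0.1, 0.8]])
bad  = np.array([[0.2, 0.9, 0.3]])
print(compute_cost(y, good))   # small, roughly 0.14
print(compute_cost(y, bad))    # large, roughly 1.7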

Backward Propagation

In backward propagation, we basically try to find the gradients of the loss function with respect to the different parameters. These gradients tell us how to nudge the parameters towards their desired values, step by step. In simple terms, think of it as finding the square root of 50 iteratively. You know the answer lies somewhere around 7 (since √49 = 7). So you take some very small value dx, add it to 7, and compute the square of (7 + dx). Repeating this, adjusting dx a little at every step, you get closer and closer to √50 and eventually reach it with a certain accuracy. Backward propagation uses a similar step-by-step approach to reach the desired parameter values. I suggest you watch the YouTube video by 3Blue1Brown on backward propagation.
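Here is a tiny, self-contained sketch of that idea, using gradient descent to approach √50 by repeatedly reducing the error (x² - 50)²; the step size and iteration count are arbitrary choices for illustration.

x = 7.0                      # initial guess, since sqrt(49) = 7
lr = 0.001                   # step size
for _ in range(100):
    error = x**2 - 50
    grad = 2 * error * 2*x   # d/dx of (x^2 - 50)^2
    x -= lr * grad
print(x)                     # close to 7.0710678...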
Implementing Backward Propagation

def backprop(y, parameters, z_scores, activations):
        gradients = {}
        L = len(parameters)//2
        m = y.shape[1]   # y has shape (1, m)
        for i in range(L,0,-1):
            if i==L:
                # Derivative of the loss wrt. z for the output layer,
                # whose activation function is sigmoid.
                gradients['dz'+str(i)] = activations['a'+str(i)] - y
            else:
                # Hidden layers use ReLU, whose derivative is 1 for z >= 0 and 0 otherwise.
                gradients['dz'+str(i)] = np.multiply(np.dot(parameters['w'+str(i+1)].T, gradients['dz'+str(i+1)]), 1*(z_scores['z'+str(i)]>=0))
            dz = gradients['dz'+str(i)]
            gradients['dw'+str(i)] = (1/m)*np.matmul(dz, activations['a'+str(i-1)].T)
            gradients['db'+str(i)] = (1/m)*np.sum(dz, axis=1, keepdims=True)
        return gradients

Update Weights and Biases
After calculating the gradients, we need to update the parameters and then forward propagate again to see the new loss. We keep repeating the process:
Forward propagate → Calculate cost → Backward propagate → Update parameters → Forward propagate again, and so on.
The only hyperparameter used in updating the parameters (for this simple implementation) is the learning rate (η). Hyperparameters are values that are not learned during training and need to be chosen carefully. After every iteration,

w := w - η * (dJ/dw)
b := b - η * (dJ/db)


def update_params(parameters, gradients, learning_rate):
        eta = learning_rate
        for i in range(1,len(parameters)//2+1):
            parameters['w'+str(i)]-=eta*gradients['dw'+str(i)]
            parameters['b'+str(i)]-=eta*gradients['db'+str(i)]
        return parameters

Train the Model

Training the model simply means iterating the above steps multiple times until the loss drops to an acceptable value. Select the number of iterations carefully: along with good accuracy, we also want low computation time.
Algorithm :

Initialize Parameters
for i = 1 to i = n:
     forward propagate
     calculate cost
     backward propagate ( i.e find gradients )
     update parameters
return parameters
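Below is a minimal sketch of that loop using the functions defined above. The function name train, the printing interval, and the use of the init_all_params helper sketched earlier are my own choices, not part of the original post.

def train(x, y, layer_dims, learning_rate=0.1, n_iters=1000):
        parameters = init_all_params(layer_dims)
        for i in range(n_iters):
            z_scores, activations = forward_prop(x, parameters)
            y_hat = activations['a'+str(len(parameters)//2)]
            cost = compute_cost(y, y_hat)
            gradients = backprop(y, parameters, z_scores, activations)
            parameters = update_params(parameters, gradients, learning_rate)
            if i % 100 == 0:
                print('iteration', i, 'cost', cost)
        return parameters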

Predict for new data

You now have data for prediction, and the only thing you need is the trained parameters. After that there is nothing more to do: just feed the data into the trained model and read off the output. The Python implementation of the function is shown below.

def predict(x_test, params):
        z_scores, activations = forward_prop(x_test, params)
        # A predicted probability above 0.5 is labelled 1, otherwise 0.
        y_pred = 1*(activations['a'+str(len(params)//2)] > 0.5)
        return np.squeeze(y_pred)
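Tying everything together on a toy dataset might look like the snippet below; the data, layer sizes, and hyperparameters are arbitrary choices for illustration, and train is the helper sketched in the previous section.

np.random.seed(1)
x_train = np.random.randn(2, 200)                      # 2 features, 200 samples
y_train = (x_train[0:1, :] + x_train[1:2, :] > 0) * 1  # simple separable labels

params = train(x_train, y_train, layer_dims=[2, 4, 1], learning_rate=0.1, n_iters=1000)
y_pred = predict(x_train, params)
print('training accuracy:', np.mean(y_pred == np.squeeze(y_train)))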

That’s all you need to do to build a neural network.
I have explained all the steps needed and how to implement them in Python; if you still need help, visit my GitHub Repository to see the actual implementation of the neural network.

I hope the concepts are clear, and if you need any support at any point, feel free to comment.
