Confusion matrix using scikit-learn in Python

In this tutorial, we will learn how to create a confusion matrix using the scikit-learn library in Python. We will use simple examples, start from the basics, and go through the key terms of the confusion matrix as well.

In Machine Learning, we get the data, cleanse it, pre-process it, build a classification model and try to get the best possible result. But wait, how can you know how effective your model is? That is what a confusion matrix tells you.

What is a Confusion Matrix?

A confusion matrix is an n × n matrix, where n is the number of classes, that summarizes the performance of your classification model. Classification in Machine Learning means identifying which category (label) a data point belongs to. To build the matrix, we need data points whose true labels are already known. The matrix then compares the label predicted by the model with the actual label of each data point and counts how often each combination occurs.

I know there are a lot of terms some of you have not heard of, but we’ll get to all of them. The confusion matrix itself is easy; it is the terms that can be confusing.

Let’s define the most basic terms:

  • True Positive – These are cases in which we predicted True and the actual result is True.

  • False Positive – We predicted True but the actual result is False.

  • False Negative – We predicted False but the actual result is True.

  • True Negative – Cases where we predicted False and the actual result is False.
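The four terms above can be counted by hand before bringing in any library. The following is a minimal sketch in plain Python; the lists actual and predicted are hypothetical data made up for illustration, with 1 meaning True and 0 meaning False.

```python
# Hypothetical labels for illustration: 1 means True, 0 means False
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

# Pair each actual label with the prediction made for the same data point
pairs = list(zip(actual, predicted))

true_positive = sum(1 for a, p in pairs if a == 1 and p == 1)
false_positive = sum(1 for a, p in pairs if a == 0 and p == 1)
false_negative = sum(1 for a, p in pairs if a == 1 and p == 0)
true_negative = sum(1 for a, p in pairs if a == 0 and p == 0)

print(true_positive, false_positive, false_negative, true_negative)
```

Every data point falls into exactly one of the four groups, so the four counts always add up to the number of data points.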

Here’s a simple example of how to import and use confusion_matrix from scikit-learn:

from sklearn.metrics import confusion_matrix
y_result = [1, 1, 0, 0, 0, 0, 1, 1]  # actual labels: 1 means True and 0 means False
y_pred = [0, 1, 0, 0, 0, 1, 1, 1]    # labels predicted by the model
cfm = confusion_matrix(y_result, y_pred, labels=[1, 0])  # class 1 comes first
print(cfm)

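In the above example, y_result holds the actual results and y_pred holds the predicted ones. We pass both to confusion_matrix along with labels. The labels argument sets the order in which the classes appear along the rows and columns. In scikit-learn, rows correspond to actual labels and columns to predicted labels, so labels=[1,0] puts the count for (actual 1, predicted 1), the true positives, in the top-left cell, followed by (actual 1, predicted 0), the false negatives, to its right. The second row holds the false positives and then the true negatives.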
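To see what the labels argument changes, it helps to use data whose matrix is not symmetric. The sketch below uses a small made-up pair of lists (y_true and y_pred_small are hypothetical names, not part of the example above) and compares the default label order, which scikit-learn sorts as [0, 1], with labels=[1, 0].

```python
from sklearn.metrics import confusion_matrix

# Hypothetical data for illustration: 1 TP, 2 FN, 0 FP, 1 TN
y_true = [1, 1, 1, 0]
y_pred_small = [1, 0, 0, 0]

# Default: labels are sorted, so class 0 comes first -> [[TN, FP], [FN, TP]]
default_order = confusion_matrix(y_true, y_pred_small)

# labels=[1, 0]: class 1 comes first -> [[TP, FN], [FP, TN]]
positive_first = confusion_matrix(y_true, y_pred_small, labels=[1, 0])

print(default_order)   # [[1 0]
                       #  [2 1]]
print(positive_first)  # [[1 2]
                       #  [0 1]]
```

The counts are the same in both cases; only their position in the matrix changes, so it is worth stating the label order whenever a confusion matrix is reported.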

Now let’s look at the output.

[[3 1]
 [1 3]]

If you match the output against the inputs, you can confirm the confusion matrix: 3 true positives and 1 false negative in the first row, 1 false positive and 3 true negatives in the second row. And this is how a confusion matrix is computed using scikit-learn in Python.
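One way to do this matching in code is to flatten the matrix and give each cell a name. This is a small sketch using the same y_result and y_pred as in the example; the variable names tp, fn, fp and tn are chosen here for illustration, and the unpacking order is only valid because labels=[1, 0] is passed.

```python
from sklearn.metrics import confusion_matrix

y_result = [1, 1, 0, 0, 0, 0, 1, 1]  # actual labels
y_pred = [0, 1, 0, 0, 0, 1, 1, 1]    # predicted labels

cfm = confusion_matrix(y_result, y_pred, labels=[1, 0])

# With labels=[1, 0], reading the matrix row by row gives TP, FN, FP, TN
tp, fn, fp, tn = cfm.ravel()

print("True Positive:", tp)    # 3
print("False Negative:", fn)   # 1
print("False Positive:", fp)   # 1
print("True Negative:", tn)    # 3

# Each data point is counted in exactly one cell
print(tp + fn + fp + tn == len(y_result))  # True
```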
