Side-by-Side Boxplots in Python

Hey, fellow Python coder! In this tutorial, we will learn about side-by-side box plots and how to implement them in Python. If you aren’t familiar with box plots or with how to draw a basic one in Python, have a look at the tutorial below.

Also Read: How to Box plot visualization with Pandas and Seaborn

Let’s get started with side-by-side boxplots right away!

Introduction to Side-By-Side Boxplots

Let’s say you are one of the organizers of a talent show with different categories of performers: individual performers, group performers, and duo performers. You need a way to compare the scores of each category visually. A single box plot summarizes only one group of data, but with side-by-side boxplots we can draw one box plot per group in the same frame.

In the case of the Talent show, have a look at the sample box plot visualization below:

 

By comparing the side-by-side boxplots of the talent show, we can see how each type of performer typically scored, how widely their scores varied, and what their highest scores were.

Now that we understand what side-by-side plots are, let’s move on to implementation!

Python Code Implementation of Side-By-Side Boxplot

To create a side-by-side boxplot, let’s first create some data to plot. We will generate three arrays of uniformly distributed random values in the range 0 to 30, each representing the cumulative scores given by the judges to one category of performers. To get uniform data, we will use the random.uniform function from the NumPy library, as shown in the code below:

import numpy as np

individual_Performers = np.random.uniform(0, 30, 100)
group_Performers = np.random.uniform(0, 30, 100)
duo_Performers = np.random.uniform(0, 30, 100)

combined_Data = [individual_Performers, group_Performers, duo_Performers]

Each category holds the scores of 100 randomly generated performers. After generating the scores for each category, we collect the three arrays in a single list, combined_Data, as shown above. Next, let’s focus on plotting the boxplots for the data we created.
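
Because np.random.uniform draws fresh values on every run, your plots will differ slightly from the ones shown in this tutorial. If you want repeatable results, one option is a seeded NumPy generator. This is a minimal sketch rather than the code used for the figures; the seed value 42 is an arbitrary assumption.

```python
import numpy as np

# Seeded generator: the same scores are produced on every run.
# The seed value 42 is an arbitrary choice for this sketch.
rng = np.random.default_rng(42)

# 100 scores per category, drawn uniformly from the range [0, 30)
individual_Performers = rng.uniform(0, 30, 100)
group_Performers = rng.uniform(0, 30, 100)
duo_Performers = rng.uniform(0, 30, 100)

combined_Data = [individual_Performers, group_Performers, duo_Performers]

# Quick sanity check: size and range of each category
for name, scores in zip(['Individual', 'Group', 'Duo'], combined_Data):
    print(name, scores.shape, round(scores.min(), 2), round(scores.max(), 2))
```

The rest of the tutorial works the same way with this version of combined_Data.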


Also Read: Understanding Python pandas.DataFrame.boxplot

Creating a Basic Side-By-Side Boxplot

To create the boxplots in this tutorial, we will use the matplotlib library. Its boxplot function takes the combined data, and the labels parameter supplies a custom label for each box. Along with the basic plot, we will also add x and y axis labels and a title using the code below.

import matplotlib.pyplot as plt

plt.figure(figsize=(15,5))
plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'])
plt.ylabel('Performance Scores')
plt.xlabel('Type of Performer')
plt.title('Talent Show Performance Analysis')
plt.show()

Executing the above code produces the following plot:

Basic Side-By-Side Boxplot
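
A note on the labels parameter used above: as far as I know, recent Matplotlib releases (3.9 and later) rename it to tick_labels and emit a deprecation warning for the old name. If you want code that does not depend on either name, one option is to set the tick labels yourself. The sketch below assumes the default box positions (1, 2, 3) and uses a small stand-in dataset; the variable names data and names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
data = [rng.uniform(0, 30, 100) for _ in range(3)]
names = ['Individual', 'Group', 'Duo']

fig, ax = plt.subplots(figsize=(15, 5))
ax.boxplot(data)            # boxes are placed at x = 1, 2, 3 by default
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(names)   # name the ticks without labels= or tick_labels=
ax.set_ylabel('Performance Scores')
ax.set_xlabel('Type of Performer')
ax.set_title('Talent Show Performance Analysis')
```

Remove the matplotlib.use("Agg") line and call plt.show() at the end to see the plot in a window.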

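In the plot, the orange line inside each box marks the median score of that category, not the mean. The box itself spans the first to the third quartile, and the whiskers extending from either side of the box reach the lowest and highest scores that lie within 1.5 times the interquartile range of the box. For uniformly distributed data like ours, those are normally the minimum and maximum values in each category. But the plot seems a little boring, right? Let’s change that.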
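
To make the parts of a box concrete, here is a minimal sketch that computes them by hand with NumPy for one category. It assumes Matplotlib’s default whisker rule (1.5 times the interquartile range); the seed and the variable names are my own choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for this sketch
scores = rng.uniform(0, 30, 100)

# The box: first quartile, median (the orange line), third quartile
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme data points still inside the fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
lower_whisker = scores[scores >= lower_fence].min()
upper_whisker = scores[scores <= upper_fence].max()

# Points beyond the fences would be drawn as individual outlier markers
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print('box:', round(q1, 2), round(median, 2), round(q3, 2))
print('whiskers:', round(lower_whisker, 2), round(upper_whisker, 2))
print('number of outliers:', outliers.size)
```

With uniform scores between 0 and 30, the fences fall well outside the data range, so the whiskers end at the minimum and maximum and no outlier markers appear.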

Before starting, we need to introduce a new parameter, patch_artist. When it is set to True, the boxes are drawn as patches rather than plain lines, which makes it possible to fill them with color and style them.

box = plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'], patch_artist=True)
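
To see what patch_artist changes, the following sketch draws the same data twice and inspects the objects that boxplot returns. The dataset and the names plain and filled are stand-ins of my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
data = [rng.uniform(0, 30, 100) for _ in range(3)]

fig, (ax_lines, ax_patches) = plt.subplots(1, 2, figsize=(15, 5))
plain = ax_lines.boxplot(data)                          # default: boxes are outlines
filled = ax_patches.boxplot(data, patch_artist=True)    # boxes are fillable patches

# boxplot returns a dictionary of the artists it created
print(sorted(filled.keys()))

print(type(plain['boxes'][0]).__name__)    # Line2D
print(type(filled['boxes'][0]).__name__)   # PathPatch
```

Only the patch version offers set_facecolor, which is why the coloring step in the next section needs patch_artist=True.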

Adding Colors to the Boxplots

We will start by giving each box its own color so that the categories are easy to tell apart. There is no direct parameter for giving each box a different color, so we need a different approach. First of all, let’s declare the list of colors. In this case, I will be using HEX codes to specify the colors precisely.

After that, we will iterate over the boxes and color each one using the set_facecolor method. Along with this, we will color the title and the axis labels using the color parameter. The whole procedure is shown in the code snippet below.

import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))
box = plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'], patch_artist=True)

colors = ['#F5B7B1', '#C39BD3', '#76D7C4']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

plt.ylabel('Performance Scores', color='#873600')
plt.xlabel('Type of Performer', color='#873600')
plt.title('Talent Show Performance Analysis', color='#0B5345')
plt.show()

The resulting plot is:

Add Colors to the Boxplots side by side

As you can see, the boxes are now clearly distinguishable from one another, unlike in the previous output.
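
The same idea extends to the other parts of the plot. The dictionary returned by boxplot also holds the median lines and whiskers, so they can be styled in a loop too. This is an optional sketch; the black median lines and dashed whiskers are my own styling choices, not part of the original plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
data = [rng.uniform(0, 30, 100) for _ in range(3)]
colors = ['#F5B7B1', '#C39BD3', '#76D7C4']

fig, ax = plt.subplots(figsize=(15, 5))
box = ax.boxplot(data, patch_artist=True)

# Fill each box with its own color, as in the tutorial code
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

# Make the median lines stand out against the pastel fills
for median in box['medians']:
    median.set_color('black')
    median.set_linewidth(2)

# Two whiskers per box: draw them dashed
for whisker in box['whiskers']:
    whisker.set_linestyle('--')
```

Call plt.show() (without the Agg line) to view the result.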

Adding Legend to the Boxplots

Adding a legend to a box plot takes a little more work than usual. Don’t worry, I will break the implementation down into simple steps. The main function is plt.legend, which takes several parameters; let’s go through them one at a time. First, have a look at the code:

plt.legend(handles=[
    plt.Line2D([], [],
               marker='*',
               markersize=20,
               markerfacecolor=color,
               color='w',
               label=label)
    for color, label in zip(colors, ['Individual', 'Group', 'Duo'])
], loc='upper right')

Let’s go through the call in detail. plt.legend adds a legend to a plot. Here we build the legend entries ourselves and pass them through the handles parameter. Each handle is a Line2D object. Its first two arguments are the x and y data of the line; since these handles exist only to populate the legend, we pass empty lists for both so that nothing is drawn on the plot itself.

Next, we customize the markers: the marker, markersize, and markerfacecolor parameters set the marker’s shape, size, and fill color respectively. The fill color comes from the loop variable color, which takes each of the box colors we declared earlier in turn.

You might be wondering what the second color parameter is for. It sets the color of the line itself, which Line2D also uses by default for the marker’s edge. Setting it to white (w) hides the line segment in the legend so that only the star marker is visible. Lastly, the label parameter receives the loop variable label, which holds the legend text for that entry.

Both loop variables come from the list comprehension inside handles: zip pairs each color with its category name, and one Line2D handle is created per pair. Finally, the loc parameter places the legend in the upper right corner to avoid overlapping the boxes; you can change the position to suit your plot.

Have a look at the output when this snippet is combined with the previous code:

Adding Legend to the Boxplots
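
If you do not need custom star markers, a shorter alternative is to pass the colored boxes themselves to the legend, so each entry shows a swatch of the box’s fill color. This is a sketch of a different technique from the Line2D handles above, using a stand-in dataset and variable names of my own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
data = [rng.uniform(0, 30, 100) for _ in range(3)]
names = ['Individual', 'Group', 'Duo']
colors = ['#F5B7B1', '#C39BD3', '#76D7C4']

fig, ax = plt.subplots(figsize=(15, 5))
box = ax.boxplot(data, patch_artist=True)
ax.set_xticklabels(names)

for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

# The box patches act as legend handles; names supplies the entry text
ax.legend(box['boxes'], names, loc='upper right')
```

The trade-off is less control over the legend symbol: you get a rectangle in the box color instead of a marker of your choice.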

Complete Python Code

import numpy as np
import matplotlib.pyplot as plt

# Generate 100 uniformly distributed scores (0 to 30) per category
individual_Performers = np.random.uniform(0, 30, 100)
group_Performers = np.random.uniform(0, 30, 100)
duo_Performers = np.random.uniform(0, 30, 100)

combined_Data = [individual_Performers, group_Performers, duo_Performers]

# Draw the boxes as patches so that they can be filled with color
plt.figure(figsize=(15, 5))
box = plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'], patch_artist=True)

# Give each box its own fill color
colors = ['#F5B7B1', '#C39BD3', '#76D7C4']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

# Colored axis labels and title
plt.ylabel('Performance Scores', color='#873600')
plt.xlabel('Type of Performer', color='#873600')
plt.title('Talent Show Performance Analysis', color='#0B5345')

# Legend: one star marker per category, with the connecting line hidden in white
plt.legend(handles=[
    plt.Line2D([], [],
               marker='*',
               markersize=20,
               markerfacecolor=color,
               color='w',
               label=label)
    for color, label in zip(colors, ['Individual', 'Group', 'Duo'])
], loc='upper right')

plt.show()
