Side by side Boxplots in Python
Hey fellow Python coder! In this tutorial, we will learn about side-by-side box plots and how to implement them in Python. If you aren’t familiar with box plots or how to draw a basic one in Python, have a look at the tutorial below.
Also Read: How to Box plot visualization with Pandas and Seaborn
Let’s get started with side-by-side boxplots right away!
Introduction to Side-By-Side Boxplots
Let’s say you are one of the organizers of a talent show with different categories of performers: individual performers, group performers, and duo performers. You need a way to compare the scores of each category visually. A single box plot summarizes only one group of data, but with side-by-side boxplots we can draw one box plot per group in the same frame.
In the case of the Talent show, have a look at the sample box plot visualization below:
By comparing the side-by-side boxplots of the talent show, we can see the typical (median) score of each type of performer, how widely the scores in each category varied, and the best and worst scores in each category.
Now that we understand what side-by-side plots are, let’s move on to implementation!
Python Code Implementation of Side-By-Side Boxplot
To create a side-by-side boxplot, let’s first create some data to plot. We will generate three arrays of uniformly distributed random values in the range 0 to 30, representing the cumulative scores given by the judges. To get uniform data, we will use the random.uniform function from the numpy library, as shown in the code below:
import numpy as np

# 100 scores per category, drawn uniformly from the range 0 to 30
individual_Performers = np.random.uniform(0, 30, 100)
group_Performers = np.random.uniform(0, 30, 100)
duo_Performers = np.random.uniform(0, 30, 100)

# One list entry per box in the final plot
combined_Data = [individual_Performers, group_Performers, duo_Performers]
Each array holds the scores of 100 performers in that category. After generating the individual arrays, we collect them in a single list, combined_Data, as shown above. Next, let’s focus on plotting the boxplots for the data we created.
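Because np.random.uniform draws fresh numbers on every run, the plots will look slightly different each time. If you want the same scores on every run, one option is a seeded generator. The sketch below is a variation on the code above, not part of the original tutorial; the seed value 42 and the name rng are arbitrary choices made for illustration.

```python
import numpy as np

# Assumption: a fixed seed (42 is arbitrary) so the scores repeat between runs.
rng = np.random.default_rng(42)

# 100 scores per category, drawn uniformly from the range 0 to 30.
individual_Performers = rng.uniform(0, 30, 100)
group_Performers = rng.uniform(0, 30, 100)
duo_Performers = rng.uniform(0, 30, 100)

combined_Data = [individual_Performers, group_Performers, duo_Performers]
```

The rest of the tutorial works the same way with either version of combined_Data.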
Also Read: Understanding Python pandas.DataFrame.boxplot
Creating a Basic Side-By-Side Boxplot
To create the boxplots in this tutorial, we will use the matplotlib library. Its boxplot function takes the combined data along with custom tick labels, passed through the labels parameter. (In Matplotlib 3.9 and later this parameter is named tick_labels, and labels is deprecated.) Along with the basic plot, we will also add x and y axis labels and a title using the code below.
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))

# One box per array in combined_Data, labelled on the x-axis
plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'])

plt.ylabel('Performance Scores')
plt.xlabel('Type of Performer')
plt.title('Talent Show Performance Analysis')
plt.show()
The resulting plot when the above code is executed is:
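In the plot, the orange line inside each box marks the median score of that category (not the mean), and the box itself spans the first to the third quartile. The whiskers extending from each box reach the lowest and highest scores that lie within 1.5 times the interquartile range of the box edges; any points beyond that would be drawn individually as outliers. For uniformly distributed data like ours, the whiskers usually end at the minimum and maximum values. But the plot seems a little boring, right? Let’s change that.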
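To make the parts of a box concrete, here is a minimal sketch that computes the same summary values by hand with NumPy for one category. It assumes Matplotlib's default whisker setting (whis=1.5); the seed and the variable names are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
scores = rng.uniform(0, 30, 100)

# The box spans Q1 to Q3, and the line inside it is the median.
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# With the default whis=1.5, each whisker ends at the most extreme data
# point that still lies within 1.5 * IQR of the box edge.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
lower_whisker = scores[scores >= lower_limit].min()
upper_whisker = scores[scores <= upper_limit].max()

print(f"median={median:.2f}, Q1={q1:.2f}, Q3={q3:.2f}")
print(f"whiskers: {lower_whisker:.2f} to {upper_whisker:.2f}")
```

Any score outside the two limits would be drawn as a separate outlier point instead of being covered by a whisker.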
Before we start, we need to introduce a new parameter, patch_artist. When it is set to True, the boxes are drawn as patches (filled shapes) rather than as plain lines, which makes it possible to give them a fill color and other styling.
box = plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'], patch_artist=True)
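To see what patch_artist actually changes, the following sketch draws the same data twice and inspects the type of the returned box objects. It is only an illustration: the variable names are hypothetical, and the Agg backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D
from matplotlib.patches import PathPatch

data = [np.random.uniform(0, 30, 100) for _ in range(3)]

fig, (ax_lines, ax_patches) = plt.subplots(1, 2)
plain = ax_lines.boxplot(data)                         # default: boxes are outlines
filled = ax_patches.boxplot(data, patch_artist=True)   # boxes are fillable patches

# Without patch_artist the boxes are line objects; with it they are patches,
# which is why set_facecolor only works in the second case.
print(type(plain["boxes"][0]).__name__)
print(type(filled["boxes"][0]).__name__)
```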
Adding Colors to the Boxplots
We will start by adding colors to the individual box plots to make each one stand out. There is no direct parameter for giving each box its own color, so we need a different approach. First, let’s declare a list of colors; here I will use HEX codes to specify the colors precisely. After that, we iterate over the boxes and set each one’s fill color with the set_facecolor method. Along with this, we will color the title and the axis labels using the color parameter. The whole procedure is shown in the code snippet below.
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))

# patch_artist=True makes the boxes fillable
box = plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'], patch_artist=True)

# One fill color per box
colors = ['#F5B7B1', '#C39BD3', '#76D7C4']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

plt.ylabel('Performance Scores', color='#873600')
plt.xlabel('Type of Performer', color='#873600')
plt.title('Talent Show Performance Analysis', color='#0B5345')
plt.show()
The resulting plot is:
As you can see, the three boxes are now clearly distinguishable compared to the earlier output.
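The dictionary returned by boxplot holds more than the boxes, so the same looping idea can style the other parts of the plot as well. The sketch below is a small extension of the coloring step, not part of the original tutorial; making the median lines black and thicker is just one example of what you might do, and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import numpy as np

data = [np.random.uniform(0, 30, 100) for _ in range(3)]
colors = ['#F5B7B1', '#C39BD3', '#76D7C4']

fig, ax = plt.subplots(figsize=(15, 5))
box = ax.boxplot(data, patch_artist=True)

# The returned dictionary groups the drawn artists by role.
print(sorted(box.keys()))

# Fill each box, as in the tutorial code.
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

# Illustrative extra: make the median lines stand out against the pastel fills.
for median in box['medians']:
    median.set_color('black')
    median.set_linewidth(2)
```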
Adding Legend to the Boxplots
Adding a legend is a little more involved for box plots than for other plots. Don’t worry, I will make it simpler by breaking the implementation down into small steps. The main function we will use is plt.legend, which takes several parameters; let’s go through them one by one. First, have a look at the code:
plt.legend(
    handles=[
        # One star-shaped legend entry per box color and label
        plt.Line2D([], [], marker='*', markersize=20,
                   markerfacecolor=color, color='w', label=label)
        for color, label in zip(colors, ['Individual', 'Group', 'Duo'])
    ],
    loc='upper right',
)
Let’s go through the call piece by piece. plt.legend adds a legend to the current plot. Instead of letting Matplotlib pick up the legend entries automatically, we pass them explicitly through the handles parameter. Each handle is a Line2D object. Its first two arguments are the x and y data of the line; we pass [] for both, so nothing is drawn on the plot itself and the object only serves as an entry in the legend.

Next, we customize the markers: the marker, markersize, and markerfacecolor arguments set the marker’s shape, size, and fill color respectively. The fill color comes from the loop variable color, which takes each of the box colors we declared earlier in turn.

You might be wondering what the second color argument is for. It sets the color of the line itself and, by default, of the marker’s edge. Setting it to white (w) hides the line segment that would otherwise be drawn through the marker on the legend’s default white background, so only the colored star remains. Lastly, the label argument is set to the loop variable label, which supplies the text of each legend entry.

The looping mentioned above is done by the list comprehension passed to handles: it iterates over zip(colors, ['Individual', 'Group', 'Duo']) and builds one Line2D handle per color and label pair. Finally, the loc parameter is set to upper right to place the legend in the top-right corner; you can change the position to wherever it overlaps the boxes least.
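If you do not need custom markers, a shorter alternative is to pass the colored boxes themselves to the legend as handles. This is a different technique from the Line2D approach described above, sketched here under a few assumptions: it uses the object-oriented fig, ax interface, sets the tick labels with set_xticklabels, and selects the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import numpy as np

names = ['Individual', 'Group', 'Duo']
colors = ['#F5B7B1', '#C39BD3', '#76D7C4']
data = [np.random.uniform(0, 30, 100) for _ in names]

fig, ax = plt.subplots(figsize=(15, 5))
box = ax.boxplot(data, patch_artist=True)
ax.set_xticklabels(names)  # boxplot places the ticks at positions 1, 2, 3

for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

# The filled boxes are patches, so they can act as legend handles directly.
legend = ax.legend(box['boxes'], names, loc='upper right')
```

The legend then shows a small colored rectangle per category instead of a star, which keeps the legend entries visually tied to the boxes they describe.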
Have a look at the output when the legend code is combined with the previous code:
Complete Python Code
import numpy as np
import matplotlib.pyplot as plt

# Sample data: 100 uniformly distributed scores (0 to 30) per category
individual_Performers = np.random.uniform(0, 30, 100)
group_Performers = np.random.uniform(0, 30, 100)
duo_Performers = np.random.uniform(0, 30, 100)
combined_Data = [individual_Performers, group_Performers, duo_Performers]

plt.figure(figsize=(15, 5))

# Side-by-side boxplots with fillable boxes
box = plt.boxplot(combined_Data, labels=['Individual', 'Group', 'Duo'], patch_artist=True)

# One fill color per box
colors = ['#F5B7B1', '#C39BD3', '#76D7C4']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

plt.ylabel('Performance Scores', color='#873600')
plt.xlabel('Type of Performer', color='#873600')
plt.title('Talent Show Performance Analysis', color='#0B5345')

# Legend with one star marker per category
plt.legend(
    handles=[
        plt.Line2D([], [], marker='*', markersize=20,
                   markerfacecolor=color, color='w', label=label)
        for color, label in zip(colors, ['Individual', 'Group', 'Duo'])
    ],
    loc='upper right',
)

plt.show()