Create a GGPlot with Multiple Lines in Python

Spotting trends and patterns is a core task in data analysis, and ggplot-style graphics make this kind of visualization straightforward. In this tutorial, we will create a ggplot with multiple lines in Python.

The Grammar of Graphics

The name ‘ggplot’ derives from the grammar of graphics. Just as grammar sets out general rules for a language, the grammar of graphics defines the fundamental components of a data visualization. These components include data, aesthetics, statistical transformations, geometric objects, and position adjustments, among others.

plotnine is a Python implementation of the grammar of graphics, modeled on the ggplot2 package in R. It is widely regarded as one of the most mature Python packages for drawing ggplot-style graphs.

Creating a ggplot with multiple lines in Python

Consider housing sales data for different cities in Texas. Let us visualize these sales across the years with the help of a multiple-line ggplot graph.

1. Install the libraries and import them

Simply use the pip command to install plotnine.

pip install plotnine

Note: plotnine depends on pandas, and pip normally installs it automatically. If it is missing, install it as well.

Then import the library into your code.

from plotnine import *

2. Prepare the data

We will use the Housing Sales in Texas dataset (txhousing) that ships with plotnine as our data source. Let us import the data.

from plotnine.data import txhousing

To plot the graph, we first need to aggregate the data. Here, we will plot the total sales for each city in each year. To keep the graph readable, we exclude every city-year total below 5000.

yearly_sales = txhousing.groupby(['year', 'city'])['sales'].sum()
yearly_sales = yearly_sales.reset_index(name='total_sales')

yearly_sales = yearly_sales[(yearly_sales['total_sales'] >= 5000)]

print(yearly_sales)
     year               city  total_sales
3    2000             Austin      18621.0
9    2000      Collin County      10000.0
11   2000             Dallas      45446.0
12   2000      Denton County       6149.0
13   2000            El Paso       5109.0
..    ...                ...          ...
705  2015         Fort Worth       7300.0
709  2015            Houston      48109.0
719  2015  Montgomery County       5474.0
720  2015  NE Tarrant County       5783.0
726  2015        San Antonio      16455.0

[194 rows x 3 columns]
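
The filter above drops individual (year, city) rows, so a city can appear in some years and be missing in others. The sketch below reproduces both steps on a small, made-up DataFrame (the cities "A" and "B" and their figures are hypothetical, not taken from txhousing). It also shows one possible alternative that keeps a city only when all of its yearly totals reach the threshold:

```python
import pandas as pd

# Hypothetical records for two made-up cities (not from txhousing).
records = pd.DataFrame({
    "year":  [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "city":  ["A", "A", "B", "B", "A", "A", "B", "B"],
    "sales": [3000, 4000, 1000, 2000, 2500, 3500, 3000, 3500],
})

# Step 1: one row per (year, city) holding the yearly total.
totals = (
    records.groupby(["year", "city"])["sales"]
    .sum()
    .reset_index(name="total_sales")
)

# Step 2: row-wise filter, as in the tutorial.
threshold = 5000
row_filtered = totals[totals["total_sales"] >= threshold]

# Alternative: keep a city only if every one of its yearly totals
# meets the threshold, so each remaining line spans all years.
city_min = totals.groupby("city")["total_sales"].transform("min")
city_filtered = totals[city_min >= threshold]

print(row_filtered)
print(city_filtered)
```

With the row-wise filter, city "B" keeps only its 2001 row. With the city-level filter, it is dropped entirely. Which behavior is preferable depends on whether partial lines are acceptable in the final graph.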

3. Plot the graph

Finally, let us create our ggplot by defining the mandatory components of the grammar of graphics.

  • Data: the data source of the graph.
  • Aesthetics: the mapping of variables to visual properties, here the x-axis, the y-axis, and the line color.
  • Geometric object: the type of visualization (point, line, bar, etc.). Since we need to plot multiple lines, we will use the geom_line() function; mapping color to city draws one line per city.

p = (ggplot(data=yearly_sales, mapping=aes(x='year', y='total_sales', color='city'))
     + geom_line())

print(p)

[Figure: multiple-line ggplot in Python]

The graph effectively visualizes the trend in housing sales. You can further explore additional features by customizing the graph and experimenting with different datasets. Happy Learning!
