Visualizing the distribution of a dataset in Python

Fellow coders, in this tutorial section we will visualize the distribution of a dataset in Python. Visualization helps us understand our data and present it clearly to others.

For this tutorial, we will be using the following libraries:

  • seaborn
  • matplotlib
  • pandas
  • numpy (to generate the sample data)
  • scipy (only for the kernel illustration in the univariate part)

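What is Seaborn:

Seaborn is a statistical data visualization library built on top of Matplotlib. It provides high-level functions for common statistical plots such as histograms, density plots, joint plots and pairwise scatterplot grids.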

Before we begin with the coding part, make sure that seaborn and pandas are installed. If not, run the following commands in a notebook cell (the leading ! passes the command to the shell; drop it when typing in a terminal):

!pip install seaborn

!pip install pandas

We will look at two types of distributions:

  1. Univariate distribution: the distribution of a single variable
  2. Bivariate distribution: the joint distribution of two variables
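To make the difference concrete, here is a minimal sketch that uses NumPy only. The variable names and the numbers (heights, weights, their means and spreads) are made up for illustration and do not come from any dataset used in this tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Univariate: one measurement per observation (hypothetical heights)
heights = rng.normal(170, 10, size=100)

# Bivariate: a second measurement paired with the first (hypothetical weights)
weights = 0.5 * heights + rng.normal(0, 5, size=100)

# A univariate summary describes one variable on its own
print("mean:", heights.mean(), "std:", heights.std())

# A bivariate summary describes how the two variables move together
correlation = np.corrcoef(heights, weights)[0, 1]
print("correlation:", correlation)
```

The plots in the rest of this section are the visual counterparts of these two kinds of summary.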

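Working with the code:

Plotting Univariate distributions:

The snippets in this section share the imports below, use sb as the alias for seaborn, and assume seaborn 0.11 or later.

import numpy as np
import seaborn as sb

x = np.random.normal(size=50)
sb.histplot(x, kde=True, stat="density")  # distplot is deprecated since seaborn 0.11

The code above draws a histogram of x with a kernel density estimate curve on top.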

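Histograms:

x = np.random.normal(size=100)
sb.histplot(x)  # bars only, no density curve (the deprecated call was distplot(x, kde=False))

This code draws a plain histogram: the data range is split into bins, and each bar shows how many observations fall into that bin.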
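To see what the bars stand for, we can compute the bin counts ourselves. This is a small sketch with NumPy's histogram function; the fixed seed and the choice of 10 bins are assumptions for illustration and need not match the bins seaborn picks on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.normal(size=100)

# counts[i] is the number of observations between edges[i] and edges[i + 1]
counts, edges = np.histogram(x, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

Every observation lands in exactly one bin, so the counts add up to the sample size.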

 

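Kernel density estimation:

A kernel density estimate places a small Gaussian curve (a kernel) on every observation and combines the kernels into one smooth curve. The code below draws the individual kernels, using a rule-of-thumb bandwidth, together with a rug plot of the observations.

import matplotlib.pyplot as plt
from scipy import stats

x = np.random.normal(0, 1, size=30)
bandwidth = 1.06 * x.std() * x.size ** (-1 / 5.)  # rule-of-thumb bandwidth
support = np.linspace(-4, 4, 200)                 # grid on which each kernel is evaluated

kernels = []
for x_i in x:
    kernel = stats.norm(x_i, bandwidth).pdf(support)  # Gaussian centered on one observation
    kernels.append(kernel)
    plt.plot(support, kernel, color="r")

sb.rugplot(x, color=".2", linewidth=3);

This code draws one red Gaussian curve per observation, plus a tick on the x-axis marking each observation.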
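The individual kernels turn into a density estimate once they are averaged. Here is a minimal sketch of that step. To keep it dependent on NumPy alone, it swaps stats.norm for a hand-written Gaussian density (the helper name gaussian_pdf is hypothetical); the fixed seed and the wider grid from -6 to 6 are also assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.normal(0, 1, size=30)

# Same rule-of-thumb bandwidth as in the snippet above
bandwidth = 1.06 * x.std() * x.size ** (-1 / 5.)
support = np.linspace(-6, 6, 400)

def gaussian_pdf(grid, center, scale):
    """Normal density with the given center and scale, evaluated on grid."""
    z = (grid - center) / scale
    return np.exp(-0.5 * z ** 2) / (scale * np.sqrt(2 * np.pi))

# One kernel per observation, stacked into a (30, 400) array
kernels = np.array([gaussian_pdf(support, x_i, bandwidth) for x_i in x])

# The density estimate is the average of the kernels
density = kernels.mean(axis=0)

# Sanity check: the area under a density should be close to 1
dx = support[1] - support[0]
area = density.sum() * dx
print("area under the estimate:", area)
```

Plotting density against support with plt.plot would give the smooth curve that a KDE plot shows.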

 

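Plotting Bivariate distributions:

Scatterplot:

x = np.random.normal(size=100)
y = np.random.normal(size=100)

sb.jointplot(x=x, y=y);  # seaborn 0.12 and later require x and y as keyword arguments

The output is a scatterplot of y against x, with a histogram of each variable along the margins.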

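Kernel density estimation:

x = np.random.normal(size=100)
y = np.random.normal(size=100)

sb.jointplot(x=x, y=y, kind='kde');  # x and y passed as keyword arguments

The output shows contour lines of the estimated joint density, with the density curve of each variable along the margins.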

Hexbin plot:

mean, cov = [0, 1], [(1, .5), (.5, 1)]
x, y = np.random.multivariate_normal(mean, cov, 1000).T
with sb.axes_style("white"):
    sb.jointplot(x=x, y=y, kind="hex", color="k");

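The output is a hexbin plot: the plane is divided into hexagonal bins, and darker hexagons contain more observations. Histograms of x and y appear along the margins.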
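The mean and cov values in the hexbin snippet control where the cloud of points sits and how strongly x and y are tied together. The sketch below draws the same kind of sample and checks it against those inputs; the fixed seed is an assumption added for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

mean, cov = [0, 1], [(1, .5), (.5, 1)]
x, y = rng.multivariate_normal(mean, cov, 1000).T

# The sample mean should be close to the requested mean [0, 1]
sample_mean = np.array([x.mean(), y.mean()])

# The sample covariance should be close to the requested cov matrix
sample_cov = np.cov(x, y)

# Both variances are 1 here, so the correlation is close to the off-diagonal 0.5
corr = np.corrcoef(x, y)[0, 1]

print("sample mean:", sample_mean)
print("sample covariance:")
print(sample_cov)
print("correlation:", corr)
```

A positive correlation like this one is what stretches the hexbin cloud along a rising diagonal.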

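Visualizing Pairwise relationships:

iris = sb.load_dataset("iris")  # fetches the example dataset online on first use; returns a pandas DataFrame
sb.pairplot(iris, hue="species");

The output is a grid of plots: a scatterplot for every pair of numeric columns and, on the diagonal, the distribution of each column, all colored by species.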
