How to use numpy.percentile() in Python

In this article, we will see how to use numpy.percentile() in Python. Before discussing the method, let’s first see what a percentile is.

A percentile is a value that splits a dataset into two groups: the values that are less than or equal to it and the values that are greater. The Nth percentile is a value that is greater than or equal to N% of the values in the dataset. For example, the 20th percentile is the cutoff at or below which the smallest 20% of the values fall. With that definition in place, let’s look at the method itself.

If you haven’t installed numpy on your system yet, run the following command in your command prompt.

pip install numpy

Try importing the numpy module in your Python shell to check if the installation was successful.
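As a quick check, something like the following should run without an ImportError if the installation worked. The version string it prints depends on your installation, so no particular output is shown here.

```python
import numpy as np

# If this line runs, numpy was imported successfully.
# The printed version depends on what pip installed.
print(np.__version__)
```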

The numpy.percentile() method from the NumPy module

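The general form of the numpy.percentile() method is:

numpy.percentile(np_array, N, axis=None, out=None)

where,
np_array - the set of values that we are working on (this parameter is named a in NumPy’s own signature)
N - N as in Nth percentile, a number between 0 and 100 or a sequence of such numbers (named q in NumPy’s own signature)
axis - optional; used to calculate percentile values in multidimensional numpy arrays along a specific axis. If it is omitted, the percentile is computed over the flattened array
out - optional; an existing array in which the result should be placed
returns - the Nth percentile value, or an array of percentile values along an axis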
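To make the parameters above concrete, here is a minimal sketch that passes a single percentile, a list of percentiles, and an out array. The variable names (data, grid, result) are made up for illustration, and the default interpolation is assumed.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# A single percentile: the 50th percentile is the median
median = np.percentile(data, 50)

# N can also be a sequence; one value is returned per percentile
quartiles = np.percentile(data, [25, 50, 75])

# out: write the column-wise medians into an existing float array
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result = np.empty(3)
np.percentile(grid, 50, axis=0, out=result)

print(median)
print(quartiles)
print(result)
```

The out array has to match the shape of the result (three columns here, so three elements), which is why it is created with np.empty(3).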

Let’s now see some examples of the percentile method in action:

Example #1:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Gives the 90th percentile value
print(np.percentile(arr, 90))

The output for the above code is:

9.1

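Note that 9.1 is not itself a value in the array. By default, numpy.percentile() interpolates linearly between neighbouring values, here between 9 and 10. Nine of the ten values (90%) are smaller than 9.1. If you need an integer, you can round the result.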
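To see where 9.1 comes from, the calculation can be reproduced by hand. This is only a sketch of the default linear interpolation, assuming the percentile is below 100 so that an upper neighbour exists; the helper names (position, lower, fraction, manual) are made up for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = 90

sorted_arr = np.sort(arr)

# Fractional 0-based position of the percentile in the sorted array
position = (len(sorted_arr) - 1) * q / 100
lower = int(np.floor(position))      # index of the value just below
fraction = position - lower          # how far to move towards the next value

# Linear interpolation between the two neighbouring values
manual = sorted_arr[lower] + fraction * (sorted_arr[lower + 1] - sorted_arr[lower])

print(manual)
print(np.percentile(arr, q))
```

Both lines should print approximately the same number, since the position falls between the values 9 and 10.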

Example #2:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Gives the 90th percentile of each column (axis=0 works down the rows)
print(np.percentile(arr, 90, axis=0))

The output for the above code is:

[6.4 7.4 8.4]

The output can be interpreted as follows. Since we are using axis = 0, the percentile is computed column-wise. The first column holds 1, 4 and 7; its 90th percentile lies between 4 and 7, and linear interpolation gives 6.4. In the same way, 7.4 and 8.4 are the 90th percentiles of the second and third columns.
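For comparison, the sketch below runs the same grid with axis = 1 (row-wise) and with no axis at all, in which case the array is flattened first. The variable names are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=1: one percentile per row
per_row = np.percentile(arr, 90, axis=1)

# No axis: the grid is treated as one flat list of nine values
overall = np.percentile(arr, 90)

print(per_row)
print(overall)
```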

This function is useful when you need to split a dataset at a percentile. You can compute the Nth percentile value and then select all the values that fall at or below it.
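One way to do that split is with a boolean mask. This is a minimal sketch, assuming a one-dimensional array; the names threshold, within and above are made up for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# The 90th percentile acts as the cutoff value
threshold = np.percentile(arr, 90)

# Boolean masks pick the values on either side of the cutoff
within = arr[arr <= threshold]
above = arr[arr > threshold]

print(within)
print(above)
```

Here within holds the values 1 to 9 and above holds only 10, because the cutoff lies between 9 and 10.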

I hope you found this article helpful in understanding the usage of numpy.percentile() in Python.
