Math Operations for Data Analysis in Python

Data analysis is the process of extracting valuable information from data.

In Python, we have a number of tools to do that. We will first import the NumPy library, which has many built-in tools that make a lot of mathematical operations easy.

Math involved

To show the math functions involved, I have loaded a basic dataset. You can use any dataset you like, or get one from sklearn.datasets.

Load the dataset.

import numpy as np
data = np.genfromtxt("0000000000002419_training_ccpp_x_y_train (1).csv", delimiter=",")

As you can see, it's a simple dataset with just numerical values in array form.

array([[   8.58,   38.38, 1021.03,   84.37,  482.26],
       [  21.79,   58.2 , 1017.21,   66.74,  446.94],
       [  16.64,   48.92, 1011.55,   78.76,  452.56],
       ...,
       [  29.8 ,   69.34, 1009.36,   64.74,  437.65],
       [  16.37,   54.3 , 1017.94,   63.63,  459.97],
       [  30.11,   62.04, 1010.69,   47.96,  444.42]])
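Before doing any math, it can help to confirm that the file was parsed as expected. The sketch below is only an illustration: it assumes a small in-memory stand-in for the CSV (three of the rows shown above, wrapped in io.StringIO) rather than the real file, and the names csv_text and sample are made up for the example.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file: three of the rows shown above.
csv_text = """8.58,38.38,1021.03,84.37,482.26
21.79,58.2,1017.21,66.74,446.94
16.64,48.92,1011.55,78.76,452.56
"""

# genfromtxt accepts a file-like object as well as a file name.
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

print(sample.shape)             # (number of rows, number of columns)
print(sample.dtype)             # genfromtxt produces floats by default
print(np.isnan(sample).any())   # True would point to missing or unparseable fields
```

If the last check prints True, the file probably has a header row or empty fields, and the results of the functions below would come out as nan.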

SUM

To get the sum of the data

data.sum()

11588436.350000001
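Called without arguments, sum() adds up every value in the array, across all columns. A per-column or per-row total is often more useful, and the axis argument gives you that. The following is a minimal sketch on a two-row sample taken from the rows printed above, not on the full dataset.

```python
import numpy as np

# Two of the rows shown earlier, used as a small sample.
sample = np.array([
    [8.58, 38.38, 1021.03, 84.37, 482.26],
    [21.79, 58.20, 1017.21, 66.74, 446.94],
])

total = sample.sum()           # one number: every element added together
col_sums = sample.sum(axis=0)  # one total per column (collapses the rows)
row_sums = sample.sum(axis=1)  # one total per row (collapses the columns)

print(total)
print(col_sums)
print(row_sums)
```

The same axis argument works with max(), min(), mean() and std().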

MAX

To get the maximum value in the data

data.max()

1033.3

MIN

To get the minimum value in the data

data.min()

1.81
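max() and min() tell you the extreme values but not where they sit. If you also want the position, argmax() and argmin() return the index into the flattened array, and np.unravel_index converts that into a row and column. This is a small sketch on three of the rows shown earlier; the variable names are just for illustration.

```python
import numpy as np

# Three of the rows shown earlier, used as a small sample.
sample = np.array([
    [8.58, 38.38, 1021.03, 84.37, 482.26],
    [21.79, 58.20, 1017.21, 66.74, 446.94],
    [16.64, 48.92, 1011.55, 78.76, 452.56],
])

col_max = sample.max(axis=0)  # largest value in each column
col_min = sample.min(axis=0)  # smallest value in each column

# argmax() returns a position in the flattened array;
# unravel_index turns it back into (row, column).
flat_index = sample.argmax()
row, col = np.unravel_index(flat_index, sample.shape)

print(col_max)
print(col_min)
print(int(row), int(col))
```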

MEAN

To get the mean of the data

data.mean()

322.97760172798223
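One thing to watch for: np.genfromtxt fills missing fields with nan, and a single nan makes mean() return nan. The dataset above has no such gaps, but if yours does, np.nanmean skips them. The array below is a made-up example with one missing value, just to show the difference.

```python
import numpy as np

# Made-up sample with one missing value (nan) in the first column.
sample = np.array([
    [8.58, 38.38],
    [np.nan, 58.20],
    [16.64, 48.92],
])

plain_mean = sample.mean()               # nan, because one element is nan
safe_mean = np.nanmean(sample)           # mean of the five valid values
col_means = np.nanmean(sample, axis=0)   # per-column mean, ignoring nan

print(plain_mean)
print(safe_mean)
print(col_means)
```

NumPy has matching np.nansum, np.nanmax, np.nanmin and np.nanstd functions.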

STANDARD DEVIATION

To get the standard deviation of the data

data.std()

379.76319759971136
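By default, std() computes the population standard deviation, dividing by the number of values N. If you want the sample standard deviation (dividing by N - 1), pass ddof=1. The sketch below uses the last column of the six rows printed earlier as a small sample and also checks the default result against the formula written out by hand.

```python
import numpy as np

# Last column of the six rows shown earlier.
values = np.array([482.26, 446.94, 452.56, 437.65, 459.97, 444.42])

pop_std = values.std()           # population std: divides by N
sample_std = values.std(ddof=1)  # sample std: divides by N - 1

# The population formula by hand: square root of the mean squared deviation.
manual = np.sqrt(((values - values.mean()) ** 2).mean())

print(pop_std)
print(sample_std)
print(np.isclose(pop_std, manual))
```

For a large dataset the two results are almost identical; the difference matters mostly for small samples.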

These are only some of the functions NumPy provides; there are many more.
