Math Operations for Data Analysis in Python
Data analysis is the process of extracting valuable information from data.
In Python, we have a number of tools to do that. We will first import the NumPy library, which has many built-in tools that make a lot of mathematical operations easy.
Math involved
To show the math functions involved, I have loaded a basic dataset. You can use any dataset you like, or get one from sklearn.datasets.
Load the dataset:
import numpy as np
data = np.genfromtxt("0000000000002419_training_ccpp_x_y_train (1).csv", delimiter=",")
As you can see, it is a simple dataset that contains only numerical values, loaded as a two-dimensional array.
array([[   8.58,   38.38, 1021.03,   84.37,  482.26],
       [  21.79,   58.2 , 1017.21,   66.74,  446.94],
       [  16.64,   48.92, 1011.55,   78.76,  452.56],
       ...,
       [  29.8 ,   69.34, 1009.36,   64.74,  437.65],
       [  16.37,   54.3 , 1017.94,   63.63,  459.97],
       [  30.11,   62.04, 1010.69,   47.96,  444.42]])
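If you do not have the CSV file at hand, you can try the same loading step on a few rows held in memory. This is only a minimal sketch: it assumes the file has no header row, it uses the first three rows of the array printed above as sample data, and the name sample_csv is made up for this example.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file: the first three rows shown above.
sample_csv = io.StringIO(
    "8.58,38.38,1021.03,84.37,482.26\n"
    "21.79,58.2,1017.21,66.74,446.94\n"
    "16.64,48.92,1011.55,78.76,452.56\n"
)

# genfromtxt accepts a file-like object as well as a file name.
data = np.genfromtxt(sample_csv, delimiter=",")

print(data.shape)  # (number of rows, number of columns)
print(data.dtype)  # genfromtxt reads numbers as floats by default
```

Checking the shape right after loading is a quick way to confirm that the delimiter was understood and that every row has the same number of columns.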
SUM
To get the sum of all the values in the data:
data.sum()
11588436.350000001
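Called without arguments, sum() adds up every element of the array, whatever column it sits in. When each column holds a different quantity, a sum per column or per row may be more useful, and the axis parameter gives you that. A small sketch, using three rows from the array shown earlier rather than the full dataset:

```python
import numpy as np

# Sample rows taken from the array shown earlier (not the full dataset).
data = np.array([
    [8.58, 38.38, 1021.03, 84.37, 482.26],
    [21.79, 58.2, 1017.21, 66.74, 446.94],
    [16.64, 48.92, 1011.55, 78.76, 452.56],
])

total = data.sum()              # one number: every element added together
column_sums = data.sum(axis=0)  # one sum per column (5 values)
row_sums = data.sum(axis=1)     # one sum per row (3 values)

print(total)
print(column_sums)
print(row_sums)
```

The same axis argument works for the other functions in this post, such as max(), min(), mean() and std().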
MAX
To get the maximum value in the data:
data.max()
1033.3
MIN
To get the minimum value in the data:
data.min()
1.81
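max() and min() tell you the extreme values but not where they are. If you also want the position, argmax() and argmin() return the index in the flattened array, and np.unravel_index converts it back to a (row, column) pair. A minimal sketch on three sample rows from the array shown earlier (not the full dataset):

```python
import numpy as np

# Sample rows taken from the array shown earlier (not the full dataset).
data = np.array([
    [8.58, 38.38, 1021.03, 84.37, 482.26],
    [21.79, 58.2, 1017.21, 66.74, 446.94],
    [16.64, 48.92, 1011.55, 78.76, 452.56],
])

# Largest and smallest value in each column.
col_max = data.max(axis=0)
col_min = data.min(axis=0)

# (row, column) position of the overall maximum and minimum.
max_pos = np.unravel_index(data.argmax(), data.shape)
min_pos = np.unravel_index(data.argmin(), data.shape)

print(col_max, col_min)
print(max_pos, min_pos)
```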
MEAN
To get the mean (the average of all the values) of the data:
data.mean()
322.97760172798223
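The mean is simply the sum of the values divided by how many values there are, so it can be checked against sum() and the array's size attribute. The sketch below does that on three sample rows from the array shown earlier (not the full dataset) and also computes one mean per column:

```python
import numpy as np

# Sample rows taken from the array shown earlier (not the full dataset).
data = np.array([
    [8.58, 38.38, 1021.03, 84.37, 482.26],
    [21.79, 58.2, 1017.21, 66.74, 446.94],
    [16.64, 48.92, 1011.55, 78.76, 452.56],
])

# Mean by hand: total of all elements divided by the number of elements.
manual_mean = data.sum() / data.size

# The built-in method should agree (up to floating-point rounding).
print(np.isclose(manual_mean, data.mean()))

# One mean per column.
column_means = data.mean(axis=0)
print(column_means)
```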
STANDARD DEVIATION
To get the standard deviation of the data, which measures how far the values spread around the mean:
data.std()
379.76319759971136
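By default, std() computes the population standard deviation, that is, the square root of the average squared distance from the mean:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$

If you want the sample standard deviation, which divides by N - 1 instead, pass ddof=1. A small sketch on three sample rows from the array shown earlier (not the full dataset):

```python
import numpy as np

# Sample rows taken from the array shown earlier (not the full dataset).
data = np.array([
    [8.58, 38.38, 1021.03, 84.37, 482.26],
    [21.79, 58.2, 1017.21, 66.74, 446.94],
    [16.64, 48.92, 1011.55, 78.76, 452.56],
])

# Default: population standard deviation (divides by N, i.e. ddof=0).
population_std = data.std()

# The same value computed step by step from the formula.
manual_std = np.sqrt(((data - data.mean()) ** 2).mean())

# Sample standard deviation (divides by N - 1).
sample_std = data.std(ddof=1)

print(population_std, manual_std, sample_std)
```

On a large dataset the two versions are almost identical; the difference only matters when there are few values.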
These are only some of the functions available; NumPy provides many more.
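As a pointer to what else is available, here is a short sketch of three more NumPy functions in the same spirit: median, variance and percentile. The choice of functions is mine, not an exhaustive list, and the data is again three sample rows from the array shown earlier rather than the full dataset.

```python
import numpy as np

# Sample rows taken from the array shown earlier (not the full dataset).
data = np.array([
    [8.58, 38.38, 1021.03, 84.37, 482.26],
    [21.79, 58.2, 1017.21, 66.74, 446.94],
    [16.64, 48.92, 1011.55, 78.76, 452.56],
])

median = np.median(data)       # middle value once all elements are sorted
variance = data.var()          # square of the (population) standard deviation
p75 = np.percentile(data, 75)  # value below which 75% of the elements fall

print(median, variance, p75)
```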