Weather Prediction Using Machine Learning in Python
In this tutorial, we will learn how to predict the future temperature of a particular place using machine learning in Python.
- Machine learning is a branch of artificial intelligence in which a system learns from existing real-world data and improves its output as it sees more examples.
- Rather than following hand-written rules, the program looks for patterns in the data and uses them to make decisions about new inputs without human intervention.
PYTHON MODULES REQUIRED
The Python modules required for this project are:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
pandas: provides the DataFrame structure and functions for loading and manipulating a dataset.
numpy: an array-processing package that provides tools for working with numerical arrays.
train_test_split: a function from sklearn.model_selection that divides the data into training and testing sets.
RandomForestRegressor: a regression model that averages the predictions of many decision trees built at training time.
Procedure to develop the model for weather prediction
NOTE: The dataset used in this program is extracted from meteoblue.com
Dataset file used: temps2.csv
# Load the semicolon-separated dataset and take a first look at it
creation = pd.read_csv('temps2.csv', sep=';')
creation.head(5)
print(creation)
print("The shape of our features is:", creation.shape)
creation.describe()

# One-hot encode the categorical columns
creation = pd.get_dummies(creation)
creation.iloc[:, 5:].head(5)

# Separate the target (labels) from the features
labels = np.array(creation['Temperature'])
creation = creation.drop('Temperature', axis=1)
creation_list = list(creation.columns)
creation = np.array(creation)
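To make the preprocessing in the code above concrete, here is a minimal sketch of the same steps on a made-up three-row table. The 'Weekday' and 'Humidity' columns and all values are assumptions for illustration only; the actual columns of temps2.csv are not listed in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the 'Temperature' column name comes from the tutorial.
toy = pd.DataFrame({
    "Weekday": ["Mon", "Tue", "Mon"],
    "Humidity": [60, 72, 65],
    "Temperature": [21.5, 19.0, 22.3],
})

# One-hot encode: each distinct string in 'Weekday' becomes its own column.
encoded = pd.get_dummies(toy)

# Separate the target (what we predict) from the features (what we predict from).
labels = np.array(encoded["Temperature"])
features = encoded.drop("Temperature", axis=1)
feature_names = list(features.columns)

print(feature_names)
print(features.shape, labels.shape)
```

Numeric columns pass through get_dummies() unchanged, so only the text column is expanded.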
1. First, we read the ‘.csv’ file containing the dataset using the ‘read_csv()’ function. The sep=';' argument tells pandas that the columns are separated by semicolons.
2. To convert the categorical data to numerical data, we use the ‘get_dummies()’ function, which replaces each categorical column with one indicator column per category.
3. To inspect a subset of columns by position (here, every column from index 5 onward), we use the ‘iloc’ indexer.
4. The ‘Temperature’ column is the value we want to predict, so we store it separately as the labels, drop it from the features, and convert both to arrays using the ‘array()’ function of the numpy module.
5. Now it is time to divide our data into training and testing sets. We use the train_test_split() function to do so.
train_creation, test_creation, train_labels, test_labels = train_test_split(
    creation, labels, test_size=0.30, random_state=4)
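The syntax of the function is:
train_test_split(X, y, train_size=None, test_size=None, random_state=None)
X, y: the feature and label arrays to be split; both are split along the same rows.
train_size: sets the fraction (or number) of samples in the training set.
test_size: sets the fraction (or number) of samples in the testing set; here 0.30 keeps 30% of the rows for testing.
random_state: seeds the random shuffling applied before the split, so the same value reproduces the same split.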
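The effect of test_size and random_state can be checked on a small array. The following is a minimal sketch using made-up data (10 samples, 2 features), not the weather dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 30% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=4
)
print(X_train.shape, X_test.shape)

# Repeating the call with the same random_state yields the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.30, random_state=4)
print(np.array_equal(X_test, X_test_again))
```

Features and labels are shuffled together, so each row of X_test still lines up with its entry in y_test.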
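# Check the sizes of the training and testing sets
print('Training creation shape:', train_creation.shape)
print('Training labels shape:', train_labels.shape)
print('Testing creation shape:', test_creation.shape)
print('Testing label shape:', test_labels.shape)

# Train a random forest of 1000 trees and predict the held-out temperatures
rf = RandomForestRegressor(n_estimators=1000, random_state=4)
rf.fit(train_creation, train_labels)
predictions = rf.predict(test_creation)
print(predictions)

# Mean absolute error, in degrees
errors = abs(predictions - test_labels)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')

# Accuracy derived from the mean absolute percentage error (MAPE).
# Note: dividing by 3 scales the percentage error down; the conventional
# figure would be 100 - np.mean(mape).
mape = 100 * (errors / test_labels)
accuracy = 100 - np.mean(mape / 3)
print('Accuracy of the model:', round(accuracy, 2), '%')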
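6. Now, to perform regression on the data, we use RandomForestRegressor(). It predicts a continuous value, such as a temperature, by averaging the outputs of many decision trees.
The syntax of this function is:
RandomForestRegressor(n_estimators, random_state)
n_estimators: the number of decision trees in the forest.
random_state: seeds the randomness used while building the trees, so that results are reproducible.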
7. To train the model, we use the ‘fit()‘ function. It builds the decision trees from the training examples so that the model learns how the features relate to the temperature.
8. Now, to finally predict values using the model, we use the ‘predict()‘ function, which is a method of the trained RandomForestRegressor.
9. We print the predictions and also calculate and display the accuracy of our model.
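Steps 6 to 9 can be tried without the dataset. The sketch below is an illustration on synthetic data, not the tutorial's result: it assumes a temperature that follows a yearly sine curve plus noise, and uses a single hypothetical feature (day of the year).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the weather data; none of this comes from temps2.csv.
rng = np.random.default_rng(0)
day_of_year = rng.integers(1, 366, size=500)
seasonal = 15 + 10 * np.sin(2 * np.pi * (day_of_year - 100) / 365)
temperature = seasonal + rng.normal(0, 1, size=500)

# scikit-learn expects a 2-D feature array: one row per sample.
X = day_of_year.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, temperature, test_size=0.30, random_state=4
)

# Fewer trees than the tutorial's 1000 so the sketch runs quickly.
model = RandomForestRegressor(n_estimators=100, random_state=4)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mae = np.mean(np.abs(predictions - y_test))
print("Mean Absolute Error:", round(mae, 2), "degrees.")
```

Raising n_estimators generally makes the averaged prediction more stable, at the cost of longer training time.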
Output:
The shape of our features is: (9192, 9)
Training creation Shape: (6434, 8)
Training Labels Shape: (6434,)
Testing creation Shape: (2758, 8)
Testing Labels Shape: (2758,)
[11.54557 23.62543 19.97311 ... 21.09666 11.20721 20.98867]
Mean Absolute Error: 1.04 degrees.
Accuracy of the model : 94.13 %.
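To see how the two reported metrics are computed, here is a small worked sketch on four made-up predictions. The values are hypothetical, and the accuracy here uses the plain mean absolute percentage error (MAPE), without the division by 3 that appears in the tutorial's code.

```python
import numpy as np

# Hypothetical predicted and observed temperatures, in degrees.
predictions = np.array([22.0, 9.0, 25.0, 12.0])
actual = np.array([20.0, 10.0, 25.0, 10.0])

# Mean absolute error: the average size of the miss, in degrees.
errors = np.abs(predictions - actual)
mae = np.mean(errors)

# MAPE: the average miss as a percentage of the observed value.
mape = 100 * np.mean(errors / actual)
accuracy = 100 - mape

print("Mean Absolute Error:", round(mae, 2), "degrees.")
print("Accuracy:", round(accuracy, 2), "%")
```

Because MAPE divides by the observed value, it becomes unstable when temperatures are close to zero, so the mean absolute error is usually the safer figure to report for temperature data.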