Weather Prediction Using Machine Learning in Python

In this tutorial, we will learn how to predict the future temperature of a particular place using machine learning in Python.


  1. Machine learning is a branch of artificial intelligence in which a system learns from existing real-world datasets and improves its output as it sees more data.
  2. Instead of following hand-written rules, the program looks for patterns in the data and uses them to make future decisions without human intervention.


The Python modules required for this project are:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

pandas: provides the DataFrame structure and functions for loading and manipulating a dataset.

numpy: an array-processing package that provides tools for working with numeric arrays.

train_test_split: a function in sklearn.model_selection that divides the data into training and testing sets.

RandomForestRegressor: an ensemble model that performs regression by averaging the predictions of many decision trees built at training time. (Classification is handled by the separate RandomForestClassifier.)

Procedure to develop the model for weather prediction

NOTE: The dataset used in this program is extracted from

Dataset used: temps2.csv (a semicolon-separated CSV file).

creation=pd.read_csv('temps2.csv', sep=';')
print("The shape of our feature is:", creation.shape)

  1. First, we read the ‘.csv’ file containing the dataset using the ‘read_csv()’ function.
  2. To convert the categorical data to numerical data, we use the ‘get_dummies()’ function.
  3. To select the required column of the dataset for training and testing, we use the position-based ‘iloc[]’ indexer.
  4. To store the data to be processed as separate arrays, we use the ‘array()’ function of the numpy module.
  5. Next, we divide our data into training and testing sets.
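Steps 2 to 4 can be sketched as follows. This is a minimal sketch on a hypothetical four-row table, not the tutorial's actual preprocessing: the column names (week, temp_1, actual) and the choice of actual as the target are assumptions, since the layout of temps2.csv is not shown here. Passing dtype=float to get_dummies() is also an addition, made so that the final array is purely numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for temps2.csv; the real column names may differ.
creation = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "day": [1, 2, 1, 2],
    "week": ["Fri", "Sat", "Mon", "Tue"],  # categorical column
    "temp_1": [45.0, 44.0, 50.0, 52.0],    # previous day's temperature
    "actual": [44.0, 41.0, 52.0, 51.0],    # assumed target column
})

# Step 2: one-hot encode the categorical column(s).
creation = pd.get_dummies(creation, dtype=float)

# Step 3: pick the target column by position with iloc.
target_pos = creation.columns.get_loc("actual")
labels = np.array(creation.iloc[:, target_pos])

# Step 4: drop the target from the features and store them as an array.
creation = creation.drop(columns="actual")
feature_names = list(creation.columns)
creation = np.array(creation)

print("Features shape:", creation.shape)
print("Labels shape:", labels.shape)
```

After this step, creation holds only the input features and labels holds the values to predict, which is the form train_test_split() expects.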

We use the train_test_split() function to do this:

train_creation, test_creation, train_labels, test_labels = train_test_split(creation, labels, test_size=0.30, random_state=4)


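The general form of the call is (the size and seed arguments are passed by keyword):

train_test_split(X, y, train_size, test_size, random_state)

X, y: the feature and label arrays to be split.

train_size: sets the size of the training set, as a fraction of the data or an absolute number of samples.

test_size: sets the size of the testing set in the same way; 0.30 reserves 30% of the rows for testing.

random_state: seeds the shuffling applied before the split, so the same value reproduces the same split.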
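To see what test_size and random_state do in practice, here is a small sketch on made-up arrays (the data and variable names are illustrative only). It splits ten samples twice with the same seed and checks that both splits are identical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each, plus one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# 30% of the rows go to the test set; the rest are used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=4
)

# Repeating the call with the same random_state gives the same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.30, random_state=4
)
same_split = np.array_equal(X_test, X_test2) and np.array_equal(y_test, y_test2)

print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Same split with same seed:", same_split)
```

Changing random_state to another integer would generally produce a different, but again reproducible, split.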


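print('Training creation shape:', train_creation.shape)
print('Training labels shape:', train_labels.shape)
print('Testing creation shape:', test_creation.shape)
print('Testing label shape:', test_labels.shape)
# Build a forest of 1000 trees and train it on the training set
rf = RandomForestRegressor(n_estimators=1000, random_state=4)
rf.fit(train_creation, train_labels)
# Predict temperatures for the held-out test set
predictions = rf.predict(test_creation)
print(predictions)
# Mean absolute error, in degrees
errors = abs(predictions - test_labels)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')
# Accuracy = 100 - mean absolute percentage error (undefined if a test label is 0)
mape = 100 * (errors / test_labels)
accuracy = 100 - np.mean(mape)
print('Accuracy of the model:', round(accuracy, 2), '%')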

6. Now, to perform regression on the data, we use RandomForestRegressor().
The syntax of this function is:

RandomForestRegressor(n_estimators, random_state)

n_estimators: the number of decision trees in the forest.

random_state: seeds the randomness used when building the trees, so that results are reproducible.

7. To train the model, we use the ‘fit()’ function. It builds the trees from the training examples so that the model fits the given data points.

8. To predict values for unseen data, we use the ‘predict()’ function, which is a method of the trained scikit-learn model (not of pandas).

9. Finally, we print the predictions and calculate and display the accuracy of our model.
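The mechanics of steps 6 to 8 can be tried without the dataset. The sketch below is an illustration on synthetic data: the two features, the linear target and the smaller forest of 100 trees are all assumptions chosen to keep it fast, not values taken from temps2.csv.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: two made-up features and a target that depends on both.
rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(200, 2))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=4
)

# Step 6: create the model (100 trees here to keep the sketch quick).
rf = RandomForestRegressor(n_estimators=100, random_state=4)

# Step 7: train the model on the training set.
rf.fit(X_train, y_train)

# Step 8: predict the target for the unseen test set.
predictions = rf.predict(X_test)

mae = np.mean(np.abs(predictions - y_test))
print("Mean Absolute Error:", round(mae, 2))
```

On the real dataset the same three calls (constructor, fit() and predict()) apply unchanged to train_creation, train_labels and test_creation.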


Running the program on temps2.csv prints:

The shape of our features is: (9192, 9)
Training creation Shape: (6434, 8)
Training Labels Shape: (6434,)
Testing creation Shape: (2758, 8)
Testing Labels Shape: (2758,)
[11.54557 23.62543 19.97311 ... 21.09666 11.20721 20.98867]
Mean Absolute Error: 1.04 degrees.
Accuracy of the model : 94.13 %.
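The accuracy figure is 100 minus the mean absolute percentage error (MAPE), which divides each error by the true temperature. If any test label is exactly 0, that division produces inf and the accuracy becomes -inf. One possible workaround, shown here as a sketch and not as part of the original program, is to leave zero-valued labels out of the percentage calculation. The helper name mape_accuracy is hypothetical.

```python
import numpy as np

def mape_accuracy(predictions, labels):
    """Return 100 - MAPE, ignoring samples whose true value is 0."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    nonzero = labels != 0  # MAPE is undefined where the true value is 0
    errors = np.abs(predictions[nonzero] - labels[nonzero])
    mape = 100 * np.mean(errors / np.abs(labels[nonzero]))
    return 100 - mape

# Made-up values; the last true temperature is 0 and is skipped.
preds = np.array([10.0, 21.0, 0.5])
actual = np.array([10.0, 20.0, 0.0])
accuracy = mape_accuracy(preds, actual)
print("Accuracy of the model:", round(accuracy, 2), "%")
```

Skipping samples changes what the metric measures, so the mean absolute error in degrees remains the more reliable figure when temperatures close to 0 are common.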


5 responses to “Weather Prediction Using Machine Learning in Python”

  1. Aryan says:

    Really great work. nice and easy to understand. Loved it!


    The shape of our feature is: (9192, 10)
    Training creation shape: (6434, 3233)
    Training labels shape: (6434,)
    Testing creation shape: (2758, 3233)
    Testing label shape: (2758,)
    ValueError: Input contains NaN, infinity or a value too large for dtype(‘float32’).

  3. Wshnu says:

    It’s a great website for beginners to understand machine learning. But, I have one question.
    I tried to download the dataset from with format .csv, then when I want to run in the script, there’s an error in the name of the variable that couldn’t be read.
    Because when I look at the format of csv, it’s quite different.
    Does it have any effect?

  4. Em says:

    Thanks for this post! I ran into an error and I'm not very good at debugging, can anyone help?
    divide by zero encountered in true_divide
    mape=100* (errors/test_labels)
    Accuracy of the model: -inf %

  5. yaf says:

    this is the error I encountered after compiling this code:
    [9192 rows x 9 columns]
    The shape of our feature is: (9192, 9)
    Training creation shape: (6434, 8)
    Training labels shape: (6434,)
    Testing creation shape: (2758, 8)
    Testing label shape: (2758,)
    [21.29289 19.278 8.53319 … 20.69914 15.77222 4.19896]
    Mean Absolute Error: 1.05 degrees.
    Accuracy of the model: -inf %
    C:\Users\dell\ RuntimeWarning: divide by zero encountered in true_divide
    mape=100* (errors/test_labels)
    can you help?
