Predict Population Growth Using Machine Learning in Python


In this tutorial, we will learn how to predict population growth using Machine Learning in Python. We will implement Linear Regression, a popular and very basic Machine Learning algorithm.

How to predict population growth in Python with scikit-learn

To follow this tutorial, you need a basic understanding of Python. We will go through the concepts of Linear Regression and relate each step of the algorithm to the code we use to run it.

Why Linear Regression?

A country's population can take any value in a continuous range rather than one of a few discrete values, so predicting it is a regression problem, not a classification problem. Linear Regression is one of the most basic Machine Learning algorithms for predicting numerical data, so we use it to solve this problem.

Next are the steps we take to solve the problem.
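To make the idea concrete, here is a minimal sketch of what fitting a line by least squares means. The data is synthetic and made up for illustration (it is not the population dataset), and all variable names are hypothetical.

```python
import numpy as np

# Synthetic, made-up data: a noiseless line y = 2x + 1 (not the population dataset).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the model can learn an intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: pick the slope and intercept that minimize the squared error.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(slope, intercept)
```

Because the data lies exactly on a line, the fit should recover a slope close to 2 and an intercept close to 1. Real data is noisy, so the fitted line only approximates it.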

Importing Libraries

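Python has many third-party libraries that help us write short, clear code. We import the ones we need at the beginning of our program. The last line, %matplotlib inline, is a Jupyter magic command that displays plots inside the notebook; omit it when running a plain Python script.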

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 
%matplotlib inline

Exploratory Data Analysis (EDA)

The dataset we use here was collected from the internet and is freely available.

You can find the dataset at the link below:

population.csv

We first load the data into a pandas DataFrame.

We then encode the country codes in the LOCATION column as integers, and separate the features (X) from the target column (Value).

Lastly, we normalize the features with the normalize function from scikit-learn, which rescales each sample (row) to unit norm.

To know more about the normalize function, do give this a read: sklearn.preprocessing.normalize in Python

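# Load the dataset into a DataFrame
df = pd.read_csv('population.csv')
# Encode the country codes in LOCATION as integers
from sklearn.preprocessing import LabelEncoder
lc = LabelEncoder()
lc.fit(df['LOCATION'])
country_codes = lc.transform(df['LOCATION'])
df['Country'] = country_codes
df.drop(['LOCATION'], axis=1, inplace=True)
# Separate the features (X) from the target (y)
X = df.drop(['Value'], axis=1)
y = df['Value'].to_numpy()
# Rescale each row of X to unit length
from sklearn import preprocessing
normalized_X = preprocessing.normalize(X)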
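The following sketch shows what the encoding and normalization steps do on a tiny table. The rows are made up for illustration, and the TIME column is an assumption about the dataset's layout; only the LOCATION and Value column names come from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, normalize

# Hypothetical miniature of the dataset; the values are made up for illustration.
toy = pd.DataFrame({
    "LOCATION": ["AUS", "IND", "AUS", "USA"],
    "TIME": [2000, 2000, 2001, 2001],  # assumed column, not confirmed by the source
    "Value": [19.0, 1057.0, 19.3, 285.0],
})

encoder = LabelEncoder()
toy["Country"] = encoder.fit_transform(toy["LOCATION"])
# Classes are sorted alphabetically, so AUS -> 0, IND -> 1, USA -> 2.
print(list(encoder.classes_))

X_toy = toy.drop(columns=["LOCATION", "Value"])
X_norm = normalize(X_toy)  # default norm="l2", applied to each row

# Every row now has Euclidean length 1.
row_lengths = np.linalg.norm(X_norm, axis=1)
print(row_lengths)
```

Note that normalize works per row, not per column. If the goal is to put each feature on a comparable scale, scikit-learn's StandardScaler or MinMaxScaler are the usual per-column alternatives.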

Splitting dataset into training and test data

Next, we split the dataset into training and test data using scikit-learn. We hold out 30% of the rows for testing and fix random_state so that the split is reproducible.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(normalized_X, y, test_size=0.3, random_state=101)
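As a quick check of what the split produces, here is a small sketch on made-up arrays (not the population data) using the same test_size and random_state.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

print(X_tr.shape, X_te.shape)  # 7 training rows and 3 test rows
```

The rows are shuffled before splitting, and features and targets stay aligned: each row of X_tr still corresponds to the matching entry of y_tr.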

Creating the model and fitting the data

We create the model with the LinearRegression class from scikit-learn and fit it to our training data.

from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
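After fitting, the learned parameters can be inspected through the coef_ and intercept_ attributes. The sketch below uses synthetic data generated from a known formula, so it is only an illustration of the API, not a result from the population dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data generated from y = 3*x1 - 2*x2 + 5 (no noise).
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(50, 2))
y_syn = 3.0 * X_syn[:, 0] - 2.0 * X_syn[:, 1] + 5.0

model = LinearRegression()
model.fit(X_syn, y_syn)

print(model.coef_)                # one learned weight per feature
print(model.intercept_)           # learned constant term
print(model.score(X_syn, y_syn))  # R^2 on the data it was fitted to
```

With noiseless data the model should recover weights close to 3 and -2 and an intercept close to 5. On the population data, the same attributes of lm show how each feature contributes to the prediction.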

Predicting Results

Predicting results takes a single call: we pass the test features to the fitted model.

predictions = lm.predict(X_test)

Estimating Error

We will use the seaborn library to plot the following graph:

[Figure: predict population growth using machine learning in Python]

The figure shows a distribution plot comparing the actual values in the test data with the values our model predicted.

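To measure how far the predictions are from the actual values, we compute the Mean Absolute Error (MAE), the Mean Squared Error (MSE), and the Root Mean Squared Error (RMSE). Lower values mean a better fit.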

from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
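To see what these three numbers mean, here is a small sketch that computes them by hand with NumPy. The values are made up so that the arithmetic is easy to follow.

```python
import numpy as np

# Made-up values, chosen so the arithmetic is easy to check by hand.
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 40.0])

errors = actual - predicted     # [-2, 2, -3, 0]

mae = np.mean(np.abs(errors))   # (2 + 2 + 3 + 0) / 4 = 1.75
mse = np.mean(errors ** 2)      # (4 + 4 + 9 + 0) / 4 = 4.25
rmse = np.sqrt(mse)             # square root of 4.25, about 2.06

print(mae, mse, rmse)
```

MSE penalizes large errors more heavily than MAE because the errors are squared. RMSE brings the result back to the same units as the target, which makes it easier to interpret.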

In this way, we can predict the population growth using Machine Learning in Python.

Link to Jupyter Notebook: Population Growth

So download your own dataset and get coding. Hope this was helpful!

One response to “Predict Population Growth Using Machine Learning in Python”

Abhishek says:

May 7, 2020 at 5:29 pm

Why did you use Linear Regression? Why not nonlinear regression?
And how can the assumptions of linear regression be verified?
