Predict Population Growth Using Machine Learning in Python
In this tutorial, we will learn how to predict population growth using Machine Learning in Python. We will implement Linear Regression, a very popular yet very basic Machine Learning algorithm.
How to predict population growth in Python with scikit-learn
To follow this tutorial, you will need a basic understanding of Python code. We will go through the concepts of Linear Regression in depth and explain the algorithm alongside the code we use to run it.
Why Linear Regression?
Because the population of a country can take any value in a continuous range rather than a few discrete values, this is a regression problem and not a classification problem. Linear Regression is one of the most basic Machine Learning algorithms, and it lets us predict numerical data. Hence we use Linear Regression to solve this problem.
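As a rough illustration of what Linear Regression does, the sketch below fits a straight line to a handful of made-up (year, population) pairs using the closed-form least-squares formulas. The numbers and the helper name `fit_line` are hypothetical and are not taken from the dataset used in this tutorial.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: a population (in millions) that grows by exactly 2 per year.
years = [2000, 2001, 2002, 2003]
population = [100.0, 102.0, 104.0, 106.0]

slope, intercept = fit_line(years, population)
predicted_2005 = slope * 2005 + intercept
print(slope, predicted_2005)
```

Because the toy data lies exactly on a line, the fitted slope recovers the yearly growth of 2 and the line can be extended to a later year. Real population data will not be this tidy.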
Next are the steps we take to solve the problem.
Importing Libraries
There are many libraries available for Python that help us write concise, readable code. We import the ones we need at the beginning of our program.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Exploratory Data Analysis (EDA)
The dataset we use here has been collected from the internet. It is freely available.
Please find the data set in the link below:
We first load the data into a pandas data frame and then separate it into the features and the target column, Value, so that we can run our model on it.
We encode the country codes as numerical values.
Lastly, we normalize the features using the normalize function from scikit-learn, which by default rescales each row to unit norm, before predicting the growth rate.
To know more about the normalize function, do give this a read: sklearn.preprocessing.normalize in Python
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder

# Load the data into a pandas data frame
df = pd.read_csv('population.csv')

# Encode the country codes in LOCATION as integers
lc = LabelEncoder()
lc.fit(df['LOCATION'])
df['Country'] = lc.transform(df['LOCATION'])
df.drop(['LOCATION'], axis=1, inplace=True)

# Separate the features (X) from the target (y)
X = df.drop(['Value'], axis=1)
y = df['Value'].to_numpy()

# Rescale each row of X to unit norm
normalized_X = preprocessing.normalize(X)
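To make the two preprocessing steps more concrete, here is a small sketch on made-up values. The country codes and the numbers are hypothetical stand-ins and are not rows from population.csv.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, normalize

# Hypothetical country codes, standing in for the LOCATION column.
locations = ["FRA", "AUS", "FRA", "USA"]

encoder = LabelEncoder()
codes = encoder.fit_transform(locations)
# Classes are sorted, so AUS -> 0, FRA -> 1, USA -> 2.

# normalize() rescales each ROW to unit L2 norm by default.
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])
X_unit = normalize(X)

print(codes)
print(X_unit)
```

Note that both toy rows become identical after normalization, because they point in the same direction. This is different from per-column scaling such as `StandardScaler`, so it is worth checking which of the two suits your data.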
Splitting dataset into training and test data
Next, we split the dataset into training and test data using the sklearn library.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    normalized_X, y, test_size=0.3, random_state=101
)
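The effect of `test_size=0.3` is easiest to see on a tiny example. The sketch below uses a made-up array of ten samples, not the tutorial's dataset, and simply checks how many rows end up on each side of the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each, plus a matching target.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# 30% of the rows go to the test set, the remaining 70% to the training set.
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the test set on every run.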
Creating the model and fitting the data
We create the model using the LinearRegression class from scikit-learn and fit it to our training data.
from sklearn.linear_model import LinearRegression

lm = LinearRegression()
lm.fit(X_train, y_train)
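After fitting, the learned line is stored in the model's `coef_` and `intercept_` attributes. The following sketch, on made-up data generated from a known formula, shows that `LinearRegression` recovers the coefficients. The data here is an assumption for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated from y = 3*x1 + 2*x2 + 5, with no noise.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = 3 * X[:, 0] + 2 * X[:, 1] + 5

model = LinearRegression()
model.fit(X, y)

# One coefficient per feature, plus a single intercept term.
print(model.coef_, model.intercept_)
```

On real data the fit will not be exact, but inspecting these attributes is still a quick way to see how each feature influences the prediction.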
Predicting Results
Predicting results is very simple: we call the model's predict method on the test features.
predictions = lm.predict(X_test)
Estimating Error
We will use the seaborn library to plot the following graph:
In the image, we see the dist plot comparing the given values in the test data with the values our model predicted.
Now, to measure how accurate our model is, we will compute the Mean Absolute Error (MAE), the Mean Squared Error (MSE), and the Root Mean Squared Error (RMSE).
from sklearn import metrics

print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
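To see what these three numbers mean, here is a small worked sketch that computes them by hand on made-up values. MAE averages the absolute errors, MSE averages the squared errors, and RMSE is the square root of the MSE, which brings it back to the units of the target.

```python
import math

# Made-up actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 9.0, 9.0]

errors = [t - p for t, p in zip(y_true, y_pred)]   # [1, 0, -2, 0]

mae = sum(abs(e) for e in errors) / len(errors)    # (1 + 0 + 2 + 0) / 4 = 0.75
mse = sum(e ** 2 for e in errors) / len(errors)    # (1 + 0 + 4 + 0) / 4 = 1.25
rmse = math.sqrt(mse)

print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)
```

Because MSE squares each error, the single error of 2 contributes four times as much as the error of 1, so MSE and RMSE penalize large misses more heavily than MAE does.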
In this way, we can predict the population growth using Machine Learning in Python.
Link to Jupyter Notebook: Population Growth
So download your own dataset and get coding. Hope this was helpful!
Why have you used Linear Regression? Why not nonlinear Regression?
And how can the assumptions of Linear Regression be checked?