Multivariate Adaptive Regression Splines in Python

In this tutorial, we will learn about MARS (Multivariate Adaptive Regression Splines) in Python.

What is MARS?

MARS is a flexible regression technique for modeling complex relationships in high-dimensional data with many input variables. The core idea of MARS is to split the input space into smaller regions at points called "knots" and to fit simple piecewise-linear functions (splines) within each region.

In essence, MARS breaks a complex problem into smaller pieces, models each piece separately, and then combines the pieces into a final model that captures the relationship between the input and output variables. This makes MARS an effective, versatile method for modeling complex data in fields such as environmental science, engineering, and finance, where it helps uncover nonlinear relationships and produce accurate predictions.
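Each region boundary in a MARS model is a knot, and the pieces are built from simple hinge (basis) functions of the form max(0, x - knot) and max(0, knot - x). As a rough illustration (this snippet is not part of the original code; the function name and knot value are just examples):

import numpy

def hinge(x, knot):
    # MARS-style hinge basis function: zero on one side of the knot,
    # linear on the other side.
    return numpy.maximum(0.0, x - knot)

x = numpy.linspace(-10, 10, 5)
print(hinge(x, 4.0))  # 0 for values below the knot at 4.0, x - 4.0 above it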

 

Python Code: Multivariate Adaptive Regression Splines

 

!pip install earth
!pip install statsmodels
!pip install scikit-learn-extra
  1. !pip install earth: This command installs the earth package using pip. The earth package provides functionality for fitting and predicting with Multivariate Adaptive Regression Splines (MARS) models.
  2. !pip install statsmodels: This installs the statsmodels package, which gives you access to its classes and functions for estimating statistical models and examining data sets.
  3. !pip install scikit-learn-extra: This installs the scikit-learn-extra package, which adds extra functionality on top of the scikit-learn machine learning library.

 

import earth
import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot
import numpy
  1. import earth: This imports the earth module, which contains the functions and classes used for MARS models.
  2. import statsmodels.api as sm: statsmodels is a powerful Python library for statistical analysis and modeling; here it is imported under the alias sm.
  3. from statsmodels.regression.linear_model import OLS: This imports the Ordinary Least Squares (OLS) class, which estimates the unknown parameters of a linear regression model.
  4. from sklearn.linear_model import LinearRegression: This imports scikit-learn's linear regression estimator, which predicts an unknown value from known input values.
  5. from matplotlib import pyplot: This imports the pyplot module from the matplotlib package for creating plots and visualizations in Python.
  6. import numpy: NumPy (Numerical Python) is a Python package for working with arrays; it also provides functions for matrices, linear algebra, and a large collection of mathematical operations on those arrays.
numpy.random.seed(0)                     # make the random draws reproducible
m = 1000000                              # number of samples
n = 10                                   # number of features
X = 90*numpy.random.uniform(size=(m,n)) - 50                  # features uniformly spread over -50 to 40
y = numpy.abs(X[:,6] - 4.0) + 1*numpy.random.normal(size=m)   # V-shaped target plus Gaussian noise

Here we create a synthetic dataset to work with: one million samples, each with ten features.

  1. numpy.random.seed(0): This sets the seed for NumPy's random number generator so the results are reproducible; the seed is the value that initializes the generator.
  2. m = 1000000 and n = 10: These lines define the variables m and n, representing the number of samples and the number of features, respectively.
  3. X = 90*numpy.random.uniform(size=(m,n)) - 50: This generates an array of random numbers; the size parameter specifies the shape of the array, so it has m rows and n columns. numpy.random.uniform draws values between 0 and 1, multiplying by 90 stretches them to the range 0 to 90, and subtracting 50 shifts them to a final range of -50 to 40. This step matters because it:
    • generates the feature data, and
    • fixes the range of values the model will see.
  4. y = numpy.abs(X[:,6] - 4.0) + 1*numpy.random.normal(size=m): numpy.abs takes the absolute value of the seventh column shifted by 4, so the target depends on that column in a V-shaped, non-negative way, and numpy.random.normal adds Gaussian noise. This step matters because it:
    • introduces realistic noise,
    • creates the target variable y for the regression problem, and
    • gives us data on which to train and evaluate the model (see the quick check after this list).
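As a quick, optional sanity check (a hedged sketch reusing the arrays and imports defined above), you can confirm the -50 to 40 feature range and visualize the V-shaped relationship between the seventh column and y:

print(X.shape, y.shape)   # (1000000, 10) (1000000,)
print(X.min(), X.max())   # approximately -50 and 40
# Plot only a small random subsample so the scatter plot renders quickly
idx = numpy.random.choice(m, 2000, replace=False)
pyplot.scatter(X[idx, 6], y[idx], s=2)
pyplot.xlabel("X[:, 6]")
pyplot.ylabel("y")
pyplot.show()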

 

Now for the final step, fit an ordinary least squares model on the data and print its summary:

print(OLS(y, X).fit().summary())
  1. OLS(y, X): This constructs an instance of the statsmodels OLS regression model, with y as the target variable and X as the feature matrix.
  2. .fit(): This fits the OLS model to the given data, and .summary() produces a text summary of the fitted model.
  3. print(…): This prints the summary.
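Note that OLS(y, X) as written fits a model without an intercept. If you also want an intercept term, statsmodels can prepend a column of ones with sm.add_constant; a minimal sketch of that variant:

# Optional variant: add an intercept column before fitting
X_const = sm.add_constant(X)
print(OLS(y, X_const).fit().summary())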

 

Output

[Screenshot: the OLS regression results summary printed by statsmodels.]
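Since the topic of this article is MARS, it is also worth sketching how a MARS model could be fit on the same data. The snippet below is a hedged sketch that assumes the py-earth implementation (installed as sklearn-contrib-py-earth and imported from pyearth) rather than the earth package used above; if your earth module exposes a different API, adapt the calls accordingly.

# Hypothetical sketch using py-earth's Earth class (an assumption, not the
# package installed above). Fit on a subsample to keep the run time modest.
from pyearth import Earth

idx = numpy.random.choice(m, 10000, replace=False)
mars = Earth(max_degree=1)        # piecewise-linear basis functions only
mars.fit(X[idx], y[idx])
print(mars.summary())             # lists the selected hinge functions and knots
y_hat = mars.predict(X[idx])

The summary should show hinge terms built on X[:,6] with a knot near 4, which is exactly the piecewise-linear structure MARS is designed to recover from this data.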
