Principal components regression in ML – Python

Learn how to perform principal components regression in machine learning using Python.

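Principal components regression (PCR) combines principal component analysis (PCA) with linear regression. PCA first projects the features onto a lower-dimensional space spanned by a few principal components, which reduces the number of variables. A linear regression model is then fitted on those components instead of on the original features. This is especially useful when the features are strongly correlated with each other (multicollinearity), because the principal components are uncorrelated.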
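PCR runs in two stages: PCA on the features, then linear regression on the resulting component scores. The minimal NumPy-only sketch below makes those stages explicit. `pcr_fit` and `pcr_predict` are hypothetical helper names, not part of any library, and the synthetic data uses NumPy's `default_rng` rather than the seeding used later in this article. The sketch centres X, takes the top k right singular vectors as principal directions, regresses y on the component scores, and maps the result back to coefficients on the original features.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Hypothetical helper: fit PCR with k components.

    Returns (coef, intercept) expressed on the ORIGINAL features.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                      # PCA works on centred features
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                       # (n_features, k) projection matrix
    Z = Xc @ V_k                         # component scores, (n_samples, k)
    # Least squares of centred y on the centred scores; the intercept is mean(y)
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    coef = V_k @ gamma                   # map back to the original feature space
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def pcr_predict(X, coef, intercept):
    """Hypothetical helper: predict with coefficients returned by pcr_fit."""
    return X @ coef + intercept

rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=100)

coef5, b5 = pcr_fit(X, y, k=5)      # keeps 5 of the 10 components
coef10, b10 = pcr_fit(X, y, k=10)   # keeps every component

# Ordinary least squares with an explicit intercept column, for comparison
A = np.column_stack([np.ones(len(X)), X])
ols, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(coef10, ols[1:]), np.allclose(b10, ols[0]))
```

When k equals the number of features, the projection loses nothing and PCR reduces to ordinary least squares, which is what the final comparison checks.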

Here is the code for it.

First, we import the libraries we need:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

Next, we write a class for PCR:

  • The __init__ method creates the PCA and LinearRegression objects.
  • The fit method performs PCA on the input features and then fits a linear regression model on the transformed features.
  • The predict method transforms new input data using the fitted PCA and then uses the linear regression model to make predictions.
class PrincipalComponentRegression:
    def __init__(self, n_components):
        # Number of principal components to keep
        self.n_components = n_components
        self.pca = PCA(n_components=n_components)
        self.regressor = LinearRegression()

    def fit(self, X, y):
        # Perform PCA: learn the components on X and project X onto them
        X_pca = self.pca.fit_transform(X)

        # Fit linear regression on the component scores
        self.regressor.fit(X_pca, y)
        return self

    def predict(self, X):
        # Transform the input features using the already-fitted PCA
        X_pca = self.pca.transform(X)

        # Make predictions
        return self.regressor.predict(X_pca)
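The import list at the top also brings in StandardScaler and make_pipeline, which the class never uses. As a sketch, the same PCA-then-regression model can be built as a scikit-learn pipeline. The StandardScaler variant is an addition that is not in the original class: PCA is driven by variance, so scaling matters when features are measured on different scales. Without the scaler, the pipeline should produce the same predictions as PrincipalComponentRegression.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1

# PCA followed by linear regression, equivalent to the class above
pcr_pipe = make_pipeline(PCA(n_components=5), LinearRegression())
# Variant that standardises each feature before PCA (an assumption, not in the original)
scaled_pcr_pipe = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())

pcr_pipe.fit(X, y)
scaled_pcr_pipe.fit(X, y)

# make_pipeline names each step after its class in lowercase, e.g. "pca"
print(pcr_pipe.named_steps["pca"].explained_variance_ratio_)
print(pcr_pipe.predict(X[:3]))
```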

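Now we generate some random sample data: 100 samples with 10 features drawn uniformly from [0, 1), and a target y that depends only on the first three features plus a small amount of Gaussian noise.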

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1

Split the data into training and test sets (80/20), then create and fit a PCR model that keeps 5 principal components.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit the PCR model
pcr = PrincipalComponentRegression(n_components=5)
pcr.fit(X_train, y_train)
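Because the regression is fitted on component scores, its coef_ values refer to principal components rather than to the ten original features. A minimal sketch of mapping them back, assuming the fitted objects are a scikit-learn PCA and LinearRegression as inside the class: multiply the transposed components_ matrix by the regression coefficients, then fold the PCA mean_ into the intercept. The names beta and beta0 are illustrative, and the block rebuilds the same data and split so it runs on its own.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The same two stages the PCR class runs internally
pca = PCA(n_components=5)
regressor = LinearRegression().fit(pca.fit_transform(X_train), y_train)

# components_ has shape (5, 10); its transpose maps component weights to feature weights
beta = pca.components_.T @ regressor.coef_
# PCA subtracts mean_ before projecting, so fold that shift into the intercept
beta0 = regressor.intercept_ - pca.mean_ @ beta

for i, b in enumerate(beta):
    print(f"x{i}: {b:+.3f}")
```

Here beta is constrained to lie in the span of the five retained components, so in general it cannot reproduce the true weights (2, 3, -1, 0, ...) used to generate y.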

Now we make predictions on the test set and evaluate the model with the mean squared error and the R-squared score.

# Make predictions
y_pred = pcr.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")
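A useful sanity check is a baseline fitted with ordinary least squares on all ten original features, without PCA. Since y in this example is a linear function of the first three columns plus small noise, plain linear regression should fit it closely. Comparing its test MSE and R-squared with the PCR numbers shows how much the 5-component projection costs on this data. The block rebuilds the same data and split so it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: ordinary least squares on all ten original features (no PCA)
ols = LinearRegression().fit(X_train, y_train)
y_pred_ols = ols.predict(X_test)

mse_ols = mean_squared_error(y_test, y_pred_ols)
r2_ols = r2_score(y_test, y_pred_ols)
print(f"OLS Mean Squared Error: {mse_ols}")
print(f"OLS R-squared Score: {r2_ols}")
print("OLS coefficients:", np.round(ols.coef_, 2))
```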

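Finally, we look at the explained variance ratio: the fraction of the total variance in the training features captured by each of the 5 principal components.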

# Fraction of the variance in X_train captured by each retained component
explained_variance_ratio = pcr.pca.explained_variance_ratio_

print("\nExplained Variance Ratio:")
for i, ratio in enumerate(explained_variance_ratio):
    print(f"PC{i+1}: {ratio:.4f}")

Output:

Mean Squared Error: 0.9626445664900471
R-squared Score: 0.1734447632526135

Explained Variance Ratio:
PC1: 0.1787
PC2: 0.1435
PC3: 0.1202
PC4: 0.1061
PC5: 0.0894
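The low R-squared score is a known limitation of PCR rather than a coding error: PCA picks its components from the variance of X alone, without looking at y. With ten independent, equally scaled features, the variance is spread fairly evenly across components (the ratios above range from about 0.09 to 0.18), so the five highest-variance components need not align with the three features that actually drive y. A minimal sketch comparing two ways of choosing n_components: a variance rule (the 90% threshold is an arbitrary assumption) and 5-fold cross-validation of a PCA + LinearRegression pipeline with GridSearchCV (the fold count and the search range 1 to 10 are also assumptions).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Option 1: variance rule. Smallest k whose cumulative explained variance exceeds 90%
cumulative = np.cumsum(PCA().fit(X_train).explained_variance_ratio_)
k_variance = int(np.argmax(cumulative > 0.90)) + 1

# Option 2: pick k by 5-fold cross-validated MSE of the whole PCA + regression pipeline
pipe = make_pipeline(PCA(), LinearRegression())
param_grid = {"pca__n_components": list(range(1, 11))}  # make_pipeline names the step "pca"
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)
k_cv = search.best_params_["pca__n_components"]

print(f"k from 90% variance rule: {k_variance}")
print(f"k from cross-validation: {k_cv}")
# GridSearchCV refits the best pipeline on all of X_train, so it can predict directly
print(f"Test R-squared with k_cv: {r2_score(y_test, search.predict(X_test))}")
```

Cross-validation optimizes prediction error directly, so it can select a very different k than a variance threshold. If it keeps all 10 components, PCR becomes equivalent to ordinary least squares on the original features.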
