Principal components regression in ML – Python
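Learn how to perform principal components regression in machine learning using Python.
Principal components regression (PCR) combines principal component analysis (PCA) with linear regression. PCA first projects the input features onto a smaller number of principal components, the orthogonal directions along which the data varies most. A linear regression model is then fitted on those components instead of on the original variables. This reduces the number of predictors, and it is most useful when the original variables are strongly correlated with one another (multicollinearity), because a few components can then summarize many correlated features.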
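To make the two steps concrete, here is a minimal NumPy-only sketch of the same idea, written without scikit-learn. It computes PCA through a singular value decomposition (SVD) of the centred data and then runs ordinary least squares on the top-k component scores. The helper names pcr_fit and pcr_predict are hypothetical and are not part of any library.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Minimal PCR sketch: PCA via SVD, then least squares on the top-k scores."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # PCA works on centred data
    # Rows of Vt are the principal directions, ordered by singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                       # shape (n_features, k)
    scores = Xc @ V_k                    # projected data, shape (n_samples, k)
    # Ordinary least squares on the k scores, with an intercept column
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x_mean, V_k, coef


def pcr_predict(model, X):
    x_mean, V_k, coef = model
    scores = (X - x_mean) @ V_k          # project new data onto the same directions
    return coef[0] + scores @ coef[1:]


rng = np.random.default_rng(0)
X = rng.random((50, 6))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0, 3.0])
model = pcr_fit(X, y, k=6)
# With k equal to the number of features, PCR reduces to ordinary least squares
print(np.allclose(pcr_predict(model, X), y))  # True
```

Choosing k smaller than the number of features is what makes this a dimensionality-reducing regression; k equal to the number of features is only used here as a sanity check.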
Here is the code.
First, we import the libraries we need:
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
```
Next, we write a class for PCR:
- The __init__ method creates the PCA object (with the requested number of components) and the LinearRegression model.
- The fit method performs PCA on the input features and then fits a linear regression model on the transformed features.
- The predict method transforms new input data using PCA and then uses the linear regression model to make predictions.
```python
class PrincipalComponentRegression:
    def __init__(self, n_components):
        self.n_components = n_components
        self.pca = PCA(n_components=n_components)
        self.regressor = LinearRegression()

    def fit(self, X, y):
        # Perform PCA: learn the components and project X onto them
        X_pca = self.pca.fit_transform(X)
        # Fit linear regression on the projected features
        self.regressor.fit(X_pca, y)
        # Return self so calls can be chained, as in scikit-learn
        return self

    def predict(self, X):
        # Transform the input features using the already-fitted PCA
        X_pca = self.pca.transform(X)
        # Make predictions
        return self.regressor.predict(X_pca)
```
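The imports include StandardScaler and make_pipeline, but the PrincipalComponentRegression class never uses them. One way to use them is to express the same model as a scikit-learn pipeline. This sketch adds a standardization step, which is an assumption on our part rather than something the class does: PCA is sensitive to the scale of each feature, so scaling first is common practice, although with features that all lie in [0, 1) as in this example the effect is small.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# PCR as a pipeline: scale -> project onto 5 components -> linear regression.
# StandardScaler is an addition; the class above does not scale its inputs.
pcr_pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    LinearRegression(),
)

# Same kind of sample data as in the article
np.random.seed(42)
X_demo = np.random.rand(100, 10)
y_demo = 2 * X_demo[:, 0] + 3 * X_demo[:, 1] - X_demo[:, 2] + np.random.randn(100) * 0.1

pcr_pipe.fit(X_demo, y_demo)
print(pcr_pipe.predict(X_demo[:3]))
```

A pipeline also guarantees that the scaler and PCA are fitted only on whatever data is passed to fit, which matters once cross-validation is involved.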
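Now we generate some random sample data: 100 samples with 10 features drawn uniformly from [0, 1). The target y depends only on the first three features, plus a small amount of Gaussian noise.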
```python
np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1
```
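PCR pays off mainly when the predictors are correlated. Because np.random.rand draws each column independently, the sample data here should show only small, chance correlations between features. A quick way to check this is to look at the largest off-diagonal entry of the correlation matrix:

```python
import numpy as np

np.random.seed(42)
X = np.random.rand(100, 10)

# Pairwise Pearson correlations; rowvar=False treats columns as variables
corr = np.corrcoef(X, rowvar=False)
# Drop the diagonal (each feature's correlation with itself is 1)
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print(f"Largest absolute correlation between features: {np.abs(off_diag).max():.3f}")
```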
Split the data into training and test sets, then create and fit a PCR model with 5 components:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit the PCR model
pcr = PrincipalComponentRegression(n_components=5)
pcr.fit(X_train, y_train)
```
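A fitted PCR model is still a linear model in the original features, so its coefficients can be mapped back and compared with the true weights (2, 3, -1, and 0 for the other seven features). This sketch relies on the fact that, with whiten=False (the default), scikit-learn's PCA.transform computes (X - mean_) @ components_.T. The helper name pcr_coefficients is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression


def pcr_coefficients(pca, regressor):
    """Rewrite a PCR fit as y = X @ beta + intercept in the original feature space."""
    # Scores are (X - mean_) @ components_.T, so the weights map back through components_
    beta = pca.components_.T @ regressor.coef_
    # Absorb the centring step into the intercept
    intercept = regressor.intercept_ - pca.mean_ @ beta
    return beta, intercept


np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1

pca = PCA(n_components=5)
reg = LinearRegression().fit(pca.fit_transform(X), y)
beta, b0 = pcr_coefficients(pca, reg)
print(np.round(beta, 2))  # compare with the true weights [2, 3, -1, 0, ..., 0]
```

With the class from this article, the same call would be pcr_coefficients(pcr.pca, pcr.regressor).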
Now we make predictions on the test set and evaluate the model:
```python
# Make predictions
y_pred = pcr.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")
```
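The test-set scores are easier to interpret next to a baseline. The sketch below, which is our own comparison rather than part of the original walkthrough, fits ordinary least squares on all ten features using the same data and split. Here make_pipeline(PCA(n_components=5), LinearRegression()) is used as an equivalent of the PrincipalComponentRegression class, since that class also does PCA followed by linear regression without scaling. Because y is an exact linear function of the raw features plus small noise, OLS should be expected to score well on this particular data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: ordinary least squares on all ten original features
ols = LinearRegression().fit(X_train, y_train)
# PCR with five components (PCA then linear regression, no scaling)
pcr5 = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X_train, y_train)

r2_ols = r2_score(y_test, ols.predict(X_test))
r2_pcr = r2_score(y_test, pcr5.predict(X_test))
print(f"OLS R^2: {r2_ols:.3f}")
print(f"PCR R^2: {r2_pcr:.3f}")
```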
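Finally, we print the explained variance ratio of each principal component, that is, the fraction of the total variance in the training features that each component captures: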
```python
# Calculate the explained variance ratio of each component
explained_variance_ratio = pcr.pca.explained_variance_ratio_
print("\nExplained Variance Ratio:")
for i, ratio in enumerate(explained_variance_ratio):
    print(f"PC{i+1}: {ratio:.4f}")
```
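The individual ratios become more useful as a running total, which shows how many components are needed to retain a given share of the variance. This sketch fits PCA with all components on the training features and picks the smallest number of components reaching 90%; the 0.90 threshold is an arbitrary example, not a recommended value.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Keep every component to see the full variance spectrum
pca_full = PCA().fit(X_train)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
for i, total in enumerate(cumulative, start=1):
    print(f"First {i} PCs: {total:.4f}")

# Smallest number of components whose cumulative ratio reaches the threshold
k_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print("Components for 90% of variance:", k_90)
```

scikit-learn can also make this choice itself when n_components is a float between 0 and 1 (for example PCA(n_components=0.90, svd_solver="full")). Note that this criterion only looks at the variance of X, not at how well the components predict y.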
Output:
```text
Mean Squared Error: 0.9626445664900471
R-squared Score: 0.1734447632526135

Explained Variance Ratio:
PC1: 0.1787
PC2: 0.1435
PC3: 0.1202
PC4: 0.1061
PC5: 0.0894
```
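The R-squared score of about 0.17 is low, and the explained variance ratios suggest why. PCA chooses components only by how much variance in X they capture, ignoring y. The ten features here are independent with similar variance, so the ratios are fairly even and the first five components together explain only about 64% (0.1787 + 0.1435 + 0.1202 + 0.1061 + 0.0894 = 0.6379) of the variance; nothing forces them to align with the three features that actually determine y. A common way to choose n_components for prediction is cross-validation. The sketch below uses GridSearchCV over a Pipeline; the step names "pca" and "reg" are our own choices, and the search is run on the training set only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

np.random.seed(42)
X = np.random.rand(100, 10)
y = 2 * X[:, 0] + 3 * X[:, 1] - X[:, 2] + np.random.randn(100) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([("pca", PCA()), ("reg", LinearRegression())])
# Try every possible number of components, scored by 5-fold cross-validated R^2
param_grid = {"pca__n_components": list(range(1, 11))}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

print("Best n_components:", search.best_params_["pca__n_components"])
print(f"Best CV R^2: {search.best_score_:.3f}")
print(f"Test R^2: {search.score(X_test, y_test):.3f}")
```

On uncorrelated data like this, cross-validation will tend to favour keeping most or all components, which is effectively ordinary least squares; PCR's advantage shows up when many features share a few underlying directions.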