
Types of Regression in Machine Learning

Regression is a fundamental technique in machine learning and statistics for modeling the relationship between a dependent variable (the target) and one or more independent variables (the predictors). Most regression models predict continuous outcomes; a few, such as logistic and softmax regression, predict class probabilities instead. In this blog, we will explore different types of regression models, their use cases, and how to implement them in Python.


1. Linear Regression

Definition:

Linear regression models the relationship between two variables using a straight line, assuming a linear relationship between the dependent and independent variables.

Use Case:

Predicting house prices based on square footage.

Formula:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

Python Implementation:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
Y = np.array([150000, 200000, 250000, 300000, 350000])

# Model training
model = LinearRegression()
model.fit(X, Y)

# Prediction
pred = model.predict([[2200]])
print(f"Predicted price for 2200 sq ft: ${pred[0]:.2f}")
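
To connect the fitted model back to the formula, the learned intercept and slope can be read off the estimator. The sketch below reuses the sample data from above; `intercept_` and `coef_` are scikit-learn's attribute names for the estimates of β0 and β1, and `manual_pred` is just an illustrative variable name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as above: square footage vs. price
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
Y = np.array([150000, 200000, 250000, 300000, 350000])

model = LinearRegression().fit(X, Y)

beta_0 = model.intercept_   # estimated intercept
beta_1 = model.coef_[0]     # estimated slope (price change per extra sq ft)

# Rebuild the prediction by hand from Y = beta_0 + beta_1 * X
manual_pred = beta_0 + beta_1 * 2200
print(f"Intercept: {beta_0:.2f}, slope: {beta_1:.2f}")
print(f"Manual prediction for 2200 sq ft: ${manual_pred:.2f}")
```

Because this toy data lies exactly on a line, the fit recovers a slope of 100 and an intercept of 50,000 up to floating-point error. Real data will not be this clean.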

2. Multiple Linear Regression

Definition:

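Extends linear regression by incorporating multiple independent variables. It assumes that the predictors are not highly correlated with one another; strong correlation between predictors (multicollinearity) makes the individual coefficient estimates unstable.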

Use Case:

Predicting car mileage based on engine size, weight, and horsepower.

Formula:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$$

Python Implementation:

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample dataset
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]])  # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])

model = LinearRegression()
model.fit(X, Y)
pred = model.predict([[2200, 4]])
print(f"Predicted price: ${pred[0]:.2f}")
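
The definition mentions multicollinearity, and it is worth checking for it before trusting individual coefficients. A minimal sketch, assuming a simple pairwise correlation check is enough for two predictors; the 0.9 cutoff is an assumed rule of thumb, not a fixed standard.

```python
import numpy as np

# Predictor columns from the sample dataset above: Size, Rooms
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]], dtype=float)

# Pearson correlation matrix between the predictor columns
corr = np.corrcoef(X, rowvar=False)
size_rooms_corr = corr[0, 1]
print(f"Correlation between Size and Rooms: {size_rooms_corr:.3f}")

# Assumed rule-of-thumb threshold for "highly correlated"
THRESHOLD = 0.9
if abs(size_rooms_corr) > THRESHOLD:
    print("Predictors are highly correlated; individual coefficients may be unreliable.")
```

In the sample data, Rooms is exactly Size divided by 500, so the two predictors are perfectly correlated. The model still predicts fine, but the split of the effect between the two coefficients is not meaningful.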

3. Polynomial Regression

Definition:

Handles non-linear relationships by introducing polynomial terms.

Use Case:

Predicting population growth trends.

Formula:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots + \epsilon$$

Python Implementation:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# Sample data
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
Y = np.array([150000, 200000, 250000, 300000, 350000])

poly_model.fit(X, Y)

pred = poly_model.predict(np.array([[2200]]))
print(f"Predicted price (Polynomial): ${pred[0]:.2f}")
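
To see what "introducing polynomial terms" means in practice, it can help to look at the expanded feature matrix on its own. The sketch below uses two made-up input values (`X_small` is a hypothetical name) and the same `PolynomialFeatures(degree=2)` step as the pipeline above.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two illustrative input values, one feature each
X_small = np.array([[2.0], [3.0]])

# degree=2 adds a constant column and a squared column
poly = PolynomialFeatures(degree=2)
X_expanded = poly.fit_transform(X_small)

# Columns are: 1, X, X^2
print(X_expanded)
```

The linear model in the pipeline is then fit on these expanded columns, which is why polynomial regression is still linear in its coefficients.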

4. Ridge Regression (L2 Regularization)

Definition:

Regularized regression technique that penalizes large coefficients to prevent overfitting.

Use Case:

Used in high-dimensional datasets (e.g., predicting stock prices).

Python Implementation:

import numpy as np
from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=1.0)
# Sample data
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]])  # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])

ridge_model.fit(X, Y)
pred = ridge_model.predict([[2200, 4]])
print(f"Predicted price (Ridge): ${pred[0]:.2f}")
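
The effect of the penalty is easier to see by varying `alpha`. The sketch below uses synthetic data (the true coefficients, noise level, and alpha values are all assumptions chosen for illustration) and standardizes the features first so the penalty treats them equally.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data: three features with known (assumed) coefficients plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=50)

# Put features on a common scale before penalizing coefficients
X_scaled = StandardScaler().fit_transform(X)

norms = []
for alpha in [0.1, 10.0, 1000.0]:
    coef = Ridge(alpha=alpha).fit(X_scaled, y).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha:>7}: coefficients={np.round(coef, 3)}")
```

As `alpha` grows, the overall size of the coefficient vector shrinks toward zero, but Ridge does not set individual coefficients exactly to zero.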

5. Lasso Regression (L1 Regularization)

Definition:

Similar to Ridge but can shrink some coefficients to zero, performing feature selection.

Use Case:

Used when we need automatic feature selection in models.

Python Implementation:

import numpy as np
from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=0.1)
# Sample data
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]])  # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])

lasso_model.fit(X, Y)
pred = lasso_model.predict([[2200, 4]])
print(f"Predicted price (Lasso): ${pred[0]:.2f}")
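
The feature-selection behaviour shows up more clearly on data where some features are irrelevant. A minimal sketch, assuming synthetic data in which only the first two of five features affect the target; the seed, coefficients, and `alpha=0.5` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only features 0 and 1 influence y; features 2-4 are pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
print("Coefficients:", np.round(lasso.coef_, 3))

# Features whose coefficient survived the L1 penalty
selected = np.flatnonzero(lasso.coef_)
print("Selected feature indices:", selected)
```

With a penalty of this size, the noise features should be dropped entirely, while the two informative features keep nonzero (though shrunken) coefficients.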

6. Logistic Regression (For Classification)

Definition:

Although called ‘Logistic Regression,’ it is actually used for binary classification problems (e.g., Yes/No, 0/1, True/False).

Use Case:

Spam detection (Spam vs. Not Spam).

Python Implementation:

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

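# n_informative and n_redundant must be set explicitly: the defaults (2 informative + 2 redundant) do not fit into 2 features
X, Y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)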
log_model = LogisticRegression()
log_model.fit(X, Y)
sample_input = X[0].reshape(1, -1)
pred = log_model.predict(sample_input)
print(f"Predicted class: {pred[0]}")
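
Under the hood, logistic regression computes a linear score and passes it through the sigmoid function to get a probability, then thresholds at 0.5 to pick a class. The sketch below checks this by hand; `sigmoid` is a helper defined here for illustration, and the `make_classification` arguments are assumptions chosen so that two features are valid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, Y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
log_model = LogisticRegression().fit(X, Y)

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear scores (intercept + coefficients . features) for the first five samples
scores = log_model.decision_function(X[:5])
manual_proba = sigmoid(scores)

# Probability of class 1 as reported by scikit-learn
sklearn_proba = log_model.predict_proba(X[:5])[:, 1]

print("Manual: ", np.round(manual_proba, 4))
print("sklearn:", np.round(sklearn_proba, 4))
```

The two rows should agree, which is a useful sanity check when explaining why the model is called a regression even though it is used for classification.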

7. Poisson Regression

Definition:

Used for count-based data (e.g., number of customer complaints per day).

Use Case:

Predicting website visits per hour.

Python Implementation:

import numpy as np
from sklearn.linear_model import PoissonRegressor

poisson_model = PoissonRegressor()
# Sample data
X = np.array([[1], [2], [3], [4], [5]])  # Days
Y = np.array([3, 7, 9, 12, 15])  # Count of events

poisson_model.fit(X, Y)
# The model was trained on a single feature (day), so predict with a single feature
pred = poisson_model.predict([[6]])
print(f"Predicted count for day 6 (Poisson): {pred[0]:.2f}")
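
Poisson regression keeps predictions non-negative by modeling the log of the expected count as a linear function of the inputs. A minimal sketch of that relationship, reusing the day/count sample data; `X_new` and `manual_pred` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

X = np.array([[1], [2], [3], [4], [5]])  # Days
Y = np.array([3, 7, 9, 12, 15])          # Count of events

poisson_model = PoissonRegressor().fit(X, Y)

# Predict for day 6 by hand: expected count = exp(intercept + coef * x)
X_new = np.array([[6]])
linear_part = poisson_model.intercept_ + X_new @ poisson_model.coef_
manual_pred = np.exp(linear_part)

sklearn_pred = poisson_model.predict(X_new)
print(f"Manual:  {manual_pred[0]:.2f}")
print(f"sklearn: {sklearn_pred[0]:.2f}")
```

Because of the exponential, the predicted count can never be negative, which a plain linear regression on count data cannot guarantee.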


8. Elastic Net Regression

Definition:

Combination of Ridge and Lasso regression, particularly useful when there are correlated predictors.

Use Case:

Used when dealing with highly correlated features.

Python Implementation:

import numpy as np
from sklearn.linear_model import ElasticNet

elastic_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
# Sample data
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]])  # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])

elastic_model.fit(X, Y)
pred = elastic_model.predict([[2200, 4]])
print(f"Predicted price (Elastic Net): ${pred[0]:.2f}")
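
The `l1_ratio` parameter controls how the two penalties are mixed: in scikit-learn's objective the L1 term is weighted by `alpha * l1_ratio` and the squared L2 term by `0.5 * alpha * (1 - l1_ratio)`. The sketch below, on assumed synthetic data, shows that `l1_ratio=1.0` reproduces Lasso, while an intermediate value gives a blend.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data with two informative features out of four (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

alpha = 0.1
enet_as_lasso = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X, y)  # pure L1 penalty
lasso = Lasso(alpha=alpha).fit(X, y)
mixed = ElasticNet(alpha=alpha, l1_ratio=0.5).fit(X, y)          # half L1, half L2

print("ElasticNet (l1_ratio=1.0):", np.round(enet_as_lasso.coef_, 4))
print("Lasso:                    ", np.round(lasso.coef_, 4))
print("ElasticNet (l1_ratio=0.5):", np.round(mixed.coef_, 4))
```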

9. Locally Weighted Regression (LOWESS)

Definition:

Locally Weighted Regression (LOWESS) is a non-parametric regression technique that fits multiple local models rather than a single global model. It gives more weight to nearby points, making it ideal for capturing complex, non-linear relationships.

Use Case:

Used in time series smoothing and non-linear trend detection (e.g., stock market trends, economic forecasting).

Python Implementation:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Sample data
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Y = np.array([2.2, 2.8, 3.6, 4.5, 5.1, 6.5, 7.4, 7.8, 9.2, 10.1])

# Fit LOWESS model
lowess = sm.nonparametric.lowess(Y, X, frac=0.3)  # frac determines smoothing level

# Plot results
plt.scatter(X, Y, label="Original Data", color="red")
plt.plot(lowess[:, 0], lowess[:, 1], label="LOWESS Fit", color="blue")
plt.legend()
plt.show()
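
To illustrate the "multiple local models" idea, here is a simplified sketch of a locally weighted linear fit written with NumPy only. It is not the statsmodels algorithm: it assumes a Gaussian kernel with a hand-picked `bandwidth` and skips the nearest-neighbour windowing and robustness iterations that LOWESS uses. The function name `local_linear_predict` is hypothetical.

```python
import numpy as np

def local_linear_predict(x, y, x0, bandwidth=1.5):
    """Predict y at x0 with a weighted straight-line fit centred on x0."""
    # Gaussian kernel: points close to x0 get weights near 1, distant points near 0
    weights = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    # Design matrix with an intercept column
    A = np.column_stack([np.ones_like(x), x])
    # Weighted least squares: solve (A^T W A) beta = A^T W y
    AtW = A.T * weights
    beta = np.linalg.solve(AtW @ A, AtW @ y)
    return beta[0] + beta[1] * x0

# Same sample data as above
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
Y = np.array([2.2, 2.8, 3.6, 4.5, 5.1, 6.5, 7.4, 7.8, 9.2, 10.1])

# One small regression per query point
smoothed = np.array([local_linear_predict(X, Y, x0) for x0 in X])
print(np.round(smoothed, 2))
```

A smaller `bandwidth` follows the data more closely, and a larger one gives a smoother curve, which plays a similar role to `frac` in the statsmodels call.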

10. Softmax Regression

Definition:

Softmax regression (also called multinomial logistic regression) generalizes logistic regression from binary classification to multi-class problems. It uses the softmax function to assign a probability to each class, and the probabilities across all classes sum to 1.

Use Case:

Predicting the category of an image (e.g., classifying handwritten digits in the MNIST dataset).

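Formula:

$$P(Y = k \mid X) = \frac{e^{\beta_{k0} + \beta_k^\top X}}{\sum_{j=1}^{K} e^{\beta_{j0} + \beta_j^\top X}}$$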

Python Implementation:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Sample data
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [3, 3], [4, 4]])  # Features
Y = np.array([0, 0, 1, 1, 2, 2])  # Multi-class labels

# Model training
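# With the lbfgs solver and more than two classes, scikit-learn fits a multinomial (softmax) model;
# the multi_class argument is deprecated since scikit-learn 1.5, so it is omitted here
softmax_model = LogisticRegression(solver='lbfgs')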
softmax_model.fit(X, Y)

# Prediction
pred = softmax_model.predict([[3, 3]])
print(f"Predicted class: {pred[0]}")
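
The softmax step itself is short enough to write by hand. The sketch below defines an illustrative `softmax` helper, applies it to the model's per-class linear scores, and compares the result with `predict_proba`. It assumes scikit-learn's default lbfgs solver, which fits a multinomial model when there are more than two classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def softmax(scores):
    """Turn each row of class scores into probabilities that sum to 1."""
    # Subtracting the row maximum avoids overflow and does not change the result
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Same sample data as above
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [3, 3], [4, 4]])
Y = np.array([0, 0, 1, 1, 2, 2])

model = LogisticRegression().fit(X, Y)

scores = model.decision_function(X)      # one linear score per class, shape (6, 3)
manual_proba = softmax(scores)
sklearn_proba = model.predict_proba(X)

print(np.round(manual_proba, 3))
print("Rows sum to:", manual_proba.sum(axis=1))
```

The predicted class is simply the column with the highest probability in each row.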

Conclusion

Choosing the right regression model depends on the data type, number of features, and the relationship between variables. Here’s a quick summary:

| Scenario | Best Regression Type |
| --- | --- |
| Predicting continuous values (e.g., house prices) | Linear Regression |
| Data has multiple predictors | Multiple Linear Regression |
| Non-linear trends | Polynomial Regression, LOWESS |
| High-dimensional data | Ridge, Lasso, Elastic Net |
| Binary classification | Logistic Regression |
| Multi-class classification | Softmax Regression |
| Count-based data | Poisson Regression |

Would you like me to add more real-world applications or explanations? Let me know in the comments! 🚀
