Regression is a fundamental technique in machine learning and statistics used to model relationships between a dependent variable (target) and one or more independent variables (predictors). It helps in predicting continuous outcomes and understanding patterns in data. In this blog, we will explore different types of regression models, their use cases, and implementation using Python.
1. Linear Regression
Definition:
Linear regression models the relationship between a single independent variable and the dependent variable by fitting a straight line to the data, assuming the target changes at a constant rate as the predictor changes.
Use Case:
Predicting house prices based on square footage.
Formula:
Y = \beta_0 + \beta_1 X + \epsilon
Python Implementation:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
Y = np.array([150000, 200000, 250000, 300000, 350000])
# Model training
model = LinearRegression()
model.fit(X, Y)
# Prediction
pred = model.predict([[2200]])
print(f"Predicted price for 2200 sq ft: ${pred[0]:.2f}")
2. Multiple Linear Regression
Definition:
Extends linear regression by incorporating multiple independent variables. It assumes the predictors are not strongly correlated with one another (i.e., low multicollinearity).
Use Case:
Predicting car mileage from engine size, weight, and horsepower; the toy example below instead predicts house prices from size and room count.
Formula:
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n + \epsilon
Python Implementation:
from sklearn.linear_model import LinearRegression
# Sample dataset
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]]) # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])
model = LinearRegression()
model.fit(X, Y)
pred = model.predict([[2200, 4]])
print(f"Predicted price: ${pred[0]:.2f}")
3. Polynomial Regression
Definition:
Handles non-linear relationships by adding polynomial terms (X^2, X^3, ...) of the predictor as extra features, while the model remains linear in its coefficients.
Use Case:
Predicting population growth trends.
Formula:
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \ldots + \epsilon
Python Implementation:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
# Sample data
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
Y = np.array([150000, 200000, 250000, 300000, 350000])
poly_model.fit(X, Y)
pred = poly_model.predict(np.array([[2200]]))
print(f"Predicted price (Polynomial): ${pred[0]:.2f}")
4. Ridge Regression (L2 Regularization)
Definition:
A regularized form of linear regression that adds an L2 penalty on coefficient size, shrinking large coefficients to prevent overfitting.
Use Case:
Used in high-dimensional datasets (e.g., predicting stock prices).
Python Implementation:
from sklearn.linear_model import Ridge
ridge_model = Ridge(alpha=1.0)
# Sample data
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]]) # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])
ridge_model.fit(X, Y)
pred = ridge_model.predict([[2200, 4]])
print(f"Predicted price (Ridge): ${pred[0]:.2f}")
5. Lasso Regression (L1 Regularization)
Definition:
Similar to Ridge, but its L1 penalty can shrink some coefficients exactly to zero, effectively performing feature selection.
Use Case:
Useful when automatic feature selection is needed, e.g., models with many candidate features of which only a few matter.
Python Implementation:
from sklearn.linear_model import Lasso
lasso_model = Lasso(alpha=0.1)
# Sample data
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]]) # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])
lasso_model.fit(X, Y)
pred = lasso_model.predict([[2200, 4]])
print(f"Predicted price (Lasso): ${pred[0]:.2f}")
6. Logistic Regression (For Classification)
Definition:
Despite its name, logistic regression is a classification algorithm: it models the probability of a binary outcome (e.g., Yes/No, 0/1, True/False).
Use Case:
Spam detection (Spam vs. Not Spam).
Python Implementation:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
# n_redundant=0 is required here: the default (2 redundant features) would exceed n_features=2
X, Y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)
log_model = LogisticRegression()
log_model.fit(X, Y)
sample_input = X[0].reshape(1, -1)
pred = log_model.predict(sample_input)
print(f"Predicted class: {pred[0]}")
7. Poisson Regression
Definition:
Used for count data (non-negative integers), e.g., the number of customer complaints per day.
Use Case:
Predicting website visits per hour.
Python Implementation:
from sklearn.linear_model import PoissonRegressor
# Sample data
X = np.array([[1], [2], [3], [4], [5]]) # Days
Y = np.array([3, 7, 9, 12, 15]) # Count of events
poisson_model = PoissonRegressor()
poisson_model.fit(X, Y)
pred = poisson_model.predict([[6]]) # the model was trained on one feature (day), so predict with one feature
print(f"Predicted count for day 6 (Poisson): {pred[0]:.2f}")
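Under the hood, PoissonRegressor uses a log link, so predictions are exp(intercept + coef * x) and therefore always non-negative. A quick sanity check against the model fitted above:
# Reproducing the prediction by hand via the log link
manual = np.exp(poisson_model.intercept_ + poisson_model.coef_[0] * 6)
print(f"Manual log-link prediction for day 6: {manual:.2f}")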
8. Elastic Net Regression
Definition:
Combination of Ridge and Lasso regression, particularly useful when there are correlated predictors.
Use Case:
High-dimensional datasets where groups of features are correlated, where Lasso alone would arbitrarily keep one feature per group.
Python Implementation:
from sklearn.linear_model import ElasticNet
elastic_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
# Sample data
X = np.array([[1000, 2], [1500, 3], [2000, 4], [2500, 5], [3000, 6]]) # Size, Rooms
Y = np.array([150000, 200000, 250000, 300000, 350000])
elastic_model.fit(X, Y)
pred = elastic_model.predict([[2200, 4]])
print(f"Predicted price (Elastic Net): ${pred[0]:.2f}")
9. Locally Weighted Regression (LOWESS)
Definition:
Locally Weighted Regression (LOWESS) is a non-parametric regression technique that fits multiple local models rather than a single global model. It gives more weight to nearby points, making it ideal for capturing complex, non-linear relationships.
Use Case:
- Used in time series smoothing and non-linear trend detection (e.g., stock market trends, economic forecasting).
Python Implementation:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Sample data
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Y = np.array([2.2, 2.8, 3.6, 4.5, 5.1, 6.5, 7.4, 7.8, 9.2, 10.1])
# Fit LOWESS model
lowess = sm.nonparametric.lowess(Y, X, frac=0.3) # frac determines smoothing level
# Plot results
plt.scatter(X, Y, label="Original Data", color="red")
plt.plot(lowess[:, 0], lowess[:, 1], label="LOWESS Fit", color="blue")
plt.legend()
plt.show()
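The frac argument controls how much of the data each local fit sees: small values follow the data closely, large values smooth aggressively. A short comparison sketch on the same data:
# Compare a tight fit (small frac) against a smoother one (large frac)
tight = sm.nonparametric.lowess(Y, X, frac=0.2)
smooth = sm.nonparametric.lowess(Y, X, frac=0.8)
plt.scatter(X, Y, label="Original Data", color="red")
plt.plot(tight[:, 0], tight[:, 1], label="frac=0.2")
plt.plot(smooth[:, 0], smooth[:, 1], label="frac=0.8")
plt.legend()
plt.show()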
10. Softmax Regression
Definition:
Softmax Regression is a generalization of logistic regression for multi-class classification. While logistic regression handles two classes, Softmax Regression uses the softmax function to assign a probability to each of several categories.
Use Case:
- Predicting the category of an image (e.g., classifying handwritten digits in the MNIST dataset).
Formula:
P(Y = k \mid X) = \frac{e^{\beta_k^\top X}}{\sum_{j=1}^{K} e^{\beta_j^\top X}}
Python Implementation:
import numpy as np
from sklearn.linear_model import LogisticRegression
# Sample data
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [3, 3], [4, 4]]) # Features
Y = np.array([0, 0, 1, 1, 2, 2]) # Multi-class labels
# Model training
# With the lbfgs solver, multi-class targets are fit as a multinomial (softmax) model
# (the explicit multi_class='multinomial' argument is deprecated in recent scikit-learn)
softmax_model = LogisticRegression(solver='lbfgs')
softmax_model.fit(X, Y)
# Prediction
pred = softmax_model.predict([[3, 3]])
print(f"Predicted class: {pred[0]}")
Conclusion
Choosing the right regression model depends on the data type, number of features, and the relationship between variables. Here’s a quick summary:
| Scenario | Best Regression Type |
|---|---|
| Predicting continuous values (e.g., house prices) | Linear Regression |
| Data has multiple predictors | Multiple Linear Regression |
| Non-linear trends | Polynomial Regression, LOWESS |
| High-dimensional data | Ridge, Lasso, Elastic Net |
| Binary classification | Logistic Regression |
| Multi-class classification | Softmax Regression |
| Count-based data | Poisson Regression |
Want more real-world applications or deeper explanations of any of these models? Let me know in the comments! 🚀