## Polynomial Regression
Imagine you are observing an object falling from the sky. Its speed is not constant; it keeps increasing. If you try to fit this trajectory with a straight line, the result will be poor, because a straight line cannot describe a curved relationship. In this case we need a line that can bend, and polynomial regression is a powerful tool for exactly this kind of problem.
Simply put, **polynomial regression** is an extension of linear regression. It maps data to a higher-dimensional space by adding higher-order terms (such as squared terms, cubic terms) to the original features, thereby using a "curve" to fit the nonlinear relationships present in the data.
* * *
## 1. Core Concepts: From Straight Line to Curve
### 1.1 Review of Linear Regression
The model formula for linear regression is very simple: `y = w₁ * x + b`, where:
* `y` is the target value we want to predict.
* `x` is the input feature.
* `w₁` is the weight of the feature (slope).
* `b` is the bias term (intercept).
Because of this form, the model can only draw a **straight line**.
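As a quick illustration of the formula above, the following minimal sketch fits a straight line to a handful of made-up, noise-free points and reads back the slope and intercept. The data and the variable names (`x_line`, `y_line`, `line_model`) are assumptions chosen only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data lying exactly on the line y = 3x + 1
x_line = np.array([0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
y_line = 3.0 * x_line.ravel() + 1.0

# Fit y = w1 * x + b
line_model = LinearRegression().fit(x_line, y_line)
slope = line_model.coef_[0]         # learned weight w1
intercept = line_model.intercept_   # learned bias b

print(f"slope w1 = {slope:.2f}, intercept b = {intercept:.2f}")
```

Because the data contain no noise, the learned slope and intercept should match the values used to generate them.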
### 1.2 Introducing Polynomial Regression
The core idea of polynomial regression is: **treat higher powers of features as new features**, and then apply linear regression on this expanded feature set.
For example, a quadratic polynomial regression model: `y = w₁ * x + w₂ * x² + b`
As you can see, although `x²` appears in the equation, if we treat `x` and `x²` as two independent features `X1` and `X2`, the model becomes: `y = w₁ * X1 + w₂ * X2 + b`. This is still a **linear model**: it is linear in the weights and in the **features** `X1` and `X2`, even though it is nonlinear in the original `x`. This is why polynomial regression is called an "extension of linear regression".
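The idea of treating `x` and `x²` as two separate features can be sketched without any machine learning library. The example below is an illustration under assumed conditions: it generates hypothetical data from `y = 0.5x² + x + 2` with a small amount of noise, builds the two feature columns by hand, and solves an ordinary least-squares problem with NumPy. The names `design`, `w1`, `w2`, and `b` are chosen for this sketch only.

```python
import numpy as np

# Hypothetical data: a quadratic relationship with a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 0.5 * x**2 + x + 2 + rng.normal(scale=0.1, size=200)

# Treat x and x**2 as two separate features X1 and X2,
# and add a column of ones so the intercept b is learned too
design = np.column_stack([x, x**2, np.ones_like(x)])

# Ordinary least squares: the same machinery as plain linear regression
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
w1, w2, b = coeffs

print(f"w1 = {w1:.3f}, w2 = {w2:.3f}, b = {b:.3f}")
```

The solver never needs to know that the second column is the square of the first; from its point of view this is a linear regression with two features.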
### 1.3 Key Terms
* **Degree/Order**: The highest exponent in the polynomial. Degree 2 is a quadratic curve (parabola), degree 3 is a cubic curve, and so on.
* **Overfitting**: If the chosen degree is too high, the model becomes very "wiggly": it bends to follow the noise in individual training points, so it fits the training data closely while its predictive ability on new data drops sharply. It is like a net with a mesh so fine that it catches every bit of debris (the noise) and loses sight of the big fish (the overall trend).
* **Underfitting**: If the degree is too low (for example, using a straight line to fit obviously curved data), the model cannot capture the basic patterns in the data, and its predictive ability is also poor.
The flowchart below shows the typical thought process for applying polynomial regression:
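*[Figure: flowchart of the typical thought process for applying polynomial regression; image not available.]*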
* * *
## 2. Practical Implementation: Polynomial Regression with Python
We will use the powerful machine learning library `scikit-learn`, which makes implementing polynomial regression very simple.
### 2.1 Preparing Environment and Data
First, ensure that the necessary libraries are installed, and create a set of simulated nonlinear data.
## Example
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Set random seed to ensure consistent results across runs
np.random.seed(42)
# Create simulated data: y is a quadratic function of x plus some random noise
X = 6 * np.random.rand(100, 1) - 3  # Generate 100 random numbers in the [-3, 3) interval
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)  # y = 0.5x² + x + 2 + noise
# Visualize original data
plt.scatter(X, y, s=10, alpha=0.7, label='Original Data')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Simulated Nonlinear Data')
plt.legend()
plt.show()
Running this code, you will see that the data points roughly follow a "U" shape (parabola) distribution, which is clearly unsuitable for fitting with a straight line.
### 2.2 Core Steps: Feature Transformation and Model Training
The key step is to use `PolynomialFeatures` to generate higher-order feature terms.
## Example
# 1. Create polynomial features
# Parameter degree determines the polynomial degree, here we try degree 2
poly_features = PolynomialFeatures(degree=2, include_bias=False)
# Transform original feature X into a new feature matrix X_poly containing X and X^2
X_poly = poly_features.fit_transform(X)
print(f"Original X shape: {X.shape}")
print(f"Transformed X_poly shape: {X_poly.shape}")
print(f"First 5 rows of X_poly data:\n{X_poly[:5]}")
# Output shows that X_poly has two columns: first column is X, second column is X^2
# 2. Train linear regression model on transformed features
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)  # Use X_poly, not original X
# 3. View learned model parameters (weights and bias)
print(f"\nModel parameters (weights w1, w2): {lin_reg.coef_.ravel()}")
print(f"Model bias (intercept b): {lin_reg.intercept_}")
# Output should be close to the parameters used when generating data: [1, 0.5] and 2
### 2.3 Visualizing Fitting Results
Let's see what the trained "curve" model looks like.
## Example
# To draw a smooth curve, generate a set of uniformly distributed points
X_new = np.linspace(-3, 3, 100).reshape(100, 1)
# Apply the same polynomial feature transformation to these new points
X_new_poly = poly_features.transform(X_new)
# Make predictions with the model
y_new = lin_reg.predict(X_new_poly)
# Start plotting
plt.scatter(X, y, s=10, alpha=0.7, label='Training Data')
plt.plot(X_new, y_new, 'r-', linewidth=2, label='Polynomial Regression Fit (degree=2)')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Quadratic Polynomial Regression Fitting Effect')
plt.legend()
plt.show()
You should see a beautiful red curve that nicely captures the parabolic trend of the data.
* * *
## 3. Important Topic: How to Choose the Correct Degree?
Choosing the degree is a trade-off process. We can intuitively understand this by visualizing the fitting effects of different degrees.
## Example
# Try different degrees: 1 (linear), 2, 15 (too high)
degrees = [1, 2, 15]
plt.figure(figsize=(15, 4))
for i, degree in enumerate(degrees):
    # Create subplot
    ax = plt.subplot(1, len(degrees), i + 1)
    # Generate polynomial features and train model
    poly_features = PolynomialFeatures(degree=degree, include_bias=False)
    X_poly = poly_features.fit_transform(X)
    lin_reg = LinearRegression()
    lin_reg.fit(X_poly, y)
    # Predict and plot
    y_new = lin_reg.predict(poly_features.transform(X_new))
    ax.scatter(X, y, s=10, alpha=0.7)
    ax.plot(X_new, y_new, 'r-', linewidth=2)
    ax.set_title(f'Degree = {degree}')
    ax.set_xlabel('X')
    ax.set_ylabel('y')
    # Calculate and display R² score on the training data (closer to 1 is better)
    y_pred = lin_reg.predict(X_poly)
    r2 = r2_score(y, y_pred)
    ax.text(0.05, 0.95, f'$R^2$ = {r2:.3f}', transform=ax.transAxes,
            verticalalignment='top',
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
plt.tight_layout()
plt.show()
**Observations and Explanations:**
* **Degree=1 (Linear)**: A straight line. Its `R²` score is clearly lower; this is **underfitting**, because the line cannot express the curvature of the data.
* **Degree=2 (Quadratic)**: A smooth curve with a high `R²` score; the fit is good.
* **Degree=15**: The curve oscillates, chasing individual data points, and its behavior between points and near the edges of the range is erratic. Its `R²` on the training data is at least as high as for degree 2, because adding terms can only improve a least-squares fit on the training set, but predictions on new data tend to be worse. This is typical **overfitting**.
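The training `R²` alone cannot reveal overfitting, because it is measured on the same points the model was fitted to. A minimal sketch of one way to make the gap visible is to hold out part of the data and compare training and test error for each degree. The data below are regenerated for the example, and the 70/30 split, the seeds, and the names `train_mse` and `test_mse` are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data following y = 0.5x^2 + x + 2 + noise
rng = np.random.default_rng(42)
X_all = 6 * rng.random((100, 1)) - 3
y_all = 0.5 * X_all[:, 0] ** 2 + X_all[:, 0] + 2 + rng.normal(size=100)

# Hold out 30% of the points; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.3, random_state=0
)

train_mse, test_mse = {}, {}
for degree in (1, 2, 15):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(X_train))
    test_mse[degree] = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:.3f}  "
          f"test MSE={test_mse[degree]:.3f}")
```

A training error that keeps falling as the degree grows while the test error stops improving, or rises, is the usual signature of overfitting. The exact numbers depend on the random split.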
### 3.1 More Scientific Method: Cross-Validation
In practice, we evaluate the performance of models with different degrees on unseen data through **cross-validation**, and select the model that performs best on the validation set. `scikit-learn`'s `cross_val_score` can conveniently implement this.
## Example
from sklearn.model_selection import cross_val_score
# Test a range of degrees
degrees_to_try = range(1, 11)
cv_scores = []
for degree in degrees_to_try:
    poly_features = PolynomialFeatures(degree=degree, include_bias=False)
    X_poly = poly_features.fit_transform(X)
    lin_reg = LinearRegression()
    # Use 5-fold cross-validation scored by negative mean squared error
    # (sklearn convention: a higher score is better, hence negative MSE)
    scores = cross_val_score(lin_reg, X_poly, y, cv=5, scoring='neg_mean_squared_error')
    cv_scores.append(-scores.mean())  # Take average and convert back to positive MSE
# Find the degree that minimizes cross-validation error
best_degree = degrees_to_try[np.argmin(cv_scores)]
print(f"According to cross-validation, the best degree is: {best_degree}")
# Visualize cross-validation error vs degree
plt.plot(degrees_to_try, cv_scores,'bo-')
plt.xlabel('Polynomial Degree')
plt.ylabel('5-Fold Cross-Validation Average MSE')
plt.title('Cross-Validation for Selecting Best Degree')
plt.axvline(x=best_degree, color='r', linestyle='--', label=f'Best Degree={best_degree}')
plt.legend()
plt.grid(True)
plt.show()
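The manual loop over degrees can also be delegated to scikit-learn's `GridSearchCV`. The sketch below is one possible arrangement rather than the only way to do it: it wraps the transformation and the model in a pipeline and searches over the `degree` parameter. The data are regenerated for the example, and the names `pipeline`, `search`, and `best_cv_mse` are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data following y = 0.5x^2 + x + 2 + noise
rng = np.random.default_rng(42)
X_demo = 6 * rng.random((100, 1)) - 3
y_demo = 0.5 * X_demo[:, 0] ** 2 + X_demo[:, 0] + 2 + rng.normal(size=100)

pipeline = make_pipeline(PolynomialFeatures(include_bias=False), LinearRegression())

# Parameter names follow the "<step name>__<parameter>" convention
param_grid = {"polynomialfeatures__degree": list(range(1, 11))}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_demo, y_demo)

best_degree = search.best_params_["polynomialfeatures__degree"]
best_cv_mse = -search.best_score_  # convert back to a positive MSE
print(f"Best degree: {best_degree}, cross-validated MSE: {best_cv_mse:.3f}")
```

With a pipeline, every preprocessing step is re-fitted inside each fold. That makes no difference for `PolynomialFeatures` on its own, but it avoids leaking validation data once a data-dependent step such as feature scaling is added.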
* * *
## 4. Practical Exercises
Now, it's time to consolidate what you've learned through hands-on practice.
**Exercise 1: Diagnose and Fix** Run the code below. It attempts to fit the data using polynomial regression, but the fit is poor. Analyze where the problem might be, and modify the code so that it fits correctly.
## Example
# Problematic code
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 9, 16, 25])  # Roughly y = x^2
# Try to fit with degree 1 polynomial (linear)
poly = PolynomialFeatures(degree=1)
X_poly = poly.fit_transform(X)
model = LinearRegression().fit(X_poly, y)
X_plot = np.linspace(1,5,100).reshape(-1,1)
X_plot_poly = poly.transform(X_plot)
y_plot = model.predict(X_plot_poly)
plt.scatter(X, y, label='Data')
plt.plot(X_plot, y_plot,'r-', label='Fit')
plt.legend()
plt.show()
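**Exercise 2: Explore Real Dataset** Use the `Diabetes dataset` (`load_diabetes`) that comes with `scikit-learn`. The `Boston Housing dataset` is only an option on older installations, because `load_boston` was removed in scikit-learn 1.2. Select a feature that shows a nonlinear relationship with the target value, and apply polynomial regression.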
1. Plot the original data scatter plot.
2. Try different degrees (2, 3, 4), and visualize the fitting curves.
3. Use cross-validation to find the polynomial degree with the best prediction effect for this feature.
**Exercise 3: Challenge - Multivariate Polynomial Regression** The examples above only had one feature `x`. Polynomial regression also works with multiple features. For example, when there are two features `x1` and `x2`, `degree=2` polynomial features will include: `x1`, `x2`, `x1²`, `x1*x2`, `x2²` (plus a constant column of ones unless `include_bias=False` is set). Try creating a simulated dataset containing two features `(x1, x2)` (for example, `y = x1 + x2² + noise`), and use `PolynomialFeatures(degree=2)` for fitting. Observe the shape of the generated feature matrix, and understand its meaning.