Chapter 6 Part 4
Dimension Reduction
The lasso already performs a form of dimension reduction while fitting, since it sets some coefficients exactly to zero.
The methods discussed so far in this chapter fit linear regression models, via least squares or a shrinkage approach, using the original predictors \(X_1, X_2, \ldots, X_p\).
Next we cover a few approaches that transform the predictors and then fit a least squares model using the transformed variables. We will refer to these techniques as dimension reduction methods.
The Transformations
Let \(Z_1, \ldots, Z_M\) represent \(M < p\) linear combinations of our original \(p\) predictors. That is, \[Z_m = \sum_{j=1}^p \phi_{mj} X_j, \quad m = 1, \ldots, M,\] for some constants \(\phi_{m1}, \ldots, \phi_{mp}\).
We can then fit the linear regression model \[y_i = \theta_0 + \sum_{m=1}^M \theta_m z_{im} + \epsilon_i, \quad i = 1, \ldots, n,\] using ordinary least squares, where \(z_{im}\) is the value of \(Z_m\) for observation \(i\).
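A minimal base-R sketch of this two-step procedure, using simulated data and an arbitrary (not optimized) matrix of constants. The names `X`, `Phi`, `Z`, and `y` are illustrative assumptions; row \(m\) of `Phi` holds \(\phi_{m1}, \ldots, \phi_{mp}\).

```r
set.seed(1)
n <- 100; p <- 5; M <- 2

# Simulated predictors (n x p) and response
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- drop(X %*% c(1, -1, 0.5, 0, 2)) + rnorm(n)

# Hypothetical constants phi_{mj}: row m holds phi_{m1}, ..., phi_{mp}
Phi <- matrix(rnorm(M * p), nrow = M, ncol = p)

# Z[i, m] = sum_j phi_{mj} * X[i, j], i.e. Z = X %*% t(Phi)
Z <- X %*% t(Phi)
colnames(Z) <- paste0("Z", 1:M)

# Ordinary least squares of y on the M transformed features
fit_z <- lm(y ~ Z)
theta <- coef(fit_z)   # theta_0, theta_1, ..., theta_M
theta
```

Only \(M + 1 = 3\) coefficients are estimated here instead of \(p + 1 = 6\).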
The Coefficients
Note that in our transformed model, the regression coefficients are \(\theta_0, \theta_1, \ldots, \theta_M\).
If the constants \(\phi_{m1}, \ldots, \phi_{mp}\) are chosen well, this dimension reduction approach can often outperform OLS regression on the original predictors.
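Also note that \[\sum_{m=1}^M \theta_m z_{im} = \sum_{m=1}^M \theta_m \sum_{j=1}^p \phi_{mj} x_{ij} = \sum_{j=1}^p \beta_j x_{ij}, \quad \text{where } \beta_j = \sum_{m=1}^M \theta_m \phi_{mj},\] so the transformed model is a special case of the original linear regression model.
This effectively constrains the \(\beta_j\)’s: they must take the form above, which can bias the coefficient estimates but, when \(M\) is much smaller than \(p\), can substantially reduce their variance.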
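A small numerical check of the identity \(\beta_j = \sum_{m} \theta_m \phi_{mj}\), sketched in base R with simulated data and a hypothetical `Phi`. It also compares the residual sum of squares with unconstrained OLS, which can never be worse on the training data because it searches over all \(\beta\)'s rather than only those of this form.

```r
set.seed(2)
n <- 100; p <- 4; M <- 2
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, 0, -1, 1)) + rnorm(n)
Phi <- matrix(rnorm(M * p), M, p)   # hypothetical phi_{mj}
Z <- X %*% t(Phi)

fit_z <- lm(y ~ Z)
theta0 <- coef(fit_z)[1]
theta <- coef(fit_z)[-1]            # theta_1, ..., theta_M

# beta_j = sum_m theta_m * phi_{mj}  <=>  beta = t(Phi) %*% theta
beta <- drop(t(Phi) %*% theta)

# The implied beta's give the same fitted values as the Z-model
fitted_beta <- theta0 + drop(X %*% beta)
max(abs(fitted_beta - fitted(fit_z)))   # numerically zero

# Constrained (Z-model) RSS vs unconstrained OLS RSS on the same data
rss_constrained <- sum(resid(fit_z)^2)
rss_ols <- sum(resid(lm(y ~ X))^2)
c(constrained = rss_constrained, ols = rss_ols)
```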
Principal Components Regression
Here we apply principal components analysis (PCA), discussed in Chapter 10 of the text, to define the linear combinations of the predictors used in the regression.
The first principal component is the normalized linear combination of the variables (with \(\sum_{j=1}^p \phi_{1j}^2 = 1\)) that has the largest variance.
The second principal component has the largest variance, subject to being uncorrelated with the first. And so on.
Hence, when there are many correlated original variables, we replace them with a small set of principal components that capture their joint variation.
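The properties above can be checked with base R's `prcomp()`, sketched here on the three numeric iris predictors after standardizing them (the choice of columns is an assumption for illustration). Column \(m\) of `pca$rotation` holds the loadings \(\phi_{m1}, \ldots, \phi_{mp}\).

```r
# Standardized numeric predictors from the built-in iris data
X <- scale(iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")])

pca <- prcomp(X)        # data already centered and scaled
pca$rotation            # column m: loadings phi_{m1}, ..., phi_{mp} (unit length)
pca$sdev^2              # component variances, largest first

scores <- pca$x         # values of Z_1, ..., Z_p for each observation
round(cor(scores), 10)  # off-diagonals are zero: components are uncorrelated
summary(pca)            # proportion of variance captured by each component
```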
PCR - How does it work?
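A from-scratch sketch of PCR in base R, assuming standardized predictors and a hypothetical helper `pcr_fit()`: compute the principal components, regress \(y\) on the first \(M\) score columns by OLS, then map \(\theta\) back to \(\beta_j = \sum_m \theta_m \phi_{mj}\). With \(M = p\), no dimensions are dropped and PCR reproduces ordinary least squares.

```r
# Response and standardized predictors (numeric iris columns only)
y <- iris$Sepal.Width
X <- scale(iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")])

# Hypothetical helper: PCR with M components
pcr_fit <- function(X, y, M) {
  pca <- prcomp(X)                          # principal component directions
  Z <- pca$x[, 1:M, drop = FALSE]           # first M component scores
  fit <- lm(y ~ Z)                          # OLS on the M components
  theta <- coef(fit)[-1]
  # beta_j = sum_m theta_m * phi_{mj}, expressed on the standardized scale
  beta <- drop(pca$rotation[, 1:M, drop = FALSE] %*% theta)
  list(fit = fit, beta = beta)
}

pcr1 <- pcr_fit(X, y, M = 1)
pcr3 <- pcr_fit(X, y, M = 3)
pcr1$beta
pcr3$beta
coef(lm(y ~ X))[-1]   # with M = p = 3, PCR matches OLS
```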
Partial Least Squares
PCR identifies linear combinations, or directions, that best represent the predictors \(X_1, \ldots, X_p\).
These directions are identified in an unsupervised way, since the response Y is not used to help determine the principal component directions.
That is, the response does not supervise the identification of the principal components.
Consequently, PCR suffers from a potentially serious drawback: there is no guarantee that the directions that best explain the predictors will also be the best directions to use for predicting the response.
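A toy simulation of this drawback, under assumed settings: \(X_1\) has large variance but is unrelated to \(Y\), while \(Y\) depends only on the low-variance \(X_2\). The predictors are deliberately not standardized, so variance alone decides the principal component directions.

```r
set.seed(10)
n <- 200
x1 <- rnorm(n, sd = 10)            # high-variance predictor, unrelated to y
x2 <- rnorm(n, sd = 1)             # low-variance predictor that drives y
y  <- 3 * x2 + rnorm(n, sd = 0.5)
X  <- cbind(x1, x2)

pca <- prcomp(X)                   # unsupervised: y is never used here

# R^2 from regressing y on a single component score
r2 <- function(z) summary(lm(y ~ z))$r.squared
r2_pc1 <- r2(pca$x[, 1])           # top-variance direction (about x1)
r2_pc2 <- r2(pca$x[, 2])           # low-variance direction (about x2)
c(PC1 = r2_pc1, PC2 = r2_pc2)
```

PCR with \(M = 1\) keeps only the first component, which explains almost none of the variation in \(Y\).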
More PLS
Like PCR, PLS is a dimension reduction method: it first identifies a new set of features \(Z_1, \ldots, Z_M\) that are linear combinations of the original features, and then fits a linear model via OLS using these \(M\) new features.
But unlike PCR, PLS identifies these new features in a supervised way: it uses the response \(Y\) to find new features that not only approximate the old features well but are also related to the response.
Roughly speaking, the PLS approach attempts to find directions that help explain both the response and the predictors.
Even More PLS
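A from-scratch sketch of how PLS directions can be computed, using a hypothetical helper `pls_directions()`. It uses the common NIPALS-style weights \(w \propto X_r^\top y\) at every step, where \(X_r\) is the current predictor matrix; for the first direction, on standardized predictors, this is proportional to setting each \(\phi_{1j}\) to the coefficient from the simple linear regression of \(Y\) onto \(X_j\). Each later direction is computed after replacing every predictor by its residual from a regression on the previous direction. Library implementations (for example the pls or mixOmics packages) may differ in scaling and other details.

```r
# Hypothetical helper: M PLS directions via NIPALS-style deflation
pls_directions <- function(X, y, M) {
  Xr <- scale(as.matrix(X))    # standardized predictors, deflated in the loop
  Z <- matrix(NA_real_, nrow = nrow(Xr), ncol = M)
  for (m in seq_len(M)) {
    # Weights proportional to Xr^T y (columns of Xr are centered, so this
    # equals Xr^T (y - mean(y)))
    w <- drop(crossprod(Xr, y))
    w <- w / sqrt(sum(w^2))
    z <- drop(Xr %*% w)
    Z[, m] <- z
    # Deflate: replace each column by its residual after regressing on z
    Xr <- Xr - outer(z, drop(crossprod(Xr, z)) / sum(z^2))
  }
  colnames(Z) <- paste0("PLS", seq_len(M))
  Z
}

y <- iris$Sepal.Width
X <- iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")]

Z <- pls_directions(X, y, M = 2)
pls_lm <- lm(y ~ Z)            # OLS on the M = 2 PLS features
coef(pls_lm)
round(crossprod(Z), 8)         # off-diagonals are zero: Z1 and Z2 are orthogonal
```

As with PCR, using all \(M = p\) directions spans the same space as the original predictors, so the fit then coincides with OLS.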
PCR in tidymodels
- Use step_normalize() and then step_pca() in the recipe
- Fit a linear regression model on the resulting components
```r
library(tidymodels)

# Linear regression model specification (OLS via lm)
lm_spec <- linear_reg() |>
  set_engine("lm")

# Recipe: dummy-code Species, standardize, then replace predictors by PCs
iris_rec_pcr <- recipe(Sepal.Width ~ ., data = iris) |>
  step_dummy(all_nominal_predictors()) |>
  step_normalize(all_predictors()) |>
  step_pca(all_numeric_predictors()) # New

# Bundle the model and recipe into a workflow
pcr_wf <- workflow() |>
  add_model(lm_spec) |>
  add_recipe(iris_rec_pcr)

iris_pcr_fit <- pcr_wf |>
  fit(data = iris)

iris_pcr_fit |> tidy()
```

```
# A tibble: 6 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   3.06      0.0219    140.   1.08e-155
2 PC1          -0.0682    0.0119     -5.73 5.61e-  8
3 PC2          -0.157     0.0191     -8.23 1.01e- 13
4 PC3           0.436     0.0456      9.56 4.58e- 17
5 PC4          -0.910     0.122      -7.46 7.40e- 12
6 PC5           0.356     0.204       1.75 8.29e-  2
```
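For now we let step_pca() use its default number of components (num_comp = 5). With Species dummy-coded there are exactly five numeric predictors, so this keeps every component and the fit is equivalent to OLS on the original predictors.
Once we cover PCA formally, we can choose the number of components more carefully.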
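As a preview, one common way to choose \(M\) is K-fold cross-validation. The base-R sketch below assumes 5 folds and uses only the three numeric iris predictors (no Species dummies); the standardization and PCA are re-estimated inside each training fold so the held-out fold never influences them. In tidymodels, the analogous route would be setting num_comp = tune() in step_pca() and tuning the workflow.

```r
set.seed(123)
y <- iris$Sepal.Width
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")])
k <- 5
folds <- sample(rep(1:k, length.out = nrow(X)))   # random fold labels

# Cross-validated MSE of PCR for each M = 1, ..., p
cv_mse <- sapply(1:ncol(X), function(M) {
  errs <- sapply(1:k, function(f) {
    train <- folds != f
    # Standardize and compute PCs on the training fold only
    pca <- prcomp(X[train, ], center = TRUE, scale. = TRUE)
    Z_train <- pca$x[, 1:M, drop = FALSE]
    # predict() applies the training centering/scaling and loadings
    Z_test <- predict(pca, X[!train, , drop = FALSE])[, 1:M, drop = FALSE]
    fit <- lm.fit(cbind(1, Z_train), y[train])
    pred <- cbind(1, Z_test) %*% fit$coefficients
    mean((y[!train] - pred)^2)
  })
  mean(errs)
})
names(cv_mse) <- paste0("M=", 1:ncol(X))
cv_mse
which.min(cv_mse)
```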
PLS in tidymodels
```r
# Recipe: dummy-code Species, standardize, then replace predictors by
# PLS components (step_pls() needs the Bioconductor mixOmics package)
iris_rec_pls <- recipe(Sepal.Width ~ ., data = iris) |>
  step_dummy(all_nominal_predictors()) |>
  step_normalize(all_predictors()) |>
  step_pls(all_numeric_predictors(), outcome = "Sepal.Width") # New

# Reuse the same linear regression specification
pls_wf <- workflow() |>
  add_model(lm_spec) |>
  add_recipe(iris_rec_pls)

iris_pls_fit <- pls_wf |>
  fit(data = iris)

iris_pls_fit |> tidy()
```

```
# A tibble: 3 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    3.06     0.0283    108.   7.30e-142
2 PLS1           0.146    0.0190      7.72 1.68e- 12
3 PLS2           0.132    0.0247      5.34 3.49e-  7
```
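```r
# plot_top_loadings() is assumed to come from the learntidymodels package
library(learntidymodels)
plot_top_loadings(iris_pls_fit, type = "pls")
```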