Dimension Reduction

  • The lasso already performs a kind of dimension reduction while fitting, because it can shrink some coefficients exactly to zero.

  • The methods that we have discussed so far in this chapter have involved fitting linear regression models, via least squares or a shrunken approach, using the original predictors, \(X_1\),\(X_2\),…,\(X_p\).

  • Next we cover a few approaches that transform the predictors and then fit a least squares model using the transformed variables. We will refer to these techniques as dimension reduction methods.

The Transformations

  • Let \(Z_1,...,Z_M\) represent \(M<p\) linear combinations of our original \(p\) predictors. That is, \[Z_m = \sum_{j=1}^p\phi_{mj}X_j\] for some constants \(\phi_{m1},...,\phi_{mp}\), \(m=1,...,M\).

  • We can then fit the linear regression model \[y_i = \theta_0 + \sum_{m=1}^M\theta_mz_{im}+\epsilon_i,\quad i=1,...,n,\] using ordinary least squares.
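
  • As a rough illustration of the two steps above, the sketch below builds \(M=2\) linear combinations of \(p=3\) numeric iris predictors and then fits OLS on them. The constants in phi are arbitrary values picked for illustration (an assumption, not a recommended choice), and the object names are hypothetical.

```r
# Sketch: build M = 2 linear combinations of p = 3 predictors, then fit OLS on them.
# The constants in phi are arbitrary illustrative values, not an optimized choice.
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")])
y <- iris$Sepal.Width

# Row m of phi holds phi_m1, ..., phi_mp
phi <- rbind(c(1,  1, 1),
             c(1, -1, 0))

# z_im = sum_j phi_mj * x_ij, i.e. Z = X %*% t(phi), an n x M matrix
Z <- X %*% t(phi)
colnames(Z) <- c("Z1", "Z2")

# Ordinary least squares on the transformed predictors
fit_z <- lm(y ~ Z)
theta <- coef(fit_z)   # theta_0, theta_1, theta_2
```

  • Only \(M+1=3\) coefficients are estimated here, instead of the \(p+1=4\) needed for OLS on the original predictors.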

The Coefficients

  • Note that in our transformed model, the regression coefficients are given by \(\theta_0,\theta_1,...,\theta_M\).

  • If the constants \(\phi_{m1},...,\phi_{mp}\) are chosen well, then this dimension reduction can often outperform OLS regression on the original predictors.

  • Also note that \(\beta_j = \sum_{m=1}^M\theta_m\phi_{mj}\), and thus this new transformed model is a special case of the original OLS regression model.

  • This effectively constrains the \(\beta_j\)’s.
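
  • To see where this constraint comes from, substitute the definition \(z_{im} = \sum_{j=1}^p\phi_{mj}x_{ij}\) into the transformed model and swap the order of summation: \[\sum_{m=1}^M\theta_mz_{im} = \sum_{m=1}^M\theta_m\sum_{j=1}^p\phi_{mj}x_{ij} = \sum_{j=1}^p\left(\sum_{m=1}^M\theta_m\phi_{mj}\right)x_{ij} = \sum_{j=1}^p\beta_jx_{ij}.\] The \(p\) coefficients \(\beta_j\) are therefore determined by only \(M<p\) free parameters \(\theta_1,...,\theta_M\).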

Principal Components Regression

  • Here we apply principal components analysis (PCA) (discussed in Chapter 10 of the text) to define the linear combinations of the predictors, for use in our regression.

  • The first principal component is that (normalized) linear combination of the variables with the largest variance.

  • The second principal component has the largest variance, subject to being uncorrelated with the first. And so on.

  • Hence with many correlated original variables, we replace them with a small set of principal components that capture their joint variation.
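
  • The idea can be sketched by hand in base R, assuming we use only the three numeric iris predictors and keep \(M=2\) components (both choices are illustrative assumptions, and the object names are hypothetical). prcomp() computes the components without looking at the response, and lm() then regresses the response on the component scores.

```r
# Sketch: principal components regression by hand with base R.
X <- iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")]
y <- iris$Sepal.Width

# PCA on centered and scaled predictors; the response is not used here
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Component variances are non-increasing: PC1 has the largest variance
pc_var <- pca$sdev^2

# The component scores are uncorrelated with each other
score_cor <- cor(pca$x)

# Keep the first M = 2 components and regress y on them
M <- 2
pcr_data <- data.frame(y = y, pca$x[, 1:M, drop = FALSE])
pcr_fit <- lm(y ~ ., data = pcr_data)
coef(pcr_fit)
```

  • Each column of pca$rotation holds the normalized constants \(\phi_{m1},...,\phi_{mp}\) for one component.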

PCR - How does it work?


Partial Least Squares

  • PCR identifies linear combinations, or directions, that best represent the predictors \(X_1,...,X_p\).

  • These directions are identified in an unsupervised way, since the response Y is not used to help determine the principal component directions.

  • That is, the response does not supervise the identification of the principal components.

  • Consequently, PCR suffers from a potentially serious drawback: there is no guarantee that the directions that best explain the predictors will also be the best directions to use for predicting the response.

More PLS

  • Like PCR, PLS is a dimension reduction method, which first identifies a new set of features \(Z_1,...,Z_M\) that are linear combinations of the original features, and then fits a linear model via OLS using these \(M\) new features.

  • But unlike PCR, PLS identifies these new features in a supervised way, that is, it makes use of the response Y in order to identify new features that not only approximate the old features well, but also that are related to the response.

  • Roughly speaking, the PLS approach attempts to find directions that help explain both the response and the predictors.
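
  • A minimal base R sketch of the first PLS direction, assuming the standard construction in which each \(\phi_{1j}\) is the slope from the simple regression of the response on the standardized \(X_j\). Only the three numeric iris predictors are used, and the object names are hypothetical. The first principal component is computed alongside for comparison, since it ignores the response.

```r
# Sketch: first PLS direction by hand (base R only).
X <- scale(as.matrix(iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")]))
y <- iris$Sepal.Width - mean(iris$Sepal.Width)

# phi_1j: slope from the simple regression of y on the standardized X_j
phi1 <- apply(X, 2, function(xj) coef(lm(y ~ xj))[2])

# First PLS component: Z_1 = sum_j phi_1j * X_j
Z1 <- as.vector(X %*% phi1)

# First principal component, which is built without using y
pc1 <- prcomp(X)$x[, 1]

# Regress y on each single component and compare the fits
pls_fit <- lm(y ~ Z1)
pc_fit <- lm(y ~ pc1)
c(pls_r2 = summary(pls_fit)$r.squared,
  pcr_r2 = summary(pc_fit)$r.squared)
```

  • Because the predictors are standardized, each slope equals the sample covariance between \(X_j\) and the response, so predictors more strongly related to the response get more weight in \(Z_1\).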

Even More PLS


PCR in tidymodels

  • Use step_normalize and then step_pca
  • Fit a linear regression model

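library(tidymodels)

lm_spec <- linear_reg() |> set_engine("lm")     # assumed: default lm engine

iris_rec_pcr <- recipe(Sepal.Width ~ ., data = iris) |>
    step_dummy(all_nominal_predictors()) |>     # assumed: encode Species (a factor) so it can be normalized
    step_normalize(all_predictors()) |>
    step_pca(all_numeric_predictors())          #New

pcr_wf <- workflow() |> add_recipe(iris_rec_pcr) |> add_model(lm_spec)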

PCR in tidymodels

iris_pcr_fit <- pcr_wf |> 
  fit(data = iris)

tidy_pcr_fit <- iris_pcr_fit |> tidy()
tidy_pcr_fit

# A tibble: 6 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   3.06      0.0219    140.   1.08e-155
2 PC1          -0.0682    0.0119     -5.73 5.61e-  8
3 PC2          -0.157     0.0191     -8.23 1.01e- 13
4 PC3           0.436     0.0456      9.56 4.58e- 17
5 PC4          -0.910     0.122      -7.46 7.40e- 12
6 PC5           0.356     0.204       1.75 8.29e-  2

  • For now we let step_pca use its default number of components (num_comp = 5).

  • Once we cover PCA formally, we can do more.

PLS in tidymodels

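# step_pls() needs the mixOmics package installed
iris_rec_pls <- recipe(Sepal.Width ~ ., data = iris) |>
    step_dummy(all_nominal_predictors()) |>     # assumed: encode Species (a factor) so it can be normalized
    step_normalize(all_predictors()) |>
    step_pls(all_numeric_predictors(), outcome = "Sepal.Width")          #New

pls_wf <- workflow() |> add_recipe(iris_rec_pls) |> add_model(lm_spec)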

iris_pls_fit <- pls_wf |> 
  fit(data = iris)

tidy_pls_fit <- iris_pls_fit |> tidy()
tidy_pls_fit

# A tibble: 3 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    3.06     0.0283    108.   7.30e-142
2 PLS1           0.146    0.0190      7.72 1.68e- 12
3 PLS2           0.132    0.0247      5.34 3.49e-  7

PLS in tidymodels

library(learntidymodels)   # provides plot_top_loadings()

plot_top_loadings(iris_pls_fit, type = "pls")