Chapter 6 Part 3
tidymodels and shrinkage
Setup
library(tidyverse)
library(tidymodels)
library(ISLR2)
Lasso and Ridge - StatQuest
https://www.youtube.com/watch?v=Q81RR3yKn30&t=7s
https://www.youtube.com/watch?v=NGf0voTMlcs
On your own:
Ridge Vs Lasso https://www.youtube.com/watch?v=Xm2C_gTAl8c
Elastic Net https://www.youtube.com/watch?v=1dKRdX9bfIo
Ridge Regression
Pros
- Can be used when \(p > n\)
- Can be used to help with multicollinearity
- Will decrease variance (as \(\lambda \rightarrow \infty\))
Cons
- Will have increased bias (compared to least squares)
- Does not really help with variable selection (all variables are included in some regard, even if their \(\beta\) coefficients are really small)
Lasso
Pros
- Can be used when \(p > n\)
- Can be used to help with multicollinearity
- Will decrease variance (as \(\lambda \rightarrow \infty\))
- Can be used for variable selection, since it will make some \(\beta\) coefficients exactly 0 (see the sketch after this list)
Cons
- Will have increased bias (compared to least squares)
- If \(p>n\) the lasso can select at most \(n\) variables
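To make the variable-selection contrast concrete, here is a minimal sketch with an arbitrary penalty value, using the glmnet engine and the Auto data that appear later in these slides:

ridge <- linear_reg(penalty = 1, mixture = 0) |>  # ridge
  set_engine("glmnet") |>
  fit(mpg ~ horsepower + displacement + weight, data = Auto)

lasso <- linear_reg(penalty = 1, mixture = 1) |>  # lasso
  set_engine("glmnet") |>
  fit(mpg ~ horsepower + displacement + weight, data = Auto)

tidy(ridge)  # all slopes shrunk, but nonzero
tidy(lasso)  # some slopes may be exactly 0; more are zeroed as the penalty grows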
What if we want to do both?
Elastic net!
\(RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\)
. . .
What is the \(\ell_1\) part of the penalty?
. . .
What is the \(\ell_2\) part of the penalty?
Elastic net
\[RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\]
When will this be equivalent to Ridge Regression?
Elastic net
\[RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\]
When will this be equivalent to Lasso?
Elastic Net
\[RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\]
- The \(\ell_1\) part of the penalty will generate a sparse model (shrink some \(\beta\) coefficients to exactly 0)
- The \(\ell_2\) part of the penalty removes the limitation on the number of variables selected (can be \(>n\) now)
tidymodels
lm_spec <- linear_reg() |> # Pick linear regression
  set_engine(engine = "lm") # set engine
lm_spec
Linear Regression Model Specification (regression)
Computational engine: lm
lm_fit <- fit(lm_spec,
              mpg ~ horsepower,
              data = Auto)
Validation set approach
Auto_split <- initial_split(Auto, prop = 0.5)
Auto_split
<Training/Testing/Total>
<196/196/392>
. . .
Extract the training and testing data
training(Auto_split)
testing(Auto_split)
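For contrast with the shortcut below, the "by hand" validation-set workflow is roughly: fit on the training half, predict the testing half, and compute the RMSE yourself. A sketch:

Auto_train <- training(Auto_split)
Auto_test <- testing(Auto_split)

lm_fit <- fit(lm_spec,
              mpg ~ horsepower,
              data = Auto_train)

lm_fit |>
  predict(new_data = Auto_test) |>  # returns a tibble with a .pred column
  bind_cols(Auto_test) |>
  rmse(truth = mpg, estimate = .pred)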
A faster way!
- You can use last_fit() and specify the split
- This will automatically train the model on the training data from the split
- Instead of specifying which metric to calculate (with rmse as before) you can just use collect_metrics() and it will automatically calculate the metrics on the testing data from the split
A faster way!
set.seed(100000)
Auto_split <- initial_split(Auto, prop = 0.5)
lm_fit <- last_fit(lm_spec,
                   mpg ~ horsepower,
                   split = Auto_split)
lm_fit |>
  collect_metrics()
# A tibble: 2 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 rmse standard 4.77 Preprocessor1_Model1
2 rsq standard 0.634 Preprocessor1_Model1
What about cross validation?
Auto_cv <- vfold_cv(Auto, v = 5)
Auto_cv
# 5-fold cross-validation
# A tibble: 5 × 2
splits id
<list> <chr>
1 <split [313/79]> Fold1
2 <split [313/79]> Fold2
3 <split [314/78]> Fold3
4 <split [314/78]> Fold4
5 <split [314/78]> Fold5
What if we wanted to do some preprocessing
- For the shrinkage methods we discussed it was important to scale the variables
. . .
- What would happen if we scale before doing cross-validation? Will we get different answers?
What if we wanted to do some preprocessing
Auto_scaled <- Auto |>
  mutate(horsepower = scale(horsepower))
sd(Auto_scaled$horsepower)
[1] 1
Auto_cv_scaled <- vfold_cv(Auto_scaled, v = 5)

# Will not actually use:
map_dbl(Auto_cv_scaled$splits,
        function(x) {
          dat <- as.data.frame(x)$horsepower
          sd(dat)
        })
[1] 0.9698961 1.0060047 0.9967524 1.0339029 0.9922419
- Notice the fold-level standard deviations are not exactly 1: scaling the full data before splitting leaks information across the folds. We want each fold's preprocessing to use only that fold's training data, which is what recipes (next slide) handle for us.
What if we wanted to do some preprocessing
recipe()
- Using the recipe() function along with step_*() functions, we can specify preprocessing steps and R will automagically apply them to each fold appropriately.
. . .
rec <- recipe(mpg ~ horsepower, data = Auto) |>
  step_scale(horsepower)
- You can find all of the potential preprocessing steps here: https://tidymodels.github.io/recipes/reference/index.html
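If you want to peek at what a recipe actually produces (not required for fit_resamples(), which preps each fold for you), one sketch:

rec |>
  prep() |>              # estimate the scaling statistics from the recipe's data
  bake(new_data = NULL)  # apply them and return the preprocessed tibble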
Where do we plug in this recipe?
- The recipe gets plugged into the fit_resamples() function
. . .
Auto_cv <- vfold_cv(Auto, v = 5)

rec <- recipe(mpg ~ horsepower, data = Auto) |>
  step_scale(horsepower)

results <- fit_resamples(lm_spec,
                         preprocessor = rec,
                         resamples = Auto_cv)
results |>
  collect_metrics()
# A tibble: 2 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 rmse standard 4.92 5 0.0744 Preprocessor1_Model1
2 rsq standard 0.611 5 0.0158 Preprocessor1_Model1
What if we want to predict mpg with more variables
- Now we still want to add a step to scale predictors
- We could either write out all predictors individually to scale them
- OR we could use the all_predictors() shorthand.
. . .
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
  step_scale(all_predictors())
Putting it together
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
  step_scale(all_predictors())

results <- fit_resamples(lm_spec,
                         preprocessor = rec,
                         resamples = Auto_cv)
results |>
  collect_metrics()
# A tibble: 2 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 rmse standard 4.23 5 0.157 Preprocessor1_Model1
2 rsq standard 0.704 5 0.0253 Preprocessor1_Model1
Application Exercise
- Examine the Hitters dataset by running ?Hitters in the Console
- We want to predict a major league player’s Salary from all of the other 19 variables in this dataset. Create a visualization of Salary.
- Create a recipe to estimate this model.
- Add a preprocessing step to your recipe, scaling each of the predictors (a starter sketch follows this list)
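One possible starting point (hitters_rec is just a suggested name; Hitters contains nominal predictors, so all_numeric_predictors() is used here, and a later exercise adds step_dummy()):

ggplot(Hitters, aes(x = Salary)) +
  geom_histogram()

hitters_rec <- recipe(Salary ~ ., data = Hitters) |>
  step_scale(all_numeric_predictors())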
What if we have categorical variables?
- We can turn the categorical variables into indicator (“dummy”) variables in the recipe
. . .
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
  step_dummy(all_nominal()) |>
  step_scale(all_predictors())
What if we have missing data?
- We can remove any rows with missing data
. . .
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
  step_dummy(all_nominal()) |>
  step_naomit(everything()) |>
  step_scale(all_predictors())
What if we have missing data?
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
  step_dummy(all_nominal()) |>
  step_naomit(all_outcomes()) |>
  step_impute_mean(all_predictors()) |>
  step_scale(all_predictors())
Application Exercise
- Add a preprocessing step to your recipe to convert nominal variables into indicators
- Add a step to your recipe to remove missing values for the outcome
- Add a step to your recipe to impute missing values for the predictors using the average of the remaining values. NOTE: THIS IS NOT THE BEST WAY TO DO THIS!
Ridge, Lasso, and Elastic net
When specifying your model, you can indicate whether you would like to use ridge, lasso, or elastic net. We can write a general equation to minimize:
\[RSS + \lambda\left((1-\alpha)\sum_{j=1}^p\beta_j^2+\alpha\sum_{j=1}^p|\beta_j|\right)\]
lm_spec <- linear_reg() |>
  set_engine("glmnet")
- First specify the engine. We’ll use glmnet
- The linear_reg() function has two additional parameters, penalty and mixture
- penalty is \(\lambda\) from our equation
- mixture is a number between 0 and 1 representing \(\alpha\): mixture = 0 gives Ridge regression and mixture = 1 gives Lasso
Ridge, Lasso, and Elastic net
\[RSS + \lambda\left((1-\alpha)\sum_{j=1}^p\beta_j^2+\alpha\sum_{j=1}^p|\beta_j|\right)\]
What would we set mixture to in order to perform Ridge regression?
. . .
ridge_spec <- linear_reg(penalty = 100, mixture = 0) |>
  set_engine("glmnet")
Application Exercise
- Set a seed: set.seed(1)
- Create a cross-validation object for the Hitters dataset
- Using the recipe from the previous exercise, fit the model using Ridge regression with a penalty \(\lambda\) = 300
- What is the estimate of the test RMSE for this model? (one possible sketch follows)
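One shape the answer could take, assuming the recipe from the previous exercises is called hitters_rec (and handles Hitters' missing Salary values, as in the imputation exercise above):

set.seed(1)
hitters_cv <- vfold_cv(Hitters, v = 5)

ridge_spec <- linear_reg(penalty = 300, mixture = 0) |>
  set_engine("glmnet")

fit_resamples(ridge_spec,
              preprocessor = hitters_rec,
              resamples = hitters_cv) |>
  collect_metrics()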
Ridge, Lasso, and Elastic net
\[RSS + \lambda\left((1-\alpha)\sum_{j=1}^p\beta_j^2+\alpha\sum_{j=1}^p|\beta_j|\right)\]
ridge_spec <- linear_reg(penalty = 100, mixture = 0) |>
  set_engine("glmnet")
. . .
lasso_spec <- linear_reg(penalty = 5, mixture = 1) |>
  set_engine("glmnet")
. . .
enet_spec <- linear_reg(penalty = 60, mixture = 0.7) |>
  set_engine("glmnet")
Okay, but we wanted to look at 3 different models!
ridge_spec <- linear_reg(penalty = 100, mixture = 0) |>
  set_engine("glmnet")

results <- fit_resamples(ridge_spec,
                         preprocessor = rec,
                         resamples = Auto_cv)
. . .
lasso_spec <- linear_reg(penalty = 5, mixture = 1) |>
  set_engine("glmnet")

results <- fit_resamples(lasso_spec,
                         preprocessor = rec,
                         resamples = Auto_cv)
. . .
elastic_spec <- linear_reg(penalty = 40, mixture = 0.1) |>
  set_engine("glmnet")

results <- fit_resamples(elastic_spec,
                         preprocessor = rec,
                         resamples = Auto_cv)
- 😱 this looks like copy + pasting!
tune 🎶
penalty_spec <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")
- Notice the code above has tune() for the penalty and the mixture. Those are the things we want to vary!
tune 🎶
- Now we need to create a grid of potential penalties (\(\lambda\)) and mixtures (\(\alpha\)) that we want to test
- Instead of fit_resamples() we are going to use tune_grid()
. . .
grid <- expand_grid(penalty = seq(0, 100, by = 10),
                    mixture = seq(0, 1, by = 0.25))
results <- tune_grid(penalty_spec,
                     preprocessor = rec,
                     grid = grid,
                     resamples = Auto_cv)
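For reference, this grid crosses 11 penalty values with 5 mixture values:

nrow(grid)
[1] 55

That is why collect_metrics() below returns 110 rows: 55 combinations × 2 metrics (rmse and rsq).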
tune autoplot
autoplot(results) + ## ggplot function
  theme_classic()
Warning in ggplot2::scale_x_continuous(trans = trans): log-10 transformation introduced infinite values.
(The warning appears because the grid includes penalty = 0, which autoplot() cannot place on its log-10 penalty axis.)
tune 🎶
results |>
  collect_metrics()
# A tibble: 110 × 8
penalty mixture .metric .estimator mean n std_err .config
<dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0 0 rmse standard 4.25 5 0.139 Preprocessor1_Model01
2 0 0 rsq standard 0.702 5 0.0231 Preprocessor1_Model01
3 10 0 rmse standard 4.75 5 0.119 Preprocessor1_Model02
4 10 0 rsq standard 0.697 5 0.0211 Preprocessor1_Model02
5 20 0 rmse standard 5.30 5 0.0981 Preprocessor1_Model03
6 20 0 rsq standard 0.697 5 0.0210 Preprocessor1_Model03
7 30 0 rmse standard 5.71 5 0.0850 Preprocessor1_Model04
8 30 0 rsq standard 0.697 5 0.0209 Preprocessor1_Model04
9 40 0 rmse standard 6.02 5 0.0775 Preprocessor1_Model05
10 40 0 rsq standard 0.697 5 0.0209 Preprocessor1_Model05
# ℹ 100 more rows
Subset results
results |>
  collect_metrics() |>
  filter(.metric == "rmse") |>
  arrange(mean)
# A tibble: 55 × 8
penalty mixture .metric .estimator mean n std_err .config
<dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0 1 rmse standard 4.23 5 0.157 Preprocessor1_Model45
2 0 0.75 rmse standard 4.23 5 0.156 Preprocessor1_Model34
3 0 0.25 rmse standard 4.23 5 0.155 Preprocessor1_Model12
4 0 0.5 rmse standard 4.23 5 0.156 Preprocessor1_Model23
5 0 0 rmse standard 4.25 5 0.139 Preprocessor1_Model01
6 10 0 rmse standard 4.75 5 0.119 Preprocessor1_Model02
7 20 0 rmse standard 5.30 5 0.0981 Preprocessor1_Model03
8 10 0.25 rmse standard 5.60 5 0.0861 Preprocessor1_Model13
9 30 0 rmse standard 5.71 5 0.0850 Preprocessor1_Model04
10 40 0 rmse standard 6.02 5 0.0775 Preprocessor1_Model05
# ℹ 45 more rows
- Since this is a data frame, we can do things like filter and arrange!
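tune also has helpers that do this sorting and picking for you; a quick sketch:

results |>
  show_best(metric = "rmse", n = 5)  # top 5 rows by mean RMSE

results |>
  select_best(metric = "rmse")       # the single best penalty/mixture combination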
Which would you choose?
results |>
  collect_metrics() |>
  filter(.metric == "rmse") |>
  ggplot(aes(penalty, mean, color = factor(mixture), group = factor(mixture))) +
  geom_line() +
  geom_point() +
  labs(y = "RMSE")
Application Exercise
- Using the Hitters cross-validation object and recipe created in the previous exercise, use tune_grid to pick the optimal penalty and mixture values.
- Update the code below to create a grid that includes penalties from 0 to 50 by 1 and mixtures from 0 to 1 by 0.5.
- Use this grid in the tune_grid function. Then use collect_metrics and filter to only include the RMSE estimates.
- Create a figure to examine the estimated test RMSE for the grid of penalty and mixture values – which should you choose?
grid <- expand_grid(penalty = seq(0, ----),
                    mixture = seq(0, 1, by = ----))
Putting it all together
- Often we can use a combination of all of these tools together
- First split our data
- Do cross validation on just the training data to tune the parameters
- Use last_fit() with the selected parameters, specifying the split data so that it is evaluated on the left-out test sample
Putting it all together
auto_split <- initial_split(Auto, prop = 0.5)
auto_train <- training(auto_split)
auto_cv <- vfold_cv(auto_train, v = 5)

rec <- recipe(mpg ~ horsepower + displacement + weight, data = auto_train) |>
  step_scale(all_predictors())

tuning <- tune_grid(penalty_spec,
                    rec,
                    grid = grid,
                    resamples = auto_cv)
tuning |>
  collect_metrics() |>
  filter(.metric == "rmse") |>
  arrange(mean)
# A tibble: 55 × 8
penalty mixture .metric .estimator mean n std_err .config
<dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0 1 rmse standard 3.49 5 0.214 Preprocessor1_Model45
2 0 0.75 rmse standard 3.50 5 0.207 Preprocessor1_Model34
3 0 0.5 rmse standard 3.51 5 0.206 Preprocessor1_Model23
4 0 0.25 rmse standard 3.51 5 0.206 Preprocessor1_Model12
5 0 0 rmse standard 3.57 5 0.220 Preprocessor1_Model01
6 10 0 rmse standard 4.22 5 0.229 Preprocessor1_Model02
7 20 0 rmse standard 4.80 5 0.227 Preprocessor1_Model03
8 10 0.25 rmse standard 5.13 5 0.228 Preprocessor1_Model13
9 30 0 rmse standard 5.21 5 0.227 Preprocessor1_Model04
10 40 0 rmse standard 5.50 5 0.227 Preprocessor1_Model05
# ℹ 45 more rows
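On the next slide the chosen values are typed in directly; an alternative sketch pulls them out programmatically with tune's select_best() and finalize_model():

best_params <- tuning |>
  select_best(metric = "rmse")  # the lowest mean RMSE row in the table above

final_spec <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet") |>
  finalize_model(best_params)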
Putting it all together
final_spec <- linear_reg(penalty = 0, mixture = 0) |>
  set_engine("glmnet")

fit <- last_fit(final_spec,
                rec,
                split = auto_split)
fit |>
  collect_metrics()
# A tibble: 2 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 rmse standard 4.94 Preprocessor1_Model1
2 rsq standard 0.681 Preprocessor1_Model1
Extracting coefficients
- We can use workflow() to combine the recipe and the model specification, which can then be passed to fit()
. . .
training_data <- training(auto_split)

workflow() |>
  add_recipe(rec) |>
  add_model(final_spec) |>
  fit(data = training_data) |>
  tidy()
# A tibble: 4 × 3
term estimate penalty
<chr> <dbl> <dbl>
1 (Intercept) 41.4 0
2 horsepower -0.814 0
3 displacement -1.42 0
4 weight -3.83 0
Application Exercise
- Using the final model specification, extract the coefficients from the model by creating a workflow
- Filter out any coefficients exactly equal to 0 (one possible shape for this follows)
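One possible shape, assuming your recipe is hitters_rec and final_spec holds your chosen penalty and mixture:

workflow() |>
  add_recipe(hitters_rec) |>
  add_model(final_spec) |>
  fit(data = Hitters) |>
  tidy() |>
  filter(estimate != 0)  # lasso-style penalties zero some coefficients out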