library(tidyverse)
library(tidymodels)
library(ISLR2)
Chapter 6 Part 3
tidymodels and shrinkage
Setup
Lasso and Ridge - Statquest
https://www.youtube.com/watch?v=Q81RR3yKn30&t=7s
https://www.youtube.com/watch?v=NGf0voTMlcs
On your own:
Ridge Vs Lasso https://www.youtube.com/watch?v=Xm2C_gTAl8c
Elastic Net https://www.youtube.com/watch?v=1dKRdX9bfIo
Ridge Regression
Pros
- Can be used when \(p > n\)
- Can be used to help with multicollinearity
- Will decrease variance (as \(\lambda \rightarrow \infty\))
Cons
- Will have increased bias (compared to least squares)
- Does not really help with variable selection (all variables are included in some regard, even if their \(\beta\) coefficients are really small; see the sketch below)
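To make the last two points concrete, here is a minimal glmnet sketch (not from the slides; the predictor choice is illustrative) showing ridge coefficients shrinking toward, but never exactly to, zero as \(\lambda\) grows:

library(glmnet)
x <- model.matrix(mpg ~ horsepower + weight + displacement, data = Auto)[, -1]
y <- Auto$mpg
ridge <- glmnet(x, y, alpha = 0)   # alpha = 0 is the ridge penalty in glmnet
coef(ridge, s = c(0.01, 10, 1000)) # larger s (lambda) -> smaller, but still nonzero, coefficients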
Lasso
Pros
- Can be used when \(p > n\)
- Can be used to help with multicollinearity
- Will decrease variance (as \(\lambda \rightarrow \infty\))
- Can be used for variable selection, since it will make some \(\beta\) coefficients exactly 0 (see the sketch below)
Cons
- Will have increased bias (compared to least squares)
- If \(p>n\) the lasso can select at most \(n\) variables
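A matching sketch (again illustrative) of the lasso side: with alpha = 1, coefficients drop to exactly 0 once \(\lambda\) is large enough, which is what makes the lasso useful for variable selection:

library(glmnet)
x <- model.matrix(mpg ~ horsepower + weight + displacement + acceleration, data = Auto)[, -1]
y <- Auto$mpg
lasso <- glmnet(x, y, alpha = 1) # alpha = 1 is the lasso penalty in glmnet
coef(lasso, s = c(0.1, 1, 5))    # as s (lambda) grows, entries become exactly 0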
What if we want to do both?
Elastic net!
\(RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\)
. . .
What is the \(\ell_1\) part of the penalty?
. . .
What is the \(\ell_2\) part of the penalty?
Elastic net
\[RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\]
When will this be equivalent to Ridge Regression?
Elastic net
\[RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\]
When will this be equivalent to Lasso?
Elastic Net
\[RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|\]
- The \(\ell_1\) part of the penalty will generate a sparse model (shrink some \(\beta\) coefficients to exactly 0)
- The \(\ell_2\) part of the penalty removes the limitation on the number of variables selected (can be \(>n\) now); a short glmnet sketch follows
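In glmnet this is parameterized a little differently: a single \(\lambda\) (penalty) plus a mixing weight \(\alpha\) between the two penalty parts (up to constant factors, the same idea as the equation above). A minimal sketch with illustrative values:

library(glmnet)
x <- model.matrix(mpg ~ horsepower + weight + displacement, data = Auto)[, -1]
y <- Auto$mpg
enet <- glmnet(x, y, alpha = 0.5) # alpha = 0 -> ridge, alpha = 1 -> lasso, in between -> elastic net
coef(enet, s = 1)                 # coefficients at lambda = 1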
tidymodels
lm_spec <-
linear_reg() |> # Pick linear regression
set_engine(engine = "lm") # set engine
lm_spec
Linear Regression Model Specification (regression)
Computational engine: lm
lm_fit <- fit(lm_spec,
mpg ~ horsepower,
data = Auto)
Validation set approach
Auto_split <- initial_split(Auto, prop = 0.5)
Auto_split
<Training/Testing/Total>
<196/196/392>
. . .
Extract the training and testing data
training(Auto_split)
testing(Auto_split)
A faster way!
- You can use last_fit() and specify the split
- This will automatically train the model on the train data from the split
- Instead of specifying which metric to calculate (with rmse as before) you can just use collect_metrics() and it will automatically calculate the metrics on the test data from the split
A faster way!
set.seed(100000)
Auto_split <- initial_split(Auto, prop = 0.5)
lm_fit <- last_fit(lm_spec,
mpg ~ horsepower,
split = Auto_split)
lm_fit |>
collect_metrics()
# A tibble: 2 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 rmse standard 4.77 Preprocessor1_Model1
2 rsq standard 0.634 Preprocessor1_Model1
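For comparison, here is a sketch of the longer route that last_fit() wraps up, assuming the lm_spec and Auto_split objects from above: fit on the training half, predict on the testing half, and compute RMSE by hand with yardstick.

lm_fit <- fit(lm_spec,
              mpg ~ horsepower,
              data = training(Auto_split))
lm_fit |>
  predict(new_data = testing(Auto_split)) |> # adds a .pred column
  bind_cols(testing(Auto_split)) |>
  rmse(truth = mpg, estimate = .pred)        # yardstick's rmse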
What about cross validation?
Auto_cv <- vfold_cv(Auto, v = 5)
Auto_cv
# 5-fold cross-validation
# A tibble: 5 × 2
splits id
<list> <chr>
1 <split [313/79]> Fold1
2 <split [313/79]> Fold2
3 <split [314/78]> Fold3
4 <split [314/78]> Fold4
5 <split [314/78]> Fold5
What if we wanted to do some preprocessing
- For the shrinkage methods we discussed it was important to scale the variables
. . .
- What would happen if we scale before doing cross-validation? Will we get different answers?
What if we wanted to do some preprocessing
Auto_scaled <- Auto |>
mutate(horsepower = scale(horsepower))
sd(Auto_scaled$horsepower)
[1] 1
Auto_cv_scaled <- vfold_cv(Auto_scaled, v = 5)
# Will not actually use:
map_dbl(Auto_cv_scaled$splits,
function(x) {
dat <- as.data.frame(x)$horsepower
sd(dat)
})
[1] 0.9698961 1.0060047 0.9967524 1.0339029 0.9922419
What if we wanted to do some preprocessing
recipe()!
- Using the recipe() function along with step_*() functions, we can specify preprocessing steps and R will automagically apply them to each fold appropriately.
. . .
rec <- recipe(mpg ~ horsepower, data = Auto) |>
step_scale(horsepower)
- You can find all of the potential preprocessing steps here: https://tidymodels.github.io/recipes/reference/index.html
Where do we plug in this recipe?
- The recipe gets plugged into the fit_resamples() function
. . .
Auto_cv <- vfold_cv(Auto, v = 5)
rec <- recipe(mpg ~ horsepower, data = Auto) |>
step_scale(horsepower)
results <- fit_resamples(lm_spec,
preprocessor = rec,
resamples = Auto_cv)
results |>
collect_metrics()
# A tibble: 2 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 rmse standard 4.92 5 0.0744 Preprocessor1_Model1
2 rsq standard 0.611 5 0.0158 Preprocessor1_Model1
What if we want to predict mpg with more variables
- Now we still want to add a step to scale predictors
- We could either write out all predictors individually to scale them
- OR we could use the all_predictors() shorthand.
. . .
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
step_scale(all_predictors())
Putting it together
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
step_scale(all_predictors())
results <- fit_resamples(lm_spec,
preprocessor = rec,
resamples = Auto_cv)
results |>
collect_metrics()
# A tibble: 2 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 rmse standard 4.23 5 0.157 Preprocessor1_Model1
2 rsq standard 0.704 5 0.0253 Preprocessor1_Model1
Application Exercise
- Examine the Hitters dataset by running ?Hitters in the Console
- We want to predict a major league player's Salary from all of the other 19 variables in this dataset. Create a visualization of Salary.
- Create a recipe to estimate this model.
- Add a preprocessing step to your recipe, scaling each of the predictors (one possible sketch follows)
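If you get stuck, here is one possible sketch (object names like hitters_rec are hypothetical; all_numeric_predictors() is used because Hitters also has factor columns, which the upcoming slides handle):

ggplot(Hitters, aes(x = Salary)) +
  geom_histogram() # Salary is right-skewed and has some missing values

hitters_rec <- recipe(Salary ~ ., data = Hitters) |>
  step_scale(all_numeric_predictors()) # scale only the numeric predictors for now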
What if we have categorical variables?
- We can turn the categorical variables into indicator (“dummy”) variables in the recipe
. . .
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
step_dummy(all_nominal()) |>
step_scale(all_predictors())
What if we have missing data?
- We can remove any rows with missing data
. . .
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
step_dummy(all_nominal()) |>
step_naomit(everything()) |>
step_scale(all_predictors())
What if we have missing data?
rec <- recipe(mpg ~ horsepower + displacement + weight, data = Auto) |>
step_dummy(all_nominal()) |>
step_naomit(all_outcomes()) |>
step_impute_mean(all_predictors()) |>
step_scale(all_predictors())
Application Exercise
- Add a preprocessing step to your recipe to convert nominal variables into indicators
- Add a step to your recipe to remove missing values for the outcome
- Add a step to your recipe to impute missing values for the predictors using the average of the remaining values. NOTE: THIS IS NOT THE BEST WAY TO DO THIS! (One possible sketch follows.)
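One possible sketch, extending the hypothetical hitters_rec from the earlier sketch and mirroring the Auto recipe above:

hitters_rec <- recipe(Salary ~ ., data = Hitters) |>
  step_dummy(all_nominal()) |>          # indicators for League, Division, NewLeague
  step_naomit(all_outcomes()) |>        # drop rows where Salary is missing
  step_impute_mean(all_predictors()) |> # mean-impute predictors (again: not the best way!)
  step_scale(all_predictors())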
Ridge, Lasso, and Elastic net
When specifying your model, you can indicate whether you would like to use ridge, lasso, or elastic net. We can write a general equation to minimize:
\[RSS + \lambda\left((1-\alpha)\sum_{j=1}^p\beta_j^2+\alpha\sum_{j=1}^p|\beta_j|\right)\]
lm_spec <- linear_reg() |>
set_engine("glmnet") - First specify the engine. We’ll use
glmnet - The
linear_reg()function has two additional parameters,penaltyandmixture penaltyis \(\lambda\) from our equation.mixtureis a number between 0 and 1 representing \(\alpha\)
Ridge, Lasso, and Elastic net
\[RSS + \lambda\left((1-\alpha)\sum_{j=1}^p\beta_j^2+\alpha\sum_{j=1}^p|\beta_j|\right)\]
What would we set mixture to in order to perform Ridge regression?
. . .
ridge_spec <- linear_reg(penalty = 100, mixture = 0) |>
set_engine("glmnet") Application Exercise
- Set a seed: set.seed(1)
- Create a cross validation object for the Hitters dataset
- Using the recipe from the previous exercise, fit the model using Ridge regression with a penalty \(\lambda\) = 300
- What is the estimate of the test RMSE for this model? (One possible sketch follows.)
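One possible sketch, assuming the hypothetical hitters_rec recipe from earlier (the number of folds is a choice; 5 is used here):

set.seed(1)
hitters_cv <- vfold_cv(Hitters, v = 5)

ridge_spec <- linear_reg(penalty = 300, mixture = 0) |>
  set_engine("glmnet")

fit_resamples(ridge_spec,
              preprocessor = hitters_rec,
              resamples = hitters_cv) |>
  collect_metrics() # the rmse row is the cross-validated estimate of test RMSE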
Ridge, Lasso, and Elastic net
\[RSS + \lambda\left((1-\alpha)\sum_{j=1}^p\beta_j^2+\alpha\sum_{j=1}^p|\beta_j|\right)\]
ridge_spec <- linear_reg(penalty = 100, mixture = 0) |>
set_engine("glmnet") . . .
lasso_spec <- linear_reg(penalty = 5, mixture = 1) |>
set_engine("glmnet") . . .
enet_spec <- linear_reg(penalty = 60, mixture = 0.7) |>
set_engine("glmnet") Okay, but we wanted to look at 3 different models!
ridge_spec <- linear_reg(penalty = 100, mixture = 0) |>
set_engine("glmnet")
results <- fit_resamples(ridge_spec,
preprocessor = rec,
resamples = Auto_cv)
. . .
lasso_spec <- linear_reg(penalty = 5, mixture = 1) |>
set_engine("glmnet")
results <- fit_resamples(lasso_spec,
preprocessor = rec,
resamples = Auto_cv)
. . .
elastic_spec <- linear_reg(penalty = 40, mixture = 0.1) |>
set_engine("glmnet")
results <- fit_resamples(elastic_spec,
preprocessor = rec,
resamples = Auto_cv)
- 😱 this looks like copy + pasting!
tune 🎶
penalty_spec <-
linear_reg(penalty = tune(), mixture = tune()) |>
set_engine("glmnet") - Notice the code above has
tune()for the the penalty and the mixture. Those are the things we want to vary!
tune 🎶
- Now we need to create a grid of potential penalties ( \(\lambda\) ) and mixtures ( \(\alpha\) ) that we want to test
- Instead of fit_resamples() we are going to use tune_grid()
. . .
grid <- expand_grid(penalty = seq(0, 100, by = 10),
mixture = seq(0, 1, by = 0.25))
results <- tune_grid(penalty_spec,
preprocessor = rec,
grid = grid,
resamples = Auto_cv)
tune autoplot
autoplot(results) + ## ggplot function
theme_classic()
Warning in ggplot2::scale_x_continuous(trans = trans): log-10 transformation introduced infinite values.
tune 🎶
results |>
collect_metrics()
# A tibble: 110 × 8
penalty mixture .metric .estimator mean n std_err .config
<dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0 0 rmse standard 4.25 5 0.139 Preprocessor1_Model01
2 0 0 rsq standard 0.702 5 0.0231 Preprocessor1_Model01
3 10 0 rmse standard 4.75 5 0.119 Preprocessor1_Model02
4 10 0 rsq standard 0.697 5 0.0211 Preprocessor1_Model02
5 20 0 rmse standard 5.30 5 0.0981 Preprocessor1_Model03
6 20 0 rsq standard 0.697 5 0.0210 Preprocessor1_Model03
7 30 0 rmse standard 5.71 5 0.0850 Preprocessor1_Model04
8 30 0 rsq standard 0.697 5 0.0209 Preprocessor1_Model04
9 40 0 rmse standard 6.02 5 0.0775 Preprocessor1_Model05
10 40 0 rsq standard 0.697 5 0.0209 Preprocessor1_Model05
# ℹ 100 more rows
Subset results
results |>
collect_metrics() |>
filter(.metric == "rmse") |>
arrange(mean)
# A tibble: 55 × 8
penalty mixture .metric .estimator mean n std_err .config
<dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0 1 rmse standard 4.23 5 0.157 Preprocessor1_Model45
2 0 0.75 rmse standard 4.23 5 0.156 Preprocessor1_Model34
3 0 0.25 rmse standard 4.23 5 0.155 Preprocessor1_Model12
4 0 0.5 rmse standard 4.23 5 0.156 Preprocessor1_Model23
5 0 0 rmse standard 4.25 5 0.139 Preprocessor1_Model01
6 10 0 rmse standard 4.75 5 0.119 Preprocessor1_Model02
7 20 0 rmse standard 5.30 5 0.0981 Preprocessor1_Model03
8 10 0.25 rmse standard 5.60 5 0.0861 Preprocessor1_Model13
9 30 0 rmse standard 5.71 5 0.0850 Preprocessor1_Model04
10 40 0 rmse standard 6.02 5 0.0775 Preprocessor1_Model05
# ℹ 45 more rows
- Since this is a data frame, we can do things like filter and arrange!
Which would you choose?
results |>
collect_metrics() |>
filter(.metric == "rmse") |>
ggplot(aes(penalty, mean, color = factor(mixture), group = factor(mixture))) +
geom_line() +
geom_point() +
labs(y = "RMSE") Application Exercise
- Using the Hitters cross validation object and recipe created in the previous exercise, use tune_grid to pick the optimal penalty and mixture values.
- Update the code below to create a grid that includes penalties from 0 to 50 by 1 and mixtures from 0 to 1 by 0.5.
- Use this grid in the tune_grid function. Then use collect_metrics and filter to only include the RMSE estimates.
- Create a figure to examine the estimated test RMSE for the grid of penalty and mixture values – which should you choose?
grid <- expand_grid(penalty = seq(0, ----),
mixture = seq(0, 1, by = ----))
Putting it all together
- Often we can use a combination of all of these tools together
- First split our data
- Do cross validation on just the training data to tune the parameters
- Use last_fit() with the selected parameters, specifying the split data so that it is evaluated on the left-out test sample
Putting it all together
auto_split <- initial_split(Auto, prop = 0.5)
auto_train <- training(auto_split)
auto_cv <- vfold_cv(auto_train, v = 5)
rec <- recipe(mpg ~ horsepower + displacement + weight, data = auto_train) |>
step_scale(all_predictors())
tuning <- tune_grid(penalty_spec,
rec,
grid = grid,
resamples = auto_cv)
tuning |>
collect_metrics() |>
filter(.metric == "rmse") |>
arrange(mean)
# A tibble: 55 × 8
penalty mixture .metric .estimator mean n std_err .config
<dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0 1 rmse standard 3.49 5 0.214 Preprocessor1_Model45
2 0 0.75 rmse standard 3.50 5 0.207 Preprocessor1_Model34
3 0 0.5 rmse standard 3.51 5 0.206 Preprocessor1_Model23
4 0 0.25 rmse standard 3.51 5 0.206 Preprocessor1_Model12
5 0 0 rmse standard 3.57 5 0.220 Preprocessor1_Model01
6 10 0 rmse standard 4.22 5 0.229 Preprocessor1_Model02
7 20 0 rmse standard 4.80 5 0.227 Preprocessor1_Model03
8 10 0.25 rmse standard 5.13 5 0.228 Preprocessor1_Model13
9 30 0 rmse standard 5.21 5 0.227 Preprocessor1_Model04
10 40 0 rmse standard 5.50 5 0.227 Preprocessor1_Model05
# ℹ 45 more rows
Putting it all together
final_spec <- linear_reg(penalty = 0, mixture = 0) |>
set_engine("glmnet")
fit <- last_fit(final_spec,
rec,
split = auto_split)
fit |>
collect_metrics()
# A tibble: 2 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 rmse standard 4.94 Preprocessor1_Model1
2 rsq standard 0.681 Preprocessor1_Model1
Extracting coefficients
- We can use workflow() to combine the recipe and the model specification to pass to a fit object.
. . .
training_data <- training(auto_split)
workflow() |>
add_recipe(rec) |>
add_model(final_spec) |>
fit(data = training_data) |>
tidy()
# A tibble: 4 × 3
term estimate penalty
<chr> <dbl> <dbl>
1 (Intercept) 41.4 0
2 horsepower -0.814 0
3 displacement -1.42 0
4 weight -3.83 0
Application Exercise
- Using the final model specification, extract the coefficients from the model by creating a workflow
- Filter out any coefficients exactly equal to 0 (one possible sketch follows)
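One possible sketch, assuming a final_spec built from whatever penalty and mixture your tuning selected (the values below are placeholders) and the hypothetical hitters_rec recipe from earlier:

final_spec <- linear_reg(penalty = 1, mixture = 1) |> # placeholder tuned values
  set_engine("glmnet")

workflow() |>
  add_recipe(hitters_rec) |>
  add_model(final_spec) |>
  fit(data = Hitters) |>
  tidy() |>
  filter(estimate != 0) # keep only coefficients the lasso did not zero out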