library(tidyverse)
library(tidymodels)
library(ISLR)
library(GGally)
Lab 02
Logistic Regression
Getting started
Go to our RStudio and create a new R project inside your class folder.
YAML:
Create a .qmd
file for your lab, make sure the author is your name, and Render the document.
Packages
In this lab we will work with four packages: ISLR
which is a package that accompanies your textbook, tidyverse
which is a collection of packages for doing data analysis in a “tidy” way, tidymodels
a collection of packages for statistical modeling, and GGally
, a package to help us visualize the data.
Data
For this lab, we are using the Smarket
data from the ISLR
package.
Exercises
Conceptual questions
- Suppose that an individual has a 23% chance of defaulting on their credit card payment. What are the odds that they will default?
Logistic Regression
For this lab we are using the
Smarket
data. Examine this data set - how many observations are there? How many columns? What are the variables?Let’s look at the correlation between all of the variables. Add the code below to your .qmd file. What can you learn from this visualization? Which pair of variables have the highest correlation?
ggpairs(Smarket,
lower = list(combo = wrap(ggally_facethist, binwidth = 0.5)),
progress = FALSE)
Inference Fit a logistic regression model to predict
Direction
usingLag1
,Lag2
,Lag3
,Lag4
,Lag5
, andVolume
. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 6 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.Inference Exponentiate the results from Exercise 4. Interpret the odds ratio for the same predictor you selected in Exercise 4.
Prediction Using 5-fold cross validation, fit the same logistic regression model as Exercise 4. What is the test Accuracy for this model? Interpret this result.
Inference Fit a logistic regression model to predict
Direction
using onlyLag1
andLag2
. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 2 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.Inference Exponentiate the results from Exercise 7. Interpret the odds ratio for the same predictor you selected in Exercise 7.
Prediction Using 5-fold cross validation, fit the same logistic regression model as Exercise 7. What is the test Accuracy for this model? Interpret this result.
If you had to choose between the model fit in Exercise 4 and the one fit in Exercise 7, which would you choose? Why?