Lab 02

Logistic Regression

Author

Dr. Lucy D’Agostino McGowan modfied by Dr. Tyler George

Getting started

Go to our RStudio and create a new R project inside your class folder.

YAML:

Create a .qmd file for your lab, make sure the author is your name, and Render the document.

Packages

In this lab we will work with four packages: ISLR which is a package that accompanies your textbook, tidyverse which is a collection of packages for doing data analysis in a “tidy” way, tidymodels a collection of packages for statistical modeling, and GGally, a package to help us visualize the data.

library(tidyverse) 
library(tidymodels)
library(ISLR)
library(GGally)

Data

For this lab, we are using the Smarket data from the ISLR package.

Exercises

Conceptual questions

  1. Suppose that an individual has a 23% chance of defaulting on their credit card payment. What are the odds that they will default?

Logistic Regression

  1. For this lab we are using the Smarket data. Examine this data set - how many observations are there? How many columns? What are the variables?

  2. Let’s look at the correlation between all of the variables. Add the code below to your .qmd file. What can you learn from this visualization? Which pair of variables have the highest correlation?

ggpairs(Smarket, 
        lower = list(combo = wrap(ggally_facethist, binwidth = 0.5)), 
        progress = FALSE)
  1. Inference Fit a logistic regression model to predict Direction using Lag1, Lag2, Lag3, Lag4, Lag5, and Volume. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 6 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.

  2. Inference Exponentiate the results from Exercise 4. Interpret the odds ratio for the same predictor you selected in Exercise 4.

  3. Prediction Using 5-fold cross validation, fit the same logistic regression model as Exercise 4. What is the test Accuracy for this model? Interpret this result.

  4. Inference Fit a logistic regression model to predict Direction using only Lag1 and Lag2. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 2 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.

  5. Inference Exponentiate the results from Exercise 7. Interpret the odds ratio for the same predictor you selected in Exercise 7.

  6. Prediction Using 5-fold cross validation, fit the same logistic regression model as Exercise 7. What is the test Accuracy for this model? Interpret this result.

  7. If you had to choose between the model fit in Exercise 4 and the one fit in Exercise 7, which would you choose? Why?