---
title: "Document title"
author: "Your Name"
format:
  html:
    embed-resources: true
---
Homework 5
Trees
Due 5/6/2024 at 9am
Instructions
Create a new project, give it a sensible name such as homework5, and save it in the course folder you created.
Create a new Quarto document and give it a sensible name such as hw5.
In the YAML, add the following (add what you don’t have): a title, an author, and embed-resources: true under the html format. The embed-resources component will make your final rendered html self-contained.
- Though the book uses base R code, I want you to complete the exercises using functions from tidymodels when possible.
- Set a seed before each problem.
- Make sure your answers print results. If the output is a large table, use the head() function to shorten the output.
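As a small illustration of the last two points, here is a minimal sketch. The seed value and the use of the built-in mtcars data are placeholders chosen for this example, not part of the assignment.

```r
# Set a seed so that the random shuffle below is reproducible.
# The value 2024 is arbitrary; any fixed integer works.
set.seed(2024)

# mtcars stands in for any large table produced by a problem.
shuffled <- mtcars[sample(nrow(mtcars)), ]

# head() prints only the first six rows instead of the whole table.
head(shuffled)
```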
Exercises
- This problem involves the OJ data set, which is part of the ISLR2 package.
Create a training set containing 80% of the observations.
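One way to make the 80% split with tidymodels is sketched below. It assumes the tidymodels and ISLR2 packages are installed; the seed value and the object names (oj_split, oj_train, oj_test) are placeholders chosen for this example.

```r
library(tidymodels)
library(ISLR2)

set.seed(1)  # arbitrary seed, so the split is reproducible

# initial_split() from rsample holds out 20% of the rows for testing.
oj_split <- initial_split(OJ, prop = 0.8)
oj_train <- training(oj_split)
oj_test  <- testing(oj_split)

# Check the sizes of the two pieces.
nrow(oj_train)
nrow(oj_test)
```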
Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?
Type in the name of the tree object in order to get a detailed text output.
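A minimal sketch of fitting the tree with tidymodels follows. It assumes the rpart engine (the book uses the tree package instead, so the printed output will look different), and the seed and object names are placeholders. The training error rate is computed as one minus accuracy.

```r
library(tidymodels)
library(ISLR2)

set.seed(1)  # arbitrary seed
oj_split <- initial_split(OJ, prop = 0.8)
oj_train <- training(oj_split)

# Classification tree specification; rpart is an assumed engine choice.
tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification")

tree_fit <- fit(tree_spec, Purchase ~ ., data = oj_train)

# Pull out the underlying rpart object. Typing its name prints the
# detailed text output; summary(rpart_obj) gives a longer report.
rpart_obj <- extract_fit_engine(tree_fit)
rpart_obj

# Terminal nodes are the rows of the frame labelled "<leaf>".
n_leaves <- sum(rpart_obj$frame$var == "<leaf>")
n_leaves

# Training error rate = 1 - training accuracy.
train_preds <- augment(tree_fit, new_data = oj_train)
train_acc   <- accuracy(train_preds, truth = Purchase, estimate = .pred_class)
train_error <- 1 - train_acc$.estimate
train_error
```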
Create a plot of the tree, and interpret the results. Pick one of the terminal nodes, and interpret the information displayed.
Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?
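The test-set comparison can be sketched as below, again assuming the rpart engine and placeholder object names. conf_mat() and accuracy() come from yardstick, which is loaded with tidymodels.

```r
library(tidymodels)
library(ISLR2)

set.seed(1)  # arbitrary seed
oj_split <- initial_split(OJ, prop = 0.8)
oj_train <- training(oj_split)
oj_test  <- testing(oj_split)

tree_fit <- decision_tree() %>%
  set_engine("rpart") %>%          # assumed engine choice
  set_mode("classification") %>%
  fit(Purchase ~ ., data = oj_train)

# augment() adds the predicted class (.pred_class) to the test data.
test_preds <- augment(tree_fit, new_data = oj_test)

# Confusion matrix: true labels against predicted labels.
conf_mat(test_preds, truth = Purchase, estimate = .pred_class)

# Test error rate = 1 - test accuracy.
test_acc   <- accuracy(test_preds, truth = Purchase, estimate = .pred_class)
test_error <- 1 - test_acc$.estimate
test_error
```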
Use cross validation on the training set in order to determine the optimal tree size.
Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.
Which tree size corresponds to the lowest cross-validated classification error rate?
Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.
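tidymodels does not tune tree size directly, so the sketch below swaps in a related approach: it tunes the cost_complexity pruning parameter with cross-validation and then counts the terminal nodes of the selected tree. The number of folds, the grid range, the seeds, and the object names are all assumptions made for this example, and the plot puts cost-complexity rather than tree size on the x-axis.

```r
library(tidymodels)
library(ISLR2)

set.seed(1)  # arbitrary seed
oj_split <- initial_split(OJ, prop = 0.8)
oj_train <- training(oj_split)

set.seed(2)  # arbitrary seed for the folds
oj_folds <- vfold_cv(oj_train, v = 5)

# Mark cost_complexity for tuning; larger values give smaller trees.
tune_spec <- decision_tree(cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

tune_wf <- workflow() %>%
  add_model(tune_spec) %>%
  add_formula(Purchase ~ .)

# Ten values between 10^-4 and 10^-1 (the range is on the log10 scale).
cp_grid <- grid_regular(cost_complexity(range = c(-4, -1)), levels = 10)

tune_res <- tune_grid(
  tune_wf,
  resamples = oj_folds,
  grid      = cp_grid,
  metrics   = metric_set(accuracy)
)

# Cross-validated classification error = 1 - mean accuracy.
cv_results <- collect_metrics(tune_res) %>%
  mutate(cv_error = 1 - mean)

cv_plot <- ggplot(cv_results, aes(x = cost_complexity, y = cv_error)) +
  geom_line() +
  geom_point() +
  scale_x_log10()
cv_plot

# Refit on the whole training set with the best value, then count leaves.
best_cp      <- select_best(tune_res, metric = "accuracy")
pruned_fit   <- finalize_workflow(tune_wf, best_cp) %>% fit(data = oj_train)
pruned_rpart <- extract_fit_engine(pruned_fit)
n_leaves_pruned <- sum(pruned_rpart$frame$var == "<leaf>")
n_leaves_pruned
```

To put tree size itself on the x-axis, one option is to refit the tree at each grid value and record its number of terminal nodes.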
Compare the training error rates between the pruned and unpruned trees. Which is higher?
Compare the test error rates between the pruned and unpruned trees. Which is higher?
Fit a bagged tree model using the training data (you do not need to use cross validation). You will need to change the mtry argument to the correct number. What is the test error rate?
Fit a random forest model using the training data (you do not need to use cross validation). You will need to change the mtry argument to the correct number. What is the test error rate?
Fit a tree model using cross validation on the training set to tune the mtry argument. With your best mtry value, refit the model on the whole training set. What is the test error rate?
Finally, create a table that includes the test error for all of the models included above. Give each a name that is clear and include any tuned values. Which model is best and why?
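The bagging and random forest fits, and the start of the results table, might look like the sketch below. Several things here are assumptions: the ranger engine (which must be installed), the seeds, the helper name test_error_rate, and the random forest mtry of roughly the square root of the number of predictors, which is a common default rather than a required value. Bagging is a random forest that considers every predictor at each split.

```r
library(tidymodels)
library(ISLR2)

set.seed(1)  # arbitrary seed
oj_split <- initial_split(OJ, prop = 0.8)
oj_train <- training(oj_split)
oj_test  <- testing(oj_split)

# Number of predictors: every column except the response, Purchase.
n_pred  <- ncol(oj_train) - 1
rf_mtry <- floor(sqrt(n_pred))  # common default, an assumption here

# Hypothetical helper: test error rate = 1 - test accuracy.
test_error_rate <- function(model_fit) {
  preds <- augment(model_fit, new_data = oj_test)
  1 - accuracy(preds, truth = Purchase, estimate = .pred_class)$.estimate
}

# Bagging: mtry equals the number of predictors.
bag_spec <- rand_forest(mtry = n_pred) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# Random forest: mtry is smaller than the number of predictors.
rf_spec <- rand_forest(mtry = rf_mtry) %>%
  set_engine("ranger") %>%
  set_mode("classification")

set.seed(3)
bag_fit <- fit(bag_spec, Purchase ~ ., data = oj_train)
set.seed(4)
rf_fit  <- fit(rf_spec, Purchase ~ ., data = oj_train)

# One row per model, with the tuned or chosen value in the name.
results <- tibble(
  model = c(
    paste0("Bagging (mtry = ", n_pred, ")"),
    paste0("Random forest (mtry = ", rf_mtry, ")")
  ),
  test_error = c(test_error_rate(bag_fit), test_error_rate(rf_fit))
)
results
```

The single trees and the cross-validated mtry model can be added to the same table as extra rows.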
Submission
When you are finished with your homework, be sure to Render the final document. Once rendered, you can download your file by:
- Find the .html file in your Files pane (on the bottom right of the screen)
- Click the check box next to the file
- Click the blue gear above and then click “Export” to download
- Submit your final html document to the respective assignment on Moodle