STA 362 Spring 2024
Take-Home Midterm Exam
Due: Saturday, April 27th, at 12pm
Rules
Your solutions must be written up in a Quarto (qmd) file and then rendered into an html file called exam-01.html. This file must include your code, output, and write-up for each question. When a result is a large table (over 50 rows), please use the head function to show only the first rows.
This exam is open book, open internet, closed to other people. You may use any online or book-based resource you would like, but you must include citations for any code that you use (directly or indirectly). You may not consult with anyone other than the Professor about this exam. You may not ask direct questions on the internet or consult with each other, not even about hypothetical questions.
You will be required to upload the HTML file from your output. Technical difficulties are not an excuse for late work - do not wait until the last minute. Verify your html file includes all graphs and tables before uploading to Moodle. Use the embedded resources option in the YAML.
Your analysis, outputs, and narratives should answer the questions, not your code.
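For the embedded resources option mentioned above, a minimal sketch of a Quarto YAML header is shown here. The title is a placeholder, and the sketch assumes the source file is named exam-01.qmd so that it renders to exam-01.html; adjust both to your own setup.

```yaml
---
title: "Exam 01"            # placeholder title (assumption)
format:
  html:
    embed-resources: true   # bundle plots, CSS, and JavaScript into the single html file
---
```

With this option set, the rendered html file should display its graphs and tables on its own, without a separate supporting-files folder.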
Submission
When you are finished with your exam, be sure to Render the final document. Once rendered, you can download your file by:
- Find the .html file in your Files pane (on the bottom right of the screen)
- Click the check box next to the file
- Click the blue gear above and then click “Export” to download
- Submit your final html document to the exam spot on Moodle
Data
The data is found in the class's shared folder on the RStudio server and is called “groundhog.csv”.
The data consists of a subset of variables described at: https://github.com/rfordatascience/tidytuesday/blob/master/data/2024/2024-01-30/readme.md
One new variable was added, called was_spring, which indicates whether spring did follow Groundhog Day (February 2nd).
IMPORTANT: Make sure to use the data file mentioned above.
Overall Goal: Predict if it will be spring, or not
Read the data into your environment. Split the data. Briefly discuss your approach.
Perform EDA on all variables. Discuss each variable’s potential value in predicting if it will be spring or not using your evidence.
Using any, or all, of the variables in the data, fit the following models to predict if it will be spring or not.
For each model make sure to:
- Check conditions
- Only use variables that the model is designed for
- Provide ROC AUC, ROC curve plot, and accuracy on a test set.
Logistic Regression Model
LDA
QDA
Naive Bayes
K Nearest Neighbors
Choose the best model and justify the choice.
Is the groundhog's prediction relevant in predicting if it will be spring or not? Justify your answer using your analysis above.