Exercise 4: Linear Models
Exercise 1
This question should be answered using the Carseats
dataset from the ISLR
package.
You load these data as follows
(a) Fit a multiple regression model to predict Sales
using Price
,
Urban
, and US
.
(b) Provide an interpretation of each coefficient in the model. Be careful – some of the variables in the model are qualitative!
(c) Write out the model in equation form, being careful to handle the qualitative variables properly.
(d) For which of the predictors can you reject the null hypothesis $H_0 : \beta_j =0$?
(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
(f) How well do the models in (a) and (e) fit the data?
(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
Exercise 2.
In this problem, you will develop a model to predict whether a given car gets high or low gas mileage based on the Auto
dataset from the ISLR
package.
You load the data as follows
(a) Create a binary variable, mpg01
, that contains a 1 if mpg
contains a value above its median, and a 0 if mpg
contains a value below its median. You can compute the median using the median()
function. Note you may find it helpful to use the data.frame()
function to create a single data set containing both mpg01
and the other Auto
variables.
(b) Explore the data graphically in order to investigate the association between mpg01
and the other features. Which of the other features seem most likely to be useful in predicting mpg01
?
(c) Split the data into a training set and a test set.
(d) Perform logistic regression on the training data in order to predict mpg01
using the variables that seemed most associated with mpg01
in (b). What is the test error of the model obtained?