Summary

test for significance of regressor, overall model

fitted regression equation

check assumptions using residual plots

identify outliers and influential points

interpret coefficients, R-squared

compare fit of models for same response using adjusted R-squared

Regression

A regression of the response variable $Y$ on the regressor $X$ is a mathematical relationship between the mean of $Y$ and different values of $X$.

Linear regression

The regression is linear, and of the form $E(Y \mid X = x) = \beta_0 + \beta_1 x$.

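Linear refers to linearity in the parameters ($\beta_0, \beta_1, \dots$), not necessarily in the regressor. For example:

$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$ is still linear
$E(Y) = \beta_0 e^{\beta_1 x}$ is not linear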

Simple refers to the use of only one regressor.

If there are multiple regressors, it is called a multiple linear regression.

Simple Linear Regression

Assumptions

For $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$:

Data obtained by randomisation
Linearity of the relationship
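Error terms are normally distributed: $\varepsilon_i \sim N(0, \sigma^2)$
Constant variance: $\operatorname{Var}(\varepsilon_i) = \sigma^2$ for every $i$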

Note: the normality and constant-variance assumptions can only be checked after the model is fitted, since they are checked using its residuals.

Estimation

Ordinary Least Squares

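consider all possible candidate lines $y = b_0 + b_1 x$
for each line, compute the sum of squared residuals $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
the line minimising this quantity is the line of best fit, with $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, where $S_{xy} = \sum (x_i - \bar x)(y_i - \bar y)$ and $S_{xx} = \sum (x_i - \bar x)^2$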
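A minimal sketch of the OLS steps above, assuming R's built-in `cars` data (stopping distance `dist` against `speed`) as the example; `sse` is a hypothetical helper introduced only for illustration. It computes the closed-form estimates by hand, checks them against `lm()`, and confirms that moving away from the OLS line increases the sum of squared residuals.

```r
# Sketch: OLS for simple linear regression computed by hand and checked
# against lm(). Uses R's built-in `cars` data (dist ~ speed) as an example.
x <- cars$speed
y <- cars$dist

# Closed-form OLS estimates (minimise the sum of squared residuals)
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)

# Hypothetical helper: sum of squared residuals for any candidate line a + b * x
sse <- function(a, b) sum((y - a - b * x)^2)

M1 <- lm(dist ~ speed, data = cars)
print(c(b0 = b0, b1 = b1))   # by hand
print(coef(M1))              # from lm()

# Moving away from the OLS line can only increase the sum of squared residuals
print(sse(b0, b1) <= c(sse(b0 + 1, b1), sse(b0, b1 + 0.1), sse(b0 - 2, b1 + 0.05)))
```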

Understanding R Output

Interpreting this:

However, confidence intervals for the coefficients ($\beta_0$, $\beta_1$) can also be determined, which may be preferable to point estimates in some instances:

confint(M1, level = 0.95)  # M1 is a fitted lm() model; level is the confidence level
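A minimal sketch of producing the output discussed above, assuming the built-in `cars` data as the example. `summary()` prints the coefficient table (Estimate, Std. Error, t value, Pr(>|t|)), the residual standard error, R-squared, adjusted R-squared and the F-statistic; the fitted equation $\hat y = \hat\beta_0 + \hat\beta_1 x$ is read off the Estimate column.

```r
# Sketch: fitting an SLR model and reading R's output, using the built-in
# `cars` data (stopping distance `dist` against `speed`) as an example.
M1 <- lm(dist ~ speed, data = cars)

# Coefficient table, residual standard error, R-squared, adjusted R-squared,
# and the overall F-statistic
summary(M1)

# Confidence intervals for beta_0 and beta_1 at two confidence levels
confint(M1, level = 0.95)
confint(M1, level = 0.99)
```

For `cars`, the estimated slope is about 3.93: each extra mph of speed is associated with an increase of roughly 3.93 ft in mean stopping distance. The 99% intervals are wider than the 95% ones.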

Hypothesis Testing

t-test: tests the significance of a single regressor
F-test: tests the significance of the whole model

Note: in a Simple Linear Regression model, the t-test for the slope is equivalent to the F-test, since $F = t^2$ and both give the same p-value.

t-test

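Assumptions remain the same
State null hypothesis (and alternative)
- $H_0: \beta_1 = 0$, or: regressor not significant (vs $H_1: \beta_1 \neq 0$)
Find the test statistic $T = \hat\beta_1 / \mathrm{SE}(\hat\beta_1)$, which follows $t_{n-2}$ under $H_0$
Derive the p-value
Conclude whether the slope is significantly different from 0 at a prespecified significance level $\alpha$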
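A minimal sketch of the t-test steps above, assuming the built-in `cars` data and $\alpha = 0.05$ (both chosen for illustration). The test statistic and two-sided p-value are computed by hand, so they can be compared with the `t value` and `Pr(>|t|)` columns of `summary()`.

```r
# Sketch: t-test of H0: beta_1 = 0 vs H1: beta_1 != 0, computed by hand.
M1 <- lm(dist ~ speed, data = cars)
x <- cars$speed
n <- nrow(cars)

b1 <- coef(M1)[["speed"]]
s <- sqrt(sum(resid(M1)^2) / (n - 2))         # residual standard error
se_b1 <- s / sqrt(sum((x - mean(x))^2))       # standard error of the slope

t_stat <- (b1 - 0) / se_b1                    # test statistic under H0
p_value <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value from t_{n-2}

alpha <- 0.05                                  # prespecified significance level
print(c(t = t_stat, p = p_value))
if (p_value < alpha) {
  cat("Reject H0: the slope is significantly different from 0\n")
} else {
  cat("Do not reject H0: the regressor is not significant\n")
}
```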

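F-test

Assumptions remain the same
State null hypothesis (and alternative)
- $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$, or: none of the regressors is significant (vs $H_1$: at least one $\beta_j \neq 0$)
Find the test statistic $F = \dfrac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)}$, which follows $F_{k,\,n-k-1}$ under $H_0$ (SSR: regression sum of squares, SSE: residual sum of squares, $k$: number of regressors)
Derive the p-value
Conclude whether the model is significant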

If the F-test does not reject $H_0$, this suggests the model has no significant regressors, and that a simpler model with no regressors, $E(Y) = \beta_0$, may be adequate.

This is called the intercept model.
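A minimal sketch of the F-test and the intercept model, assuming the built-in `cars` data; `M0` is a name introduced here for the intercept model. The F statistic is computed by hand, `anova(M0, M1)` reports the same test as a comparison of the two nested models, and the last line illustrates that $F = t^2$ in simple linear regression.

```r
# Sketch: overall F-test by hand, and the same test as a comparison with
# the intercept model E(Y) = beta_0. Uses the built-in `cars` data.
M1 <- lm(dist ~ speed, data = cars)
M0 <- lm(dist ~ 1, data = cars)      # intercept model
n <- nrow(cars)
k <- 1                               # number of regressors

y <- cars$dist
SST <- sum((y - mean(y))^2)          # total sum of squares
SSE <- sum(resid(M1)^2)              # residual sum of squares
SSR <- SST - SSE                     # regression sum of squares

F_stat <- (SSR / k) / (SSE / (n - k - 1))
p_value <- pf(F_stat, k, n - k - 1, lower.tail = FALSE)
print(c(F = F_stat, p = p_value))

# anova() compares the nested models and reports the same F and p-value
anova(M0, M1)

# In SLR the F statistic equals the square of the slope's t statistic
t_stat <- summary(M1)$coefficients["speed", "t value"]
print(c(F = F_stat, t_squared = t_stat^2))
```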

Diagnostics

Randomisation - determined during data collection
Linearity - checked using a scatterplot of the response against the regressor, and the residual plot
Normality - checked using the residuals of the fitted model
Constant variance - checked using the residuals of the fitted model

Scatterplot

If linearity is violated:
- add higher-order terms (e.g. $x^2$)
If the variance is not constant:
- transform the response (e.g. $\log Y$); this changes the interpretation of the coefficients
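A minimal sketch of both remedies, assuming the built-in `cars` data purely to show the syntax (it does not claim that `cars` violates either assumption), with a log transform as one example of a response transformation. `I()` is needed so that `^` is treated as arithmetic inside a formula.

```r
# Sketch of the two remedies, shown on the built-in `cars` data.

# Linearity violated: add a higher-order (quadratic) term.
# The model is still linear in the parameters.
M_quad <- lm(dist ~ speed + I(speed^2), data = cars)

# Non-constant variance: transform the response (log used as one example).
# Coefficients are now on the log scale, so their interpretation changes.
M_log <- lm(log(dist) ~ speed, data = cars)

summary(M_quad)$coefficients
summary(M_log)$coefficients
```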

Residuals

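Using the standardised residuals $r_i$ (residuals scaled by their estimated standard deviation):

Plotting the residuals:

Plot $r_i$ against the fitted values $\hat y_i$ or the regressor $x_i$:
- Expected: points scatter randomly about 0, mostly within $[-3, 3]$.
- If there is a funnel shape, the constant-variance assumption is violated.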
Plotting against
- Expected: linearity
- If not linear, linearity is violated.
QQ-plot of $r_i$
- Expected: points lie close to a straight line (residuals approximately normal)
- If not, the normality assumption is violated.
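A minimal sketch of the residual plots, assuming the built-in `cars` data. Note that R's `rstandard()` scales each residual by $s\sqrt{1 - h_{ii}}$ (using the leverage $h_{ii}$), which may differ slightly from a course definition that divides by $s$ alone.

```r
# Sketch: residual diagnostics for the SLR of dist on speed (built-in `cars`).
M1 <- lm(dist ~ speed, data = cars)
r <- rstandard(M1)            # standardised residuals
fitted_vals <- fitted(M1)

par(mfrow = c(1, 2))
# Residuals vs fitted: look for random scatter about 0 and no funnel shape
plot(fitted_vals, r, xlab = "Fitted values", ylab = "Standardised residuals")
abline(h = 0)
abline(h = c(-3, 3), lty = 2)
# QQ-plot: points close to the line suggest approximately normal errors
qqnorm(r)
qqline(r)
```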

Outliers, Influential Points

Outlier

A point with standardised residual $r_i > 3$ or $r_i < -3$

Influential points

A point that affects the parameter estimates greatly. (An outlier may or may not be influential.)

Measured using Cook's distance $D_i$, which measures the effect on the fitted model of deleting observation $i$ (a common threshold is $D_i > 1$).
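A minimal sketch of flagging outliers and influential points, assuming the built-in `cars` data and the thresholds $|r_i| > 3$ and $D_i > 1$ (both assumptions; use whichever thresholds your course specifies). The last part refits without the most influential point to show what Cook's distance summarises.

```r
# Sketch: outliers and influential points for the SLR of dist on speed.
M1 <- lm(dist ~ speed, data = cars)
r <- rstandard(M1)               # standardised residuals
D <- cooks.distance(M1)          # Cook's distance for each observation

outliers <- which(abs(r) > 3)    # assumed outlier threshold
influential <- which(D > 1)      # assumed influence threshold
print(outliers)
print(influential)

# Cook's distance reflects how much the fit changes when a point is deleted:
# refit without the observation that has the largest D_i and compare
i <- which.max(D)
M_drop <- lm(dist ~ speed, data = cars[-i, ])
rbind(all_points = coef(M1), without_i = coef(M_drop))
```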

$R^2$ Statistic

R-squared statistic

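The proportion of total variation of the response (about the sample mean $\bar y$) explained by the model: $R^2 = \mathrm{SSR}/\mathrm{SST} = 1 - \mathrm{SSE}/\mathrm{SST}$, where $\mathrm{SST} = \sum (y_i - \bar y)^2$.

In a simple linear regression model, $R^2 = r^2$, where $r$ is the sample correlation between $x$ and $y$.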
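A minimal sketch checking the $R^2$ identities on the built-in `cars` data (an assumed example): $1 - \mathrm{SSE}/\mathrm{SST}$ computed by hand, the value reported by `summary()`, and the squared sample correlation should all agree in simple linear regression.

```r
# Sketch: R-squared by hand vs summary() vs squared correlation (SLR only).
M1 <- lm(dist ~ speed, data = cars)
y <- cars$dist
SST <- sum((y - mean(y))^2)      # total variation about the sample mean
SSE <- sum(resid(M1)^2)          # variation not explained by the model
R2 <- 1 - SSE / SST

print(c(by_hand = R2,
        from_summary = summary(M1)$r.squared,
        correlation_squared = cor(cars$speed, y)^2))
```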

Multiple Linear Regression

MLR	SLR
Regression function is linear	Same
Check assumptions using residuals	Same
t-tests for individual coefficients	Same
F-test for overall regression	Same
Test significance of a categorical variable which has more than 2 categories	Different
Using adjusted R-squared to compare models	Use non-adjusted R-squared to compare models.
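A minimal sketch of the two MLR-specific rows, assuming the built-in `mtcars` data, where `cyl` has three categories (4, 6, 8). Because `factor(cyl)` enters as two indicator variables, its significance is tested jointly with a partial F-test via `anova()`. `adj_r2` is a hypothetical helper reproducing $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}$.

```r
# Sketch: partial F-test for a 3-category variable, and adjusted R-squared.
M_small <- lm(mpg ~ wt, data = mtcars)
M_big <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# H0: both cyl indicator coefficients are 0
anova(M_small, M_big)

# Hypothetical helper: adjusted R-squared, penalising extra regressors
adj_r2 <- function(model) {
  n <- length(resid(model))
  k <- length(coef(model)) - 1            # number of regressors
  1 - (1 - summary(model)$r.squared) * (n - 1) / (n - k - 1)
}
print(c(small = adj_r2(M_small), big = adj_r2(M_big)))
```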

Indicator Variables, Interaction Terms

Indicator variable

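Takes the value 1 if the observation falls in the given category, and 0 otherwise. A categorical variable with $c$ categories is represented by $c - 1$ indicator variables; the remaining category is the baseline.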
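A minimal sketch of how R builds indicator variables from a factor, assuming the built-in `mtcars` data (`am`: 0 = automatic, 1 = manual; `cyl`: 4, 6 or 8 cylinders). `model.matrix()` shows the columns `lm()` would use; the first level of each factor is the baseline.

```r
# Sketch: indicator (dummy) variables created from factors.
am_factor <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("automatic", "manual"))
cyl_factor <- factor(mtcars$cyl)

# 2 categories -> 1 indicator (am_factormanual); baseline is "automatic"
head(model.matrix(~ am_factor))

# 3 categories -> 2 indicators (cyl_factor6, cyl_factor8); baseline is 4
head(model.matrix(~ cyl_factor))
```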

Interaction terms

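Two regressors interact if the effect of one on the response depends on the value of the other. The interaction is modelled by adding their product as an extra term, e.g.

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

Note that there is a new coefficient here, $\beta_3$, the coefficient of the interaction term. When dropping insignificant terms: if the interaction term is significant and is kept, all the main terms of the interaction ($x_1$ and $x_2$) must also be kept, even if they are not significant themselves.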
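A minimal sketch of an interaction between a continuous regressor and an indicator, assuming the built-in `mtcars` data (`wt`: weight, `am`: 1 = manual). In R's formula syntax `wt * am` expands to `wt + am + wt:am`, so the main terms are kept automatically; the interaction lets the slope of `wt` differ between transmission types.

```r
# Sketch: interaction term between weight and transmission type.
M_int <- lm(mpg ~ wt * am, data = mtcars)   # wt * am = wt + am + wt:am
b <- coef(M_int)
print(summary(M_int)$coefficients)

# Estimated slope of wt for each transmission type
print(c(automatic = b[["wt"]], manual = b[["wt"]] + b[["wt:am"]]))

# Main-terms model without the interaction, compared by a partial F-test
M_main <- lm(mpg ~ wt + am, data = mtcars)
anova(M_main, M_int)
```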
