Statistical inference

Statistical inference draws conclusions about a population from sample data. There are two main types of inference:

  • estimation
  • hypothesis testing

Hypothesis

A statement about a population, claiming that a parameter takes a particular numerical value or falls in a certain range of values.


General Procedure (5 Steps)

  1. Look at assumptions
  2. State the hypothesis
  3. Find the test statistic, and its null distribution
  4. Find the $p$-value and interpret it
  5. Make a conclusion

1. Assumptions

If the assumptions do not hold, the test does not have the properties it needs to be valid.

Generally, the most important assumption is randomisation but there might be other assumptions, such as sample size and distribution shape.

2. Stating Hypotheses

Null hypothesis H_0

A statement that the parameter takes a particular value, denoted $H_0$.

Alternative hypothesis H_1

A statement that the parameter falls in some alternative range of values, denoted $H_1$.

The alternative hypothesis determines the side of the test:

  • $H_1$: parameter not equal to the value under $H_0$
    • two-sided test
  • $H_1$: parameter larger than the value under $H_0$
    • right-sided test
  • $H_1$: parameter smaller than the value under $H_0$
    • left-sided test

3. Test Statistic and Null Distribution

Computing the value of the test statistic requires

  • the value of the point estimate, and its sampling distribution
  • the parameter value specified under $H_0$

The distribution of a test statistic under $H_0$ is called the null distribution.

4. p-value

The test begins with the assumption that $H_0$ is true.

If the test statistic calculated in step 3 is far out in the tail of the null distribution, it is too far from what $H_0$ predicts. The $p$-value measures this: it is the probability, computed under $H_0$, of a test statistic at least as extreme as the one observed. The $p$-value can then be compared with a chosen threshold to decide whether $H_0$ can be rejected.
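As a rough illustration of how the side of the test changes the $p$-value, the sketch below assumes the null distribution is standard normal. The function name z_p_value and the observed value 1.96 are hypothetical choices for this example, not part of the notes.

```python
from statistics import NormalDist


def z_p_value(z, side="two-sided"):
    """p-value of an observed z statistic, assuming a standard normal null distribution."""
    cdf = NormalDist().cdf  # standard normal CDF
    if side == "left":
        return cdf(z)  # probability of a value at most z
    if side == "right":
        return 1 - cdf(z)  # probability of a value at least z
    if side == "two-sided":
        return 2 * cdf(-abs(z))  # both tails beyond |z|
    raise ValueError("side must be 'left', 'right' or 'two-sided'")


# Hypothetical observed statistic, used only for illustration.
for side in ("left", "right", "two-sided"):
    print(side, round(z_p_value(1.96, side), 4))
```

The same statistic gives a large left-tail $p$-value and a small right-tail one, which is why the alternative hypothesis has to be fixed before looking at the data.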

5. Conclude

If a significance level $\alpha$ is specified, a decision on $H_0$ can be made:

  • if $p$-value $< \alpha$, reject $H_0$
  • otherwise, do NOT reject $H_0$

Errors

Type I

When $H_0$ is rejected but it is true.

The probability of this error is denoted $\alpha$.

Type II

When $H_0$ is not rejected but it is false.

The probability of this error is denoted $\beta$.

Power

The power of the test is defined to be $1 - \beta$, where $\beta$ is the probability of a Type II error. It is the probability of correctly rejecting $H_0$ when it is false.

The two errors cannot be reduced simultaneously for a fixed sample size.

  • If $\alpha$ is smaller, $H_0$ is rejected less often.
  • When $H_0$ is retained more often, the probability of retaining it when it is false increases.
  • The probability of a Type II error increases (while the probability of a Type I error decreases).
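The trade-off can be checked numerically. The sketch below is a minimal illustration under assumptions that are not in the notes: a right-sided z-test for a mean with known standard deviation, and hypothetical values for the null mean, the true mean, sigma, and n.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, sigma, n):
    """beta for a right-sided z-test of H0: mu = mu0 when the true mean is mu1 > mu0.

    Assumes sigma is known, so the test statistic is exactly normal.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when Z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))  # mean of Z when mu1 is the truth
    return std_normal.cdf(z_crit - shift)  # P(do not reject | mu = mu1)


# Hypothetical setting: mu0 = 0, true mean 0.5, sigma = 1, n = 20.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=0.0, mu1=0.5, sigma=1.0, n=20)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With everything else held fixed, each smaller value of alpha produces a larger beta and therefore a lower power.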

Relation to Confidence Intervals

There is a consistency between a confidence interval and a significance test when

  • the confidence level of the confidence interval is $1 - \alpha$, where $\alpha$ is the significance level of the test,
  • the test is two-sided, and
  • both the CI and the test use the same standard error.

When these hold, if the parameter value specified under $H_0$ lies within the confidence interval, we do not reject $H_0$, and vice versa.

Example

If the confidence interval is $(l, u)$:

  • when the parameter value specified under $H_0$ lies inside $(l, u)$, we retain $H_0$
  • when the parameter value specified under $H_0$ lies outside $(l, u)$, we reject $H_0$.
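The agreement between the interval and the test can be verified with a small sketch. It assumes a two-sided z-test for a mean with known standard deviation, so that the interval and the test share the same standard error; the sample mean, sigma, n, and null values below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_and_test(xbar, sigma, n, mu0, alpha=0.05):
    """Return the (1 - alpha) confidence interval and the two-sided p-value.

    Both use the same standard error sigma / sqrt(n) (sigma assumed known).
    """
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    z = (xbar - mu0) / se
    p_value = 2 * std_normal.cdf(-abs(z))
    return ci, p_value


# Hypothetical data summary: xbar = 10.4, sigma = 1, n = 25.
for mu0 in (10.0, 10.5, 11.0):
    (low, high), p = ci_and_test(xbar=10.4, sigma=1.0, n=25, mu0=mu0)
    print(mu0, "inside CI:", low <= mu0 <= high, "reject H0:", p < 0.05)
```

For every null value tried, "inside the interval" and "reject" come out as opposites, which is the consistency described above.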

Hypothesis Testing for Population Proportion

1. Assumptions

  1. Variable is categorical
  2. Data is obtained using randomisation
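  3. Sample size is sufficiently large that the sampling distribution of the sample proportion is approximately normal when the null is true. This is checked using the expected counts $n p_0$ and $n(1 - p_0)$, where $p_0$ is the value specified in $H_0$; both should be large (the exact cutoff depends on the rule of thumb used).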

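2. Hypothesis

$H_0: p = p_0$, against one of $H_1: p \neq p_0$, $H_1: p > p_0$, or $H_1: p < p_0$, where $p$ is the population proportion and $p_0$ is the hypothesised value.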

3. Test Statistic

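With the statistics $\hat{p}$ (sample proportion) and $n$ (sample size), and with $p_0$ the value specified under $H_0$:

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$

Note that if $H_0$ is true: $Z$ approximately follows the standard normal distribution $N(0, 1)$.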

4. Calculate p-value

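[Figure: rejection regions for the two-tailed, left-tail, and right-tail tests]

Compute the $p$-value using R: pnorm(Z) gives the left-tail probability, 1 - pnorm(Z) the right-tail probability, and 2 * pnorm(-abs(Z)) the two-tailed $p$-value.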

5. Interpret

Reject or retain $H_0$ given the $p$-value.

  • If it is small (less than $\alpha$), reject $H_0$
  • Otherwise, do not reject $H_0$
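The five steps for a proportion can be sketched end to end as follows. This is only an illustration: the function name proportion_z_test, the counts (60 successes out of 100), and the null value 0.5 are hypothetical, and the standard error is computed from the null value as is usual for this test.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, side="two-sided"):
    """One-sample z-test for a proportion; returns (z, p_value)."""
    p_hat = successes / n  # sample proportion
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se0
    cdf = NormalDist().cdf
    if side == "left":
        p_value = cdf(z)
    elif side == "right":
        p_value = 1 - cdf(z)
    else:
        p_value = 2 * cdf(-abs(z))
    return z, p_value


# Hypothetical sample: 60 successes in 100 trials, H0: p = 0.5, two-sided.
alpha = 0.05
z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Before trusting the result, the large-sample assumption should still be checked for the chosen null value (here $n p_0$ and $n(1 - p_0)$ are both 50).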

Hypothesis Testing for Mean

1. Assumptions

  1. Variable is quantitative
  2. Data is obtained using randomisation
  3. Population is approximately normal.
    • Crucial when the sample size $n$ is small.

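2. Hypothesis

$H_0: \mu = \mu_0$, against one of $H_1: \mu \neq \mu_0$, $H_1: \mu > \mu_0$, or $H_1: \mu < \mu_0$, where $\mu$ is the population mean and $\mu_0$ is the hypothesised value.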

3. Test Statistic

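With the statistics $\bar{x}$ (sample mean), $s$ (sample standard deviation) and $n$ (sample size), and with $\mu_0$ the value specified under $H_0$:

$$T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Note that if $H_0$ is true: $T$ follows the $t$ distribution with $n - 1$ degrees of freedom.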

4. Calculate p-value

[Figure: rejection regions for the two-tailed, left-tail, and right-tail tests]

Compute the $p$-value from the null distribution of $T$.

5. Interpret

Reject or retain $H_0$ given the $p$-value.

  • If it is small (less than $\alpha$), reject $H_0$
  • Otherwise, do not reject $H_0$
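The test statistic for a mean can be computed by hand as in the sketch below. The sample values and the null value 5.0 are hypothetical. Python's standard library has no $t$ distribution, so the sketch stops at the statistic and its degrees of freedom; the $p$-value would then come from a $t$ table or from statistical software (in R, pt(t, df) gives the left-tail probability).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 with a one-sample t-test."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(sample) - mu0) / se
    return t, n - 1


# Hypothetical measurements, used only for illustration.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t:.3f} on {df} degrees of freedom")
```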

Hypothesis Testing for Two Independent Samples with Equal Variance

1. Assumptions

  1. Variable is quantitative
  2. Samples are independent
  3. Population distribution of each group is approximately normal
  4. Variances are the same

The equal-variance assumption can be checked using an equal variance test.

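2. Hypothesis

$H_0: \mu_1 = \mu_2$, against one of $H_1: \mu_1 \neq \mu_2$, $H_1: \mu_1 > \mu_2$, or $H_1: \mu_1 < \mu_2$, where $\mu_1$ and $\mu_2$ are the two population means.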

3. Test Statistic

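With the statistics $\bar{x}_1, \bar{x}_2$ (sample means), $s_1, s_2$ (sample standard deviations) and $n_1, n_2$ (sample sizes):

$$T = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

In this formula:

  • $s_p^2$ refers to the pooled estimate of the common variance
  • $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ is the standard error.

Note that if $H_0$ is true: $T$ follows the $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom.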

4. Calculate p-value

[Figure: rejection regions for the two-tailed, left-tail, and right-tail tests]

Compute the $p$-value from the null distribution of $T$.

5. Interpret

Reject or retain $H_0$ given the $p$-value.

  • If it is small (less than $\alpha$), reject $H_0$
  • Otherwise, do not reject $H_0$
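The pooled calculation can be sketched as follows, assuming the usual pooled-variance form of the two-sample $t$ statistic. The function name pooled_t and the two small samples are hypothetical; as before, the $p$-value itself is left to a $t$ table or statistical software.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x1, x2):
    """Return (t, df) for the equal-variance two-sample t-test of H0: mu1 = mu2."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (variance uses the n - 1 denominator).
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    t = (mean(x1) - mean(x2)) / se
    return t, df


# Hypothetical samples, used only for illustration.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t, df = pooled_t(group_a, group_b)
print(f"t = {t:.4f} on {df} degrees of freedom")
```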

Hypothesis Testing for Two Independent Samples with Unequal Variance

1. Assumptions

  1. Variable is quantitative
  2. Samples are independent
  3. Population distribution of each group is approximately normal

Whether the variances can be treated as equal can be checked using an equal variance test.

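2. Hypothesis

$H_0: \mu_1 = \mu_2$, against one of $H_1: \mu_1 \neq \mu_2$, $H_1: \mu_1 > \mu_2$, or $H_1: \mu_1 < \mu_2$, where $\mu_1$ and $\mu_2$ are the two population means.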

3. Test Statistic

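With the statistics $\bar{x}_1, \bar{x}_2$ (sample means), $s_1, s_2$ (sample standard deviations) and $n_1, n_2$ (sample sizes):

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

In this formula:

  • $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ is the standard error.

Note that if $H_0$ is true: $T$ follows the $t$ distribution with $df$ degrees of freedom, where the formula for $df$ is complex (and $df$ may not be an integer).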
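For reference, the degrees of freedom are commonly approximated with the Welch-Satterthwaite formula. The sketch below assumes that approximation (the notes do not state which formula they use) and hypothetical sample values.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x1, x2):
    """Return (t, df) for the unequal-variance two-sample t-test of H0: mu1 = mu2.

    df uses the Welch-Satterthwaite approximation and is generally not an integer.
    """
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1  # s1^2 / n1
    v2 = variance(x2) / n2  # s2^2 / n2
    t = (mean(x1) - mean(x2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples with different sizes and spreads.
t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 9.0])
print(f"t = {t:.4f}, df = {df:.2f}")
```

The approximate degrees of freedom always fall between the smaller of $n_1 - 1$ and $n_2 - 1$ and the pooled value $n_1 + n_2 - 2$.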

4. Calculate p-value

[Figure: rejection regions for the two-tailed, left-tail, and right-tail tests]

Compute the $p$-value from the null distribution of $T$.

5. Interpret

Reject or retain $H_0$ given the $p$-value.

  • If it is small (less than $\alpha$), reject $H_0$
  • Otherwise, do not reject $H_0$

Hypothesis Testing for Two Dependent Samples

Also known as the paired t-test

1. Assumptions

  1. Variable is quantitative
  2. Samples are dependent
    • every observation has a matched value in other sample
  3. Population distribution of each group is approximately normal

2. Hypothesis

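Let $\mu_d$ be the population mean of the differences between the matched subjects.

$H_0: \mu_d = 0$, against one of $H_1: \mu_d \neq 0$, $H_1: \mu_d > 0$, or $H_1: \mu_d < 0$.

With this hypothesis, the procedure from Hypothesis Testing for Mean can be applied to the differences.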
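A minimal sketch of this reduction: take the difference within each matched pair, then run the one-sample calculation on the differences with a null value of 0. The function name paired_t and the before/after measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for the paired t-test of H0: mu_d = 0, with d = after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per matched pair
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # one-sample t on the differences
    return t, n - 1


# Hypothetical matched measurements on four subjects.
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 13.0, 12.0, 12.0]
t, df = paired_t(before, after)
print(f"t = {t:.4f} on {df} degrees of freedom")
```

The degrees of freedom are the number of pairs minus one, not the total number of observations.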