Types of Estimation

Point estimation

A single number is calculated to estimate the parameter.

Point estimator

Rule or formula describing calculation

Point estimate

Resulting number

Interval estimation

Two numbers are calculated to form an interval within which the parameter is expected to lie.

Point Estimation

Estimator

An estimator is a rule, usually expressed as a formula, for calculating an estimate from the information in the sample.

Unbiased estimator

Let $\hat{\Theta}$ be an estimator of $\theta$. Then, $\hat{\Theta}$ is a random variable based on the sample.

If $E(\hat{\Theta}) = \theta$, $\hat{\Theta}$ is an unbiased estimator of $\theta$.

[Figure: sampling distributions of an unbiased estimator and a biased estimator, shown relative to the true value of the parameter]

By definition, an unbiased estimator has a mean value that is equal to the true value of the parameter.
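As an illustration of the definition, the sketch below averages two variance estimators over many simulated samples. The choice of the sample variance as the example, the normal population with variance 4, and the function name `simulate_mean_of_estimates` are all assumptions made for this sketch, not part of the notes.

```python
import random
import statistics

def simulate_mean_of_estimates(n=5, trials=10000, seed=0):
    """Average two variance estimators over many samples from N(0, 2^2).

    The true variance is 4 (an assumed value for this illustration).
    """
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
        unbiased.append(statistics.variance(sample))   # divides by n - 1
        biased.append(statistics.pvariance(sample))    # divides by n
    return statistics.fmean(unbiased), statistics.fmean(biased)

mean_unbiased, mean_biased = simulate_mean_of_estimates()
print(mean_unbiased, mean_biased)
```

The first average should land close to the true variance 4, while the second should land close to $(n-1)/n \cdot 4 = 3.2$, which is what bias looks like in practice.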

Maximum Error of Estimate

Difference between the estimator and the true value of the parameter, $|\bar{X} - \mu|$

If the population is normal, or if $n$ is large, $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ follows a standard normal or an approximately standard normal distribution.

z_{\alpha}

is the number with an upper-tail probability of $\alpha$ for the standard normal distribution $Z$, that is, $P(Z > z_{\alpha}) = \alpha$.

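We can then derive the maximum error of estimate.

$$P\left(\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha \iff P\left(|\bar{X} - \mu| \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

Maximum error of estimate

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$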
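A minimal sketch of the calculation, assuming a known population standard deviation. The function name `max_error` and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def max_error(sigma, n, alpha=0.05):
    """Maximum error of estimate z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return z * sigma / sqrt(n)

# Assumed example: sigma = 10, n = 100, 95% confidence.
e = max_error(sigma=10.0, n=100, alpha=0.05)
print(round(e, 3))  # about 1.96
```

With probability 0.95, the sample mean in this example is within about 1.96 of the true mean.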

Determination of Sample Size

Motivation

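Given a maximum error $E_0$, we want to know what the minimum sample size $n$ should be. Requiring $z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le E_0$ gives $n \ge \left(\frac{z_{\alpha/2}\,\sigma}{E_0}\right)^2$.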

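| Case | Population | Standard deviation | Sample size | Statistic | Error | Desired sample size |
|---|---|---|---|---|---|---|
| I | Normal | Known | Any | $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ | $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ | $n \ge \left(\frac{z_{\alpha/2}\,\sigma}{E_0}\right)^2$ |
| II | Any | Known | Large | $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ | $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ | $n \ge \left(\frac{z_{\alpha/2}\,\sigma}{E_0}\right)^2$ |
| III | Normal | Unknown | Small | $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ | $t_{n-1;\alpha/2}\frac{s}{\sqrt{n}}$ | $n \ge \left(\frac{t_{n-1;\alpha/2}\,s}{E_0}\right)^2$ |
| IV | Any | Unknown | Large | $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ | $z_{\alpha/2}\frac{s}{\sqrt{n}}$ | $n \ge \left(\frac{z_{\alpha/2}\,s}{E_0}\right)^2$ |

Here $E_0$ is the required maximum error, $s$ is the sample standard deviation, and $t_{n-1;\alpha/2}$ is the upper $\alpha/2$ point of the $t$-distribution with $n-1$ degrees of freedom.

[Figure: each of the four cases (normal or any distribution, known or unknown standard deviation) leads to a confidence interval for the mean]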
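The sample-size calculation for the known standard deviation cases can be sketched as follows. The function name `min_sample_size` and the example values are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(sigma, e0, alpha=0.05):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= e0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / e0) ** 2)

# Assumed example: sigma = 10, maximum error 2, 95% confidence.
n = min_sample_size(sigma=10.0, e0=2.0)
print(n)  # 97
```

Rounding up matters here: 96 observations would leave the error marginally above 2 in this example.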

Confidence Intervals

Confidence interval

A rule for calculating, from the sample, an interval in which you are fairly certain the parameter of interest lies.

Degree of confidence/confidence level

Quantifies the certainty mentioned above

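If an interval $(\hat{\Theta}_L, \hat{\Theta}_U)$ computed from the sample satisfies $P(\hat{\Theta}_L < \theta < \hat{\Theta}_U) = 1 - \alpha$, then $(\hat{\Theta}_L, \hat{\Theta}_U)$ is called the $(1-\alpha)$ confidence interval.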

Mean

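Case I: $\sigma$ known, data normal

The following is a $(1-\alpha)$ confidence interval.

$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Case II: $\sigma$ known, data any

Similar to case I, provided $n$ is large:

The following is a $(1-\alpha)$ confidence interval.

$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Case III: $\sigma$ unknown, data normal

The following is a $(1-\alpha)$ confidence interval, where $S$ is the sample standard deviation and $t_{n-1;\alpha/2}$ is the upper $\alpha/2$ point of the $t$-distribution with $n-1$ degrees of freedom.

$$\bar{X} \pm t_{n-1;\alpha/2}\frac{S}{\sqrt{n}}$$

Case IV: $\sigma$ unknown, data any

The following is a $(1-\alpha)$ confidence interval when $n$ is large.

$$\bar{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}}$$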
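The four cases differ only in which standard deviation and which critical value are plugged in, so one small helper can cover them. This is a sketch under assumptions: the name `mean_ci` and its parameters are hypothetical, and because the Python standard library has no $t$ quantile function, the $t$ critical value for Case III is passed in by the caller (for example, read from a $t$-table).

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_ci(sample, alpha=0.05, sigma=None, crit=None):
    """(1 - alpha) confidence interval for the mean.

    sigma: known population sd (Cases I and II); if None, the sample sd is used.
    crit:  critical value; defaults to z_{alpha/2}. For Case III, pass
           t_{n-1; alpha/2} looked up from a table.
    """
    n = len(sample)
    sd = sigma if sigma is not None else stdev(sample)
    if crit is None:
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    half = crit * sd / sqrt(n)
    centre = fmean(sample)
    return centre - half, centre + half

# Assumed data for illustration only.
sample = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0]
ci_known = mean_ci(sample, sigma=0.1)   # Case I: sigma assumed known
ci_t = mean_ci(sample, crit=2.262)      # Case III: t_{9; 0.025} = 2.262
print(ci_known, ci_t)
```

The Case III interval is wider than the Case I interval here, partly because the $t$ critical value exceeds $z_{\alpha/2}$ for small $n$.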

Interpreting Confidence Intervals

When the interval estimator has probability $1-\alpha$ of containing $\mu$,

  • every time we take a sample and construct the interval estimator, a different confidence interval is computed.
  • some confidence intervals contain $\mu$, and some don’t.

Since $\mu$ is not known,

  • there is no way to determine whether a given confidence interval contains $\mu$ or not.
  • if the procedure is repeated many times, about $1-\alpha$ of the confidence intervals obtained will contain the true parameter.
    • For example, if we repeat the procedure to get 0.95 confidence intervals, about 0.95 of the confidence intervals computed will contain the true parameter.
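The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with known standard deviation; the function name `coverage` and all parameter values are made up for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, trials=5000, seed=1):
    """Fraction of simulated (1 - alpha) intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / sqrt(n)           # half-width is fixed when sigma is known
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half < mu < xbar + half:
            hits += 1
    return hits / trials

c = coverage()
print(c)  # expected to be close to 0.95
```

In the simulation the true mean is known, so each interval can be checked. With real data this check is impossible, which is why the confidence level describes the procedure rather than any single interval.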

Experimental Design

To compare two populations, a number of observations from each population need to be collected.

Experimental design

Manner in which samples from populations are collected.

Basic designs

  • independent samples (complete randomisation)
  • matched pairs samples (randomisation between matched pairs)

[Figure: decision tree of cases. General assumptions, then: independent samples with unequal variances (known variances, with either both populations normal or both samples large; unknown variances, with both samples large), independent samples with equal variances (small or large sample size), and dependent samples (small or large sample size).]

Independent Samples: Known and Unequal Variances

Assumptions

  1. A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
  2. A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
  3. Both samples are independent
  4. Population variances are known and not the same: $\sigma_1^2 \ne \sigma_2^2$
  5. Either one of the following conditions holds:
  • Both populations are normal
  • Both samples are large: $n_1 \ge 30$, $n_2 \ge 30$

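Consider $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$ as random samples from the two populations of interest.

Then,

$$E(\bar{X}) = \mu_1, \quad Var(\bar{X}) = \frac{\sigma_1^2}{n_1}, \quad E(\bar{Y}) = \mu_2, \quad Var(\bar{Y}) = \frac{\sigma_2^2}{n_2}$$

Thus,

$$E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2$$

and using the independence assumption,

$$Var(\bar{X} - \bar{Y}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

When

  • populations are normal OR
  • both samples are large (using CLT)

the standardised difference is (approximately) standard normal:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$

Thus, our point of interest is the following difference $\mu_1 - \mu_2$:

$$P\left(-z_{\alpha/2} < \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} < z_{\alpha/2}\right) = 1 - \alpha$$

with confidence $1 - \alpha$ for any $0 < \alpha < 1$.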

Getting the confidence interval

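If $\sigma_1^2, \sigma_2^2$ are known, we get:

$$P\left((\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - \alpha$$

Confidence interval for difference, with known and unequal variances

Thus, the $(1-\alpha)$ confidence interval for $\mu_1 - \mu_2$ is:

$$(\bar{X} - \bar{Y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$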
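A minimal sketch of this interval computed from summary statistics. The function name `diff_ci_known` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_ci_known(xbar, ybar, var1, var2, n1, n2, alpha=0.05):
    """(1 - alpha) CI for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(var1 / n1 + var2 / n2)
    d = xbar - ybar
    return d - half, d + half

# Assumed example: known variances 16 and 25, sample sizes 40 and 50.
lo, hi = diff_ci_known(xbar=85.0, ybar=80.0, var1=16.0, var2=25.0, n1=40, n2=50)
print(round(lo, 3), round(hi, 3))  # roughly 3.141 and 6.859
```

Since the interval in this example excludes 0, the data would be read as evidence that the two population means differ.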

Independent Samples: Large, with Unknown Variances

Assumptions

  1. A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
  2. A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
  3. Both samples are independent
  4. Population variances are unknown and not the same: $\sigma_1^2 \ne \sigma_2^2$
  5. Both samples are large: $n_1 \ge 30$, $n_2 \ge 30$

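As $\sigma_1^2, \sigma_2^2$ are unknown, let:

$$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar{X})^2, \quad S_2^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(Y_j - \bar{Y})^2$$

Now, using the sample variances $S_1^2, S_2^2$:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \approx N(0, 1)$$

Confidence interval for difference, with large sample size and unequal variances

Thus, the $(1-\alpha)$ confidence interval for $\mu_1 - \mu_2$ is:

$$(\bar{X} - \bar{Y}) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$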
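When only raw data are available, the sample variances replace the unknown population variances. The sketch below works from two lists of observations; the function name `diff_ci_large` and the artificial data are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance

def diff_ci_large(x, y, alpha=0.05):
    """Large-sample (1 - alpha) CI for mu1 - mu2 using sample variances."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # statistics.variance uses the n - 1 divisor, matching S^2 above.
    half = z * sqrt(variance(x) / len(x) + variance(y) / len(y))
    d = fmean(x) - fmean(y)
    return d - half, d + half

# Artificial data, chosen only so that the arithmetic is easy to verify.
x = [10 + (i % 5) for i in range(40)]   # mean 12, sample variance 80/39
y = [9 + (i % 3) for i in range(45)]    # mean 10, sample variance 30/44
lo, hi = diff_ci_large(x, y)
print(round(lo, 3), round(hi, 3))
```

Both samples have at least 30 observations, which is the condition that justifies the normal critical value here.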

Independent Samples: Small, with Equal Variances

Equal Variance Assumption

In real applications, whether the equal variance assumption holds is usually unknown and needs to be checked.

Assumptions

  1. A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
  2. A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
  3. Both samples are independent
  4. Population variances are unknown and the same: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
  5. Both samples are small: $n_1 < 30$, $n_2 < 30$
  6. Both populations are normally distributed

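Based on the equal variance assumption, as well as the normally distributed populations:

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0, 1)$$

Since both variances are equal, we can estimate $\sigma^2$ using either $S_1^2$ or $S_2^2$. They are both unbiased estimators. Combining them uses the information in both samples.

Pooled estimator $S^{2}_{p}$

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

Using the pooled estimator, the statistic:

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$

We then get the probability:

$$P\left(-t_{n_1+n_2-2;\alpha/2} < T < t_{n_1+n_2-2;\alpha/2}\right) = 1 - \alpha$$

where $t_{n_1+n_2-2;\alpha/2}$ is the upper $\alpha/2$ point of the $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom.

Confidence interval for difference, with small sample size and equal variances

Thus, the $(1-\alpha)$ confidence interval for $\mu_1 - \mu_2$ is:

$$(\bar{X} - \bar{Y}) \pm t_{n_1+n_2-2;\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$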
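The pooled calculation can be sketched as below. The names `pooled_variance` and `pooled_diff_ci` and the data are hypothetical. The Python standard library has no $t$ quantile function, so the critical value is supplied by the caller (here 2.262, the tabulated upper 0.025 point for 9 degrees of freedom).

```python
from math import sqrt
from statistics import fmean, variance

def pooled_variance(x, y):
    """Pooled estimator S_p^2, weighting each sample variance by its df."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)

def pooled_diff_ci(x, y, t_crit):
    """CI for mu1 - mu2 with equal variances; t_crit has n1 + n2 - 2 df."""
    n1, n2 = len(x), len(y)
    sp = sqrt(pooled_variance(x, y))
    half = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    d = fmean(x) - fmean(y)
    return d - half, d + half

# Assumed data for illustration: n1 = 5, n2 = 6, so 9 degrees of freedom.
x = [12, 14, 13, 15, 11]
y = [10, 9, 11, 12, 8, 10]
sp2 = pooled_variance(x, y)
lo, hi = pooled_diff_ci(x, y, t_crit=2.262)
print(sp2, round(lo, 3), round(hi, 3))
```

The pooled variance always lies between the two sample variances, closer to the one from the larger sample.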

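Independent Samples: Large, with Equal Variances

Equal Variance Assumption

In real applications, whether the equal variance assumption holds is usually unknown and needs to be checked.

Assumptions

  1. A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
  2. A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
  3. Both samples are independent
  4. Population variances are unknown and the same: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
  5. Both samples are large: $n_1 \ge 30$, $n_2 \ge 30$

Normality of the populations is not required here, since the CLT applies to large samples.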

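Based on the equal variance assumption, $\bar{X} - \bar{Y}$ has mean $\mu_1 - \mu_2$ and variance $\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$.

Since both variances are equal, we can estimate $\sigma^2$ using either $S_1^2$ or $S_2^2$. They are both unbiased estimators.

Pooled estimator $S^{2}_{p}$

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

Using the pooled estimator, the statistic is similar to the small-sample case, but due to the CLT, it follows an approximately normal distribution instead.

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \approx N(0, 1)$$

We then get the probability:

$$P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1 - \alpha$$

Confidence interval for difference, with large sample size and equal variances

Thus, the $(1-\alpha)$ confidence interval for $\mu_1 - \mu_2$ is:

$$(\bar{X} - \bar{Y}) \pm z_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$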

Dependent Samples: Paired Data

Assumptions

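  1. $(X_1, Y_1), \dots, (X_n, Y_n)$ are matched pairs, with $X_1, \dots, X_n$ being a random sample from population 1, and $Y_1, \dots, Y_n$ being a random sample from population 2.
  2. $X_i$ and $Y_i$ are dependent
  3. $(X_i, Y_i)$ and $(X_j, Y_j)$ are independent for any $i \ne j$
  4. For matched pairs, define $D_i = X_i - Y_i$ and $\mu_D = \mu_1 - \mu_2$
  5. $D_1, \dots, D_n$ is now a random sample from a single population, with mean $\mu_D$, variance $\sigma_D^2$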

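We can then consider the technique used for a single population, applied to the differences $D_i = X_i - Y_i$ with mean $\mu_D = \mu_1 - \mu_2$:

$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \quad S_D^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(D_i - \bar{D})^2$$

Then, we get the statistic:

$$T = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$$

Then, if $n$ is small ($n < 30$) and the population of differences is normally distributed:

$$T \sim t_{n-1}$$

or, if $n$ is large ($n \ge 30$), using the CLT:

$$T \approx N(0, 1)$$

Confidence interval for difference of paired samples, with small sample size

Thus, the $(1-\alpha)$ confidence interval for $\mu_D$ is:

$$\bar{D} \pm t_{n-1;\alpha/2}\frac{S_D}{\sqrt{n}}$$

Confidence interval for difference of paired samples, with large sample size

Thus, the $(1-\alpha)$ confidence interval for $\mu_D$ is:

$$\bar{D} \pm z_{\alpha/2}\frac{S_D}{\sqrt{n}}$$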
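Because the paired case reduces to a single sample of differences, the computation is short. In this sketch the function name `paired_diff_ci` and the before/after data are hypothetical. The critical value defaults to $z_{\alpha/2}$ for large $n$, and for small $n$ the caller passes a tabulated $t$ value (here 2.571 for 5 degrees of freedom), since the standard library has no $t$ quantile function.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def paired_diff_ci(x, y, alpha=0.05, crit=None):
    """(1 - alpha) CI for mu_D from matched pairs (x_i, y_i)."""
    if len(x) != len(y):
        raise ValueError("matched pairs require equal-length samples")
    d = [a - b for a, b in zip(x, y)]    # differences D_i = X_i - Y_i
    n = len(d)
    if crit is None:                     # large-sample default: z_{alpha/2}
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    half = crit * stdev(d) / sqrt(n)
    centre = fmean(d)
    return centre - half, centre + half

# Assumed measurements on the same six subjects, before and after.
before = [72, 75, 70, 78, 74, 71]
after = [70, 72, 69, 75, 73, 69]
lo, hi = paired_diff_ci(before, after, crit=2.571)   # t_{5; 0.025} = 2.571
print(round(lo, 3), round(hi, 3))  # roughly 1.061 and 2.939
```

Working with the differences removes the subject-to-subject variation, which is why pairing can give a much narrower interval than treating the two samples as independent.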