Types of Estimation
Point estimation
A single number is calculated to estimate the parameter.
Point estimator
Rule or formula describing calculation
Point estimate
Resulting number
Interval estimation
Two numbers are calculated to form an interval within which the parameter is expected to lie.
Point Estimation
Estimator
An estimator is a rule, usually expressed as a formula, for calculating an estimate from the information in the sample.
Unbiased estimator
Let $\hat{\Theta}$ be an estimator of $\theta$. Then, $\hat{\Theta}$ is a random variable based on the sample. If
$$E(\hat{\Theta}) = \theta$$
then $\hat{\Theta}$ is an unbiased estimator of $\theta$.
By definition, an unbiased estimator has a mean value that is equal to the value of a parameter.
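A small simulation can make unbiasedness concrete. The sketch below is illustrative only: the function name, the normal population and the parameter values are assumptions, not part of the notes. It averages many sample means, which should land close to the true mean if the sample mean is unbiased.

```python
import random
from statistics import fmean

def average_of_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma^2) and
    return the average of their sample means (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(fmean(sample))
    return fmean(means)

# The long-run average of the estimator approximates its expectation.
approx_expectation = average_of_sample_means(mu=10.0, sigma=2.0, n=5, repetitions=20000)
print(approx_expectation)  # expected to be close to the true mean 10.0
```

Each individual sample mean still misses the true mean; only the average over repeated samples matches it.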
Maximum Error of Estimate
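Difference between the estimator and the true value of the parameter, e.g. $|\bar{X} - \mu|$ for the mean.
If the population is normal, or if $n$ is large, $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ follows a standard normal or an approximately standard normal distribution.
$z_{\alpha}$
The number with an upper-tail probability of $\alpha$ for the standard normal distribution $Z$, i.e. $P(Z > z_{\alpha}) = \alpha$.
We can then derive the maximum error of estimate: since $P(|Z| \leq z_{\alpha/2}) = 1 - \alpha$, with probability $1 - \alpha$ the error $|\bar{X} - \mu|$ is at most $z_{\alpha/2} \cdot \sigma / \sqrt{n}$.
Maximum error of estimate
$$e = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$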
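As a rough illustration, the maximum error $z_{\alpha/2} \cdot \sigma / \sqrt{n}$ can be computed with the standard library. The function names `z_upper` and `max_error` and the input values are made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def z_upper(alpha):
    """z_alpha: the point with upper-tail probability alpha under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)

def max_error(sigma, n, alpha):
    """Maximum error of estimate for the mean, with known sigma."""
    return z_upper(alpha / 2) * sigma / sqrt(n)

# Illustrative values: sigma = 10, n = 100, 95% confidence (alpha = 0.05).
print(round(max_error(sigma=10, n=100, alpha=0.05), 3))  # 1.96
```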
Determination of Sample Size
Motivation
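Given a maximum error $e_0$, we want to know what the minimum sample size should be. Requiring $z_{\alpha/2} \cdot \sigma / \sqrt{n} \leq e_0$ and solving for $n$ gives
$$n \geq \left( \frac{z_{\alpha/2} \cdot \sigma}{e_0} \right)^{2}$$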
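A minimal sketch of the sample-size calculation, assuming $\sigma$ is known and the error bound is $z_{\alpha/2} \cdot \sigma / \sqrt{n}$. The name `min_sample_size` and the numbers are illustrative. The result is rounded up because $n$ must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(sigma, e0, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= e0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / e0) ** 2)

# Illustrative values: sigma = 10, target maximum error 2, alpha = 0.05.
print(min_sample_size(sigma=10, e0=2, alpha=0.05))  # 97
```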
Confidence Intervals
Confidence interval
A rule for calculating, from the sample, an interval $(a, b)$ in which you are fairly certain the parameter of interest lies.
Degree of confidence/confidence level
Quantifies the certainty mentioned above. If $P(a < \theta < b) = 1 - \alpha$ for the parameter $\theta$, then $(a, b)$ is called the $(1 - \alpha)$ confidence interval.
Mean
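Case I: $\sigma$ known, data normal
The following is a $(1 - \alpha)$ confidence interval for $\mu$:
$$\bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Case II: $\sigma$ known, data any ($n$ large)
Similar to case I, by the CLT:
The following is a $(1 - \alpha)$ confidence interval for $\mu$:
$$\bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Case III: $\sigma$ unknown, data normal
The following is a $(1 - \alpha)$ confidence interval for $\mu$, where $S$ is the sample standard deviation and $t_{n-1;\alpha/2}$ is the upper $\alpha/2$ point of the $t$ distribution with $n - 1$ degrees of freedom:
$$\bar{X} \pm t_{n-1;\alpha/2} \cdot \frac{S}{\sqrt{n}}$$
Case IV: $\sigma$ unknown, data any ($n$ large)
The following is a $(1 - \alpha)$ confidence interval for $\mu$:
$$\bar{X} \pm z_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$$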
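The four cases reduce to two computations: a $z$-based interval and a $t$-based interval. The sketch below assumes that reading. The function names and numbers are illustrative, and because the standard library has no $t$ quantile function, the critical value `t_crit` is assumed to be looked up from a $t$ table by the caller.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sd, n, alpha):
    """xbar +/- z_{alpha/2} * sd / sqrt(n).
    Pass sigma as `sd` when it is known, or the sample s when n is large."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin

def t_interval(xbar, s, n, t_crit):
    """xbar +/- t_crit * s / sqrt(n), for normal data with sigma unknown.
    t_crit is the table value for n - 1 degrees of freedom."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Illustrative values only.
print(z_interval(xbar=50, sd=10, n=100, alpha=0.05))
# 2.064 is the tabulated upper 0.025 point of t with 24 degrees of freedom.
print(t_interval(xbar=50, s=10, n=25, t_crit=2.064))
```

With the same standard deviation, the $t$ interval is wider than the $z$ interval for the same $n$, reflecting the extra uncertainty from estimating $\sigma$.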
Interpreting Confidence Intervals
Because the interval is computed from a random sample:
- every time we take a sample and construct the interval estimator, a different confidence interval is computed.
- some confidence intervals contain $\mu$, and some don't.
Because $\mu$ is unknown:
- there is no way to determine whether a particular confidence interval contains $\mu$ or not.
- if the procedure is repeated many times, about $(1 - \alpha)$ of the confidence intervals obtained will contain the true parameter.
- e.g. if we repeat the procedure to get 0.95 confidence intervals, about 0.95 of the confidence intervals computed will contain the true parameter.
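The repeated-sampling interpretation can be checked by simulation. This is a sketch under assumed settings (normal data, known $\sigma$, a hypothetical `coverage` helper). It builds many intervals and reports the fraction that contain the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, repetitions, seed=1):
    """Fraction of simulated (1 - alpha) intervals that contain mu."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    margin = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # Each repetition yields a different interval around its own xbar.
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / repetitions

print(coverage(mu=0.0, sigma=1.0, n=20, alpha=0.05, repetitions=5000))
```

The printed fraction should be near 0.95, although any single interval either contains the true mean or does not.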
Experimental Design
To compare two populations, a number of observations from each population need to be collected.
Experimental design
Manner in which samples from populations are collected.
Basic designs
- independent samples (complete randomisation)
- matched pairs samples (randomisation between matched pairs)
Independent Samples: Known and Unequal Variances
Assumptions
- A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
- A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
- Both samples are independent
- Population variances are known and not the same: $\sigma_1^2 \neq \sigma_2^2$
- Either one of the following conditions holds:
  - Both populations are normal
  - Both samples are large: conventionally $n_1 \geq 30$, $n_2 \geq 30$
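Consider $\bar{X} - \bar{Y}$, where $\bar{X}$ and $\bar{Y}$ are the sample means from populations 1 and 2.
Then,
$$E(\bar{X}) = \mu_1, \quad E(\bar{Y}) = \mu_2$$
Thus,
$$E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2$$
and using the independence assumption,
$$\operatorname{Var}(\bar{X} - \bar{Y}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
When
- populations are normal OR
- both samples are large
(using CLT)
$$\bar{X} - \bar{Y} \sim N\left( \mu_1 - \mu_2, \ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right)$$
exactly in the normal case and approximately in the large-sample case.
Thus, our point of interest is the following difference
$$\mu_1 - \mu_2$$
with confidence $1 - \alpha$.
Getting the confidence interval
If
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
then
$$P\left( -z_{\alpha/2} < Z < z_{\alpha/2} \right) = 1 - \alpha$$
Confidence interval for difference, with known and unequal variances
Thus, the $(1 - \alpha)$ confidence interval for $\mu_1 - \mu_2$ is:
$$(\bar{X} - \bar{Y}) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$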
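A minimal sketch of the known-variance interval for $\mu_1 - \mu_2$, assuming the form $(\bar{X} - \bar{Y}) \pm z_{\alpha/2} \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. The function name and the summary statistics are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_interval(xbar, ybar, var1, var2, n1, n2, alpha):
    """Interval for mu1 - mu2 with known population variances var1, var2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Variances of independent sample means add.
    se = sqrt(var1 / n1 + var2 / n2)
    diff = xbar - ybar
    return diff - z * se, diff + z * se

# Illustrative summary statistics only.
low, high = two_sample_z_interval(xbar=80, ybar=75, var1=25, var2=36,
                                  n1=50, n2=60, alpha=0.05)
print(round(low, 3), round(high, 3))  # 2.944 7.056
```

Here the interval excludes 0, which would suggest the two population means differ at this confidence level.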
Independent Samples: Large, with Unknown Variances
Assumptions
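- A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
- A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
- Both samples are independent
- Population variances are unknown and not the same: $\sigma_1^2 \neq \sigma_2^2$
- Both samples are large: conventionally $n_1 \geq 30$, $n_2 \geq 30$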
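As $\sigma_1^2$ and $\sigma_2^2$ are unknown, they are estimated by the sample variances $S_1^2$ and $S_2^2$.
Now, using the sample variances, with $\bar{X}$ and $\bar{Y}$ the two sample means, the statistic
$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
is approximately $N(0, 1)$ for large samples.
Confidence interval for difference, with large sample size and unequal variances
Thus, the $(1 - \alpha)$ confidence interval for $\mu_1 - \mu_2$ is:
$$(\bar{X} - \bar{Y}) \pm z_{\alpha/2} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$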
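When only raw data are available, the sample variances take the place of the population variances. The sketch below assumes the large-sample form $(\bar{X} - \bar{Y}) \pm z_{\alpha/2} \sqrt{S_1^2/n_1 + S_2^2/n_2}$. The function name and the simulated data are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

def large_sample_interval(x, y, alpha):
    """Interval for mu1 - mu2 from two large independent samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = fmean(x) - fmean(y)
    # statistics.variance uses the n - 1 divisor (sample variance).
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return diff - z * se, diff + z * se

# Made-up data: two simulated samples of size 40 and 35.
rng = random.Random(42)
x = [rng.gauss(5.0, 2.0) for _ in range(40)]
y = [rng.gauss(4.0, 3.0) for _ in range(35)]
print(large_sample_interval(x, y, alpha=0.05))
```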
Independent Samples: Small, with Equal Variances
Equal Variance Assumption
In real applications, whether the variances are actually equal is usually unknown, so the equal variance assumption needs to be checked.
Assumptions
- A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
- A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
- Both samples are independent
- Population variances are unknown and the same: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
- Both samples are small: conventionally $n_1 < 30$, $n_2 < 30$
- Both populations are normally distributed
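Based on the equal variance assumption, as well as normally distributed populations, with $\bar{X}$ and $\bar{Y}$ the two sample means and $\sigma^2$ the common variance:
$$\bar{X} - \bar{Y} \sim N\left( \mu_1 - \mu_2, \ \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right)$$
Since both variances are equal, we can estimate $\sigma^2$ by pooling the two sample variances $S_1^2$ and $S_2^2$.
Pooled estimator
$$S^{2}_{p} = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$
Using the pooled estimator, the statistic:
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$
We then get the probability, where $t_{n_1 + n_2 - 2;\alpha/2}$ is the upper $\alpha/2$ point of the $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom:
$$P\left( -t_{n_1 + n_2 - 2;\alpha/2} < T < t_{n_1 + n_2 - 2;\alpha/2} \right) = 1 - \alpha$$
Confidence interval for difference, with small sample size and equal variances
Thus, the $(1 - \alpha)$ confidence interval for $\mu_1 - \mu_2$ is:
$$(\bar{X} - \bar{Y}) \pm t_{n_1 + n_2 - 2;\alpha/2} \cdot S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$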
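The pooled-variance interval can be sketched as follows, assuming the usual pooled estimator $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$. The function names and data are made up, and the critical value `t_crit` is assumed to come from a $t$ table with $n_1 + n_2 - 2$ degrees of freedom, since the standard library has no $t$ quantile function.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_variance(x, y):
    """Weighted average of the two sample variances."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)

def pooled_t_interval(x, y, t_crit):
    """Interval for mu1 - mu2 assuming equal variances and normal data."""
    sp = sqrt(pooled_variance(x, y))
    diff = fmean(x) - fmean(y)
    margin = t_crit * sp * sqrt(1 / len(x) + 1 / len(y))
    return diff - margin, diff + margin

# Made-up data; 2.306 is the tabulated upper 0.025 point of t with 8 df.
x = [10, 12, 14, 16, 18]
y = [9, 11, 13, 15, 17]
low, high = pooled_t_interval(x, y, t_crit=2.306)
print(round(low, 3), round(high, 3))  # -3.612 5.612
```

In this toy example the interval contains 0, so the data do not rule out equal means.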
Independent Samples: Large, with Equal Variances
Equal Variance Assumption
In real applications, whether the variances are actually equal is usually unknown, so the equal variance assumption needs to be checked.
Assumptions
- A random sample of size $n_1$ from population 1 with mean $\mu_1$ and variance $\sigma_1^2$
- A random sample of size $n_2$ from population 2 with mean $\mu_2$ and variance $\sigma_2^2$
- Both samples are independent
- Population variances are unknown and the same: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
- Both samples are large: conventionally $n_1 \geq 30$, $n_2 \geq 30$
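Based on the equal variance assumption, with $\bar{X}$ and $\bar{Y}$ the two sample means and $\sigma^2$ the common variance:
$$\operatorname{Var}(\bar{X} - \bar{Y}) = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$$
Since both variances are equal, we can estimate $\sigma^2$ by pooling the two sample variances $S_1^2$ and $S_2^2$.
Pooled estimator
$$S^{2}_{p} = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$
Using the pooled estimator, the statistic is similar, but due to the CLT, it follows an approximately standard normal distribution instead:
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
We then get the probability:
$$P\left( -z_{\alpha/2} < Z < z_{\alpha/2} \right) \approx 1 - \alpha$$
Confidence interval for difference, with large sample size and equal variances
Thus, the $(1 - \alpha)$ confidence interval for $\mu_1 - \mu_2$ is:
$$(\bar{X} - \bar{Y}) \pm z_{\alpha/2} \cdot S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$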
Dependent Samples: Paired Data
Assumptions
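- $(X_1, Y_1), \dots, (X_n, Y_n)$ are matched pairs, with $X_1, \dots, X_n$ being a random sample from population 1, and $Y_1, \dots, Y_n$ being a random sample from population 2.
- $X_i$ and $Y_i$ are dependent, and $(X_i, Y_i)$ and $(X_j, Y_j)$ are independent for any $i \neq j$.
- For matched pairs, define $D_i = X_i - Y_i$, so that $\mu_D = \mu_1 - \mu_2$.
$D_1, \dots, D_n$ is now a random sample from a single population, with mean $\mu_D$, variance $\sigma_D^2$.
We can then consider the technique used for a single population:
$$\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D_i, \quad S_D^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (D_i - \bar{D})^2$$
Then, we get the statistic:
$$T = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$
Then, if $n$ is small and the population of differences is normally distributed:
$$T \sim t_{n-1}$$
or, if $n$ is large, using the CLT:
$$T \sim N(0, 1) \text{ approximately}$$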
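Confidence interval for difference of paired samples, with small sample size
Thus, the $(1 - \alpha)$ confidence interval for $\mu_D$ is given below, where $\bar{D}$ and $S_D$ are the sample mean and sample standard deviation of the differences $D_i = X_i - Y_i$, and $t_{n-1;\alpha/2}$ is the upper $\alpha/2$ point of the $t$ distribution with $n - 1$ degrees of freedom:
$$\bar{D} \pm t_{n-1;\alpha/2} \cdot \frac{S_D}{\sqrt{n}}$$
Confidence interval for difference of paired samples, with large sample size
Thus, the $(1 - \alpha)$ confidence interval for $\mu_D$ is:
$$\bar{D} \pm z_{\alpha/2} \cdot \frac{S_D}{\sqrt{n}}$$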
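For paired data the computation reduces to a one-sample interval on the differences. The sketch below assumes that reduction. The function name and data are illustrative, and the critical value `crit` is supplied by the caller: a $t$ table value with $n - 1$ degrees of freedom for small $n$, or $z_{\alpha/2}$ for large $n$.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_interval(x, y, crit):
    """Interval for mu_D = mu1 - mu2 from matched pairs (x_i, y_i)."""
    if len(x) != len(y):
        raise ValueError("matched pairs need samples of equal length")
    d = [xi - yi for xi, yi in zip(x, y)]  # within-pair differences
    dbar = fmean(d)
    margin = crit * stdev(d) / sqrt(len(d))  # stdev uses the n - 1 divisor
    return dbar - margin, dbar + margin

# Made-up pairs; 2.776 is the tabulated upper 0.025 point of t with 4 df.
x = [12, 15, 18, 21, 24]
y = [10, 11, 12, 13, 14]
low, high = paired_interval(x, y, crit=2.776)
print(round(low, 3), round(high, 3))  # 2.074 9.926
```

Working with the differences removes the dependence within each pair, which is why the two samples are never treated as independent here.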