# Confidence interval

A confidence interval is a range between two values, calculated from sample data, that is used to estimate an unknown parameter of a population. The interval comes with a confidence level: the proportion of such intervals that would contain the actual value of the parameter if the sampling were repeated many times. Confidence intervals are often used as supporting evidence when judging whether an observed effect is real, although they cannot prove causation on their own. A "strong" confidence interval has a narrow, well-defined range and a high confidence level. While the confidence level can be chosen freely, 95% is the most commonly accepted value (sometimes phrased as "19 out of 20 times").

## Notation

Confidence intervals can be calculated for any parameter of a statistical population. For this example of notation, assume that mu is the mean, and mu-tilde is the estimator of mu. The probability that the absolute difference between the mean and its estimator is less than some value y is equal to x, where y is some non-negative real number, and x is in the interval [0,1].

$\Pr(|\tilde \mu - \mu| < y)=x$

## Example

Define the following mean estimator:

$\tilde \mu=\frac{\sum_{i=1}^n X_i}{n}.$

Assume that mu-tilde has the following Gaussian sampling distribution:

$\tilde \mu \sim G(\mu, \frac{\sigma}{\sqrt n})$
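The claim that the estimator is spread around the mean with standard deviation sigma/sqrt(n) can be checked with a small simulation. The following is only a sketch: the population values (mean 50, standard deviation 10, sample size 25) and the function name `simulate_sample_means` are assumptions made for illustration and do not come from the example in this article.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` Gaussian samples of size n and return each sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(repetitions)
    ]

# Hypothetical population values, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 25
means = simulate_sample_means(mu, sigma, n, repetitions=2000)

print("theoretical sigma / sqrt(n):", sigma / math.sqrt(n))
print("observed spread of the estimator:", statistics.stdev(means))
```

With enough repetitions, the observed spread should be close to sigma/sqrt(n), which is 2 for these assumed values.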

Assume the standard notation (sigma is the standard deviation, n is the sample size). You are given that sigma/sqrt(n) is 67.5. Assume we want to find the probability that the absolute difference between the mean and its estimator is less than 100. Because the Gaussian distribution is symmetric, the two tails beyond 100 have the same probability. Then:

$\Pr(|\tilde \mu - \mu| < 100)=1 - 2\Pr(\tilde \mu - \mu > 100)$

$\Pr(\tilde \mu - \mu > 100)$
$=1 - \Pr(\tilde \mu - \mu < 100)$
$=1 - \Pr(\frac{\tilde \mu - \mu}{\sigma / \sqrt n} < \frac {100}{\sigma / \sqrt n})$
$= 1 - \Phi(\frac{100}{67.5})$
$\approx 0.069$, where Phi is the cumulative distribution function for the G(0,1) Gaussian distribution.

$\Pr(|\tilde \mu - \mu| < 100)=\Pr(-100 < \tilde \mu - \mu < 100)=\Pr(\tilde \mu - 100 < \mu < \tilde \mu + 100)$
$\approx 1 - 2(0.069) = 0.862$

Then we conclude that there is an 86.2% chance that the mean is within the interval $(\tilde \mu - 100, \tilde \mu + 100)$.
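The calculation can be reproduced numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module as the G(0,1) distribution; the helper names `upper_tail` and `prob_within` are made up for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # the G(0,1) distribution; its cdf is Phi

def upper_tail(margin, standard_error):
    """Pr(estimator - mean > margin) for a Gaussian sampling distribution."""
    return 1.0 - STANDARD_NORMAL.cdf(margin / standard_error)

def prob_within(margin, standard_error):
    """Pr(|estimator - mean| < margin): one minus the two equal tails."""
    return 1.0 - 2.0 * upper_tail(margin, standard_error)

standard_error = 67.5  # sigma / sqrt(n), as given in the example
print("one tail:", round(upper_tail(100, standard_error), 3))
print("within 100 of the mean:", round(prob_within(100, standard_error), 3))
```

The one-tail value agrees with the 0.069 obtained above. The probability of being within the interval is whatever remains after both tails are removed.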

In another practice, the confidence interval is used to find the interval as opposed to the probability. In this case, you are given the probability (say, 95%) that a parameter falls within an interval, and from there, the interval itself must be found. Then, one can conclude that, 95% of the time, an interval built in this way will contain the parameter.
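This reverse calculation can be sketched in the same way. The example below reuses the value sigma/sqrt(n) = 67.5 from the earlier example and assumes the same Gaussian sampling distribution; the function name `margin_for_confidence` is made up for this illustration.

```python
from statistics import NormalDist

def margin_for_confidence(confidence, standard_error):
    """Half-width y such that Pr(|estimator - mean| < y) equals `confidence`."""
    # A central probability of `confidence` leaves (1 - confidence) / 2 in each
    # tail, so the cut-off is the quantile at 0.5 + confidence / 2.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * standard_error

margin = margin_for_confidence(0.95, 67.5)
print("95% interval: estimator plus or minus", round(margin, 1))
```

For a 95% confidence level the multiplier is about 1.96, so the interval is the estimator plus or minus roughly 1.96 times sigma/sqrt(n).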