Estimator Bias

Estimation is something you use every day without even thinking about it.  You estimate how long it will take you to get to work, how much salt to add to your cooking, and how well your favourite football team will do.  That doesn't mean you are always right!  So how do you know how good your estimate is?  How can you tell if they are biased or not?


This is where the statistician's notion of estimator bias comes in. Since your estimate is based on an average of how things have gone in the past, you can use an estimator for the average and, from there, work out how biased or unbiased it is.

Comparing estimators and finding the variance or standard error of an estimator are explained in the article Quality of Estimators.

Definition of the Bias of an estimator

Say, for example, you wanted to find the mean length of fish in an aquarium. Not only are there a huge number of fish you'd need to measure, but it's also very difficult to catch and measure all the fish.

Instead of measuring every single fish in the population (which is referred to as a census), a better approach would be to take a sample of fish, and from that sample find an estimate for the mean length of the fish. This is referred to as an estimator.

First, however, you need to know what a statistic is.

The statistic, \(T\), is composed of \(n\) samples of a random variable \(X\) (i.e. \(X_1,X_2,X_3,\dots,X_n\)). These observations are independent and identically distributed.

Often these are called test statistics to distinguish them from the everyday use of the word "statistics". Mathematically, this means that the statistic used to estimate a parameter, \(T\), will be built from \(n\) independent random samples taken from a random variable, \(X\).

An estimator is a statistic used to estimate a population parameter. An estimate is the value of the estimator when taken from a sample.

You might also see an estimator called a point estimate. It is important to be able to recognise what estimators are. Have a look at the following example.

Explain why the following functions are or are not estimators where \(X_1, X_2,...,X_n\) are taken from a population with parameters \(\mu\) and \(\sigma\).

i) \(\dfrac{X_3+X_6}{2}\)

ii) \(\dfrac{\sum(X_i-\mu)^2}{n}\)

Solution:

i) The function

\[\dfrac{X_3+X_6}{2}\]

is an estimator, since it is a function only of independent, identically distributed samples.

ii) On the other hand,

\[\dfrac{\sum(X_i-\mu)^2}{n}\]

is not an estimator since it contains \(\mu\) which is not a sample. In fact, this potential estimator is not even a statistic. The variable \(\mu\) is the population parameter! You can't use a formula involving the population parameter to estimate the population parameter.
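To make the estimator/estimate distinction concrete, here is a small Python sketch using the fish-length scenario from earlier (the population numbers are made up for illustration): the sample mean is the estimator, and its value on one particular sample is the estimate.

```python
import random

random.seed(42)

# Hypothetical population: lengths (in cm) of 10,000 fish in an aquarium.
population = [random.gauss(30, 5) for _ in range(10_000)]

# The estimator is the *rule* (here, the sample mean); the estimate is
# the value that rule takes on one particular sample.
sample = random.sample(population, 50)
estimate = sum(sample) / len(sample)

print(f"estimate of the mean length: {estimate:.1f} cm")
```

A different sample would give a different estimate, but the estimator (the rule "average the sample") stays the same.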

Let's take a look at a quick overview.

Overview of estimator bias

Not all statistics are reliable estimators. To determine the validity of a statistic's ability to estimate a parameter, you will need to find the expected value of the statistic.

If the expectation of the statistic is different to the parameter that you want to estimate, then this tells you that the statistic is biased.

You can think of bias as a measure of how far, on average, your estimator lands from the population parameter. It is also related to how skewed the sampling distribution is: the more skewed the sampling distribution, the greater the bias tends to be.

For more information on skew, see the article Skewness.

Bias of an estimator explanation

You can write the definition of an estimate being biased or unbiased using simple mathematical notation.

If \(\hat{\theta}\) is a statistic used to estimate population parameter \(\theta\), \(\hat{\theta}\) is unbiased when

\[\text {E}(\hat{\theta})=\theta\]

where \(\text{E}\) is the notation for expected value. Any statistic which is not unbiased is called biased.

If \(\hat{\theta}\) is biased, the size of the bias can be found using the following formula:

\[\text{Bias}(\hat{\theta})=\text{E}(\hat{\theta})-\theta.\]

Notice that if \(\text{E}(\hat{\theta})=\theta \) then \(\text{Bias}(\hat{\theta})=0\).

Let's put the definition to use.

Show that the sample mean

\[\bar{X}=\frac{X_1+X_2+\dots+X_n}{n} \]

is an unbiased estimator of the population mean; in other words, show that \(\text{E}(\bar{X})=\mu\).

Solution:

Keeping in mind that \(\text {E}(aX)=a\,\text {E}(X)\), you have

\[\begin{align}\text {E}(\bar{X})&=\frac{1}{n}\text{E}(X_1+\dots +X_n)\\&=\frac{1}{n}(\text {E}(X_1)+\dots +\text {E}(X_n))\end{align}\]

Since \(\text {E}(X_i)=\mu\) for all \(i\), you have

\[ \begin{align} \text {E} (\bar{X}) &= \frac{\mu +\mu +\dots + \mu}{n} \\ &= \frac{n \mu}{n}\\ &=\mu .\end{align}\]

This shows that \(\text {E}(\bar{X})=\mu\), which means \(\bar{X}\) is an unbiased estimator of parameter \(\mu\). This means that on average, this statistic will give the correct value for the estimated parameter.

For a reminder on why \(\text {E}(aX)=a\,\text {E}(X)\), see the article Sum of Independent Random Variables.

The fact that the previous example gives you an unbiased estimator is why you will see it used to construct confidence intervals.
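You can also check this result numerically. The sketch below (with hypothetical parameter values) averages the sample mean over many simulated samples; by the law of large numbers this approximates \(\text{E}(\bar{X})\), which should land very close to \(\mu\).

```python
import random

random.seed(0)
mu, sigma, n, trials = 10.0, 2.0, 5, 20_000

# Approximate E(X-bar) by averaging the estimator over many samples.
estimates = []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    estimates.append(sum(xs) / n)

bias = sum(estimates) / trials - mu
print(f"estimated bias of the sample mean: {bias:.3f}")  # close to 0
```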

Estimator Bias example

Not all estimators are unbiased!

You are given

\[T=\frac{X_1+2X_2}{n}\]

as a candidate for an estimator of the parameter for the mean of a distribution, \(t\), where \(n\) is the total number of samples taken. Find the bias of this statistic.

Solution:

In this problem, the population parameter is the mean, \(t\). So to find the bias, you can use the formula

\[\text{Bias}(T)=\text {E}(T)-t,\]

giving you

\[ \begin{align} \text{Bias} (T) &= \text {E} \left(\frac{X_1+2X_2}{n}\right) -t \\&= \frac{\text {E} (X_1)+2\text {E} (X_2)}{n} -t \\&= \frac{3t}{n}-t\\&= \frac{t(3-n)}{n} .\end{align}\]

Therefore the bias of estimator \(T\) is

\[\text{Bias}(T) = \dfrac{t(3-n)}{n}.\]
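A quick simulation confirms the formula. With the hypothetical choices \(t = 4\) and \(n = 5\), the theoretical bias is \(4(3-5)/5 = -1.6\):

```python
import random

random.seed(1)
t, n, trials = 4.0, 5, 50_000

# T = (X1 + 2*X2)/n uses only the first two of the n samples.
values = []
for _ in range(trials):
    xs = [random.expovariate(1 / t) for _ in range(n)]  # each X_i has mean t
    values.append((xs[0] + 2 * xs[1]) / n)

bias = sum(values) / trials - t
print(f"simulated bias: {bias:.2f}")  # theory: t*(3 - n)/n = -1.6
```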

Bias of estimator formula

While the sample mean gives one unbiased estimator, it is not the only one. Let's look at applying the bias formula to an estimator of the variance instead.

To find an estimator for the population variance, you may try to use the variance of the sample which would be denoted as

\[V=\frac{\sum\limits_{i=1}^n(X_i-\bar{X})^2}{n}.\]

However, since this formula uses the sample mean, \(\bar{X}\), rather than \(\mu\), the population mean, it underestimates the spread of the population: the observations are, on average, closer to their own sample mean than to \(\mu\), so \(V\) is a biased estimator of the population variance.

Instead, you can use a different statistic: the sample variance. This will give you an unbiased estimator for the population variance, \(\sigma^2\).

An unbiased estimator for the population variance, \(\sigma ^2\), is the sample variance, \(S^2\):

\[S^2=\frac{\sum\limits^n_{i=1} (X_i-\bar{X})^2}{n-1}.\]

This formula isn't always the easiest to use when calculating the sample variance. There are other ways to find \(s^2\).

These are the ways that you can calculate the sample variance:

\[\begin{align} s^2 &= \frac{\sum\limits^n_{i=1} (x_i-\bar{x})^2}{n-1} \\&= \frac{\sum\limits_{i=1}^n x_i^2-n\bar{x}^2}{n-1} \\&=\frac{S_{xx}}{n-1} .\end{align} \]

In general, \(S^2\) is used to denote the estimator for the population variance, and \(s^2\) is used to denote a particular estimate. It's worth learning the above two equivalent formulas as they are significantly easier to apply than the first one.
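All three formulas give the same number, which a short check on some made-up data confirms:

```python
data = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]  # made-up observations
n = len(data)
xbar = sum(data) / n

# Formula 1: the definition.
s2_def = sum((x - xbar) ** 2 for x in data) / (n - 1)

# Formula 2: the sum-of-squares shortcut.
s2_short = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

# Formula 3: via S_xx (the numerator of formula 1).
s_xx = sum((x - xbar) ** 2 for x in data)
s2_sxx = s_xx / (n - 1)

assert abs(s2_def - s2_short) < 1e-9 and abs(s2_def - s2_sxx) < 1e-9
print(f"sample variance: {s2_def:.3f}")
```

In practice the shortcut form is handy when you already have the running totals \(\sum x_i\) and \(\sum x_i^2\).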

Let's take a look at the proof that \(S^2\) is an unbiased estimator for \( \sigma ^2\). In other words, the goal is to show that \(\text {E}(S^2)=\sigma ^2\).

To do this, you need to write the expectation of the sample variance

\[\text{E}(S^2) = \frac{\text{E}\left(\sum\limits_{i=1}^n X_i^2-n\bar{X}^2\right)}{n-1} \]

in terms of \(\sigma\) and \(\mu\). Notice that you have already used one of the alternate ways of calculating the sample variance.

First, using the definition of \(\sigma ^2\), you have

\[\begin{align} \sigma ^2 &=\text{Var}(X) \\ &=\text {E}(X^2)-\mu ^2, \end{align} \]

therefore \(\text{E}(X^2)=\sigma ^2 +\mu ^2.\)

You also know that \(\text{Var}(\bar{X})=\dfrac{\sigma ^2}{n}\) and \(\text{E}(\bar{X})=\mu\), so you can write \(\text{Var}(\bar{X})\) as

\[\begin{align} \text{Var}(\bar{X}) &= \frac{\sigma ^2}{n} \\ &=\text {E}(\bar{X} ^2)-\mu ^2, \end{align}\]

so

\[\text {E}(\bar{X}^2)=\frac{\sigma ^2}{n}+\mu ^2.\]

The expectation of the sample variance is given by:

\[\begin{align} \text {E}(S^2) &= \frac{ \text {E}\left(\sum\limits_{i=1}^n X_i^2-n\bar{X}^2\right)}{n-1} \\&= \frac{ \text {E}\left(\sum\limits_{i=1}^n X_i^2\right)-\text {E}(n\bar{X}^2)}{n-1} .\end{align} \]

Since

\[\begin{align} \text {E}\left(\sum\limits_{i=1}^n X_i^2\right)&=\sum\limits_{i=1}^n \text {E}(X_i^2)\\ &=n\text {E}(X^2), \end{align}\]

you have

\[\begin{align} \text {E}(S^2) &= \frac{ n\text {E}(X^2)-\text {E}(n\bar{X}^2)}{n-1} \\ &= \frac{n(\sigma ^2 +\mu ^2)-n\left(\dfrac{\sigma ^2}{n} +\mu ^2\right)}{n-1}\\ &=\frac{n\sigma^2 +n\mu ^2 -\sigma ^2 -n\mu ^2 }{n-1} \\&= \frac{(n-1)\sigma ^2}{n-1} \\ &=\sigma^2 . \end{align} \]

Since \(\text {E}(S^2)=\sigma ^2\), you have shown that \(S^2\) is an unbiased estimator for the population variance, \(\sigma ^2\).
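The proof can be backed up with a simulation (hypothetical parameters): dividing the sum of squared deviations by \(n\) systematically undershoots \(\sigma^2\) by the factor \((n-1)/n\), while dividing by \(n-1\) hits it on average.

```python
import random

random.seed(7)
mu, sigma2, n, trials = 0.0, 9.0, 4, 40_000
sigma = sigma2 ** 0.5

v_vals, s2_vals = [], []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    v_vals.append(ss / n)          # divide by n: biased
    s2_vals.append(ss / (n - 1))   # divide by n - 1: unbiased

mean_v = sum(v_vals) / trials      # close to (n-1)/n * sigma2 = 6.75
mean_s2 = sum(s2_vals) / trials    # close to sigma2 = 9.0
print(f"E(V) approx {mean_v:.2f}, E(S^2) approx {mean_s2:.2f}")
```

With \(n = 4\) the shortfall of the biased version is easy to see; as \(n\) grows, the factor \((n-1)/n\) approaches \(1\) and the two estimators converge.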

While you may not need to memorise the proof, it is always good to read and understand the steps to ensure you have a good understanding of the topic.

Estimator Bias - Key takeaways

  • An estimator is a statistic used to estimate a population parameter. An estimate is the value of the estimator when taken from a sample.
  • The statistic, \(T\), is composed of \(n\) samples of a random variable \(X\) (i.e. \(X_1,X_2,X_3,\dots ,X_n\)). These observations are independent and identically distributed.
  • If \(\hat{\theta}\) is a statistic used to estimate population parameter \(\theta\), \(\hat{\theta}\) is unbiased when \(\text {E}(\hat{\theta})=\theta\).
  • If \(\hat{\theta}\) is biased, the bias can be quantified using the following formula:\[\text{Bias}(\hat{\theta})=\text {E}(\hat{\theta})-\theta.\]

Frequently Asked Questions about Estimator Bias

Biased estimators are those where the expectation of the statistic is different from the parameter that you want to estimate.

The bias of an estimator is the difference between the expectation of the estimator and the parameter it is supposed to estimate.

If an estimator is biased, then it will have an expected value that is different than the parameter it is supposed to estimate.

The bias of an estimator is whether it is, on average, different from the parameter it is supposed to estimate. The variance of an estimator is how consistent the estimator is.

No, estimators are not always unbiased. It is preferable to use an unbiased estimator, but sometimes a biased estimator may be the best option available.

