Sampling Distribution

Let's say you want to know the average GPA of high school seniors in Atlanta, Georgia. To calculate the exact value, you would need to ask the entire population, that is, every senior student in Atlanta, for their GPA. That sounds exhausting! But what if you asked only a sample of the senior students instead? This is the idea behind sampling distributions.

In this article, you'll find the definition of sampling distributions, types of sampling distributions, the formulas, the mean and the standard deviation of sampling distributions, and examples of application.

Introduction to Sampling Distributions

Coming back to the example above, let's say you randomly select \(100\) senior students and calculate the average GPA of this sample. This sample average would most likely not equal the mean GPA of all senior students in Atlanta: it could be lower or higher.

If you select a second sample of \(100\) senior students, its average GPA would most likely differ from that of your first sample. In general, different random samples produce different sample means. If you collect many sample means and plot them on a graph, they cluster around the population mean, so together they provide an estimate of it. This is the idea behind the sampling distribution of the mean.

Definition of Sampling Distributions

A value that is calculated from the data in a sample is called a statistic. Statistics allow you to estimate characteristics (parameters) of an entire population. As you saw in the example above, different random samples can give different values of a statistic; this variation is called sampling variability (or sampling error). Sampling variability can be reduced by increasing the sample size.

The distribution of the values of a statistic over every possible sample of a given size from a population is called the sampling distribution of that statistic.
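To make the definition concrete, here is a minimal Python sketch that builds a sampling distribution exactly by listing every possible sample. It assumes a made-up population of four values, \(\{2, 4, 6, 8\}\), and samples of size \(2\) drawn without replacement; real populations are far too large to enumerate this way.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Made-up tiny population (not data from the article)
population = [2, 4, 6, 8]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement
samples = list(combinations(population, n))

# The statistic of interest: the mean of each sample
sample_means = [sum(s) / n for s in samples]

# The sampling distribution: each possible value of the statistic and how often it occurs
sampling_distribution = Counter(sample_means)

for value, count in sorted(sampling_distribution.items()):
    print(f"x-bar = {value}: {count} of {len(samples)} samples")

# Compare the mean of all sample means with the population mean
print(mean(sample_means) == mean(population))
```

Each of the six samples is equally likely, so the counts describe the sampling distribution of \(\overline{x}\). The average of the six sample means is \(5\), the same as the population mean.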

Conditions for Sampling Distributions

For the sampling distribution to give reliable information about the entire population, check these two conditions:

  1. Randomization condition: your data must come from randomly selected samples. This is the most important condition for creating a sampling distribution.

  2. Independence (\(10\%\) condition): the sampled values must be independent of one another. When you sample without replacement, you can treat the values as independent as long as the sample size is no larger than \(10\%\) of the entire population.

Let's go back to the average GPA example. The randomization condition is satisfied as long as the \(100\) students are chosen at random, and not, for example, picked from a list of the students with the highest GPAs in Atlanta.

For the independence condition, it is reasonable to assume that there are more than \(10\,000\) senior students in Atlanta, and \(10\%\) of \(10\,000\) is \(1\,000\). Any sample of at most \(1\,000\) students therefore satisfies this condition, so a sample of \(100\) students is acceptable.
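The \(10\%\) check is plain arithmetic. Below is a minimal sketch; the function name is hypothetical, and the population size of \(10\,000\) is the lower bound assumed above rather than a known count.

```python
def satisfies_ten_percent_condition(sample_size: int, population_size: int) -> bool:
    """Return True if the sample is no larger than 10% of the population.

    Integer arithmetic (10 * n <= N) avoids floating-point rounding at the boundary.
    """
    return 10 * sample_size <= population_size


# Assumed population size: the lower bound of 10,000 senior students from the example
population_size = 10_000
for n in (100, 1_000, 1_500):
    print(n, satisfies_ten_percent_condition(n, population_size))
```

Comparing \(10n\) with \(N\) instead of \(n\) with \(0.1N\) keeps the test exact, so a sample of exactly \(1\,000\) is accepted.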

Types of Sampling Distributions

Three common types of sampling distributions are:

  1. Sampling distribution of proportions

  2. Sampling distribution of means

  3. T-distribution

Sampling Distribution of Proportions

It is used to estimate a population proportion, that is, the proportion of successes (the chance that a specific event occurs) in the population. Each sample gives a sample proportion, and the mean of these sample proportions estimates the proportion of successes in the entire population.

Sampling Distribution of Means

It is built by calculating the mean of every sample of a given size drawn from a population. The average of these sample means estimates the mean of the entire population.

T-distribution

It is used when the sample is small and the population standard deviation is unknown. It is used to estimate the population mean and in other statistical procedures such as confidence intervals, linear regression, and tests for differences between means. Since this distribution uses \(t\)-scores to calculate probabilities, it is out of the scope of this article.

Formula for Sampling Distributions

The sample proportion, denoted by \(\widehat{p}\), is calculated by counting the successes in the sample (a success means that an individual has the characteristic of interest) and dividing by the sample size \(n\):

\[\widehat{p}=\frac{\text{number of successes in the sample}}{n}.\]

The sample mean, denoted by \(\overline{x}\), is calculated by adding up all the values obtained from the sample and dividing by the total sample size \(n\). The idea is the same as finding the average for a set of data. The formula is

\[\overline{x}=\frac{x_1+x_2+...+x_n}{n},\]

where \(\overline{x}\) is the sample mean, \(x_i\) is each one of the values of the sample, and \(n\) is the sample size.
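As a small illustration of both formulas, the sketch below computes \(\overline{x}\) and \(\widehat{p}\) for a made-up sample of eight GPAs. Treating a GPA of at least \(3.5\) as a "success" is an assumption chosen only for this example.

```python
# Made-up sample of 8 GPAs (illustrative values, not real data)
gpas = [3.2, 2.8, 3.9, 3.5, 2.5, 3.0, 3.7, 3.4]
n = len(gpas)

# Sample mean: add up all values and divide by the sample size n
x_bar = sum(gpas) / n

# Sample proportion: count successes (assumed here: GPA >= 3.5) and divide by n
successes = sum(1 for g in gpas if g >= 3.5)
p_hat = successes / n

print(f"x-bar = {x_bar:.3f}")
print(f"p-hat = {p_hat:.3f}")
```

The eight values add up to \(26\), so \(\overline{x}=26/8=3.25\); three of them are at least \(3.5\), so \(\widehat{p}=3/8=0.375\).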

Mean and Standard Deviation of Sampling Distributions

All probability distributions have characteristics that distinguish them, and sampling distributions are no exception: the mean tells you where the distribution is centered, and the standard deviation tells you how spread out it is.

Mean and Standard Deviation of the Sample Proportion

Let \(p\) be the proportion of successes in a population and \(\widehat{p}\) the sample proportion, that is, the proportion of successes in a random sample of size \(n\). Then the sampling distribution of \(\widehat{p}\) has mean and standard deviation given by \[\mu_{\widehat{p}}=p\,\text{ and }\, \sigma_{\widehat{p}}=\sqrt{\frac{p(1-p)}{n}}.\]

Moreover, if \[np\geq 10\,\text{ and }\, n(1-p)\geq 10,\] then the sampling distribution of \(\widehat{p}\) is approximately normal.
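The formulas can be checked by simulation. The sketch below assumes illustrative values \(p=0.30\) and \(n=100\), draws \(10\,000\) random samples with a fixed seed, and compares the mean and standard deviation of the simulated \(\widehat{p}\) values with \(p\) and \(\sqrt{p(1-p)/n}\). The helper names are hypothetical, and the simulated values will be close to, but not exactly equal to, the formula values.

```python
import math
import random
from statistics import mean, pstdev


def p_hat_mean_and_sd(p: float, n: int) -> tuple[float, float]:
    """Mean and standard deviation of the sampling distribution of p-hat."""
    return p, math.sqrt(p * (1 - p) / n)


def normal_approx_ok(p: float, n: int) -> bool:
    """Condition for the normal approximation: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    # Each individual is a success with probability p; sum of booleans counts successes
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.30, 100  # illustrative values
mu, sigma = p_hat_mean_and_sd(p, n)
p_hats = simulate_p_hats(p, n, reps=10_000)

print(f"formula:   mean = {mu:.4f}, sd = {sigma:.4f}")
print(f"simulated: mean = {mean(p_hats):.4f}, sd = {pstdev(p_hats):.4f}")
print("normal approximation reasonable:", normal_approx_ok(p, n))
```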

A random sample is selected from a population that has a proportion of successes \(p=0.72\). Calculate the mean and standard deviation of the sampling distribution of \(\widehat{p}\) with sample size \(n=20\).

Solution:

Using the formulas above, the mean equals the population proportion of successes, so \[\mu_{\widehat{p}}=0.72,\] and the standard deviation is \[\sigma_{\widehat{p}}=\sqrt{\frac{0.72(0.28)}{20}}\approx 0.100.\]
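A quick check of this example in Python, which also evaluates the two conditions for the normal approximation (variable names are illustrative):

```python
import math

p, n = 0.72, 20

mu_p_hat = p                               # mean of the sampling distribution of p-hat
sigma_p_hat = math.sqrt(p * (1 - p) / n)   # standard deviation of p-hat

print(f"mean = {mu_p_hat}, sd = {sigma_p_hat:.3f}")  # sd rounds to 0.100

# Conditions for the normal approximation: np >= 10 and n(1 - p) >= 10
print("np =", round(n * p, 2), " n(1-p) =", round(n * (1 - p), 2))
```

Here \(np=14.4\) but \(n(1-p)=5.6<10\), so the second condition fails: the mean and standard deviation above are still correct, but with \(n=20\) the sampling distribution of \(\widehat{p}\) should not be treated as approximately normal.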

Mean and Standard Deviation of the Sample Mean

Let \(\mu\) be the mean and \(\sigma\) the standard deviation of the population, and let \(\overline{x}\) be the mean of a random sample of size \(n\). Then the sampling distribution of \(\overline{x}\) has mean and standard deviation given by \[\mu_{\overline{x}}=\mu\,\text{ and }\, \sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}.\]

The standard deviation of the sampling distribution of means is also known as the standard error of the mean (SEM).
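Because the standard error is \(\sigma/\sqrt{n}\), it shrinks as the sample size grows, but only with the square root of \(n\). A minimal sketch, assuming an illustrative population standard deviation \(\sigma=5\) and a hypothetical helper name:

```python
import math


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard error of the mean (SEM): sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 5.0  # illustrative population standard deviation
for n in (25, 100, 400):
    print(n, standard_error_of_mean(sigma, n))
```

Quadrupling the sample size from \(25\) to \(100\) halves the standard error from \(1\) to \(0.5\), and quadrupling it again halves it to \(0.25\).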

If the sample size \(n\) is large enough (a common rule of thumb based on the Central Limit Theorem is \(n\geq 30\)), the sampling distribution of \(\overline{x}\) is approximately normal.
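The effect described by the Central Limit Theorem can be seen in a simulation. The sketch below assumes a strongly skewed population, an exponential distribution with mean \(1\) and standard deviation \(1\), and uses samples of size \(n=35\); the seed, the number of repetitions, and the helper `skewness` are choices made only for this illustration.

```python
import math
import random
from statistics import mean, pstdev


def skewness(values: list[float]) -> float:
    """Population skewness: average cubed z-score (0 for a symmetric distribution)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(42)
n, reps = 35, 10_000
mu, sigma = 1.0, 1.0  # mean and sd of an exponential population with rate 1

# Draw many samples of size n from the skewed population and record each sample mean
x_bars = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

sem = sigma / math.sqrt(n)
# Share of sample means within 1.96 standard errors of mu (about 95% if roughly normal)
within = sum(abs(x - mu) <= 1.96 * sem for x in x_bars) / reps

print(f"mean of x-bars: {mean(x_bars):.3f} (population mean {mu})")
print(f"sd of x-bars:   {pstdev(x_bars):.3f} (sigma/sqrt(n) = {sem:.3f})")
print(f"skewness of x-bars: {skewness(x_bars):.2f} (population skewness 2)")
print(f"share within 1.96 SE of mu: {within:.3f}")
```

The population has skewness \(2\), while the sample means should show a skewness near \(2/\sqrt{35}\approx 0.34\), and about \(95\%\) of them should fall within \(1.96\) standard errors of the population mean, as a normal distribution would predict.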

A random sample is selected from a population with mean \(\mu=80\) and standard deviation \(\sigma=5\). Calculate the mean and standard deviation of the sampling distribution of \(\overline{x}\) with sample size \(n=35\).

Solution:

Using the formulas above, the mean of the sampling distribution of \(\overline{x}\) equals the population mean, so \[\mu_{\overline{x}}=80,\] and its standard deviation is

\[\sigma_\overline{x}=\frac{5}{\sqrt{35}}\approx 0.845.\]
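The same arithmetic in Python, as a quick check (variable names are illustrative):

```python
import math

mu, sigma, n = 80, 5, 35

mu_x_bar = mu                       # mean of the sampling distribution of x-bar
sigma_x_bar = sigma / math.sqrt(n)  # standard error of the mean

print(mu_x_bar, round(sigma_x_bar, 3))  # 80 0.845
```

Since \(n=35\geq 30\), the rule of thumb above also suggests that this sampling distribution is approximately normal.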

Examples of Sampling Distributions

Let's see an example using sampling distributions.

A restaurant states that \(30\%\) of its customers order pineapple on their pizza. If there are \(100\) customers on a given day, what is the probability that at least \(40\%\) of them will order a pizza with pineapple?

Solution:

(1) Note that \(p=0.30\), \((1-p)=0.70\) and the sample size is \(n=100\). Thus, the mean \(\mu_\widehat{p}=0.30\) and the standard deviation \[\sigma_{\widehat{p}}=\sqrt{\frac{(0.30)(0.70)}{100}}\approx 0.046.\]

(2) Since \(np=100(0.30)=30\geq 10\) and \(n(1-p)=100(0.70)=70\geq 10\), the sampling distribution of \(\widehat{p}\) is approximately normal, so you can use the normal distribution to calculate the probability.

(3) Converting \(\widehat{p}\) into a \(z\)-score (see the article \(z\)-scores for more details) gives

\[\begin{align} P(\widehat{p}\geq 0.40) &= P\left(z\geq\frac{0.40-0.30}{0.046}\right) \\ &=P(z\geq 2.17) \\ & =1-P(z<2.17) \\ &= 1-0.9850 \\ &=0.015.\end{align}\]

Thus, the probability that at least \(40\%\) of these customers order a pizza with pineapple is approximately \(0.015\).
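The calculation can be reproduced with Python's standard library, using `statistics.NormalDist` (available since Python 3.8) in place of the \(z\)-table. This sketch keeps the standard deviation unrounded, so its \(z\)-score is about \(2.18\) rather than \(2.17\).

```python
import math
from statistics import NormalDist

p, n = 0.30, 100
threshold = 0.40  # "at least 40%" of the customers

sigma_p_hat = math.sqrt(p * (1 - p) / n)

# The normal approximation is justified because np >= 10 and n(1 - p) >= 10
assert n * p >= 10 and n * (1 - p) >= 10

z = (threshold - p) / sigma_p_hat
prob = 1 - NormalDist().cdf(z)  # P(Z >= z) under the standard normal

print(f"sd = {sigma_p_hat:.4f}, z = {z:.2f}, P = {prob:.4f}")
```

The resulting probability is roughly \(0.0145\), which still rounds to \(0.015\).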

Let's see one extra example.

A company claims that the average lifetime of its lightbulbs is \(2\,000\) hours with a standard deviation of \(300\) hours. What is the probability that a random sample of \(50\) lightbulbs has an average lifetime of less than \(1\,900\) hours?

Solution:

(1) Since the sample size is \(n=50\geq 30\), the Central Limit Theorem tells you that the sampling distribution of the mean \(\overline{x}\) is approximately normal with mean \(\mu_{\overline{x}}=2\,000\) and standard deviation \[\sigma_{\overline{x}}=\frac{300}{\sqrt{50}} \approx 42.426.\]

(2) Converting \(\overline{x}\) into a \(z\)-score and using the standard normal table (see the article Standard Normal Distribution for more information), you get

\[\begin{align} P(\overline{x}<1\,900) &=P\left(z<\frac{1\,900-2\,000}{42.426}\right) \\ &=P(z<-2.36) \\ &= 0.0091. \end{align}\]

Thus, the probability that a random sample of \(n=50\) lightbulbs has an average lifetime of less than \(1\,900\) hours is approximately \(0.0091\).
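As a check, the sketch below computes this probability with `statistics.NormalDist` (Python 3.8+) instead of a \(z\)-table, without rounding the \(z\)-score:

```python
import math
from statistics import NormalDist

mu, sigma, n = 2000, 300, 50
cutoff = 1900

sem = sigma / math.sqrt(n)           # standard deviation of x-bar
sampling_dist = NormalDist(mu, sem)  # CLT: x-bar is approximately normal
prob = sampling_dist.cdf(cutoff)     # P(x-bar < 1900)

z = (cutoff - mu) / sem
print(f"SEM = {sem:.3f}, z = {z:.3f}, P = {prob:.4f}")
```

The unrounded probability is about \(0.0092\); it differs slightly from the table-based value because the table requires rounding \(z\) to two decimal places.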

Sampling Distribution - Key takeaways

  • A sampling distribution shows the values a statistic can take over all possible samples of a given size from the population, and how often each value occurs.
  • The sampling distribution of the sample proportion \(\widehat{p}\) has mean and standard deviation \[\mu_{\widehat{p}}=p\, \text{ and } \,\sigma_{\widehat{p}}=\sqrt{\frac{p(1-p)}{n}}.\]
  • When \(np\geq 10\) and \(n(1-p)\geq 10,\) the sampling distribution of \(\widehat{p}\) is approximately normal.
  • The sampling distribution of the sample mean \(\overline{x}\) has mean and standard deviation \[\mu_{\overline{x}}=\mu\,\text{ and }\, \sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}.\]
  • When \(n\geq 30\) (a common rule of thumb based on the Central Limit Theorem), the sampling distribution of \(\overline{x}\) is approximately normal.

Frequently Asked Questions about Sampling Distribution

What is a sampling distribution?

A sampling distribution is a statistical tool that helps you determine the probability of an event, or estimate another parameter of a population, based on random samples taken from it.

What are the types of sampling distributions?

✓ Sampling distribution of proportions

✓ Sampling distribution of means

✓ T-distribution

How do you find the sampling distribution?

To find the sampling distribution, follow these steps:

  1. select random samples of fixed size from the population;
  2. compute the statistic of interest (for example, the mean) for each sample;
  3. plot the distribution of the summary data.
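The three steps above can be sketched as a simulation. Everything in it is assumed for illustration: a made-up population of \(10\,000\) GPAs drawn uniformly between \(2.0\) and \(4.0\), \(1\,000\) random samples of size \(n=100\), and a text histogram standing in for a plot so that no plotting library is needed.

```python
import random
from collections import Counter
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is reproducible

# Made-up population: 10,000 GPAs spread uniformly between 2.0 and 4.0
population = [round(rng.uniform(2.0, 4.0), 2) for _ in range(10_000)]

# Step 1: select random samples of a fixed size (100 is within the 10% condition)
n, num_samples = 100, 1_000
samples = [rng.sample(population, n) for _ in range(num_samples)]

# Step 2: summarize each sample with the statistic of interest, here the mean
sample_means = [sum(s) / n for s in samples]

# Step 3: plot the distribution, here as a text histogram with bins of width 0.05
bin_width = 0.05
bins = Counter(round(m / bin_width) for m in sample_means)
for k in sorted(bins):
    print(f"{k * bin_width:.2f} | {'#' * (bins[k] // 10)}")

print("population mean:     ", round(mean(population), 3))
print("mean of sample means:", round(mean(sample_means), 3))
```

The histogram should be roughly bell-shaped and centered near the population mean, even though the made-up population itself is flat rather than bell-shaped.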

What are the properties of the sampling distribution of the mean?

✓ The sample mean is an unbiased estimator of the population mean.

✓ The sampling distribution is centered on the true population mean.

✓ The distribution is approximately normal, and therefore roughly symmetric, when the sample size is large enough (at least \(30\), as a rule of thumb based on the Central Limit Theorem).

Why is the sampling distribution important?

The sampling distribution allows you to draw conclusions about an entire population using only information from samples.
