Comparing Two Means Hypothesis Testing

When facing different scenarios, you will need to adapt your hypothesis testing method. One scenario that frequently arises is one where you wish to test whether there is a difference between two means. You might have done this already using the normal distribution. But what happens if you don't know the variances of these populations and your sample sizes are small?

That's where the \(t\)-distribution comes in. This article will take you through a hypothesis test for the difference in means of two independent, normally distributed populations.

Comparing Two Means: Hypothesis Testing Overview

The \(t\)-distribution can also be used to test the means of two independent normal distributions when the variances are unknown and the sample sizes are small. To do so, you will need to assume the populations have the same variance and therefore need to use a pooled estimate of variance.

For a reminder on the \(t\)-distribution and its properties, see the article T-distribution.

Unlike the paired \(t\)-test, where you are comparing the results of an experiment before and after some treatment, here you are comparing two independent normal distributions.

Describe the kind of hypothesis test you would use in the following scenarios.

1. A mobile phone company has released a new software update. They have asked you to find statistical evidence to support their claim that the software update has improved battery life.

2. A pet store sells Welsh Corgi puppies from two different breeders. They wish to determine whether there is a significant difference between the weights of the puppies from each breeder.

Solution

1. In order to conduct this experiment, you would need to collect samples of information on phone battery life before and after the software update. Since the samples will be taken from the same population after a change has been made, they are not independent. Therefore, you need to use a paired t-test.

2. In this case, you would be required to take samples of weights from two different breeders, and therefore from two independent distributions. Assuming that the populations have the same variance, you will need to use a pooled estimate of variance to find the \(t\)-value, rather than a paired \(t\)-test.

Hypothesis testing for the difference of two means

The hypothesis test for the difference of two means follows these steps:

  1. Find the null hypothesis and alternative hypothesis, \(H_0\) and \(H_1\).

  2. Determine the significance level from the question, \(\alpha\).

  3. Determine the number of degrees of freedom, \(\upsilon\).

  4. Find the critical region.

  5. Calculate the pooled estimate of the variance, \(s^2_p\).

  6. Calculate \(t\).

  7. Compare the value of \(t\) with your critical region and state your conclusion, addressing whether the result is significant, and what this means in the context of the question.

Next let's take a look at the hypotheses you will need to do the test.

Null hypothesis for comparing two means

When comparing two means, your null hypothesis will state that the difference between the means of the two populations you are testing is equal to zero. In other words, the null hypothesis is that there is no difference in the population means.

Samples are taken from two distributions, \(X\) and \(Y\), under the assumption that they are independent and normally distributed.

To perform a hypothesis test for the difference between the means of these distributions, use the following null hypothesis,

\[H_0:\, \mu _x =\mu _y.\]

What about the alternative hypothesis?

Alternative hypothesis for comparing two means

The alternative hypothesis for comparing two means will depend on whether you wish to test whether the mean of one particular distribution is greater than that of the other (a one-tailed test), or simply whether there is any difference at all (a two-tailed test).

When using a two-tailed test, remember to divide the significance level between the two tails!

Remember to read the question carefully to determine which sort of alternative hypothesis to use.

Samples are taken from two distributions, \(X\) and \(Y\), under the assumption that they are independent and normally distributed.

In the case that you wish to test whether the means are different (that is a two-tailed test), you will have the following alternative hypothesis,

\[H_1:\, \mu _x \neq \mu _y.\]

In the case that you wish to test whether the mean of \(X\) is greater than the mean of \(Y\) (that is a one-tailed test), you will have the following alternative hypothesis,

\[H_1:\, \mu _x > \mu _y.\]

Next let's see some of the calculations involved.

Comparing Two Population Means Hypothesis Testing: Calculations

When testing for the difference between means, there are some extra calculations that you'll need to perform to find the pooled estimate of the variance and the value of \(t\) that you wish to test.

Using sample variances, \(s^2_x\) and \(s^2_y\), and the size of each sample, \(n_x\) and \(n_y\), the pooled estimate of the variance is given by the formula

\[s^2_p=\frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}.\]
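The pooled estimate weights each sample variance by its degrees of freedom. As a minimal sketch of the formula above, assuming you already have the two sample sizes and sample variances (the function name `pooled_variance` and the numbers used are hypothetical, chosen only for illustration):

```python
def pooled_variance(n_x, var_x, n_y, var_y):
    """Pooled estimate of variance from two sample sizes and sample variances."""
    if n_x < 2 or n_y < 2:
        raise ValueError("each sample needs at least two observations")
    # Weight each sample variance by its degrees of freedom, n - 1.
    numerator = (n_x - 1) * var_x + (n_y - 1) * var_y
    denominator = (n_x - 1) + (n_y - 1)
    return numerator / denominator


# Hypothetical summary statistics, not taken from any example in this article.
s2_p = pooled_variance(n_x=3, var_x=2.0, n_y=5, var_y=4.0)
print(round(s2_p, 4))  # (2*2.0 + 4*4.0) / 6 = 3.3333
```

Notice that the result lies between the two sample variances and sits closer to the variance of the larger sample, since that sample contributes more degrees of freedom.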

Once you have found \(s^2_p\), you can calculate the test statistic, \(t^*\), which you will later compare with the critical value.

Given sample means \(\bar{x}\) and \(\bar{y}\), sample sizes \(n_x\) and \(n_y\), and the pooled estimate of variance \(s^2_p\), the test statistic \(t^*\), with \(\mu _x - \mu _y\) taking the value stated in the null hypothesis, is:

\[t^*=\frac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]
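As a short worked step, it may help to see how this formula is applied once the null hypothesis is fixed. Writing \(d\) for the difference \(\mu _x - \mu _y\) stated by the null hypothesis (the symbol \(d\) is introduced here only for illustration), the formula reads

\[t^*=\frac{(\bar{x}-\bar{y})-d}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]

For the usual null hypothesis \(H_0:\, \mu _x =\mu _y\) you have \(d=0\), so this reduces to

\[t^*=\frac{\bar{x}-\bar{y}}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]

In either case, the value is compared against the \(t\)-distribution with \(n_x+n_y-2\) degrees of freedom.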

Hypothesis Testing Two Population Means Examples

Next, let's look at a couple of examples on how to use and calculate these statistics within an actual hypothesis test.

A pet store sells Welsh Corgi puppies on behalf of two puppy breeders, \(X\) and \(Y\). They have sampled the weights of puppies from each breeder.

Fig. 1 - Puppies always make math better!

Weights of puppies from breeder \(X\) in kilograms: \(5.44,5.32,5.21,5.67.\)

Weights of puppies from breeder \(Y\) in kilograms: \(5.02,4.99,5.42,5.21,5.11.\)

The pet store wishes to know whether there is a statistically significant difference between the weights of the puppies from each breeder.

a. If you wanted to test the difference in the weights of the puppies, what assumptions need to be made?

b. Test whether the mean weights of puppies from the two breeders are different at the \(10\%\) significance level.

Solution

a. In order to test the difference in the weights of the puppies, the assumptions to be made are that the weights of puppies from each breeder are normally distributed, that the two samples are independent, and that the two populations have the same variance.

b. The test is two-tailed, so the hypotheses are,

\[ \begin{align} &H_0:\, \mu _x=\mu _y \\ &H_1: \,\mu _x \neq \mu _y.\end{align}\]

This is a two-tailed test since the alternative hypothesis is that the mean weights are different. The significance level is \(10\)%, so the critical region will have the probability of \(0.05\) in each tail of the distribution.

The number of degrees of freedom is

\[\upsilon = (4-1)+(5-1)=7.\]

To find degrees of freedom in this case, you need to add together the degrees of freedom from each sample. Or, you can use the formula \(\upsilon = n_x+n_y-2\).

The critical value can be found using a calculator or probability tables:

\[t_{\upsilon =7}(0.05)=1.895.\]

Next, find the pooled estimate of variance. You should have \(\bar{x}=5.41\) and \(\bar{y}=5.15.\)

The sample variances are \(s^2_x=0.038867\) and \(s^2_y=0.03015\).

Therefore, the pooled estimate of variance is,

\[\begin{align} s^2_p &= \frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)} \\&= \frac{(4-1)0.038867 +(5-1)0.03015 }{(4-1)+(5-1)} \\&=0.033886 \text{ to 5 s.f.} \end{align}\]

Your value of \(t^*\) is then:

\[\begin{align} t^*&=\frac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}\\&=\dfrac{(5.41-5.15)-(0)}{\sqrt{0.033886\left(\dfrac{1}{4}+\dfrac{1}{5}\right)}}\\&=2.1055\end{align}\]

Since \(t^*=2.1055>1.895=t_\upsilon\), your value of \(t^*\) falls within the critical region. Therefore, at the \(10\)% significance level, you can reject the null hypothesis.

In conclusion, there is evidence to suggest there is a difference between the means of the weights of Welsh Corgi puppies from the two breeders.
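If you want to check a calculation like this one, the steps can be sketched in a few lines of Python using only the standard library. This is a rough check rather than a definitive implementation: the variable names are arbitrary, and the critical value \(1.895\) is copied from the tables as in the example, not computed by the code.

```python
from math import sqrt
from statistics import mean, variance

# Puppy weights in kilograms, copied from the example.
x = [5.44, 5.32, 5.21, 5.67]
y = [5.02, 4.99, 5.42, 5.21, 5.11]

n_x, n_y = len(x), len(y)
dof = n_x + n_y - 2  # degrees of freedom

# statistics.variance uses the n - 1 divisor, i.e. the sample variance.
s2_x, s2_y = variance(x), variance(y)
s2_p = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / dof

# Under H0 the hypothesised difference mu_x - mu_y is 0.
t_star = (mean(x) - mean(y)) / sqrt(s2_p * (1 / n_x + 1 / n_y))

# Critical value for 7 degrees of freedom with 0.05 in each tail,
# taken from tables (an input to this sketch, not something it derives).
t_crit = 1.895

reject_h0 = abs(t_star) > t_crit  # two-tailed decision rule
print(f"dof={dof}, s2_p={s2_p:.6f}, t*={t_star:.4f}, reject H0: {reject_h0}")
```

Because the script works from the raw weights and keeps full precision throughout, it is a useful way to catch slips in hand-rounded intermediate values such as the sample means.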

This second example is slightly different to the first. The method will need to be adapted slightly.

A food delivery service, \(A\), claims that their average food delivery time is more than \(5\) minutes faster than the delivery time of their competitor, \(B\).

A random sample of delivery times from each company is collected:

  • Food delivery time for \(A\), in minutes: \(22,16,45,23,39,32.\)
  • Food delivery time for \(B\), in minutes: \(34,42,63,18,25,46,47.\)

Food delivery service \(B\) hires you to test whether this claim is statistically significant at the \(10\%\) significance level. Complete a hypothesis test for the difference between means and explain what this means for the two food delivery services.

Solution

Since the samples are independent, the null hypothesis would normally be that the two means are the same. However, the claim is that service \(A\) averages more than \(5\) minutes faster than its competitor, so the null hypothesis is instead \(\mu _A=\mu _B -5 \). Since you are only interested in whether the mean delivery time for \(A\) is lower than that for \(B\) by more than \(5\) minutes, the hypotheses are:

\[ \begin{align} &H_0:\,\mu _A=\mu _B -5 \\ &H_1: \,\mu_A < \mu _B-5. \end{align}\]

This is a one-tailed test. The significance level is \(10\)%, so the critical region will have the probability of \(0.10\) in the left tail of the distribution.

The number of degrees of freedom is

\[\upsilon = (6-1)+(7-1)=11.\]

The critical value can be found using a calculator or probability tables,

\[t_{\upsilon =11}(0.10)=1.363.\]

Since you are only interested in whether \(\mu _A\) is less than \(\mu _B -5\), the critical value is \(t_\upsilon = -1.363\).

If the alternative hypothesis had been greater than, you would have used \(t_\upsilon = 1.363\) instead.

Next, find the pooled estimate of variance. You have \(\bar{a}=29.5\) and \(\bar{b}=39.3\) (to 3 s.f.). The sample variances are \(s^2_a=123.50 \) and \(s^2_b=226.57\). Therefore, the pooled estimate of variance is:

\[\begin{align} s^2_p &= \frac{(n_a-1)s^2_a+(n_b-1)s^2_b}{(n_a-1)+(n_b-1)} \\&= \frac{(6-1)123.50 +(7-1)226.57 }{(6-1)+(7-1)} \\&=179.72\text{ to 5 s.f.} \end{align}\]

The value of \(t^*\) is therefore,

\[\begin{align} t^*&=\frac{(\bar{a}-\bar{b})-(\mu _A - \mu _B)}{\sqrt{s^2_p\left(\dfrac{1}{n_a}+\dfrac{1}{n_b}\right)}}\\&=\dfrac{(29.5 -39.3)-(-5)}{\sqrt{179.72 \left(\dfrac{1}{6}+\dfrac{1}{7}\right)}}\\&=-0.64357.\end{align}\]

Here, since the null hypothesis states that \(\mu _A=\mu _B-5\), the hypothesised difference is \(\mu _A-\mu _B=-5\).

Since \(t^*=-0.64357>-1.363=t_\upsilon \), the value of \(t^*\) falls within the acceptance region. Therefore, at the \(10\%\) significance level, you fail to reject the null hypothesis.

This means that there is not sufficient evidence to suggest that delivery service \(A\) is, on average, more than \(5\) minutes faster than delivery service \(B\).
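The same calculation can be sketched in Python with a non-zero hypothesised difference. This is an illustrative check under stated assumptions: the helper name `pooled_t_statistic` and its `hypothesised_diff` parameter are made up for this sketch, and the critical value \(-1.363\) is again taken from tables instead of being computed.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_1, sample_2, hypothesised_diff=0.0):
    """Return (t*, degrees of freedom) for a pooled two-sample t-test."""
    n_1, n_2 = len(sample_1), len(sample_2)
    dof = n_1 + n_2 - 2
    # Pooled estimate of variance, weighting by degrees of freedom.
    s2_p = ((n_1 - 1) * variance(sample_1) + (n_2 - 1) * variance(sample_2)) / dof
    std_error = sqrt(s2_p * (1 / n_1 + 1 / n_2))
    t_star = (mean(sample_1) - mean(sample_2) - hypothesised_diff) / std_error
    return t_star, dof


# Delivery times in minutes, copied from the example.
a = [22, 16, 45, 23, 39, 32]
b = [34, 42, 63, 18, 25, 46, 47]

# H0: mu_A - mu_B = -5, tested against H1: mu_A - mu_B < -5.
t_star, dof = pooled_t_statistic(a, b, hypothesised_diff=-5)

t_crit = -1.363  # lower-tail critical value for 11 degrees of freedom, from tables
reject_h0 = t_star < t_crit  # one-tailed (left tail) decision rule
print(f"dof={dof}, t*={t_star:.4f}, reject H0: {reject_h0}")
```

Because the code keeps the unrounded mean of \(B\) rather than \(39.3\), the statistic it prints differs slightly from the hand calculation in the third significant figure. The conclusion is unaffected.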

For a more detailed explanation of the pooled estimate of variance, check out the article Pooled Estimate of Variance.

Comparing Two Means Hypothesis Testing - Key takeaways

  • The \(t\)-distribution can be used to test the means of two independent normal distributions when the variances are unknown
  • The assumptions are that the populations are independent, normal and have the same variance
  • The pooled estimate of variance formula is \[s^2_p=\frac{(n_x-1)s^2_x+(n_y-1)s^2_y}{(n_x-1)+(n_y-1)}.\]
  • The \(t^*\) value is \[t^*=\dfrac{(\bar{x}-\bar{y})-(\mu _x - \mu _y)}{\sqrt{s^2_p\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}}.\]

Frequently Asked Questions about Comparing Two Means Hypothesis Testing

It depends on if the samples are independent or not.  If they are not independent then you can use a paired t-test.  If they are independent then you can use a test for the difference of two means.

If the two samples are independent, then the null hypothesis is that the difference in their means is zero.

The two means are significantly different if the calculated value of \(t^*\) falls in the critical region for the significance level selected for the hypothesis test.

Assuming that the samples are independent, the null hypothesis will be that the difference in the means is zero. The alternative hypothesis will depend on whether you want to see if one mean is larger than the other, or if they are just different from each other.

A comparison of means test is a kind of hypothesis test done when you have two independent samples and it uses a pooled estimate of variance.
