Everyone's been in the situation before: you and your significant other can't agree on what to watch for date night! While the two of you are debating which movie to watch, a question arises in the back of your mind: do different types of people (for instance, men vs. women) have different movie preferences? The answer to this question, and others like it, can be found using a specific Chi-square test – the Chi-square test for homogeneity.
When you want to know if two categorical variables follow the same probability distribution (like in the movie preference question above), you can use a Chi-square test for homogeneity.
A Chi-square \( (\chi^{2}) \) test for homogeneity is a non-parametric Pearson Chi-square test that you apply to a single categorical variable from two or more different populations to determine whether they have the same distribution.
In this test, you randomly collect data from each population separately to determine whether the frequency counts of the categories differ significantly across the populations.
All the Pearson Chi-square tests share the same basic conditions. The main difference is how the conditions apply in practice. A Chi-square test for homogeneity requires a categorical variable from at least two populations, and the data needs to be the raw count of members of each category. This test is used to check if the two variables follow the same distribution.
To be able to use this test, the conditions for a Chi-square test of homogeneity are:
The variables must be categorical.
Because you are testing the sameness of the variables, they have to have the same groups. This Chi-square test uses cross-tabulation, counting observations that fall in each category.
Reference the study: “Out-of-Hospital Cardiac Arrest in High-Rise Buildings: Delays to Patient Care and Effect on Survival”1 – which was published in the Canadian Medical Association Journal (CMAJ) on April \(5, 2016\).
This study compared how adults live (house or townhouse, \(1^{st}\) or \(2^{nd}\) floor apartment, and \(3^{rd}\) or higher floor apartment) with their survival rate of a heart attack (survived or did not survive).
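Your goal is to learn if there is a difference in the survival category proportions (i.e., are you more likely to survive a heart attack depending on where you live?) for the \(3\) populations: people who live in a house or townhouse, people who live in a \(1^{st}\) or \(2^{nd}\) floor apartment, and people who live in a \(3^{rd}\) or higher floor apartment.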
Groups must be mutually exclusive, and the sample must be randomly selected.
Each observation is only allowed to be in one group. A person can live in a house or an apartment, but they cannot live in both.
Contingency Table | |||
---|---|---|---|
Living Arrangement | Survived | Did Not Survive | Row Totals |
House or Townhouse | 217 | 5314 | 5531 |
1st or 2nd Floor Apartment | 35 | 632 | 667 |
3rd or Higher Floor Apartment | 46 | 1650 | 1696 |
Column Totals | 298 | 7596 | \(n =\) 7894 |
Table 1. Table of contingency, Chi-Square test for homogeneity.
Expected counts must be at least \(5\).
This means the sample size must be large enough, but how large is difficult to determine beforehand. In general, making sure there are more than \(5\) in each category should be fine.
Observations must be independent.
This assumption is all about how you collect the data. If you use simple random sampling, that will almost always be statistically valid.
The question underlying this hypothesis test is: Do these two variables follow the same distribution?
The hypotheses are formed to answer that question.
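The null hypothesis requires every single category to have the same probability between the two variables. \[ \begin{align}H_{0}: p_{1,1} &= p_{2,1} \text{ AND } \\p_{1,2} &= p_{2,2} \text{ AND } \ldots \text{ AND } \\p_{1,n} &= p_{2,n}\end{align} \]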
The alternative hypothesis is that the two variables are not from the same distribution, i.e., at least one of the equalities in the null hypothesis is false. \[ \begin{align}H_{a}: p_{1,1} &\neq p_{2,1} \text{ OR } \\p_{1,2} &\neq p_{2,2} \text{ OR } \ldots \text{ OR } \\p_{1,n} &\neq p_{2,n}\end{align} \]
If even one category is different from one variable to the other, then the test will return a significant result and provide evidence to reject the null hypothesis.
The null and alternative hypotheses in the heart attack survival study are:
The population is people who live in houses, townhouses, or apartments and who have had a heart attack.
You must calculate the expected frequencies for a Chi-square test for homogeneity individually for each population at each level of the categorical variable, as given by the formula:
\[ E_{r,c} = \frac{n_{r} \cdot n_{c}}{n} \]
where,
\(E_{r,c}\) is the expected frequency for population \(r\) at level \(c\) of the categorical variable,
\(r\) identifies the population, which corresponds to a row of the contingency table,
\(c\) identifies the level of the categorical variable, which corresponds to a column of the contingency table,
\(n_{r}\) is the number of observations from population \(r\),
\(n_{c}\) is the number of observations from level \(c\) of the categorical variable, and
\(n\) is the total sample size.
Continuing with the heart attack survival study:
Next, you calculate the expected frequencies using the formula above and the contingency table, putting your results into a modified contingency table to keep your data organized.
Table 2. Table of contingency with observed frequencies, Chi-Square test for homogeneity.
Contingency Table with Observed (O) Frequencies and Expected (E) Frequencies
Living Arrangement | Survived | Did Not Survive | Row Totals |
---|---|---|---|
House or Townhouse | \(O_{1,1} = 217\), \(E_{1,1} = 208.795\) | \(O_{1,2} = 5314\), \(E_{1,2} = 5322.205\) | 5531 |
1st or 2nd Floor Apartment | \(O_{2,1} = 35\), \(E_{2,1} = 25.179\) | \(O_{2,2} = 632\), \(E_{2,2} = 641.821\) | 667 |
3rd or Higher Floor Apartment | \(O_{3,1} = 46\), \(E_{3,1} = 64.024\) | \(O_{3,2} = 1650\), \(E_{3,2} = 1631.976\) | 1696 |
Column Totals | 298 | 7596 | \(n =\) 7894 |
Decimals in the table are rounded to \(3\) digits.
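The expected frequencies can also be computed programmatically. The following is a minimal Python sketch, assuming the observed counts from Table 1 are stored as a list of rows; the names `observed` and `expected_counts` are illustrative and not part of the original study. The results should agree with the table above to within rounding in the last digit.

```python
# Observed counts from Table 1.
# Rows: living arrangement; columns: survived, did not survive.
observed = [
    [217, 5314],   # house or townhouse
    [35, 632],     # 1st or 2nd floor apartment
    [46, 1650],    # 3rd or higher floor apartment
]


def expected_counts(table):
    """Return E[r][c] = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[n_r * n_c / n for n_c in col_totals] for n_r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 3) for e in row])
```

Because each expected count uses only the row total, the column total, and the overall sample size, the expected counts in every row and column add up to the same totals as the observed counts.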
There are two variables in a Chi-square test for homogeneity. Therefore, you are comparing two variables and need the contingency table to add up in both dimensions.
Since you need the rows to add up and the columns to add up, the degrees of freedom is calculated by:
\[ k = (r - 1) (c - 1) \]
where,
\(k\) is the degrees of freedom,
\(r\) is the number of populations, which is also the number of rows in a contingency table, and
\(c\) is the number of levels of the categorical variable, which is also the number of columns in a contingency table.
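As a worked instance, the heart attack survival study has \(3\) populations (rows) and \(2\) levels of the survival variable (columns), so:

\[ k = (3 - 1)(2 - 1) = 2 \]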
The formula (also called a test statistic) of a Chi-square test for homogeneity is:
\[ \chi^{2} = \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \]
where,
\(O_{r,c}\) is the observed frequency for population \(r\) at level \(c\), and
\(E_{r,c}\) is the expected frequency for population \(r\) at level \(c\).
Step \(1\): Create a Table
Starting with your contingency table, remove the “Row Totals” column and the “Column Totals” row. Then, separate your observed and expected frequencies into two columns, like so:
Table 3. Table of observed and expected frequencies, Chi-Square test for homogeneity.
Table of Observed and Expected Frequencies
Living Arrangement | Status | Observed Frequency | Expected Frequency |
---|---|---|---|
House or Townhouse | Survived | 217 | 208.795 |
House or Townhouse | Did Not Survive | 5314 | 5322.205 |
1st or 2nd Floor Apartment | Survived | 35 | 25.179 |
1st or 2nd Floor Apartment | Did Not Survive | 632 | 641.821 |
3rd or Higher Floor Apartment | Survived | 46 | 64.024 |
3rd or Higher Floor Apartment | Did Not Survive | 1650 | 1631.976 |
Decimals in this table are rounded to \(3\) digits.
Step \(2\): Subtract Expected Frequencies from Observed Frequencies
Add a new column to your table called “O – E”. In this column, put the result of subtracting the expected frequency from the observed frequency:
Table 4. Table of observed and expected frequencies, Chi-Square test for homogeneity.
Table of Observed, Expected, and O – E Frequencies
Living Arrangement | Status | Observed Frequency | Expected Frequency | O – E |
---|---|---|---|---|
House or Townhouse | Survived | 217 | 208.795 | 8.205 |
House or Townhouse | Did Not Survive | 5314 | 5322.205 | -8.205 |
1st or 2nd Floor Apartment | Survived | 35 | 25.179 | 9.821 |
1st or 2nd Floor Apartment | Did Not Survive | 632 | 641.821 | -9.821 |
3rd or Higher Floor Apartment | Survived | 46 | 64.024 | -18.024 |
3rd or Higher Floor Apartment | Did Not Survive | 1650 | 1631.976 | 18.024 |
Decimals in this table are rounded to \(3\) digits.
Step \(3\): Square the Results from Step \(2\)
Add another new column to your table called “(O – E)²”. In this column, put the result of squaring the results from the previous column:
Table 5. Table of observed and expected frequencies, Chi-Square test for homogeneity.
Table of Observed, Expected, O – E, and (O – E)² Frequencies
Living Arrangement | Status | Observed Frequency | Expected Frequency | O – E | (O – E)² |
---|---|---|---|---|---|
House or Townhouse | Survived | 217 | 208.795 | 8.205 | 67.322 |
House or Townhouse | Did Not Survive | 5314 | 5322.205 | -8.205 | 67.322 |
1st or 2nd Floor Apartment | Survived | 35 | 25.179 | 9.821 | 96.452 |
1st or 2nd Floor Apartment | Did Not Survive | 632 | 641.821 | -9.821 | 96.452 |
3rd or Higher Floor Apartment | Survived | 46 | 64.024 | -18.024 | 324.865 |
3rd or Higher Floor Apartment | Did Not Survive | 1650 | 1631.976 | 18.024 | 324.865 |
Decimals in this table are rounded to \(3\) digits.
Step \(4\): Divide the Results from Step \(3\) by the Expected Frequencies
Add a final new column to your table called “(O – E)²/E”. In this column, put the result of dividing the results from the previous column by their expected frequencies:
Table 6. Table of observed and expected frequencies, Chi-Square test for homogeneity.
Table of Observed, Expected, O – E, (O – E)², and (O – E)²/E Frequencies
Living Arrangement | Status | Observed Frequency | Expected Frequency | O – E | (O – E)² | (O – E)²/E |
---|---|---|---|---|---|---|
House or Townhouse | Survived | 217 | 208.795 | 8.205 | 67.322 | 0.322 |
House or Townhouse | Did Not Survive | 5314 | 5322.205 | -8.205 | 67.322 | 0.013 |
1st or 2nd Floor Apartment | Survived | 35 | 25.179 | 9.821 | 96.452 | 3.831 |
1st or 2nd Floor Apartment | Did Not Survive | 632 | 641.821 | -9.821 | 96.452 | 0.150 |
3rd or Higher Floor Apartment | Survived | 46 | 64.024 | -18.024 | 324.865 | 5.074 |
3rd or Higher Floor Apartment | Did Not Survive | 1650 | 1631.976 | 18.024 | 324.865 | 0.199 |
Decimals in this table are rounded to \(3\) digits.
Step \(5\): Sum the Results from Step \(4\) to get the Chi-Square Test Statistic
Finally, add up all the values in the last column of your table to calculate your Chi-square test statistic:
\[ \begin{align}\chi^{2} &= \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \\&= 0.322 + 0.013 + 3.831 + 0.150 + 5.074 + 0.199 \\&= 9.589.\end{align} \]
The Chi-square test statistic for the Chi-square test for homogeneity in the heart attack survival study is:
\[ \chi^{2} = 9.589. \]
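The five steps above can be condensed into a short function. This is a minimal Python sketch, assuming the observed counts are given as a list of rows; `chi_square_statistic` is an illustrative name, not a library function. It works with unrounded expected frequencies, so its result should land very close to the hand-calculated value of \(9.589\).

```python
# Observed counts from Table 1.
# Rows: living arrangement; columns: survived, did not survive.
observed = [
    [217, 5314],   # house or townhouse
    [35, 632],     # 1st or 2nd floor apartment
    [46, 1650],    # 3rd or higher floor apartment
]


def chi_square_statistic(table):
    """Sum (O - E)^2 / E over every cell of the contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, o in enumerate(row):
            e = row_totals[r] * col_totals[c] / n   # expected frequency
            statistic += (o - e) ** 2 / e           # this cell's contribution
    return statistic


chi2 = chi_square_statistic(observed)
print(round(chi2, 3))
```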
To determine whether the test statistic is large enough to reject the null hypothesis, you compare the test statistic to a critical value from a Chi-square distribution table. This act of comparison is the heart of the Chi-square test of homogeneity.
Follow the \(6\) steps below to perform a Chi-square test of homogeneity.
Steps \(1, 2\) and \(3\) are outlined in detail in the previous sections: “Chi-Square Test for Homogeneity: Null Hypothesis and Alternative Hypothesis”, “Expected Frequencies for a Chi-Square Test for Homogeneity”, and “How to Calculate the Test Statistic for a Chi-Square Test for Homogeneity”.
Step \(1\): State the Hypotheses
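The null hypothesis is that the two variables are from the same distribution. \[ \begin{align}H_{0}: p_{1,1} &= p_{2,1} \text{ AND } \\p_{1,2} &= p_{2,2} \text{ AND } \ldots \text{ AND } \\p_{1,n} &= p_{2,n}\end{align} \]
The alternative hypothesis is that the two variables are not from the same distribution, i.e., at least one of the equalities in the null hypothesis is false. \[ \begin{align}H_{a}: p_{1,1} &\neq p_{2,1} \text{ OR } \\p_{1,2} &\neq p_{2,2} \text{ OR } \ldots \text{ OR } \\p_{1,n} &\neq p_{2,n}\end{align} \]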
Step \(2\): Calculate the Expected Frequencies
Reference your contingency table to calculate the expected frequencies using the formula:
\[ E_{r,c} = \frac{n_{r} \cdot n_{c}}{n} \]
Step \(3\): Calculate the Chi-Square Test Statistic
Use the formula for a Chi-square test for homogeneity to calculate the Chi-square test statistic:
\[ \chi^{2} = \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \]
Step \(4\): Find the Critical Chi-Square Value
To find the critical Chi-square value, you can either:
use a Chi-square distribution table, or
use a critical value calculator.
No matter which method you choose, you need \(2\) pieces of information:
the degrees of freedom, \(k\), given by the formula:
\[ k = (r - 1) (c - 1) \]
and the significance level, \(\alpha\), which is usually \(0.05\).
Find the critical value of the heart attack survival study.
To find the critical value:
Table 7. Table of percentage points, Chi-Square test for homogeneity.
Percentage Points of the Chi-Square Distribution. The column headings give the probability of a larger value of \(\chi^{2}\), i.e., the significance level \(\alpha\).
Degrees of Freedom (k) | 0.99 | 0.95 | 0.90 | 0.75 | 0.50 | 0.25 | 0.10 | 0.05 | 0.01 |
---|---|---|---|---|---|---|---|---|---|
1 | 0.000 | 0.004 | 0.016 | 0.102 | 0.455 | 1.32 | 2.71 | 3.84 | 6.63 |
2 | 0.020 | 0.103 | 0.211 | 0.575 | 1.386 | 2.77 | 4.61 | 5.99 | 9.21 |
3 | 0.115 | 0.352 | 0.584 | 1.212 | 2.366 | 4.11 | 6.25 | 7.81 | 11.34 |
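For the special case of \(k = 2\) degrees of freedom, which is the case in the heart attack survival study, the critical value can be checked without a table. The sketch below relies on the fact that a Chi-square distribution with \(2\) degrees of freedom has upper-tail probability \(e^{-x/2}\), so the critical value is \(-2\ln(\alpha)\). The function name `critical_value_df2` is illustrative, and the shortcut does not apply to other degrees of freedom, for which a table or a statistics library is the usual route.

```python
import math


def critical_value_df2(alpha):
    """Critical value c with P(X > c) = alpha when X is Chi-square with 2 degrees of freedom.

    Only valid for k = 2, where the upper-tail probability is exp(-x / 2).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2.0 * math.log(alpha)


critical = critical_value_df2(0.05)
print(round(critical, 2))  # 5.99
```

The same function reproduces the other entries in the \(k = 2\) row of the table, for example \(9.21\) at \(\alpha = 0.01\).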
Step \(5\): Compare the Chi-Square Test Statistic to the Critical Chi-Square Value
Is your test statistic large enough to reject the null hypothesis? To find out, compare it to the critical value.
Compare your test statistic to the critical value in the heart attack survival study:
The Chi-square test statistic is: \( \chi^{2} = 9.589 \)
The critical Chi-square value is: \( 5.99 \)
The Chi-square test statistic is greater than the critical value.
Step \(6\): Decide Whether to Reject the Null Hypothesis
Finally, decide if you can reject the null hypothesis.
If the Chi-square value is less than the critical value, then you have an insignificant difference between the observed and expected frequencies; i.e., \( p > \alpha \).
This means you do not reject the null hypothesis.
If the Chi-square value is greater than the critical value, then you have a significant difference between the observed and expected frequencies; i.e., \( p < \alpha \).
This means you have sufficient evidence to reject the null hypothesis.
Now you can decide whether to reject the null hypothesis for the heart attack survival study:
The Chi-square test statistic is greater than the critical value; i.e., the \(p\)-value is less than the significance level.
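You therefore reject the null hypothesis and conclude that the survival proportions are not the same across the three living arrangements. Comparing the observed and expected frequencies suggests that there is a smaller chance of survival for those who suffer a heart attack and live on the third or higher floor of an apartment.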
The \(p\)-value of a Chi-square test for homogeneity is the probability, assuming the null hypothesis is true, that a Chi-square statistic with \(k\) degrees of freedom is at least as extreme as its calculated value. You can use a Chi-square distribution calculator to find the \(p\)-value of a test statistic. Alternatively, you can use a Chi-square distribution table to determine if the value of your Chi-square test statistic exceeds the critical value for a given significance level.
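As an illustration, the \(p\)-value can be computed directly when the degrees of freedom are even, because the upper-tail probability of the Chi-square distribution then has a closed form: \(e^{-x/2}\sum_{i=0}^{k/2-1}\frac{(x/2)^{i}}{i!}\). The following Python sketch uses that identity; the function name is illustrative, and odd degrees of freedom are deliberately not handled (a calculator or statistics library is needed there).

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability P(X >= statistic) for a Chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even degrees of freedom")
    half = statistic / 2.0
    term = 1.0    # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Heart attack survival study: test statistic 9.589 with k = 2.
p_value = chi_square_p_value_even_df(9.589, 2)
print(p_value < 0.05)  # True
```

For the heart attack survival study this gives a \(p\)-value of roughly \(0.008\), which is below \(\alpha = 0.05\) and agrees with the critical-value comparison.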
At this point, you might ask yourself, what is the difference between a Chi-square test for homogeneity and a Chi-square test for independence?
You use the Chi-square test for homogeneity when you have only \(1\) categorical variable from \(2\) (or more) populations.
In this test, you randomly collect data from each population separately to determine if the frequency counts differ significantly across the populations.
When surveying students in a school, you might ask them for their favorite subject. You ask the same question to \(2\) different populations of students: freshmen and seniors.
You use a Chi-square test for homogeneity to determine if the freshmen's preferences differed significantly from the seniors' preferences.
You use the Chi-square test for independence when you have \(2\) categorical variables from the same population.
In this test, you randomly collect data from a single population to determine if there is a significant association between the \(2\) categorical variables.
In a school, students could be classified by their handedness and by their choice of study.
You use a Chi-square test for independence to determine if handedness is related to choice of study.
Continuing from the example in the introduction, you decide to find an answer to the question: do men and women have different movie preferences?
You select a random sample of \(500\) college freshmen: \(200\) men and \(300\) women. Each person is asked which of the following movies they like best: The Terminator; The Princess Bride; or The Lego Movie. The results are shown in the contingency table below.
Table 8. Contingency table, Chi-Square test for homogeneity.
Contingency Table | |||
---|---|---|---|
Movie | Men | Women | Row Totals |
The Terminator | 120 | 50 | 170 |
The Princess Bride | 20 | 140 | 160 |
The Lego Movie | 60 | 110 | 170 |
Column Totals | 200 | 300 | \(n =\) 500 |
Solution:
Step \(1\): State the Hypotheses.
Step \(2\): Calculate Expected Frequencies.
Table 9. Table of data for movies, Chi-Square test for homogeneity.
Movie | Men | Women | Row Totals |
The Terminator | 68 | 102 | 170 |
The Princess Bride | 64 | 96 | 160 |
The Lego Movie | 68 | 102 | 170 |
Column Totals | 200 | 300 | \(n =\) 500 |
Step \(3\): Calculate the Chi-Square Test Statistic.
Table 10. Table of data for movies, Chi-Square test for homogeneity.
Movie | Person | Observed Frequency | Expected Frequency | O-E | (O-E)² | (O-E)²/E |
Terminator | Men | 120 | 68 | 52 | 2704 | 39.765 |
Terminator | Women | 50 | 102 | -52 | 2704 | 26.510 |
Princess Bride | Men | 20 | 64 | -44 | 1936 | 30.250 |
Princess Bride | Women | 140 | 96 | 44 | 1936 | 20.167 |
Lego Movie | Men | 60 | 68 | -8 | 64 | 0.941 |
Lego Movie | Women | 110 | 102 | 8 | 64 | 0.627 |
Decimals in this table are rounded to \(3\) digits.
The formula here uses the non-rounded numbers from the table above to get a more accurate answer.
Step \(4\): Find the Critical Chi-Square Value and the \(P\)-Value.
Step \(5\): Compare the Chi-Square Test Statistic to the Critical Chi-Square Value.
Step \(6\): Decide Whether to Reject the Null Hypothesis.
Because the Chi-square test statistic is greater than the critical value, you have sufficient evidence to reject the null hypothesis.
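The whole movie-preference solution can be checked with a short script. This is a sketch under stated assumptions: the observed counts come from Table 8, the critical value \(5.99\) is read from Table 7 for \(k = 2\) and \(\alpha = 0.05\), and `homogeneity_test` is an illustrative helper rather than a library function.

```python
# Observed counts from Table 8.
# Rows: The Terminator, The Princess Bride, The Lego Movie; columns: men, women.
observed = [
    [120, 50],
    [20, 140],
    [60, 110],
]


def homogeneity_test(table, critical_value):
    """Return (test statistic, degrees of freedom, reject-null decision)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, o in enumerate(row):
            e = row_totals[r] * col_totals[c] / n   # expected frequency
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, statistic > critical_value


# Critical value taken from Table 7 (k = 2, alpha = 0.05).
statistic, df, reject = homogeneity_test(observed, critical_value=5.99)
print(round(statistic, 2), df, reject)
```

With these counts the statistic comes out near \(118.26\) with \(2\) degrees of freedom, far above the critical value, which is consistent with rejecting the null hypothesis.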
A chi-square test for homogeneity is a chi-square test that is applied to a single categorical variable from two or more different populations to determine whether they have the same distribution.
A chi-square test for homogeneity requires a categorical variable from at least two populations, and the data needs to be the raw count of members of each category. This test is used to check if the two variables follow the same distribution.
You use the chi-square test of homogeneity when you have only 1 categorical variable from 2 (or more) populations.
You use the chi-square test of independence when you have 2 categorical variables from the same population.
This test has the same basic conditions as any other Pearson chi-square test:
You use a T-Test to compare the means of 2 given samples, typically when you don't know the standard deviation of the population.
You use a Chi-Square test to compare categorical variables.
What is a Chi-square test for homogeneity used for?
A Chi-square test for homogeneity is a Chi-square test that is applied to a single categorical variable from two or more different populations to determine whether they have the same distribution.
A Chi-square test for homogeneity has the same basic assumptions as any other Pearson Chi-square test:
The variables must be categorical.
The approach to use a Chi-square test for homogeneity has six basic steps:
State the hypotheses.
Calculate the expected frequencies.
Calculate the Chi-square test statistic.
Find the critical Chi-square value.
Compare the Chi-square test statistic to the Chi-square critical value.
Decide whether to reject the null hypothesis.
What is the null hypothesis of a Chi-square test for homogeneity?
The null hypothesis is that the two variables are from the same distribution.
\[ \begin{align}
H_{0}: p_{1,1} &= p_{2,1} \text{ AND } \\
p_{1,2} &= p_{2,2} \text{ AND } \ldots \text{ AND } \\
p_{1,n} &= p_{2,n}
\end{align} \].
What is the alternative hypothesis of a Chi-square test for homogeneity?
The alternative hypothesis is that the two variables are not from the same distribution, i.e., at least one of the equalities in the null hypothesis is false.
\[ \begin{align}
H_{a}: p_{1,1} &\neq p_{2,1} \text{ OR } \\
p_{1,2} &\neq p_{2,2} \text{ OR } \ldots \text{ OR } \\
p_{1,n} &\neq p_{2,n}
\end{align} \].
As with any statistical test, your analysis plan when doing a Chi-square test for homogeneity describes how you will use the sample data to either reject or fail to reject the null hypothesis. Your plan should specify the following:
Significance level.