Chi Square Test for Independence

Say your city is trying to encourage its residents to recycle their household trash, so they come up with two methods for asking them to do so:


  1. mailing an educational pamphlet; and

  2. calling each resident.

Then, the city randomly selects \(200\) households and randomly assigns them to one of three categories:

  1. receiving the pamphlet;

  2. receiving a phone call;

  3. the control group (no form of intervention).

Finally, the city will use the results of this test to decide on the best way to encourage its residents to recycle more.

Can you guess which hypothesis test they will use to make this decision? A Chi-square test for independence!

Chi-Square Test of Independence Definition

Occasionally, you want to know if there is a relationship between two categorical variables.

Think of it this way:

If you know something about one variable, can you use that information to learn about the other variable?

You can use a Chi-square test of independence to do just that.

A Chi-square \( (\chi^{2}) \) test of independence is a non-parametric Pearson Chi-square test that you can use to determine whether two categorical variables in a single population are related to each other or not.

If there is a relationship between the two categorical variables, then knowing the value of one variable tells you something about the value of the other variable.

If there is no relationship between the two categorical variables, then they are independent.

Assumptions for a Chi-Square Test of Independence

All the Pearson Chi-square tests, for independence, homogeneity, and goodness of fit, share the same basic assumptions. The main difference is how the assumptions apply in practice. To be able to use this test, the assumptions for a Chi-square test of independence are:

  • The two variables must be categorical.

    • This Chi-square test uses cross-tabulation, counting the observations that fall in each category (see the code sketch after this list).

  • Groups must be mutually exclusive; that is, each observation belongs to one and only one subgroup, and the sample is randomly selected.

    • Continuing from the introductory example, three months after the city's intervention methods are tested, they look at the outcome and put the data into a contingency table. The groups that must be mutually exclusive are the subgroups: (Recycles-Pamphlet), (Does Not Recycle-Control), etc.

Table 1. Contingency table, Chi-square test for independence.

| Intervention | Recycles | Does Not Recycle | Row Totals |
| --- | --- | --- | --- |
| Pamphlet | 46 | 18 | 56 |
| Phone Call | 47 | 19 | 77 |
| Control | 49 | 21 | 67 |
| Column Totals | 142 | 58 | \(n = 200\) |

  • Expected counts must be at least \(5\).

    • This means the sample size must be large enough, although how large is difficult to determine beforehand. In general, making sure every expected count is at least \(5\) should be fine.

  • Observations must be independent.

    • This is about how the data is collected. In the city recycling example, the researcher should not sample households that are near each other: neighboring households on the same street are more likely to influence each other's recycling habits than households chosen from different neighborhoods.
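
If your data starts out with one row per observation rather than as counts, the cross-tabulation mentioned above can be built directly in software. Below is a minimal Python sketch using pandas.crosstab; the data frame and the column names intervention and recycles are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical raw data: one row per household (made up for illustration).
data = pd.DataFrame({
    "intervention": ["Pamphlet", "Phone Call", "Control", "Pamphlet", "Control"],
    "recycles":     ["Yes", "No", "Yes", "Yes", "No"],
})

# Cross-tabulate the two categorical variables into a contingency table,
# with row and column totals added as margins.
contingency = pd.crosstab(
    data["intervention"], data["recycles"],
    margins=True, margins_name="Total",
)
print(contingency)
```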

Null Hypothesis and Alternative Hypothesis for a Chi-Square Test of Independence

When it comes to independence of variables, you almost always assume that the two variables are independent, then look for evidence that they aren’t.

  • The null hypothesis is that the two categorical variables are independent, i.e., there is no association between them, they are not related.\[ H_{0}: \text{“Variable A” and “Variable B” are not related.} \]

  • The alternative hypothesis is that the two categorical variables are not independent, i.e., there is an association between them, they are related.\[ H_{a}: \text{“Variable A” and “Variable B” are related.} \]

Notice that the Chi-square test for independence makes no claims about the kind of relationship between the two categorical variables, only whether a relationship exists.

Replacing “Variable A” and “Variable B” with the variables in the city recycling example, you get:

Your population is all the households in your city.

  • Null Hypothesis \[ \begin{align}H_{0}: &\text{“if a household recycles” and} \\&\text{“the type of intervention received”} \\&\text{are not related.}\end{align} \]
  • Alternative Hypothesis \[ \begin{align}H_{a}: &\text{“if a household recycles” and} \\&\text{“the type of intervention received”} \\&\text{are related.}\end{align} \]

Expected Frequencies of a Chi-Square Test of Independence

As with other Chi-square tests, a Chi-square test of independence works by comparing your observed and expected frequencies. You calculate expected frequencies using the contingency table. So, the expected frequency for row \(r\) and column \(c\) is given by the formula:

\[ E_{r,c} = \frac{n_{r} \cdot n_{c}}{n} \]

where,

  • \(E_{r,c}\) is the expected frequency for population (or, row) \(r\) at level (or, column) \(c\) of the categorical variable,

  • \(r\) indexes the populations, which form the rows of the contingency table,

  • \(c\) indexes the levels of the categorical variable, which form the columns of the contingency table,

  • \(n_{r}\) is the number of observations from population (or, row) \(r\),

  • \(n_{c}\) is the number of observations from level (or, column) \(c\) of the categorical variable, and

  • \(n\) is the total sample size.

Continuing with the city recycling example:

Your city now calculates the expected frequencies using the formula above and the contingency table.

  • \(E_{1,1}=\frac{56 \cdot 142}{200} = 39.76\)
  • \(E_{1,2}=\frac{56 \cdot 58}{200} = 16.24\)
  • \(E_{2,1}=\frac{77 \cdot 142}{200} = 54.67\)
  • \(E_{2,2}=\frac{77 \cdot 58}{200} = 22.33\)
  • \(E_{3,1}=\frac{67 \cdot 142}{200} = 47.57\)
  • \(E_{3,2}=\frac{67 \cdot 58}{200} = 19.43\)

Table 2. Contingency table with observed frequencies and expected frequencies, Chi-square test for independence.

| Intervention | Recycles | Does Not Recycle | Row Totals |
| --- | --- | --- | --- |
| Pamphlet | \(O_{1,1} = 46\), \(E_{1,1} = 39.76\) | \(O_{1,2} = 18\), \(E_{1,2} = 16.24\) | 56 |
| Phone Call | \(O_{2,1} = 47\), \(E_{2,1} = 54.67\) | \(O_{2,2} = 19\), \(E_{2,2} = 22.33\) | 77 |
| Control | \(O_{3,1} = 49\), \(E_{3,1} = 47.57\) | \(O_{3,2} = 21\), \(E_{3,2} = 19.43\) | 67 |
| Column Totals | 142 | 58 | \(n = 200\) |
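
The expected frequencies above are easy to check by machine: the formula \(E_{r,c} = \frac{n_{r} \cdot n_{c}}{n}\) is just the outer product of the row and column totals divided by the sample size. Here is a minimal NumPy sketch using the totals from Table 1, intended only as a verification of the hand calculation.

```python
import numpy as np

row_totals = np.array([56, 77, 67])   # Pamphlet, Phone Call, Control
col_totals = np.array([142, 58])      # Recycles, Does Not Recycle
n = 200                               # total sample size

# E[r, c] = (row total r) * (column total c) / n, for every cell at once
expected = np.outer(row_totals, col_totals) / n
print(expected)
# [[39.76 16.24]
#  [54.67 22.33]
#  [47.57 19.43]]
```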

Degrees of Freedom for a Chi-Square Test of Independence

Like in the Chi-square test for homogeneity, you are comparing two variables and need the contingency table to add up in both dimensions.

The formula for the degrees of freedom is the same in both the homogeneity and independence tests:

\[ k = (r - 1) (c - 1) \]

where,

  • \(k\) is the degrees of freedom,

  • \(r\) is the number of populations, which is also the number of rows in a contingency table, and

  • \(c\) is the number of levels of the categorical variable, which is also the number of columns in a contingency table.
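
Since the degrees of freedom depend only on the shape of the contingency table, they are a one-line computation. The short Python sketch below is just a convenience; the function name is arbitrary.

```python
# Degrees of freedom for an r-by-c contingency table: k = (r - 1) * (c - 1)
def chi_square_dof(num_rows: int, num_cols: int) -> int:
    return (num_rows - 1) * (num_cols - 1)

# City recycling example: 3 intervention groups, 2 outcomes
print(chi_square_dof(3, 2))  # 2
```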

Formula for a Chi-Square Test of Independence

The formula (also called a test statistic) for a Chi-square test of independence is:

\[ \chi^{2} = \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \]

where,

  • \(O_{r,c}\) is the observed frequency for population \(r\) at level \(c\), and

  • \(E_{r,c}\) is the expected frequency for population \(r\) at level \(c\).

The Chi-square test statistic measures how much your observed frequencies differ from the frequencies you would expect if the two variables were unrelated.

Steps to Calculate the Test Statistic for a Chi-Square Test of Independence

Step \(1\): Create a Table

Using your contingency table, create a table that separates your observed and expected values into two columns.

Table 3. Table of observed frequencies and expected frequencies, Chi-square test for independence.

| Intervention | Outcome | Observed Frequency | Expected Frequency |
| --- | --- | --- | --- |
| Pamphlet | Recycles | 46 | 39.76 |
| Pamphlet | Does Not Recycle | 18 | 16.24 |
| Phone Call | Recycles | 47 | 54.67 |
| Phone Call | Does Not Recycle | 19 | 22.33 |
| Control | Recycles | 49 | 47.57 |
| Control | Does Not Recycle | 21 | 19.43 |

Step \(2\): Subtract Expected Frequencies from Observed Frequencies

Add a new column to your table called “O – E”. In this column, put the result of subtracting the expected frequency from the observed frequency.

Table 4. Table of observed frequencies and expected frequencies, Chi-square test for independence.

| Intervention | Outcome | Observed Frequency | Expected Frequency | O – E |
| --- | --- | --- | --- | --- |
| Pamphlet | Recycles | 46 | 39.76 | 6.24 |
| Pamphlet | Does Not Recycle | 18 | 16.24 | 1.76 |
| Phone Call | Recycles | 47 | 54.67 | -7.67 |
| Phone Call | Does Not Recycle | 19 | 22.33 | -3.33 |
| Control | Recycles | 49 | 47.57 | 1.43 |
| Control | Does Not Recycle | 21 | 19.43 | 1.57 |

Decimals in this table are rounded to \(2\) digits.

Step \(3\): Square the Results from Step \(2\)

Add a new column to your table called “\((O - E)^2\)”. In this column, put the result of squaring the values from the previous column.

Table 5. Table of observed frequencies and expected frequencies, Chi-square test for independence.

| Intervention | Outcome | Observed Frequency | Expected Frequency | O – E | (O – E)² |
| --- | --- | --- | --- | --- | --- |
| Pamphlet | Recycles | 46 | 39.76 | 6.24 | 38.94 |
| Pamphlet | Does Not Recycle | 18 | 16.24 | 1.76 | 3.10 |
| Phone Call | Recycles | 47 | 54.67 | -7.67 | 58.83 |
| Phone Call | Does Not Recycle | 19 | 22.33 | -3.33 | 11.09 |
| Control | Recycles | 49 | 47.57 | 1.43 | 2.04 |
| Control | Does Not Recycle | 21 | 19.43 | 1.57 | 2.46 |

Decimals in this table are rounded to \(2\) digits.

Step \(4\): Divide the Results from Step \(3\) by the Expected Frequencies

Add a new column to your table called “\((O - E)^2 / E\)”. In this column, put the result of dividing the values from the previous column by their expected frequencies.

Table 6. Table of observed frequencies and expected frequencies, Chi-square test for independence.

| Intervention | Outcome | Observed Frequency | Expected Frequency | O – E | (O – E)² | (O – E)²/E |
| --- | --- | --- | --- | --- | --- | --- |
| Pamphlet | Recycles | 46 | 39.76 | 6.24 | 38.94 | 0.98 |
| Pamphlet | Does Not Recycle | 18 | 16.24 | 1.76 | 3.10 | 0.19 |
| Phone Call | Recycles | 47 | 54.67 | -7.67 | 58.83 | 1.08 |
| Phone Call | Does Not Recycle | 19 | 22.33 | -3.33 | 11.09 | 0.50 |
| Control | Recycles | 49 | 47.57 | 1.43 | 2.04 | 0.04 |
| Control | Does Not Recycle | 21 | 19.43 | 1.57 | 2.46 | 0.13 |

Decimals in this table are rounded to \(2\) digits.

Step \(5\): Add the Results from Step \(4\) to get the Chi-Square Test Statistic

Finally, add up all the values in the last column of your table to calculate your Chi-square test statistic:

\[ \begin{align}\chi^{2} &= \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \\&= 0.9793 + 0.1907 + 1.0761 + 0.4966 + 0.04299 + 0.1269 \\&= 2.91259\end{align} \]

The formula here uses the non-rounded numbers from the tables above to get a more accurate answer.

The Chi-square test statistic for the Chi-square test of independence in the city recycling example is:

\[ \chi^{2} = 2.91259 \]
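
To double-check the hand calculation, you can sum the \((O - E)^2 / E\) terms in a few lines of code. The sketch below simply reuses the observed and expected frequencies from the tables above; it is a verification aid rather than part of the original example.

```python
import numpy as np

observed = np.array([[46, 18], [47, 19], [49, 21]], dtype=float)
expected = np.array([[39.76, 16.24], [54.67, 22.33], [47.57, 19.43]])

# Chi-square test statistic: sum of (O - E)^2 / E over every cell
chi_square = ((observed - expected) ** 2 / expected).sum()
print(chi_square)  # about 2.91, matching the hand calculation above
```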

Steps to Perform a Chi-Square Test of Independence

If your calculated test statistic is large enough, then you can draw the conclusion that the observed frequencies are not what you would expect if the variables are indeed unrelated. But what is considered “large enough”?

To determine whether the test statistic is large enough to reject the null hypothesis, you compare the test statistic to a critical value from a Chi-square distribution table. This act of comparison is the heart of the Chi-square test of independence.

Follow the \(6\) steps below to perform a Chi-square test of independence.

Note that steps \(1, 2\) and \(3\) were outlined in detail above.

Step \(1\): State the Hypotheses

  • The null hypothesis is that the two categorical variables are independent, i.e., there is no association between them, they are not related.\[ H_{0}: \text{“Variable A” and “Variable B” are not related.} \]

  • The alternative hypothesis is that the two categorical variables are not independent, i.e., there is an association between them, they are related.\[ H_{a}: \text{“Variable A” and “Variable B” are related.} \]

Step \(2\): Calculate the Expected Frequencies

Use your contingency table to calculate the expected frequencies using the formula:

\[ E_{r,c} = \frac{n_{r} \cdot n_{c}}{n} \]

Step \(3\): Calculate the Chi-Square Test Statistic

Use the formula for a Chi-square test of independence to calculate the Chi-square test statistic:

\[ \chi^{2} = \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \]

Step \(4\): Find the Critical Chi-Square Value

You have two options for finding the critical value:

  1. use a Chi-square distribution table, or

  2. use a critical value calculator.

Either way, there are two pieces of information you need to know to find the critical value:

  1. the degrees of freedom, \(k\), given by the formula:

    \[ k = (r - 1) (c - 1) \]

  2. and the significance level, \( \alpha \), which is usually \( 0.05 \).

Referring back to the city recycling example, find the critical Chi-square value.

  1. Calculate the degrees of freedom.
    • Using the contingency table for the city recycling example, recall that there are \(3\) intervention groups (the rows of the contingency table) and \(2\) outcome groups (the columns of the contingency table). So, the degrees of freedom are:\[ \begin{align} k &= (r - 1) (c - 1) \\&= (3 - 1) (2 - 1) \\&= 2 \text{ degrees of freedom}\end{align} \]
  2. Choose a significance level.
    • Typically, a significance level of \( 0.05 \) is used, so use that here.
  3. Using either a Chi-square distribution table or a critical value calculator, determine the critical value.
    • According to the Chi-square distribution table below, for \(k = 2\) and \( \alpha = 0.05 \), the critical value is:\[ \chi^{2} \text{critical value} = 5.99 \]

Table 7. Percentage points of the Chi-square distribution, Chi-square test for independence.

Percentage Points of the Chi-Square Distribution (each column heading is the probability of a larger value of \(\chi^2\), i.e., the significance level \(\alpha\))

| Degrees of Freedom (k) | 0.99 | 0.95 | 0.90 | 0.75 | 0.50 | 0.25 | 0.10 | 0.05 | 0.01 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.000 | 0.004 | 0.016 | 0.102 | 0.455 | 1.32 | 2.71 | 3.84 | 6.63 |
| 2 | 0.020 | 0.103 | 0.211 | 0.575 | 1.386 | 2.77 | 4.61 | 5.99 | 9.21 |
| 3 | 0.115 | 0.352 | 0.584 | 1.212 | 2.366 | 4.11 | 6.25 | 7.81 | 11.34 |
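
Instead of reading the distribution table, you can also look the critical value up in code. Here is a small sketch using scipy.stats.chi2; it should agree with the table above.

```python
from scipy.stats import chi2

alpha = 0.05   # significance level
dof = 2        # degrees of freedom: (3 - 1) * (2 - 1)

# Critical value: the point with an upper-tail area of alpha under the chi-square curve
critical_value = chi2.ppf(1 - alpha, dof)
print(round(critical_value, 2))  # 5.99
```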

Step \(5\): Compare the Chi-Square Test Statistic to the Critical Chi-Square Value

Now for the moment of truth! Is your test statistic large enough to reject the null hypothesis? Compare it to the critical value you just found to find out.

Again, continuing with the city recycling example, compare the test statistic to the critical value.

The Chi-square test statistic is: \( \chi^{2} = 2.91259 \)

The critical value is: \( 5.99 \)

The Chi-square test statistic is less than the critical value.

Step \(6\): Decide Whether to Reject the Null Hypothesis

Finally, decide whether to reject the null hypothesis.

  • If the Chi-square value is greater than the critical value, then the difference between the observed and expected frequencies is significant; \( (p < \alpha) \)

    • This means you reject the null hypothesis that the variables are unrelated, and you have support that the alternative hypothesis is true.

  • If the Chi-square value is less than the critical value, then the difference between the observed and expected frequencies is not significant; \( (p > \alpha) \)

    • This means you do not reject the null hypothesis, and you do not have support that the alternative hypothesis is true.

Decide whether to reject the null hypothesis for the city recycling example.

The Chi-square value is less than the critical value.

  • So, the city does not reject the null hypothesis that whether a household recycles and the type of intervention they receive are unrelated.
  • There is not a significant difference between the observed frequencies and the expected frequencies. This suggests that the proportion of households that recycle is the same for all interventions.

The city concludes that there is no evidence that its interventions have an effect on whether households choose to recycle.

Using the Critical Value vs. Using the P-Value

In the steps to perform a Chi-square test of independence, you calculated and used the critical value to decide whether to reject the null hypothesis.

A critical value of a Chi-square test of independence is a value that is compared to the value of the test statistic, so you can determine whether to reject the null hypothesis.

It is important to know, however, that there is another option you can use: the \(p\)-value.

The \(p\)-value of a Chi-square test of independence is associated with the calculated value of its test statistic. It is the area to the right of the test statistic under the Chi-square curve with \(k\) degrees of freedom.

The image below sums up the critical value approach vs. the \(p\)-value approach.

Figure 1. A diagram showing how you can use either a \(p\)-value or a critical value to determine whether to reject the null hypothesis.
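
For the city recycling example, the \(p\)-value route would look something like the sketch below, again using scipy.stats.chi2. The printed value is an approximation.

```python
from scipy.stats import chi2

test_statistic = 2.91259   # from the city recycling example
dof = 2                    # degrees of freedom

# p-value: area to the right of the test statistic under the chi-square curve
p_value = chi2.sf(test_statistic, dof)
print(round(p_value, 3))   # about 0.233, which is greater than alpha = 0.05
```

Since this \(p\)-value is larger than the significance level, you reach the same conclusion as with the critical value: do not reject the null hypothesis.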

Chi-Square Test for Independence – Example

Many jobseekers are applying via online job boards these days. Sites like Indeed, ZipRecruiter, and CareerBuilder have thousands of enticing posts inviting people to apply. It’s never been easier for fraudulent recruiters to lure in unsuspecting and vulnerable people.

Are fraudulent recruiters more prevalent in some industries than others?

The contingency table below contains real counts of fraudulent and non-fraudulent online job openings, by industry, for the \(10\) most common industries in the dataset. This is quite a large dataset, but it is a good representation of the kind of data statisticians work with in the real world.

Table 8. Contingency table, Chi-square test for independence.

| Industry | Real | Fraud | Row Totals |
| --- | --- | --- | --- |
| Information Technology | 1702 | 32 | 1734 |
| Computer Software | 1371 | 5 | 1376 |
| Internet | 1062 | 0 | 1062 |
| Marketing / Advertising | 783 | 45 | 828 |
| Education | 822 | 0 | 822 |
| Financial Services | 744 | 35 | 779 |
| Healthcare | 446 | 51 | 497 |
| Consumer Services | 334 | 24 | 358 |
| Telecom. | 316 | 26 | 342 |
| Oil / Energy | 178 | 109 | 287 |
| Column Totals | 7758 | 327 | \(n = 8085\) |

Solution:

Step \(1\): State the Hypotheses.

  • The null hypothesis is that the two categorical variables are independent, i.e., there is no association between them, they are not related.\[ H_{0}: \text{“if a job post is real” and “the job industry” are not related.} \]

  • The alternative hypothesis is that the two categorical variables are not independent, i.e., there is an association between them, they are related.\[ H_{a}: \text{“if a job post is real” and “the job industry” are related.} \]

Step \(2\): Calculate Expected Frequencies.
  • Using the contingency table above and the formula:\[ E_{r,c} = \frac{n_{r} \cdot n_{c}}{n}, \]create a table that has your calculated expected frequencies.

Table 9. Table of expected frequencies, Chi-square test for independence.

| Industry | Real | Fraud | Row Totals |
| --- | --- | --- | --- |
| Information Technology | 1663.8679 | 70.1321 | 1734 |
| Computer Software | 1320.3473 | 55.6527 | 1376 |
| Internet | 1019.0471 | 42.9529 | 1062 |
| Marketing / Advertising | 794.5113 | 33.4887 | 828 |
| Education | 788.754 | 33.246 | 822 |
| Financial Services | 747.4931 | 31.5069 | 779 |
| Healthcare | 476.8987 | 20.1013 | 497 |
| Consumer Services | 343.5206 | 14.4794 | 358 |
| Telecom. | 328.1677 | 13.8323 | 342 |
| Oil / Energy | 275.3922 | 11.6078 | 287 |
| Column Totals | 7758 | 327 | \(n = 8085\) |

Step \(3\): Calculate the Chi-Square Test Statistic.

  • Create a table to hold your calculated values and use the formula:\[ \chi^{2} = \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \]to calculate your test statistic.

Table 10. Table used to calculate the Chi-square test statistic, Chi-square test for independence.

| Industry | Job Post Status | Observed Frequency | Expected Frequency | O – E | (O – E)² | (O – E)²/E |
| --- | --- | --- | --- | --- | --- | --- |
| Information Technology | Real | 1702 | 1663.868 | 38.132 | 1454.057 | 0.874 |
| Information Technology | Fraud | 32 | 70.132 | -38.132 | 1454.057 | 20.733 |
| Computer Software | Real | 1371 | 1320.347 | 50.653 | 2565.696 | 1.943 |
| Computer Software | Fraud | 5 | 55.653 | -50.653 | 2565.696 | 46.102 |
| Internet | Real | 1062 | 1019.047 | 42.953 | 1844.952 | 1.811 |
| Internet | Fraud | 0 | 42.953 | -42.953 | 1844.952 | 42.953 |
| Marketing / Advertising | Real | 783 | 794.511 | -11.511 | 132.510 | 0.167 |
| Marketing / Advertising | Fraud | 45 | 33.489 | 11.511 | 132.510 | 3.957 |
| Education | Real | 822 | 788.754 | 33.246 | 1105.297 | 1.401 |
| Education | Fraud | 0 | 33.246 | -33.246 | 1105.297 | 33.246 |
| Financial Services | Real | 744 | 747.493 | -3.493 | 12.202 | 0.016 |
| Financial Services | Fraud | 35 | 31.507 | 3.493 | 12.202 | 0.387 |
| Healthcare | Real | 446 | 476.899 | -30.899 | 954.730 | 2.002 |
| Healthcare | Fraud | 51 | 20.101 | 30.899 | 954.730 | 47.496 |
| Consumer Services | Real | 334 | 343.521 | -9.521 | 90.642 | 0.264 |
| Consumer Services | Fraud | 24 | 14.479 | 9.521 | 90.642 | 6.260 |
| Telecom. | Real | 316 | 328.168 | -12.168 | 148.053 | 0.451 |
| Telecom. | Fraud | 26 | 13.832 | 12.168 | 148.053 | 10.703 |
| Oil / Energy | Real | 178 | 275.392 | -97.392 | 9485.241 | 34.443 |
| Oil / Energy | Fraud | 109 | 11.608 | 97.392 | 9485.241 | 817.144 |

Decimals in this table are rounded to \(3\) digits.

  • Add all the values in the last column of the table above to calculate the test statistic:\[ \begin{align}\chi^{2} &= 0.8739 + 20.7331 + 1.9432 + 46.1019 + 1.8105 \\&+ 42.9529 + 0.1668 + 3.9569 + 1.4013 + 33.246 \\&+ 0.0163 + 0.3873 + 2.0020 + 47.4959 + 0.2639 \\&+ 6.2601 + 0.4512 + 10.7034 + 34.4427 + 817.1437 \\&\approx 1072.353.\end{align} \]
  • The formula here uses the non-rounded numbers from the table above to get a more accurate answer.

  • The Chi-square test statistic is:\[ \chi^{2} \approx 1072.353 .\]

Step \(4\): Find the Critical Chi-Square Value and the \(P\)-Value.

In the real world, a statistician would likely be more interested in calculating the \(p\)-value than in simply reporting whether there was a significant result, because the \(p\)-value gives a more specific conclusion. Say you want to be really sure that there is a relationship before you report one, and choose a significance level of \(\alpha = 0.01\).

  • Calculate the degrees of freedom: \[ \begin{align}k &= (r - 1)(c - 1) \\&= (10 - 1) (2 - 1) \\&= 9 \cdot 1 \\&= 9 \text{ degrees of freedom}\end{align} \]
  • Using a Chi-square distribution table, look at the row for \(9\) degrees of freedom and the column for \(0.01\) significance to find the critical value of \(21.67\).
  • To use a \(p\)-value calculator, you need the test statistic and degrees of freedom.
    • Plugging the degrees of freedom and the test statistic into a \(p\)-value calculator, you get a \(p\)-value very close to \(0\).

Step \(5\): Compare the Chi-Square Test Statistic to the Critical Chi-Square Value.

  • The test statistic of approximately \(1072.35\) is much, much larger than the critical value of \(21.67\), which means you have sufficient evidence to reject the null hypothesis.
  • The \(p\)-value is also very low, much less than the significance level, which would also let you reject the null hypothesis.

Step \(6\): Decide Whether to Reject the Null Hypothesis.

  • It looks like there is a strong relationship between the job industry and whether a job post is fraudulent.
  • Look at the table from step \(2\).
    • Here, you can see that the number of fraudulent jobs in the Oil industry is way higher than expected, and by itself contributes enough for you to conclude that industry and recruiter scams are not independent.

Therefore, you can confidently reject the null hypothesis.
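
If you have statistical software available, the whole example can be reproduced in a few lines. The sketch below passes the observed counts from the contingency table to scipy.stats.chi2_contingency, which computes the expected frequencies, the test statistic, the degrees of freedom, and the \(p\)-value in one call; for a table larger than \(2 \times 2\) it returns the same Pearson statistic calculated by hand above.

```python
from scipy.stats import chi2_contingency

# Observed counts: one row per industry, columns are [Real, Fraud]
observed = [
    [1702, 32],    # Information Technology
    [1371, 5],     # Computer Software
    [1062, 0],     # Internet
    [783, 45],     # Marketing / Advertising
    [822, 0],      # Education
    [744, 35],     # Financial Services
    [446, 51],     # Healthcare
    [334, 24],     # Consumer Services
    [316, 26],     # Telecom.
    [178, 109],    # Oil / Energy
]

chi_square, p_value, dof, expected = chi2_contingency(observed)
print(round(chi_square, 2))  # about 1072.35
print(dof)                   # 9
print(p_value)               # effectively 0, far below alpha = 0.01
```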

Chi-Square Test for Independence – Key takeaways

  • A Chi-square test of independence is a non-parametric Pearson Chi-square test that you can use to determine whether two categorical variables in a single population are related to each other or not.
  • The following must be true in order to use a Chi-square test of independence:
    • The two variables must be categorical.
    • Groups must be mutually exclusive; that is, each observation belongs to one and only one group, and the sample is randomly selected.
    • Expected counts must be at least \(5\).
    • Observations must be independent.
  • The null hypothesis is that the two categorical variables are independent, i.e., there is no association between them, they are not related.
  • The alternative hypothesis is that the two categorical variables are not independent, i.e., there is an association between them, they are related.
  • The expected frequency for row \(r\) and column \(c\) of a Chi-square test of independence is given by the formula:

    \[ E_{r,c} = \frac{n_{r} \cdot n_{c}}{n} \]

  • The degrees of freedom for a Chi-square test of independence are given by the formula:

    \[ k = (r - 1) (c - 1) \]

  • The formula (also called a test statistic) for a Chi-square test of independence is:

    \[ \chi^{2} = \sum \frac{(O_{r,c} - E_{r,c})^{2}}{E_{r,c}} \]

Frequently Asked Questions about Chi Square Test for Independence

What are the requirements for a Chi-square test for independence?

The following requirements must be met if you want to perform a Chi-square test for independence:

  • The variables must be categorical.
  • Groups must be mutually exclusive.
  • Expected counts must be at least 5.
  • Observations must be independent.

What is a Chi-square test of independence?

A Chi-square test of independence is a non-parametric Pearson Chi-square test that you can use to determine whether two categorical variables are related to each other or not.

When do you use a Chi-square test for independence?

You use a Chi-square test for independence when you meet all of the following conditions:

  • You want to test a hypothesis about the relationship between two categorical variables.
  • The sample was randomly selected.
  • There is a minimum of 5 expected observations in each combined group.

How many variables does a Chi-square test of independence have?

A Chi-square test of independence has two categorical variables.

Is the Chi-square test of independence a non-parametric test?

Yes, along with all other Chi-square tests, the Chi-square test of independence is a non-parametric test.
