Chi-Square Distribution

Have you ever gone shopping to buy yourself new clothes and noticed how the prices differed between stores? Have you wondered if lottery numbers are evenly distributed, or if some numbers occur with a higher frequency? Or, have you thought about if a coffee vending machine dispenses the same amount of coffee each time? You can answer these questions by using a specific hypothesis test.


Figure 1. The chi-square distribution is useful for finding a relationship between two things, like clothing prices at different stores.

In this article, you will learn about a new type of distribution to answer questions such as these – the chi-square distribution. You will study the chi-square distribution formula, the properties of a chi-square distribution, chi-square distribution tables, and work through several chi-square distribution examples. You will also be introduced to the major applications of the chi-square distribution, including:

  • the chi-square test for goodness of fit – which tells you if data fit a certain distribution, like in the lottery number example.

  • the chi-square test for homogeneity – which tells you if two populations have the same distribution, like in the clothing example.

  • the chi-square test for independence – which tells you whether two categorical variables are related to each other. (The coffee machine question is really about variability, which the closely related test of a single variance, covered below, addresses.)

each of which has an article of its own.

Chi-Square Distribution Definition

What happens when you square a normally distributed random variable? You know the probability distribution of the random variable itself, but what does it tell you about the distribution of the squared random variable? That question led to the discovery of the chi-square distribution, and it turns out to be useful in a wide variety of contexts.

A Chi-Square \( (\chi^{2}) \) Distribution is a continuous probability distribution of the sum of squared, independent, standard normal random variables that is widely used in hypothesis tests.

The chi-square distribution is the basis for three chi-square tests:

  1. the chi-square test for goodness of fit – allowing you to compare observed probability distributions to expected distributions,

  2. the chi-square test for independence and the chi-square test for homogeneity;

    • the chi-square test for independence allows you to test the independence of categorical variables, and

    • the chi-square test for homogeneity allows you to test if two categorical variables follow the same probability distribution; and

  3. the test of a single variance – allowing you to estimate the sampling distribution of the variance.

The basic shape of a chi-square distribution is determined by its degrees of freedom, denoted by \(k\).

The degrees of freedom, \(k\), are the number of values that are free to vary.

Let's take a look at an example.

Say you have \(4\) numbers that add up to \(1\):

\[ X_{1} + X_{2} + X_{3} + X_{4} = 1 \]

How many of the \(X\) values are free to vary?

Solution:

The answer is \(3\) because if you know \(3\) of the numbers, then you can solve for the \(4^{th}\) one:

\[ X_{4} = 1 - (X_{1} + X_{2} + X_{3}) \]

So, this example has \(3\) degrees of freedom.

In practice, the degrees of freedom, \(k\), is often one less than the number of categories or observations being compared.

The following graph illustrates examples of chi-square distributions with differing values of \(k\).

Figure 2. A comparison of chi-square distributions with varying degrees of freedom.

Because very few real-world observations follow a chi-square distribution, the main purpose of a chi-square distribution is hypothesis testing.

The Chi-Square Distribution's Relationship to the Standard Normal Distribution

The reason a chi-square distribution is useful for hypothesis testing is because of how closely it is related to the standard normal distribution: a normal distribution whose mean is \(0\) and variance is \(1\). Let's walk through this relationship.

Say you take a random sample from a standard normal distribution, \(Z\). If you square all the values in your sample, the squared values follow the chi-square distribution with one degree of freedom, or \( k = 1 \). So, mathematically, you represent this as:

\[ \chi_{1}^{2} = Z^{2} \]

Now, say you want to take random samples from \(2\) independent standard normal distributions, \( Z_{1} \) and \( Z_{2} \). If you square each sample and add them together every time you sample a pair of values, you have the chi-square distribution with two degrees of freedom, or \( k = 2 \). You represent this mathematically as:

\[ \chi_{2}^{2} = (Z_{1})^{2} + (Z_{2})^{2} \]

Continuing with this pattern, in general, if you take random samples from \(k\) independent standard normal distributions and then square and sum those values, you get a chi-square distribution with \(k\) degrees of freedom. Again, this is represented mathematically as:

\[ \chi_{k}^{2} = (Z_{1})^{2} + (Z_{2})^{2} + \ldots + (Z_{k})^{2} \]

In summary, a common use of a chi-square distribution is to describe the sum of squared, independent, standard normal random variables. So, if \( Z_{i} \) represents a standard normal random variable, then:

\[ \sum_{i=1}^{k} Z_{i}^{2} \sim \chi^{2}_{k} \]
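This sum-of-squares relationship is easy to check numerically. The sketch below (standard-library Python only; the function names are illustrative) draws many values of \( \sum_{i=1}^{k} Z_{i}^{2} \) for \(k = 5\) and compares the sample mean and variance with the theoretical values \(k\) and \(2k\) given later in this article:

```python
import random
import statistics

def chi_square_sample(k, rng):
    """Draw one chi-square value by summing k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def simulate(k, n_samples=50_000, seed=42):
    """Estimate the mean and variance of a chi-square distribution by simulation."""
    rng = random.Random(seed)
    samples = [chi_square_sample(k, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.variance(samples)

mean, var = simulate(k=5)
print(f"k=5: sample mean ≈ {mean:.2f} (theory: 5), sample variance ≈ {var:.2f} (theory: 10)")
```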

Chi-Square Distribution Formula

Chi-square tests are hypothesis tests whose test statistics follow a chi-square distribution under the null hypothesis. The first, and most widely used, chi-square test to be discovered was Pearson's chi-square test.

Pearson's chi-square distribution formula (a.k.a. statistic, or test statistic) is

\[ \chi^{2} = \sum \frac{(O-E)^{2}}{E} \]

where,

  • \( \chi^{2} \) is the chi-square test statistic

  • \( \sum \) is the summation operator

  • \( O \) is the observed frequency

  • \( E \) is the expected frequency

If you take many samples from a population and calculate Pearson's chi-square test statistic for each sample, the test statistic will follow a chi-square distribution, provided the null hypothesis is true.
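Pearson's statistic is straightforward to compute by hand or in code. Here is a minimal sketch; the helper name and the die-roll counts are hypothetical, with \(60\) throws of a fair die expected to land \(10\) times on each face:

```python
def pearson_chi_square(observed, expected):
    """Pearson's chi-square statistic: the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 throws of a die, versus a fair-die expectation of 10 per face.
stat = pearson_chi_square([8, 12, 9, 11, 10, 10], [10] * 6)
print(stat)  # ≈ 1.0
```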

Mean of a Chi-Square Distribution

The mean of a chi-square distribution is the degrees of freedom:\[ \mu \left[ \chi^{2} \right] = k. \]

Variance of a Chi-Square Distribution

The variance of a chi-square distribution is twice the degrees of freedom:\[ \sigma^{2} \left[ \chi^{2} \right] = 2k. \]

Mode of a Chi-Square Distribution

The mode of a chi-square distribution is the degrees of freedom minus two (when \( k \geq 2 \)):\[ \text{mode} \left[ \chi^{2} \right] = k - 2, \text{ if } k \geq 2 \]

Standard Deviation of a Chi-Square Distribution

The standard deviation of a chi-square distribution is the square-root of twice the degrees of freedom:

\[ \sigma \left[ \chi^{2} \right] = \sqrt{2k} \]
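The four quantities above (plus the skewness formula given later in this article) can be collected into one small helper. This is a sketch; the function name is illustrative, and returning `None` for the mode when \(k < 2\) is an assumption, since the text defines the mode only for \(k \geq 2\):

```python
import math

def chi_square_moments(k):
    """Summary statistics of a chi-square distribution with k degrees of freedom."""
    return {
        "mean": k,                          # mu = k
        "variance": 2 * k,                  # sigma^2 = 2k
        "std_dev": math.sqrt(2 * k),        # sigma = sqrt(2k)
        "mode": k - 2 if k >= 2 else None,  # mode = k - 2, defined for k >= 2
        "skewness": math.sqrt(8 / k),       # sqrt(8 / k), from the Shape section
    }

m = chi_square_moments(15)
print(m)  # mean 15, variance 30, std_dev ≈ 5.477, mode 13, skewness ≈ 0.73
```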

Properties of a Chi-Square Distribution

The chi-square distribution has several properties that make it easy to work with and well-suited for hypothesis testing:

  • A chi-square distribution is a continuous distribution.

  • A chi-square distribution is defined by a single parameter: the degrees of freedom, \(k\).

  • The sum of independent chi-square random variables is also a chi-square random variable, with the degrees of freedom of the sum being the sum of the degrees of freedom:\[ \chi^{2}_{k_{1}} + \chi^{2}_{k_{2}} \sim \chi^{2}_{k_{1} + k_{2}} \]

Range of a Chi-Square Distribution

A chi-square distribution is never negative. This is easiest to see from the ratio form \( (n-1)s^{2}/\sigma^{2} \) used in the test of a single variance: the numerator \( (n-1)s^{2} \) is non-negative and the denominator \( \sigma^{2} \) is positive, so the ratio can never be negative. In other words:

\[ \frac{(n-1)s^{2}}{\sigma^{2}} \geq 0 \]

This means the range is:\[ \text{range} \left[ \chi^{2} \right] = 0 \to \infty \]

Symmetry of a Chi-Square Distribution

The constraint that a chi-square distributed random variable can never be negative means that a chi-square distribution cannot be symmetrical. It is a non-symmetric distribution. However, a chi-square distribution becomes increasingly symmetrical as \(k\) increases.

Shape of a Chi-Square Distribution

The shape of a chi-square distribution depends on the degrees of freedom, \(k\). While a chi-square distribution can never be negative, it extends all the way to infinity in the positive direction, so the right tail is longer than the left; in statistical terms, you say that a chi-square distribution is skewed to the right. As the value of \(k\) increases, the chi-square distribution more closely resembles the bell curve of a normal distribution.

The skewness of a chi-square distribution is equal to:

\[ \text{Skewness} \left[ \chi^{2} \right] = \sqrt{\frac{8}{k}} \]

This means the mean of a chi-square distribution is greater than the median and the mode. As \(k\) gets increasingly large, the number under the square root gets closer and closer to zero, so the skewness of the distribution approaches zero as \(k\) approaches infinity.

When a Chi-Square Distribution has one or two Degrees of Freedom

When a chi-square distribution has only one or two degrees of freedom ( \( k = 1 \) or \( k = 2 \) ) it is shaped like a backwards "J".

Figure 3. Graphs of a chi-square distribution when it has one and two degrees of freedom.

Because of this shape, there is a high probability that \( \chi^{2} \) is close to zero.

When a Chi-Square Distribution has three or more Degrees of Freedom

When a chi-square distribution has three or more degrees of freedom ( \( k \geq 3 \) ), it takes on a bump-shape that has a peak in the middle that more closely resembles a normal distribution. This means there is a low probability that \( \chi^{2} \) is either very close to or very far from zero.

When \(k\) is only slightly larger than \(2\), the chi-square distribution has a much longer right tail than left tail; that is, it is strongly right-skewed.

Figure 4. Graphs of a chi-square distribution when it has three and five degrees of freedom.

Remember that the mean of a chi-square distribution is the degrees of freedom, \(k\), and notice that the peak is always to the left of the mean. Also notice that the left tail ends at zero, but the right tail goes on forever. The peak can never truly be in the middle of the distribution: the mode is \(k - 2\), which sits left of the mean \(k\), and the unbounded right tail pulls the mean to the right of the peak.

As the degrees of freedom, \(k\), gets larger and larger, the skew of the distribution gets smaller and smaller. As the degrees of freedom approaches infinity, the distribution approaches that of a normal distribution.

Figure 5. A chi-square distribution that has \(90\) degrees of freedom. At this point, you can use a normal distribution as a good approximation of the chi-square distribution.

In fact, when \( k \geq 90 \), you can consider a normal distribution as a good approximation of the chi-square distribution.
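A quick Monte Carlo sketch (standard library only) makes this approximation concrete: for \(k = 90\), the fraction of chi-square draws landing within one standard deviation \( \sqrt{2k} \) of the mean \(k\) should be close to the normal distribution's roughly \(68.3\%\):

```python
import math
import random

k = 90
rng = random.Random(7)
n = 20_000

# Count draws of chi2_k (built as sums of k squared standard normals)
# that fall within one standard deviation sqrt(2k) of the mean k.
inside = 0
for _ in range(n):
    x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
    if abs(x - k) <= math.sqrt(2 * k):
        inside += 1

frac = inside / n
print(f"fraction within one standard deviation ≈ {frac:.3f} (normal: about 0.683)")
```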

Chi-Square Distribution Tables

Nowadays, many calculators and any statistical software can calculate chi-square probabilities. But before such software was ubiquitous, people needed an easy way to look these values up, which is why the chi-square distribution table was created. The chi-square distribution table is a reference tool you can use to find chi-square critical values.

A chi-square critical value is a threshold for statistical significance for hypothesis tests. It also defines confidence intervals for certain parameters.

Below is an example of a chi-square distribution table for \(1-5\) degrees of freedom.

Percentage Points of the Chi-Square Distribution

| Degrees of Freedom (\(k\)) | 0.99 | 0.95 | 0.90 | 0.75 | 0.50 | 0.25 | 0.10 | 0.05 | 0.01 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.004 | 0.016 | 0.102 | 0.455 | 1.32 | 2.71 | 3.84 | 6.63 |
| 2 | 0.020 | 0.103 | 0.211 | 0.575 | 1.386 | 2.77 | 4.61 | 5.99 | 9.21 |
| 3 | 0.115 | 0.352 | 0.584 | 1.212 | 2.366 | 4.11 | 6.25 | 7.81 | 11.34 |
| 4 | 0.297 | 0.711 | 1.064 | 1.923 | 3.357 | 5.39 | 7.78 | 9.49 | 13.28 |
| 5 | 0.554 | 1.145 | 1.610 | 2.675 | 4.351 | 6.63 | 9.24 | 11.07 | 15.09 |

Column headings give the probability of a larger value of \( \chi^{2} \), i.e. the significance level \( \alpha \).

Table 1. Percentage points of the chi-square distribution, \(1\)-\(5\) degrees of freedom.

The leftmost column tells you the degrees of freedom, \(k\). Each column heading in the top row is a right-tail probability: the number in each cell is the critical value at which the chi-square distribution has that much probability remaining to the right. So, if you want the point with a given probability to its left, look in the column labelled with its complement; for a left-tail probability of \(0.9\), use the column labelled \(0.1\).

Let's walk through an example.

Say you have a chi-square distribution with \(6\) degrees of freedom, and you want to know the value below which your chi-square distribution has \(5\%\) of its probability. How can you use a chi-square distribution table to do this?

Solution:

  1. You are given the fact that you have \(6\) degrees of freedom, so choose the row that matches that; row \(6\).
  2. You want the value with \(5\%\) probability to its left, so choose the column labelled with the complement of \(5\% = 0.05\). This is column \(0.95\).
  3. With the row and column specified, identify the critical value in the cell. The critical value is \(1.635\).
    • This means that \(5\%\) of the probability lies below \(1.635\), and \(95\%\) lies at or above it.
      • Mathematically, you write:\[ P(\chi_{6}^{2} \geq 1.635) = 0.95 \]

These tables are most important for hypothesis testing. If your test statistic is greater than the number in the appropriate cell, that means you have found evidence to reject the null hypothesis. See the article on Chi-Square Tests for more information.
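When no table or statistical software is at hand, a table entry can also be sanity-checked by simulation. The sketch below (standard library only; the helper name is illustrative) estimates \( P(\chi_{6}^{2} \geq 1.635) \), which the table puts at \(0.95\):

```python
import random

def chi_square_draw(k, rng):
    """One chi-square variate, built as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)
n = 100_000
# Fraction of draws at or above the tabled critical value 1.635.
tail = sum(chi_square_draw(6, rng) >= 1.635 for _ in range(n)) / n
print(f"estimated P(chi2_6 >= 1.635) ≈ {tail:.3f} (table: 0.95)")
```

If SciPy is available, `scipy.stats.chi2.ppf(0.05, df=6)` returns the same critical value directly.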

Applications of the Chi-Square Distribution

What are some common applications of the chi-square distribution? Well, the chi-square distribution appears in many statistical tests and theories. Below are some of the most common.

Population Variance Inferences

A major motivation for the chi-square distribution is drawing inference about the population standard deviation \( (\sigma) \) or variance \( (\sigma^{2}) \) from a relatively small sample. Using a chi-square distribution, you can test the hypothesis that a population's variance is equal to a specific value by using the test of a single variance. Alternatively, you could calculate confidence intervals for the population's variance.

A large union strives to make sure that all employees who are at the same level of seniority get paid similar salaries. Their goal: a standard deviation in hourly salary that is less than \(\$3\).

  • To test if they have achieved their goal, the union randomly samples \(30\) employees who are at the same level of seniority. They find that the standard deviation of their sample is \(\$2.95\), which is just slightly less than their goal of \(\$3\).

Is this enough evidence to conclude that the true standard deviation of all employees who are at the same level of seniority is less than \(\$3\)?

Solution:

To find out, the union should use the test of a single variance to determine if the standard deviation is significantly less than \(\$3\), using:

  • a null hypothesis of \( H_{0}: \sigma^{2} \geq 3^{2} \) and

  • an alternative hypothesis of \( H_{a}: \sigma^{2} < 3^{2} \).

Then, by comparing a chi-square test statistic to the appropriate chi-square distribution, the union can decide if it is appropriate to reject the null hypothesis.
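Here is a sketch of that computation for the union's numbers (the helper name is illustrative; the lower-tail comparison value of about \(17.71\) for \(29\) degrees of freedom comes from a standard chi-square table):

```python
def single_variance_statistic(n, sample_sd, hypothesized_sd):
    """Test statistic for a single variance: (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * sample_sd ** 2 / hypothesized_sd ** 2

# Union example: 30 sampled employees, sample sd $2.95, hypothesized sd $3.
stat = single_variance_statistic(n=30, sample_sd=2.95, hypothesized_sd=3.0)
print(f"chi-square statistic = {stat:.2f} with {30 - 1} degrees of freedom")  # ≈ 28.04
```

Under \( H_{0} \) the statistic follows \( \chi_{29}^{2} \); the left-tailed test rejects only if the statistic falls below the lower \(5\%\) point (roughly \(17.71\)), so \(28.04\) does not give the union enough evidence that the true standard deviation is below \(\$3\).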

Pearson's Chi-Square Test

Pearson's chi-square tests are some of the most common applications of chi-square distributions. You use these tests to determine if your data are significantly different from what you expect. The two types of Pearson's chi-square tests are:

  1. the chi-square goodness of fit test and

  2. the chi-square test for independence.

Say a t-shirt company wants to know if all colors of their t-shirts are equally popular. To find out, they record the number of sales per shirt color for a week. This data is represented in the table below:

Sales per Shirt Color

| Color | Frequency |
|---|---|
| Black | 80 |
| Blue | 90 |
| Gray | 70 |
| Red | 60 |
| White | 100 |

Table 2. Sales per shirt color data.

Since the company sold \(400\) t-shirts, \(80\) sales per color would mean that the colors were equally popular. Based on the numbers in the table, you know that the company did not sell \(80\) of each color of t-shirt. However, this is only a one-week sample, so you should expect that the numbers won't be equal due to chance.

But, does this sample give enough confidence to conclude that the frequency of t-shirt sales truly differs between colors?

Solution:

This is where a chi-square goodness of fit test comes in. It can test whether the observed frequencies are significantly different from equal frequencies.

By comparing Pearson's chi-square test statistic to the appropriate chi-square distribution, the company can determine the probability of these t-shirt sales figures arising from chance alone.
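As a sketch, the goodness-of-fit statistic for the t-shirt data works out as follows (equal popularity means an expected count of \(80\) per color):

```python
observed = [80, 90, 70, 60, 100]  # sales: black, blue, gray, red, white
expected = [sum(observed) / len(observed)] * len(observed)  # 400 / 5 = 80 per color

# Pearson's chi-square statistic: sum of (O - E)^2 / E over the five colors.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(stat)  # 12.5, with 5 - 1 = 4 degrees of freedom
```

With \(4\) degrees of freedom, Table 1 gives a \(5\%\) critical value of \(9.49\); since \(12.5 > 9.49\), the company would reject equal popularity at the \(5\%\) level.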

F Distribution Definition

Chi-square distributions are also integral in defining the \(F\) distribution, a distribution used in Analysis of Variance (ANOVAs).

How do you use chi-square distributions to define an \(F\) distribution?

Solution:
  1. Say you take a random sample from a chi-square distribution.
  2. Next, you divide each sampled value by the degrees of freedom of that chi-square distribution.
  3. Repeat steps \( 1-2 \) with a different, independent chi-square distribution.
    • If you take the ratio of the scaled values from these two distributions, you will get an \(F\) distribution.
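The steps above translate directly into a simulation sketch (standard library only; function names are illustrative). The sample mean is compared with the known \(F\)-distribution mean \( d_{2}/(d_{2}-2) \):

```python
import random
import statistics

def chi_square_draw(k, rng):
    """One chi-square variate, built as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def f_draw(d1, d2, rng):
    """One F variate: the ratio of two independent chi-squares, each divided by its dof."""
    return (chi_square_draw(d1, rng) / d1) / (chi_square_draw(d2, rng) / d2)

rng = random.Random(1)
samples = [f_draw(4, 20, rng) for _ in range(50_000)]
mean_f = statistics.mean(samples)
print(f"sample mean ≈ {mean_f:.3f} (theory: 20 / 18 ≈ 1.111)")
```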

Chi-Square Distribution Examples

Now, let's work through some examples!

Square and add up \(15\) standard normal random variables. What distribution does this sum follow?

Solution:

A squared standard normal random variable follows a chi-square distribution with \(1\) degree of freedom. The sum of independent chi-square random variables also follows a chi-square distribution, with the degrees of freedom of the sum being the sum of the individual degrees of freedom. Let's follow this process.

  1. Let \(Z_{i}\) be independent standard normal random variables:\[ Z_{i}^{2} \sim \chi_{1}^{2} \]
  2. Then:\[ Z_{1}^{2} + Z_{2}^{2} = \chi_{1}^{2} + \chi_{1}^{2} \sim \chi_{2}^{2} .\]
  3. So if you sum \(15\) of the \(Z_{i}^{2}\), you have:\[ \sum_{i = 1}^{15} Z_{i}^{2} = \sum_{i = 1}^{15} \chi_{1}^{2} \sim \chi_{15}^{2}. \]
  4. The sum of \(15\) squared standard normal random variables follows a chi-square distribution with \(15\) degrees of freedom.

Building on the previous example:

What are the

  1. mean,
  2. variance,
  3. standard deviation, and
  4. skewness

of the distribution from the previous example?

Solution:

  1. The mean of a chi-square distribution is equal to the degrees of freedom:\[ \mu \left[ \chi_{15}^{2} \right] = k = 15 \]
  2. The variance of a chi-square distribution is two times the degrees of freedom:\[ \sigma^{2} \left[ \chi_{15}^{2} \right] = 2(15) = 30 \]
  3. The standard deviation is the square root of the variance:\[ \begin{align}\sigma \left[\chi_{15}^{2} \right] &= \sqrt{ \sigma^{2} \left[ \chi_{15}^{2} \right]} \\&= \sqrt{30} \approx 5.477.\end{align} \]
  4. The skewness has a formula, too:\[ \text{Skewness} \left[ \chi_{15}^{2} \right] = \sqrt{\frac{8}{15}} \approx 0.73 \]

Here is an example using a chi-square table.

Using a chi-square table, find the \(90\%\), \(95\%\), and \(99\%\) critical values for a chi-square distribution with \(8\) degrees of freedom.

Solution:

All you have to do for this question is read the table.

  1. There are \(8\) degrees of freedom; find the row corresponding to \(8\) degrees of freedom.
  2. To find the critical values, find the columns for \(0.1\), \(0.05\), and \(0.01\).
    • \(0.1\) corresponds to the critical value for \(90\%\).
    • \(0.05\) corresponds to the critical value for \(95\%\).
    • \(0.01\) corresponds to the critical value for \(99\%\).
  3. Then read the numbers in the cells.
    • The critical values are:
      • \(13.36\),
      • \(15.51\), and
      • \(20.09\).
    • The results are highlighted in the table below:

Table 3. Chi-square distribution example: percentage points for \(6\)-\(9\) degrees of freedom, with the three critical values for \(8\) degrees of freedom in bold.

Percentage Points of the Chi-Square Distribution

| Degrees of Freedom (\(k\)) | 0.99 | 0.95 | 0.90 | 0.75 | 0.50 | 0.25 | 0.10 | 0.05 | 0.01 |
|---|---|---|---|---|---|---|---|---|---|
| 6 | 0.872 | 1.635 | 2.204 | 3.455 | 5.348 | 7.84 | 10.64 | 12.59 | 16.81 |
| 7 | 1.239 | 2.167 | 2.833 | 4.255 | 6.346 | 9.04 | 12.02 | 14.07 | 18.48 |
| 8 | 1.647 | 2.733 | 3.490 | 5.071 | 7.344 | 10.22 | **13.36** | **15.51** | **20.09** |
| 9 | 2.088 | 3.325 | 4.168 | 5.899 | 8.343 | 11.39 | 14.68 | 16.92 | 21.67 |

Chi-Square Distribution – Key takeaways

  • A Chi-Square \( (\chi^{2}) \) Distribution is a continuous probability distribution of the sum of squared, independent, standard normal random variables that is widely used in hypothesis tests.
  • The chi-square distribution is the basis for three chi-square hypothesis tests:
    1. the chi-square test for goodness of fit,

    2. the chi-square test for independence and the chi-square test for homogeneity, and

    3. the test of a single variance.

  • Pearson's chi-square distribution formula (a.k.a. statistic, or test statistic) is:

    \[ \chi^{2} = \sum \frac{(O-E)^{2}}{E} \]

  • A common use of a chi-square distribution is to describe the sum of squared, independent, standard normal random variables. So, if \( Z_{i} \) represents a standard normal random variable, then:

    \[ \sum_{i=1}^{k} Z_{i}^{2} \sim \chi^{2}_{k} \]

  • Properties of a chi-square distribution are:
    • Mean: \[ \mu \left[ \chi^{2} \right] = k \]
    • Variance: \[ \sigma^{2} \left[ \chi^{2} \right] = 2k \]
    • Mode: \[ \text{mode} \left[ \chi^{2} \right] = k - 2, \text{ if } k \geq 2 \]
    • Standard deviation: \[ \sigma \left[ \chi^{2} \right] = \sqrt{2k} \]
    • Range: \[ \text{range} \left[ \chi^{2} \right] = 0 \to \infty \]
    • Symmetry: a non-symmetric distribution that becomes increasingly symmetrical as \(k\) increases.
    • Shape: depends solely on the degrees of freedom, \(k\), the number of values that are free to vary.
    • The sum of independent chi-square distributed random variables also follows a chi-square distribution, with the degrees of freedom of the sum being the sum of the degrees of freedom (equivalently, the mean of the sum is the sum of the means and the variance of the sum is the sum of the variances):\[ \chi_{k_{1}}^{2} + \chi_{k_{2}}^{2} \sim \chi_{k_{1}+k_{2}}^{2} \]

Frequently Asked Questions about Chi-Square Distribution

What is a chi-square distribution?

A chi-square distribution is a probability distribution that represents the sum of squared, independent, standard normal random variables.

What is the chi-square distribution used for?

The chi-square distribution can be used to model the sum of squared, independent, standard normal random variables, including modeling variance. The chi-square distribution can also represent the ratio of an observed variance to a theoretical variance. For these two reasons, the chi-square distribution is used in many statistical hypothesis tests.

How do you calculate a chi-square distribution?

The chi-square distribution is a special form of the gamma distribution, with shape parameter \(k/2\) and rate parameter \(1/2\), where \(k\) is the degrees of freedom. Because evaluating it requires difficult calculus, it is most often calculated using statistical software or a calculator.

What is the shape of a chi-square distribution?

The shape of a chi-square distribution is determined by the degrees of freedom. With only one or two degrees of freedom, the probability density function (PDF) starts high and quickly decays toward zero, like an exponential distribution. With higher degrees of freedom, the chi-square PDF begins to look bell-shaped. The distribution can never be negative, so it starts at \(0\). Its peak is to the left of the mean, and it has an infinite right tail.

What is an example of a chi-square distribution?

A common example of a chi-square distribution is the sampling distribution of the variance of a normal population. This is especially useful for constructing a confidence interval for the population variance.
