Spearman's Rank Correlation Coefficient

Suppose you and your friend decided to do your own soda taste test. You buy \(10\) different brands of soda, and you each taste them, ranking them from best to worst. What if the two of you gave some sodas different ranks? How would you tell if you and your friend gave approximately the same ranks even if you didn't give exactly the same ranks? That is what the Spearman's rank correlation coefficient can tell you!

Definition of Spearman's rank correlation coefficient

Remember that a product moment correlation coefficient (PMCC) is used to measure a linear correlation between two variables.

See the articles Correlation and Product Moment Correlation Coefficient for more details.

But what if your data isn't linearly correlated, or can't even be measured on a continuous scale? In that case, you can use the Spearman's rank correlation coefficient. In fact, you might use the Spearman's rank correlation coefficient as an approximation to the product moment correlation coefficient even if the data is linearly correlated simply because Spearman's rank correlation coefficient is a simpler calculation.

For more details, see Comparing Spearman's Rank and Product Moment Correlation Coefficient.

In general, you would use Spearman's rank correlation coefficient if:

  • one or both of your data sets are from a population which is not normally distributed;

  • the relationship between the data sets is non-linear; or

  • one or both of the data sets is already represented as a ranking.

Values of the Spearman's rank correlation coefficient range between \(-1\) and \(1\).

A Spearman's rank correlation coefficient of:

  • \(1\) means the rankings are in perfect agreement;
  • \(0\) means there is no relationship between the rankings; and
  • \(-1\) means the rankings are in reverse order.
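The two extreme cases can be checked with a small sketch. It uses the rank-difference formula that this article introduces later, applied to a made-up set of five ranks (the data and the function name `rank_difference_coefficient` are illustrative assumptions, not part of the original article).

```python
def rank_difference_coefficient(ranks_a, ranks_b):
    """Rank-difference formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)). Assumes no tied ranks."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks for five items.
ranks = [1, 2, 3, 4, 5]

print(rank_difference_coefficient(ranks, ranks))        # 1.0  (perfect agreement)
print(rank_difference_coefficient(ranks, ranks[::-1]))  # -1.0 (exactly reversed)
```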

Often the Spearman's rank correlation coefficient will not be exactly \(1\), \(0\) or \(-1\). Generally, when you do a hypothesis test using the Spearman's rank correlation coefficient, you are testing whether or not there is a relationship between the rankings.

See Testing for Zero Correlation for more details on this type of hypothesis test.

Spearman's rank graph

When checking for a possible correlation using Spearman's rank, it can help to graph the data. Remember, you are not looking to see if the data in the graph forms a line; you are looking to see if the rankings are the same.

In the graph below, you can see the rankings that two judges gave at a competition. The rankings that Judge A gave the competitors are noted by circles, while the rankings that Judge B gave are noted by crosses.

Fig. 1 - Plot of rankings given by two different judges.

For example, Judge A gave the first competitor a ranking of \(1\), while Judge B gave the competitor a ranking of \(2\). While the data plotted does not form a line, it does appear that both judges gave approximately the same score to all of the competitors, and in three cases, they gave exactly the same score. So you could expect the Spearman's rank correlation coefficient for the rankings here to be closer to \(1\) than to \(0\).

Spearman's rank correlation coefficient formula

Using the formula for the Spearman's rank correlation coefficient requires the data sets to be ranked. It doesn't matter how you rank them (for example, best to worst or worst to best) as long as you rank both sets the same way. Before looking at the formula, let's look at an example of organising the rankings.

Two coffee tasters were asked to rank \(8\) brands of coffee in order of preference. Their order preferences for the brands are given in the table below.

Table 1. Coffee preferences by the taster.

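| Coffee Brand | A | B | C | D | E | F | G | H |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Taster \(x\) | \(4\) | \(5\) | \(2\) | \(8\) | \(1\) | \(3\) | \(7\) | \(6\) |
| Taster \(y\) | \(4\) | \(6\) | \(1\) | \(7\) | \(3\) | \(2\) | \(5\) | \(8\) |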

Each coffee is given a preference number by the taster. As long as taster \(x\) and taster \(y\) both use \(1\) to mean the same thing on the scale, then you will be able to compare the rankings. If you don't know that taster \(x\) and taster \(y\) used \(1\) to mean the coffee they prefer the most, you won't be able to tell what the correlation coefficient means even though you will be able to calculate it.

To calculate the correlation coefficient, you will need the following values:

\[ S_{xy} = \sum x_iy_i - \frac{1}{n}\sum x_i \sum y_i; \]

\[ S_{xx} = \sum x_i^2 - \frac{1}{n} \left(\sum x_i\right)^2;\]

and

\[S_{yy} = \sum y_i^2 - \frac{1}{n} \left(\sum y_i\right)^2.\]

Then the Spearman's rank correlation coefficient can be found using the formula

\[ r_s = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} .\]
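As a rough sketch of how these four formulas fit together, the snippet below computes \(S_{xy}\), \(S_{xx}\) and \(S_{yy}\) for the ranks in Table 1 and then \(r_s\). The function name `spearman_from_sums` is an illustrative choice, and the inputs are assumed to be ranks already.

```python
from math import sqrt

def spearman_from_sums(x_ranks, y_ranks):
    """Compute r_s = S_xy / sqrt(S_xx * S_yy) from two lists of ranks."""
    n = len(x_ranks)
    sum_x = sum(x_ranks)
    sum_y = sum(y_ranks)
    s_xy = sum(x * y for x, y in zip(x_ranks, y_ranks)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in x_ranks) - sum_x ** 2 / n
    s_yy = sum(y * y for y in y_ranks) - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

# Ranks from Table 1 (coffee brands A to H).
taster_x = [4, 5, 2, 8, 1, 3, 7, 6]
taster_y = [4, 6, 1, 7, 3, 2, 5, 8]

r_s = spearman_from_sums(taster_x, taster_y)
print(round(r_s, 4))  # 0.8095
```

For this data \(S_{xy} = 34\) and \(S_{xx} = S_{yy} = 42\), so \(r_s = 34/42 \approx 0.81\).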

You might find an example where the same score is given to more than one data point. This is called a tied rank.

A tied rank occurs when two or more data values in one of the data sets are the same.

Let's look at a quick example.

Suppose a coffee taster was instead asked to give the coffee a letter grade depending on how much they liked it. For the coffees they tasted, they gave scores of: A, C, F, D, B, C, C, C. Notice that of the eight coffees listed, four of them have a score of C! So if you tried to make a ranking table you would get:

Table 2. Possible ranking table

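| Rank | \(1\) | \(2\) | | | | | \(7\) | \(8\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grade | A | B | C | C | C | C | D | F |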

But what do you do with the four coffees that each scored a C? Do you give them a rank of \(3\), \(4\), \(5\) or \(6\)? It turns out that you give them the average of the ranks since they are tied. Finding the average gives you

\[ \frac{3+4+5+6}{4} = 4.5,\]

so each one would be ranked \(4.5\). The completed ranking table would be:

Table 3. Completed ranking table

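| Rank | \(1\) | \(2\) | \(4.5\) | \(4.5\) | \(4.5\) | \(4.5\) | \(7\) | \(8\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grade | A | B | C | C | C | C | D | F |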

Notice that in the previous example, you are not comparing the ranks of taster \(x\) to the ranks of taster \(y\). You are only comparing the ranks given by a single taster.
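The averaging rule for tied ranks can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: the helper name `average_ranks` is hypothetical, and it assumes that sorting the values from smallest to largest matches the intended order (for letter grades, A sorts first and so receives rank \(1\)).

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` to cover every value tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) correspond to ranks pos+1..end+1.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

# Grades in the order the coffees were tasted.
grades = ["A", "C", "F", "D", "B", "C", "C", "C"]
print(average_ranks(grades))
# [1.0, 4.5, 8.0, 7.0, 2.0, 4.5, 4.5, 4.5]
```

The four coffees graded C each receive \(4.5\), matching the completed ranking table.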

If there are more than two tied ranks, then the formula

\[ r_s = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} \]

needs to be used. However, if there are two or fewer tied ranks, then you can use the following formula instead:

\[ r_s = 1 - \frac{6}{n(n^2-1)} \sum d^2,\]

where \(n\) is the number of pairs of observations and \(d\) is the difference between the ranks of each observation. When there are no tied ranks, the difference formula gives exactly the same value as the general formula. When there are tied ranks, it is only an approximation, and the approximation gets worse as the number of ties grows.
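To see how tied ranks affect the two formulas, here is a small comparison sketch. The first list of ranks is taken from Table 3; the second list is a made-up set of ranks for a hypothetical second taster, and both function names are illustrative.

```python
from math import sqrt

def spearman_general(x, y):
    """General formula: r_s = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

def spearman_difference(x, y):
    """Difference formula: r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Ranks with a four-way tie (Table 3) and hypothetical ranks from a second taster.
tied_ranks = [1, 2, 4.5, 4.5, 4.5, 4.5, 7, 8]
other_ranks = [2, 1, 3, 4, 5, 6, 8, 7]

print(round(spearman_general(tied_ranks, other_ranks), 3))     # 0.888
print(round(spearman_difference(tied_ranks, other_ranks), 3))  # 0.893
```

With four tied ranks the two results differ in the third decimal place, which is why the general formula is preferred when there are many ties.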

Spearman's rank table

Once you know the Spearman's rank correlation coefficient, you will often use it to do a hypothesis test. While you can use technology to find the critical value, it is helpful to be able to read a Spearman's rank table. Below is a section from a Spearman's rank table.

Table 4. Spearman's rank table

| \(n\) / \(\alpha\) | \(0.1\) | \(0.05\) | \(0.025\) | \(0.01\) |
| --- | --- | --- | --- | --- |
| \(6\) | \(0.657\) | \(0.829\) | \(0.886\) | \(0.943\) |
| \(7\) | \(0.571\) | \(0.714\) | \(0.786\) | \(0.893\) |
| \(8\) | \(0.524\) | \(0.643\) | \(0.738\) | \(0.833\) |

The first column of the table is the sample size \(n\), and the first row of the table gives you the significance level \(\alpha\). Notice that as the sample size increases, the critical value for a given significance level decreases. Remember that the margin of error depends on the critical value:

margin of error = (critical value)(standard error).

This means that if you increase the sample size, the margin of error will decrease.

Critical value of Spearman's rank correlation coefficient

The critical value of the Spearman's rank correlation coefficient depends on the sample size and the significance level you are using. The critical value can be found using a table or through statistical software. For example, if you are doing a one-tailed test, with a sample size of \(7\), at the \(0.025\) significance level, you would use a table of Spearman's coefficients to see that the critical value is \(0.786\). You can find this critical value in the table above.

In other words, for a sample size of \(7\), a value of \(r_s\) is significant at the \(0.025\) level on a one-tailed test if it is at least \(0.786\) when testing for positive correlation, or at most \(-0.786\) when testing for negative correlation.
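Looking up a critical value and comparing it with \(r_s\) can be sketched as follows. The dictionary copies the values from Table 4, on the assumption that its third column is the \(0.025\) level. The function name `is_significant` and the sample values of \(r_s\) are illustrative, and the check only covers a one-tailed test for positive correlation.

```python
# One-tailed critical values copied from Table 4: {n: {alpha: critical value}}.
# Assumption: the third column of the table is the 0.025 significance level.
CRITICAL_VALUES = {
    6: {0.1: 0.657, 0.05: 0.829, 0.025: 0.886, 0.01: 0.943},
    7: {0.1: 0.571, 0.05: 0.714, 0.025: 0.786, 0.01: 0.893},
    8: {0.1: 0.524, 0.05: 0.643, 0.025: 0.738, 0.01: 0.833},
}

def is_significant(r_s, n, alpha):
    """One-tailed test for positive correlation: significant if r_s >= critical value."""
    critical = CRITICAL_VALUES[n][alpha]
    return r_s >= critical

print(CRITICAL_VALUES[7][0.025])        # 0.786
print(is_significant(0.80, 7, 0.025))   # True
print(is_significant(0.75, 7, 0.025))   # False
```

For a test of negative correlation you would compare \(r_s\) with the negative of the critical value instead.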

Spearman's rank correlation coefficient example

Let's go back to the coffee example and work out what the correlation coefficient is.

Two coffee tasters were asked to rank eight brands of coffee in order of preference, with \(1\) being the coffee they liked the most. Their order preferences for the brands are given in the table below.

Table 5. Coffee preferences by the taster.

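| Coffee Brand | A | B | C | D | E | F | G | H |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Taster \(x\) | \(4\) | \(5\) | \(2\) | \(8\) | \(1\) | \(3\) | \(7\) | \(6\) |
| Taster \(y\) | \(4\) | \(6\) | \(1\) | \(7\) | \(3\) | \(2\) | \(5\) | \(8\) |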

Find and interpret the Spearman's rank correlation coefficient.

Solution:

Notice that even though both tasters ranked coffee A as their fourth choice, this is not an example of a tied rank. Tied ranks would happen if one taster gave two coffees the same rank. So it is reasonable to use the simplified formula

\[ r_s = 1 - \frac{6}{n(n^2-1)} \sum d^2 .\]

Here there are eight coffee brands, so \(n=8\). Looking at the summation first,

\[\begin{align} \sum\limits_{i=1}^8 d_i^2 &= (4-4)^2 + (5-6)^2 + (2-1)^2 + (8-7)^2 \\ & \quad + (1-3)^2 + (3-2)^2 + (7-5)^2 + (6-8)^2 \\ &= 0+1+1+1+4+1+4+4 \\ &= 16. \end{align}\]

Then

\[\begin{align} r_s &= 1 - \frac{6}{n(n^2-1)} \sum d^2 \\ &= 1-\frac{6}{8(8^2-1)}(16) \\ &= 1-\frac{6}{8(63)}(16) \\ &\approx 0.81. \end{align}\]

Since \(r_s \not= 0\), you can't say there is no relationship between the rankings. In fact, since \(r_s \approx 0.81\) is positive and much closer to \(1\) than to \(0\), there is a strong positive correlation between the rankings: the two tasters largely agree on their order of preference.

Spearman's Rank Correlation Coefficient - Key takeaways

  • Use the Spearman's rank correlation coefficient if:
    • one or both of your data sets are from a population which is not normally distributed;

    • the relationship between the data sets is non-linear; or

    • one or both of the data sets is already represented as a ranking.

  • A Spearman's rank correlation coefficient of:

    • \(1\) means the rankings are in perfect agreement;
    • \(0\) means there is no relationship between the rankings; and
    • \(-1\) means the rankings are in reverse order.
  • A tied rank occurs when two or more data values in one of the data sets are the same.
  • If there are two or fewer tied ranks, then you can use the formula:

    \[ r_s = 1 - \frac{6}{n(n^2-1)} \sum d^2,\]

    to approximate the Spearman's rank correlation coefficient, where \(n\) is the number of pairs of observations and \(d\) is the difference between the ranks of each observation.

Frequently Asked Questions about Spearman's Rank Correlation Coefficient

It is used to measure the correlation between variables when there isn't a linear relationship between them.

First you rank the data, and then you calculate the correlation coefficient.

In much the same way as you would measure a linear correlation, except the data isn't assumed to be linear.

Whether or not two sets of rankings agree.

Spearman's is used for data which is not linear, and Pearson's is used for linear correlations.

Test your knowledge with multiple choice flashcards

If you have data that is ranked and you wanted to compare the rankings, you would use the ____ correlation coefficient.

If you believe that two data sets are related but not in a linear fashion you would use the ____ correlation coefficient.

If you think your data might be linearly related you would use the ____ correlation coefficient.
