Categorical Variables

How satisfied are you with this app? Please rate it on the following scale:

  • \(1\) very unsatisfied

  • \(2\) somewhat unsatisfied

  • \(3\) neither satisfied nor unsatisfied

  • \(4\) somewhat satisfied

  • \(5\) very satisfied

You have just seen an example of a categorical variable!

What are Categorical Variables?

Remember that univariate data, also known as one-variable data, are observations of a single characteristic made on the individuals in a population or sample. That data comes in different types, like qualitative, quantitative, categorical, continuous, discrete, and so on. In particular, you will be looking at categorical variables, which are also often called categorical data. Let's first look at the definition.

A variable is called a categorical variable if the collected data falls into categories. In other words, categorical data is data which can be divided into different groups instead of being measured numerically.

Categorical variables are qualitative variables because they deal with qualities, not quantities. So, some examples of categorical data would be hair colour, the type of pets someone has, and favourite foods. On the other hand, things like height, weight, and the number of cups of coffee that someone drinks per day are measured numerically, and so are not categorical data.

To see the various types of data and how they are used you can take a look at One-Variable Data and Data Analysis.

Categorical vs. Quantitative Data

Now you know what categorical data is, but how is that different from quantitative data? It helps to look at the definition first.

Quantitative data is data that counts how many things in a data set have a particular quality, or that measures how much of something there is.

Quantitative data usually answers questions like "how many" or "how much". For example, quantitative data would be collected if you wanted to know how much people spent on buying a cell phone. Quantitative data is often used to compare multiple sets of data. For a more complete discussion of quantitative data and what it is used for, take a look at Quantitative Variables.

Categorical data is qualitative, not quantitative!

Categorical vs. Continuous Data

All right, what about continuous data? Can that be categorical? Let's take a look at the definition of continuous data.

Continuous data is data that is measured on a scale of numbers, where the data could be any number on the scale.

A good example of continuous data is height. For any of the numbers between \(4 \, ft.\) and \(5 \, ft.\) there could be someone of that height. In general, categorical data is not continuous data.

Types of Categorical Variables

There are two main types of categorical variables, nominal and ordinal.

Ordinal Categorical Variables

A categorical variable is called ordinal if it has an implied order to it.

An example of ordinal categorical data would be the survey at the start of this article. It asked you to rate satisfaction on a scale of \(1\) to \(5\), meaning there is an implied order to your rating. Remember that numerical data is data that involves numbers, which the survey example does have. So it is possible for survey data to be both ordinal and numerical, although the numbers here only label ordered categories and are not measurements.

Nominal Categorical Variables

A categorical variable is called nominal if the categories are named, i.e. if the data does not have numbers assigned.

Suppose a survey asked you what kind of housing you live in, and the options you could pick from were dorm, house, and apartment. Those are examples of named categories, so that is nominal categorical data. In other words, if it has a named category but isn't numerically ordered, then it is a nominal categorical variable.
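The difference between the two types can be sketched in a few lines of Python. This is only an illustration: the class names `Housing` and `Satisfaction` and the helper `can_be_ordered` are made up for this example, and the category labels come from the two surveys described above.

```python
from enum import Enum
from functools import total_ordering


class Housing(Enum):
    """Nominal: named categories with no meaningful order."""
    DORM = "dorm"
    HOUSE = "house"
    APARTMENT = "apartment"


@total_ordering
class Satisfaction(Enum):
    """Ordinal: the numbers only record the rank of each category."""
    VERY_UNSATISFIED = 1
    SOMEWHAT_UNSATISFIED = 2
    NEUTRAL = 3
    SOMEWHAT_SATISFIED = 4
    VERY_SATISFIED = 5

    def __lt__(self, other):
        # Compare ranks, but only against other Satisfaction values.
        if not isinstance(other, Satisfaction):
            return NotImplemented
        return self.value < other.value


def can_be_ordered(a, b):
    """Return True if the two categories support an order comparison."""
    try:
        a < b
        return True
    except TypeError:
        return False


print(can_be_ordered(Satisfaction.NEUTRAL, Satisfaction.VERY_SATISFIED))  # ordinal
print(can_be_ordered(Housing.DORM, Housing.HOUSE))  # nominal
```

The ordinal class allows comparisons such as `Satisfaction.SOMEWHAT_SATISFIED > Satisfaction.NEUTRAL`, while the nominal class deliberately offers none. Neither class supports addition, since the values are category labels rather than quantities.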

Categorical Variables in Statistics

Before you go on to look at more examples of categorical variables, let's look at some of the advantages and disadvantages of categorical data.

On the advantage side are:

  • The results are very straightforward because people only get a few options to choose from.

  • Because the options are laid out ahead of time, there are no open-ended questions that need to be analyzed. Categorical data is called concrete because of this property.

  • Categorical data can be much easier to analyze (and less expensive to analyze) than other kinds of data.

On the disadvantage side are:

  • In general, you need to get quite a few samples to make sure the survey accurately represents the population. This can be expensive to do.

  • Because the categories are laid out at the start of the survey, it isn't very sensitive. For example, if the only two options for hair colour on a survey are brown hair and white hair, people will have trouble deciding which category to put their hair colour in (assuming they have any at all). This can lead to non-responses, and to people making unanticipated choices about what their hair colour is, which skews the data.

  • You can't do quantitative analysis on categorical data! Because it isn't numerical data you can't do arithmetic on it. For example, you can't take a survey satisfaction of \(4\), and add it to a survey satisfaction of \(3\) to get a survey satisfaction of \(7\).

You can see a summary of the advantages and disadvantages of categorical variables in statistics in the following table:

Table 1. Advantages and disadvantages of categorical variables

Advantages | Disadvantages
Results are straightforward | Large samples
Concrete data | Not very sensitive
Easier and less expensive to analyse | No quantitative analysis
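Although arithmetic such as adding two ratings is not meaningful, categorical data can still be summarised by counting. The sketch below uses a made-up list of satisfaction ratings (the values in `ratings` are hypothetical and only for illustration) and reports the count per category, the most common category, and the middle category. The middle category is only meaningful here because the satisfaction scale is ordinal.

```python
from collections import Counter
from statistics import median_low, mode

# Hypothetical responses on the 1-5 satisfaction scale (illustrative only).
ratings = [4, 3, 5, 4, 2, 4, 1, 3, 4, 5]

# How many responses fell into each category.
counts = Counter(ratings)

# The most frequently chosen category.
most_common_rating = mode(ratings)

# The middle category; median_low always returns one of the actual
# responses, so no averaging of category labels takes place.
middle_rating = median_low(ratings)

for rating in sorted(counts):
    print(f"rating {rating}: {counts[rating]} responses")
print("most common rating:", most_common_rating)
print("middle rating:", middle_rating)
```

Using `median_low` rather than an ordinary median is a design choice for this sketch: it avoids producing a value such as 3.5 that is not one of the survey's categories.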

Collecting Categorical Data

How do you collect categorical data? This is often done through interviews (either in person or on the phone) or surveys (either online, in the mail, or in person). In either case, the questions asked are not open-ended. They will always ask people to choose between a specific set of options.
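One way to picture a question that is not open-ended is a form that only accepts answers from a fixed list. The following is a minimal sketch under that assumption; the function `record_response` and the option list `HOUSING_OPTIONS` are hypothetical names, and the options are borrowed from the housing example earlier in the article.

```python
# Fixed set of options offered to respondents (from the housing example).
HOUSING_OPTIONS = ("dorm", "house", "apartment")


def record_response(answer, options=HOUSING_OPTIONS):
    """Return the cleaned answer if it is one of the allowed options."""
    cleaned = answer.strip().lower()
    if cleaned not in options:
        raise ValueError(f"{answer!r} is not one of {options}")
    return cleaned


responses = [record_response(a) for a in ["House", " dorm", "apartment"]]
print(responses)
```

Rejecting anything outside the list is what keeps the collected data categorical, but it is also the source of the sensitivity problem mentioned above: a respondent whose true answer is not listed cannot be recorded accurately.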

Categorical Data Analysis

The collected data then needs to be analysed, so how do you analyze categorical data? Often it is done with proportions or percentages, and it can be in tables or graphs. Two of the most frequent ways to look at categorical data are bar charts and pie charts.

Suppose you were asked to give a survey to decide whether people liked a particular soft drink and got back the following information:

  • 14 people liked the soft drink; and
  • 50 people did not like it.

First, we should figure out if this is categorical data.

Solution

Yes. You can divide up the answers into two categories, in this case "liked it" and "didn't like it". This would be an example of nominal categorical data.

Now, how could we represent this data? We could do so with a bar or a pie chart.


Figure: Bar chart of the responses. The bar for people who liked the soft drink is shorter than the bar for people who didn't like it.

Figure: Pie chart showing the percentage of people who liked or didn't like the soft drink. The wedge for people who liked it is the smaller one.

Either one gives you a visual comparison of the data. For many more examples of how to construct a chart for categorical data, see Bar Graphs.
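The percentages behind such charts are simple to compute. The sketch below uses the counts from the soft drink survey (14 and 50); the helper names `percentages` and `text_bar_chart` are invented for this illustration, and the text-based bars are only a rough stand-in for a real chart.

```python
# Counts taken from the soft drink survey above.
counts = {"liked it": 14, "didn't like it": 50}


def percentages(counts):
    """Convert category counts into percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


def text_bar_chart(counts, width=40):
    """Draw one row of '#' characters per category, scaled to the largest count."""
    largest = max(counts.values())
    lines = []
    for category, n in counts.items():
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<15} {bar} {n}")
    return "\n".join(lines)


shares = percentages(counts)
print(text_bar_chart(counts))
for category, share in shares.items():
    print(f"{category}: {share:.1f}%")
```

With 64 responses in total, the shares work out to 21.875% who liked the drink and 78.125% who did not, which are the proportions a pie chart of this data would display.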

Examples of Categorical Variables

Let's look at some examples of what categorical data can be.

Suppose you are interested in seeing a movie, and you ask a bunch of your friends whether they liked it or not in order to decide whether you want to spend money on it. Of your friends, \(15\) liked the movie and \(50\) didn't like it. What is the variable here, and what kind of variable is it?

Solution

First of all, this is categorical data. It is divided into two categories, "liked" and "didn't like". There is one variable in the data set, namely your friends' opinions of the movie. In fact, this is an example of nominal categorical data.

Let's look at another example.

Going back to the movie example, suppose you asked your friends whether or not they liked a particular movie, and what state they live in. How many variables are there, and what kind are they?

Solution

Just like in the previous example, your friends' opinions of the movie are one variable, and it is categorical. Since you also asked what state your friends live in, there is a second variable here: the name of the state they live in. There are only so many states in the US, so there is a finite number of places they could list as their state. So the state is a second nominal categorical variable you have collected data on.

Let's change what you are asking in your survey a bit.

Now suppose you have asked your friends about how much they are willing to pay to see the movie, and you give them three price ranges: less than $5; between $5 and $10; and more than $10. What kind of data is this?

Solution

This is still categorical data because you laid out the categories your friends could answer in before you asked them to answer your survey. However, this time it is ordinal categorical data, since you can order the categories by price (which is a number).
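To see how a numerical amount turns into an ordinal category, here is a small sketch that sorts a price into one of the three ranges. The function name `price_category` is hypothetical, and the treatment of the boundaries (exactly $5 and exactly $10 both count as "between $5 and $10") is an assumption, since the survey wording does not say which range the endpoints belong to.

```python
# The three ranges from the survey, listed from cheapest to most expensive.
PRICE_CATEGORIES = ("less than $5", "between $5 and $10", "more than $10")


def price_category(price):
    """Place a price into one of the three survey ranges.

    Assumption: prices of exactly 5 or exactly 10 fall in the middle range.
    """
    if price < 5:
        return PRICE_CATEGORIES[0]
    if price <= 10:
        return PRICE_CATEGORIES[1]
    return PRICE_CATEGORIES[2]


def category_rank(category):
    """Position of a category in the ordering, starting from 0."""
    return PRICE_CATEGORIES.index(category)


answers = [price_category(p) for p in [12.50, 4.00, 7.25]]
print(sorted(answers, key=category_rank))
```

Once the prices are grouped, the exact amounts are gone and only the order of the ranges remains, which is what makes the result ordinal rather than quantitative.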

So how do you compare categorical variables anyway?

Correlation Between Categorical Variables

Suppose you asked your friends whether or not they liked a particular movie, and whether they paid less than \(\$5\), between \(\$5\) and \(\$10\), or more than \(\$10\) to see it. Those are two categorical variables, so how can you compare them? Is there any way to see if how much they paid to see the movie influenced how much they liked it?

One thing you can do is look at comparative bar charts of the data, or at a two-way table. You can find more information about those in the article Bar Graphs. The other thing you can do is a more official kind of statistical test, called a chi-square test. This topic can be found in the article Inference for Distributions of Categorical Data.
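As a rough illustration of both ideas, the sketch below stores a two-way table and computes the chi-square statistic from it by comparing each observed count with the count expected if price and opinion were unrelated. The counts in `observed` are entirely hypothetical (they are not data from this article), and the function name `chi_square_statistic` is made up for the example. Interpreting the statistic, and checking the conditions under which the test is valid, is left to the article referenced above.

```python
# Hypothetical two-way table: rows are price ranges, columns are opinions.
observed = {
    "less than $5": {"liked": 8, "didn't like": 12},
    "between $5 and $10": {"liked": 5, "didn't like": 20},
    "more than $10": {"liked": 2, "didn't like": 18},
}


def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())

    statistic = 0.0
    for row, cols in table.items():
        for col, obs in cols.items():
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[row] * col_totals[col] / grand_total
            statistic += (obs - expected) ** 2 / expected

    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


statistic, dof = chi_square_statistic(observed)
print(f"chi-square statistic: {statistic:.3f} with {dof} degrees of freedom")
```

A larger statistic means the observed counts are further from what independence would predict. In practice a statistics library would normally be used instead of hand-written arithmetic; the loop here is only meant to show where the number comes from.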

Categorical Variables - Key takeaways

  • A variable is called a categorical variable if the data collected falls into categories.
  • Categorical variables are qualitative variables because they deal with qualities, not quantities.
  • A categorical variable is called ordinal if it has an implied order to it.
  • A categorical variable is called nominal if the categories are named.
  • Ways to look at categorical variables include tables and bar charts.

Frequently Asked Questions about Categorical Variables

A categorical variable is one where the data collected isn't a measurement. For example, hair color is a kind of categorical data, but pounds of produce bought per week is not.

Hair color, educational level, and customer satisfaction on a scale of 1 to 5 are all categorical variables.

A nominal categorical variable is one that can be put into categories, but the categories aren't intrinsically ordered. For example, whether you live in a house, an apartment, or someplace else is categorical, but those options don't have an intrinsic order or number associated with them.

Quantitative data is data that represents an amount, like height in inches. Categorical data is data that is collected in categories, for example if a survey asked someone whether they were less than 4 feet tall, between 4 and 6 feet tall, or more than 6 feet tall.

The most common way to measure categorical data is with percentages that are displayed graphically, as in bar graphs.
