Today I was feeling extra hungry, so I went to an all-you-can-eat buffet. There was a wide variety of dishes and desserts, but what really caught my attention was the pricing chart for children under the age of \(15\). The younger the child, the less the buffet charged.
After thinking about it for a while, it made perfect sense to me. Because toddlers don't eat as much as teenagers, it is fine to charge them less. It seems that there is a relationship between age and the amount of food a person eats. How could you study this relationship? Maybe you could run a survey!
Whenever you are looking at the relationship between two variables that you can measure, you are dealing with two quantitative variables. Here you will learn how to study their relationship and the techniques used to represent it.
Before proceeding, it is important to review the difference between quantitative and categorical variables.
A quantitative variable is a variable that can be measured with units.
It does not matter which units you use: as long as you can measure a variable, it is a quantitative variable. What about categorical variables?
A categorical variable, also known as a qualitative variable, is a variable whose properties are described rather than measured.
Categorical variables are usually things like colors, names, favorite meals, and so on.
Suppose you are doing a survey in your neighborhood, and you will ask for the following data: last name, age, height, gender, and favorite activity.
Which variables are quantitative, and which are categorical?
Solution:
To find which variables are quantitative, ask yourself which variables can be measured. From the given list, age is typically measured in years, while height can be measured in feet, inches, meters, or other units. This means that the quantitative variables are age and height.
The rest of the variables will be given as words rather than numbers, so it is easier to think of them either as labels (in the case of the last name) or as descriptions (like gender and favorite activity). So, the categorical variables are last name, gender, and favorite activity.
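As a rough illustration of the classification above, here is a minimal Python sketch. The survey response and its values are hypothetical, and the rule "numbers are quantitative, text is categorical" is only an assumption that works when labels such as zip codes are stored as text rather than as numbers.

```python
# One hypothetical survey response (values are made up for illustration).
response = {
    "last_name": "Smith",
    "age": 34,                    # measured in years
    "height_m": 1.72,             # measured in meters
    "gender": "female",
    "favorite_activity": "hiking",
}

def is_quantitative(value):
    """Treat numeric values as quantitative (booleans are excluded on purpose)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

quantitative = [name for name, value in response.items() if is_quantitative(value)]
categorical = [name for name in response if name not in quantitative]

print("Quantitative:", quantitative)
print("Categorical:", categorical)
```

The context of the survey still decides the final classification: a number that only acts as a label should be stored as text so that it lands in the categorical group.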
Typically, a survey is carried out to gather data for analysis. Decisions are made based on conclusions drawn from data, so it is important to analyze the relationship between variables.
When comparing two quantitative variables you can have a clearer picture of the data by organizing it according to the numerical values that are being represented. This is not the case for categorical variables, as you will see in the next example.
Suppose you want to make a graph to study the relationship between these two pairs of variables: the height and weight of a group of students, and their favorite sport and favorite color.
When doing a graph of two quantitative variables, as in the case of the height and weight of the students, you can arrange data in numerical order. That is, each axis will represent a number line, so before filling it with data, your graph will look like this:
The graph comparing two quantitative variables is more insightful: if you move to the right, you are looking at taller people, and if you move up, you are looking at heavier people. You can tell this even if the graph is empty!
If you were to use a graph to represent two categorical variables, as in the case of the favorite sport and color, there is no clear arrangement for the data. You might organize it in alphabetical order, or maybe you will arrange it according to your preferences, but this arrangement does not tell you anything beforehand.
It is important to keep in mind the context of the survey in order to properly classify a variable as quantitative or categorical. For example, you might think that a zip code is a quantitative variable because it is a number, but since it is just a label, it is a categorical variable instead.
If you want to know how to analyze categorical variables, please check out our Two Categorical Variables article.
A natural question that arises whenever you are given two variables is: Are these two variables related to each other?
Consider the case of height and weight. The taller a person is, the more they will weigh. This does not mean that a taller person will always weigh more than a shorter person, but rather it tells you that there is a relation between these variables.
It might also be possible to have two unrelated variables, like the age and height of a population of full-grown men. Whenever you are dealing with two variables, be they related or not, you are dealing with bivariate data.
Bivariate data is data that is given as pairs of values, one for each of two variables.
In the height and weight example, when you are doing a survey you will be asking for both the height and the weight of each individual, so each of these values will be paired. This is an example of bivariate data.
Bivariate quantitative data is bivariate data that consists of two quantitative variables.
Bivariate quantitative data can be represented in many ways. For example, you can use a table of values, where each column represents one of the variables.
Suppose you want to investigate whether there is a relationship between consumption habits and age. To do so, you go to your local mall and politely ask each person leaving whether they are up for a survey. In this survey, you just ask for their age and how many items they bought, if any. Your data can be arranged in a table like this:
Age (Years) | Number of Bought Items
12 | 0
36 | 4
21 | 12
24 | 5
15 | 2
23 | 7
45 | 2
67 | 1
11 | 1
From the above table, you can begin to note some patterns. It looks like children tend to buy fewer things, maybe because they lack money. On the other hand, young adults seem to like getting their hands on a lot of stuff. Of course, there are many more factors involved in consumption habits, but this is a good start!
You can rearrange the above table by ordering the data by age, in which case you need to make sure that each age stays paired with its original number of bought items.
Age (Years) | Number of Bought Items
11 | 1
12 | 0
15 | 2
21 | 12
23 | 7
24 | 5
36 | 4
45 | 2
67 | 1
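The reordering can be sketched in Python as follows. The variable names are illustrative; the key idea is that each age is zipped with its item count before sorting, so the pairs cannot get mixed up.

```python
# Mall survey data in the order it was collected.
ages = [12, 36, 21, 24, 15, 23, 45, 67, 11]
items = [0, 4, 12, 5, 2, 7, 2, 1, 1]

# Pair each age with its number of bought items, then sort the pairs by age.
pairs = list(zip(ages, items))
sorted_pairs = sorted(pairs, key=lambda pair: pair[0])

print("Age (Years) | Number of Bought Items")
for age, bought in sorted_pairs:
    print(f"{age} | {bought}")
```

Sorting the two lists separately would break the pairing, which is exactly the mistake the text warns about.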
Please keep in mind that the table can also be written horizontally, in which case each row will represent a variable and each column an inquiry.
Another way of representing bivariate quantitative data is by drawing points in a plane, as you will see in the next section.
There are many ways of displaying quantitative data. For example, if you are interested in doing a survey about the ages of high school students, you can use a histogram, a dot plot, or a stem-and-leaf display. However, all these graphs are used to display a single variable along with its frequency.
Suppose you are given a set of bivariate quantitative data. Both variables are quantitative, so each observation is a pair of numbers. This makes graphing bivariate quantitative data a straightforward task, as you can represent the data by points on the plane. In order to do this, you need to assign an axis to each variable.
Consider our mall example. You can assign either variable to either axis, but you will usually assign the \(x\)-axis to variables like age and height, which either change at a constant rate or are less likely to change.
On the other hand, variables like weight, the number of bought items, or bottles of water drunk in a week are more likely to be assigned to the \(y\)-axis.
Note that the ages of customers range between \(11\) and \(67\) years, so the \(x\)-axis is scaled accordingly. Likewise, the \(y\)-axis ranges from \(0\) to \(12\).
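A small Python sketch of this preparation step, using the mall data from the table. The variable names are illustrative, and the sketch only computes the axis ranges and the points to plot; drawing them is left to whatever plotting tool you prefer.

```python
# Mall survey data: age goes on the x-axis, items bought on the y-axis.
ages = [12, 36, 21, 24, 15, 23, 45, 67, 11]
items = [0, 4, 12, 5, 2, 7, 2, 1, 1]

# The smallest and largest values tell you how to scale each axis.
x_min, x_max = min(ages), max(ages)
y_min, y_max = min(items), max(items)

# Each (age, items) pair becomes one point of the scatter plot.
points = list(zip(ages, items))

print(f"x-axis (age): {x_min} to {x_max}")
print(f"y-axis (items bought): {y_min} to {y_max}")
print(f"{len(points)} points to plot")
```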
Now that you have the plane labeled in a representative way, it is time to draw a lot of points. Here, each point represents an inquiry.
The graph shown in the previous example is known as a scatter plot, and it is one of the most common ways of displaying bivariate quantitative data.
For more information about these plots, please check out our Scatter Plots article!
One of the reasons scatter plots are often used to represent bivariate quantitative data is that it is possible to identify patterns in data. Consider the following scatter plot.
From the above scatter plot you might have found a pattern in which, in general, as the ages of children increase, they become taller, which makes perfect sense. In this case, we say that both variables are correlated.
Correlation is a measure of how much two quantitative variables are associated with each other.
It is worth noting that correlation only applies to two quantitative variables. If you are dealing with bivariate data where one or both variables are categorical, then you should not be looking for correlation.
When two variables are correlated, you can draw a straight line that more or less describes how the data behaves. This line is known as the line of best fit, which is obtained by means of linear regression.
Check out our Linear Regression article for more information about this topic!
If two variables are correlated, you expect the change of one to impact the other in a significant way. Because of this, if the variables are correlated then the line of best fit will either be an increasing or a decreasing line.
On the other hand, if two variables are not correlated at all, you should expect the line of best fit to be horizontal, as the change in one variable does not impact the other at all. In this scenario, the data will be scattered all the way around.
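To make the idea of a line of best fit concrete, here is a minimal sketch that assumes the line is found by ordinary least squares, the usual method in simple linear regression. The function name is illustrative, and the data is the mall survey from the earlier table.

```python
# Mall survey data: (age in years, number of bought items).
ages = [12, 36, 21, 24, 15, 23, 45, 67, 11]
items = [0, 4, 12, 5, 2, 7, 2, 1, 1]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(ages, items)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

For the mall data the slope comes out close to zero (roughly \(-0.03\) items per year), so the line is almost horizontal. This suggests that age and number of bought items are at most weakly linearly related in this small sample.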
In order to measure how correlated two variables are, you need to look at the Pearson correlation coefficient.
The Pearson correlation coefficient, also known as Pearson's \(r\), or just as correlation coefficient, is a number that ranges between \(-1\) and \(1\), which is used to measure the correlation of bivariate data.
For more information about the correlation coefficient, and how it is obtained, please take a look at our article about Linear Correlation.
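For reference, the standard definition of the coefficient for \(n\) pairs \((x_i, y_i)\) is given below. The symbols \(x_i\), \(y_i\), \(n\), \(\bar{x}\), and \(\bar{y}\) are notation introduced here, where \(\bar{x}\) and \(\bar{y}\) are the means of each variable.

\[r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}\]

The numerator is positive when the two variables tend to be above or below their means together, and negative when one tends to be above its mean while the other is below.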
There are some things to keep in mind when talking about correlation, which can be addressed using the correlation coefficient.
Correlation typically refers to the linear association between two variables, but it is also possible to find that some variables are related by other types of relations, like quadratic or exponential.
These other types of correlation will not be discussed further in this article.
The Pearson correlation coefficient only measures linear association, so it should only be used for bivariate data that follows a roughly linear pattern!
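The following sketch shows why this warning matters. It uses hypothetical data in which \(y\) is exactly \(x^2\), so the variables are perfectly related, just not linearly. The helper `pearson_r` is an illustrative implementation of the standard formula, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: a perfect quadratic relationship, symmetric around x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

Here the coefficient comes out as \(0\) even though \(y\) is completely determined by \(x\), because the increasing and decreasing halves of the curve cancel each other out.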
In the previous examples, you have seen how in the case of two correlated variables, as one increases, so does the other. This is a particular type of correlation, called positive correlation.
Two variables are positively correlated if the Pearson correlation coefficient is positive.
When two variables are positively correlated, the line of best fit has a positive slope. However, it is also possible to have negatively correlated variables.
Two variables are negatively correlated if the Pearson correlation coefficient is negative.
Likewise, when two variables are negatively correlated, the line of best fit has a negative slope.
Remember that negative correlation means that the data is correlated. The word negative is used to address the slope of the line of best fit.
Sometimes you might notice that a scatter plot strongly resembles a linear graph, while others have data so scattered around that it looks almost as if it were random! The absolute value of the Pearson correlation coefficient will give you insight into this matter.
Let \(r\) be the Pearson correlation coefficient of a set of bivariate quantitative data. The closer \(|r|\) is to \(1,\) the stronger the correlation. A Pearson correlation coefficient of exactly \(1\) or \(-1\) means the data is perfectly linear, which is a scenario so perfect that it is unlikely to happen.
On the other hand, if the Pearson correlation coefficient is close to \(0\), then the data suggests that the variables are not correlated, or are correlated in a weak way.
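As a sketch of how the sign and the absolute value are read together, the code below computes \(r\) for the mall survey data from the earlier table. The helper `pearson_r` is an illustrative implementation of the standard formula, and no cutoff for "weak" or "strong" is assumed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Mall survey data: (age in years, number of bought items).
ages = [12, 36, 21, 24, 15, 23, 45, 67, 11]
items = [0, 4, 12, 5, 2, 7, 2, 1, 1]

r = pearson_r(ages, items)

# The sign gives the direction; the absolute value gives the strength.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.2f}, direction: {direction}, |r| = {abs(r):.2f}")
```

For this small sample, \(r\) is negative and close to \(0\), so the linear correlation between age and number of bought items is weak. That matches the table, where purchases rise for young adults and then fall again instead of following a straight line.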
Here you can look at some examples of scatter plots of two quantitative variables and tell whether they represent correlated data or not.
A survey was made on female adults about their reading habits, obtaining the following scatter plot.
Solution:
You can find correlations even in your grocery store!
Suppose you are on a diet and you are recommended to avoid added sugar in canned beverages like juice and soda.
You are skeptical about this suggestion, so you decide to study if there is a relationship between added sugar and calories per serving.
In order to do so, you go to the grocery store and check the nutrition facts on \(20\) different products that you would consume. When you get back home, you make the following scatter plot.
Should you follow the recommendation?
Solution:
By looking at the data, you can see that as the amount of added sugar increases, so does the number of calories per serving in these canned beverages.
You can conclude that there is a positive correlation between the amount of added sugar and the calories of canned beverages. Since you are on a diet, you should limit the calories you consume, so you should follow the recommendation.
An example of two quantitative variables is the height and weight of a person. Both variables can be measured, and for each survey you do on a population you get these two values.
A correlation test is used to test if two quantitative variables are related to each other.
A quantitative variable is a variable whose value can be measured or counted. Some examples are weight, age, height, number of books read, number of siblings, and many more.
A variable can be quantitative or categorical. Quantitative variables are variables whose values can be measured with units or counted, like weight, age, and number of persons.
Categorical variables are variables whose properties need to be described rather than measured, like color, flavors, movie genres, and gender.
Two quantitative variables are usually graphed using scatter plots.
This type of variable can be measured using units.
Quantitative variable.
Categorical variables are also known as ____.
qualitative variables.
Which of the following are quantitative variables?
Age.
Suppose you take a look at a bottle of soda. Which of the following are not quantitative variables?
Flavor.
True/False: The Pearson correlation coefficient can be equal to \(2\).
False.
True/False: You can assign a line of best fit to any scatter plot.
True.