# Statistical Analysis in Biology

Have you ever heard people say that correlation does not imply causation? Or have you done experiments in class and been asked to collect data? Teachers ask you to observe, collect, and analyze data because observation, collection, and analysis are the basis of the scientific method. Statistics lets us quantify biological observations, and consistent data collection has made big data a critical field, particularly in biology. In the following, we will define statistical analysis and describe the different types and methods of statistics used to analyze data in biology. We will also see some examples and introduce the field of biostatistics and its importance.

## What is the definition of statistical analysis in biology?

Statistical analysis involves collecting, exploring, and interpreting data sets to discover trends and patterns to make conclusions.

Within biology, we have the field of biostatistics.

Biostatistics is the field of study where researchers apply statistical analysis to biological topics.

For example, we can design a biological experiment to extract and analyze data, and then interpret the results to reach conclusions.

The essential parts of a biological experiment that involve statistical analysis are, in order:

1. Determining sample sizes

• Random selection means that all members of the population being sampled have an equal chance of being selected.
• We perform random selection because a randomly drawn sample, especially a large one, is more likely to be representative of the whole population.
• Overall, random selection helps ensure that bias is less likely to occur.
• Other vital parts of determining sample sizes include selecting the correct number of trials. For example, in clinical trials for drug development, the FDA (the U.S. Food and Drug Administration) requires Phase I, II, and III trials. These trials are needed to determine if the drug can be approved for use.

2. Testing hypotheses

• After choosing the sample size and the number of trials, scientists must consider whether their data will yield statistically significant results.
• Scientists do this by establishing a null hypothesis and calculating the p-value, both of which we will go over in more depth in later sections. The important thing right now is to understand that a statistically significant result is one that is unlikely to be due to chance alone. This makes the finding more credible, although it does not prove that the finding is correct.

3. Interpretation of data

• To conclude anything, we need to be able to interpret data.
• Nowadays, for substantial data sets, we use software such as SAS (Statistical Analysis System) or Excel, or programming languages such as Python and R.
• Some methods that help you interpret data are sample size determination, standard deviation, the mean, and regression. We will expand more on this in the methods of statistical analysis section.
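The random selection described in step 1 above can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 numbered organisms, the sample size of 50, and the seed are all made-up values, not figures from a real study.

```python
import random

# Hypothetical population: ID numbers for 1,000 organisms (illustrative only).
population = list(range(1, 1001))

# A fixed seed makes the draw repeatable; the value 42 is arbitrary.
rng = random.Random(42)

# Simple random sample without replacement:
# every member has the same chance of being selected.
sample = rng.sample(population, k=50)

print(f"Selected {len(sample)} of {len(population)} organisms")
print("First few selected IDs:", sorted(sample)[:5])
```

Because the sample is drawn without replacement, no organism can be picked twice.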

## Types of statistical analysis in biology

After going over what statistical analysis is, we can now focus on the common types of statistical analysis used in biological experiments.

Descriptive statistical analysis

• Descriptive statistics describe or summarize data.
• Descriptive statistics don't allow you to make inferences or conclusions beyond the data you collected.
• Descriptive statistics are essential in helping summarize and visualize large amounts of data.
• Common examples of descriptive analysis are measures of central tendency (mean, median, mode), measures of spread (standard deviation, range, and quartiles), skew, and the correlation coefficient.
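As a rough sketch, the descriptive measures listed above can be computed with Python's built-in `statistics` module. The leaf lengths below are invented for illustration.

```python
import statistics

# Hypothetical leaf lengths in centimetres (made-up data).
leaf_lengths_cm = [4.1, 4.8, 5.0, 5.0, 5.3, 5.6, 6.2, 7.9]

# Measures of central tendency
mean = statistics.mean(leaf_lengths_cm)
median = statistics.median(leaf_lengths_cm)
mode = statistics.mode(leaf_lengths_cm)

# Measures of spread
sd = statistics.stdev(leaf_lengths_cm)             # sample standard deviation
data_range = max(leaf_lengths_cm) - min(leaf_lengths_cm)
quartiles = statistics.quantiles(leaf_lengths_cm, n=4)  # three cut points: Q1, Q2, Q3

print(f"mean={mean:.2f} median={median:.2f} mode={mode}")
print(f"sd={sd:.2f} range={data_range:.1f} quartiles={quartiles}")
```

These numbers only summarize the eight leaves measured; on their own they say nothing about other leaves or other plants.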

Inferential statistical analysis

• Unlike descriptive statistics, inferential statistics allow us to compare the groups we are testing and make predictions from the data.
• Inferential statistics use data from samples to make generalizations about the larger population.
• For example, we can ask whether a difference seen in our sample could plausibly be due to chance or is statistically significant. A statistically significant result is one that would be unlikely to occur by chance alone if there were no real effect in the population.
• Common examples of inferential analysis are t-tests, chi-square tests, and linear regression, whose results are usually reported with p-values.

While there are more types of statistical analysis than the two mentioned above, descriptive and inferential are the most common ones used in biological research.

## Methods of statistical analysis

Now that we've looked at the types of statistical analysis in biology, we can go over the common examples mentioned under descriptive and inferential statistics in more detail. This article is limited to the statistics we need to know in biology.

Correlational

Correlational studies or tests measure how closely related two or more variables are. By closely related, we mean linearly related: how consistently the variables change together at a constant rate. Scientists usually use this method to describe relationships between two or more variables without claiming cause and effect. Since correlational studies fall under descriptive statistics, they help describe simple relationships. For example, think about how time spent studying and grades correlate. Usually, we'd expect them to be positively correlated if students are studying actively.

We measure correlations using the correlation coefficient or r, ranging from -1 to +1.

• Negative r values indicate a negative correlation. A negative correlation or relationship occurs when one variable tends to decrease as the other increases, as shown in Figure 1.
• Positive r values indicate a positive correlation. A positive correlation or relationship occurs when both variables tend to increase together, as shown in Figure 2.
• A perfect negative correlation would be r = -1, and a perfect positive correlation would be r = +1. It's extremely rare to find perfect correlations in the real world.
• For instance, if we look at Figure 2, a perfect correlation of r = +1 would occur if grades rose by exactly the same amount for every extra hour of studying. This doesn't happen because 1) the highest grade a student can get is 100%, 2) after a while of studying, we get diminishing returns because students get distracted, tired, etc., and 3) we need an adequate amount of nutrition and sleep for our brain to function optimally.
• The closer r is to 0, the weaker the linear relationship. At r = 0 there is no linear correlation, as shown in Figure 3.

Fig. 1: Negative Correlation illustrated. Fig. 2: Positive Correlation illustrated. Fig. 3: No Correlation illustrated.

In these scatter plots, a line of best fit that rises from left to right indicates a positive correlation, and one that falls from left to right indicates a negative correlation. Note that r describes how tightly the points cluster around that line, not how steep the line is. In the no-correlation graph, the points show no linear trend, so the line of best fit is roughly flat, with a slope near 0.
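To make the correlation coefficient concrete, here is a minimal sketch that computes Pearson's r from its standard formula. The helper name `pearson_r` and both data sets are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied vs. grade (%)
hours_studied = [1, 2, 3, 4, 5, 6]
grades = [52, 60, 63, 71, 78, 80]
r_positive = pearson_r(hours_studied, grades)

# A perfectly linear, decreasing relationship gives r = -1
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

print(f"studying vs. grades: r = {r_positive:.3f}")
print(f"perfect negative:    r = {r_negative:.3f}")
```

The studying example gives a strong but imperfect positive correlation, because the made-up grades do not rise by exactly the same amount for each extra hour.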

Regression

Regression models the relationship between an independent variable (usually denoted as X) and a dependent variable (usually denoted as Y). If two or more independent variables are involved, we are dealing with a multiple linear regression model. We measure how well a regression model fits using the coefficient of determination or $$R^2$$, which is the proportion of the variation in Y that the model explains. The higher the coefficient of determination, the better the model fits our data.
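The sketch below fits a simple linear regression by ordinary least squares and then computes $$R^2$$. The function names `fit_line` and `r_squared`, and the fertilizer and plant-height numbers, are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - (unexplained variation / total variation)."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data: fertilizer dose (X) vs. plant height in cm (Y)
dose = [0, 1, 2, 3, 4]
height_cm = [10.1, 12.0, 13.8, 16.1, 18.0]

slope, intercept = fit_line(dose, height_cm)
r2 = r_squared(dose, height_cm, slope, intercept)

print(f"height = {intercept:.2f} + {slope:.2f} * dose, R^2 = {r2:.4f}")
```

An $$R^2$$ close to 1 here means the straight line explains almost all of the variation in the made-up heights.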

Mean

The mean or the average of a data set is a commonly known mathematical term. We use it to look at a data set's big picture or overall trend. Keep in mind that the mean can be misleading if the data has a lot of outliers, because extreme values pull it toward them. We calculate it by adding up all the numbers in the data set and then dividing by how many numbers there are in the data set.
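A short sketch shows the calculation and how a single outlier shifts the mean. The resting heart rates are invented, and the median is included only as a point of comparison.

```python
import statistics

# Hypothetical resting heart rates in beats per minute (made-up data).
heart_rates = [62, 64, 65, 66, 68]

# The mean is the sum of the values divided by how many values there are.
mean_by_hand = sum(heart_rates) / len(heart_rates)

# Add one extreme value and compare how the mean and median respond.
with_outlier = heart_rates + [150]
mean_with_outlier = statistics.mean(with_outlier)
median_before = statistics.median(heart_rates)
median_with_outlier = statistics.median(with_outlier)

print(f"mean:   {mean_by_hand:.1f} -> {mean_with_outlier:.1f}")
print(f"median: {median_before:.1f} -> {median_with_outlier:.1f}")
```

One extreme reading moves the mean by about 14 beats per minute, while the median barely changes.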

Standard deviation

Standard deviation measures how far our data is spread from the mean. A low standard deviation means that the data points are close to the mean, while a high standard deviation means that they are spread out from it. Standard deviation is most informative when the data roughly follow a normal distribution, which is symmetrical with no skew. Researchers usually use standard deviation when they need to determine whether their data points are clustered.
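As an illustration, the two made-up data sets below have the same mean but very different standard deviations. The sketch uses the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes.

```python
import statistics

# Two hypothetical data sets with the same mean (10) but different spread.
clustered = [9.8, 10.0, 10.1, 9.9, 10.2]
spread_out = [4.0, 8.0, 10.0, 12.0, 16.0]

sd_clustered = statistics.stdev(clustered)    # sample standard deviation
sd_spread_out = statistics.stdev(spread_out)

print(f"clustered:  mean={statistics.mean(clustered):.1f}, sd={sd_clustered:.2f}")
print(f"spread out: mean={statistics.mean(spread_out):.1f}, sd={sd_spread_out:.2f}")
```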

## Examples of statistical analysis

Now that we understand the definition, types, and methods of statistical analysis, it's time to move on to examples or applications of the methods mentioned above.

Examples of how scientists use the mean in statistical analysis include hypothesis testing or comparison of means.

The comparison of means method involves comparing the means of two or more different sets or groups.

If we compare two groups or sets, we can use a t-test, but if we need to compare more than two groups, we usually use an ANOVA test. We will only go over the more commonly used t-test.

To use a t-test, we must first assume that:

• Both samples were obtained through random sampling.
• The observations in each sample are independent of each other.
• Both samples are normally distributed or close to it.
• Both samples have similar variance or spread.
• Both samples have continuous data or data that can take on any value, not just whole numbers.

What type of t-test do scientists use?

• A paired t-test is used when the two sets of measurements come from the same subjects. It compares the means of two values coming from the same object, population, etc. For example, comparing the same subjects' cholesterol levels in 1999 and 2007.
• An independent t-test or two-sample t-test is used when the groups of interest come from different populations. It compares the means of two values coming from different objects, populations, etc. For example, comparing cholesterol levels in 1999 between two different groups of subjects.
• A one-sample t-test is used when one group is compared against a standard measurement or value. For example, suppose a desirable cholesterol level for women is taken to be around 50 mg/dL. Scientists collect data from women who go to the same hospital and have been cleared of cholesterol problems. They want to see how the mean from this group compares to the 50 mg/dL value.

After selecting the type of t-test that's needed based on their experiments, researchers usually use statistical software to calculate the t-value. The larger the absolute t-value, the stronger the evidence that the means being compared really differ.

Important values that relate to a t-test are:

• The null hypothesis ($$H_0$$) states that there's no difference or relationship between the groups or variables of interest. For a t-test, this means that the means being compared are equal and any difference we observe is due to chance. The null hypothesis is assumed to be true unless the data provide strong enough evidence against it.
• The p-value is the probability of obtaining results at least as extreme as ours if the null hypothesis were true. It is not the probability that the null hypothesis is true. By convention, p > 0.05 indicates that the results aren't statistically significant, so we fail to reject the null hypothesis (this doesn't prove it true). When p is less than or equal to 0.05, the result is considered statistically significant and we reject the null hypothesis.
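To show what the software is doing, here is a rough sketch of the t-value calculation for an independent two-sample t-test. It uses the pooled-variance form, which relies on the similar-variance assumption listed earlier. The function name `pooled_t` and the cholesterol readings are hypothetical. The p-value itself needs the t distribution, which Python's standard library doesn't provide, so the sketch compares the t-value against the critical value from a standard t-table instead.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """t-value and degrees of freedom for an independent two-sample t-test
    that assumes both groups have similar variance."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Hypothetical cholesterol levels (mg/dL) for two independent groups.
control = [200, 210, 190, 205, 195]
treated = [180, 185, 175, 190, 170]

t_value, df = pooled_t(control, treated)

# Two-tailed critical value for df = 8 at the 0.05 level, from a standard t-table.
T_CRITICAL = 2.306
significant = abs(t_value) > T_CRITICAL

print(f"t = {t_value:.2f}, df = {df}, significant at 0.05: {significant}")
```

In practice, a statistics package such as SciPy (`scipy.stats.ttest_ind`) would typically be used to get the t-value and the p-value in one step.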

The standard deviation and the mean can be used together to tell you where the values in your data set fall if they follow a normal distribution.

We call this rule the empirical rule or the 68-95-99.7 rule, which states:

• About 68% of scores are within 1 standard deviation (SD) from the mean.

• About 95% of scores are within 2 standard deviations (SD) from the mean.

• About 99.7% of scores are within 3 standard deviations (SD) from the mean.

Scientists use the empirical rule to judge how unusual a value is. If the data are roughly normal, a value more than 2 or 3 standard deviations from the mean is rare, so it may be an outlier or a sign that a prediction doesn't fit the data well.
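The three percentages in the empirical rule can be checked against the normal distribution itself. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD of the mean: {shares[k]:.1%}")
```

The printed values round to 68.3%, 95.4%, and 99.7%, which is where the name of the rule comes from.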

## Purpose of statistical analysis

Researchers use statistical analysis to answer essential and often experimental questions in biology. Usually, scientists collect data to answer questions such as "What's the degree of correlation?", "How much?", "How many?" etc. Statistical analysis provides a method for quantifying collected data and observations.

For example, suppose a pharmaceutical company tells us that most of their patients showed no adverse effects from the drug during all three trials. We'd want to determine what their sample size was and what "most" means. Does "most" mean 120/200, which is only 60% of patients, or 199/200 patients? We'd also want to know whether the patients were randomly sampled, what the mean of the data set was, and what counts as an adverse effect.

## Statistical Analysis in Biology - Key takeaways

• Statistical analysis involves collecting, exploring, and interpreting data sets to discover trends and patterns to make conclusions.
• Within biology, we have the field of biostatistics. Biostatistics is the field of study where researchers apply statistical analysis to biological topics.
• Descriptive and inferential are the most common types of statistical analysis used in biological research.

• Common methods of statistical analysis in biology include correlation, regression, standard deviation, and the mean.

• Researchers use statistical analysis to answer essential and often experimental questions in biology.

What is statistical analysis in biology?

Statistical analysis in biology involves collecting, exploring, and interpreting data sets to discover trends and patterns to make conclusions.

What is the use of statistics in biology?

We use statistics in biology to test hypotheses, perform experiments, choose sample sizes, and even interpret results.

What are the 5 basic methods of statistical analysis?

The five basic methods of statistical analysis are standard deviation, the mean, regression, hypothesis testing, and sample size determination.

What is an example of statistical analysis?

An example or application of statistical analysis is the t-test, which researchers use to compare means: one group against a standard value, two sets of measurements from the same subjects, or two groups from different populations.

What is the purpose of statistical analysis in biology?

Researchers use statistical analysis to answer essential and often experimental questions in biology. Usually, scientists collect data to answer questions such as "What's the degree of correlation?", "How much?", "How many?" etc. Statistical analysis provides a method for quantifying collected data and observations.
