Bioinformatics

In recent years, automated DNA sequencing technologies have kept improving and have generated an enormous amount of biological data. Our ability to read DNA has revolutionised the biomedical sciences and created new fields of study such as genomics, the study of the genome. Knowing an organism's genetic makeup (genotype), meaning the sequence of base pairs that forms its DNA, has helped us better understand the causes of genetic diseases and see how life has evolved.

Our efforts have progressively moved from sequencing individual genes to mapping complete genomes through genome projects. These new subfields all form part of bioinformatics. The first free-living organism to have its genome fully sequenced was the bacterium Haemophilus influenzae in 1995, and the first multicellular organism was the nematode Caenorhabditis elegans in 1998.

What is the definition of bioinformatics?

Genome projects have enabled us to research and understand which genes are present and expressed in a wide range of organisms. Since the effort to map the human genome began in the 1990s, billions of DNA base pairs and whole genomes from various species have been collected.

Still, this information is difficult to assemble and analyse manually!

The human genome alone accounts for some 3 billion base pairs and about 20 000 genes. The Human Genome Project (HGP) led the effort to map the human genome completely and was one of the largest international collaborations ever undertaken in biology. The project began in 1990 and was declared complete in 2003, 13 years later; a first draft of the sequence had been published in 2001. [1]

Computer technology made it possible to collect and use the enormous amount of sequencing data generated and led to the development of bioinformatics.

Bioinformatics is an emerging area of bioscience that combines computer science, statistics and biology to work with sequencing data and other biological data. Computing tools such as algorithms and statistical tests, applied to raw biological data, make that data faster and easier to organise, store, understand and search for patterns.

Importantly, computer software also makes biological data accessible to everyone over the internet, stimulating collaboration and further research.

Bioinformatics is an interdisciplinary field of bioscience that develops methodologies to collect, process, and analyse large amounts of raw biological data using computer science tools.

The importance of bioinformatics

As we collect more and more biodata, bioinformatics is becoming essential to scientific discovery in biology. Without bioinformatics and the ability to apply computer science tools to big data, interpreting biodata and drawing conclusions from it would be very hard.

The goals of bioinformatics

The main goals of bioinformatics are:

  • Organise biodata so that it becomes easily accessible and searchable

  • Develop software to help analyse biodata

  • Analyse and accurately interpret biodata from a biological perspective

The roles of bioinformatics

One of the main tools created by bioinformatics is the database. Several hundred databases hold different types of biological data, such as complete genomes and gene sequences. Databases allow the data to be stored and searched logically, enabling comparisons and links that would otherwise go unnoticed. The amount of data they hold is growing at an exponential rate as we sequence more DNA.

Evolutionary relationships between organisms are examples of links that bioinformatics tools can make.

By comparing genomes held in these databases, we can assess sequence similarity. Higher DNA sequence similarity indicates more recent common ancestry. These tools allow us to build evolutionary trees and see how living things are related: if we know the basic mutation rate of DNA and how similar two sequences or genomes are, we can infer when two genetic sequences from different species diverged from a common ancestor.

The mutation rate describes the amount of change a DNA sequence has undergone in a given period of time.
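
The inference described above can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions: the two sequences are already aligned and of equal length, the mutation rate is constant (a simple "molecular clock"), and repeated changes at the same site are ignored. The function names and the rate of 1e-9 changes per site per year are hypothetical values chosen for the example, not measured figures.

```python
def identity_fraction(seq_a: str, seq_b: str) -> float:
    """Fraction (0 to 1) of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def divergence_time(seq_a: str, seq_b: str, mutation_rate: float) -> float:
    """Rough time since the common ancestor: t = d / (2 * mutation_rate).

    d is the proportion of differing sites. The factor 2 is there because
    both lineages accumulate changes independently after they split.
    mutation_rate is in changes per site per unit of time.
    """
    d = 1.0 - identity_fraction(seq_a, seq_b)
    return d / (2.0 * mutation_rate)


if __name__ == "__main__":
    species_1 = "ACGTACGTAC"
    species_2 = "ACGTACGTAA"  # differs from species_1 at one of ten sites
    print(identity_fraction(species_1, species_2))
    # Assumed, purely illustrative rate: 1e-9 changes per site per year.
    print(divergence_time(species_1, species_2, 1e-9))
```

Real phylogenetic tools use statistical models that correct for these simplifications, so the estimate here should be read as a rough illustration of the idea only.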

In 2014, bioinformatics databases held over 6 × 10¹¹ base pairs of sequence data. At about 3 × 10⁹ base pairs per human genome, this is roughly the equivalent of 200 human genomes, and the total is probably even larger today!

Popular bioinformatics databases include the Ensembl database, which holds genomes of eukaryotic organisms like the human genome. Ensembl also includes the genomes of other important model organisms like the zebrafish, house mouse or the fruit fly. Other popular databases include GenBank and DDBJ.

Model organisms are organisms that are frequently used in biomedical research!

BLAST (Basic Local Alignment Search Tool) is one of the most widely used software tools in bioinformatics today. BLAST allows researchers to compare a sequence against the millions of primary biological sequences held in a database with minimal effort. These comparisons help find similarities between the unknown sequences researchers are studying and those already present in the database.
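
To give a feel for how a query can be compared against many stored sequences quickly, here is a toy Python sketch. It is not the BLAST algorithm itself: it only imitates the first step of a seed-based search by indexing every short "word" of length k in the database and counting how many words each stored sequence shares with the query. The function names, the sequence names and the example sequences are all made up for illustration.

```python
from collections import defaultdict


def build_word_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every length-k word to the names of the sequences containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def rank_hits(query: str, database: dict[str, str], k: int = 4) -> list[tuple[str, int]]:
    """Rank database sequences by the number of distinct words shared with the query."""
    index = build_word_index(database, k)
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    counts: dict[str, int] = defaultdict(int)
    for word in query_words:
        for name in index.get(word, ()):
            counts[name] += 1
    # Most shared words first; ties are broken alphabetically by name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_database = {
        "seq_a": "ATGGCCATTGTA",
        "seq_b": "TTTTCCCCGGGG",
        "seq_c": "GGCCATTG",
    }
    print(rank_hits("GCCATTGT", toy_database))
```

In this toy run, seq_a shares the most words with the query, seq_c comes second, and seq_b shares none and is not reported. A real search tool would go on to extend and score the matching regions and to estimate how likely each hit is to occur by chance.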

As DNA sequencing expanded our knowledge of the coding sequences in living organisms' genomes, it also expanded our knowledge of what they code for: proteins. Knowing the genetic code of life, we can decipher what a gene encodes, meaning the protein that its transcription and translation might produce. Databases were also created to hold the resulting amino acid sequences and protein structures, such as UniProt (Universal Protein Resource). UniProt contains amino acid sequence data alongside information on each protein's function.
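
As a small illustration of deciphering what a gene encodes, the Python sketch below translates a DNA coding sequence into one-letter amino acid codes using the standard genetic code. It assumes the input is the coding strand, starts in the correct reading frame and contains only the letters A, C, G and T; the function name and the example sequence are invented for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed for codons in TCAG order (TTT, TTC, TTA, ...).
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Read the sequence three bases at a time; ignore an incomplete last codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # ATG = M (methionine), GCC = A (alanine), TAA = stop
    print(translate("ATGGCCTAA"))  # MA
```

Real genes are more complicated, for example because eukaryotic genes contain introns that are removed before translation, so this sketch only applies to an already processed coding sequence.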


Bioinformatics is closely related to another emerging field in bioscience known as computational biology, which builds on it. Whereas bioinformatics collects and processes vast amounts of biodata, computational biology uses such data to construct theoretical models of biological systems. These models try, for example, to predict the 3D structures of proteins or to help identify specific genes linked to diseases in populations.

Computational biology is the study of biology through computational modelling software.

The benefits of bioinformatics to society

The ability to analyse large sets of biodata through bioinformatics has made it easier to understand DNA and its meaning and influence in our lives.

For example, sequencing and analysing the human genome revealed 1.4 million single nucleotide polymorphisms (SNPs).

SNPs are the most common type of genetic variation. Each one is a single-base difference in the DNA, caused by an inherited point mutation. The number of known SNPs has greatly increased since the HGP, and most of them are harmless. However, some SNPs are associated with an increased risk of diseases like diabetes or heart disease.
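
The idea of a single-base difference can be shown with a short Python sketch that lists the positions at which two aligned sequences differ. It assumes the sequences are already aligned and of equal length, and the function name and example sequences are invented. Strictly speaking, a difference between two individual sequences is only a candidate: calling it a SNP requires evidence that the variant occurs in a population.

```python
def single_base_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based, as is usual when describing sequence variants.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    reference = "ACGTAC"
    sample = "ACCTAT"
    for position, ref_base, sample_base in single_base_differences(reference, sample):
        print(f"position {position}: {ref_base} -> {sample_base}")
```

For the two example sequences this reports a G to C change at position 3 and a C to T change at position 6.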

Screening for such variations allows early detection and treatment of potential medical problems.

Fig. 2 - Single Nucleotide Polymorphism

As our knowledge of the genome and proteome of other organisms also increases, new revelations and possibilities regarding those organisms' utility to improve human life and the environment also emerge.

The proteome refers to all the proteins produced by an organism.

Analysing the genomes of parasites, like the malaria-causing parasite Plasmodium falciparum, is fuelling research on how to fight the disease and control the parasite, notably through the development of vaccines. This parasite's genome has been fully sequenced, and all of its roughly 5300 genes can be found in databases, helping us understand its proteome and metabolism.

Sequencing and analysing the genomes and proteomes of organisms that withstand extreme temperatures or other normally lethal environmental conditions can show us how they survive. This knowledge can have various biotechnological applications, like producing biofuels or cleaning up pollutants.

Bioinformatics - Key Takeaways

  • Bioinformatics is an interdisciplinary field of bioscience that develops methodologies to collect, process, and analyse large amounts of raw biological data using computer science tools.

  • The main goals of bioinformatics are to organise biodata so that it becomes easily accessible and searchable, to develop software to help analyse biodata, and to analyse and accurately interpret biodata from a biological perspective.

  • One of the main tools created by bioinformatics is the database. Databases allow the data to be stored and searched logically, enabling comparisons between the biodata.

  • Popular bioinformatics tools include Ensembl, BLAST, UniProt, GenBank and DDBJ.


1. Francis Collins, A vision for the future of genomics research, Nature, 2003

Frequently Asked Questions about Bioinformatics

Bioinformatics is limited by differences in how the available data is analysed, presented and annotated.

Bioinformatics is an interdisciplinary bioscience field that develops methodologies to better collect, process and analyse large amounts of raw biological data using computer science tools.

Bioinformatics tools include, for example, the Ensembl and UniProt databases.

Bioinformatics leverages computer science tools to analyse biodata so that we can better understand complex biological systems.

The main goals of bioinformatics are to organise biodata so that it becomes easily accessible and searchable, to develop software to help analyse biodata, and to analyse and accurately interpret biodata from a biological perspective.
