Variant Calling Workflow

Overview

Teaching: 35 min
Exercises: 25 min
Questions
  • How do I find sequence variants between my sample and a reference genome?

Objectives
  • Understand the steps involved in variant calling.

  • Describe the types of data formats encountered during variant calling.

  • Use command line tools to perform variant calling.

As mentioned earlier, we are working with files from a long-term evolution study of an E. coli population (designated Ara-3). Now that we have checked the quality of our data and removed low-quality base calls, we can perform variant calling to see how the population changed over time. We want to know how this population changed relative to the original population, E. coli strain REL606. We will therefore align each of our samples to the E. coli REL606 reference genome and identify the differences between our reads and that reference.
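To make the idea of "differences between reads and the genome" concrete, here is a minimal toy sketch in Python. It is not the method used in this lesson and not how production tools work. It assumes the reads have already been aligned, so each read comes with a 0-based start position on the reference, and it ignores insertions, deletions, and base qualities. The function name `call_variants`, the `min_depth` parameter, and the example sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2):
    """Toy variant caller (illustrative assumption, not a real tool).

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based position on the reference at which the read begins.
    Returns a list of (1-based position, ref base, alt base, support, depth).
    """
    # Pile-up: count the bases observed at each reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads cover this position to say anything
        base, support = counts.most_common(1)[0]
        if base != reference[pos]:
            # The most frequent base in the reads differs from the reference.
            variants.append((pos + 1, reference[pos], base, support, depth))
    return variants


# Made-up example data: a short reference and four pre-aligned reads.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC")]

for pos, ref, alt, support, depth in call_variants(reference, reads):
    print(f"pos={pos} ref={ref} alt={alt} support={support}/{depth}")
```

With this made-up data, three of the four reads covering position 6 carry a T where the reference has a C, so the sketch reports a single C-to-T difference there. Real workflows rely on a dedicated aligner and variant caller, which also handle alignment itself, sequencing errors, and statistical confidence.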

Go directly to the next episode.

Key Points

  • Bioinformatic command line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use the most powerful bioinformatic tools, you will need to use the command line.

  • There are many different file formats for storing genomics data. It is important to understand what type of information is contained in each file, and how it was derived.