Flashcard Fridays – Part 2. Let’s talk about variant callers!

If you’re interested in the algorithms behind variant callers, it’s always a safe bet to read the paper(s) describing them. So here are the key papers for the two most commonly used variant callers:

SAMtools

1. The Sequence Alignment/Map format and SAMtools. Li et al. 2009

Although this is the first publication about SAMtools, and the SAMtools webpage recommends it as the article to cite for the tool, this paper is mostly about the SAM format. If you’re interested in the actual variant calling algorithm, check out the next paper, which describes Maq, a quasi-predecessor of SAMtools.

2. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Li et al. 2008

As mentioned above, this article is about Maq, an aligner and variant caller developed by Heng Li. Maq is no longer updated, but since it shares some algorithms with SAMtools, it’s definitely worth reading. Be sure to check out the supplement as well, as it covers some important details (e.g. quality calibration).
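Mapping qualities (like the ones in the title of the Maq paper) and base qualities are reported on the Phred scale, where Q = -10 log10(P), with P being the probability that the call (or the mapping position) is wrong. A minimal sketch of the conversion in both directions, based only on that definition; the function names are hypothetical helpers, not part of Maq or SAMtools:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> int:
    """Convert an error probability P into a rounded Phred quality Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10 * math.log10(p))


if __name__ == "__main__":
    # Each +10 in quality means a 10x lower chance that the mapping is wrong.
    for mapq in (0, 10, 20, 30, 60):
        print(f"MAPQ {mapq:>2} -> P(wrong position) = {phred_to_error_prob(mapq):.6f}")
```

So a read with mapping quality 30 has roughly a 1 in 1000 chance of being placed at the wrong position, which is why low-MAPQ reads are often down-weighted or filtered before calling variants.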

GATK

3. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. McKenna et al. 2010

This is a fairly detailed description of GATK, covering the architecture of the GATK suite and some usage examples (see the figure below).

Figure 3 from McKenna et al. 2010: MHC depth of coverage in JPT samples of the 1000 Genomes Project pilot 2, calculated using the GATK depth of coverage tool.
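The MapReduce idea in the GATK paper is easiest to picture on a coverage calculation like the one in the figure: roughly, a map step turns each locus into a small value, and a reduce step folds those values into a running result. The sketch below only illustrates that pattern in plain Python, assuming reads are reduced to 0-based, half-open reference spans; the Read class and all function names are hypothetical and are not the GATK API or its depth of coverage tool.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical aligned read, reduced to its reference span (0-based, half-open)."""
    start: int
    end: int


def pileup_depths(reads: Iterable[Read], region_start: int, region_end: int) -> List[int]:
    """Count how many reads overlap each position in [region_start, region_end)."""
    depths = [0] * (region_end - region_start)
    for read in reads:
        lo = max(read.start, region_start)
        hi = min(read.end, region_end)
        for pos in range(lo, hi):  # empty range if the read misses the region
            depths[pos - region_start] += 1
    return depths


def map_locus(locus: Tuple[int, int]) -> Tuple[int, int]:
    """Map step: a (position, depth) locus contributes (depth, 1 locus)."""
    _pos, depth = locus
    return depth, 1


def reduce_totals(acc: Tuple[int, int], value: Tuple[int, int]) -> Tuple[int, int]:
    """Reduce step: sum the per-locus contributions."""
    return acc[0] + value[0], acc[1] + value[1]


def mean_depth(reads: Iterable[Read], region_start: int, region_end: int) -> float:
    """Mean depth of coverage over a region, computed map/reduce style."""
    depths = pileup_depths(reads, region_start, region_end)
    loci = zip(range(region_start, region_end), depths)
    total, n = reduce(reduce_totals, map(map_locus, loci), (0, 0))
    return total / n if n else 0.0


if __name__ == "__main__":
    reads = [Read(0, 5), Read(3, 8), Read(4, 10)]
    print(pileup_depths(reads, 0, 10))
    print(mean_depth(reads, 0, 10))
```

Because the reduce step only adds up partial totals, the loci could be split into chunks, processed independently, and the partial results combined afterwards, which is the property that makes this style of traversal easy to parallelize.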

4. A framework for variation discovery and genotyping using next-generation DNA sequencing data. DePristo et al. 2011

This article presents an analysis workflow built on GATK. It’s not really about algorithmic details; it’s more of a practical guide that presents an analysis framework and some results on real-life data from probably the most sequenced human on this planet, whom we lovingly call NA12878.