Flashcard Fridays – Comparison of variant callers

A survey of tools for variant analysis of next-generation genome sequencing data. Pabinger et al. 2013

Following last week's variant-caller-themed FF post, I would like to present a great article about variant detection and annotation. The paper reviews basically every aspect of NGS-based variant detection, from the whole NGS analysis workflow and the available tools to specific problems (e.g. detection of somatic mutations).

My favourite part is Figure 2, which illustrates perfectly that there is no perfect variant caller around, even for germline mutations. Each tool finds (and misses) a different set of variants.

Figure 2 from Pabinger et al. 2013: Venn diagrams showing the number of identified variants for tested germline (A), somatic (B), CNV (C) and exome CNV (D) tools. The depicted numbers in (A) and (B) report identified SNPs and INDELs.
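
The kind of overlap shown in such a Venn diagram can be sketched with plain set operations. This is only a minimal illustration and not the method used in the paper: the caller names and variants below are made up, and a variant is assumed to be identified by chromosome, position, reference allele and alternate allele.

```python
def variant_key(chrom, pos, ref, alt):
    """Normalise one call into a hashable key (assumed identity of a variant)."""
    return (chrom, int(pos), ref.upper(), alt.upper())


def compare_callsets(callsets):
    """Return (calls shared by all callers, calls unique to each caller).

    callsets maps a caller name to a set of variant keys.
    """
    all_sets = list(callsets.values())
    shared = set.intersection(*all_sets) if all_sets else set()
    unique = {}
    for name, calls in callsets.items():
        # Everything reported by any other caller
        others = set().union(*(c for n, c in callsets.items() if n != name))
        unique[name] = calls - others
    return shared, unique


# Hypothetical toy data: three callers run on the same sample
callsets = {
    "caller_a": {
        variant_key("chr1", 100, "A", "G"),
        variant_key("chr1", 250, "C", "T"),
        variant_key("chr2", 75, "AT", "A"),
    },
    "caller_b": {
        variant_key("chr1", 100, "A", "G"),
        variant_key("chr2", 75, "AT", "A"),
        variant_key("chr3", 10, "G", "C"),
    },
    "caller_c": {
        variant_key("chr1", 100, "a", "g"),
        variant_key("chr1", 250, "C", "T"),
    },
}

shared, unique = compare_callsets(callsets)
print("found by all callers:", sorted(shared))
for name in sorted(unique):
    print(name, "only:", sorted(unique[name]))
```

In practice, INDELs often need left-alignment and normalisation before such a comparison, otherwise the same event can be counted as two different variants.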

Omixon’s upcoming HLA typing webinar

Broadcast date: 3rd July 2013, Wednesday

Time: 12.00-1.00 PM EDT / 18.00-19.00 CET

Join our CEO, Attila Berces, online for a presentation and demo of how Omixon Target HLA typing can bring you value in the analysis of NGS data.

Our guest speaker is Dr. Dimitri Monos, University of Pennsylvania and The Children's Hospital of Philadelphia, who will talk about his experiences with HLA typing protocol development on NGS platforms.

Currently, there are 69 ongoing clinical trials investigating HLA as a potential biomarker for safety or efficacy. It appears that besides being the most important marker in transplantation, HLA is becoming an increasingly important biomarker for cancer therapeutic development as well. The analytical performance of NGS-based genetic tests depends heavily on the bioinformatics software. Although the current false variant rate can be acceptable for the research market, it is simply unsuitable for making clinical decisions. Omixon tailors the analysis to the sequencer, amplification method, primer kit, and the characteristics of the gene target itself. This approach results in a robust and highly accurate method to identify genetic variants.

Register today!

In this webinar you will learn:

  • How to reduce ambiguity and increase resolution
  • How to achieve the highest accuracy
  • How to use Omixon Target HLA as an HLA-typing tool
  • How to increase efficiency by reducing effort, time and cost

Who should attend?

  • Scientists and researchers from HLA-typing labs
  • Molecular biologists working with NGS
  • Medical professionals working in transplantation, oncology and immunology
  • Bioinformaticians working in transplantation, oncology and immunology

Presenting:

  • Dr. Dimitri Monos
  • Attila Berces, PhD

A live Q&A session will follow the presentation, offering you the opportunity to put forward your questions.

Reserve your webinar seat here.

Workflow Wednesdays – Part 3. Read preprocessing – Read quality control 1. – Running QC tools

Read quality control tools can provide very useful information about the success of your sequencing experiment, without the need to run time-consuming alignments and variant calls. Based on read length, per-base quality, base content and other basic statistics, you can find out a lot about your data. You can decide whether pre-alignment processing (e.g. adaptor trimming, quality-based trimming) is needed for a particular read file. Sometimes you can find interesting clues that point to problems with basically any step of the sequencing workflow, from sample collection to the actual sequencing step. For example, an unusually high GC content in a subset of the samples can point to bacterial plasmid contamination in the library preparation step.
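
To give a feel for the statistics mentioned above, here is a minimal sketch that computes read count, mean read length, GC fraction and mean base quality from FASTQ data. It is not a replacement for a dedicated QC tool. It assumes plain four-line FASTQ records and Phred+33 quality encoding, and the function name `fastq_stats` and the example reads are made up for illustration.

```python
import io


def fastq_stats(handle, phred_offset=33):
    """Basic QC statistics for four-line FASTQ records (Phred+33 assumed)."""
    n_reads = total_bases = gc_bases = qual_sum = 0
    while True:
        header = handle.readline()
        if header == "":
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        n_reads += 1
        total_bases += len(seq)
        gc_bases += sum(1 for base in seq.upper() if base in "GC")
        qual_sum += sum(ord(ch) - phred_offset for ch in qual)
    if total_bases == 0:
        raise ValueError("no reads found")
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads,
        "gc_fraction": gc_bases / total_bases,
        "mean_quality": qual_sum / total_bases,
    }


# Two made-up reads; for a real file use open("reads.fastq") instead
example = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\n5555\n")
stats = fastq_stats(example)
print(stats)
```

Comparing the GC fraction across samples from the same run is one simple way to spot the kind of outlier described above.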

Bioinformatics for Beginners – How to get NGS data? Part 2. Reference sequences

The most obvious (and probably the most used) source of reference sequences (or any kind of sequences) is the NCBI Nucleotide database and its “sister” sites: the EBI European Nucleotide Archive, and the DNA Data Bank of Japan.

All three sites provide some kind of search functionality, and a few (shorter) sequences can be downloaded directly from the result pages in multiple formats. For larger reference sequences (e.g. full human or mouse chromosomes or full genomes) or a long list of references, the FTP sites or batch query tools should be used.

FTP sites:

Batch search/Download tools:

There are some other pages with a more limited focus that can be very useful for retrieving reference sequences:

Tip: the Genome Analysis Toolkit (GATK) needs the human chromosomes in a special order (karyotypic, to be precise). Different versions of the karyotypically sorted human genome are provided by the GATK team as a resource bundle, which can be downloaded from their FTP site.
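
If you need to check or reproduce such an ordering yourself, the sorting rule can be sketched in a few lines. This is an illustration only, with a made-up helper name (`karyotypic_key`). It assumes autosomes come first in numeric order, followed by X, Y and the mitochondrial contig. The position of the mitochondrial contig differs between reference builds, so compare the result with the sequence dictionary of the reference you actually use.

```python
def karyotypic_key(name):
    """Sort key: autosomes numerically, then X, Y, M/MT, then anything else."""
    core = name[3:] if name.lower().startswith("chr") else name
    if core.isdigit():
        return (0, int(core), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}  # assumed order, see note above
    rank = special.get(core.upper())
    if rank is not None:
        return (rank, 0, "")
    # Unplaced or alternative contigs go last, in alphabetical order
    return (4, 0, name)


contigs = ["chr10", "chrX", "chr2", "chrM", "chr1", "chrY"]
ordered = sorted(contigs, key=karyotypic_key)
print(ordered)
```

A plain alphabetical sort would place chr10 before chr2, which is exactly the kind of ordering that causes trouble.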