Workflow Wednesdays – Part 1. Introduction, datasets, plans

 

You can find the selected data here (the first link points to the NCBI SRA, the second to the EBI ENA FTP site):

454 data: NCBI, EBI

Illumina data: NCBI, EBI read 1, EBI read 2

Ion Torrent data: NCBI, EBI

An informative video about E. coli (Source: IAQ Video Network):

Here is a list of bioinformatics-related tasks I plan to discuss, not necessarily in this order (links will be added later):

  • Read preprocessing
  • Reference
  • Targeted vs. whole reference alignment
    • Running the alignments (1, 2, 3)
    • Alignment statistics (coverage analysis, proper pairs…)
    • Manipulating an alignment file (format conversion, indexing)
    • Subsampling, getting reads/regions out of an alignment
    • Indel realignment
    • Quality recalibration
    • Duplicate marking
    • Sorting
    • Merging
    • Filtering
    • De novo and semi-de novo assembly
    • Visualisation (SAM/BAM, de novo)
  • Variant calling
    • Available tools (GATK, Samtools, and smaller tools such as VarScan, …)
    • Variant filtering
    • Variant effect prediction (known and unknown variants)
    • Visualisation
  • Submitting to databases