Workflow Wednesdays – Part 1. Introduction, datasets, plans

A while ago we created in-house training material for new employees, which included a very short, single-page workflow for the bioinformatics analysis tasks we usually do. I thought it would be a great idea to go through this workflow step by step, collect some tools that can be used for each task, and show some examples using data from all the main sequencing platforms. Note that we mostly do sequence-based alignments, so de novo assemblies will only be mentioned briefly.

I will use open-access datasets, so you can reproduce each step if you want to. I selected E. coli strain K12, substrain MG1655, as the example because I could find Illumina, 454 and Ion Torrent reads for this substrain. I know that there are a few very useful blog posts around (e.g. here and here) that use the same (or very similar) datasets. Unlike those blog posts, this series does not aim to compare the different sequencing platforms.