Workflow Wednesdays – Short read aligners – Omixon Variant Toolkit

Omixon Variant Toolkit (aka Omixon Read Mapper, or ORM) is our “own” alignment algorithm, available as a standalone Java application or within Omixon Target. The aligner actually contains two slightly different algorithms*: one for shorter Illumina reads, called “bridge”, and one for longer (IonTorrent and 454) reads, called “lace”. Both methods perform gapped alignment.

If you are interested in trying out the command line version of the toolkit, see here. You can also download a trial version of Omixon Target from here.

The toolkit has predefined profiles (basically parameter sets) for different kinds of data, and advanced users can also set any of the parameters by hand.

I will mostly concentrate on the command line version in this post, since extensive documentation is available for the GUI (check out the Quick Start Guide or the User Manual). For the standalone version, documentation is available in the form of a sample parameter file called “orm.default.properties”, which you can find in the toolkit jar.

The easiest way to run ORM is to use the built-in profiles. The first thing we should do is to create a config file, which can be “fed to” the Omixon Variant Toolkit, using the following syntax:

java -jar toolkit.jar -config orm.advanced.properties

Let’s try to create a config file for our Illumina example data:

#I will use "vim" which is a command line text editor/manipulator tool. You can of course create the properties file with any kind of text editor (on the other hand, using any version of Office is a big no-no.)
#First, let's create a file named "orm.advanced.properties" and open it in vim, in a single step!
vim orm.advanced.properties
#To edit the file, press "i" or the Insert key. After that, -- INSERT -- should appear in the bottom line of your screen.
#Now, you can edit the file.
#You should add the following lines:
# the name of this process = orm (Omixon Read Mapper) 
toolkit.process=orm

#Then, you have to select the name and location of input and output files:
# the input reference url to use (i.e. the relative path to the reference fasta file you'd like to use)
orm.referenceUrl=../NC_000913.2.fa
# the url of where to write the output (i.e. the name and location of the SAM output)
orm.outputUrl=SRR022913_ORM.sam
# the url to use for the input fastq file
# for mate pairs, two input files are required for each set in orm.inputUrl, separated by a comma
orm.inputUrl=SRR022913_1.fastq,SRR022913_2.fastq
#Now that you have specified the input and output files, you have to turn on paired mode.
# turn on mate pair mode
orm.mate.mode=true

#You can also define some additional parameters, but the default value is usually fine for these:
# defines the orientation for the fragments. Possible values (case is ignored) are
# FF or tandem, FR or inward, and RF or outward.
orm.mate.orientation=FR
# define the minimum and maximum template lengths. According to the SAM Format Specification v1.4,
# the template length extends from the leftmost mapped base to the rightmost mapped base. In 
# other words, it equals the gap between the two mappings plus the two read lengths.
orm.mate.template.min=-250
orm.mate.template.max=600
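# Worked example with hypothetical numbers (not taken from the example data): two 100 bp reads
# whose mappings are separated by a 150 bp gap give a template length of 150 + 2*100 = 350,
# which lies between the minimum and maximum set above.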
# whether the output should also include unpaired mappings for the reads.
orm.output.unpaired=false
#orm.output.unpaired=true

#Now, we just have to select the profiles that fit the example data and we are ready to roll!
#These fitting profiles would be "Illumina" and "bacterial"
orm.profile=bacterial25,illumina

#To save the changes and exit the editor: press Esc, then type ":x" and press Enter.
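
If you would rather not edit the file interactively, the same settings can be written in one step from the shell. This is only a sketch using a here-document; it collects the keys and values from the walkthrough above and assumes you run it in the directory where the config file should live.

```shell
# Write the example configuration without opening an editor.
# The quoted 'EOF' keeps the shell from expanding anything inside the block.
cat > orm.advanced.properties <<'EOF'
# process to run: orm (Omixon Read Mapper)
toolkit.process=orm
orm.referenceUrl=../NC_000913.2.fa
orm.outputUrl=SRR022913_ORM.sam
orm.inputUrl=SRR022913_1.fastq,SRR022913_2.fastq
orm.mate.mode=true
orm.mate.orientation=FR
orm.mate.template.min=-250
orm.mate.template.max=600
orm.output.unpaired=false
orm.profile=bacterial25,illumina
EOF
```

Running it again simply overwrites the file, which makes it handy to keep in a small script next to your data.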

Now you can run the toolkit. If you have loads of memory available, you can also raise the maximum heap size of the Java virtual machine:

java -Xmx5g -jar toolkit.jar -config orm.advanced.properties
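
A mistyped path in the config is an easy mistake to make, so it can be worth checking that the files exist before starting Java. The function below is a hypothetical helper, not part of the toolkit. It assumes that the paths in orm.referenceUrl and orm.inputUrl are plain local paths relative to the current directory and contain no spaces.

```shell
# Hypothetical pre-flight check (not part of the toolkit): make sure the
# reference and read files named in the config exist.
check_orm_inputs() {
    config=$1
    if [ ! -f "$config" ]; then
        echo "config not found: $config" >&2
        return 1
    fi
    status=0
    # Pull out the values of the two keys; orm.inputUrl may list several
    # comma-separated files, so put each one on its own line.
    for path in $(sed -n -e 's/^orm\.referenceUrl=//p' \
                         -e 's/^orm\.inputUrl=//p' "$config" | tr ',' '\n'); do
        if [ ! -f "$path" ]; then
            echo "missing input file: $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use: only start the aligner if every input file was found.
# check_orm_inputs orm.advanced.properties &&
#     java -jar toolkit.jar -config orm.advanced.properties
```

The function reports every missing file rather than stopping at the first one, so a single run shows everything that needs fixing.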

* The toolkit actually has an aligner for SOLiD reads as well.