Workflow Wednesdays – Alignments – Format conversion

I could go on forever about different short read aligners, but I think you get the gist of it. So let’s talk about alignment files! Most modern reference-based aligners produce their output in SAM format. If you’re not familiar with this file format, check out my post about alignment file formats!

As SAM files are plain text files, they can take up a lot of space on your hard disk. To deal with this problem, the BAM format was created, which is basically a binary, compressed version of the SAM format.

After creating an alignment, the first thing you should do is convert your SAM file into BAM format. Let’s see how you can do this! There are two main possibilities: you can use either Samtools or Picard. I usually use Samtools, as I’m more familiar with the syntax, but the output of the two methods should be equivalent.

Let’s see Samtools first!

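If your SAM file has @SQ lines in the header, you can use the following command (with Samtools 1.0 and later the input format is detected automatically, so the S in -bS is accepted but no longer needed):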

samtools view -bS SRR022913_bwa_backtrack.sam > SRR022913_bwa_backtrack.bam
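If you have several alignments to convert, the command above can be wrapped in a small loop. This is only a sketch: the function names bam_name_for and convert_all_sam are made up for illustration, and it assumes that samtools is on your PATH and that every SAM file in the directory has @SQ lines in its header.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Samtools.

# Derive the BAM file name from a SAM file name (foo.sam -> foo.bam).
bam_name_for() {
    printf '%s\n' "${1%.sam}.bam"
}

# Convert every *.sam file in a directory (default: current directory).
convert_all_sam() {
    dir="${1:-.}"
    for sam in "$dir"/*.sam; do
        [ -e "$sam" ] || continue    # the glob matched nothing
        samtools view -bS "$sam" > "$(bam_name_for "$sam")"
    done
}
```

Calling convert_all_sam without arguments would convert the SAM files in the current directory and leave the originals in place, so you can check the BAM files before deleting anything.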

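If your SAM file doesn’t have @SQ lines, you need the reference sequence indexed with samtools faidx. The -t option expects the resulting .fai file (a tab-delimited list of reference names and lengths), not the FASTA file itself:

samtools faidx NC_000913.2.fa
samtools view -bt NC_000913.2.fa.fai SRR797242_bwa_bwasw.sam > SRR797242_bwa_bwasw.bam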
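If you are not sure which of the two cases applies, you can check the header before converting. The sketch below is one way to do it, with the helper names has_sq_lines and sam_to_bam invented for this post. It assumes samtools is on your PATH and, as far as I understand the Samtools manual, that -t wants the .fai index written by samtools faidx (a list of reference names and lengths) rather than the FASTA file.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Samtools.

# Succeed if the SAM header contains at least one @SQ line.
# Header lines start with "@", so reading stops at the first alignment record.
has_sq_lines() {
    awk '/^@SQ\t/ { found = 1; exit } !/^@/ { exit } END { exit !found }' "$1"
}

# Convert a SAM file to BAM, falling back to the reference index
# when the header carries no @SQ lines.
sam_to_bam() {
    sam="$1"; ref="$2"; bam="${sam%.sam}.bam"
    if has_sq_lines "$sam"; then
        samtools view -bS "$sam" > "$bam"
    else
        # samtools faidx writes the index next to the FASTA as "$ref.fai"
        [ -e "$ref.fai" ] || samtools faidx "$ref"
        samtools view -bt "$ref.fai" "$sam" > "$bam"
    fi
}
```

A call would look like sam_to_bam SRR797242_bwa_bwasw.sam NC_000913.2.fa; the reference argument is only used when the header has no @SQ lines.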

If, for some reason, you want to convert your BAM file back to SAM format, you can do that as well:

samtools view -h -o SRR022913_bwa_backtrack.bamtosam.sam SRR022913_bwa_backtrack.bam

Keep in mind that you can “get inside” the BAM file with samtools view without actually recreating the original SAM file. A few examples:

#You can print all reads in sam format to STDOUT with:
samtools view SRR022913_bwa_backtrack.bam
#You can print the header with:
samtools view -H SRR022913_bwa_backtrack.bam
#You can print reads with a mapping quality at or above a given threshold (here 10):
samtools view -q 10 SRR022913_bwa_backtrack.bam
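To make the -q option a bit more concrete: the mapping quality (MAPQ) is the fifth column of every alignment line, and -q keeps the reads whose MAPQ is at least the value you give. The sketch below imitates that on an uncompressed SAM file with plain awk. The function name filter_by_mapq is made up for illustration, and this is only an approximation of what samtools view -h -q does, not a replacement for it.

```shell
#!/bin/sh
# Hypothetical helper, not part of Samtools.

# Print header lines plus all alignments with MAPQ >= the given threshold.
# Usage: filter_by_mapq <threshold> <file.sam>
filter_by_mapq() {
    awk -F '\t' -v min="$1" '/^@/ || $5 >= min' "$2"
}
```

For example, filter_by_mapq 10 SRR022913_bwa_backtrack.sam would print the header and every read with a MAPQ of 10 or more. On BAM files you still need samtools view, since awk cannot read the compressed binary format.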

If you are interested in more options, check the Samtools manual.

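As I’ve mentioned before, you can also convert SAM files to BAM with Picard. The commands here use the older layout with one jar file per tool; in recent Picard releases the tools are bundled into a single picard.jar, so the call becomes java -jar picard.jar SamFormatConverter followed by the same arguments. To convert your SAM file to BAM, use the following syntax: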

java -jar SamFormatConverter.jar INPUT=SRR022913_bwa_backtrack.sam OUTPUT=SRR022913_bwa_backtrack_picard.bam
#To do a BAM to SAM conversion, you just have to reverse the input and output:
java -jar SamFormatConverter.jar INPUT=SRR022913_bwa_backtrack.bam OUTPUT=SRR022913_bwa_backtrack_picard.sam

Like samtools view, Picard has its own SAM/BAM viewer, called ViewSam. Picard has a whole list of tools for manipulating and fixing SAM/BAM files, but more about those later.