Workflow Wednesdays – Sorting alignments

Most variant-calling software requires sorted alignments as input. There are two main ways to sort a SAM or BAM file: by read name, or by the leftmost coordinate of each aligned read.

As with format conversion, you can use either Samtools or Picard to sort alignment files. To sort a BAM file by leftmost coordinate with Samtools, use the following syntax. Note that the second argument is an output prefix, not a file name: Samtools appends “.bam” itself, so if you include the extension you’ll end up with a file named “something.bam.bam”:

 samtools sort SRR797242_bwa_bwasw.bam SRR797242_bwa_bwasw_sorted

To sort by read name instead of coordinate, add the “-n” flag:

 samtools sort -n SRR797242_bwa_bwasw.bam SRR797242_bwa_bwasw_sorted_RN
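
The two commands above use the older output-prefix form. In Samtools 1.3 and later, the output file is given with “-o” and its full name, extension included. A minimal sketch of the same two sorts in that form, assuming a Samtools 1.x installation; the function names are hypothetical helpers, not part of Samtools:

```shell
# Sketch for Samtools 1.3+; function names are illustrative only.

# Sort a BAM file by leftmost coordinate.
# $1 = input BAM, $2 = full output file name (extension included).
sort_bam_by_coordinate() {
    samtools sort -o "$2" "$1"
}

# Sort a BAM file by read name (same arguments as above).
sort_bam_by_name() {
    samtools sort -n -o "$2" "$1"
}

# Usage:
#   sort_bam_by_coordinate SRR797242_bwa_bwasw.bam SRR797242_bwa_bwasw_sorted.bam
#   sort_bam_by_name SRR797242_bwa_bwasw.bam SRR797242_bwa_bwasw_sorted_RN.bam
```

Because the output name is passed in full here, the “something.bam.bam” problem does not arise.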

If you prefer Picard, use its “SortSam” tool. To sort an alignment by coordinate, run:

  java -jar SortSam.jar I=SRR797242_bwa_bwasw.bam O=SRR797242_bwa_bwasw_coord_sorted_picard.bam SO=coordinate

To sort by read name, set SO to “queryname”:

java -jar SortSam.jar I=SRR797242_bwa_bwasw.bam O=SRR797242_bwa_bwasw_name_sorted_picard.bam SO=queryname
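
Newer Picard releases bundle all tools into a single picard.jar, with the tool name passed as the first argument. The following is a rough sketch of the same call in that layout. PICARD_JAR and picard_sort are hypothetical names introduced here for illustration, and the path to the jar depends on your installation:

```shell
# Sketch for Picard releases that ship a single picard.jar.
# PICARD_JAR is an assumed variable; point it at your own copy of the jar.
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Sort a SAM/BAM file with Picard SortSam.
# $1 = input file, $2 = output file, $3 = sort order (coordinate or queryname).
picard_sort() {
    java -jar "$PICARD_JAR" SortSam I="$1" O="$2" SORT_ORDER="$3"
}

# Usage:
#   picard_sort SRR797242_bwa_bwasw.bam SRR797242_bwa_bwasw_name_sorted_picard.bam queryname
```

SORT_ORDER is the full name of the option abbreviated as SO in the commands above.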

If you’re not sure whether a SAM/BAM file is sorted by read name, by coordinate, or at all, check the SO field on the @HD line of the file’s header. For example, the header of a coordinate-sorted BAM file looks like this:

$ samtools view -H SRR797242_bwa_bwasw_coord_sorted_picard.bam
@HD     VN:1.4  SO:coordinate
@SQ     SN:gi|49175990|ref|NC_000913.2| LN:4639675
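
If you want to run this check from a script, the SO value can be pulled out of the header text. Here is a minimal sketch, assuming the header fields are tab-separated (as they are in real SAM headers); sam_sort_order is a hypothetical helper name:

```shell
# Print the SO (sort order) value from a SAM header read on stdin.
# Prints "unknown" when there is no @HD line or it carries no SO tag.
sam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            # Scan the tags on the @HD line for one starting with "SO:".
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Usage:
#   samtools view -H SRR797242_bwa_bwasw_coord_sorted_picard.bam | sam_sort_order
```

Keep in mind that this only reports what the header claims; it does not verify that the records are actually in that order.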