Workflow Wednesdays – Part 3. Read preprocessing – Read quality control 2. – QC results

Text output

Both FastQC and PRINSEQ can generate some basic statistics for each fastq file. FastQC reports the following: “Total Sequences” (i.e. the number of reads), “Filtered Sequences”, “Sequence Length” (the read length range) and “%GC” (the average GC content). In addition, it generates tables of “Overrepresented sequences” and “Kmer content”. PRINSEQ calculates the following measures:

Continue reading
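To make the FastQC basic statistics listed above more concrete, here is a minimal Python sketch that computes rough equivalents of “Total Sequences”, “Sequence Length” and “%GC” from an uncompressed fastq file. It assumes the common four-line record layout (header, sequence, “+” line, qualities) with no line wrapping. The function names read_fastq and basic_stats are hypothetical and are not part of FastQC or PRINSEQ. FastQC's own counting details (for example how it treats N bases or rounds %GC) may differ, so treat the output as an approximation.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record fastq stream.

    Assumption: records are not wrapped over multiple lines.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed fastq record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def basic_stats(handle: TextIO) -> dict:
    """Rough, hypothetical analogue of FastQC's basic statistics."""
    n_reads = 0
    min_len, max_len = None, 0
    gc_bases = 0
    total_bases = 0
    for _, seq, _ in read_fastq(handle):
        n_reads += 1
        length = len(seq)
        min_len = length if min_len is None else min(min_len, length)
        max_len = max(max_len, length)
        upper = seq.upper()
        gc_bases += upper.count("G") + upper.count("C")
        total_bases += length

    # Report a single length, or a "min-max" range if read lengths vary.
    if n_reads == 0:
        length_range = "0"
    elif min_len == max_len:
        length_range = str(max_len)
    else:
        length_range = f"{min_len}-{max_len}"

    # %GC over all bases in all reads, rounded to an integer here.
    pct_gc = round(100 * gc_bases / total_bases) if total_bases else 0
    return {
        "Total Sequences": n_reads,
        "Sequence Length": length_range,
        "%GC": pct_gc,
    }


example = io.StringIO(
    "@read1\nACGTGC\n+\nIIIIII\n"
    "@read2\nATGCA\n+\nIIIII\n"
)
print(basic_stats(example))
# prints {'Total Sequences': 2, 'Sequence Length': '5-6', '%GC': 55}
```

The two example reads contain 6 G/C bases out of 11 bases in total, which gives 54.5% and rounds to 55.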

Bioinformatics for Beginners – File formats: Part 1. Reference sequences

The most widely used file format for reference sequences is the fasta format. Both nucleotide and protein sequences can be represented in fasta format.

A fasta-formatted file begins with a single-line description, followed by the sequence data. The description line starts with a greater-than (“>”) symbol, and the nucleotide or protein sequence begins on the next line. The sequence can be written on a single line, but it is usually broken into shorter lines of uniform length (typically 60 to 80 characters). Residues are encoded using the IUPAC nucleotide and amino acid codes.
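As an illustration of this layout, the following minimal Python sketch does three things. It reads fasta records and joins the wrapped sequence lines, writes records back out with uniform line lengths, and checks whether a nucleotide sequence uses only IUPAC nucleotide codes. The function names are hypothetical. The 60-character default line width is an assumption, not a requirement of the format. The check deliberately ignores gap characters and the separate IUPAC amino acid alphabet.

```python
import io
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

# IUPAC nucleotide codes (gap characters such as "-" are not included here).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs, joining multi-line sequences."""
    description: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        else:
            if description is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line.strip())
    if description is not None:
        yield description, "".join(chunks)


def write_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format records as fasta, wrapping each sequence into lines of `width`."""
    out: List[str] = []
    for description, seq in records:
        out.append(">" + description)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


def is_iupac_nucleotide(seq: str) -> bool:
    """True if every character is an IUPAC nucleotide code (case-insensitive)."""
    return all(base in IUPAC_NUCLEOTIDES for base in seq.upper())


text = (
    ">seq1 example nucleotide record\nACGTACGTAC\nGTNNRY\n"
    ">seq2 example protein record\nMKTAYIAKQR\n"
)
records = list(read_fasta(io.StringIO(text)))
print(write_fasta(records[:1], width=8), end="")
print(is_iupac_nucleotide(records[0][1]))  # True
```

The example records are made up for illustration. Because read_fasta yields one record at a time, a large reference file can be scanned without holding every sequence in memory at once.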


Omixon on LinkedIn!

We see amazing potential in using social media channels to spread the word about next-generation sequencing, HLA typing and bioinformatics in general. That is why we have been writing this blog and updating our Facebook and Twitter channels, and why we are now opening a new channel on LinkedIn.

We hope to see you there too!
