Extracting regions from a FASTA file
If you are doing targeted sequencing, it’s usually a good idea to use a relatively large reference sequence (e.g. a whole chromosome) to avoid problems caused by mismapping. Still, it can sometimes be very useful to align against a subset of the reference sequence, for example to save computing time or to investigate alignment problems. To get a specific region from a FASTA file, you can use the “getfasta” function of the Bedtools suite. This function needs a BED file containing the coordinates of the required region(s). As this is just a test run, let’s select the first gene from the NCBI record of the reference genome and create a BED file by hand! The first gene is “thrL”, which spans positions 190 to 255 in the record. NCBI positions are 1-based and inclusive, while BED uses a 0-based start and an exclusive end, so the BED line for this gene has start 189 and end 255.
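Writing the BED file by hand is easy to get off by one, so here is a minimal sketch of the coordinate conversion in Python. The helper names (`to_bed_line`, `write_bed`), the file names and the sequence name `reference` are all placeholders made up for illustration; the sequence name in the first BED column has to match the name in the header of your own FASTA file.

```python
def to_bed_line(chrom, start_1based, end_1based, name):
    """Turn a 1-based, inclusive interval into one tab-separated BED line."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # BED start is 0-based, so subtract 1.
    # BED end is exclusive, so it equals the 1-based inclusive end.
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}"


def write_bed(path, lines):
    """Write BED lines to a file, one region per line."""
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")


if __name__ == "__main__":
    # "reference" is a placeholder: replace it with the sequence name
    # from the header line of your FASTA file.
    write_bed("thrL.bed", [to_bed_line("reference", 190, 255, "thrL")])
```

With the BED file in place, the region could then be extracted with something like `bedtools getfasta -fi reference.fasta -bed thrL.bed -fo thrL.fasta`, where the input and output file names are again assumptions.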