Genotyping Engine Race – Measuring Performance

Designing an Experiment

Designing a validation study for any kind of tool can be a great challenge. For cutting-edge technology, the challenge can be even greater. In this blog post I’ll try to cover some of the problematic aspects and methodological challenges that one can run into when validating an NGS-based HLA typing method.

We don’t even have fool’s gold

As with sequencing in general, Sanger sequencing-based methods are still considered the gold standard for HLA genotyping. This is not without reason, but generating high-resolution HLA types with high accuracy using Sanger sequencing alone gets harder and harder as the number of known HLA alleles increases. There are also some general technical difficulties caused simply by the nature of the method. For example, phasing between heterozygous positions can be difficult or even impossible. Due to the length limitations of Sanger sequencing, usually only a subset of the gene regions is covered, even when multiple amplicons are sequenced. This can lead to additional difficulties during allele selection.
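To make the phasing problem concrete, here is a minimal sketch that lists every haplotype pair consistent with a set of unphased heterozygous calls. The function name `possible_phasings` and the toy bases are made up for illustration; this is not part of any typing software. With n heterozygous positions there are 2^(n-1) distinct pairs, so the ambiguity grows quickly.

```python
from itertools import product


def possible_phasings(het_sites):
    """List the distinct haplotype pairs consistent with unphased calls.

    het_sites: list of 2-tuples holding the two bases seen at each
    heterozygous position (hypothetical toy input, at least one site).
    """
    if not het_sites:
        return []
    first, rest = het_sites[0], het_sites[1:]
    phasings = []
    # Fix the orientation of the first site so that mirror-image pairs
    # (A/B versus B/A) are not counted twice.
    for flips in product((False, True), repeat=len(rest)):
        hap_a, hap_b = [first[0]], [first[1]]
        for (x, y), flip in zip(rest, flips):
            hap_a.append(y if flip else x)
            hap_b.append(x if flip else y)
        phasings.append(("".join(hap_a), "".join(hap_b)))
    return phasings


# Three heterozygous positions observed without phase information.
sites = [("A", "G"), ("C", "T"), ("G", "T")]
pairs = possible_phasings(sites)
for hap_a, hap_b in pairs:
    print(hap_a, hap_b)
```

Three unphased positions already give four candidate haplotype pairs, and each pair may point to a different pair of alleles.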

When shopping around for samples or cell lines that could be used for a reference study, an additional problem arises. Even when reference genotyping is available for the so-called reference samples, it is not always the result of modern typing methods (i.e. sequence-based typing, SBT). In some cases, information on the typing method is not even readily available. So, before ordering cell lines or DNA for a few hundred samples, make sure that you know exactly what you’re getting yourself into.

Anything that can go wrong, will go wrong

People make mistakes, samples get mixed up, the DNA doesn’t always end up in the correct tube, and random errors occur. So, to avoid catastrophe (and also results that look good purely by chance), it is always wise to build in redundancy: sequence and genotype at least a subset of your samples twice (or even more times), and make sure to have repeats where needed (e.g. within and between manufacturing lots). It also makes sense to make technical repeats as independent as possible; otherwise you might end up measuring how good your lab tech is.
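One simple way to put the repeats to work is to compare the genotype calls of two runs and flag every sample and locus where they disagree. The sketch below assumes a made-up data layout (a dict mapping sample ID to locus to a pair of allele names); the sample IDs and calls are invented for illustration, and the function name `replicate_concordance` is hypothetical.

```python
def genotype_key(alleles):
    """Order-independent representation of a two-allele genotype."""
    return tuple(sorted(alleles))


def replicate_concordance(run_a, run_b):
    """Compare two typing runs over the samples and loci they share.

    Returns (concordance_rate, discordant), where discordant is a list of
    (sample, locus) pairs whose genotypes differ between the runs.
    """
    compared = 0
    discordant = []
    for sample in sorted(set(run_a) & set(run_b)):
        for locus in sorted(set(run_a[sample]) & set(run_b[sample])):
            compared += 1
            a = genotype_key(run_a[sample][locus])
            b = genotype_key(run_b[sample][locus])
            if a != b:
                discordant.append((sample, locus))
    rate = (compared - len(discordant)) / compared if compared else None
    return rate, discordant


# Invented example calls, not real typing results.
run_1 = {
    "S1": {"A": ("A*01:01", "A*02:01"), "B": ("B*07:02", "B*08:01")},
    "S2": {"A": ("A*03:01", "A*03:01")},
}
run_2 = {
    "S1": {"A": ("A*02:01", "A*01:01"), "B": ("B*07:02", "B*44:02")},
    "S2": {"A": ("A*03:01", "A*03:01")},
}

rate, discordant = replicate_concordance(run_1, run_2)
print(rate, discordant)
```

A discordant call does not say which run is wrong, only that something needs a closer look: a sample swap, a contamination, or a genuine typing error.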

Diversify your bonds

Although, in general, it might seem like a good idea to do validation on random samples, there are some aspects that must be considered when validating HLA data:

1. There is a large number of known (and, as of now, unknown) alleles for most HLA loci. Therefore, covering all alleles in a reasonably sized validation set is not really an option.

2. The frequency of a single allele can vary greatly between populations. So a method might work great in one population but fail miserably in another.

3. The set of common alleles can also vary greatly between populations.

4. Don’t forget the special little snowflakes (e.g. rare alleles, alternatively expressed alleles, homozygous loci).

5. Chicken or egg problem: high resolution typing for all loci would be needed for sample selection, but high resolution typing is not available before going through the whole workflow.

In an ideal world…

  • High-resolution typing is available for a large number of commercially available samples, the typing methods are well documented, and the samples are regularly retyped.
  • These samples originate from diverse populations, from all around the world.
  • All common and uncommon alleles are well represented in the sample pool.
  • High quality genomic DNA is available for all the samples.

In reality…

  • Some reference panels are available.
  • Typing resolution is generally low and information is only available for a small subset of the HLA loci. (Or, the resolution is high, and information for 10+ loci is available, but the sample pool is small.)
  • Reference data is only available for a few selected “unusual” alleles.

So what should we do?

  • Shop around. Maybe something new is available.
  • Ask questions before making a final decision (e.g. about the typing method and the date of typing).
  • Try to cover as many alleles as possible.
  • Take a step back, consider genotypes, haplotypes, populations.
  • When in doubt, do confirmatory retyping.
  • Consider alternative sample sources (e.g. clinical samples).
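To illustrate the "cover as many alleles as possible" advice, here is a minimal sketch of a greedy selection: with a fixed panel size, keep picking the candidate sample that adds the most alleles not yet covered. This is a generic set-cover heuristic and only one possible approach. The function name `greedy_panel`, the sample IDs, and the allele assignments are all invented for illustration, and the input is whatever reference typing happens to exist, so the chicken-or-egg problem still applies.

```python
def greedy_panel(candidates, panel_size):
    """Pick up to panel_size samples that together cover many alleles.

    candidates: dict mapping sample ID to the set of alleles reported for
    that sample in the available reference typing (hypothetical input).
    Returns (chosen_sample_ids, covered_alleles).
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < panel_size:
        # Iterating in sorted order makes ties resolve to the lowest ID,
        # which keeps the selection reproducible.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no remaining sample adds a new allele
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Invented candidate samples, not a real reference panel.
candidates = {
    "S1": {"A*01:01", "A*02:01", "B*07:02"},
    "S2": {"A*02:01", "B*07:02"},
    "S3": {"A*24:02", "B*44:02", "B*07:02"},
    "S4": {"A*01:01"},
}

chosen, covered = greedy_panel(candidates, panel_size=2)
print(chosen, sorted(covered))
```

Counting alleles alone ignores the other points on the list, so in practice one would probably also want to balance populations, haplotypes, and special cases such as homozygous loci.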

– By Krisztina Rigó (to be continued…)