Bioinformatics for Beginners – Finding known somatic mutations

I wrote about mutation databases a few weeks back (you can read my post here), but there is a big group of variants I didn’t mention in that post: somatic mutations (if you’re not familiar with the main classes of mutations, check out this page). Briefly, somatic mutations can occur in any cell of the body except germ cells, so they are not passed on to children. Specific kinds of somatic mutations can cause cancer or other diseases.

There are several ongoing projects that aim to identify and collect somatic mutations. Here’s a collection of databases and other useful links. Feel free to share additional resources in the comments!

COSMIC (Catalogue Of Somatic Mutations In Cancer)

  • One of the most comprehensive collections of somatic mutations.
  • Contains information about mutations, genes and samples.
  • Data can be accessed via an online search engine or downloaded from the Sanger Institute FTP site.

ICGC (International Cancer Genome Consortium)

The Cancer Genome Atlas

  • Collection of biological specimens from cancer patients.
  • Specimens are analysed with different methods.
  • Open access and controlled access datasets are available.

IntOGen (Integrative Onco Genomics)

  • Has mutation information about different types of cancers.
  • Also has info about driver mutations (see paper here).
  • Offers mutation analysis tools.

cBioPortal for Cancer Genomics

  • Built on large-scale cancer genomics data sets.
  • Data from more than 10 000 cancer samples.
  • Data is available via a web API or R/MATLAB packages.
  • Different types of information are available (including variants for some of the datasets).
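
If you want to try the web API from a script, here is a minimal Python sketch. It assumes the public instance at www.cbioportal.org and a REST endpoint at /api/studies that returns a JSON list whose entries have "studyId" and "name" fields. Both are my assumptions and may differ from the service as it was when this post was written. The helper names and the sample records are made up for illustration.

```python
import json
import urllib.request

# Assumption: base URL of the public cBioPortal REST API.
BASE_URL = "https://www.cbioportal.org/api"


def fetch_studies(base_url=BASE_URL):
    """Download the list of studies as parsed JSON (needs network access)."""
    request = urllib.request.Request(
        base_url + "/studies", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_studies(studies, keyword):
    """Return (studyId, name) pairs whose name contains the keyword."""
    keyword = keyword.lower()
    return [
        (study.get("studyId"), study.get("name"))
        for study in studies
        if keyword in (study.get("name") or "").lower()
    ]


# Offline demonstration with a made-up response of the assumed shape.
sample = json.loads(
    '[{"studyId": "demo_leukemia", "name": "Demo Leukemia Study"},'
    ' {"studyId": "demo_lung", "name": "Demo Lung Study"}]'
)
print(find_studies(sample, "lung"))
```

With network access, find_studies(fetch_studies(), "lung") would run the same filter on the live study list.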

Leukemia Gene Atlas

  • Mainly contains information about DNA-methylation, gene expression, copy number/genotype.
  • Impressive literature collection.
  • Links to COSMIC.

PCGP (Pediatric Cancer Genome Project)

  • Concentrates on childhood cancer types.
  • Not somatic mutations, but mostly cancer-related germline mutations.
  • Mutation data can be downloaded in BED/CSV format. Raw data is controlled-access.
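
If you haven’t worked with BED files before, here is a rough Python sketch of how one could be read. It assumes a plain tab-separated BED file with the standard first columns (chromosome, start, end, and an optional name). The records below are made up and are not real PCGP data, and read_bed is just a name I picked.

```python
import io


def read_bed(handle):
    """Yield (chrom, start, end, name) tuples from a BED-formatted file."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and the optional header lines BED allows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        yield chrom, start, end, name


# Made-up example records, not real PCGP data.
example = io.StringIO(
    "chr17\t100\t101\tvariant_a\n"
    "chr9\t2500\t2503\tvariant_b\n"
)
for chrom, start, end, name in read_bed(example):
    # BED starts are 0-based and ends are exclusive, so length is end - start.
    print(chrom, start + 1, end, name, end - start)
```

Keep in mind that BED coordinates are 0-based with an exclusive end, which is why the sketch adds 1 to the start when printing a 1-based position.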

SM-EGFR-DB (Somatic Mutations in Epidermal Growth Factor Receptor DataBase)

  • Main focus is non-small cell lung cancer, but has information about other cancer types.
  • Contains information published in peer-reviewed journals.
  • By the looks of it, the database hasn’t been updated since 2008.

SomamiR DB (Somatic mutations impacting microRNA targeting)

  • One of the more specialized databases, concentrating on miRNA-related mutations.
  • Fairly new (see article here).
  • Database can be downloaded in the form of tab-delimited text files.
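
Tab-delimited files like these are easy to explore with a few lines of Python. The sketch below assumes the file has a header row. The column names ("gene", "mirna") and the rows are placeholders I invented, so check the header of the file you actually download.

```python
import csv
import io
from collections import Counter


def count_by_column(handle, column):
    """Count how often each value occurs in one column of a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# Made-up rows with hypothetical column names, not real SomamiR data.
example = io.StringIO(
    "gene\tmirna\n"
    "GENE_A\tmiR-x\n"
    "GENE_A\tmiR-y\n"
    "GENE_B\tmiR-x\n"
)
counts = count_by_column(example, "gene")
print(counts.most_common())
```

For a real download, replace the io.StringIO object with open("your_file.txt", newline="") and pick a column that exists in its header.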

IARC TP53 Database

  • Contains information about mutations found in the TP53 gene.
  • Both somatic and germline mutations.
  • Also has some info about cell lines, mouse models, etc.

BIC (Breast Cancer Information Core)

  • Collection of information about breast cancer.
  • Accessing the data requires BIC membership, but membership is open to anyone.

RCGDB (Roche Cancer Genome Database)

  • Looks like a fairly comprehensive somatic mutation collection, but hasn’t been updated since 2010.
  • Contains information about cancer-related genes, mutations, etc.

Progenetix

  • Concentrates on genomic copy number aberrations in cancer.

Oncoreveal

  • Expression data for cancer-associated genes.