1 Mutational signatures
“Underlying cancer hallmarks are genome instability, which generates the genetic diversity that expedites their acquisition, and inflammation, which fosters multiple hallmark functions” (Hanahan 2011). Cancer genomes typically harbor more than 1,000 somatic mutations, at both small scale (e.g., point mutations, short insertions and deletions) and large scale (e.g., copy number variations, rearrangements). These mutations accumulate in particular DNA contexts in response to both endogenous processes and exogenous exposures (Alexandrov et al. 2013). In recent years, computational approaches including non-negative matrix factorization (NMF) have been applied to the mutation catalogs of human and mouse tumors to detect characteristic DNA mutational patterns (Alexandrov et al. 2013, 2020; Wang et al. 2021), also known as “mutational signatures”.
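To make the NMF idea concrete, the sketch below factorizes a toy mutation catalog into a signature matrix and an exposure matrix using the classic multiplicative update rules. It is only an illustration, not the procedure used in the cited papers: the matrix orientation (rows are mutation classes, columns are samples), the toy counts, and the helper names `matmul`, `transpose`, `frobenius_error` and `nmf` are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (classes x samples) as W (classes x k) times H (k x samples).

    Uses multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_rows)]
    return w, h


# Toy catalog (hypothetical counts): 4 mutation classes x 3 samples, built from 2 patterns.
catalog = [[10, 0, 5], [0, 8, 4], [10, 8, 9], [20, 0, 10]]
signatures, exposures = nmf(catalog, k=2)
print("reconstruction error:", frobenius_error(catalog, signatures, exposures))
```

In this orientation the columns of `signatures` play the role of mutational signatures and the rows of `exposures` give each signature's contribution to each sample. Real analyses also have to choose the number of signatures and usually repeat the factorization from many random starts, which this sketch does not attempt.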
1.1 Biological significance of mutational signature
To better illustrate the biological significance of mutational signatures, this section presents figures summarizing the main signature types.
1.1.1 COSMIC signatures
SBS (single base substitution, also called SNV) signatures are the best-known and most thoroughly studied type of mutational signature. They are related to single-strand changes, typically caused by defective DNA repair. Common etiologies include aging, defective DNA mismatch repair, smoking, ultraviolet light exposure and the activity of APOBEC family members.
Currently, all SBS signatures are summarized in the COSMIC database, which has two versions: v2 and v3.
Recently, Alexandrov et al. (2020) extended the concept of mutational signatures to three types of alteration: SBS, DBS (doublet base substitution) and INDEL (short insertion and deletion). All commonly reported signatures are recorded in COSMIC (https://cancer.sanger.ac.uk/cosmic/signatures/), so they are usually called COSMIC signatures.
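As an illustration of how single base substitutions are turned into a catalog, the sketch below assigns each mutation to one of the 96 classes of the widely used SBS convention: the substitution is reported from the pyrimidine (C or T) strand, together with its 5' and 3' neighbouring bases. The function names `sbs96_label` and `count_catalog`, and the example mutations, are hypothetical and chosen only for this example.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_label(context, alt):
    """Label a substitution such as 'A[C>T]G'.

    context: three reference bases centered on the mutated position.
    alt: the mutated (alternate) base at the central position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in COMPLEMENT for b in context + alt):
        raise ValueError("expected a 3-base context and a single A/C/G/T alt base")
    if context[1] == alt:
        raise ValueError("alt base must differ from the reference base")
    # If the reference base is a purine, switch to the opposite strand so that
    # the substitution is always described from the pyrimidine base.
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_catalog(mutations):
    """Count mutations per class for one sample; mutations are (context, alt) pairs."""
    return Counter(sbs96_label(context, alt) for context, alt in mutations)


# Hypothetical mutations from one sample.
sample_counts = count_catalog([("ACG", "T"), ("TGA", "A"), ("CGT", "A")])
print(sample_counts)
```

Stacking such per-sample counts over all 96 classes gives the kind of catalog matrix that signature extraction starts from.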
SBS signatures:
DBS signatures:
INDEL (i.e. ID) signatures:
1.1.2 Copy number signatures
Unlike the mutation types currently represented in the COSMIC database, copy number alterations are hard to encode as features, which makes it difficult to generate the matrix for NMF decomposition.
Macintyre et al. (2018) created a new method to generate the matrix for extracting signatures with the NMF algorithm. The steps are:
- derive 6 copy number features from absolute copy number profiles
- apply mixture modeling to break down each feature distribution into mixtures of Gaussian or mixtures of Poisson distributions
- generate a sample-by-component matrix representing the sum of posterior probabilities of each copy-number event being assigned to each component.
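The last step above can be sketched for a single Gaussian-modeled feature as follows. This is a minimal illustration rather than the published implementation: the mixture parameters are hypothetical values standing in for a fitted model, the feature values are invented, and Poisson-modeled features are not covered.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posteriors(x, components):
    """Posterior probability that value x belongs to each mixture component.

    components: list of (weight, mean, sd) tuples from an already fitted mixture.
    """
    weighted = [w * gaussian_pdf(x, mean, sd) for w, mean, sd in components]
    total = sum(weighted)
    return [d / total for d in weighted]


def sample_by_component(events_by_sample, components):
    """Sum the posteriors of every event in each sample, one column per component."""
    matrix = {}
    for sample, values in events_by_sample.items():
        row = [0.0] * len(components)
        for x in values:
            for j, p in enumerate(posteriors(x, components)):
                row[j] += p
        matrix[sample] = row
    return matrix


# Hypothetical fitted mixture for one feature (e.g. log10 segment size).
components = [(0.6, 5.0, 0.5), (0.4, 7.0, 0.5)]
# Hypothetical per-sample feature values, one value per copy-number event.
events = {"tumor_A": [5.1, 6.8, 7.2], "tumor_B": [4.9, 5.3]}
matrix = sample_by_component(events, components)
print(matrix)
```

Because the posteriors of each event sum to one, every row of the matrix sums to the number of events in that sample, so each row can be read as a soft count of events per component.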
Building on this work, we devised a new method that discards the statistical modeling and instead creates a fixed number of predefined components from 8 copy number features to generate the input matrix for NMF (Wang et al. 2021). This makes it easier to reproduce results, apply the method to different cancer types and compare results. To test whether the method works, we applied it to prostate cancer and successfully identified 5 copy number signatures (Wang et al. 2021).
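The idea of fixed, predefined components can be illustrated with simple binning: each feature value falls into exactly one predefined interval, and the per-sample counts of those intervals form one row of the NMF input matrix. The bin boundaries and labels below are hypothetical and chosen only for illustration; they should not be read as the components defined in Wang et al. (2021).

```python
from bisect import bisect_left

# Hypothetical upper bounds (inclusive) for one feature, e.g. segment copy number.
UPPER_BOUNDS = [0, 1, 2, 3, 4, 8]
# One label per bin, plus a final open-ended bin for values above the last bound.
LABELS = ["0", "1", "2", "3", "4", "5-8", ">8"]


def component_index(value):
    """Index of the predefined component that a feature value falls into."""
    return bisect_left(UPPER_BOUNDS, value)


def count_components(values):
    """Count how many feature values of one sample fall into each component."""
    counts = [0] * len(LABELS)
    for value in values:
        counts[component_index(value)] += 1
    return counts


# Hypothetical segment copy numbers for one sample.
row = count_components([2, 2, 1, 9, 6, 0])
print(dict(zip(LABELS, row)))
```

Since the bins do not depend on the data, the same columns are produced for every cohort, which is what makes results directly comparable across studies.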
Currently, few studies focus on copy number signatures, and there is no reference signature database for matching signatures and explaining their etiologies (the signatures presented in the two papers above can be used as references). If you study them, you should do extra work to explore and validate them. Furthermore, the input absolute copy number data may be generated by different methods and platforms, so it is normal for the contribution of some copy number feature components to vary slightly, resulting in relatively lower signature similarity when comparing different cohorts or different copy number profile generation methods.
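Signature similarity can be quantified in several ways. The sketch below assumes cosine similarity, a commonly used measure for comparing signature profiles, and matches a query signature against a small set of reference signatures. The signature vectors and the names `cosine_similarity` and `best_match` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two signature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("signatures must be defined over the same components")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero signature")
    return dot / (norm_a * norm_b)


def best_match(query, references):
    """Return (name, similarity) of the reference signature closest to the query."""
    return max(
        ((name, cosine_similarity(query, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical signatures over four components.
references = {
    "ref_sig_1": [0.7, 0.2, 0.1, 0.0],
    "ref_sig_2": [0.0, 0.1, 0.3, 0.6],
}
query = [0.6, 0.3, 0.1, 0.0]
name, similarity = best_match(query, references)
print(name, similarity)
```

Given the platform and method differences described above, a moderately lower similarity between cohorts does not necessarily mean that two signatures reflect different processes.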
Copy number signatures (Wang et al. approach):
In addition, the Alexandrov team adopted a similar approach to generate signatures in the style of the current COSMIC signatures and described them in ~10,000 human tumors (Steele et al. 2021). Many interesting results have been reported, and this would be a standard approach for allele-specific copy number profiles.
Copy number signatures (Steele et al. approach):
1.1.3 Genome rearrangement signatures
Genome rearrangement signatures (RS) also use a COSMIC-signature-like approach, generating rearrangement classes for each tumor sample. RS are limited to whole genome sequencing (WGS) data and are also less studied. RS have been successfully applied to WGS data from 560 breast tumors (Nik-Zainal et al. 2016) and linked to clinical outcomes in high-grade serous ovarian cancers (Hillman et al. 2018).
1.1.4 More information
For more information about mutational signatures, see this wiki page and the COSMIC signature data page.