The cancer genome is shaped by various mutational processes over its lifetime, stemming from exogenous and cell-intrinsic DNA damage and from error-prone DNA replication. These processes leave behind characteristic mutational spectra, termed mutational signatures. This package, sigminer, helps users extract, analyze and visualize signatures from genome alteration records, thus providing new insight into cancer research.
For a pipeline tool, please see its co-evolutionary CLI, sigflow.
[Figures: copy number signatures; INDEL (i.e. ID) signatures; genome rearrangement signatures]
Sigminer provides many approaches to extract mutational signatures. To test their performance, I use 4 mutation catalog datasets from reference #6. Each dataset is composed of 30 samples, and 10 COSMIC v2 (SBS) signatures are randomly assigned to each sample with random signature exposures. The following table shows, for each approach and setting, how many signatures are recovered in each dataset and the corresponding average cosine similarity to the COSMIC reference signatures.
| Approach | Selection way | Setting | Caller | Recommend | Driver | Set1 | Set2 | Set3 | Set4 | Success / Mean | Run time | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard NMF | Manual | Default. 50 runs (estimation) + 100 runs (extraction) | | YES ⭐⭐⭐ | R | 10 (0.884) | 10 (0.944) | 9 or 10 (0.998) | 10 (0.994) | ~90%/0.955 | ~1min (8 cores) | This is a basic method, suitable for good mutation data with enough mutations. |
| SigProfiler | Manual/Automatic | Default. 100 runs | | YES ⭐⭐⭐⭐ | Python/Anaconda | 10 (0.961) | 10 (0.999) | 10 (0.990) | 10 (0.997) | 100%/0.987 | ~1h (8 cores) | A gold-standard-like approach in this field, but its longer run time and its requirement for a Python environment and extra large packages reduce its popularity here. |
| Best Practice | Manual/Automatic | Use bootstrapped catalog (1000 runs) | | YES ⭐⭐⭐⭐⭐ | R | 10 (0.973) | 10 (0.990) | 10 (0.992) | 10 (0.971) | 100%/0.981 | ~10min (8 cores) | My R implementation of the methods from references #5 and #6. Should be the best option here. (Pay attention to the suggested solution.) |
| Best Practice | Manual/Automatic | Use original catalog (1000 runs) | | NO ⭐ | R | 10 (0.987) | 9 (0.985) | 10 (0.997) | 9 (0.987) | 50%/0.989 | ~10min (8 cores) | This is included for comparison with the bootstrapped-catalog approach above and with standard NMF. |
| Bayesian NMF | Automatic | L1KL+optimal (20 runs) | | YES ⭐⭐⭐ | R | 10 (0.994) | 9 (0.997) | 9 (0.998) | 9 (0.999) | 25%/0.997 | ~10min (8 cores) | The Bayesian NMF approach automatically reduces the signature number to a proper value from an initial signature number (here, 20). |
| Bayesian NMF | Automatic | L1KL+stable (20 runs) | | YES ⭐⭐⭐⭐ | R | 10 (0.994) | 9 (0.997) | 10 (0.988) | 9 (0.999) | 50%/0.995 | ~10min (8 cores) | See above. |
| Bayesian NMF | Automatic | L2KL+optimal (20 runs) | | NO ⭐ | R | 12 (0.990) | 13 (0.988) | 12 (0.902) | 12 (0.994) | 0%/0.969 | ~10min (8 cores) | See above. |
| Bayesian NMF | Automatic | L2KL+stable (20 runs) | | NO ⭐ | R | 12 (0.990) | 12 (0.988) | 12 (0.902) | 12 (0.994) | 0%/0.969 | ~10min (8 cores) | See above. |
| Bayesian NMF | Automatic | L1WL2H+optimal (20 runs) | | YES ⭐⭐⭐ | R | 9 (0.989) | 9 (0.999) | 9 (0.996) | 9 (1.000) | 0%/0.996 | ~10min (8 cores) | See above. |
| Bayesian NMF | Automatic | L1WL2H+stable (20 runs) | | YES ⭐⭐⭐⭐ | R | 9 (0.989) | 9 (0.999) | 9 (0.996) | 9 (1.000) | 0%/0.996 | ~10min (8 cores) | See above. |
NOTE: although the Bayesian NMF approach with the L1KL or L1WL2H prior cannot recover all 10 signatures here, it gets close to the true answer automatically, starting from an initial signature number of 20, and the resulting signatures are highly similar to the reference signatures. This is also a reminder that this method cannot be relied on to find signatures with small contributions in tumors.
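The similarity values in the table are cosine similarities between extracted and reference signatures. As a rough illustration of that metric only, here is a minimal base R sketch. It does not use sigminer, the toy matrices and all names in it (`cosine_similarity`, `reference`, `extracted`) are hypothetical, and the best-match step is an assumed way of pairing signatures rather than the procedure used to produce the table.

```r
# Cosine similarity between two signature profiles (numeric vectors).
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical toy data: 3 reference signatures over 6 mutation channels,
# one signature per column, each column summing to 1.
reference <- cbind(
  Ref1 = c(0.50, 0.30, 0.10, 0.05, 0.03, 0.02),
  Ref2 = c(0.02, 0.03, 0.05, 0.10, 0.30, 0.50),
  Ref3 = c(0.10, 0.10, 0.30, 0.30, 0.10, 0.10)
)

# Hypothetical "extracted" signatures: slightly perturbed references
# returned in a different order, as an extraction method might do.
extracted <- cbind(
  Sig1 = c(0.12, 0.09, 0.29, 0.31, 0.09, 0.10),
  Sig2 = c(0.48, 0.31, 0.11, 0.05, 0.03, 0.02),
  Sig3 = c(0.03, 0.03, 0.04, 0.11, 0.29, 0.50)
)

# Similarity matrix: rows are extracted signatures, columns are references.
sim <- sapply(colnames(reference), function(r) {
  sapply(colnames(extracted), function(e) {
    cosine_similarity(extracted[, e], reference[, r])
  })
})

# Assumed pairing rule: each extracted signature takes its most similar reference.
best_match <- colnames(sim)[apply(sim, 1, which.max)]
best_score <- apply(sim, 1, max)

# Average similarity over all extracted signatures.
mean_similarity <- mean(best_score)

print(data.frame(extracted = rownames(sim), best_match, best_score))
print(mean_similarity)
```

A value near 1 means the extracted profile has almost the same shape as the reference, regardless of scale, which is why an approach can report a high mean similarity even when it returns fewer or more than 10 signatures.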
You can install the stable release of sigminer from CRAN with:

    install.packages("sigminer")
You can install the development version of sigminer from GitHub with:

    remotes::install_github("ShixiangWang/sigminer", dependencies = TRUE)
    # For Chinese users, run
    remotes::install_git("https://gitee.com/ShixiangWang/sigminer", dependencies = TRUE)
You can also install sigminer from the conda bioconda channel with:

    # Please note the version number of the bioconda release
    # You can first create a separate environment with
    # conda create -n sigminer
    # conda activate sigminer
    conda install -c bioconda -c conda-forge r-sigminer
Complete documentation for sigminer can be read online at https://shixiangwang.github.io/sigminer-doc/ (Chinese users can also read it at https://shixiangwang.gitee.io/sigminer-doc/). All functions are organized and documented at https://shixiangwang.github.io/sigminer/reference/index.html (Chinese users can also read it at https://shixiangwang.gitee.io/sigminer/reference/index.html). For usage of a specific function fun, run ?fun in your R console to see its documentation.
If you use sigminer in academic work, please cite one of the following papers.
Wang, Shixiang, et al. “Copy number signature analyses in prostate cancer reveal distinct etiologies and clinical outcomes” medRxiv (2020) https://www.medrxiv.org/content/10.1101/2020.04.27.20082404v1
Shixiang Wang, Ziyu Tao, Tao Wu, Xue-Song Liu, Sigflow: An Automated And Comprehensive Pipeline For Cancer Genome Mutational Signature Analysis, Bioinformatics, btaa895, https://doi.org/10.1093/bioinformatics/btaa895
Please properly cite the following references when you use any of the corresponding features. The references are also listed in the function documentation. Many thanks to these works; sigminer could not have been created without these giants.
The software is made available for non-commercial research purposes only under the MIT License. However, notwithstanding any provision of the MIT License, the software currently may not be used for commercial purposes without explicit written permission after contacting Shixiang Wang firstname.lastname@example.org or Xue-Song Liu email@example.com.
MIT © 2019-Present Shixiang Wang, Xue-Song Liu
MIT © 2018 Anand Mayakonda
Cancer Biology Group @ShanghaiTech
Research group led by Xue-Song Liu at ShanghaiTech University