Chapter 2 COSMIC Signature Identification
In this chapter, we will introduce how to identify COSMIC signatures from records of variant calling data. The COSMIC signatures include three type of signatures: SBS, DBS and ID (short for INDEL).
The signature identification procedure has been divided into 3 steps:
- Read mutation data.
- Tally components: for SBS, it means classifying SBS records into 96 components (the most common case) and generate sample matrix.
- Extract signatures: estimate signature number and identify signatures.
2.1 Read Data
Make sure
library(sigminer)
before running the following code.
The input data should be in VCF, MAF format.
- For VCF, it can only be VCF file paths.
- For MAF, it can be either a MAF file or a
data.frame
.
MAF format is the standard way to represent small-scale variants in Sigminer. There is a popular R/Bioconductor package maftools (Mayakonda et al. 2018) for analyzing MAF data. It provides an R class MAF to represent MAF format data.
2.1.1 VCF as input
If you use VCF files as input, you can use read_vcf()
to read multiple VCF files as a MAF
object.
<- list.files(system.file("extdata", package = "sigminer"), "*.vcf", full.names = TRUE)
vcfs <- read_vcf(vcfs)
maf #> Reading file(s): /Users/wsx/Library/R/sigminer/extdata/test1.vcf, /Users/wsx/Library/R/sigminer/extdata/test2.vcf, /Users/wsx/Library/R/sigminer/extdata/test3.vcf
#> It seems /Users/wsx/Library/R/sigminer/extdata/test2.vcf has no normal VCF header, try parsing without header.
#> Annotating Variant Type...
#> Downloading https://zenodo.org/record/4480398/files/human_hg19_gene_info.rds to /Users/wsx/Library/R/sigminer/extdata/human_hg19_gene_info.rds
#> Annotating mutations to first matched gene based on database /Users/wsx/Library/R/sigminer/extdata/human_hg19_gene_info.rds...
#> Transforming into a MAF object...
#> -Validating
#> --Non MAF specific values in Variant_Classification column:
#> Unknown
#> -Summarizing
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 0.038s elapsed (0.036s cpu)
<- read_vcf(vcfs, keep_only_pass = FALSE)
maf #> Reading file(s): /Users/wsx/Library/R/sigminer/extdata/test1.vcf, /Users/wsx/Library/R/sigminer/extdata/test2.vcf, /Users/wsx/Library/R/sigminer/extdata/test3.vcf
#> It seems /Users/wsx/Library/R/sigminer/extdata/test2.vcf has no normal VCF header, try parsing without header.
#> Annotating Variant Type...
#> Annotating mutations to first matched gene based on database /Users/wsx/Library/R/sigminer/extdata/human_hg19_gene_info.rds...
#> Transforming into a MAF object...
#> -Validating
#> --Non MAF specific values in Variant_Classification column:
#> Unknown
#> -Summarizing
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 0.030s elapsed (0.029s cpu)
2.1.2 MAF as input
MAF format is the most recommended input, you can provides it either as a file or as a data.frame
.
Typically, you can obtain the data in the following ways:
- You get multiple VCF files and convert them into a MAF file (vcf2maf is the most used tool for conversion).
- You get a MAF file from a reference or a public data portal, e.g., cBioPortal or GDC portal.
- You get a EXCEL file providing MAF-like data from a reference, you should read the data firstly (with
readxl::read_excel()
) and then construct adata.frame
providing necessary columns.
Once a MAF file or a MAF-like data.frame
is ready, you can read/convert it as a MAF
object with read_maf()
. Here TCGA LAML dataset is used as an example:
<- system.file("extdata", "tcga_laml.maf.gz", package = "maftools", mustWork = TRUE)
laml.maf <- read_maf(maf = laml.maf)
laml #> -Reading
#> -Validating
#> -Silent variants: 475
#> -Summarizing
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 0.264s elapsed (0.238s cpu)
laml#> An object of class MAF
#> ID summary Mean Median
#> 1: NCBI_Build 37 NA NA
#> 2: Center genome.wustl.edu NA NA
#> 3: Samples 193 NA NA
#> 4: nGenes 1241 NA NA
#> 5: Frame_Shift_Del 52 0.269 0
#> 6: Frame_Shift_Ins 91 0.472 0
#> 7: In_Frame_Del 10 0.052 0
#> 8: In_Frame_Ins 42 0.218 0
#> 9: Missense_Mutation 1342 6.953 7
#> 10: Nonsense_Mutation 103 0.534 0
#> 11: Splice_Site 92 0.477 0
#> 12: total 1732 8.974 9
The laml
is a MAF
object. The MAF
class is exported from maftools to sigminer. So laml
can be directly use functions provided by maftools.
As a MAF
object, the mutation records are stored in slot data
and maf.silent
.
head(laml@data)
#> Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
#> 1: ABCA10 10349 genome.wustl.edu 37 17 67170917
#> 2: ABCA4 24 genome.wustl.edu 37 1 94490594
#> 3: ABCB11 8647 genome.wustl.edu 37 2 169780250
#> 4: ABCC3 8714 genome.wustl.edu 37 17 48760974
#> 5: ABCF1 23 genome.wustl.edu 37 6 30554429
#> 6: ABCG4 64137 genome.wustl.edu 37 11 119031351
#> End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1
#> 1: 67170917 + Splice_Site SNP T T
#> 2: 94490594 + Missense_Mutation SNP C C
#> 3: 169780250 + Missense_Mutation SNP G G
#> 4: 48760974 + Missense_Mutation SNP C C
#> 5: 30554429 + Missense_Mutation SNP G G
#> 6: 119031351 + Missense_Mutation SNP A A
#> Tumor_Seq_Allele2 Tumor_Sample_Barcode Protein_Change i_TumorVAF_WU i_transcript_name
#> 1: C TCGA-AB-2988 p.K960R 45.66000 NM_080282.3
#> 2: T TCGA-AB-2869 p.R1517H 38.12000 NM_000350.2
#> 3: A TCGA-AB-3009 p.A1283V 46.97218 NM_003742.2
#> 4: T TCGA-AB-2887 p.P1271S 56.41000 NM_003786.1
#> 5: A TCGA-AB-2920 p.G658S 40.95000 NM_001025091.1
#> 6: G TCGA-AB-2934 p.Y567C 32.84000 NM_022169.1
head(laml@maf.silent)
#> Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
#> 1: ABCC11 85320 genome.wustl.edu 37 16 48244997
#> 2: ACAN 176 genome.wustl.edu 37 15 89401084
#> 3: ACAT1 38 genome.wustl.edu 37 11 108009744
#> 4: ACCN2 41 genome.wustl.edu 37 12 50452780
#> 5: ACTA2 59 genome.wustl.edu 37 10 90695109
#> 6: ACTL9 284382 genome.wustl.edu 37 19 8808551
#> End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1
#> 1: 48244997 + Silent SNP G G
#> 2: 89401084 + Silent SNP C C
#> 3: 108009744 + Silent SNP T T
#> 4: 50452780 + Silent SNP C C
#> 5: 90695109 + Silent SNP C C
#> 6: 8808551 + Silent SNP G G
#> Tumor_Seq_Allele2 Tumor_Sample_Barcode Protein_Change i_TumorVAF_WU i_transcript_name
#> 1: A TCGA-AB-2830 p.I490I 34.2700000 NM_032583.3
#> 2: T TCGA-AB-2898 p.S1756S 38.3000000 NM_013227.2
#> 3: G TCGA-AB-2887 p.T185T 49.0400000 NM_000019.3
#> 4: G TCGA-AB-3009 p.L77L 48.1000000 NM_020039.2
#> 5: T TCGA-AB-2973 p.P335P 0.2012072 NM_001613.1
#> 6: A TCGA-AB-2936 p.F167F 46.1500000 NM_178525.3
The data
slot contains non-silent variants, and the maf.silent
slot contains silent variants.
Default uses Variant Classifications with High/Moderate variant consequences as non-silent variants. http://asia.ensembl.org/Help/Glossary?id=535: “Frame_Shift_Del,” “Frame_Shift_Ins,” “Splice_Site,” “Translation_Start_Site,”“Nonsense_Mutation,” “Nonstop_Mutation,” “In_Frame_Del,”“In_Frame_Ins,” “Missense_Mutation” (see ?read_maf
). If you want to change, please set vc_nonSyn
option.
Other slots in MAF
object are summary data either by sample or gene/variant type etc.
slotNames(laml)
#> [1] "data" "variants.per.sample"
#> [3] "variant.type.summary" "variant.classification.summary"
#> [5] "gene.summary" "summary"
#> [7] "maf.silent" "clinical.data"
Acute myeloid leukemia is not a good object to study mutational signatures due to low mutation burden, we will use a subset of TCGA breast cohort as for illustration of the following analyses.
Anand Mayakonda has already stored whole TCGA mutation data as MAF objects in TCGAmutations package. Here I will load the TCGA BRCA cohort and create a sub-cohort with 100 tumors.
library(TCGAmutations)
#> Loading required package: maftools
#> Loading required package: data.table
#> data.table 1.14.0 using 1 threads (see ?getDTthreads). Latest news: r-datatable.com
#> **********
#> This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.
#> This is a Mac. Please read https://mac.r-project.org/openmp/. Please engage with Apple and ask them for support. Check r-datatable.com for updates, and our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation. After several years of many reports of installation problems on Mac, it's time to gingerly point out that there have been no similar problems on Windows or Linux.
#> **********
#>
#> Attaching package: 'data.table'
#> The following object is masked from 'package:sigminer':
#>
#> :=
::kable(tcga_available()) knitr
Study_Abbreviation | Study_Name | MC3 | Firehose |
---|---|---|---|
ACC | Adrenocortical_carcinoma | 92 | 62 [dx.doi.org/10.7908/C1610ZNC] |
BLCA | Bladder_Urothelial_Carcinoma | 411 | 395 [dx.doi.org/10.7908/C1MW2GGF] |
BRCA | Breast_invasive_carcinoma | 1020 | 978 [dx.doi.org/10.7908/C1TB167Z] |
CESC | Cervical_squamous_cell_carcinoma_and_endocervical_adenocarcinoma | 289 | 194 [dx.doi.org/10.7908/C1MG7NV6] |
CHOL | Cholangiocarcinoma | 36 | 35 [dx.doi.org/10.7908/C1K936V8] |
COAD | Colon_adenocarcinoma | 404 | 367 [dx.doi.org/10.7908/C1DF6QJD] |
DLBC | Lymphoid_Neoplasm_Diffuse_Large_B-cell_Lymphoma | 37 | 48 [dx.doi.org/10.7908/C1X066DK] |
ESCA | Esophageal_carcinoma | 184 | 185 [dx.doi.org/10.7908/C1BV7FZC] |
GBM | Glioblastoma_multiforme | 390 | 283 [dx.doi.org/10.7908/C1XG9QGN] |
HNSC | Head_and_Neck_squamous_cell_carcinoma | 507 | 511 [dx.doi.org/10.7908/C18C9VM5] |
KICH | Kidney_Chromophobe | 66 | 66 [dx.doi.org/10.7908/C1765DQK] |
KIRC | Kidney_renal_clear_cell_carcinoma | 369 | 476 [dx.doi.org/10.7908/C10864RM] |
KIRP | Kidney_renal_papillary_cell_carcinoma | 281 | 282 [dx.doi.org/10.7908/C19C6WTF] |
LAML | Acute_Myeloid_Leukemia | 140 | 193 [dx.doi.org/10.7908/C1D21X2X] |
LGG | Brain_Lower_Grade_Glioma | 511 | 516 [dx.doi.org/10.7908/C1MC8ZDF] |
LIHC | Liver_hepatocellular_carcinoma | 363 | 373 [dx.doi.org/10.7908/C128070B] |
LUAD | Lung_adenocarcinoma | 515 | 533 [dx.doi.org/10.7908/C17P8XT3] |
LUSC | Lung_squamous_cell_carcinoma | 485 | 178 [dx.doi.org/10.7908/C1X34WXV] |
MESO | Mesothelioma | 82 | NA |
OV | Ovarian_serous_cystadenocarcinoma | 411 | 466 [dx.doi.org/10.7908/C1736QC5] |
PAAD | Pancreatic_adenocarcinoma | 177 | 126 [dx.doi.org/10.7908/C1513XNS] |
PCPG | Pheochromocytoma_and_Paraganglioma | 179 | 179 [dx.doi.org/10.7908/C13T9GN0] |
PRAD | Prostate_adenocarcinoma | 497 | 498 [dx.doi.org/10.7908/C1Z037MV] |
READ | Rectum_adenocarcinoma | 149 | 122 [dx.doi.org/10.7908/C1S46RDB] |
SARC | Sarcoma | 236 | 247 [dx.doi.org/10.7908/C137785M] |
SKCM | Skin_Cutaneous_Melanoma | 466 | 290 [dx.doi.org/10.7908/C1J67GCG] |
STAD | Stomach_adenocarcinoma | 439 | 393 [dx.doi.org/10.7908/C1C828SM] |
TGCT | Testicular_Germ_Cell_Tumors | 129 | 147 [dx.doi.org/10.7908/C1S1820D] |
THCA | Thyroid_carcinoma | 492 | 496 [dx.doi.org/10.7908/C16W99KN] |
THYM | Thymoma | 123 | 120 [dx.doi.org/10.7908/C15T3JZ6] |
UCEC | Uterine_Corpus_Endometrial_Carcinoma | 530 | 248 [dx.doi.org/10.7908/C1C828T2] |
UCS | Uterine_Carcinosarcoma | 57 | 57 [dx.doi.org/10.7908/C1PC31W8] |
UVM | Uveal_Melanoma | 80 | 80 [dx.doi.org/10.7908/C1S1821V] |
Unknown | Samples_I_couldnt_figure_out_the_source | 77 | NA |
set.seed(1234)
<- tcga_load("BRCA")
brca <- maftools::subsetMaf(brca,
brca tsb = as.character(sample(brca@variants.per.sample$Tumor_Sample_Barcode, 100))
)saveRDS(brca, file = "data/brca.rds")
Here we save this cohort to let user can also run the examples without installing package TCGAmutations.
<- readRDS("data/brca.rds") brca
2.2 Tally Components
2.2.1 The most common 96 components
According to 3-nucleotide context (mutated base, 5’ and 3’ adjacent bases) and base complementary pairing principle, we can divide all SBS mutations into 96 mutation types. We call each mutation type as a component here.
This classification is based the six substitution subtypes: C>A, C>G, C>T, T>A, T>C, and T>G (all substitutions are referred to by the pyrimidine of the mutated Watson—Crick base pair). Further, each of the substitutions is examined by incorporating information on the bases immediately 5’ and 3’ to each mutated base generating 96 possible mutation types (6 types of substitution x 4 types of 5’ base x 4 types of 3’ base).
We tally components in each sample, and generate a sample-by-component matrix.
<- sig_tally(
mt_tally
brca,ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
useSyn = TRUE
)#> ℹ [2021-05-18 23:15:43]: Started.
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:data.table':
#>
#> first, second
#> The following object is masked from 'package:NMF':
#>
#> nrun
#> The following object is masked from 'package:pkgmaker':
#>
#> new2
#> The following object is masked from 'package:base':
#>
#> expand.grid
#>
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:data.table':
#>
#> shift
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
#> ✓ [2021-05-18 23:15:50]: Reference genome loaded.
#> ✓ [2021-05-18 23:15:50]: Variants from MAF object queried.
#> ✓ [2021-05-18 23:15:50]: Chromosome names checked.
#> ✓ [2021-05-18 23:15:50]: Sex chromosomes properly handled.
#> ✓ [2021-05-18 23:15:50]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✓ [2021-05-18 23:15:50]: Variant start and end position checked.
#> ✓ [2021-05-18 23:15:50]: Variant data for matrix generation preprocessed.
#> ℹ [2021-05-18 23:15:50]: SBS matrix generation - start.
#> ℹ [2021-05-18 23:15:50]: Extracting 5' and 3' adjacent bases.
#> ℹ [2021-05-18 23:15:52]: Extracting +/- 20bp around mutated bases for background C>T estimation.
#> ℹ [2021-05-18 23:15:53]: Estimating APOBEC enrichment scores.
#> ℹ [2021-05-18 23:15:53]: Performing one-way Fisher's test for APOBEC enrichment.
#> ✓ [2021-05-18 23:15:53]: APOBEC related mutations are enriched in 28% of samples (APOBEC enrichment score > 2; 28 of 100 samples)
#> ℹ [2021-05-18 23:15:53]: Creating SBS sample-by-component matrices.
#> ✓ [2021-05-18 23:15:53]: SBS-6 matrix created.
#> ✓ [2021-05-18 23:15:53]: SBS-96 matrix created.
#> ✓ [2021-05-18 23:15:53]: SBS-1536 matrix created.
#> ℹ [2021-05-18 23:15:53]: Return SBS-96 as major matrix.
#> ✓ [2021-05-18 23:15:53]: Done.
#> ℹ [2021-05-18 23:15:53]: 10.073 secs elapsed.
Here set
useSyn = TRUE
to include all variant records in MAF object to generate sample matrix.
$nmf_matrix[1:5, 1:5]
mt_tally#> A[T>C]A C[T>C]A G[T>C]A T[T>C]A A[C>T]A
#> TCGA-A1-A0SH-01A-11D-A099-09 0 0 1 1 0
#> TCGA-A2-A04N-01A-11D-A10Y-09 0 0 0 1 2
#> TCGA-A2-A0CP-01A-11W-A050-09 0 0 0 0 0
#> TCGA-A2-A0EP-01A-52D-A22X-09 0 0 1 0 0
#> TCGA-A2-A0EV-01A-11W-A050-09 0 0 1 0 0
We use notion left[ref>mut]right
to mark each component, e.g. C[T>G]A
means a base T with 5’ adjacent base C and 3’ adjacent base A is mutated to base G.
2.2.2 Other Situations
Above we show the most common SBS classifications, there are other situations supported by sigminer, including other classifications for SBS records and other mutation types (DBS and ID). All situations about SBS, DBS and ID signatures are well documented in wiki of SigProfilerMatrixGenerator package.
2.2.2.1 Other SBS classifications
After calling sig_tally()
, the most used matrix is stored in nmf_matrix
, and all matrices generated by sigminer are stored in all_matrices
.
str(mt_tally$all_matrices, max.level = 1)
#> List of 3
#> $ SBS_6 : int [1:100, 1:6] 7 6 5 4 9 7 5 5 0 5 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_96 : int [1:100, 1:96] 0 0 0 0 0 0 1 2 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_1536: int [1:100, 1:1536] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
If you add the strand classification, all matrices can be generated by sigminer will return.
<- sig_tally(
mt_tally2
brca,ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
useSyn = TRUE, add_trans_bias = TRUE
)#> ℹ [2021-05-18 23:15:53]: Started.
#> ✓ [2021-05-18 23:15:53]: Reference genome loaded.
#> ✓ [2021-05-18 23:15:53]: Variants from MAF object queried.
#> ✓ [2021-05-18 23:15:53]: Chromosome names checked.
#> ✓ [2021-05-18 23:15:53]: Sex chromosomes properly handled.
#> ✓ [2021-05-18 23:15:53]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✓ [2021-05-18 23:15:53]: Variant start and end position checked.
#> ✓ [2021-05-18 23:15:53]: Variant data for matrix generation preprocessed.
#> ℹ [2021-05-18 23:15:53]: SBS matrix generation - start.
#> ℹ [2021-05-18 23:15:53]: Extracting 5' and 3' adjacent bases.
#> ℹ [2021-05-18 23:15:54]: Extracting +/- 20bp around mutated bases for background C>T estimation.
#> ℹ [2021-05-18 23:15:55]: Estimating APOBEC enrichment scores.
#> ℹ [2021-05-18 23:15:55]: Performing one-way Fisher's test for APOBEC enrichment.
#> ✓ [2021-05-18 23:15:55]: APOBEC related mutations are enriched in 28% of samples (APOBEC enrichment score > 2; 28 of 100 samples)
#> ℹ [2021-05-18 23:15:55]: Creating SBS sample-by-component matrices.
#> ✓ [2021-05-18 23:15:55]: SBS-6 matrix created.
#> ✓ [2021-05-18 23:15:55]: SBS-96 matrix created.
#> ✓ [2021-05-18 23:15:55]: SBS-1536 matrix created.
#> ✓ [2021-05-18 23:15:55]: SBS-24 (6x4) matrix created.
#> ✓ [2021-05-18 23:15:55]: SBS-384 (96x4) matrix created.
#> ✓ [2021-05-18 23:15:55]: SBS-6144 (1536x4) matrix created.
#> ℹ [2021-05-18 23:15:55]: Return SBS-192 as major matrix.
#> ✓ [2021-05-18 23:15:55]: Done.
#> ℹ [2021-05-18 23:15:55]: 2.189 secs elapsed.
str(mt_tally2$all_matrices, max.level = 1)
#> List of 7
#> $ SBS_6 : int [1:100, 1:6] 7 6 5 4 9 7 5 5 0 5 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_24 : int [1:100, 1:24] 6 3 3 2 6 4 1 2 0 3 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_96 : int [1:100, 1:96] 0 0 0 0 0 0 1 2 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_192 : int [1:100, 1:192] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_384 : int [1:100, 1:384] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_1536: int [1:100, 1:1536] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_6144: int [1:100, 1:6144] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
2.2.2.2 DBS and ID components
If you want to generate DBS or ID matrices, just modify the mode
option.
<- sig_tally(
mt_tally_DBS
brca,ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
useSyn = TRUE,
mode = "DBS",
add_trans_bias = TRUE
)#> ℹ [2021-05-18 23:15:55]: Started.
#> ✓ [2021-05-18 23:15:55]: Reference genome loaded.
#> ✓ [2021-05-18 23:15:55]: Variants from MAF object queried.
#> ✓ [2021-05-18 23:15:55]: Chromosome names checked.
#> ✓ [2021-05-18 23:15:55]: Sex chromosomes properly handled.
#> ✓ [2021-05-18 23:15:55]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✓ [2021-05-18 23:15:55]: Variant start and end position checked.
#> ✓ [2021-05-18 23:15:55]: Variant data for matrix generation preprocessed.
#> ℹ [2021-05-18 23:15:55]: DBS matrix generation - start.
#> ℹ [2021-05-18 23:15:55]: Searching DBS records...
#> ✓ [2021-05-18 23:15:56]: Done.
#> ✓ [2021-05-18 23:15:57]: Reference sequences queried from genome.
#> ✓ [2021-05-18 23:15:57]: DBS-78 matrix created.
#> ✓ [2021-05-18 23:15:57]: DBS-1248 matrix created.
#> ✓ [2021-05-18 23:15:57]: DBS-186 matrix created.
#> ℹ [2021-05-18 23:15:57]: Return DBS-186 as major matrix.
#> ✓ [2021-05-18 23:15:57]: Done.
#> ℹ [2021-05-18 23:15:57]: 1.562 secs elapsed.
str(mt_tally_DBS$all_matrices, max.level = 1)
#> List of 3
#> $ DBS_78 : int [1:100, 1:78] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ DBS_186 : int [1:100, 1:186] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ DBS_1248: int [1:100, 1:1248] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
Program will stop if no records to analyze.
Let’s see ID records.
<- sig_tally(
mt_tally_ID
brca,ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
useSyn = TRUE,
mode = "ID",
add_trans_bias = TRUE
)#> ℹ [2021-05-18 23:15:57]: Started.
#> ✓ [2021-05-18 23:15:57]: Reference genome loaded.
#> ✓ [2021-05-18 23:15:57]: Variants from MAF object queried.
#> ✓ [2021-05-18 23:15:57]: Chromosome names checked.
#> ✓ [2021-05-18 23:15:57]: Sex chromosomes properly handled.
#> ✓ [2021-05-18 23:15:57]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✓ [2021-05-18 23:15:57]: Variant start and end position checked.
#> ✓ [2021-05-18 23:15:57]: Variant data for matrix generation preprocessed.
#> ℹ [2021-05-18 23:15:57]: INDEL matrix generation - start.
#> ✓ [2021-05-18 23:15:58]: Reference sequences queried from genome.
#> ✓ [2021-05-18 23:15:58]: INDEL length extracted.
#> ✓ [2021-05-18 23:15:58]: Adjacent copies counted.
#> ✓ [2021-05-18 23:16:21]: Microhomology size calculated.
#> ✓ [2021-05-18 23:16:21]: INDEL records classified into different components (types).
#> ✓ [2021-05-18 23:16:21]: ID-28 matrix created.
#> ✓ [2021-05-18 23:16:21]: ID-83 matrix created.
#> ✓ [2021-05-18 23:16:21]: ID-415 matrix created.
#> ℹ [2021-05-18 23:16:21]: Return ID-415 as major matrix.
#> ✓ [2021-05-18 23:16:21]: Done.
#> ℹ [2021-05-18 23:16:21]: 24.093 secs elapsed.
str(mt_tally_ID$all_matrices, max.level = 1)
#> List of 3
#> $ ID_28 : int [1:100, 1:28] 0 1 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ ID_83 : int [1:100, 1:83] 0 1 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ ID_415:'data.frame': 100 obs. of 415 variables:
2.2.2.3 Take togother
If you want to get all matrices for SBS, DBS and ID at the same time, you don’t need to write a for loop or type three times to do this.
Just set mode='ALL'
, sigminer will do it for you!
<- sig_tally(
mt_tally_all
brca,ref_genome = "BSgenome.Hsapiens.UCSC.hg19",
useSyn = TRUE,
mode = "ALL",
add_trans_bias = TRUE
)#> ℹ [2021-05-18 23:16:21]: Started.
#> ✓ [2021-05-18 23:16:21]: Reference genome loaded.
#> ✓ [2021-05-18 23:16:21]: Variants from MAF object queried.
#> ✓ [2021-05-18 23:16:21]: Chromosome names checked.
#> ✓ [2021-05-18 23:16:21]: Sex chromosomes properly handled.
#> ✓ [2021-05-18 23:16:21]: Only variants located in standard chromosomes (1:22, X, Y, M/MT) are kept.
#> ✓ [2021-05-18 23:16:21]: Variant start and end position checked.
#> ✓ [2021-05-18 23:16:21]: Variant data for matrix generation preprocessed.
#> ℹ [2021-05-18 23:16:21]: All types of matrices generation - start.
#> ℹ [2021-05-18 23:16:21]: SBS matrix generation - start.
#> ℹ [2021-05-18 23:16:21]: Extracting 5' and 3' adjacent bases.
#> ℹ [2021-05-18 23:16:23]: Extracting +/- 20bp around mutated bases for background C>T estimation.
#> ℹ [2021-05-18 23:16:23]: Estimating APOBEC enrichment scores.
#> ℹ [2021-05-18 23:16:23]: Performing one-way Fisher's test for APOBEC enrichment.
#> ✓ [2021-05-18 23:16:23]: APOBEC related mutations are enriched in 28% of samples (APOBEC enrichment score > 2; 28 of 100 samples)
#> ℹ [2021-05-18 23:16:23]: Creating SBS sample-by-component matrices.
#> ✓ [2021-05-18 23:16:23]: SBS-6 matrix created.
#> ✓ [2021-05-18 23:16:23]: SBS-96 matrix created.
#> ✓ [2021-05-18 23:16:23]: SBS-1536 matrix created.
#> ✓ [2021-05-18 23:16:23]: SBS-24 (6x4) matrix created.
#> ✓ [2021-05-18 23:16:24]: SBS-384 (96x4) matrix created.
#> ✓ [2021-05-18 23:16:24]: SBS-6144 (1536x4) matrix created.
#> ℹ [2021-05-18 23:16:24]: Return SBS-192 as major matrix.
#> ℹ [2021-05-18 23:16:24]: DBS matrix generation - start.
#> ℹ [2021-05-18 23:16:24]: Searching DBS records...
#> ✓ [2021-05-18 23:16:24]: Done.
#> ✓ [2021-05-18 23:16:25]: Reference sequences queried from genome.
#> ✓ [2021-05-18 23:16:25]: DBS-78 matrix created.
#> ✓ [2021-05-18 23:16:25]: DBS-1248 matrix created.
#> ✓ [2021-05-18 23:16:25]: DBS-186 matrix created.
#> ℹ [2021-05-18 23:16:25]: Return DBS-186 as major matrix.
#> ℹ [2021-05-18 23:16:25]: INDEL matrix generation - start.
#> ✓ [2021-05-18 23:16:27]: Reference sequences queried from genome.
#> ✓ [2021-05-18 23:16:27]: INDEL length extracted.
#> ✓ [2021-05-18 23:16:27]: Adjacent copies counted.
#> ✓ [2021-05-18 23:16:50]: Microhomology size calculated.
#> ✓ [2021-05-18 23:16:50]: INDEL records classified into different components (types).
#> ✓ [2021-05-18 23:16:50]: ID-28 matrix created.
#> ✓ [2021-05-18 23:16:50]: ID-83 matrix created.
#> ✓ [2021-05-18 23:16:50]: ID-415 matrix created.
#> ℹ [2021-05-18 23:16:50]: Return ID-415 as major matrix.
#> ℹ [2021-05-18 23:16:50]: All types of matrices generation (APOBEC scores included) - end.
#> ✓ [2021-05-18 23:16:50]: Done.
#> ℹ [2021-05-18 23:16:50]: 28.846 secs elapsed.
str(mt_tally_all, max.level = 1)
#> List of 14
#> $ SBS_6 : int [1:100, 1:6] 7 6 5 4 9 7 5 5 0 5 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_24 : int [1:100, 1:24] 6 3 3 2 6 4 1 2 0 3 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_96 : int [1:100, 1:96] 0 0 0 0 0 0 1 2 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_192 : int [1:100, 1:192] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_384 : int [1:100, 1:384] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_1536 : int [1:100, 1:1536] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ SBS_6144 : int [1:100, 1:6144] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ DBS_78 : int [1:100, 1:78] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ DBS_186 : int [1:100, 1:186] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ DBS_1248 : int [1:100, 1:1248] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ ID_28 : int [1:100, 1:28] 0 1 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ ID_83 : int [1:100, 1:83] 0 1 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ ID_415 :'data.frame': 100 obs. of 415 variables:
#> $ APOBEC_scores:Classes 'data.table' and 'data.frame': 100 obs. of 44 variables:
#> ..- attr(*, ".internal.selfref")=<externalptr>
#> ..- attr(*, "index")= int(0)
#> .. ..- attr(*, "__APOBEC_Enriched")= int [1:100] 17 21 23 24 25 26 27 28 30 32 ...
Please note, in this case, just a list containing matrices will return.
2.3 Extract Signatures
There are two different approaches can be used to implement signature extraction.
- Semi-automatic way. Firstly estimate signature number and then extract signatures. This is the basic usage of NMF package (Gaujoux and Seoighe 2010). Run multiple times of NMF in a range of signature number firstly and compute some measures to let you consider which signature number is suitable and then extract specified number of signatures again with NMF package.
- Automatic way. This includes multiple implementations, and will be described in corresponding part.
2.3.1 Method 1: Estimate Signature Number
To estimate signature number, we firstly run NMF for multiple times and calculate some measures from the results, then select the best signature number based on the measure vs. signature number plot.
This is done by sig_estimate()
function. Let’s try signature number 2-6. For simplicity, we just run NMF twice for each signature number. We use 4 cores to speed up the computation.
library(NMF)
<- sig_estimate(mt_tally$nmf_matrix,
mt_est range = 2:6,
nrun = 10, # increase this value if you wana a more stable estimation
use_random = FALSE, # if TRUE, add results from randomized input
cores = 4,
pConstant = 1e-13,
verbose = TRUE
)
pConstant
option is set to avoid errors raised by NMF package.
We can show signature number survey for different measures by show_sig_number_survey2()
.
## You can also select the measures to show
## by 'what' option
show_sig_number_survey2(mt_est$survey)
> For the details of all the measures above, please read Gaujoux and Seoighe (2010) and vignette of R package NMF.
The measures either provide stability (cophenetic
) or how well can be reconstructed (rss
).
Typically, measure cophenetic is used for determining the signature number. We can easily generate an elbow plot
with function show_sig_number_survey()
.
show_sig_number_survey(mt_est$survey, right_y = NULL)
> The most common approach is to use the cophenetic correlation coefficient. Brunet et al. suggested choosing the smallest value of r for which this coefficient starts decreasing. (Gaujoux and Seoighe 2010)
Cophenetic value (range from 0-1) indicates the robustness of consensus matrix clustering. In this situation, 3 is good. However, we can found that the cophenetic values are all >=0.9 from 2 to 5. So the more suitable way is considering both stability and reconstruction error at the same time, it can be easily done by show_sig_number_survey()
.
show_sig_number_survey(mt_est$survey)
> This function is very flexible, you can pick up any measure to the left/right axis. However, the default setting is the most recommended way.
We can see that we get a minimal RSS in signature number, and when this value goes from 5 to 6, the RSS increase! So we should not choose signature number more than 5 here because 6 is overfitting.
NOTE: There are no gold standard to determine the signature number. Sometimes, you should consider multiple measures. Remember, the most important thing is that you should have a good biological explanation for each signature. The best solution in study may not be the best solution in math.
2.3.2 Method 1: Extract Signatures
After selecting a suitable signature number, now you can extract signatures. In general, use 30~50 NMF runs will get a robust result.
Here we extract 5 signatures.
<- sig_extract(mt_tally$nmf_matrix,
mt_sig n_sig = 5,
nrun = 30,
cores = 4,
pConstant = 1e-13
)
2.3.3 Method 2: Auto-extraction
If you have no idea to select an optimal signature number from procedures above, you can try auto-extraction approaches provided by sigminer.
The latest version of sigminer provides two ways to auto-extract mutation signatures.
- Auto-extract signatures by automatic relevance determination technique in non-negative matrix factorization (Tan and Févotte 2012), the code is implemented by SignatureAnalyzer (Kim et al. 2016) and exported to sigminer. This approach is known as bayesian NMF.
- Auto-extract signatures by SigProfiler, the gold-standard tool used for identifying signatures cataloged in COSMIC database. The technical details please read Alexandrov et al. (2020).
2.3.3.1 Method 2: bayesian NMF
In this approach, you need to set a maximum signature number (default is 25
) and run times to get the result. 10 for nrun
here is okay, and more than 100 is not recommended.
The bayesian NMF will starts from a larger signature number and reduce it to a proper signature number to maximize posterior probability.
<- sig_auto_extract(mt_tally$nmf_matrix,
mt_sig2 K0 = 10, nrun = 10,
strategy = "stable"
)
Here the program uses ‘robust’ strategy to return the result (see strategy
option). It means that if you run 10 times and 6 of them return 4
signatures, then the optimal result with 4
signatures will be returned.
The info of each run can be given as:
::kable(mt_sig2$Raw$summary_run) knitr
Run | K | posterior | file |
---|---|---|---|
3 | 3 | -1497.670 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.3.rds |
9 | 3 | -1497.752 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.9.rds |
2 | 3 | -1498.428 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.2.rds |
10 | 3 | -1498.528 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.10.rds |
6 | 3 | -1498.596 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.6.rds |
1 | 3 | -1498.830 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.1.rds |
7 | 3 | -1498.890 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.7.rds |
4 | 3 | -1499.207 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.4.rds |
8 | 3 | -1524.316 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.8.rds |
5 | 4 | -1617.962 | /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpqXIwLJ/BayesNMF.5.rds |
The mt_sig2
has similar structure as mut_sig
.
2.3.3.2 Method 2: SigProfiler
Sigminer provides two functions sigprofiler_extract()
and sigprofiler_import()
to install, use SigProfiler and import SigProfiler results into R as a Signature
object like other extraction methods mentioned above.
An (not running) example is given below (see ?sigprofiler
for more info).
load(system.file("extdata", "toy_copynumber_tally_M.RData",
package = "sigminer", mustWork = TRUE
))
::conda_list()
reticulate
sigprofiler_extract(cn_tally_M$nmf_matrix, "~/test/test_sigminer",
use_conda = TRUE
)
sigprofiler_extract(cn_tally_M$nmf_matrix, "~/test/test_sigminer",
use_conda = FALSE, py_path = "/Users/wsx/anaconda3/bin/python"
)
2.4 Match Signatures
After extracting signatures, we need to know their etiologies. This can be done by comparing the identified signatures and reference signatures from COSMIC database.
<- get_sig_similarity(mt_sig2)
sim #> -Comparing against COSMIC signatures
#> ------------------------------------
#> --Found Sig1 most similar to COSMIC_3
#> Aetiology: defects in DNA-DSB repair by HR [similarity: 0.82]
#> --Found Sig2 most similar to COSMIC_1
#> Aetiology: spontaneous deamination of 5-methylcytosine [similarity: 0.942]
#> --Found Sig3 most similar to COSMIC_2
#> Aetiology: APOBEC Cytidine Deaminase (C>T) [similarity: 0.84]
#> ------------------------------------
#> Return result invisiblely.
The result object sim
is a list.
str(sim)
#> List of 4
#> $ similarity : num [1:3, 1:30] 0.82 0.264 0.373 0.581 0.942 0.165 0.085 0.256 0.84 0.714 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:3] "Sig1" "Sig2" "Sig3"
#> .. ..$ : chr [1:30] "COSMIC_3" "COSMIC_1" "COSMIC_2" "COSMIC_4" ...
#> $ aetiology_db:List of 1
#> ..$ : chr [1:30] "spontaneous deamination of 5-methylcytosine" "APOBEC Cytidine Deaminase (C>T)" "defects in DNA-DSB repair by HR" "exposure to tobacco (smoking) mutagens" ...
#> $ best_match :List of 3
#> ..$ Sig1:List of 2
#> .. ..$ aetiology : chr "defects in DNA-DSB repair by HR"
#> .. ..$ best_match: chr "Best match: COSMIC_3 [similarity: 0.82]"
#> ..$ Sig2:List of 2
#> .. ..$ aetiology : chr "spontaneous deamination of 5-methylcytosine"
#> .. ..$ best_match: chr "Best match: COSMIC_1 [similarity: 0.942]"
#> ..$ Sig3:List of 2
#> .. ..$ aetiology : chr "APOBEC Cytidine Deaminase (C>T)"
#> .. ..$ best_match: chr "Best match: COSMIC_2 [similarity: 0.84]"
#> $ rss : num [1:3, 1:30] 0.04166 0.00711 0.14698 0.2492 0.24257 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:3] "Sig1" "Sig2" "Sig3"
#> .. ..$ : chr [1:30] "COSMIC_1" "COSMIC_2" "COSMIC_3" "COSMIC_4" ...
#> - attr(*, "class")= chr [1:2] "similarity" "list"
From the result we can see that three signatures are properly matched to COSMIC reference signatures. If you find unknown signatures in your study, you should explore the etiologies by other analyses and even experiments.
The similarity matrix can be plotted.
::pheatmap(sim$similarity) pheatmap
You can also try the COSMIC signature database V3 with:
<- get_sig_similarity(mt_sig2, sig_db = "SBS")
sim_v3 #> -Comparing against COSMIC signatures
#> ------------------------------------
#> --Found Sig1 most similar to SBS3
#> Aetiology: Defective homologous recombination DNA damage repair [similarity: 0.821]
#> --Found Sig2 most similar to SBS1
#> Aetiology: Spontaneous deamination of 5-methylcytosine (clock-like signature) [similarity: 0.876]
#> --Found Sig3 most similar to SBS2
#> Aetiology: Activity of APOBEC family of cytidine deaminases [similarity: 0.748]
#> ------------------------------------
#> Return result invisiblely.
2.5 Operate Signature
The result of sig_extract()
or sig_auto_extract()
is a list
with Signature
class. You can use $
or use operation function to obtain the data stored in it.
To get the signature matrix:
# Or mt_sig2$Signature
sig_signature(mt_sig2)[1:5, ]
#> Sig1 Sig2 Sig3
#> A[T>C]A 0.008982565 0.012815512 0.0027641452
#> C[T>C]A 0.006295256 0.006378091 0.0008926155
#> G[T>C]A 0.013961272 0.002795088 0.0010525932
#> T[T>C]A 0.004574805 0.004748670 0.0012218410
#> A[C>T]A 0.004672896 0.018070168 0.0063343170
To get signature exposure matrix:
# Or mt_sig$Exposure
sig_exposure(mt_sig2)[, 1:5]
#> TCGA-A1-A0SH-01A-11D-A099-09 TCGA-A2-A04N-01A-11D-A10Y-09 TCGA-A2-A0CP-01A-11W-A050-09
#> Sig1 32.15907 15.839732 22.947414
#> Sig2 16.59291 32.478516 22.476482
#> Sig3 87.05947 3.137845 4.608987
#> TCGA-A2-A0EP-01A-52D-A22X-09 TCGA-A2-A0EV-01A-11W-A050-09
#> Sig1 14.4657384 35.25795
#> Sig2 0.4796753 25.49099
#> Sig3 6.9200983 17.57883
get_sig_exposure()
may be more useful, it can be used to return a data.frame
and set an exposure threshold.
get_sig_exposure(mt_sig2)
#> sample Sig1 Sig2 Sig3
#> 1: TCGA-A1-A0SH-01A-11D-A099-09 3.215907e+01 1.659291e+01 8.705947e+01
#> 2: TCGA-A2-A04N-01A-11D-A10Y-09 1.583973e+01 3.247852e+01 3.137845e+00
#> 3: TCGA-A2-A0CP-01A-11W-A050-09 2.294741e+01 2.247648e+01 4.608987e+00
#> 4: TCGA-A2-A0EP-01A-52D-A22X-09 1.446574e+01 4.796753e-01 6.920098e+00
#> 5: TCGA-A2-A0EV-01A-11W-A050-09 3.525795e+01 2.549099e+01 1.757883e+01
#> 6: TCGA-A2-A0SX-01A-12D-A099-09 1.890198e+01 1.190887e+01 9.765504e+00
#> 7: TCGA-A2-A0T7-01A-21D-A099-09 7.058436e+00 2.206062e+01 2.063678e-01
#> 8: TCGA-A2-A0YF-01A-21D-A10G-09 1.308864e+01 2.872085e+01 3.276018e-121
#> 9: TCGA-A2-A25F-01A-11D-A167-09 8.956460e+00 2.075076e-322 0.000000e+00
#> 10: TCGA-A2-A3XW-01A-11D-A23C-09 1.022872e+01 7.998442e+00 6.307634e-01
#> 11: TCGA-A2-A4S1-01A-21D-A25Q-09 7.051307e+01 2.151502e+01 4.647996e+00
#> 12: TCGA-A7-A0D9-01A-31W-A071-09 2.911793e+01 1.460116e+01 1.642104e+01
#> 13: TCGA-A7-A13F-01A-11D-A12Q-09 1.445889e-99 7.200445e+01 1.005387e+01
#> 14: TCGA-A7-A5ZV-01A-11D-A28B-09 1.744084e+02 3.859699e+01 3.691587e+01
#> 15: TCGA-A8-A06P-01A-11W-A019-09 2.224565e+01 1.850442e+01 3.520406e+00
#> 16: TCGA-A8-A076-01A-21W-A019-09 3.334543e+01 4.665212e+01 1.243664e+01
#> 17: TCGA-A8-A07W-01A-11W-A019-09 1.027996e+02 4.510342e+01 2.145630e+01
#> 18: TCGA-A8-A084-01A-21W-A019-09 1.875160e+01 3.524792e+01 2.879974e+01
#> 19: TCGA-A8-A08S-01A-11W-A050-09 1.735008e+01 4.659754e+01 1.427450e-05
#> 20: TCGA-A8-A09G-01A-21W-A019-09 2.720241e-49 1.598340e+01 3.015537e+02
#> 21: TCGA-A8-A0A4-01A-11W-A019-09 1.301530e+01 2.414905e+01 9.673844e-01
#> 22: TCGA-A8-A0AB-01A-11W-A050-09 2.224326e+01 1.070162e+01 4.625000e+00
#> 23: TCGA-AC-A2B8-01A-11D-A17D-09 4.581254e-74 2.077930e+01 1.039934e+02
#> 24: TCGA-AC-A2FO-01A-11D-A17W-09 8.224234e+00 6.022061e+00 1.457732e+01
#> 25: TCGA-AC-A3YI-01A-21D-A23C-09 1.181688e+01 8.021717e+00 4.392543e-188
#> 26: TCGA-AC-A8OS-01A-12D-A41F-09 3.481344e+01 2.558226e+01 1.892168e+01
#> 27: TCGA-AN-A0FK-01A-11W-A050-09 2.057755e-97 5.410512e+01 6.931546e-201
#> 28: TCGA-AN-A0FT-01A-11W-A050-09 2.445782e+01 6.250409e+01 3.692872e+01
#> 29: TCGA-AN-A0XO-01A-11D-A10G-09 1.852324e+01 1.228874e+01 1.174155e+01
#> 30: TCGA-AO-A1KS-01A-11D-A13L-09 4.164234e+00 3.175461e+01 1.956542e+01
#> 31: TCGA-AQ-A54O-01A-11D-A25Q-09 3.055399e-28 2.459937e+01 5.953473e-01
#> 32: TCGA-AQ-A7U7-01A-22D-A351-09 1.358585e+01 1.441648e+01 7.330881e+01
#> 33: TCGA-AR-A0TP-01A-11D-A099-09 5.768672e+01 6.630986e+00 5.803076e+00
#> 34: TCGA-AR-A0U3-01A-11D-A10G-09 1.733324e+01 2.281135e+01 3.013800e-322
#> 35: TCGA-AR-A1AH-01A-11D-A12B-09 9.496609e+01 1.443596e+01 8.395297e+00
#> 36: TCGA-AR-A1AJ-01A-21D-A12Q-09 7.534377e+00 2.369358e+01 2.277121e+01
#> 37: TCGA-AR-A1AN-01A-11D-A12Q-09 6.529774e+00 2.102345e+01 5.820990e+00
#> 38: TCGA-AR-A24N-01A-11D-A167-09 2.004502e+01 2.261256e+01 4.435549e+00
#> 39: TCGA-AR-A252-01A-11D-A167-09 6.130705e+00 1.263644e+01 3.556170e-312
#> 40: TCGA-AR-A2LL-01A-11D-A17W-09 1.503319e+01 2.218907e+01 8.682292e-233
#> 41: TCGA-AR-A2LO-01A-31D-A18P-09 1.413060e+01 2.481548e+00 3.266687e+00
#> 42: TCGA-B6-A0IE-01A-11W-A050-09 2.018322e+01 1.947366e+01 2.614518e+00
#> 43: TCGA-B6-A0IM-01A-11W-A050-09 2.162813e+01 1.865495e+01 3.521674e-57
#> 44: TCGA-B6-A0IP-01A-11D-A045-09 1.442865e+01 3.592925e+01 8.112508e-01
#> 45: TCGA-B6-A0RV-01A-11D-A099-09 1.560069e-177 1.750629e+01 4.905081e+01
#> 46: TCGA-B6-A0WZ-01A-11D-A10G-09 6.958489e+00 6.417496e+00 8.073838e+01
#> 47: TCGA-B6-A0X1-01A-11D-A10G-09 4.814384e+01 2.024595e+01 2.160594e+01
#> 48: TCGA-B6-A1KC-01B-11D-A159-09 1.272953e+01 2.179920e+01 7.508081e-01
#> 49: TCGA-B6-A401-01A-11D-A23C-09 1.081423e+01 1.941480e+01 2.059546e-01
#> 50: TCGA-B6-A40C-01A-11D-A23C-09 1.830003e+01 3.154423e+01 2.647047e+00
#> 51: TCGA-BH-A0AV-01A-31D-A10Y-09 6.587096e+01 2.929251e+00 1.572687e+01
#> 52: TCGA-BH-A0BT-01A-11D-A12Q-09 6.568456e+00 3.775611e+01 2.752085e+00
#> 53: TCGA-BH-A0DL-01A-11D-A10Y-09 2.910845e+01 6.775503e+01 2.920173e+01
#> 54: TCGA-BH-A0DO-01B-11D-A12B-09 1.010194e+01 1.427067e+01 5.289001e+00
#> 55: TCGA-BH-A0DT-01A-21D-A12B-09 3.335864e+00 1.048536e+01 2.024691e+00
#> 56: TCGA-BH-A0GY-01A-11W-A071-09 1.586565e+01 2.609193e+01 7.451332e-171
#> 57: TCGA-BH-A0H6-01A-21W-A071-09 1.075555e+01 1.584751e+01 6.033999e-182
#> 58: TCGA-BH-A18K-01A-11D-A12B-09 2.201843e+01 1.341495e+01 3.649113e+01
#> 59: TCGA-BH-A1FU-01A-11D-A14G-09 6.503440e+01 2.615015e+01 2.435476e+01
#> 60: TCGA-BH-A202-01A-11D-A14K-09 9.499339e+00 1.718580e+01 8.725869e-01
#> 61: TCGA-BH-A5IZ-01A-11D-A27P-09 1.272274e+02 1.895424e+01 3.599703e+01
#> 62: TCGA-BH-A6R8-01A-21D-A33E-09 1.333218e+01 2.247599e+01 5.410627e+00
#> 63: TCGA-BH-A8G0-01A-11D-A351-09 5.329473e-61 2.044391e+01 0.000000e+00
#> 64: TCGA-C8-A131-01A-11D-A10Y-09 2.892229e+01 4.000734e+01 4.478124e+00
#> 65: TCGA-D8-A147-01A-11D-A10Y-09 1.170598e+02 3.708374e+00 7.345449e+01
#> 66: TCGA-D8-A1JG-01B-11D-A13L-09 8.510523e+01 6.270187e+01 2.084379e+01
#> 67: TCGA-D8-A1JH-01A-11D-A188-09 1.123474e+01 1.153185e+01 9.875530e-01
#> 68: TCGA-D8-A1JJ-01A-31D-A14K-09 1.266815e+01 3.477698e+01 5.170826e+01
#> 69: TCGA-D8-A1JT-01A-31D-A13L-09 2.184767e+01 2.034265e+01 0.000000e+00
#> 70: TCGA-D8-A1JU-01A-11D-A13L-09 9.975756e+00 9.782078e+00 6.083460e-02
#> 71: TCGA-D8-A1X7-01A-11D-A14K-09 1.298671e+01 1.517123e+01 3.439192e+00
#> 72: TCGA-D8-A1X8-01A-11D-A14K-09 1.460678e+01 1.557185e+01 2.380506e+00
#> 73: TCGA-D8-A1XL-01A-11D-A14K-09 7.763404e+01 2.055349e+01 7.951866e+00
#> 74: TCGA-D8-A27V-01A-12D-A17D-09 3.443723e-45 3.089542e+01 1.885408e+02
#> 75: TCGA-E2-A108-01A-13D-A10M-09 3.697383e+01 1.205971e+01 1.342740e+02
#> 76: TCGA-E2-A10F-01A-11D-A10M-09 1.114497e-181 2.407471e+01 5.142756e+00
#> 77: TCGA-E2-A14T-01A-11D-A10Y-09 1.818548e+01 1.757560e+01 7.623413e+00
#> 78: TCGA-E2-A152-01A-11D-A12B-09 3.004158e-05 1.904189e+01 1.453822e+02
#> 79: TCGA-E2-A15D-01A-11D-A10Y-09 1.239561e+01 2.367115e+01 1.043255e-01
#> 80: TCGA-E2-A15L-01A-11D-A12B-09 8.316254e-163 3.696156e+01 7.198468e+00
#> 81: TCGA-E2-A1BD-01A-11D-A12Q-09 1.325256e+01 2.837122e+01 2.099082e-01
#> 82: TCGA-E2-A1IH-01A-11D-A188-09 1.809753e+01 2.554536e+01 4.060186e+01
#> 83: TCGA-E2-A1II-01A-11D-A142-09 3.354037e+01 8.480730e-01 3.013800e-322
#> 84: TCGA-E2-A1IJ-01A-11D-A142-09 6.756677e+00 1.299390e+01 2.269978e-182
#> 85: TCGA-E2-A1L6-01A-11D-A13L-09 3.756995e+00 2.424052e+01 2.129975e-01
#> 86: TCGA-E2-A9RU-01A-11D-A41F-09 2.127087e+01 6.202103e+01 2.115210e+01
#> 87: TCGA-E9-A1NE-01A-21D-A14K-09 5.089464e+01 3.340837e+00 6.326526e+00
#> 88: TCGA-E9-A22A-01A-11D-A159-09 1.977296e+01 6.638223e+00 1.027258e+01
#> 89: TCGA-E9-A22E-01A-11D-A159-09 9.379149e+01 8.398077e+00 4.506802e+01
#> 90: TCGA-E9-A3QA-01A-61D-A228-09 3.295112e+01 3.947921e+00 2.814997e+01
#> 91: TCGA-E9-A5FL-01A-11D-A27P-09 9.173188e+01 1.165295e+01 2.949295e+01
#> 92: TCGA-EW-A1PA-01A-11D-A142-09 2.061895e+01 1.721686e+01 4.530208e+00
#> 93: TCGA-EW-A1PH-01A-11D-A14K-09 2.609288e+01 3.002828e+01 8.280088e+00
#> 94: TCGA-GM-A2DB-01A-31D-A19Y-09 8.511084e+01 2.993667e+01 1.374699e+01
#> 95: TCGA-LD-A9QF-01A-32D-A41F-09 1.531207e+01 2.246112e+01 1.427438e+00
#> 96: TCGA-LL-A5YP-01A-21D-A28B-09 7.894136e+01 6.635690e+00 2.079587e+01
#> 97: TCGA-LL-A73Z-01A-11D-A32I-09 2.442986e+01 2.675728e+01 2.533581e+00
#> 98: TCGA-OL-A5RY-01A-21D-A28B-09 7.440440e+00 7.048267e+00 1.414288e+00
#> 99: TCGA-PE-A5DD-01A-12D-A27P-09 5.246539e+00 3.625459e+01 4.294177e+01
#> 100: TCGA-S3-AA17-01A-11D-A41F-09 9.522553e+01 2.687635e+01 2.082856e+01
#> sample Sig1 Sig2 Sig3
For plotting signature profile and exposure profile, please go to chapter 8.
For optimizing signature exposure, please use sig_fit()
.