The recently developed approach for exploring copy number signatures is described in "The repertoire of copy number alteration signatures in human cancer".

For a more general introduction, please read "Extract, Analyze and Visualize Mutational Signatures with Sigminer".

library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#>   method      from
#>   print.bytes Rcpp
#> sigminer version 2.3.2
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.

For this analysis, the input data must contain the following six columns:

  • Chromosome
  • Start.bp
  • End.bp
  • modal_cn (i.e. total copy number, integer)
  • minor_cn (i.e. copy number for minor allele, integer)
  • sample
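
As a rough illustration of this layout, the sketch below builds a tiny table with the six columns and runs a few sanity checks on it. All segment values and sample names are invented, and the helper `check_seg_table()` is hypothetical; it is not part of sigminer.

```r
# Toy allele-specific segment table; every value here is made up.
seg <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start.bp   = c(1L, 5000001L, 1L),
  End.bp     = c(5000000L, 9000000L, 8000000L),
  modal_cn   = c(2L, 3L, 1L), # total copy number
  minor_cn   = c(1L, 1L, 0L), # copy number of the minor allele
  sample     = c("S1", "S1", "S2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: basic consistency checks on the six columns.
check_seg_table <- function(x) {
  required <- c("Chromosome", "Start.bp", "End.bp", "modal_cn", "minor_cn", "sample")
  missing_cols <- setdiff(required, colnames(x))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    all(x$End.bp >= x$Start.bp),   # segments have non-negative length
    all(x$modal_cn >= 0),          # copy numbers are non-negative
    all(x$minor_cn >= 0),
    all(x$minor_cn <= x$modal_cn)  # minor allele cannot exceed the total
  )
  invisible(TRUE)
}

check_seg_table(seg)
```

In the toy example that follows, the first four columns are named chromosome, start, end and segVal instead, which is what the seg_cols argument refers to.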

Generate allele-specific copy number profile

load(system.file("extdata", "toy_segTab.RData",
  package = "sigminer", mustWork = TRUE
))

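# Simulate a minor-allele copy number (0 or 1) for each segment.
# This is for demonstration only; real analyses need real allele-specific calls.
set.seed(1234)
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)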
cn <- read_copynumber(segTabs,
  seg_cols = c("chromosome", "start", "end", "segVal"),
  genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#>  [2024-08-04 14:45:58.716777]: Started.
#>  [2024-08-04 14:45:58.728173]: Genome build  : hg19.
#>  [2024-08-04 14:45:58.729709]: Genome measure: wg.
#>  [2024-08-04 14:45:58.731094]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#>  [2024-08-04 14:45:58.751582]: Chromosome size database for build obtained.
#>  [2024-08-04 14:45:58.753443]: Reading input.
#>  [2024-08-04 14:45:58.75492]: A data frame as input detected.
#>  [2024-08-04 14:45:58.756641]: Column names checked.
#>  [2024-08-04 14:45:58.758524]: Column order set.
#>  [2024-08-04 14:45:58.766277]: Chromosomes unified.
#>  [2024-08-04 14:45:58.78163]: Value 2 (normal copy) filled to uncalled chromosomes.
#>  [2024-08-04 14:45:58.786494]: Data imported.
#>  [2024-08-04 14:45:58.78807]: Segments info:
#>  [2024-08-04 14:45:58.789525]:     Keep - 477
#>  [2024-08-04 14:45:58.790912]:     Drop - 0
#>  [2024-08-04 14:45:58.792772]: Segments sorted.
#>  [2024-08-04 14:45:58.794201]: Adding LOH labels...
#>  [2024-08-04 14:45:58.79635]: Joining adjacent segments with same copy number value. Be patient...
#>  [2024-08-04 14:45:58.916107]: 410 segments left after joining.
#>  [2024-08-04 14:45:58.918333]: Segmental table cleaned.
#>  [2024-08-04 14:45:58.919989]: Annotating.
#>  [2024-08-04 14:45:58.935855]: Annotation done.
#>  [2024-08-04 14:45:58.937619]: Summarizing per sample.
#>  [2024-08-04 14:45:58.958541]: Summarized.
#>  [2024-08-04 14:45:58.960272]: Generating CopyNumber object.
#>  [2024-08-04 14:45:58.962438]: Generated.
#>  [2024-08-04 14:45:58.963997]: Validating object.
#>  [2024-08-04 14:45:58.965596]: Done.
#>  [2024-08-04 14:45:58.967371]: 0.251 secs elapsed.
cn
#> An object of class CopyNumber 
#> =============================
#>                           sample n_of_seg n_of_cnv n_of_amp n_of_del n_of_vchr
#>                           <char>    <int>    <int>    <int>    <int>     <int>
#>  1: TCGA-DF-A2KN-01A-11D-A17U-01       34        6        5        1         4
#>  2: TCGA-19-2621-01B-01D-0911-01       34        8        5        3         5
#>  3: TCGA-B6-A0X5-01A-21D-A107-01       29        8        4        4         2
#>  4: TCGA-A8-A07S-01A-11D-A036-01       39       11        2        9         4
#>  5: TCGA-26-6174-01A-21D-1842-01       44       13        8        5         8
#>  6: TCGA-CV-7432-01A-11D-2128-01       41       16        7        9         9
#>  7: TCGA-06-0644-01A-02D-0310-01       47       19        5       14         8
#>  8: TCGA-A5-A0G2-01A-11D-A042-01       40       21        5       16        10
#>  9: TCGA-99-7458-01A-11D-2035-01       49       26       10       16        13
#> 10: TCGA-05-4417-01A-22D-1854-01       53       37       33        4        17
#>     n_loh cna_burden
#>     <int>      <num>
#>  1:    15      0.000
#>  2:    20      0.095
#>  3:    18      0.083
#>  4:    21      0.106
#>  5:    24      0.113
#>  6:    24      0.188
#>  7:    33      0.158
#>  8:    23      0.375
#>  9:    33      0.304
#> 10:    29      0.617
cn@data
#>      chromosome     start       end segVal                       sample
#>          <char>     <num>     <num>  <int>                       <char>
#>   1:       chr1   3218923 116319008      2 TCGA-05-4417-01A-22D-1854-01
#>   2:       chr1 116324707 120523902      1 TCGA-05-4417-01A-22D-1854-01
#>   3:       chr1 149879545 247812431      4 TCGA-05-4417-01A-22D-1854-01
#>   4:      chr10    423671 135224372      3 TCGA-05-4417-01A-22D-1854-01
#>   5:      chr11    458784  19461653      3 TCGA-05-4417-01A-22D-1854-01
#>  ---                                                                   
#> 406:       chr6   1016984 170898549      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 407:       chr7    746917 158385118      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 408:       chr8    617885 145225107      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 409:       chr9    790234 140938075      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 410:       chrX         1 155270560      2 TCGA-DF-A2KN-01A-11D-A17U-01
#>       minor_cn    loh .loh_frac
#>          <num> <lgcl>     <num>
#>   1: 1.0000000  FALSE        NA
#>   2: 0.0000000   TRUE        NA
#>   3: 0.5000000   TRUE 0.1175943
#>   4: 1.0000000  FALSE        NA
#>   5: 1.0000000  FALSE        NA
#>  ---                           
#> 406: 0.3333333   TRUE 0.9979494
#> 407: 1.0000000  FALSE        NA
#> 408: 1.0000000  FALSE        NA
#> 409: 0.5000000   TRUE 0.8328715
#> 410:        NA  FALSE        NA

Classify the segments with the Steele et al. method

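If you want to try other types of copy number signatures, change the method argument. Note the reminder printed in the log: method "S" expects read_copynumber() to have been called with join_adj_seg = FALSE and add_loh = TRUE.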

tally_s <- sig_tally(cn, method = "S")
#>  [2024-08-04 14:45:59.11103]: Started.
#>  [2024-08-04 14:45:59.115752]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#>  [2024-08-04 14:45:59.135223]: Matrix generated.
#>  [2024-08-04 14:45:59.137442]: 0.026 secs elapsed.

str(tally_s$all_matrices, max.level = 1)
#> List of 2
#>  $ CN_40: int [1:10, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ CN_48: int [1:10, 1:48] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
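
To make the 48 categories of the CN_48 matrix more concrete, here is a simplified sketch of how one segment could be mapped to a label of the form total copy number : heterozygosity state : size class. The function `classify_segment()` is hypothetical and is not sigminer's implementation. The bin edges follow the scheme published by Steele et al. as far as I understand it, and the handling of values that fall exactly on a boundary is an assumption.

```r
# Hypothetical, simplified classifier; not the code used by sig_tally().
classify_segment <- function(total_cn, minor_cn, start, end) {
  len <- end - start + 1

  # Heterozygosity state: homozygous deletion, LOH, or heterozygous.
  het <- ifelse(total_cn == 0, "homdel",
    ifelse(minor_cn == 0, "LOH", "het")
  )

  # Total copy number classes: 0, 1, 2, 3-4, 5-8, 9+.
  cn_class <- as.character(cut(total_cn,
    breaks = c(-Inf, 0, 1, 2, 4, 8, Inf),
    labels = c("0", "1", "2", "3-4", "5-8", "9+")
  ))

  # Segment size classes (boundary handling is an assumption).
  size_class <- as.character(cut(len,
    breaks = c(0, 1e5, 1e6, 1e7, 4e7, Inf),
    labels = c("0-100Kb", "100Kb-1Mb", "1Mb-10Mb", "10Mb-40Mb", ">40Mb")
  ))
  # Homozygous deletions use a single merged class above 1 Mb.
  size_class[het == "homdel" & len > 1e6] <- ">1Mb"

  paste(cn_class, het, size_class, sep = ":")
}

classify_segment(
  total_cn = c(0, 1, 2),
  minor_cn = c(0, 0, 1),
  start    = c(1, 1, 1),
  end      = c(50000, 5e6, 5e7)
)
```

Counting the labels produced for each sample gives one row of a sample-by-category matrix, which is the shape of the CN_48 matrix shown above.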

Find de novo signatures

sig_denovo = sig_auto_extract(tally_s$all_matrices$CN_48)
#> Select Run 3, which K = 2 as best solution.
head(sig_denovo$Signature)
#>                         Sig1          Sig2
#> 0:homdel:0-100Kb    0.000000  0.000000e+00
#> 0:homdel:100Kb-1Mb  0.000000  0.000000e+00
#> 0:homdel:>1Mb       0.000000  0.000000e+00
#> 1:LOH:0-100Kb       3.609460 3.819129e-242
#> 1:LOH:100Kb-1Mb     6.316554 2.814800e-127
#> 1:LOH:1Mb-10Mb     13.535473 2.784288e-190

Refit (19) reference signatures

This directly estimates the contribution of each of the 19 reference signatures in the CNS_TCGA database to every sample.

act_refit = sig_fit(t(tally_s$all_matrices$CN_48), sig_index = "ALL", sig_db = "CNS_TCGA")
#>  [2024-08-04 14:45:59.927964]: Started.
#>  [2024-08-04 14:45:59.929946]: Signature index detected.
#>  [2024-08-04 14:45:59.9315]: Checking signature database in package.
#>  [2024-08-04 14:45:59.934048]: Checking signature index.
#>  [2024-08-04 14:45:59.935546]: Valid index for db 'CNS_TCGA':
#> CN1 CN2 CN3 CN4 CN5 CN6 CN7 CN8 CN9 CN10 CN11 CN12 CN13 CN14 CN15 CN16 CN17 CN18 CN19
#>  [2024-08-04 14:45:59.937105]: Database and index checked.
#>  [2024-08-04 14:45:59.938807]: Signature normalized.
#>  [2024-08-04 14:45:59.940317]: Checking row number for catalog matrix and signature matrix.
#>  [2024-08-04 14:45:59.941766]: Checked.
#>  [2024-08-04 14:45:59.943239]: Checking rownames for catalog matrix and signature matrix.
#>  [2024-08-04 14:45:59.94469]: Checked.
#>  [2024-08-04 14:45:59.946114]: Method 'QP' detected.
#>  [2024-08-04 14:45:59.950082]: Corresponding function generated.
#>  [2024-08-04 14:45:59.95166]: Calling function.
#>  [2024-08-04 14:45:59.953614]: Fitting sample: TCGA-05-4417-01A-22D-1854-01
#>  [2024-08-04 14:45:59.955453]: Fitting sample: TCGA-06-0644-01A-02D-0310-01
#>  [2024-08-04 14:45:59.957003]: Fitting sample: TCGA-19-2621-01B-01D-0911-01
#>  [2024-08-04 14:45:59.958566]: Fitting sample: TCGA-26-6174-01A-21D-1842-01
#>  [2024-08-04 14:45:59.960084]: Fitting sample: TCGA-99-7458-01A-11D-2035-01
#>  [2024-08-04 14:45:59.961614]: Fitting sample: TCGA-A5-A0G2-01A-11D-A042-01
#>  [2024-08-04 14:45:59.963142]: Fitting sample: TCGA-A8-A07S-01A-11D-A036-01
#>  [2024-08-04 14:45:59.964627]: Fitting sample: TCGA-B6-A0X5-01A-21D-A107-01
#>  [2024-08-04 14:45:59.966164]: Fitting sample: TCGA-CV-7432-01A-11D-2128-01
#>  [2024-08-04 14:45:59.96765]: Fitting sample: TCGA-DF-A2KN-01A-11D-A17U-01
#>  [2024-08-04 14:45:59.969163]: Done.
#>  [2024-08-04 14:45:59.970613]: Generating output signature exposures.
#>  [2024-08-04 14:45:59.972791]: Done.
#>  [2024-08-04 14:45:59.974369]: 0.046 secs elapsed.
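
For intuition about what refitting does, the sketch below solves a tiny non-negative least-squares problem: given fixed signatures, find non-negative exposures that best reconstruct one sample catalog. It uses simple multiplicative updates rather than the 'QP' method reported in the log above, so it is a stand-in technique and not what sig_fit() runs. The function `refit_mu()` and the two signatures are invented for illustration.

```r
# Hypothetical helper: non-negative refit of one catalog against fixed
# signatures using multiplicative updates (NOT the QP solver from the log).
refit_mu <- function(catalog, sigs, n_iter = 2000, eps = 1e-12) {
  expo <- rep(sum(catalog) / ncol(sigs), ncol(sigs)) # positive starting point
  num <- as.vector(crossprod(sigs, catalog))         # t(S) %*% v
  gram <- crossprod(sigs)                            # t(S) %*% S
  for (i in seq_len(n_iter)) {
    # Each update keeps exposures non-negative and does not increase
    # the squared reconstruction error.
    expo <- expo * num / (as.vector(gram %*% expo) + eps)
  }
  names(expo) <- colnames(sigs)
  expo
}

# Two invented signatures over four components (each column sums to 1).
sigs <- cbind(
  SigA = c(0.7, 0.2, 0.1, 0.0),
  SigB = c(0.0, 0.1, 0.3, 0.6)
)

# A sample catalog built from known exposures of 30 and 10.
catalog <- as.vector(sigs %*% c(30, 10))

round(refit_mu(catalog, sigs), 2)
```

Because the catalog here is an exact mixture of the two signatures, the estimated exposures should land very close to 30 and 10. Real catalogs are noisy, so the fit is only approximate.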

We can apply a threshold to keep only the signatures that actually contribute.

act_refit2 = act_refit[apply(act_refit, 1, function(x) sum(x) > 0.1),]

rownames(act_refit2)
#>  [1] "CN1"  "CN2"  "CN3"  "CN4"  "CN9"  "CN11" "CN12" "CN13" "CN14" "CN19"
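
The same filtering idea can be tried on a small invented matrix. The sketch below assumes, as in act_refit, that rows are signatures and columns are samples. The second rule, based on relative contribution, is an alternative added here for illustration and does not come from the original analysis; its 5% cutoff is arbitrary.

```r
# Invented signature-by-sample exposure matrix.
expo <- rbind(
  CN1 = c(S1 = 12.0, S2 = 3.00, S3 = 0.0),
  CN2 = c(S1 = 0.0,  S2 = 0.05, S3 = 0.0),
  CN9 = c(S1 = 1.5,  S2 = 0.00, S3 = 7.0)
)

# Same rule as above: keep signatures whose summed exposure exceeds 0.1.
keep_total <- rowSums(expo) > 0.1
expo[keep_total, , drop = FALSE]

# Alternative (an assumption, not from the original): keep signatures that
# account for more than 5% of the exposure in at least one sample.
rel <- sweep(expo, 2, colSums(expo), "/")
keep_rel <- apply(rel, 1, function(x) any(x > 0.05))
rownames(expo)[keep_rel]
```

Using drop = FALSE keeps the result a matrix even when only one signature survives the filter, which matters if the result is passed on to plotting functions.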

Plot signatures

For de novo signatures:

show_sig_profile(sig_denovo, mode = "copynumber", method = "S", style = "cosmic")

Show the activity/exposure.

show_sig_exposure(sig_denovo)

For reference signatures, you can just select what you want:

show_sig_profile(
  get_sig_db("CNS_TCGA")$db[, rownames(act_refit2)],
  style = "cosmic", 
  mode = "copynumber", method = "S", check_sig_names = FALSE)

Similarly for showing activity.

show_sig_exposure(act_refit2)

NOTE: in this example the two approaches give noticeably different results, so choose between them based on the size and quality of your data, and double-check the results. In general, the refitting approach is recommended for small data sets.

Signature assignment

To assign the de novo signatures to reference signatures, we use cosine similarity.

get_sig_similarity(sig_denovo, sig_db = "CNS_TCGA")
#> -Comparing against COSMIC signatures
#> ------------------------------------
#> --Found Sig1 most similar to CN1
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.706]
#> --Found Sig2 most similar to CN2
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.771]
#> ------------------------------------
#> Return result invisiblely.
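
For reference, cosine similarity between two signature profiles is their dot product divided by the product of their lengths. The sketch below computes it for every pair of columns and picks the best reference match for each de novo signature. The helper `cosine_matrix()` and all profiles are invented for illustration; this is not how get_sig_similarity() is implemented internally.

```r
# Hypothetical helper: cosine similarity between every column of x and
# every column of y (rows are the shared components).
cosine_matrix <- function(x, y) {
  xn <- sweep(x, 2, sqrt(colSums(x^2)), "/") # scale columns to unit length
  yn <- sweep(y, 2, sqrt(colSums(y^2)), "/")
  crossprod(xn, yn) # rows: columns of x; columns: columns of y
}

# Invented de novo and reference profiles over four components.
denovo <- cbind(Sig1 = c(5, 1, 0, 0), Sig2 = c(0, 0, 2, 6))
ref <- cbind(
  RefA = c(0.8, 0.2, 0.0, 0.0),
  RefB = c(0.0, 0.0, 0.3, 0.7),
  RefC = c(0.25, 0.25, 0.25, 0.25)
)

sim <- cosine_matrix(denovo, ref)
round(sim, 3)

# Best-matching reference signature for each de novo signature.
best <- colnames(sim)[apply(sim, 1, which.max)]
best
```

Cosine similarity ignores the overall scale of a profile, so signatures do not need to be normalized to sum to 1 before being compared.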