One of the key results of signature analysis is the clustering of samples into different groups. This function takes a Signature object as input and returns the cluster membership of each sample.

get_groups(
  Signature,
  method = c("consensus", "k-means", "exposure", "samples"),
  n_cluster = NULL,
  match_consensus = TRUE
)

Arguments

Signature

a Signature object obtained from either sig_extract or sig_auto_extract. It can also be a relative exposure result in data.table format from sig_fit.

method

grouping method (see Details for more). It can be one of the following:

  • 'consensus' - returns the cluster membership based on the hierarchical clustering of the consensus matrix. It can only be used for results obtained by sig_extract() with multiple runs using the NMF package.

  • 'k-means' - returns the clusters by k-means.

  • 'exposure' - assigns each sample to the group of its dominant signature, i.e. the signature with the largest exposure in that sample.

  • 'samples' - returns the cluster membership based on the contribution of each signature to each sample. It can only be used for results obtained by sig_extract() using the NMF package.

n_cluster

number of clusters, only used when the method is 'k-means'.

match_consensus

only used when the method is 'consensus'. If TRUE, the result will match the order shown in the consensus map.
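The idea behind the 'k-means' method can be sketched in base R. This is only an illustration under stated assumptions, not the code used by get_groups(): the toy relative exposure matrix `rel_expo` and all object names are hypothetical, and `stats::kmeans()` stands in for whatever clustering call the package makes internally. The sketch clusters samples by their relative exposures, then labels each group with the signature that has the largest mean contribution in that group.

```r
# Hypothetical relative exposures: one row per sample, one column per signature.
rel_expo <- matrix(
  c(
    0.90, 0.10,
    0.80, 0.20,
    0.85, 0.15,
    0.20, 0.80,
    0.10, 0.90
  ),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("S", 1:5), c("Sig1", "Sig2"))
)

set.seed(1)
# Cluster the samples; nstart > 1 guards against a poor random start.
km <- stats::kmeans(rel_expo, centers = 2, nstart = 10)

# Mean signature contribution per group (one row per group).
map_table <- km$centers

# Label each group with the signature that has the largest mean contribution.
enrich_sig <- colnames(map_table)[apply(map_table, 1, which.max)]

groups <- data.frame(
  sample = rownames(rel_expo),
  group = km$cluster,
  enrich_sig = enrich_sig[km$cluster]
)
groups
```

The group numbers returned by k-means are arbitrary labels, so only the pairing of group and enriched signature is meaningful.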

Value

a data.table object

Details

Users may find large differences between the results of methods 'samples' and 'exposure', even though both use a similar idea to find the dominant signature. Here is the reason:

Method 'samples' uses data directly from the NMF decomposition, i.e. the two matrices W (basis matrix or signature matrix) and H (coefficient matrix or exposure matrix) as returned by NMF. Method 'exposure' uses the signature exposure loading matrix instead. In this case, each signature represents a number of mutations (alterations). For implementation details, please see the source code of the sig_extract() function.
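The following base R sketch shows how the dominant signature of a sample can change between the raw coefficient matrix and a count-scale exposure matrix. It rests on assumptions that are not taken from the sig_extract() source: the toy matrices `W` and `H` are hypothetical, and the rescaling shown (normalizing each signature to sum to 1 and moving the scale into the exposures) is one common convention that may differ from the package's actual implementation.

```r
# Hypothetical NMF result: V is approximated by W %*% H.
W <- matrix(
  c(10, 20, 30, 40, 1, 2, 3, 4),
  nrow = 4,
  dimnames = list(paste0("component", 1:4), c("Sig1", "Sig2"))
)
H <- matrix(
  c(1, 5, 2, 3, 4, 1),
  nrow = 2,
  dimnames = list(c("Sig1", "Sig2"), c("S1", "S2", "S3"))
)

# Assumed rescaling: each signature sums to 1, and the exposures absorb
# the scale, so they are expressed as numbers of alterations.
scale_factor <- colSums(W)
W_norm <- sweep(W, 2, scale_factor, "/")
exposure <- sweep(H, 1, scale_factor, "*")

# The reconstruction is unchanged by the rescaling.
stopifnot(isTRUE(all.equal(W %*% H, W_norm %*% exposure)))

# Dominant signature per sample, from raw coefficients and from exposures.
dominant_raw <- rownames(H)[apply(H, 2, which.max)]
dominant_exposure <- rownames(exposure)[apply(exposure, 2, which.max)]

data.frame(
  sample = colnames(H),
  by_coefficient = dominant_raw,
  by_exposure = dominant_exposure
)
```

In this toy case, samples S1 and S2 have Sig2 as their largest raw coefficient, yet Sig1 dominates once the exposures are put on the count scale, because Sig1 carries far more weight in W.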

Examples

# \donttest{
# Load a toy copy number tally object (provides cn_tally_W)
load(system.file("extdata", "toy_copynumber_tally_W.RData",
  package = "sigminer", mustWork = TRUE
))
# Extract copy number signatures
library(NMF)
#> Loading required package: registry
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 2/2
#>   To enable shared memory capabilities, try: install.extras('NMF')
sig <- sig_extract(cn_tally_W$nmf_matrix, 2,
  nrun = 10
)
#> NMF algorithm: 'brunet'
#> Multiple runs: 10
#> Mode: sequential [foreach:doParallelMC]
#> 
#> Runs: |==================================================| 100%
#> System time:
#>    user  system elapsed 
#>   4.491   0.000   4.490 

# Methods 'consensus' and 'samples' are from NMF::predict()
g1 <- get_groups(sig, method = "consensus", match_consensus = TRUE)
#>  [2024-08-04 14:38:58.265139]: Started.
#>  [2024-08-04 14:38:58.26683]: 'Signature' object detected.
#>  [2024-08-04 14:38:58.268319]: Obtaining clusters from the hierarchical clustering of the consensus matrix...
#>  [2024-08-04 14:38:58.285673]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>    
#>     Sig1 Sig2
#>   1    0    2
#>   2    8    0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#>  [2024-08-04 14:38:58.30013]: Summarizing...
#> 	group #1: 2 samples with Sig2 enriched.
#> 	group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.302022]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#>  [2024-08-04 14:38:58.303385]: 0.038 secs elapsed.
g1
#>                           sample  group silhouette_width enrich_sig
#>                           <char> <char>            <num>     <char>
#>  1: TCGA-05-4417-01A-22D-1854-01      1            1.000       Sig2
#>  2: TCGA-99-7458-01A-11D-2035-01      1            0.986       Sig2
#>  3: TCGA-CV-7432-01A-11D-2128-01      2            0.986       Sig1
#>  4: TCGA-DF-A2KN-01A-11D-A17U-01      2            0.986       Sig1
#>  5: TCGA-B6-A0X5-01A-21D-A107-01      2            0.986       Sig1
#>  6: TCGA-A8-A07S-01A-11D-A036-01      2            0.986       Sig1
#>  7: TCGA-A5-A0G2-01A-11D-A042-01      2            0.986       Sig1
#>  8: TCGA-26-6174-01A-21D-1842-01      2            0.986       Sig1
#>  9: TCGA-06-0644-01A-02D-0310-01      2            1.000       Sig1
#> 10: TCGA-19-2621-01B-01D-0911-01      2            0.889       Sig1
g2 <- get_groups(sig, method = "samples")
#>  [2024-08-04 14:38:58.307234]: Started.
#>  [2024-08-04 14:38:58.308626]: 'Signature' object detected.
#>  [2024-08-04 14:38:58.309989]: Obtaining clusters by the contribution of signature to each sample...
#>  [2024-08-04 14:38:58.312675]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>    
#>     Sig1 Sig2
#>   1    0    2
#>   2    8    0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#>  [2024-08-04 14:38:58.326211]: Summarizing...
#> 	group #1: 2 samples with Sig2 enriched.
#> 	group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.328055]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#>  [2024-08-04 14:38:58.329458]: 0.022 secs elapsed.
g2
#>                           sample  group silhouette_width  prob enrich_sig
#>                           <char> <char>            <num> <num>     <char>
#>  1: TCGA-05-4417-01A-22D-1854-01      1                1 1.000       Sig2
#>  2: TCGA-06-0644-01A-02D-0310-01      2                1 0.787       Sig1
#>  3: TCGA-19-2621-01B-01D-0911-01      2                1 1.000       Sig1
#>  4: TCGA-26-6174-01A-21D-1842-01      2                1 1.000       Sig1
#>  5: TCGA-99-7458-01A-11D-2035-01      1                1 0.679       Sig2
#>  6: TCGA-A5-A0G2-01A-11D-A042-01      2                1 0.598       Sig1
#>  7: TCGA-A8-A07S-01A-11D-A036-01      2                1 0.975       Sig1
#>  8: TCGA-B6-A0X5-01A-21D-A107-01      2                1 1.000       Sig1
#>  9: TCGA-CV-7432-01A-11D-2128-01      2                1 0.544       Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01      2                1 1.000       Sig1

# Use k-means clustering
g3 <- get_groups(sig, method = "k-means")
#>  [2024-08-04 14:38:58.333421]: Started.
#>  [2024-08-04 14:38:58.334835]: 'Signature' object detected.
#>  [2024-08-04 14:38:58.338967]: Running k-means with 2 clusters...
#>  [2024-08-04 14:38:58.34221]: Generating a table of group and signature contribution (stored in 'map_table' attr):
#>        Sig1      Sig2
#> 1 0.2097559 0.7901116
#> 2 0.8964984 0.1035016
#>  [2024-08-04 14:38:58.34429]: Assigning a group to a signature with the maximum fraction...
#>  [2024-08-04 14:38:58.34854]: Summarizing...
#> 	group #1: 2 samples with Sig2 enriched.
#> 	group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.350393]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#>  [2024-08-04 14:38:58.351754]: 0.018 secs elapsed.
g3
#> Key: <group>
#>                           sample  group silhouette_width enrich_sig
#>                           <char> <char>            <num>     <char>
#>  1: TCGA-05-4417-01A-22D-1854-01      1            0.532       Sig2
#>  2: TCGA-99-7458-01A-11D-2035-01      1            0.121       Sig2
#>  3: TCGA-06-0644-01A-02D-0310-01      2            0.755       Sig1
#>  4: TCGA-19-2621-01B-01D-0911-01      2            0.850       Sig1
#>  5: TCGA-26-6174-01A-21D-1842-01      2            0.850       Sig1
#>  6: TCGA-A5-A0G2-01A-11D-A042-01      2            0.493       Sig1
#>  7: TCGA-A8-A07S-01A-11D-A036-01      2            0.847       Sig1
#>  8: TCGA-B6-A0X5-01A-21D-A107-01      2            0.850       Sig1
#>  9: TCGA-CV-7432-01A-11D-2128-01      2            0.341       Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01      2            0.850       Sig1
# }