Get Sample Groups from Signature Decomposition Information

One of the key results of signature analysis is the clustering of samples into groups. This function takes a Signature object as input and returns the cluster membership of each sample.

get_groups(
  Signature,
  method = c("consensus", "k-means", "exposure", "samples"),
  n_cluster = NULL,
  match_consensus = TRUE
)

Arguments

Signature

a Signature object obtained from either sig_extract or sig_auto_extract. It can also be a relative exposure result in data.table format from sig_fit.

method

grouping method (see Details for more). One of the following:

'consensus' - returns the cluster membership based on hierarchical clustering of the consensus matrix. It can only be used for results obtained by sig_extract() with multiple runs using the NMF package.
'k-means' - returns the clusters obtained by k-means.
'exposure' - assigns each sample to the group whose signature exposure is dominant.
'samples' - returns the cluster membership based on the contribution of each signature to each sample. It can only be used for results obtained by sig_extract() using the NMF package.
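
The 'exposure' rule can be illustrated with a minimal base R sketch. The table `rel_expo` and its values are made up for illustration, and the tie-breaking rule used here (first signature wins) is an assumption; this is not the code get_groups() runs internally.

```r
# Hypothetical relative exposure table: one row per sample,
# one column per signature (each row sums to 1).
rel_expo <- data.frame(
  sample = c("S1", "S2", "S3"),
  Sig1 = c(0.7, 0.2, 0.5),
  Sig2 = c(0.3, 0.8, 0.5)
)

# Columns holding signature exposures
sig_cols <- setdiff(names(rel_expo), "sample")
expo_mat <- as.matrix(rel_expo[, sig_cols])

# For each sample, pick the signature with the largest exposure.
# Ties are resolved in favour of the first signature (an assumption).
dominant <- sig_cols[max.col(expo_mat, ties.method = "first")]

groups <- data.frame(sample = rel_expo$sample, enrich_sig = dominant)
groups
```

Each sample ends up in the group named after its dominant signature, which is the idea behind the 'exposure' method described above.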

n_cluster

number of clusters; only used when the method is 'k-means'.
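
As a rough illustration of what k-means grouping on exposures involves, the sketch below clusters a made-up relative exposure matrix with stats::kmeans() and labels each cluster by the signature with the largest mean contribution. The data, the choice of `nstart`, and the labelling step are assumptions for illustration, not the implementation inside get_groups().

```r
set.seed(1)

# Hypothetical relative exposures: samples in rows, signatures in columns
rel_expo <- matrix(
  c(0.90, 0.10,
    0.85, 0.15,
    0.80, 0.20,
    0.20, 0.80,
    0.10, 0.90),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("S", 1:5), c("Sig1", "Sig2"))
)

n_cluster <- 2
km <- kmeans(rel_expo, centers = n_cluster, nstart = 10)

# Label each cluster by the signature with the largest centre value
enrich_sig <- colnames(km$centers)[apply(km$centers, 1, which.max)]

groups <- data.frame(
  sample = rownames(rel_expo),
  group = as.character(km$cluster),
  enrich_sig = enrich_sig[km$cluster]
)
groups
```

Cluster numbers from k-means are arbitrary, so it is the enriched signature, not the group number, that carries meaning across runs.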

match_consensus

only used when the method is 'consensus'. If TRUE, the result will match the order shown in the consensus map.

Value

a data.table object

Details

Users may find larger differences than expected between the results of methods 'samples' and 'exposure', even though both use a similar idea to find the dominant signature. The reason is as follows:

Method 'samples' uses data directly from the NMF decomposition; that is, the two matrices W (basis matrix or signature matrix) and H (coefficient matrix or exposure matrix) are the results of NMF. Method 'exposure' uses the signature exposure loading matrix instead. In this case, each signature represents a number of mutations (alterations). For implementation details, please see the source code of the sig_extract() function.
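
The toy example below shows how the dominant signature can change between the raw coefficient matrix H and an exposure matrix expressed in mutation counts. It assumes the exposure is obtained by normalizing each column of W to sum to 1 and moving that scale into H; this rescaling and all numbers are illustrative assumptions, and the actual steps are in the source code of sig_extract().

```r
# Hypothetical NMF result: W is components x signatures, H is signatures x samples
W <- matrix(c(10, 20, 10, 1, 1, 2),
  nrow = 3,
  dimnames = list(paste0("comp", 1:3), c("Sig1", "Sig2"))
)
H <- matrix(c(1, 5, 2, 1, 0.5, 3),
  nrow = 2,
  dimnames = list(c("Sig1", "Sig2"), paste0("S", 1:3))
)

# Assumed rescaling: normalize W columns to sum to 1, push the scale into H
scale_factor <- colSums(W)
W_norm <- sweep(W, 2, scale_factor, "/")
exposure <- sweep(H, 1, scale_factor, "*")

# The reconstructed catalogue is unchanged by the rescaling
stopifnot(isTRUE(all.equal(W %*% H, W_norm %*% exposure)))

# Dominant signature per sample, before and after rescaling
dominant_raw <- unname(rownames(H)[apply(H, 2, which.max)])
dominant_exposure <- unname(rownames(exposure)[apply(exposure, 2, which.max)])

data.frame(sample = colnames(H), dominant_raw, dominant_exposure)
```

Here samples S1 and S3 are dominated by Sig2 in the raw H matrix but by Sig1 once the exposures are expressed as counts, which is the kind of disagreement the two methods can show.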

Examples

# \donttest{
# Load the prepared copy number tally object
load(system.file("extdata", "toy_copynumber_tally_W.RData",
  package = "sigminer", mustWork = TRUE
))
# Extract copy number signatures
library(NMF)
#> Loading required package: registry
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 2/2
#>   To enable shared memory capabilities, try: install.extras('NMF')
sig <- sig_extract(cn_tally_W$nmf_matrix, 2,
  nrun = 10
)
#> NMF algorithm: 'brunet'
#> Multiple runs: 10
#> Mode: sequential [foreach:doParallelMC]
#> 
#> Runs: |==================================================| 100%
#> System time:
#>    user  system elapsed 
#>   4.491   0.000   4.490 

# Methods 'consensus' and 'samples' are from NMF::predict()
g1 <- get_groups(sig, method = "consensus", match_consensus = TRUE)
#> ℹ [2024-08-04 14:38:58.265139]: Started.
#> ✔ [2024-08-04 14:38:58.26683]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.268319]: Obtaining clusters from the hierarchical clustering of the consensus matrix...
#> ℹ [2024-08-04 14:38:58.285673]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>    
#>     Sig1 Sig2
#>   1    0    2
#>   2    8    0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#> ℹ [2024-08-04 14:38:58.30013]: Summarizing...
#> 	group #1: 2 samples with Sig2 enriched.
#> 	group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.302022]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.303385]: 0.038 secs elapsed.
g1
#>                           sample  group silhouette_width enrich_sig
#>                           <char> <char>            <num>     <char>
#>  1: TCGA-05-4417-01A-22D-1854-01      1            1.000       Sig2
#>  2: TCGA-99-7458-01A-11D-2035-01      1            0.986       Sig2
#>  3: TCGA-CV-7432-01A-11D-2128-01      2            0.986       Sig1
#>  4: TCGA-DF-A2KN-01A-11D-A17U-01      2            0.986       Sig1
#>  5: TCGA-B6-A0X5-01A-21D-A107-01      2            0.986       Sig1
#>  6: TCGA-A8-A07S-01A-11D-A036-01      2            0.986       Sig1
#>  7: TCGA-A5-A0G2-01A-11D-A042-01      2            0.986       Sig1
#>  8: TCGA-26-6174-01A-21D-1842-01      2            0.986       Sig1
#>  9: TCGA-06-0644-01A-02D-0310-01      2            1.000       Sig1
#> 10: TCGA-19-2621-01B-01D-0911-01      2            0.889       Sig1
g2 <- get_groups(sig, method = "samples")
#> ℹ [2024-08-04 14:38:58.307234]: Started.
#> ✔ [2024-08-04 14:38:58.308626]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.309989]: Obtaining clusters by the contribution of signature to each sample...
#> ℹ [2024-08-04 14:38:58.312675]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>    
#>     Sig1 Sig2
#>   1    0    2
#>   2    8    0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#> ℹ [2024-08-04 14:38:58.326211]: Summarizing...
#> 	group #1: 2 samples with Sig2 enriched.
#> 	group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.328055]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.329458]: 0.022 secs elapsed.
g2
#>                           sample  group silhouette_width  prob enrich_sig
#>                           <char> <char>            <num> <num>     <char>
#>  1: TCGA-05-4417-01A-22D-1854-01      1                1 1.000       Sig2
#>  2: TCGA-06-0644-01A-02D-0310-01      2                1 0.787       Sig1
#>  3: TCGA-19-2621-01B-01D-0911-01      2                1 1.000       Sig1
#>  4: TCGA-26-6174-01A-21D-1842-01      2                1 1.000       Sig1
#>  5: TCGA-99-7458-01A-11D-2035-01      1                1 0.679       Sig2
#>  6: TCGA-A5-A0G2-01A-11D-A042-01      2                1 0.598       Sig1
#>  7: TCGA-A8-A07S-01A-11D-A036-01      2                1 0.975       Sig1
#>  8: TCGA-B6-A0X5-01A-21D-A107-01      2                1 1.000       Sig1
#>  9: TCGA-CV-7432-01A-11D-2128-01      2                1 0.544       Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01      2                1 1.000       Sig1

# Use k-means clustering
g3 <- get_groups(sig, method = "k-means")
#> ℹ [2024-08-04 14:38:58.333421]: Started.
#> ✔ [2024-08-04 14:38:58.334835]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.338967]: Running k-means with 2 clusters...
#> ℹ [2024-08-04 14:38:58.34221]: Generating a table of group and signature contribution (stored in 'map_table' attr):
#>        Sig1      Sig2
#> 1 0.2097559 0.7901116
#> 2 0.8964984 0.1035016
#> ℹ [2024-08-04 14:38:58.34429]: Assigning a group to a signature with the maximum fraction...
#> ℹ [2024-08-04 14:38:58.34854]: Summarizing...
#> 	group #1: 2 samples with Sig2 enriched.
#> 	group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.350393]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.351754]: 0.018 secs elapsed.
g3
#> Key: <group>
#>                           sample  group silhouette_width enrich_sig
#>                           <char> <char>            <num>     <char>
#>  1: TCGA-05-4417-01A-22D-1854-01      1            0.532       Sig2
#>  2: TCGA-99-7458-01A-11D-2035-01      1            0.121       Sig2
#>  3: TCGA-06-0644-01A-02D-0310-01      2            0.755       Sig1
#>  4: TCGA-19-2621-01B-01D-0911-01      2            0.850       Sig1
#>  5: TCGA-26-6174-01A-21D-1842-01      2            0.850       Sig1
#>  6: TCGA-A5-A0G2-01A-11D-A042-01      2            0.493       Sig1
#>  7: TCGA-A8-A07S-01A-11D-A036-01      2            0.847       Sig1
#>  8: TCGA-B6-A0X5-01A-21D-A107-01      2            0.850       Sig1
#>  9: TCGA-CV-7432-01A-11D-2128-01      2            0.341       Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01      2            0.850       Sig1
# }
