One of the key results of signature analysis is the clustering of samples into
different groups. This function takes a Signature object as input
and returns the cluster membership of each sample.
get_groups(
Signature,
method = c("consensus", "k-means", "exposure", "samples"),
n_cluster = NULL,
match_consensus = TRUE
)
Signature: a Signature object obtained from either sig_extract or sig_auto_extract.
It can also be a relative exposure result in data.table format from sig_fit.
method: grouping method (see the details below). It can be one of the following:
'consensus' - returns the cluster membership based on hierarchical clustering of the consensus matrix. It can only be used for results obtained by sig_extract() with multiple runs using the NMF package.
'k-means' - returns the clusters found by k-means.
'exposure' - assigns each sample to the group whose signature exposure is dominant.
'samples' - returns the cluster membership based on the contribution of each signature to each sample. It can only be used for results obtained by sig_extract() using the NMF package.
n_cluster: only used when the method is 'k-means'.
match_consensus: only used when the method is 'consensus'.
If TRUE, the result will match the order shown in the consensus map.
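The 'exposure' and 'k-means' ideas described above can be sketched in base R. This is a minimal illustration on a made-up relative exposure matrix, not the implementation used by get_groups(); the names expo, dominant_sig and kmeans_group are hypothetical.

```r
# Toy relative exposure matrix: rows are signatures, columns are samples.
# All values are made up for illustration.
expo <- matrix(
  c(0.9, 0.1,
    0.8, 0.2,
    0.2, 0.8,
    0.3, 0.7),
  nrow = 2,
  dimnames = list(c("Sig1", "Sig2"), paste0("sample", 1:4))
)

# 'exposure'-style grouping: each sample goes to the signature
# with the largest exposure in that sample.
dominant_sig <- rownames(expo)[apply(expo, 2, which.max)]
names(dominant_sig) <- colnames(expo)

# 'k-means'-style grouping: cluster samples on their exposure profiles.
# Samples must be in rows for stats::kmeans(), hence the transpose.
set.seed(42)
km <- stats::kmeans(t(expo), centers = 2, nstart = 5)
kmeans_group <- km$cluster

print(dominant_sig)
print(kmeans_group)
```

With well separated profiles like these, both approaches put sample1 and sample2 in one group and sample3 and sample4 in the other; the cluster labels themselves are arbitrary.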
Returns a data.table object.
Users may find larger differences than expected between the results of methods 'samples' and 'exposure', even though both use a similar idea to find the dominant signature. The reason is as follows:
Method 'samples' uses data directly from the NMF decomposition, that is, the two matrices
W (basis matrix or signature matrix) and H (coefficient matrix or exposure matrix) that
result from NMF. Method 'exposure' uses the signature exposure loading matrix.
In this situation, each signature represents a number of mutations (alterations).
For the implementation, please see the source code of the sig_extract() function.
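The following sketch illustrates why the two methods can disagree. It assumes, for illustration only, that the exposure loading matrix is obtained by rescaling each row of H by the column sum of the matching signature in W; this is an assumption and not a statement about sigminer's exact code. All values and object names are made up.

```r
# Toy NMF factors (made-up values), so that V is approximately W %*% H.
# W: components x signatures; H: signatures x samples.
W <- matrix(c(1, 1,
              10, 10),
  nrow = 2,
  dimnames = list(c("comp1", "comp2"), c("Sig1", "Sig2"))
)
H <- matrix(c(3, 1,
              1, 3),
  nrow = 2,
  dimnames = list(c("Sig1", "Sig2"), c("sampleA", "sampleB"))
)

# Dominant signature taken directly from the coefficient matrix H.
dominant_H <- rownames(H)[apply(H, 2, which.max)]

# Assumed exposure: scale row i of H by the total weight of signature i in W,
# so that each entry approximates a number of mutations (alterations).
exposure <- sweep(H, 1, colSums(W), "*")
dominant_expo <- rownames(exposure)[apply(exposure, 2, which.max)]

print(dominant_H)
print(dominant_expo)
```

Here sampleA is dominated by Sig1 in H but by Sig2 after rescaling, because Sig2 carries much more weight in W.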
# \donttest{
# Load copy number prepare object
load(system.file("extdata", "toy_copynumber_tally_W.RData",
package = "sigminer", mustWork = TRUE
))
# Extract copy number signatures
library(NMF)
#> Loading required package: registry
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 2/2
#> To enable shared memory capabilities, try: install.extras('
#> NMF
#> ')
sig <- sig_extract(cn_tally_W$nmf_matrix, 2,
nrun = 10
)
#> NMF algorithm: 'brunet'
#> Multiple runs: 10
#> Mode: sequential [foreach:doParallelMC]
#>
#> Runs: |==================================================| 100%
#> System time:
#> user system elapsed
#> 4.491 0.000 4.490
# Methods 'consensus' and 'samples' are from NMF::predict()
g1 <- get_groups(sig, method = "consensus", match_consensus = TRUE)
#> ℹ [2024-08-04 14:38:58.265139]: Started.
#> ✔ [2024-08-04 14:38:58.26683]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.268319]: Obtaining clusters from the hierarchical clustering of the consensus matrix...
#> ℹ [2024-08-04 14:38:58.285673]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>
#> Sig1 Sig2
#> 1 0 2
#> 2 8 0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#> ℹ [2024-08-04 14:38:58.30013]: Summarizing...
#> group #1: 2 samples with Sig2 enriched.
#> group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.302022]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.303385]: 0.038 secs elapsed.
g1
#> sample group silhouette_width enrich_sig
#> <char> <char> <num> <char>
#> 1: TCGA-05-4417-01A-22D-1854-01 1 1.000 Sig2
#> 2: TCGA-99-7458-01A-11D-2035-01 1 0.986 Sig2
#> 3: TCGA-CV-7432-01A-11D-2128-01 2 0.986 Sig1
#> 4: TCGA-DF-A2KN-01A-11D-A17U-01 2 0.986 Sig1
#> 5: TCGA-B6-A0X5-01A-21D-A107-01 2 0.986 Sig1
#> 6: TCGA-A8-A07S-01A-11D-A036-01 2 0.986 Sig1
#> 7: TCGA-A5-A0G2-01A-11D-A042-01 2 0.986 Sig1
#> 8: TCGA-26-6174-01A-21D-1842-01 2 0.986 Sig1
#> 9: TCGA-06-0644-01A-02D-0310-01 2 1.000 Sig1
#> 10: TCGA-19-2621-01B-01D-0911-01 2 0.889 Sig1
g2 <- get_groups(sig, method = "samples")
#> ℹ [2024-08-04 14:38:58.307234]: Started.
#> ✔ [2024-08-04 14:38:58.308626]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.309989]: Obtaining clusters by the contribution of signature to each sample...
#> ℹ [2024-08-04 14:38:58.312675]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>
#> Sig1 Sig2
#> 1 0 2
#> 2 8 0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#> ℹ [2024-08-04 14:38:58.326211]: Summarizing...
#> group #1: 2 samples with Sig2 enriched.
#> group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.328055]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.329458]: 0.022 secs elapsed.
g2
#> sample group silhouette_width prob enrich_sig
#> <char> <char> <num> <num> <char>
#> 1: TCGA-05-4417-01A-22D-1854-01 1 1 1.000 Sig2
#> 2: TCGA-06-0644-01A-02D-0310-01 2 1 0.787 Sig1
#> 3: TCGA-19-2621-01B-01D-0911-01 2 1 1.000 Sig1
#> 4: TCGA-26-6174-01A-21D-1842-01 2 1 1.000 Sig1
#> 5: TCGA-99-7458-01A-11D-2035-01 1 1 0.679 Sig2
#> 6: TCGA-A5-A0G2-01A-11D-A042-01 2 1 0.598 Sig1
#> 7: TCGA-A8-A07S-01A-11D-A036-01 2 1 0.975 Sig1
#> 8: TCGA-B6-A0X5-01A-21D-A107-01 2 1 1.000 Sig1
#> 9: TCGA-CV-7432-01A-11D-2128-01 2 1 0.544 Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01 2 1 1.000 Sig1
# Use k-means clustering
g3 <- get_groups(sig, method = "k-means")
#> ℹ [2024-08-04 14:38:58.333421]: Started.
#> ✔ [2024-08-04 14:38:58.334835]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.338967]: Running k-means with 2 clusters...
#> ℹ [2024-08-04 14:38:58.34221]: Generating a table of group and signature contribution (stored in 'map_table' attr):
#> Sig1 Sig2
#> 1 0.2097559 0.7901116
#> 2 0.8964984 0.1035016
#> ℹ [2024-08-04 14:38:58.34429]: Assigning a group to a signature with the maximum fraction...
#> ℹ [2024-08-04 14:38:58.34854]: Summarizing...
#> group #1: 2 samples with Sig2 enriched.
#> group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.350393]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.351754]: 0.018 secs elapsed.
g3
#> Key: <group>
#> sample group silhouette_width enrich_sig
#> <char> <char> <num> <char>
#> 1: TCGA-05-4417-01A-22D-1854-01 1 0.532 Sig2
#> 2: TCGA-99-7458-01A-11D-2035-01 1 0.121 Sig2
#> 3: TCGA-06-0644-01A-02D-0310-01 2 0.755 Sig1
#> 4: TCGA-19-2621-01B-01D-0911-01 2 0.850 Sig1
#> 5: TCGA-26-6174-01A-21D-1842-01 2 0.850 Sig1
#> 6: TCGA-A5-A0G2-01A-11D-A042-01 2 0.493 Sig1
#> 7: TCGA-A8-A07S-01A-11D-A036-01 2 0.847 Sig1
#> 8: TCGA-B6-A0X5-01A-21D-A107-01 2 0.850 Sig1
#> 9: TCGA-CV-7432-01A-11D-2128-01 2 0.341 Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01 2 0.850 Sig1
# }
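The messages above mention a table of group and dominant signature, followed by assigning each group the signature with the maximum fraction. A minimal base R sketch of that idea is given here; the group and dominant signature vectors are copied from the printed results above, while the object names and the exact steps are assumptions rather than sigminer's code.

```r
# Group labels and dominant signatures for the 10 samples,
# as shown in the printed results above.
group <- c("1", "1", rep("2", 8))
dominant <- c("Sig2", "Sig2", rep("Sig1", 8))

# Cross-tabulate groups against dominant signatures.
tab <- table(group, dominant)
print(tab)

# For each group (row), pick the signature (column) with the largest count.
enrich_sig <- colnames(tab)[apply(tab, 1, which.max)]
names(enrich_sig) <- rownames(tab)
print(enrich_sig)
```

As the warning in the output notes, such an assignment is purely data driven, so it is worth checking that the resulting label is consistent with biological meaning.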