The goal of VSHunter is to capture variation signatures from genomic data. For now, it decodes copy number patterns from absolute copy number profiles. This package collects the R code from the paper “Copy number signatures and mutational processes in ovarian carcinoma” and tidies it into an open source R package for the bioinformatics community.

Before you use this tool, you have to obtain absolute copy number profiles for your samples with software such as ABSOLUTE v2 or QDNAseq.

Procedure

  1. Summarise the copy-number profile using a number of different feature distributions:
    • Segment size
    • Breakpoint number (per ten megabases)
    • Change-point copy-number
    • Breakpoint number (per chromosome arm)
    • Length of segments with oscillating copy-number
  2. Apply mixture modelling to break down each feature distribution into mixtures of Gaussian or mixtures of Poisson distributions using the flexmix package.
  3. Generate a sample-by-component matrix representing the sum of posterior probabilities of each copy-number event being assigned to each component.
  4. Use the NMF package to factorise the sample-by-component matrix into a signature-by-sample matrix and a component-by-signature matrix.
Copy number signature identification, Macintyre, Geoff, et al. (2018)

Installation

You can install VSHunter from GitHub with:

# install.packages("devtools")
devtools::install_github("ShixiangWang/VSHunter", build_vignettes = TRUE)

Load the package:

library(VSHunter)

Usage

Load example data:

tcga_segTabs is a list containing absolute copy number profiles for multiple samples; each sample is a data.frame in the list.
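
To make the expected input shape concrete, here is a minimal sketch of a list in the same spirit as tcga_segTabs. The object name example_segTabs, the sample names and the column names (chromosome, start, end, segVal) are assumptions for illustration; check the example data shipped with the package for the actual layout.

```r
# Hypothetical stand-in for tcga_segTabs: a named list with one data.frame
# per sample. Column names are assumed, not taken from the package.
example_segTabs <- list(
  sample_A = data.frame(
    chromosome = c("1", "1", "2"),
    start      = c(1, 10000001, 1),
    end        = c(10000000, 50000000, 80000000),
    segVal     = c(2, 4, 1),            # absolute copy number of each segment
    stringsAsFactors = FALSE
  ),
  sample_B = data.frame(
    chromosome = c("1", "2", "2"),
    start      = c(1, 1, 40000001),
    end        = c(60000000, 40000000, 90000000),
    segVal     = c(2, 3, 2),
    stringsAsFactors = FALSE
  )
)

# Basic sanity checks before passing such a list to the pipeline.
is_df <- vapply(example_segTabs, is.data.frame, logical(1))
stopifnot(is.list(example_segTabs), all(is_df))
str(example_segTabs, max.level = 1)
```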

Derive feature distributions

tcga_features = cnv_derivefeatures(CN_data = tcga_segTabs, cores = 1, genome_build = "hg19")
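
To give a feel for what a feature distribution is, the sketch below computes three of the features named in the Procedure section for a single made-up sample using base R only. It is not the code inside cnv_derivefeatures(); the segment table, its column names and the breakpoint definition (the end of every segment except the last on a chromosome) are assumptions.

```r
# Hypothetical segment table for one sample; column names are assumptions.
seg <- data.frame(
  chromosome = c("1", "1", "1", "2", "2"),
  start      = c(1, 5000001, 20000001, 1, 30000001),
  end        = c(5000000, 20000000, 60000000, 30000000, 90000000),
  segVal     = c(2, 3, 2, 2, 4),
  stringsAsFactors = FALSE
)

# Feature: segment size in base pairs.
seg_size <- seg$end - seg$start + 1

# Feature: change-point copy number, i.e. the absolute copy-number difference
# between adjacent segments on the same chromosome.
changepoint <- unlist(lapply(split(seg$segVal, seg$chromosome),
                             function(cn) abs(diff(cn))))

# Feature: breakpoint count per 10 Mb window, chromosome by chromosome.
bp_per_10mb <- unlist(lapply(split(seg, seg$chromosome), function(s) {
  breaks  <- head(s$end, -1)                       # drop the chromosome end
  windows <- seq(0, max(s$end) + 1e7, by = 1e7)    # 10 Mb window edges
  as.vector(table(cut(breaks, windows)))           # breakpoints per window
}))
```

Each resulting vector is one sample's contribution to the corresponding feature distribution; pooling them across samples gives the distributions that the mixture models are fitted to.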

Fit model components

tcga_components = cnv_fitMixModels(CN_features = tcga_features, cores = 4)
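
The idea behind this step can be illustrated without flexmix. The sketch below is a hand-written EM loop for a two-component Gaussian mixture on simulated values; it swaps in a simpler technique than the package uses and is only meant to show where the posterior probabilities used in the next step come from. The data, the number of components and the starting values are all assumptions.

```r
set.seed(1)
# Simulated feature values from two groups; purely illustrative.
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 6, sd = 1))

# Starting values for a two-component Gaussian mixture.
mu    <- c(min(x), max(x))
sigma <- c(sd(x), sd(x))
prior <- c(0.5, 0.5)

for (iter in 1:100) {
  # E-step: posterior probability of each component for each observation.
  dens <- sapply(1:2, function(k) prior[k] * dnorm(x, mu[k], sigma[k]))
  post <- dens / rowSums(dens)

  # M-step: update mixing weights, means and standard deviations.
  nk    <- colSums(post)
  prior <- nk / length(x)
  mu    <- colSums(post * x) / nk
  sigma <- sqrt(colSums(post * outer(x, mu, "-")^2) / nk)
}

round(mu, 2)     # estimated component means
head(post, 3)    # one row per observation, one column per component
```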

Generate a sample-by-component matrix

Generate a sample-by-component matrix representing the sum of posterior probabilities of each copy-number event being assigned to each component.

tcga_sample_component_matrix = cnv_generateSbCMatrix(tcga_features, tcga_components, cores = 4)
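
As a rough illustration of "sum of posterior probabilities", the sketch below collapses a small, made-up posterior matrix (one row per copy-number event) into one row per sample with base R's rowsum(). The component names, sample labels and values are hypothetical, and the package may organise the computation differently.

```r
# Hypothetical posteriors: one row per copy-number event, one column per
# mixture component of a single feature.
post <- matrix(c(0.9, 0.1,
                 0.2, 0.8,
                 0.6, 0.4,
                 0.3, 0.7),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("segsize1", "segsize2")))

# Which sample each event belongs to.
event_sample <- c("sample_A", "sample_A", "sample_B", "sample_B")

# Sum the posteriors of all events belonging to the same sample.
sbc <- rowsum(post, group = event_sample)
sbc
```

Doing this for every feature and binding the results column-wise would give a matrix with one row per sample and one column per component.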

Choose the optimal number of signatures

tcga_sig_choose = cnv_chooseSigNumber(tcga_sample_component_matrix, nrun = 10, cores = 4)

To save time, skip the test on randomised data:

tcga_sig_choose2 = cnv_chooseSigNumber(tcga_sample_component_matrix, nrun = 10, cores = 4, testRandom = FALSE)

Extract signatures

tcga_signatures = cnv_extractSignatures(tcga_sample_component_matrix, nsig = 3, cores = 4)
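
For readers new to NMF, the sketch below factorises a small random non-negative matrix with the classic multiplicative update rules in base R. It is a stand-in for the NMF package, not what cnv_extractSignatures() runs, and it assumes the input is arranged component-by-sample so that the two factors match the component-by-signature and signature-by-sample matrices named in the Procedure section. The matrix V, its size and the iteration count are made up.

```r
set.seed(42)
# Hypothetical non-negative component-by-sample matrix (6 components, 8 samples).
V    <- matrix(runif(6 * 8), nrow = 6)
nsig <- 2

# Random non-negative starting factors.
W <- matrix(runif(nrow(V) * nsig), nrow = nrow(V))   # component-by-signature
H <- matrix(runif(nsig * ncol(V)), nrow = nsig)      # signature-by-sample
eps <- 1e-9                                          # guards against division by zero

err_start <- sum((V - W %*% H)^2)
for (iter in 1:200) {
  # Multiplicative updates for the squared reconstruction error.
  H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
  W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
}
err_end <- sum((V - W %*% H)^2)

c(start = err_start, end = err_end)   # the error should not increase
```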

Quantify exposure for samples

w = NMF::basis(tcga_signatures)
#h = NMF::coef(tcga_signatures)
tcga_exposure = cnv_quantifySigExposure(sample_by_component = tcga_sample_component_matrix, component_by_signature = w)

Auto-capture signatures

The function cnv_autoCaptureSignatures() performs the three steps above (choosing the best number of signatures, extracting signatures and quantifying exposure) in an automated way. Its arguments are the same as those of cnv_chooseSigNumber().

tcga_results = cnv_autoCaptureSignatures(tcga_sample_component_matrix, nrun=10, cores = 4)

The result object is a list that contains all results needed for downstream analysis, including the NMF result for the best rank value, the signature matrix, absolute and relative exposure (contribution), and the best rank survey.
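
The relation between absolute and relative exposure can be sketched as follows, assuming relative exposure means each sample's exposures rescaled to sum to one. The matrix below is made up, and the names and layout of the elements in the real result list may differ.

```r
# Hypothetical absolute exposures: one row per signature, one column per sample.
abs_exposure <- matrix(c(2, 6, 2,
                         1, 1, 8),
                       nrow = 3,
                       dimnames = list(paste0("Sig", 1:3),
                                       c("sample_A", "sample_B")))

# Relative exposure: divide each column by its total.
rel_exposure <- sweep(abs_exposure, 2, colSums(abs_exposure), "/")

# Dominant signature per sample, a common starting point downstream.
dominant <- rownames(rel_exposure)[apply(rel_exposure, 2, which.max)]

rel_exposure
dominant
```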

Citation

  • Macintyre, Geoff, et al. “Copy number signatures and mutational processes in ovarian carcinoma.” Nature genetics 50.9 (2018): 1262.

If you would like to acknowledge my work on this package, you can also cite the following paper (and include a link to this package: https://github.com/ShixiangWang/VSHunter):

  • Wang, Shixiang, et al. “APOBEC3B and APOBEC mutational signature as potential predictive markers for immunotherapy response in non-small cell lung cancer.” Oncogene (2018).