  • Fixed a bug that generated a wrong data type when only one sample is handled (#463). Thanks to @selkamand.
  • Supported human T2T genome and corresponding annotation data.
  • Updated COSMIC database to v3.4. SV and RNA-SBS signatures are included.
get_sig_db("latest_RNA-SBS_GRCh37")
get_sig_db("latest_SV_GRCh38")
  • Fixed a counting bug in generating the matrix for variation categories with strand bias (#445).
  • Updated pkg doc following the new CRAN feature (thanks to K from the CRAN team).
  • Added samps option to show_sig_exposure().

Example:

load(system.file("extdata", "toy_mutational_signature.RData",
                 package = "sigminer", mustWork = TRUE
))
# Show signature exposure
p1 <- show_sig_exposure(sig2, rm_space = TRUE)
p1

expo = sig_exposure(sig2)
show_sig_exposure(expo,
                  rm_space = TRUE,
                  samps = colnames(expo)[order(colSums(expo))])
  • Fixed the error in generating the SBS matrix when only one sample is input (#432).
  • Removed package ‘copynumber’ from the Suggests field.
  • Supported the Ziyu Tao et al. approach for copy number segment classification.
  • Supported ce11 genome in read_vcf().
  • Added read_maf_minimal() to support a minimal MAF-like data as input.
  • Fixed the issue that the latest CN signatures from COSMIC had labels inconsistent with the built-in CN signatures (#421).
  • Fixed a bug in plotting the CN chromosome distribution (#420, thanks to @jrcodina96).
  • Added a vignette to introduce the analysis of copy number signatures.
  • Updated CNS_TCGA.
  • Enhanced group_enrichment() with reference group support.

Example:

set.seed(1234)
df <- dplyr::tibble(
  g1 = rep(LETTERS[1:3], c(50, 40, 10)),
  g2 = rep(c("AA", "VV", "XX"), c(50, 40, 10)),
  e1 = sample(c("P", "N"), 100, replace = TRUE),
  e2 = rnorm(100)
)

x1 = group_enrichment(df, grp_vars = c("g1", "g2"), 
                      enrich_vars = c("e1", "e2"), 
                      ref_group = c("B", "VV"))
x1
  • Added option for reading ASCAT objects in parallel.
  • Fixed error in extracting invalid regions (#396, thanks to @KirsieMin).
  • Added sig_unify_extract() as a unified signature extractor.
  • Fixed an error in showing the reference signature profile for the CNS_TCGA database.
  • Implemented Cohen-Sharir method-like Aneuploidy Score.
  • Enhanced error handling in show_sig_feature_corrplot() (#376).
  • Fixed INDEL classification.
  • Fixed end position determination in read_vcf().
  • Updated INDEL adjustment.
  • Included TCGA copy number signatures from SigProfiler.
  • Updated docs.
  • Fixed output_sig() error in handling exposure plot with >9 signatures (#366).
  • Added limitsize = FALSE for ggsave() or ggsave2() to handle big figures.
  • Supported mm9 genome build.
  • Removed FTP link as CRAN suggested (#359).
  • Updated README.

BUG REPORTS

  • Fixed the SigProfiler installation error due to Python version in conda environment.
  • Fixed classification bug due to repeated function name call_component.
  • Fixed the bug when read_vcf() reads VCF files with ## comment lines.

ENHANCEMENTS

  • Added support for the latest COSMIC v3.2 as reference signatures. You can obtain them by:
for (i in c("latest_SBS_GRCh37", "latest_DBS_GRCh37", "latest_ID_GRCh37",
            "latest_SBS_GRCh38", "latest_DBS_GRCh38",
            "latest_SBS_mm9", "latest_DBS_mm9",
            "latest_SBS_mm10", "latest_DBS_mm10",
            "latest_SBS_rn6", "latest_DBS_rn6")) {
  message(i)
  get_sig_db(i)
}

NEW FUNCTIONS

DEPRECATED

  • Dropped copy number “M” method to avoid misleading users into using/reading the wrong signature profile and to keep the code simple.

BUG REPORTS

  • Fixed the assignment problem for matching pairs in bp_extract_signatures() by using the lpSolve package instead of my problematic code.

ENHANCEMENTS

  • Supported mm10 in read_vcf().
  • Removed large data files and stored them on Zenodo to reduce package size.
  • Added cores check.
  • Upgraded SP to v1.1.0 (needs testing).
  • Tried installing Torch before SP (needs testing).

NEW FUNCTIONS

DEPRECATED

BUG REPORTS

ENHANCEMENTS

  • Plotting a subset of signatures is available via the sig_names option.
  • sigminer is available in the bioconda channel: https://anaconda.org/bioconda/r-sigminer/
  • Updated ms strategy in sig_auto_extract() by assigning each signature to its best matched reference signatures.
  • Added get_shannon_diversity_index() to get diversity index for signatures (#333).
  • Added new method “S” (from Steele et al. 2019) for tallying copy number data (#329).
  • Included new (RS) reference signatures (related to #331).
  • Updated the internal code for getting relative activity in get_sig_exposure().
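
To illustrate what a Shannon diversity index over signature exposures measures, here is a minimal base-R sketch that computes H = -sum(p * log(p)) per sample from relative exposures. The helper name shannon_index() and the toy exposure matrix are hypothetical and used only for illustration; this is not the implementation of get_shannon_diversity_index(), whose arguments and conventions (for example the log base) may differ.

```r
# Hypothetical helper for illustration only; not part of sigminer.
# expo: numeric matrix with signatures in rows and samples in columns.
# Assumes every sample has a positive total exposure.
shannon_index <- function(expo) {
  # Convert absolute exposures to per-sample proportions
  rel <- sweep(expo, 2, colSums(expo), "/")
  apply(rel, 2, function(p) {
    p <- p[p > 0] # treat 0 * log(0) as 0
    -sum(p * log(p))
  })
}

# Toy data (made up): matrix() fills column by column,
# so each group of three values is one sample
expo <- matrix(
  c(10, 10, 10, # S1: evenly spread over three signatures
    30,  0,  0, # S2: dominated by a single signature
    25,  5,  0), # S3: in between
  nrow = 3,
  dimnames = list(paste0("Sig", 1:3), paste0("S", 1:3))
)

shannon_index(expo)
```

A sample whose exposure is spread evenly over k signatures reaches the maximum log(k), while a sample explained by a single signature gets 0.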

NEW FUNCTIONS

DEPRECATED

  • Updated author list.

BUG REPORTS

ENHANCEMENTS

  • A new option cut_p_value is added to show_group_enrichment() to cut continuous p values into binned regions.
  • A Python backend for sig_extract() is provided.
  • Users can now directly use sig_extract() and sig_auto_extract() without loading the NMF package first.
  • Added benchmark results for different extraction approaches in README.
  • The threshold for auto_reduce in sig_fit() is modified from 0.99 to 0.95, and the similarity update threshold is changed from >0 to >=0.01.
  • Removed pConstant option from sig_extract() and sig_estimate(). An auto-check function is now used to avoid the error from the NMF package that occurs when a component has no contribution in any sample.
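
As a rough sketch of what cutting continuous p values into bins looks like, the base-R example below uses cut() with left-closed intervals. The break points and labels here are assumptions chosen for illustration; they are not necessarily the ones show_group_enrichment() uses when cut_p_value is enabled.

```r
# Made-up p values for illustration
p <- c(0.0004, 0.003, 0.02, 0.2, 0.9)

# right = FALSE gives intervals [0, 0.001), [0.001, 0.01), [0.01, 0.05), [0.05, 1];
# include.lowest = TRUE closes the last interval so that p = 1 is kept
bins <- cut(
  p,
  breaks = c(0, 0.001, 0.01, 0.05, 1),
  labels = c("< 0.001", "< 0.01", "< 0.05", ">= 0.05"),
  right = FALSE,
  include.lowest = TRUE
)

data.frame(p = p, bin = bins)
```

Binning like this turns a continuous color scale into a small set of discrete legend levels, which is usually easier to read in an enrichment heatmap.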

NEW FUNCTIONS

  • bp_show_survey2() to plot a simplified version for signature number survey (#330).
  • read_xena_variants() to read variant data from UCSC Xena as a MAF object for signature analysis.
  • get_sig_rec_similarity() for getting reconstructed profile similarity for Signature object (#293).
  • Added functions starting with bp_, which combine to provide a best practice for extracting signatures in cancer research. For more details, run ?bp in your R console.

DEPRECATED

  • Fixed bugs when outputting only 1 signature.
  • Fixed label inverse bug in add_labels(), thanks to TaoTao for reporting.
  • Added auto_reduce option in sig_fit* functions to improve signature fitting.
  • Return cosine similarity for sample profile in sig_fit().
  • Set default strategy in sig_auto_extract() to ‘optimal’.
  • Supported search reference signature index in get_sig_cancer_type_index().
  • Output legacy COSMIC similarity for SBS signatures.
  • Added new option in sigprofiler_extract() to reduce failures when refit is enabled.
  • Output both relative and absolute signature exposure in output_sig().
  • Updated background color in show_group_distribution().
  • Modified the default theme for signature profile in COSMIC style.
  • Updated the copy number classification method.
  • Handled null catalogue.
  • Supported ordering the signatures for results from SigProfiler.
  • Supported importing refit results from SigProfiler.
  • Set optimize option in sig_extract() and sig_auto_extract().
  • Supported BSgenome.Hsapiens.1000genomes.hs37d5 in sig_tally().
  • Removed the conversion of MT to M in mutation data.
  • Fixed bug in extracting numeric signature names and signature ordering in show_sig_exposure().
  • Added letter_colors as an unexported discrete palette.
  • Added option to control the SigProfilerExtractor to avoid issue in docker image build.
  • Some updates.
  • Compatible with SigProfiler 1.0.15
  • Tried to speed up joining adjacent segments in read_copynumber(), got 200% improvement.
  • Fixed bug in OsCN feature calculation.
  • Removed useless options in read_maf().
  • Modified method ‘LS’ in sig_fit() to ‘NNLS’ and implemented it with the pracma package (#216).
  • Made the use_all option in read_copynumber() work correctly.
  • Fixed potential problem raised by unordered copy number segments (#217).
  • Fixed a typo: corrected MRSE to RMSE.
  • Added feature in show_sig_bootstrap_*() for plotting aggregated values.
  • Fixed bug when using get_groups() for clustering.
  • Fixed bug about using reference components from NatGen 2018 paper.
  • Added option highlight_size for show_sig_bootstrap_*().
  • Fixed bug about signature profile plotting for method ‘M’.
  • Added “scatter” in sig_fit() function to better visualize a few samples.
  • Added “highlight” option.
  • The lsei package was removed from CRAN, so I reset the default method to ‘QP’ and tried my best to keep the LS usage in sigminer (#189).
  • Made consistent copy number labels in show_sig_profile() and added input checking for this function.
  • Fixed inconsistent bootstrap results when using furrr; the solution is from https://github.com/DavisVaughan/furrr/issues/107.
  • Properly handled null-count sample in sig_fit() for methods QP and SA.
  • Supported boxplot or violin in show_sig_fit() and show_sig_bootstrap_* functions.
  • Added job mode to sig_fit_bootstrap_batch to make it more useful in practice.
  • Added show_groups() to show the signature contribution in each group from get_groups().
  • Expanded clustering in get_groups() to result of sig_fit().
  • Properly handled null-count samples in sig_fit_bootstrap_batch().
  • Added strand bias labeling for INDEL.
  • Added COSMIC TSB signatures.
  • Exported APOBEC result when the mode is ‘ALL’ in sig_tally().
  • Added batch bootstrap analysis feature (#158).
  • Supported all common signature plotting.
  • Added strand feature to signature profile.
  • Added profile plot for DBS and INDEL.
  • Fixed error for signature extraction in mode ‘DBS’ or ‘ID’.
  • Fixed the issue that method ‘M’ for CN tally could not work when cores > 1 (#161).
  • Added multiple methods for sig_fit().
  • Added feature sig_fit_bootstrap() for bootstrap results.
  • Added multiple classification methods for SBS signature.
  • Added strand bias enrichment analysis for SBS signature.
  • Moved multiple packages from field Imports to Suggests.
  • Added feature report_bootstrap_p_value() to report p values.
  • Added common DBS and ID signature.
  • Updated citation.
  • Added merged transcript info for hg19 and hg38 build; this is available via data().
  • Added gene info for hg19 and hg38 build to extdata directory.
  • Removed fuzzyjoin package from dependency.
  • Moved ggalluvial package to the Suggests field.
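
Several entries above refer to cosine similarity between a sample profile and its reconstruction. For readers unfamiliar with the measure, here is a minimal base-R sketch. The helper cosine_sim() and the toy vectors are hypothetical and for illustration only; this is not the code that sig_fit() uses internally.

```r
# Hypothetical helper for illustration only; not part of sigminer.
# Cosine similarity between two non-zero numeric vectors of equal length.
cosine_sim <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Made-up mutation counts for one sample and a reconstructed profile
observed <- c(12, 3, 0, 7)
reconstructed <- c(11, 4, 1, 6)

cosine_sim(observed, reconstructed)
```

A value of 1 means the two profiles have identical shape regardless of total count; for non-negative catalogues the value lies between 0 and 1.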

To all users: this is a breakthrough version of sigminer. Most functions have been modified and more features have been implemented. Please read the reference list to see the function groups and their functionalities.

Please read the vignette for usage.

I hope it helps your research work and makes a new contribution to the scientific community.