GCAP workflow for gene-level amplicon prediction

gcap.workflow(
  tumourseqfile,
  normalseqfile,
  tumourname,
  normalname,
  jobname = tumourname,
  extra_info = NULL,
  include_type = FALSE,
  genome_build = c("hg38", "hg19"),
  model = "XGB11",
  tightness = 1L,
  gap_cn = 3L,
  overlap = 1,
  only_oncogenes = FALSE,
  outdir = getwd(),
  result_file_prefix = paste0("gcap_", uuid::UUIDgenerate(TRUE)),
  allelecounter_exe = "~/miniconda3/envs/cancerit/bin/alleleCounter",
  g1000allelesprefix = file.path("~/data/snp/1000G_loci_hg38",
    "1kg.phase3.v5a_GRCh38nounref_allele_index_chr"),
  g1000lociprefix = file.path("~/data/snp/1000G_loci_hg38",
    "1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr"),
  GCcontentfile = "~/data/snp/GC_correction_hg38.txt",
  replictimingfile = "~/data/snp/RT_correction_hg38.txt",
  nthreads = 22,
  minCounts = 10,
  BED_file = NA,
  probloci_file = NA,
  chrom_names = 1:22,
  min_base_qual = 20,
  min_map_qual = 35,
  penalty = 70,
  skip_finished_ASCAT = TRUE,
  skip_ascat_call = FALSE
)

Arguments

tumourseqfile: Full path to the tumour BAM file.
normalseqfile: Full path to the normal BAM file.
tumourname: Identifier to be used for tumour output files.
normalname: Identifier to be used for normal output files.
jobname: job name, typically a unique name for a tumour-normal pair.
extra_info: (optional) a data.frame, or a file containing one, with 3 columns: 'sample' (must be identical to the value of parameter jobname), 'age' and 'gender'. Gender should be 'XX' or 'XY'; 0 for 'XX' and 1 for 'XY' are also accepted.
include_type: if TRUE, a fourth column named 'type' should be included in extra_info, giving the cancer type as a supported TCGA cancer type abbreviation.
genome_build: genome build version, should be one of 'hg38', 'hg19'.
model: model name ("XGB11", "XGB32", "XGB56") or a custom model from input. 'toy' can be used for testing.
tightness: a coefficient by which the TCGA somatic CN is multiplied to set a stricter threshold for calling a circular amplicon. The larger the value, the more likely an fCNA is assigned to noncircular instead of circular. When it is NA, TCGA somatic CN data are not used as a reference.
gap_cn: a gap copy number value. A gene with copy number above background (ploidy + gap_cn in general) is treated as a focal amplicon. A smaller value yields more amplicons.
overlap: the overlap percentage on a gene, given as a proportion (the default 1 corresponds to 100%).
only_oncogenes: if TRUE, only known oncogenes are kept for circular prediction.
outdir: result output path.
result_file_prefix: file name prefix (without directory path) for storing the final model prediction file in CSV format. By default, a unique file name is generated from a UUID.
allelecounter_exe: Path to the allele counter executable.
g1000allelesprefix: Prefix path to the 1000 Genomes alleles reference files.
g1000lociprefix: Prefix path to the 1000 Genomes SNP reference files.
GCcontentfile: File containing the GC content around every SNP for increasing window sizes.
replictimingfile: File containing replication timing at every SNP for various cell lines (optional).
nthreads: The number of parallel processes for getting allele counts (optional, default=22).
minCounts: Minimum depth required in the normal for a SNP to be considered (optional, default=10).
BED_file: A BED file for only looking at SNPs within specific intervals (optional, default=NA).
probloci_file: A file (chromosome <tab> position; no header) containing specific loci to ignore (optional, default=NA).
chrom_names: A vector containing the names of chromosomes to be considered (optional, default=1:22).
min_base_qual: Minimum base quality required for a read to be counted (optional, default=20).
min_map_qual: Minimum mapping quality required for a read to be counted (optional, default=35).
penalty: penalty of introducing an additional ASPCF breakpoint (expert parameter, don't adapt unless you know what you're doing)
skip_finished_ASCAT: if TRUE, skip finished ASCAT calls to save time.
skip_ascat_call: if TRUE, skip calling ASCAT. This is useful when you have already done this step and just want to run the subsequent steps.
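
The extra_info, include_type and gap_cn descriptions above can be illustrated with a small base R sketch. The sample identifier, age, cancer type, ploidy and gene copy numbers are hypothetical values chosen for illustration, and the threshold comparison only mirrors the "ploidy + gap_cn in general" wording; the rule applied internally may differ.

```r
# Hypothetical identifier; it must match the `jobname` passed to gcap.workflow().
pair_id <- "patient01"

# extra_info with the three documented columns: sample, age, gender.
extra_info <- data.frame(
  sample = pair_id,
  age = 62,
  gender = "XY",          # 'XX' or 'XY' (0 and 1 are also accepted)
  stringsAsFactors = FALSE
)

# With include_type = TRUE, a fourth column `type` holds a TCGA abbreviation.
extra_info_typed <- extra_info
extra_info_typed$type <- "LUAD"

# Basic checks mirroring the documented constraints.
stopifnot(
  identical(names(extra_info), c("sample", "age", "gender")),
  all(extra_info$gender %in% c("XX", "XY", 0, 1))
)

# Illustration of gap_cn: background is described as roughly ploidy + gap_cn.
ploidy  <- 2                            # hypothetical sample ploidy
gap_cn  <- 3L                           # documented default
gene_cn <- c(GENE_A = 4, GENE_B = 9)    # hypothetical gene-level copy numbers
is_focal_amp <- gene_cn > ploidy + gap_cn
```

Under these assumed values, lowering gap_cn lowers the cut-off, so more genes pass the comparison, which matches the note that a smaller value yields more amplicons.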

Value

A list of data.table objects, returned invisibly; the corresponding files are saved to the local machine.
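
Because the result is returned invisibly, it has to be assigned to be inspected. The following is a minimal sketch of a wrapper around a single tumour-normal pair, not a definitive recipe. The wrapper name run_gcap_pair, the "_T"/"_N" suffixes and the use of tempdir() are hypothetical, and it assumes the function is exported from a package named gcap. All reference-file arguments are left at their defaults, so those paths must exist on the machine for a real run.

```r
# Hypothetical helper: run the workflow for one pair and keep the result.
run_gcap_pair <- function(tumour_bam, normal_bam, pair_id, outdir = tempdir()) {
  # Fail early if either BAM file is missing.
  stopifnot(file.exists(tumour_bam), file.exists(normal_bam))

  # Assign the value explicitly, since it is returned invisibly.
  res <- gcap::gcap.workflow(
    tumourseqfile = tumour_bam,
    normalseqfile = normal_bam,
    tumourname    = paste0(pair_id, "_T"),  # assumed naming scheme
    normalname    = paste0(pair_id, "_N"),  # assumed naming scheme
    jobname       = pair_id,
    genome_build  = "hg38",
    outdir        = outdir
  )
  res
}
```

The helper only wires together arguments documented above; calling it with paths that do not exist stops at the file check, before the workflow is invoked.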