GCAP workflow for gene-level amplicon prediction

gcap.workflow(
  tumourseqfile,
  normalseqfile,
  tumourname,
  normalname,
  jobname = tumourname,
  extra_info = NULL,
  include_type = FALSE,
  genome_build = c("hg38", "hg19"),
  model = "XGB11",
  tightness = 1L,
  gap_cn = 3L,
  overlap = 1,
  only_oncogenes = FALSE,
  outdir = getwd(),
  result_file_prefix = paste0("gcap_", uuid::UUIDgenerate(TRUE)),
  allelecounter_exe = "~/miniconda3/envs/cancerit/bin/alleleCounter",
  g1000allelesprefix = file.path("~/data/snp/1000G_loci_hg38",
    "1kg.phase3.v5a_GRCh38nounref_allele_index_chr"),
  g1000lociprefix = file.path("~/data/snp/1000G_loci_hg38",
    "1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr"),
  GCcontentfile = "~/data/snp/GC_correction_hg38.txt",
  replictimingfile = "~/data/snp/RT_correction_hg38.txt",
  nthreads = 22,
  minCounts = 10,
  BED_file = NA,
  probloci_file = NA,
  chrom_names = 1:22,
  min_base_qual = 20,
  min_map_qual = 35,
  penalty = 70,
  skip_finished_ASCAT = TRUE,
  skip_ascat_call = FALSE
)

Arguments

tumourseqfile

Full path to the tumour BAM file.

normalseqfile

Full path to the normal BAM file.

tumourname

Identifier to be used for tumour output files.

normalname

Identifier to be used for normal output files.

jobname

Job name, typically a unique name for a tumour-normal pair.

extra_info

(optional) A data.frame, or a file containing one, with 3 columns: 'sample' (must be identical to the value of parameter jobname), 'age' and 'gender'. Gender should be 'XX' or 'XY'; 0 for 'XX' and 1 for 'XY' are also accepted.
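As a minimal sketch of the table described above, the following builds a one-row extra_info in base R. The sample name, age and gender are made-up values for illustration, and the sanity checks are not part of the package.

```r
# Hypothetical job name; 'sample' must equal the value passed as `jobname`.
job <- "patient01"

extra_info <- data.frame(
  sample = job,
  age    = 62,    # made-up age
  gender = "XY",  # 'XX' or 'XY'; 0 ('XX') and 1 ('XY') are also accepted
  stringsAsFactors = FALSE
)

# Basic sanity checks before handing the table to the workflow.
stopifnot(
  identical(names(extra_info), c("sample", "age", "gender")),
  all(extra_info$gender %in% c("XX", "XY", 0, 1))
)
```

The same table could be saved to a file and passed by path, since the argument also accepts a file containing the data.frame.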

include_type

If TRUE, extra_info must include a fourth column named 'type' that gives the cancer type as a TCGA cancer type abbreviation.

genome_build

genome build version, should be one of 'hg38', 'hg19'.

model

Model name ("XGB11", "XGB32", "XGB56") or a custom model from input. 'toy' can be used for testing.

tightness

A coefficient that multiplies the TCGA somatic CN to set a stricter threshold for calling a circular amplicon. The larger the value, the more likely an fCNA is assigned to noncircular instead of circular. When it is NA, TCGA somatic CN data are not used as a reference.

gap_cn

A gap copy number value. A gene with copy number above background (ploidy + gap_cn in general) is treated as a focal amplicon. A smaller value yields more amplicons.
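The threshold arithmetic can be illustrated with a few made-up numbers. This is only a sketch of the "ploidy + gap_cn" rule stated above, not the package's internal implementation; the copy numbers are invented, and treating "above" as a strict comparison is an assumption.

```r
# Illustrative values only; the copy numbers are made up.
ploidy <- 2
gene_cn <- c(MYC = 12, EGFR = 5, TP53 = 2)

# Return the genes whose copy number exceeds the background threshold.
amplified_genes <- function(cn, ploidy, gap_cn) {
  threshold <- ploidy + gap_cn
  names(cn)[cn > threshold]
}

amplified_genes(gene_cn, ploidy, gap_cn = 3L)  # threshold 5: "MYC"
amplified_genes(gene_cn, ploidy, gap_cn = 2L)  # threshold 4: "MYC" "EGFR"
```

Lowering gap_cn from 3 to 2 moves the threshold from 5 to 4, so the gene at copy number 5 is picked up as well.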

overlap

The overlap percentage on a gene.

only_oncogenes

if TRUE, only known oncogenes are kept for circular prediction.

outdir

Output directory for results.

result_file_prefix

File name prefix (without directory path) for storing the final model prediction file in CSV format. By default, a unique file name is generated from a UUID.

allelecounter_exe

Path to the allele counter executable.

g1000allelesprefix

Prefix path to the 1000 Genomes alleles reference files.

g1000lociprefix

Prefix path to the 1000 Genomes SNP reference files.
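Because both 1000 Genomes arguments are prefixes rather than full file names, it can help to check that the per-chromosome files exist before starting a long run. The sketch below assumes the files are named `<prefix><chromosome>.txt`, which is the convention ASCAT reference bundles are understood to follow; adjust the suffix if your files differ. The prefix is copied from the default in the usage above.

```r
# Prefix copied from the default in the usage section; adapt to your setup.
g1000lociprefix <- file.path("~/data/snp/1000G_loci_hg38",
  "1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr")
chrom_names <- 1:22

# Assumed naming scheme: <prefix><chromosome>.txt
loci_files <- paste0(g1000lociprefix, chrom_names, ".txt")

# Report how many of the expected files are missing.
missing_files <- loci_files[!file.exists(path.expand(loci_files))]
if (length(missing_files) > 0) {
  message("Missing loci files: ", length(missing_files))
}
```

The same check applies to g1000allelesprefix.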

GCcontentfile

File containing the GC content around every SNP for increasing window sizes.

replictimingfile

File containing replication timing at every SNP for various cell lines (optional).

nthreads

The number of parallel processes for getting allele counts (optional, default=22).

minCounts

Minimum depth required in the normal for a SNP to be considered (optional, default=10).

BED_file

A BED file for only looking at SNPs within specific intervals (optional, default=NA).

probloci_file

A file (chromosome <tab> position; no header) containing specific loci to ignore (optional, default=NA).
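A file in the described layout (two tab-separated columns, no header) can be written from R as sketched below. The loci are placeholders, and whether chromosomes should be written as "1" or "chr1" depends on your reference files, so treat the naming here as an assumption.

```r
# Made-up loci to ignore; chromosome names and positions are placeholders.
probloci <- data.frame(
  chr = c("1", "7"),
  pos = c(1234567L, 7654321L),
  stringsAsFactors = FALSE
)

# Write chromosome <tab> position without a header, quotes or row names.
probloci_file <- file.path(tempdir(), "probloci.txt")
write.table(probloci, probloci_file, sep = "\t",
  quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(probloci_file)
```

The resulting path can then be passed as probloci_file.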

chrom_names

A vector containing the names of chromosomes to be considered (optional, default=1:22).

min_base_qual

Minimum base quality required for a read to be counted (optional, default=20).

min_map_qual

Minimum mapping quality required for a read to be counted (optional, default=35).

penalty

penalty of introducing an additional ASPCF breakpoint (expert parameter, don't adapt unless you know what you're doing)

skip_finished_ASCAT

If TRUE, skip finished ASCAT calls to save time.

skip_ascat_call

If TRUE, skip calling ASCAT. This is useful when you have already done this step and just want to run the next steps.

Value

Invisibly returns a list of data.table objects; the corresponding files are saved to the local machine.
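Examples

A minimal sketch of a call for one tumour-normal pair. All paths and sample names are hypothetical placeholders, the package name `gcap` is assumed, and the reference-file arguments are left at the defaults shown in the usage, which must exist on your machine. The call is guarded so that nothing runs unless the package and both BAM files are available.

```r
# All paths and names below are hypothetical placeholders.
args <- list(
  tumourseqfile = "/path/to/tumour.bam",
  normalseqfile = "/path/to/normal.bam",
  tumourname    = "patient01_tumour",
  normalname    = "patient01_normal",
  jobname       = "patient01",
  genome_build  = "hg38",
  model         = "XGB11",
  outdir        = file.path(tempdir(), "gcap_out"),
  nthreads      = 4
)

# Only run when the package and both BAM files are actually available.
can_run <- requireNamespace("gcap", quietly = TRUE) &&
  all(file.exists(args$tumourseqfile, args$normalseqfile))

if (can_run) {
  res <- do.call(gcap::gcap.workflow, args)
  # The result is returned invisibly, so inspect it explicitly.
  str(res, max.level = 1)
}
```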