GCAP FACETS workflow for gene-level amplicon prediction

gcap.workflow.facets(
  tumourseqfile,
  normalseqfile,
  jobname,
  extra_info = NULL,
  include_type = FALSE,
  genome_build = c("mm10", "hg38", "hg19"),
  model = "XGB11",
  tightness = 1L,
  gap_cn = 3L,
  overlap = 1,
  pro_cval = 100,
  only_oncogenes = FALSE,
  snp_file = "path/to/genome_build_responding.vcf.gz",
  outdir = getwd(),
  result_file_prefix = paste0("gcap_", uuid::UUIDgenerate(TRUE)),
  util_exe = system.file("extcode", "snp-pileup", package = "facets"),
  nthreads = 1,
  skip_finished_facets = TRUE,
  skip_facets_call = FALSE
)

Arguments

tumourseqfile

Full path to the tumour BAM file.

normalseqfile

Full path to the normal BAM file.

jobname

job name, typically a unique name for a tumor-normal pair.

extra_info

(optional) a data.frame, or a file containing one, with 3 columns: 'sample' (must be identical to the value of the jobname parameter), 'age' and 'gender'. Gender should be 'XX' or 'XY'; it can also be coded as 0 for 'XX' and 1 for 'XY'.

include_type

if TRUE, a fourth column named 'type' must be included in extra_info, giving the cancer type as a TCGA cancer type abbreviation.
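A minimal sketch of building extra_info in R. The job name "sample01" and the age/gender values are hypothetical placeholders, not real data, and "LUAD" is only an illustrative TCGA abbreviation for the optional 'type' column used with include_type = TRUE.

```r
# Hypothetical job name; 'sample' must match the jobname argument exactly.
jobname <- "sample01"

# Three required columns: sample, age, gender.
# gender may be "XX"/"XY", or coded as 0 (XX) / 1 (XY).
extra_info <- data.frame(
  sample = jobname,
  age = 60,
  gender = "XY"
)

# Only needed when include_type = TRUE: a fourth column holding a
# TCGA cancer type abbreviation (illustrative value).
extra_info_typed <- extra_info
extra_info_typed$type <- "LUAD"
```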

genome_build

genome build version; should be one of 'hg38', 'hg19' or 'mm10'.

model

model name ("XGB11", "XGB32" or "XGB56"), or a custom model supplied as input. 'toy' can be used for testing.

tightness

a coefficient multiplied by the TCGA somatic copy number (CN) to set a stricter threshold for calling a circular amplicon. The larger the value, the more likely a focal copy number amplification (fCNA) is assigned to noncircular rather than circular. When it is NA, TCGA somatic CN data are not used as a reference.
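To make the role of tightness concrete, here is a small sketch. passes_tcga_threshold() is a hypothetical helper, not part of gcap, and it assumes a simple "greater than tightness × TCGA CN" comparison; the actual comparison inside the workflow may differ.

```r
# Hypothetical helper (not part of gcap): scales a TCGA reference copy
# number by 'tightness' and checks whether each gene's CN exceeds it.
# Assumption: a strict ">" comparison is used.
passes_tcga_threshold <- function(cn, tcga_cn, tightness = 1L) {
  # NA tightness: the TCGA reference is not used, so nothing is filtered.
  if (is.na(tightness)) {
    return(rep(TRUE, length(cn)))
  }
  cn > tightness * tcga_cn
}

# With tightness = 2 and a TCGA CN of 5, the threshold becomes 10.
passes_tcga_threshold(cn = c(8, 12), tcga_cn = 5, tightness = 2)
```

Raising tightness raises the threshold, so fewer fCNAs pass and more are left as noncircular, matching the description above.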

gap_cn

a gap copy number value. A gene with copy number above the background (in general, ploidy + gap_cn) is treated as a focal amplicon. A smaller value yields more amplicons.
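The background rule can be sketched in a few lines. is_focal_amplicon() is a hypothetical helper (not part of gcap), and the gene copy numbers below are made-up illustrative values.

```r
# Hypothetical helper (not part of gcap): flags genes whose copy number
# exceeds the background level, taken here as ploidy + gap_cn.
is_focal_amplicon <- function(gene_cn, ploidy, gap_cn = 3L) {
  gene_cn > ploidy + gap_cn
}

# Illustrative copy numbers, not real measurements.
gene_cn <- c(EGFR = 4, MYC = 6, ERBB2 = 10)

is_focal_amplicon(gene_cn, ploidy = 2)              # background = 5
is_focal_amplicon(gene_cn, ploidy = 2, gap_cn = 1L) # background = 3, more amplicons
```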

overlap

the overlap percentage on the gene.

pro_cval

critical value for segmentation used in facets::procSample().

only_oncogenes

if TRUE, only known oncogenes are kept for circular prediction.

snp_file

path to the SNP file (VCF) of the genome; it must be consistent with the genome_build option.

outdir

output directory for the results.

result_file_prefix

file name prefix (without directory path) for the final model prediction file, which is saved in CSV format. By default, a unique prefix is generated with a UUID.

util_exe

the path to the snp-pileup executable.

nthreads

The number of parallel processes for getting allele counts (optional, default=1).

skip_finished_facets

if TRUE, skip finished FACETS runs.

skip_facets_call

if TRUE, skip calling FACETS. This is useful when FACETS has already been run and you only want to run the subsequent steps.

Value

a list of data.table objects (returned invisibly); the corresponding files are also saved to the local machine.

Details

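To build the snp-pileup program, reference commands are given below. Modify the paths to fit your own machine: the first is the extcode directory of the installed facets package, and the others point to an HTSlib installation providing the headers and libhts.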

cd /data3/wsx/R/x86_64-pc-linux-gnu-library/4.2/facets/extcode/

g++ -std=c++11 -I/data3/wsx/miniconda3/envs/circlemap/include snp-pileup.cpp -L/data3/wsx/miniconda3/envs/circlemap/lib -lhts -Wl,-rpath=/data3/wsx/miniconda3/envs/circlemap/lib -o snp-pileup
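After building, snp-pileup must be present at the path passed as util_exe (by default, the extcode directory of the installed facets package). The sketch below checks this from R before launching the workflow; snp_pileup_ready() is a hypothetical helper, not part of gcap or facets.

```r
# Hypothetical helper (not part of gcap/facets): checks that the compiled
# snp-pileup binary exists and is executable. The default mirrors the
# util_exe default of gcap.workflow.facets().
snp_pileup_ready <- function(util_exe = system.file("extcode", "snp-pileup",
                                                     package = "facets")) {
  # system.file() returns "" when the file (or the package) is not found.
  nzchar(util_exe) &&
    file.exists(util_exe) &&
    unname(file.access(util_exe, mode = 1)) == 0  # 0 means executable
}

if (!snp_pileup_ready()) {
  message("snp-pileup not found or not executable; build it first.")
}
```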