R/facets_pipeline.R
gcap.workflow.facets.Rd
GCAP FACETS workflow for gene-level amplicon prediction
gcap.workflow.facets(
tumourseqfile,
normalseqfile,
jobname,
extra_info = NULL,
include_type = FALSE,
genome_build = c("mm10", "hg38", "hg19"),
model = "XGB11",
tightness = 1L,
gap_cn = 3L,
overlap = 1,
pro_cval = 100,
only_oncogenes = FALSE,
snp_file = "path/to/genome_build_responding.vcf.gz",
outdir = getwd(),
result_file_prefix = paste0("gcap_", uuid::UUIDgenerate(TRUE)),
util_exe = system.file("extcode", "snp-pileup", package = "facets"),
nthreads = 1,
skip_finished_facets = TRUE,
skip_facets_call = FALSE
)
Full path to the tumour BAM file.
Full path to the normal BAM file.
Job name, typically a unique name for a tumour-normal pair.
(Optional) a data.frame, or a file containing one, with 3 columns: 'sample' (must be identical to the value of the jobname parameter), 'age' and 'gender'. Gender should be 'XX' or 'XY'; 0 for 'XX' and 1 for 'XY' are also accepted.
If TRUE, a fourth column named 'type' should be included in extra_info. The cancer type should be given as a supported TCGA cancer type abbreviation.
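As a minimal sketch of the extra_info layout described above, the table could be built as follows. The job name, age and cancer type ("LUAD", a TCGA abbreviation) are hypothetical values chosen for illustration; whether a given cancer type is supported by the model is not confirmed here.

```r
# Hypothetical job name; it must match the `jobname` argument of the workflow.
jobname <- "sample01"

# Three required columns: sample, age, gender ('XX'/'XY', or 0/1).
extra_info <- data.frame(
  sample = jobname,
  age = 60,
  gender = "XY",
  stringsAsFactors = FALSE
)

# With include_type = TRUE, a fourth column 'type' is expected.
# "LUAD" is only an example TCGA abbreviation.
extra_info_typed <- extra_info
extra_info_typed$type <- "LUAD"

# Sanity checks mirroring the documented requirements.
stopifnot(
  all(extra_info$sample == jobname),
  all(extra_info$gender %in% c("XX", "XY", 0, 1))
)
```

Passing extra_info_typed together with include_type = TRUE would then match the four-column layout; passing extra_info alone matches the default include_type = FALSE.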
genome build version, should be one of 'hg38', 'hg19' and 'mm10'.
Model name ("XGB11", "XGB32", "XGB56") or a custom model from input. 'toy' can be used for testing.
A coefficient by which the TCGA somatic CN is multiplied to set a stricter threshold for calling a circular amplicon. The larger the value, the more likely an fCNA is assigned to noncircular instead of circular. When it is NA, TCGA somatic CN data is not used as a reference.
A gap copy number value. A gene with copy number above background (ploidy + gap_cn in general) is treated as a focal amplicon. A smaller value yields more amplicons.
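The effect of gap_cn can be illustrated with a small sketch. It assumes the simple rule stated above (copy number greater than ploidy + gap_cn); the exact rule used inside the workflow may differ, and the helper count_focal, the ploidy and the gene copy numbers are hypothetical.

```r
# Hypothetical helper: count genes whose copy number exceeds the
# background threshold (ploidy + gap_cn).
count_focal <- function(gene_cn, ploidy, gap_cn) {
  sum(gene_cn > ploidy + gap_cn)
}

# Hypothetical sample ploidy and gene-level total copy numbers.
ploidy <- 2.1
gene_cn <- c(MYC = 8, EGFR = 5.5, TP53 = 2, CDK4 = 4.9)

# Default gap_cn = 3 gives a threshold of 5.1: MYC and EGFR pass.
n_default <- count_focal(gene_cn, ploidy, gap_cn = 3L)

# A smaller gap_cn = 2 gives a threshold of 4.1: CDK4 now passes as well.
n_relaxed <- count_focal(gene_cn, ploidy, gap_cn = 2L)
```

Here n_default is 2 and n_relaxed is 3, which is the "smaller value, more amplicons" behaviour.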
the overlap percentage on gene.
Critical value for segmentation used in facets::procSample().
If TRUE, only known oncogenes are kept for circular prediction.
Path to the SNP file for the genome; it should be consistent with the genome_build option.
result output path.
File name prefix (without directory path) for storing the final model prediction file in CSV format. By default, a unique file name is generated from a UUID.
The path to snp-pileup.
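Before launching a run, it may help to confirm that the default snp-pileup location resolves to an executable file. The sketch below uses only base R; the variable name pileup_ready is an assumption for illustration.

```r
# Default lookup used in the function signature. system.file() returns ""
# when the package or the file cannot be found.
util_exe <- system.file("extcode", "snp-pileup", package = "facets")

# file.access(mode = 1) returns 0 when the file exists and is executable.
pileup_ready <- nzchar(util_exe) && file.access(util_exe, mode = 1) == 0

if (!pileup_ready) {
  message("snp-pileup not found or not executable; ",
          "build it or pass its full path via util_exe.")
}
```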
The number of parallel processes for getting allele counts (optional, default=1).
If TRUE, skip finished FACETS runs.
If TRUE, skip calling FACETS. This is useful when you have already done this step and only want to run the subsequent steps.
A list of data.table objects, returned invisibly; the corresponding files are saved to the local machine.
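A hedged sketch of a full call is given here. All file paths and the job name are placeholders, not real data, and the call is guarded so that nothing runs unless the gcap package is installed and the input files exist. Only arguments from the signature above are used.

```r
# Placeholder inputs; replace with real paths on your machine.
args <- list(
  tumourseqfile = "path/to/tumour.bam",
  normalseqfile = "path/to/normal.bam",
  jobname = "sample01",
  genome_build = "hg38",
  model = "XGB11",
  snp_file = "path/to/snp.vcf.gz",
  outdir = tempdir(),
  result_file_prefix = "gcap_sample01",
  nthreads = 1
)

# Run only when the package and all input files are available.
inputs <- unlist(args[c("tumourseqfile", "normalseqfile", "snp_file")])
can_run <- requireNamespace("gcap", quietly = TRUE) && all(file.exists(inputs))

if (can_run) {
  # The result is returned invisibly, so assign it to inspect it.
  res <- do.call(gcap::gcap.workflow.facets, args)
} else {
  message("Skipping: gcap is not installed or input files are missing.")
}
```

Setting result_file_prefix explicitly, rather than relying on the UUID default, makes the output CSV easier to locate in outdir afterwards.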
To build the snp-pileup program, use the reference commands given here. You need to modify the paths to fit your own machine.
cd /data3/wsx/R/x86_64-pc-linux-gnu-library/4.2/facets/extcode/
g++ -std=c++11 -I/data3/wsx/miniconda3/envs/circlemap/include snp-pileup.cpp -L/data3/wsx/miniconda3/envs/circlemap/lib -lhts -Wl,-rpath=/data3/wsx/miniconda3/envs/circlemap/lib -o snp-pileup