GCAP workflow for gene-level amplicon prediction from ASCN input

Unlike gcap.workflow, this function directly uses the allele-specific copy number data along with some extra sample information to infer ecDNA genes.

gcap.ASCNworkflow(
  data,
  genome_build = c("hg38", "hg19"),
  model = "XGB11",
  tightness = 1L,
  gap_cn = 3L,
  overlap = 1,
  only_oncogenes = FALSE,
  outdir = getwd(),
  result_file_prefix = paste0("gcap_", uuid::UUIDgenerate(TRUE))
)

Arguments

data

a data.frame with the following columns. The key columns can be obtained from common allele-specific CNV calling software, e.g., ASCAT, Sequenza, FACETS.

chromosome: chromosome name, starting with 'chr'.
start: start position of the segment.
end: end position of the segment.
total_cn: total integer copy number of the segment.
minor_cn: minor allele integer copy number of the segment. Set it to NA if you don't have this data.
sample: sample identifier.
purity: tumor purity of the sample. Set it to 1 if unknown.
ploidy (optional): ploidy value of the sample tumor genome.
age (optional): age of the case; use along with gender.
gender (optional): gender of the case; use along with age.
type (optional): cancer type of the case; use along with age and gender. Please refer to gcap.collapse2Genes to see the supported cancer types. This column is only used by the 'XGB56' model. If you don't use this model, you don't need to set it.
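
As a minimal sketch of the expected input shape, the table below is built by hand in base R. All values and the sample name are made up for illustration and do not come from any caller or from the package's example data; the checks at the end are only a suggested sanity test, not something the function is documented to perform.

```r
# Hypothetical segments for one sample; every value here is invented.
segs <- data.frame(
  chromosome = c("chr1", "chr1", "chr8"),
  start      = c(1L, 5000001L, 127000000L),
  end        = c(5000000L, 9000000L, 128500000L),
  total_cn   = c(2L, 3L, 24L),
  minor_cn   = c(1L, 1L, NA),  # NA where the minor allele copy number is unavailable
  sample     = "sample1",
  purity     = 1,              # 1 when the purity is unknown
  ploidy     = 2.3,            # optional column
  stringsAsFactors = FALSE
)

# Suggested sanity checks before passing the table on.
required <- c("chromosome", "start", "end", "total_cn",
              "minor_cn", "sample", "purity")
stopifnot(
  all(required %in% colnames(segs)),
  all(startsWith(segs$chromosome, "chr")),
  all(segs$start <= segs$end)
)
```

A table of this shape could then be passed as the data argument, for example gcap.ASCNworkflow(segs, outdir = tempdir()).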

genome_build

genome build version; should be one of 'hg38' or 'hg19'.

model

model name ("XGB11", "XGB32", "XGB56") or a custom model from input. 'toy' can be used for testing.

tightness

a coefficient by which the TCGA somatic CN is multiplied to set a stricter threshold for calling a circular amplicon. The larger the value, the more likely an fCNA is assigned to noncircular instead of circular. When it is NA, the TCGA somatic CN data is not used as a reference.

gap_cn

a gap copy number value. A gene with copy number above the background (ploidy + gap_cn in general) is treated as a focal amplicon. A smaller value yields more amplicons.
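
The effect of gap_cn can be illustrated with a small base R sketch. This is not the package's internal code: the helper flag_amplified, the gene names, and the copy numbers are hypothetical, and the strict "greater than" comparison is an assumption based on the wording "above background".

```r
# Hypothetical gene-level copy numbers; not output from GCAP.
genes <- data.frame(
  gene     = c("geneA", "geneB", "geneC"),
  total_cn = c(4, 6, 12)
)
ploidy <- 2

# Flag genes whose copy number exceeds the background `ploidy + gap_cn`.
# The strict comparison is an assumption made for this illustration.
flag_amplified <- function(total_cn, ploidy, gap_cn = 3) {
  total_cn > ploidy + gap_cn
}

# Background of 5: geneB and geneC are flagged.
flag_amplified(genes$total_cn, ploidy, gap_cn = 3)
# Background of 3: all three genes are flagged.
flag_amplified(genes$total_cn, ploidy, gap_cn = 1)
```

Lowering gap_cn lowers the background, so more genes pass the cutoff, which matches the note that a smaller value yields more amplicons.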

overlap

the overlap percentage on a gene.

only_oncogenes

if TRUE, only known oncogenes are kept for circular prediction.

outdir

result output path.

result_file_prefix

file name prefix (without directory path) for storing the final model prediction file in CSV format. By default, a unique file name is generated from a UUID.

Value

an invisible list of data.table objects; the corresponding files are saved to the local machine.

Examples

# Load the example ASCN data
data("ascn")
data <- ascn
rv <- gcap.ASCNworkflow(data, outdir = tempdir(), model = "XGB11")
# If you don't know the tumor purity, set it to 1
data$purity <- 1
rv2 <- gcap.ASCNworkflow(data, outdir = tempdir(), model = "XGB11")
# Add 'age' and 'gender' columns before running 'XGB32'
data$age <- 60
data$gender <- "XY"
rv3 <- gcap.ASCNworkflow(data, outdir = tempdir(), model = "XGB32")
# If you want to use 'XGB56', you should include 'type' column
data$type <- "LUAD"
rv4 <- gcap.ASCNworkflow(data, outdir = tempdir(), model = "XGB56")
# If you only have total integer copy number
data$minor_cn <- NA
rv5 <- gcap.ASCNworkflow(data, outdir = tempdir(), model = "XGB11")

# R6 class fCNA --------------------------------
print(rv)
print(rv$data)
print(rv$sample_summary)
print(rv$gene_summary)
print(rv$cytoband_summary)

# Create a subset fCNA
rv_subset <- rv$subset(total_cn > 10)
nrow(rv$data)
nrow(rv_subset$data)

rv_subset2 <- rv$subset(sample == "TCGA-02-2485-01")
nrow(rv_subset2$data)
unique(rv_subset2$data$sample)

sum_gene <- rv$getGeneSummary()
sum_gene
mat_gene <- rv$getGeneSummary(return_mat = TRUE)
mat_gene

sum_cytoband <- rv$getCytobandSummary()
sum_cytoband
mat_cytoband <- rv$getCytobandSummary(return_mat = TRUE)
mat_cytoband