Reads an absolute copy number profile in preparation for CNV signature
analysis. See the Details section of sig_tally() for how to handle sex
chromosomes so that the summary is correct.
read_copynumber(
input,
pattern = NULL,
ignore_case = FALSE,
seg_cols = c("Chromosome", "Start.bp", "End.bp", "modal_cn"),
samp_col = "sample",
add_loh = FALSE,
loh_min_len = 10000,
loh_min_frac = 0.05,
join_adj_seg = TRUE,
skip_annotation = FALSE,
use_all = add_loh,
min_segnum = 0L,
max_copynumber = 20L,
genome_build = c("hg19", "hg38", "T2T", "mm10", "mm9", "ce11"),
genome_measure = c("called", "wg"),
complement = FALSE,
...
)
input: A data.frame, or a file or directory containing copy number profiles.
pattern: An optional regular expression used to select a subset of files when input is a directory. See list.files() for details.
ignore_case: logical. Should pattern matching be case-insensitive?
seg_cols: Four strings specifying the chromosome, start position, end position and copy number value columns in input, respectively. The default uses the column names from ABSOLUTE calling results.
samp_col: A character string specifying the sample column name. If input is a directory and samp_col cannot be found, file names are used as sample names (setting this parameter to NULL is recommended in this case).
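As a rough illustration of how pattern, ignore_case and the file-name fallback for samp_col fit together, the base-R sketch below lists matching files in a directory and tags each file's rows with its name. It is not sigminer's implementation: read_seg_dir() is a hypothetical helper, it uses utils::read.delim() instead of data.table::fread(), and stripping the last file extension to form sample names is an assumption.

```r
# Hypothetical helper (not part of sigminer): read every file in `dir`
# whose name matches `pattern`, using the file name as the sample name.
read_seg_dir <- function(dir, pattern = NULL, ignore_case = FALSE) {
  files <- list.files(dir, pattern = pattern, ignore.case = ignore_case,
                      full.names = TRUE)
  segs <- lapply(files, function(f) {
    d <- utils::read.delim(f, stringsAsFactors = FALSE)
    # Assumption: sample name = file name without its last extension
    d$sample <- tools::file_path_sans_ext(basename(f))
    d
  })
  do.call(rbind, segs)
}

# Two toy files in a fresh temporary directory
tmp <- tempfile("segs_demo")
dir.create(tmp)
toy <- data.frame(Chromosome = "1", Start.bp = 1, End.bp = 1000, modal_cn = 3)
utils::write.table(toy, file.path(tmp, "tumorA.seg.txt"), sep = "\t",
                   row.names = FALSE, quote = FALSE)
utils::write.table(toy, file.path(tmp, "TumorB.SEG.txt"), sep = "\t",
                   row.names = FALSE, quote = FALSE)

# Case-insensitive matching picks up both files
res <- read_seg_dir(tmp, pattern = "\\.seg\\.txt$", ignore_case = TRUE)
res$sample
```

With the default ignore_case = FALSE, the same pattern would match only tumorA.seg.txt.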
add_loh: If TRUE, add LOH labels to segments. NOTE: a column 'minor_cn' giving the minor allele copy number must exist. Sex chromosomes are not labeled.
loh_min_len: The length cut-off for labeling a segment as 'LOH'. Default is 10 kb (10000).
loh_min_frac: When join_adj_seg is TRUE, a joined segment is labeled 'LOH' only if the length fraction of its LOH regions is larger than this value. Default is 0.05 (5%).
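The two LOH thresholds can be pictured with a small base-R sketch. The helpers label_loh() and merged_is_loh() are hypothetical, and the rule that a segment counts as LOH when minor_cn is 0, its total copy number is above 0 and it is at least loh_min_len long is an assumption for illustration, not sigminer's exact definition.

```r
# Hypothetical helper. Assumed rule: LOH when minor_cn == 0, total copy
# number > 0 and segment length >= loh_min_len.
label_loh <- function(seg, loh_min_len = 10000) {
  len <- seg$end - seg$start + 1
  seg$loh <- seg$minor_cn == 0 & seg$segVal > 0 & len >= loh_min_len
  seg
}

# Hypothetical helper: for segments merged into one record, call the merged
# record LOH only if the LOH-covered length fraction exceeds loh_min_frac.
merged_is_loh <- function(seg, loh_min_frac = 0.05) {
  len <- seg$end - seg$start + 1
  sum(len[seg$loh]) / sum(len) > loh_min_frac
}

seg <- data.frame(
  start    = c(1, 20001, 25001),
  end      = c(20000, 25000, 100000),
  segVal   = c(2, 2, 2),
  minor_cn = c(0, 0, 1)
)
seg <- label_loh(seg)
seg$loh                                # TRUE FALSE FALSE (2nd segment is only 5 kb)
merged_is_loh(seg)                     # 20000 / 100000 = 0.2 > 0.05, so TRUE
merged_is_loh(seg, loh_min_frac = 0.3) # 0.2 is not > 0.3, so FALSE
```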
join_adj_seg: If TRUE (default), join adjacent segments with the same copy number value. This helps count the number of breakpoints precisely. When use_all = TRUE, the mean is applied to extra numeric columns, and unique values of string columns are pasted together with commas for joined records.
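A minimal base-R sketch of joining adjacent segments, assuming that consecutive records (after sorting) of the same sample and chromosome are merged whenever their copy number is equal. join_adjacent() is a hypothetical helper, the column names follow the toy data used in the examples, and the handling of extra columns mirrors the use_all = TRUE behaviour described above.

```r
# Hypothetical helper (not sigminer's code): merge consecutive segments of
# the same sample and chromosome that share a copy number value.
join_adjacent <- function(seg) {
  seg <- seg[order(seg$sample, seg$chromosome, seg$start), ]
  n <- nrow(seg)
  # A new run starts whenever sample, chromosome or copy number changes
  new_run <- c(TRUE, seg$sample[-1] != seg$sample[-n] |
                     seg$chromosome[-1] != seg$chromosome[-n] |
                     seg$segVal[-1] != seg$segVal[-n])
  run_id <- cumsum(new_run)
  extra <- setdiff(names(seg), c("sample", "chromosome", "start", "end", "segVal"))
  out <- lapply(split(seg, run_id), function(d) {
    row <- data.frame(sample = d$sample[1], chromosome = d$chromosome[1],
                      start = min(d$start), end = max(d$end),
                      segVal = d$segVal[1], stringsAsFactors = FALSE)
    # Extra columns: average numbers, comma-join unique strings
    for (col in extra) {
      v <- d[[col]]
      row[[col]] <- if (is.numeric(v)) mean(v) else paste(unique(v), collapse = ",")
    }
    row
  })
  out <- do.call(rbind, out)
  rownames(out) <- NULL
  out
}

seg <- data.frame(
  sample = "s1", chromosome = "1",
  start  = c(1, 1001, 2001, 3001),
  end    = c(1000, 2000, 3000, 4000),
  segVal = c(3, 3, 2, 2),
  depth  = c(10, 20, 30, 50),
  source = c("a", "b", "c", "c"),
  stringsAsFactors = FALSE
)
joined <- join_adjacent(seg)
joined  # two records: 1-2000 (CN 3, depth 15, "a,b") and 2001-4000 (CN 2, depth 40, "c")
```

Four input segments collapse into two, which is why fewer breakpoints are counted after joining.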
skip_annotation: If TRUE, skip the annotation step. This may affect some analysis and visualization functionality, but speeds up reading data.
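use_all: logical. Defaults to the value of add_loh (FALSE by default); when add_loh is TRUE, use_all is forced to TRUE. If TRUE, use all columns from the raw input.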
min_segnum: Minimum number of copy number segments required within a sample.
max_copynumber: Copy number values larger than this within a sample are reset to this value.
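The two per-sample thresholds can be sketched as follows. apply_cn_filters() is hypothetical, and dropping samples that have fewer than min_segnum segments is an assumption about how that threshold is applied.

```r
# Hypothetical helper (not sigminer's code): cap copy numbers at
# max_copynumber, then drop samples with fewer than min_segnum segments.
apply_cn_filters <- function(seg, min_segnum = 0L, max_copynumber = 20L) {
  seg$segVal <- pmin(seg$segVal, max_copynumber)
  n_seg <- table(seg$sample)
  keep <- names(n_seg)[n_seg >= min_segnum]
  seg[seg$sample %in% keep, , drop = FALSE]
}

seg <- data.frame(sample = c("s1", "s1", "s1", "s2"),
                  segVal = c(2, 35, 4, 1),
                  stringsAsFactors = FALSE)
res <- apply_cn_filters(seg, min_segnum = 2L, max_copynumber = 20L)
res$segVal  # 2 20 4 (35 is capped; s2 has a single segment and is dropped)
```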
genome_build: Genome build version; one of 'hg19', 'hg38', 'T2T', 'mm10', 'mm9' or 'ce11'. Default is 'hg19'.
genome_measure: Either 'called' (default) or 'wg'. 'called' uses the total size of the called segments for CNA burden calculation, which is useful for WES and targeted sequencing. 'wg' uses the autosome size from the genome build, which is useful for WGS, SNP arrays, etc.
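To make the difference between the two genome_measure settings concrete, here is a simplified CNA burden sketch: the altered length (segments whose copy number is not 2) divided either by the summed length of the called segments or by a supplied whole-genome size. cna_burden() and its wg_size argument are hypothetical, and sigminer's exact rules (for example, how sex chromosomes are counted) may differ.

```r
# Hypothetical, simplified CNA burden (not sigminer's implementation).
cna_burden <- function(seg, genome_measure = c("called", "wg"), wg_size = NULL) {
  genome_measure <- match.arg(genome_measure)
  len <- seg$end - seg$start + 1
  altered <- sum(len[seg$segVal != 2])  # length of non-diploid segments
  if (genome_measure == "called") {
    total <- sum(len)                   # only what was actually called
  } else {
    stopifnot(!is.null(wg_size))        # e.g. autosome size of the build
    total <- wg_size
  }
  altered / total
}

seg <- data.frame(start = c(1, 1e6 + 1), end = c(1e6, 4e6), segVal = c(3, 2))
cna_burden(seg, "called")             # 1e6 / 4e6 = 0.25
cna_burden(seg, "wg", wg_size = 1e7)  # 1e6 / 1e7 = 0.1
```

With targeted data the called segments cover only part of the genome, so dividing by the whole-genome size would understate the burden; that is the motivation for 'called'.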
complement: If TRUE, chromosomes (except 'Y') that do not appear in the input data are filled in with normal copy number 2.
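A sketch of what complement = TRUE does for a single sample, assuming each missing chromosome is filled with one whole-chromosome segment. complement_chroms() is hypothetical, and chrom_sizes is a toy stand-in for the chromosome size database sigminer obtains for the chosen genome build.

```r
# Hypothetical helper (not sigminer's code): add a copy-number-2 segment
# spanning each chromosome (except "Y") that has no segment in `seg`.
complement_chroms <- function(seg, chrom_sizes) {
  missing <- setdiff(names(chrom_sizes), c(unique(seg$chromosome), "Y"))
  if (length(missing) == 0) return(seg)
  filler <- data.frame(chromosome = missing, start = 1,
                       end = unname(chrom_sizes[missing]), segVal = 2,
                       stringsAsFactors = FALSE)
  rbind(seg, filler)
}

chrom_sizes <- c("1" = 1000, "2" = 800, "X" = 600, "Y" = 300)  # toy sizes
seg <- data.frame(chromosome = "1", start = 1, end = 500, segVal = 3,
                  stringsAsFactors = FALSE)
res <- complement_chroms(seg, chrom_sizes)
res$chromosome  # "1" "2" "X" ("Y" is never filled)
```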
...: Other parameters passed to data.table::fread().
A CopyNumber object.
# Load a toy dataset of absolute copy number profiles
load(system.file("extdata", "toy_segTab.RData",
package = "sigminer", mustWork = TRUE
))
# \donttest{
cn <- read_copynumber(segTabs,
seg_cols = c("chromosome", "start", "end", "segVal"),
genome_build = "hg19", complement = FALSE
)
#> ℹ [2024-08-04 14:39:03.550448]: Started.
#> ℹ [2024-08-04 14:39:03.552015]: Genome build : hg19.
#> ℹ [2024-08-04 14:39:03.553429]: Genome measure: called.
#> ✔ [2024-08-04 14:39:03.557513]: Chromosome size database for build obtained.
#> ℹ [2024-08-04 14:39:03.558913]: Reading input.
#> ✔ [2024-08-04 14:39:03.560308]: A data frame as input detected.
#> ✔ [2024-08-04 14:39:03.561791]: Column names checked.
#> ✔ [2024-08-04 14:39:03.563206]: Column order set.
#> ✔ [2024-08-04 14:39:03.566232]: Chromosomes unified.
#> ✔ [2024-08-04 14:39:03.57062]: Data imported.
#> ℹ [2024-08-04 14:39:03.572499]: Segments info:
#> ℹ [2024-08-04 14:39:03.574356]: Keep - 467
#> ℹ [2024-08-04 14:39:03.576182]: Drop - 0
#> ✔ [2024-08-04 14:39:03.578001]: Segments sorted.
#> ℹ [2024-08-04 14:39:03.579342]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-08-04 14:39:03.663701]: 400 segments left after joining.
#> ✔ [2024-08-04 14:39:03.665631]: Segmental table cleaned.
#> ℹ [2024-08-04 14:39:03.667023]: Annotating.
#> ✔ [2024-08-04 14:39:03.680146]: Annotation done.
#> ℹ [2024-08-04 14:39:03.681587]: Summarizing per sample.
#> ✔ [2024-08-04 14:39:03.691719]: Summarized.
#> ℹ [2024-08-04 14:39:03.693116]: Generating CopyNumber object.
#> ✔ [2024-08-04 14:39:03.694746]: Generated.
#> ℹ [2024-08-04 14:39:03.696145]: Validating object.
#> ✔ [2024-08-04 14:39:03.697548]: Done.
#> ℹ [2024-08-04 14:39:03.698939]: 0.148 secs elapsed.
cn
#> An object of class CopyNumber
#> =============================
#> sample n_of_seg n_of_cnv n_of_amp n_of_del n_of_vchr
#> <char> <int> <int> <int> <int> <int>
#> 1: TCGA-DF-A2KN-01A-11D-A17U-01 33 6 5 1 4
#> 2: TCGA-19-2621-01B-01D-0911-01 33 8 5 3 5
#> 3: TCGA-B6-A0X5-01A-21D-A107-01 28 8 4 4 2
#> 4: TCGA-A8-A07S-01A-11D-A036-01 38 11 2 9 4
#> 5: TCGA-26-6174-01A-21D-1842-01 43 13 8 5 8
#> 6: TCGA-CV-7432-01A-11D-2128-01 40 16 7 9 9
#> 7: TCGA-06-0644-01A-02D-0310-01 46 19 5 14 8
#> 8: TCGA-A5-A0G2-01A-11D-A042-01 39 21 5 16 10
#> 9: TCGA-99-7458-01A-11D-2035-01 48 26 10 16 13
#> 10: TCGA-05-4417-01A-22D-1854-01 52 37 33 4 17
#> cna_burden
#> <num>
#> 1: 0.000
#> 2: 0.099
#> 3: 0.087
#> 4: 0.112
#> 5: 0.119
#> 6: 0.198
#> 7: 0.165
#> 8: 0.393
#> 9: 0.318
#> 10: 0.654
cn_subset <- subset(cn, sample == "TCGA-DF-A2KN-01A-11D-A17U-01")
# Add LOH labels (requires a 'minor_cn' column; random values are used here for demonstration)
set.seed(1234)
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)
cn <- read_copynumber(segTabs,
seg_cols = c("chromosome", "start", "end", "segVal"),
genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-08-04 14:39:03.714381]: Started.
#> ℹ [2024-08-04 14:39:03.715808]: Genome build : hg19.
#> ℹ [2024-08-04 14:39:03.717183]: Genome measure: wg.
#> ℹ [2024-08-04 14:39:03.718592]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-08-04 14:39:03.722253]: Chromosome size database for build obtained.
#> ℹ [2024-08-04 14:39:03.723663]: Reading input.
#> ✔ [2024-08-04 14:39:03.725068]: A data frame as input detected.
#> ✔ [2024-08-04 14:39:03.726549]: Column names checked.
#> ✔ [2024-08-04 14:39:03.727987]: Column order set.
#> ✔ [2024-08-04 14:39:03.730906]: Chromosomes unified.
#> ✔ [2024-08-04 14:39:03.74379]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-08-04 14:39:03.74794]: Data imported.
#> ℹ [2024-08-04 14:39:03.749941]: Segments info:
#> ℹ [2024-08-04 14:39:03.751773]: Keep - 477
#> ℹ [2024-08-04 14:39:03.753639]: Drop - 0
#> ✔ [2024-08-04 14:39:03.755456]: Segments sorted.
#> ℹ [2024-08-04 14:39:03.756835]: Adding LOH labels...
#> ℹ [2024-08-04 14:39:03.758755]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-08-04 14:39:03.867731]: 410 segments left after joining.
#> ✔ [2024-08-04 14:39:03.869741]: Segmental table cleaned.
#> ℹ [2024-08-04 14:39:03.871141]: Annotating.
#> ✔ [2024-08-04 14:39:03.884459]: Annotation done.
#> ℹ [2024-08-04 14:39:03.885927]: Summarizing per sample.
#> ✔ [2024-08-04 14:39:03.898564]: Summarized.
#> ℹ [2024-08-04 14:39:03.900021]: Generating CopyNumber object.
#> ✔ [2024-08-04 14:39:03.901697]: Generated.
#> ℹ [2024-08-04 14:39:03.903093]: Validating object.
#> ✔ [2024-08-04 14:39:03.90449]: Done.
#> ℹ [2024-08-04 14:39:03.906021]: 0.192 secs elapsed.
# Use tally method "S" (Steele et al.)
tally_s <- sig_tally(cn, method = "S")
#> ℹ [2024-08-04 14:39:03.908281]: Started.
#> ℹ [2024-08-04 14:39:03.911828]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#> ✔ [2024-08-04 14:39:03.92669]: Matrix generated.
#> ℹ [2024-08-04 14:39:03.928669]: 0.02 secs elapsed.
tab_file <- system.file("extdata", "metastatic_tumor.segtab.txt",
package = "sigminer", mustWork = TRUE
)
cn2 <- read_copynumber(tab_file)
#> ℹ [2024-08-04 14:39:03.932728]: Started.
#> ℹ [2024-08-04 14:39:03.934232]: Genome build : hg19.
#> ℹ [2024-08-04 14:39:03.935629]: Genome measure: called.
#> ✔ [2024-08-04 14:39:03.939448]: Chromosome size database for build obtained.
#> ℹ [2024-08-04 14:39:03.940861]: Reading input.
#> ✔ [2024-08-04 14:39:03.942288]: A file as input detected.
#> ✔ [2024-08-04 14:39:03.944443]: Column names checked.
#> ✔ [2024-08-04 14:39:03.946028]: Column order set.
#> ✔ [2024-08-04 14:39:03.948624]: Chromosomes unified.
#> ✔ [2024-08-04 14:39:03.952705]: Data imported.
#> ℹ [2024-08-04 14:39:03.954639]: Segments info:
#> ℹ [2024-08-04 14:39:03.956498]: Keep - 201
#> ℹ [2024-08-04 14:39:03.965385]: Drop - 0
#> ✔ [2024-08-04 14:39:03.967795]: Segments sorted.
#> ℹ [2024-08-04 14:39:03.96924]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-08-04 14:39:03.985294]: 180 segments left after joining.
#> ✔ [2024-08-04 14:39:03.98684]: Segmental table cleaned.
#> ℹ [2024-08-04 14:39:03.988198]: Annotating.
#> ✔ [2024-08-04 14:39:04.000528]: Annotation done.
#> ℹ [2024-08-04 14:39:04.001954]: Summarizing per sample.
#> ✔ [2024-08-04 14:39:04.01163]: Summarized.
#> ℹ [2024-08-04 14:39:04.013008]: Generating CopyNumber object.
#> ✔ [2024-08-04 14:39:04.014611]: Generated.
#> ℹ [2024-08-04 14:39:04.015975]: Validating object.
#> ✔ [2024-08-04 14:39:04.017364]: Done.
#> ℹ [2024-08-04 14:39:04.018803]: 0.086 secs elapsed.
cn2
#> An object of class CopyNumber
#> =============================
#> sample n_of_seg n_of_cnv n_of_amp n_of_del n_of_vchr cna_burden
#> <char> <int> <int> <int> <int> <int> <num>
#> 1: metastatic_tumor 180 110 92 18 21 0.361
# }