Chapter 11 Datasets
11.1 Reference Annotation
sigminer ships several reference annotation datasets that it uses for internal calculations. They can also be loaded for other purposes with either data() or get_genome_annotation().
The following datasets are currently available:
centromeres.hg19
centromeres.hg38
chromsize.hg19
chromsize.hg38
cytobands.hg19
cytobands.hg38
For example, a dataset can be loaded with data() and inspected:
data("centromeres.hg19")
head(centromeres.hg19)
#> chrom left.base right.base
#> 1 chr1 121535434 124535434
#> 2 chr2 92326171 95326171
#> 3 chr3 90504854 93504854
#> 4 chr4 49660117 52660117
#> 5 chr5 46405641 49405641
#> 6 chr6 58830166 61830166
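As a small illustration of working with these tables, the sketch below computes centromere widths from the left.base and right.base columns. To keep it runnable without sigminer installed, it copies the six rows printed above into a hypothetical data.frame named centro; with the package loaded, centromeres.hg19 could presumably be used in its place.

```r
# Rows copied by hand from the head(centromeres.hg19) output above;
# `centro` is an illustrative name, not a sigminer object.
centro <- data.frame(
  chrom = paste0("chr", 1:6),
  left.base = c(121535434, 92326171, 90504854, 49660117, 46405641, 58830166),
  right.base = c(124535434, 95326171, 93504854, 52660117, 49405641, 61830166),
  stringsAsFactors = FALSE
)

# Width of each centromere region in base pairs
centro$width <- centro$right.base - centro$left.base
centro[, c("chrom", "width")]
```

For the six rows shown, every region spans 3,000,000 bp.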
get_genome_annotation() gives finer control over the returned data.frame, such as which data type, chromosomes, and genome build to include.
get_genome_annotation(
data_type = "chr_size",
chrs = c("chr1", "chr10", "chr20"),
genome_build = "hg19"
)
#> chrom size
#> 1 chr1 249250621
#> 2 chr10 135534747
#> 3 chr20 63025520
For more details, see ?get_genome_annotation.
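The same function can presumably return the other annotation types as well. The sketch below assumes that data_type also accepts "centro_loc" and "cytobands" and that genome_build accepts "hg38"; these values are recalled from the package help page rather than shown in this chapter, so check ?get_genome_annotation before relying on them. The call is guarded so that the snippet does nothing when sigminer is not installed.

```r
# Assumed argument values: "centro_loc" and "cytobands" are not shown in
# this chapter; verify them in ?get_genome_annotation.
centro_hg38 <- NULL
bands_hg38 <- NULL

if (requireNamespace("sigminer", quietly = TRUE)) {
  # Centromere locations for two chromosomes on hg38
  centro_hg38 <- sigminer::get_genome_annotation(
    data_type = "centro_loc",
    chrs = c("chr1", "chr2"),
    genome_build = "hg38"
  )
  # Cytoband table restricted to chr1 on hg38
  bands_hg38 <- sigminer::get_genome_annotation(
    data_type = "cytobands",
    chrs = "chr1",
    genome_build = "hg38"
  )
  print(head(centro_hg38))
  print(head(bands_hg38))
}
```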
11.2 Copy Number Components Setting
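The CN.features dataset is a predefined component table used to identify copy number signatures with the “Wang” method. Each row defines one component of a feature, either as a single value (label point, where min equals max) or as an interval (label range). Users can define a custom table with the same structure and pass it to functions such as sig_tally().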
The script that generates this dataset is available at https://github.com/ShixiangWang/sigminer/blob/master/data-raw/CN-features.R.
CN.features
#> feature component label min max
#> 1: BP10MB BP10MB[0] point 0 0
#> 2: BP10MB BP10MB[1] point 1 1
#> 3: BP10MB BP10MB[2] point 2 2
#> 4: BP10MB BP10MB[3] point 3 3
#> 5: BP10MB BP10MB[4] point 4 4
#> 6: BP10MB BP10MB[5] point 5 5
#> 7: BP10MB BP10MB[>5] range 5 Inf
#> 8: BPArm BPArm[0] point 0 0
#> 9: BPArm BPArm[1] point 1 1
#> 10: BPArm BPArm[2] point 2 2
#> 11: BPArm BPArm[3] point 3 3
#> 12: BPArm BPArm[4] point 4 4
#> 13: BPArm BPArm[5] point 5 5
#> 14: BPArm BPArm[6] point 6 6
#> 15: BPArm BPArm[7] point 7 7
#> 16: BPArm BPArm[8] point 8 8
#> 17: BPArm BPArm[9] point 9 9
#> 18: BPArm BPArm[10] point 10 10
#> 19: BPArm BPArm[>10 & <=20] range 10 20
#> 20: BPArm BPArm[>20 & <=30] range 20 30
#> 21: BPArm BPArm[>30] range 30 Inf
#> 22: CN CN[0] point 0 0
#> 23: CN CN[1] point 1 1
#> 24: CN CN[2] point 2 2
#> 25: CN CN[3] point 3 3
#> 26: CN CN[4] point 4 4
#> 27: CN CN[>4 & <=8] range 4 8
#> 28: CN CN[>8] range 8 Inf
#> 29: CNCP CNCP[0] point 0 0
#> 30: CNCP CNCP[1] point 1 1
#> 31: CNCP CNCP[2] point 2 2
#> 32: CNCP CNCP[3] point 3 3
#> 33: CNCP CNCP[4] point 4 4
#> 34: CNCP CNCP[>4 & <=8] range 4 8
#> 35: CNCP CNCP[>8] range 8 Inf
#> 36: OsCN OsCN[0] point 0 0
#> 37: OsCN OsCN[1] point 1 1
#> 38: OsCN OsCN[2] point 2 2
#> 39: OsCN OsCN[3] point 3 3
#> 40: OsCN OsCN[4] point 4 4
#> 41: OsCN OsCN[>4 & <=10] range 4 10
#> 42: OsCN OsCN[>10] range 10 Inf
#> 43: SS SS[<=2] range -Inf 2
#> 44: SS SS[>2 & <=3] range 2 3
#> 45: SS SS[>3 & <=4] range 3 4
#> 46: SS SS[>4 & <=5] range 4 5
#> 47: SS SS[>5 & <=6] range 5 6
#> 48: SS SS[>6 & <=7] range 6 7
#> 49: SS SS[>7 & <=8] range 7 8
#> 50: SS SS[>8] range 8 Inf
#> 51: NC50 NC50[<=2] range -Inf 2
#> 52: NC50 NC50[3] point 3 3
#> 53: NC50 NC50[4] point 4 4
#> 54: NC50 NC50[5] point 5 5
#> 55: NC50 NC50[6] point 6 6
#> 56: NC50 NC50[7] point 7 7
#> 57: NC50 NC50[>7] range 7 Inf
#> 58: BoChr BoChr[1] point 1 1
#> 59: BoChr BoChr[2] point 2 2
#> 60: BoChr BoChr[3] point 3 3
#> 61: BoChr BoChr[4] point 4 4
#> 62: BoChr BoChr[5] point 5 5
#> 63: BoChr BoChr[6] point 6 6
#> 64: BoChr BoChr[7] point 7 7
#> 65: BoChr BoChr[8] point 8 8
#> 66: BoChr BoChr[9] point 9 9
#> 67: BoChr BoChr[10] point 10 10
#> 68: BoChr BoChr[11] point 11 11
#> 69: BoChr BoChr[12] point 12 12
#> 70: BoChr BoChr[13] point 13 13
#> 71: BoChr BoChr[14] point 14 14
#> 72: BoChr BoChr[15] point 15 15
#> 73: BoChr BoChr[16] point 16 16
#> 74: BoChr BoChr[17] point 17 17
#> 75: BoChr BoChr[18] point 18 18
#> 76: BoChr BoChr[19] point 19 19
#> 77: BoChr BoChr[20] point 20 20
#> 78: BoChr BoChr[21] point 21 21
#> 79: BoChr BoChr[22] point 22 22
#> 80: BoChr BoChr[23] point 23 23
#> feature component label min max
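A custom table with the same five columns might be sketched as follows. The component boundaries here are hypothetical and chosen only for illustration. Judging from the component names in the printed table, range rows appear to describe intervals that exclude min and include max, and the sketch follows that reading. The commented call at the end assumes that sig_tally() takes the table through an argument named feature_setting and that cn_obj is an existing copy number object; neither is shown in this chapter, so confirm both in ?sig_tally.

```r
# Hypothetical custom component table mirroring the columns of CN.features:
# feature, component, label, min, max.
custom_features <- data.frame(
  feature   = c("CN", "CN", "CN", "CN"),
  component = c("CN[<=1]", "CN[2]", "CN[>2 & <=4]", "CN[>4]"),
  label     = c("range", "point", "range", "range"),
  min       = c(-Inf, 2, 2, 4),
  max       = c(1, 2, 4, Inf),
  stringsAsFactors = FALSE
)

# Sanity checks: point rows have min == max, range rows have min < max
is_point <- custom_features$label == "point"
stopifnot(
  all(custom_features$min[is_point] == custom_features$max[is_point]),
  all(custom_features$min[!is_point] < custom_features$max[!is_point])
)

custom_features

# Assumed usage (not run); `cn_obj` and `feature_setting` are assumptions:
# sig_tally(cn_obj, method = "Wang", feature_setting = custom_features)
```

Because CN.features prints as a data.table, it may be necessary to convert the custom table with data.table::as.data.table() before passing it on.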