9 Datasets

9.1 Reference annotation

sigminer stores several reference annotation datasets for internal calculations. They can also be exported for other uses with either data() or get_genome_annotation().

Currently, there are the following datasets:

  • centromeres.hg19
  • centromeres.hg38
  • chromsize.hg19
  • chromsize.hg38
  • cytobands.hg19
  • cytobands.hg38

An example is given below:

data("centromeres.hg19")
head(centromeres.hg19)
##   chrom left.base right.base
## 1  chr1 121535434  124535434
## 2  chr2  92326171   95326171
## 3  chr3  90504854   93504854
## 4  chr4  49660117   52660117
## 5  chr5  46405641   49405641
## 6  chr6  58830166   61830166

get_genome_annotation() gives finer control over the returned data.frame, such as which chromosomes and which genome build to include.

get_genome_annotation(
  data_type = "chr_size",
  chrs = c("chr1", "chr10", "chr20"),
  genome_build = "hg19"
)
##   chrom      size
## 1  chr1 249250621
## 2 chr10 135534747
## 3 chr20  63025520

For more details, see ?get_genome_annotation.
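As a small illustration of how these annotation tables can be combined, the sketch below re-creates a few rows from the two outputs above as plain data.frames (so it runs without sigminer installed) and derives approximate arm lengths. The column names p_arm, q_arm and centromere_width are hypothetical, and treating the centromere start as the end of the p arm is a simplifying assumption, not something sigminer defines.

```r
# Values copied from the hg19 outputs shown above.
centromeres <- data.frame(
  chrom = c("chr1", "chr2", "chr3"),
  left.base = c(121535434, 92326171, 90504854),
  right.base = c(124535434, 95326171, 93504854),
  stringsAsFactors = FALSE
)
chrom_size <- data.frame(
  chrom = c("chr1", "chr10", "chr20"),
  size = c(249250621, 135534747, 63025520),
  stringsAsFactors = FALSE
)

# Inner join on chromosome name; only chr1 appears in both excerpts.
arms <- merge(centromeres, chrom_size, by = "chrom")

# Hypothetical derived columns (assumption: p arm ends where the
# centromere region starts, q arm starts where it ends).
arms$p_arm <- arms$left.base
arms$q_arm <- arms$size - arms$right.base
arms$centromere_width <- arms$right.base - arms$left.base

print(arms)
```

With the full tables returned by data() or get_genome_annotation(), the same merge would cover every chromosome rather than chr1 alone.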

9.2 Copy number components setting

The CN.features dataset is a predefined component table used to identify copy number signatures with the “Wang” method. Users can define a custom table with a similar structure and pass it to functions such as sig_tally().

Details on how this dataset was generated can be found at https://github.com/ShixiangWang/sigminer/blob/master/data-raw/CN-features.R.

CN.features
##     feature         component label  min max
##  1:  BP10MB         BP10MB[0] point    0   0
##  2:  BP10MB         BP10MB[1] point    1   1
##  3:  BP10MB         BP10MB[2] point    2   2
##  4:  BP10MB         BP10MB[3] point    3   3
##  5:  BP10MB         BP10MB[4] point    4   4
##  6:  BP10MB         BP10MB[5] point    5   5
##  7:  BP10MB        BP10MB[>5] range    5 Inf
##  8:   BPArm          BPArm[0] point    0   0
##  9:   BPArm          BPArm[1] point    1   1
## 10:   BPArm          BPArm[2] point    2   2
## 11:   BPArm          BPArm[3] point    3   3
## 12:   BPArm          BPArm[4] point    4   4
## 13:   BPArm          BPArm[5] point    5   5
## 14:   BPArm          BPArm[6] point    6   6
## 15:   BPArm          BPArm[7] point    7   7
## 16:   BPArm          BPArm[8] point    8   8
## 17:   BPArm          BPArm[9] point    9   9
## 18:   BPArm         BPArm[10] point   10  10
## 19:   BPArm BPArm[>10 & <=20] range   10  20
## 20:   BPArm BPArm[>20 & <=30] range   20  30
## 21:   BPArm        BPArm[>30] range   30 Inf
## 22:      CN             CN[0] point    0   0
## 23:      CN             CN[1] point    1   1
## 24:      CN             CN[2] point    2   2
## 25:      CN             CN[3] point    3   3
## 26:      CN             CN[4] point    4   4
## 27:      CN      CN[>4 & <=8] range    4   8
## 28:      CN            CN[>8] range    8 Inf
## 29:    CNCP           CNCP[0] point    0   0
## 30:    CNCP           CNCP[1] point    1   1
## 31:    CNCP           CNCP[2] point    2   2
## 32:    CNCP           CNCP[3] point    3   3
## 33:    CNCP           CNCP[4] point    4   4
## 34:    CNCP    CNCP[>4 & <=8] range    4   8
## 35:    CNCP          CNCP[>8] range    8 Inf
## 36:    OsCN           OsCN[0] point    0   0
## 37:    OsCN           OsCN[1] point    1   1
## 38:    OsCN           OsCN[2] point    2   2
## 39:    OsCN           OsCN[3] point    3   3
## 40:    OsCN           OsCN[4] point    4   4
## 41:    OsCN   OsCN[>4 & <=10] range    4  10
## 42:    OsCN         OsCN[>10] range   10 Inf
## 43:      SS           SS[<=2] range -Inf   2
## 44:      SS      SS[>2 & <=3] range    2   3
## 45:      SS      SS[>3 & <=4] range    3   4
## 46:      SS      SS[>4 & <=5] range    4   5
## 47:      SS      SS[>5 & <=6] range    5   6
## 48:      SS      SS[>6 & <=7] range    6   7
## 49:      SS      SS[>7 & <=8] range    7   8
## 50:      SS            SS[>8] range    8 Inf
## 51:    NC50         NC50[<=2] range -Inf   2
## 52:    NC50           NC50[3] point    3   3
## 53:    NC50           NC50[4] point    4   4
## 54:    NC50           NC50[5] point    5   5
## 55:    NC50           NC50[6] point    6   6
## 56:    NC50           NC50[7] point    7   7
## 57:    NC50          NC50[>7] range    7 Inf
## 58:   BoChr          BoChr[1] point    1   1
## 59:   BoChr          BoChr[2] point    2   2
## 60:   BoChr          BoChr[3] point    3   3
## 61:   BoChr          BoChr[4] point    4   4
## 62:   BoChr          BoChr[5] point    5   5
## 63:   BoChr          BoChr[6] point    6   6
## 64:   BoChr          BoChr[7] point    7   7
## 65:   BoChr          BoChr[8] point    8   8
## 66:   BoChr          BoChr[9] point    9   9
## 67:   BoChr         BoChr[10] point   10  10
## 68:   BoChr         BoChr[11] point   11  11
## 69:   BoChr         BoChr[12] point   12  12
## 70:   BoChr         BoChr[13] point   13  13
## 71:   BoChr         BoChr[14] point   14  14
## 72:   BoChr         BoChr[15] point   15  15
## 73:   BoChr         BoChr[16] point   16  16
## 74:   BoChr         BoChr[17] point   17  17
## 75:   BoChr         BoChr[18] point   18  18
## 76:   BoChr         BoChr[19] point   19  19
## 77:   BoChr         BoChr[20] point   20  20
## 78:   BoChr         BoChr[21] point   21  21
## 79:   BoChr         BoChr[22] point   22  22
## 80:   BoChr         BoChr[23] point   23  23
##     feature         component label  min max
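A custom table with the same five columns (feature, component, label, min, max) might be built as in the sketch below. The helper make_components() is hypothetical and written in base R only; the cut points are illustrative and are not a recommendation. It assumes, based on the table above, that "range" rows describe intervals that exclude min and include max.

```r
# Hypothetical helper: build the rows for one feature.
# points: integer values that each get their own "point" component.
# breaks: increasing boundaries for "range" components, e.g. c(4, 8, Inf)
#         yields (4, 8] and (8, Inf).
make_components <- function(feature, points, breaks) {
  point_rows <- data.frame(
    feature = feature,
    component = sprintf("%s[%d]", feature, points),
    label = "point",
    min = points,
    max = points,
    stringsAsFactors = FALSE
  )
  lower <- head(breaks, -1)
  upper <- tail(breaks, -1)
  # Open-ended ranges are named like "CN[>8]", bounded ones like "CN[>4 & <=8]".
  range_names <- ifelse(
    is.infinite(upper),
    sprintf("%s[>%s]", feature, lower),
    sprintf("%s[>%s & <=%s]", feature, lower, upper)
  )
  range_rows <- data.frame(
    feature = feature,
    component = range_names,
    label = "range",
    min = lower,
    max = upper,
    stringsAsFactors = FALSE
  )
  rbind(point_rows, range_rows)
}

# The CN rows mirror rows 22 to 28 of CN.features; the BP10MB rows use a
# coarser, purely illustrative binning.
custom_features <- rbind(
  make_components("CN", points = 0:4, breaks = c(4, 8, Inf)),
  make_components("BP10MB", points = 0:2, breaks = c(2, Inf))
)

print(custom_features)
```

CN.features itself prints as a data.table, so a table built this way may need converting (for example with data.table::as.data.table()) before being passed on. The argument of sig_tally() that accepts it should be checked in ?sig_tally.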