Get Copy Number Sequence Similarity or Distance Matrix

get_score_matrix(
  x,
  sub_mat = NULL,
  simple_version = FALSE,
  block_size = NULL,
  dislike = FALSE,
  cores = 1L,
  verbose = FALSE
)

Arguments

x	a character vector of coded copy number sequences (valid letters are A to X).
sub_mat	a substitution matrix in which each element gives the score to add for a pair of letters; see `build_sub_matrix()`. Default is `NULL`, in which case the longest common substring method is used.
simple_version	if `TRUE`, use only the segmental copy number value.
block_size	a block size for aggregation, designed for big data: results from adjacent sequences are averaged within each block to reduce the size of the result matrix.
dislike	if `TRUE`, returns a dissimilarity matrix instead of a similarity matrix.
cores	number of computer cores to use, default is `1`. The computation is already very fast, so using more cores typically does not speed it up.
verbose	if `TRUE`, print extra messages. Note that this slows down the computation.
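
To make the `sub_mat = NULL` case more concrete, the following base R sketch computes the length of the longest common substring for every pair of sequences and collects the results in a similarity matrix. It is only an illustration of the general technique: the helper `lcs_length()` and the toy vector `seqs_demo` are hypothetical names, and this page does not confirm how the package itself computes or scales its scores.

```r
# Length of the longest common substring of two strings (base R only).
# Hypothetical helper for illustration; not part of the package.
lcs_length <- function(a, b) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  # prev[j + 1] is the length of the common run ending at the previous
  # letter of x and at y[j]; prev[1] is a zero sentinel.
  prev <- integer(length(y) + 1L)
  best <- 0L
  for (i in seq_along(x)) {
    cur <- integer(length(y) + 1L)
    for (j in seq_along(y)) {
      if (x[i] == y[j]) {
        # extend the run that ended one letter earlier in both strings
        cur[j + 1L] <- prev[j] + 1L
        best <- max(best, cur[j + 1L])
      }
    }
    prev <- cur
  }
  best
}

# Toy sequences (assumed for illustration; letters A to X only)
seqs_demo <- c("ABCDE", "XBCDA", "AAAAA")

# Pairwise similarity matrix
sim <- outer(seqs_demo, seqs_demo, Vectorize(lcs_length))
dimnames(sim) <- list(seqs_demo, seqs_demo)
sim
```

Here "ABCDE" and "XBCDA" share the substring "BCD", so their entry is 3, while each sequence compared with itself scores its full length of 5.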

Value

a score matrix: a similarity matrix by default, or a dissimilarity matrix if `dislike = TRUE`.
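
As a rough picture of what `block_size` does to the result, the sketch below averages a square score matrix over blocks of adjacent rows and columns. It assumes plain block means that include every cell of a block; the helper `block_mean()` and the toy matrix `m` are hypothetical, and the package may treat details such as self-comparisons or an incomplete last block differently.

```r
# Average a square score matrix over blocks of adjacent rows and columns.
# Hypothetical helper for illustration; not part of the package.
block_mean <- function(mat, block_size) {
  n <- nrow(mat)
  # block index of each sequence: 1, 1, 2, 2, ... when block_size = 2
  grp <- ceiling(seq_len(n) / block_size)
  k <- max(grp)
  # indicator matrix: rows are sequences, columns are blocks
  ind <- outer(grp, seq_len(k), "==") * 1
  # sum of scores in each block pair, divided by the number of cells
  sums <- t(ind) %*% mat %*% ind
  counts <- tcrossprod(colSums(ind))
  out <- sums / counts
  dimnames(out) <- list(paste0("block", seq_len(k)), paste0("block", seq_len(k)))
  out
}

# Toy 4 x 4 matrix (assumed for illustration)
m <- matrix(1:16, nrow = 4)
block_mean(m, 2L)
```

With four sequences and a block size of 2, the 4 x 4 input shrinks to a 2 x 2 matrix in which, for example, the top-left entry is the mean of the four scores among the first two sequences.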

Examples

load(system.file("extdata", "toy_segTab.RData",
  package = "CNVMotif", mustWork = TRUE
))
x <- transform_seqs(segTabs)
x
seqs <- extract_seqs(x$dt)
seqs
seqs2 <- extract_seqs(x$dt, flexible_approach = TRUE)
seqs2

mat <- get_score_matrix(seqs$keep, x$mat, verbose = TRUE)
mat

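## with dislike = TRUE, check whether each dissimilarity
## equals 120 minus the corresponding similarity
mat2 <- get_score_matrix(seqs$keep, x$mat, dislike = TRUE)
identical(mat2, 120L - mat)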

mat_b <- get_score_matrix(seqs$keep, x$mat, block_size = 2L)
## block1 represents the first 2 sequences
## block2 represents the 3rd, 4th sequences
## ...
mat_b

## without a substitution matrix, the longest common substring method is used
mat_c <- get_score_matrix(seqs$keep)
mat_c
mat_d <- get_score_matrix(seqs$keep, dislike = TRUE)
mat_d
# \donttest{
if (requireNamespace("doParallel")) {
  mock_seqs <- sapply(1:10000, function(x) {
    paste(sample(LETTERS[1:24], 5, replace = TRUE), collapse = "")
  })

  system.time(
    y1 <- get_score_matrix(mock_seqs, x$mat, cores = 1)
  )

  system.time(
    y2 <- get_score_matrix(mock_seqs, x$mat, cores = 2)
  )

  all.equal(y1, y2)
}
# }