In this vignette, we would like to apply ezcox to a gene list containing oncogenes and tumor suppressors with TCGA datasets. All core features provided in ezcox will be presented. To focus on the presentation of ezcox features, we would only use mRNA expression data here. Without doubt, the features can be applied to any suitable variables, including:
- mRNA expression
- miRNA expression
- transcript expression
- protein expression
- mutation status
- methylation status
- promoter activity
- APOBEC activity
- clinical features
- …
All molecular profile data for public databases like TCGA can be obtained from UCSC Xena with UCSCXenaTools or UCSCXenaShiny.
library(UCSCXenaTools)
## =========================================================================================
## UCSCXenaTools version 1.4.7
## Project URL: https://github.com/ropensci/UCSCXenaTools
## Usages: https://cran.r-project.org/web/packages/UCSCXenaTools/vignettes/USCSXenaTools.html
##
## If you use it in published research, please cite:
## Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
## from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
## Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
## =========================================================================================
## --Enjoy it--
library(UCSCXenaShiny)
## =========================================================================================
## UCSCXenaShiny version 1.1.1
## Project URL: https://github.com/openbiox/UCSCXenaShiny
## Usages: https://openbiox.github.io/UCSCXenaShiny/
##
## If you use it in published research, please cite:
## Shixiang Wang, Yi Xiong, Longfei Zhao, Kai Gu, Yin Li, Fei Zhao, Jianfeng Li,
## Mingjie Wang, Haitao Wang, Ziyu Tao, Tao Wu, Yichao Zheng, Xuejun Li, Xue-Song Liu,
## UCSCXenaShiny: An R/CRAN Package for Interactive Analysis of UCSC Xena Data,
## Bioinformatics, 2021;, btab561, https://doi.org/10.1093/bioinformatics/btab561.
## =========================================================================================
## --Enjoy it--
library(ezcox)
## Welcome to 'ezcox' package!
## =======================================================================
## You are using ezcox version 1.0.0
##
## Project home : https://github.com/ShixiangWang/ezcox
## Documentation: https://shixiangwang.github.io/ezcox
##
## Run citation("ezcox") to see how to cite 'ezcox'.
## =======================================================================
##
library(readr)
library(dplyr)
##
## 载入程辑包:'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Data preparation
Read gene list
Firstly, we obtain driver gene list from IntOGen (referece: A compendium of mutational cancer driver genes).
<- read_tsv("IntOGen-DriverGenes.tsv", col_types = cols())
gene_table <- gene_table$Symbol gene_list
Read gene expression
Next, we obtain the gene expression data in TCGA.
For just one gene, we can use the query_pancan_value()
in UCSCXenaShiny
package.
<- query_pancan_value(gene_list[1])
expr ## Try querying data #1
## -> Checking if the dataset has probeMap...
## -> Done. ProbeMap is found.
## More info about dataset please run following commands:
## library(UCSCXenaTools)
## XenaGenerate(subset = XenaDatasets == "TcgaTargetGtex_rsem_gene_tpm") %>% XenaBrowse()
$expression[1:5]
expr## GTEX-S4Q7-0003-SM-3NM8M TCGA-19-1787-01 TCGA-S9-A7J2-01
## 4.785 5.887 5.517
## GTEX-QV31-1626-SM-2S1QC TCGA-G3-A3CH-11
## 4.431 2.382
For a few genes, fetch_dense_values()
in UCSCXenaTools
is more convenient. However, you need to previously confirm the UCSC Xena host and dataset ID (you can obtain this from UCSC Xena or UCSCXenaShiny
Shiny interface).
<- "https://toil.xenahubs.net"
host <- "tcga_RSEM_gene_tpm"
dataset
<- fetch_dense_values(host, dataset, gene_list, use_probeMap = TRUE)
expr
save(expr, file = "expr.RData")
-> Obtaining gene symbols...
-> Checking identifiers...
://toil.xenahubs.net dataset tcga_RSEM_gene_tpm
The following identifiers have been removed from host https1] "CNOT9" "NSD2"
[-> Done.
-> Checking samples...
-> Done.
-> Checking if the dataset has probeMap...
-> Done. ProbeMap is found.
-> Query done.
Merge data
Here, we merge TCGA clinical data, survival data, expression data with inner join.
load("expr.RData")
<- setdiff(gene_list, c("CNOT9", "NSD2"))
gene_list
<- expr %>%
df t() %>%
as.data.frame() %>%
::rownames_to_column("sample") %>%
tibbleas_tibble() %>%
inner_join(tcga_surv, by = "sample") %>%
inner_join(tcga_clinical, by = "sample") %>%
rename(age = age_at_initial_pathologic_diagnosis)
head(df)
## # A tibble: 6 × 595
## sample TP53 KRAS PIK3CA BRAF LRP1B KMT2D APC ARID1A KMT2C PTEN NRAS
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TCGA-19-… 5.89 3.95 2.75 3.71 0.537 4.00 3.01 5.09 3.90 4.59 5.06
## 2 TCGA-S9-… 5.52 3.98 3.91 3.59 3.86 4.77 5.8 4.51 4.95 4.37 3.61
## 3 TCGA-G3-… 2.38 1.82 1.05 1.00 -4.61 1.25 0.576 2.28 1.55 3.86 2.55
## 4 TCGA-EK-… 4.59 3.61 2.48 2.15 -9.97 3.98 2.16 4.24 2.18 5.16 4.56
## 5 TCGA-44-… 5.30 4.06 3.35 3.47 -4.04 5.12 2.78 5.67 4.17 4.79 5.14
## 6 TCGA-F4-… 6.86 3.39 2.44 2.90 -9.97 4.66 1.18 4.34 3.94 5.28 4.44
## # … with 583 more variables: CTNNB1 <dbl>, CDKN2A <dbl>, IDH1 <dbl>, RB1 <dbl>,
## # ATM <dbl>, NF1 <dbl>, SMAD4 <dbl>, FAT1 <dbl>, FBXW7 <dbl>, NOTCH1 <dbl>,
## # FAT4 <dbl>, EGFR <dbl>, ARID2 <dbl>, CREBBP <dbl>, KDM6A <dbl>, FAT3 <dbl>,
## # ATRX <dbl>, SETD2 <dbl>, ERBB2 <dbl>, NFE2L2 <dbl>, SMARCA4 <dbl>,
## # CDH1 <dbl>, SF3B1 <dbl>, VHL <dbl>, KEAP1 <dbl>, SPOP <dbl>, PBRM1 <dbl>,
## # EP300 <dbl>, FOXA1 <dbl>, PIK3R1 <dbl>, AR <dbl>, MAP3K1 <dbl>,
## # TRRAP <dbl>, ERBB3 <dbl>, GATA3 <dbl>, FGFR3 <dbl>, HRAS <dbl>, …
Scaling
To obtain pretty HR value, here we scale gene expression to 1-5, so every 1 increase in gene expression indicates 20% expression increase. As for HR, e.g., HR = 2
means hazard ratio would double when a gene expression per 20% increase.
<- function(x, by = 0.2) {
scaling <- quantile(x, seq(0, 1, by)) # some variables may have same value for 10% 20% etc.
y <- y[!c(FALSE, diff(y) == 0)]
y as.integer(cut(x, breaks = y, include.lowest = TRUE))
}
<- df %>%
df mutate_at(gene_list, scaling)
Simple usage
In this section, we will show simple usage of ezcox by constructing single unvariable and multivariable Cox models.
Single unvariable model
Here we simply construct a Cox model for overall survival (OS) only with TP53
.
<- ezcox(df, covariates = "TP53", time = "OS.time", status = "OS")
rv ## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
rv## # A tibble: 1 × 12
## Variable is_control contrast_level ref_level n_contrast n_ref beta HR
## <chr> <lgl> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 TP53 FALSE TP53 TP53 10496 10496 0.0393 1.04
## # … with 4 more variables: lower_95 <dbl>, upper_95 <dbl>, p.value <dbl>,
## # global.pval <dbl>
Useful information about the result model has been output as a data.frame
.
If you just want to show a forest plot for this model, you can use the following command. You only need to change the function name!
show_forest(df, covariates = "TP53", time = "OS.time", status = "OS")
## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Resized limits to included dashed line in forest panel
To simplify, use:
show_forest(df,
covariates = "TP53", time = "OS.time", status = "OS",
merge_models = TRUE, add_caption = FALSE
)## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Resized limits to included dashed line in forest panel
The data shows when TP53 gene expression increases, the patient would have poor clinical outcome.
Actually, you may want to obtain the model behind the plot, you can use the command to return model firstly:
<- ezcox(df, covariates = "TP53", time = "OS.time", status = "OS", return_models = TRUE)
rv ## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
rv## $res
## # A tibble: 1 × 12
## Variable is_control contrast_level ref_level n_contrast n_ref beta HR
## <chr> <lgl> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 TP53 FALSE TP53 TP53 10496 10496 0.0393 1.04
## # … with 4 more variables: lower_95 <dbl>, upper_95 <dbl>, p.value <dbl>,
## # global.pval <dbl>
##
## $models
## # A tibble: 1 × 5
## Variable control model_file model status
## <chr> <chr> <chr> <list> <lgl>
## 1 TP53 <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox/ezcox_… <coxp… TRUE
##
## attr(,"class")
## [1] "ezcox" "list"
## attr(,"controls")
## character(0)
Then you can get it:
get_models(rv)
## $`Surv ~ TP53`
## Call:
## coxph(formula = fm, data = data)
##
## coef exp(coef) se(coef) z p
## TP53 0.03933 1.04012 0.01252 3.142 0.00168
##
## Likelihood ratio test=9.88 on 1 df, p=0.001669
## n= 10434, number of events= 3251
## (因为不存在,62个观察量被删除了)
##
## attr(,"class")
## [1] "ezcox_models" "list"
## attr(,"has_control")
## [1] FALSE
Single multivariable model
It’s very easy to construct a multivariable model by introducing another parameter controls
. For example, if we want to explore the TP53
expression with patients’ age and clinical stage also taking into consideration.
Let’s clean the data firstly:
<- df %>%
df mutate(
stage = case_when(
%in% c("I", "Stage I", "Stage IA1", "Stage IA2", "Stage IB", "Stage IB1", "Stage IB2", "Stage IC") ~ "I",
clinical_stage %in% c("IIa", "IIb", "Stage II", "Stage IIA", "Stage IIA1", "Stage IIA2", "Stage IIB", "Stage IIC") ~ "II",
clinical_stage %in% c("III", "Stage III", "Stage IIIA", "Stage IIIB", "Stage IIIC", "Stage IIIC1", "Stage IIIC2") ~ "III",
clinical_stage %in% c("IVa", "IVb", "Stage IV", "Stage IVA", "Stage IVB", "Stage IVC") ~ "IV",
clinical_stage TRUE ~ NA_character_
),stage = factor(stage, levels = c("I", "II", "III", "IV"))
)
<- ezcox(df,
rv covariates = "TP53", controls = c("age", "stage"),
time = "OS.time", status = "OS", return_models = TRUE
)## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
rv## $res
## # A tibble: 5 × 12
## Variable is_control contrast_level ref_level n_contrast n_ref beta HR
## <chr> <lgl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 TP53 FALSE TP53 TP53 10496 10496 -0.0404 0.96
## 2 TP53 TRUE age age 10445 10445 0.022 1.02
## 3 TP53 TRUE II I 423 463 0.457 1.58
## 4 TP53 TRUE III I 711 463 1.1 3
## 5 TP53 TRUE IV I 445 463 1.37 3.93
## # … with 4 more variables: lower_95 <dbl>, upper_95 <dbl>, p.value <dbl>,
## # global.pval <dbl>
##
## $models
## # A tibble: 1 × 5
## Variable control model_file model status
## <chr> <chr> <chr> <list> <lgl>
## 1 TP53 age,stage /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox/ezco… <coxp… TRUE
##
## attr(,"class")
## [1] "ezcox" "list"
## attr(,"controls")
## [1] "age" "stage"
The code constructed a model as below:
names(get_models(rv))
## [1] "Surv ~ TP53 + age + stage"
For single model, we can exchange the role of covariable and control variables, they are the same model. The key point is how what we want to explore and how we treat and explain the model.
# If you passing the parameters below to ezcox() will return the same model
= "TP53", controls = c("age", "stage")
covariates = "age", controls = c("TP53", "stage")
covariates = "stage", controls = c("age", "TP53") covariates
Still, let’s show the result more clearly with forest plot:
show_forest(df,
covariates = "TP53", controls = c("age", "stage"),
time = "OS.time", status = "OS", merge_models = TRUE,
show_global_p = "bottom", add_caption = FALSE
)## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Warning in recalculate_width_panels(panel_positions, mapped_text =
## mapped_text, : Unable to resize forest panel to be smaller than its heading;
## consider a smaller text size
Ooops! Why TP53 has no significant association with OS when the clinical features have been included?
There are some reasons:
- The first reason we can read from the forest plots above. TP53 is not an independent variable, it has association with patient’s age and clinical stages. You could explore this with correlation analysis and box plots.
- In the model above, we ignored a very strong factor, i.e., “cancer type”. We already know that different tissues have different gene expression status. Actually, TP53 plays different roles in different cancer types. You can read some papers for more details.
Batch processing
Now, we are moving to the most powerful part of ezcox. Cox analysis is a common analysis technique to link variable to clinical outcome. In the omics era, Cox model batch processing is a basic strategy for screening clinical relevant variables, biomarker discovery and gene signature identification. However, all such analyses have been implemented with homebrew code in research community, thus lack of transparency and reproducibility. ezcox is born to address this issue for open science.
In batch processing, we provide execution in either sequential or parallel ways.
Considering only ~2,000 patients have stage status. The following code will only include age, cancer type, gender as control variables in the model construction, so most of the data (~10,000) can be utilized.
Sequential execution
The only parameter we need to change is covariates
, instead one variable, we pass a vector for batch processing.
<- ezcox(df,
bt covariates = gene_list, controls = c("age", "gender", "type"),
time = "OS.time", status = "OS",
keep_models = TRUE, return_models = FALSE
)# don't return models if you have many models in processing
That’s it.
::datatable(bt$res, caption = "Cox batch processing result for ~500 genes in TCGA dataset")
DT## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
::datatable(bt$models, caption = "The models have been stored in local files") DT
To note, we did not directly store all models from the batch processing in the result object bt
. Instead, we store the models to local file and link it to each covariates. On the one hand, we can store the modeling result for reproducible research and avoid repetitive calculation. On the other hand, we can filter model by simply filtering the data table bt$res
and retrieve specific models we required for downstream analysis and visualization.
Parallel execution
For parallel execution, we only need to change the function from ezcox()
to ezcox_parallel()
. At default, it would use multiple sessions to process the task. batch_size
is used to set the workload of each session.
<- ezcox_parallel(df,
bt2 covariates = gene_list, controls = c("age", "gender", "type"),
time = "OS.time", status = "OS",
keep_models = TRUE, return_models = FALSE, batch_size = 50
)
all.equal(bt$res, bt2$res)
## [1] TRUE
Visualization filtering
There are so many variables, we can’t visualize them in one forest plot. Next we need to filter the result before plotting.
Firstly, we quickly filter out all controls.
<- filter_ezcox(bt) bt3
Sort the result by p.value
and then show it with table.
$res <- bt3$res %>% arrange(p.value)
bt3::datatable(bt3$res, caption = "Core result from batch processing") DT
How many variables are statistical significant?
$res %>%
bt3filter(p.value < 0.05) %>%
nrow()
## [1] 248
That’s so many. In practice, we generally limit the final gene number <10. So for illustration, here we just keep the top ten.
<- get_models(bt3, variables = bt3$res$Variable[1:10]) mds
Next visualize the top 2:
show_models(mds[1:2],
format_options = forestmodel::forest_model_format_options(point_size = 2)
)
The plot is very big, such figure is only suitable for one or two target variables.
If we want to focus the gene list and include all of them in such a plot, how could we do that?
show_models(
mds,merge_models = TRUE,
drop_controls = TRUE
)## covariates=NULL but drop_controls=TRUE, detecting controls...
## Yes. Setting variables to keep...
## Done.
## Resized limits to included dashed line in forest panel
Group analysis
In our research experience, sometimes we need to explore one variable with Cox in different groups. Here, for example, could be sex, stage, cancer types. By this approach, we don’t need to adjust the group variable in Cox model any more, so we can focus on the variable importance in each subgroup.
Take KRAS as an example, here we draw how it affect survival in different cancer types with ezcox_group()
.
ezcox_group(df,
grp_var = "type", covariate = "KRAS",
time = "OS.time", status = "OS",
format_options = forestmodel::forest_model_format_options(point_size = 2)
)## => Processing variable ACC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BLCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BRCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable CESC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable CHOL
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable COAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable DLBC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable ESCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable GBM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable HNSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KICH
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRP
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LAML
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LGG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LIHC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable MESO
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable OV
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PAAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PCPG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PRAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable READ
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SARC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SKCM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable STAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable TGCT
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable THCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable THYM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UCEC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UCS
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UVM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## covariates=NULL but drop_controls=TRUE, detecting controls...
## No. Skipping...
## Done.
## Returns a list containing data and ggplot.
## $data
## $stats
## # A tibble: 33 × 13
## Group Variable is_control contrast_level ref_level n_contrast n_ref beta
## <chr> <chr> <lgl> <chr> <chr> <int> <int> <dbl>
## 1 ACC KRAS FALSE ACC ACC 77 77 0.611
## 2 BLCA KRAS FALSE BLCA BLCA 426 426 0.0675
## 3 BRCA KRAS FALSE BRCA BRCA 1211 1211 0.0419
## 4 CESC KRAS FALSE CESC CESC 309 309 0.147
## 5 CHOL KRAS FALSE CHOL CHOL 45 45 -0.0833
## 6 COAD KRAS FALSE COAD COAD 329 329 -0.0532
## 7 DLBC KRAS FALSE DLBC DLBC 47 47 0.106
## 8 ESCA KRAS FALSE ESCA ESCA 195 195 -0.076
## 9 GBM KRAS FALSE GBM GBM 165 165 -0.00971
## 10 HNSC KRAS FALSE HNSC HNSC 564 564 -0.0124
## # … with 23 more rows, and 5 more variables: HR <dbl>, lower_95 <dbl>,
## # upper_95 <dbl>, p.value <dbl>, global.pval <dbl>
##
## $models
## # A tibble: 33 × 6
## Group Variable control model_file model status
## <chr> <chr> <chr> <chr> <list> <lgl>
## 1 ACC KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 2 BLCA KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 3 BRCA KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 4 CESC KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 5 CHOL KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 6 COAD KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 7 DLBC KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 8 ESCA KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 9 GBM KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## 10 HNSC KRAS <NA> /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE
## # … with 23 more rows
##
## attr(,"class")
## [1] "ezcox"
##
## $plot
If you want to adjust other variables, that’s same as we did above.
ezcox_group(df,
grp_var = "type",
covariate = "KRAS", controls = c("age", "gender"),
time = "OS.time", status = "OS",
format_options = forestmodel::forest_model_format_options(point_size = 2)
)## => Processing variable ACC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BLCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BRCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable CESC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable CESC
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable CHOL
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable COAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable DLBC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable ESCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable GBM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable HNSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KICH
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRP
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LAML
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LGG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LIHC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable MESO
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable OV
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable OV
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable PAAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PCPG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PRAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable PRAD
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable READ
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SARC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SKCM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable STAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable TGCT
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable TGCT
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable THCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable THYM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UCEC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable UCEC
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable UCS
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable UCS
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable UVM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Skipping the following failed variables:
## KRAS, KRAS, KRAS, KRAS, KRAS, KRAS
## covariates=NULL but drop_controls=TRUE, detecting controls...
## Yes. Setting variables to keep...
## Done.
## Returns a list containing data and ggplot.
## $data
## $stats
## # A tibble: 93 × 13
## Group Variable is_control contrast_level ref_level n_contrast n_ref beta
## <chr> <chr> <lgl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 ACC KRAS FALSE ACC ACC 77 77 0.611
## 2 ACC KRAS TRUE age age 77 77 0.00895
## 3 ACC KRAS TRUE MALE FEMALE 31 46 0.2
## 4 BLCA KRAS FALSE BLCA BLCA 426 426 0.0847
## 5 BLCA KRAS TRUE age age 426 426 0.0323
## 6 BLCA KRAS TRUE MALE FEMALE 311 115 -0.0917
## 7 BRCA KRAS FALSE BRCA BRCA 1211 1211 0.0365
## 8 BRCA KRAS TRUE age age 1210 1210 0.0289
## 9 BRCA KRAS TRUE MALE FEMALE 13 1198 -0.731
## 10 CESC KRAS FALSE CESC CESC 309 309 NA
## # … with 83 more rows, and 5 more variables: HR <dbl>, lower_95 <dbl>,
## # upper_95 <dbl>, p.value <dbl>, global.pval <dbl>
##
## $models
## # A tibble: 33 × 6
## Group Variable control model_file model status
## <chr> <chr> <chr> <chr> <list> <lgl>
## 1 ACC KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 2 BLCA KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 3 BRCA KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 4 CESC KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <NULL> FALSE
## 5 CHOL KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 6 COAD KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 7 DLBC KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 8 ESCA KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 9 GBM KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## 10 HNSC KRAS age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE
## # … with 23 more rows
##
## attr(,"class")
## [1] "ezcox"
##
## $plot
If you don’t check the caption in the two forest plots above. They seems same! So please note when you do such analysis, we recommend you add the caption and describe how you apply ezcox in figure legend and method part, otherwise it’s very easy to misguide the readers.