ezcox: an R Package for Cox Model Batch Processing and Visualization

In this vignette, we would like to apply ezcox to a gene list containing oncogenes and tumor suppressors with TCGA datasets. All core features provided in ezcox will be presented. To focus on the presentation of ezcox features, we would only use mRNA expression data here. Without doubt, the features can be applied to any suitable variables, including:

mRNA expression
miRNA expression
transcript expression
protein expression
mutation status
methylation status
promoter activity
APOBEC activity
clinical features
…

All molecular profile data for public databases like TCGA can be obtained from UCSC Xena with UCSCXenaTools or UCSCXenaShiny.

library(UCSCXenaTools)
## =========================================================================================
## UCSCXenaTools version 1.4.7
## Project URL: https://github.com/ropensci/UCSCXenaTools
## Usages: https://cran.r-project.org/web/packages/UCSCXenaTools/vignettes/USCSXenaTools.html
## 
## If you use it in published research, please cite:
## Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
##   from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
##   Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
## =========================================================================================
##                               --Enjoy it--
library(UCSCXenaShiny)
## =========================================================================================
## UCSCXenaShiny version 1.1.1
## Project URL: https://github.com/openbiox/UCSCXenaShiny
## Usages: https://openbiox.github.io/UCSCXenaShiny/
## 
## If you use it in published research, please cite:
##   Shixiang Wang, Yi Xiong, Longfei Zhao, Kai Gu, Yin Li, Fei Zhao, Jianfeng Li,
##   Mingjie Wang, Haitao Wang, Ziyu Tao, Tao Wu, Yichao Zheng, Xuejun Li, Xue-Song Liu,
##   UCSCXenaShiny: An R/CRAN Package for Interactive Analysis of UCSC Xena Data, 
##   Bioinformatics, 2021;, btab561, https://doi.org/10.1093/bioinformatics/btab561.
## =========================================================================================
##                               --Enjoy it--
library(ezcox)
## Welcome to 'ezcox' package!
## =======================================================================
## You are using ezcox version 1.0.0
## 
## Project home : https://github.com/ShixiangWang/ezcox
## Documentation: https://shixiangwang.github.io/ezcox
## 
## Run citation("ezcox") to see how to cite 'ezcox'.
## =======================================================================
## 
library(readr)
library(dplyr)
## 
## 载入程辑包：'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data preparation

Read gene list

Firstly, we obtain driver gene list from IntOGen (referece: A compendium of mutational cancer driver genes).

gene_table <- read_tsv("IntOGen-DriverGenes.tsv", col_types = cols())
gene_list <- gene_table$Symbol

Read gene expression

Next, we obtain the gene expression data in TCGA.

For just one gene, we can use the query_pancan_value() in UCSCXenaShiny package.

expr <- query_pancan_value(gene_list[1])
## Try querying data #1
## -> Checking if the dataset has probeMap...
## -> Done. ProbeMap is found.
## More info about dataset please run following commands:
##   library(UCSCXenaTools)
##   XenaGenerate(subset = XenaDatasets == "TcgaTargetGtex_rsem_gene_tpm") %>% XenaBrowse()
expr$expression[1:5]
## GTEX-S4Q7-0003-SM-3NM8M         TCGA-19-1787-01         TCGA-S9-A7J2-01 
##                   4.785                   5.887                   5.517 
## GTEX-QV31-1626-SM-2S1QC         TCGA-G3-A3CH-11 
##                   4.431                   2.382

For a few genes, fetch_dense_values() in UCSCXenaTools is more convenient. However, you need to previously confirm the UCSC Xena host and dataset ID (you can obtain this from UCSC Xena or UCSCXenaShiny Shiny interface).

host <- "https://toil.xenahubs.net"
dataset <- "tcga_RSEM_gene_tpm"

expr <- fetch_dense_values(host, dataset, gene_list, use_probeMap = TRUE)

save(expr, file = "expr.RData")

-> Obtaining gene symbols...
-> Checking identifiers...                                                                                                                             
The following identifiers have been removed from host https://toil.xenahubs.net dataset tcga_RSEM_gene_tpm
[1] "CNOT9" "NSD2" 
-> Done.
-> Checking samples...
-> Done.
-> Checking if the dataset has probeMap...
-> Done. ProbeMap is found.
-> Query done.

Merge data

Here, we merge TCGA clinical data, survival data, expression data with inner join.

load("expr.RData")

gene_list <- setdiff(gene_list, c("CNOT9", "NSD2"))

df <- expr %>%
  t() %>%
  as.data.frame() %>%
  tibble::rownames_to_column("sample") %>%
  as_tibble() %>%
  inner_join(tcga_surv, by = "sample") %>%
  inner_join(tcga_clinical, by = "sample") %>%
  rename(age = age_at_initial_pathologic_diagnosis)

head(df)
## # A tibble: 6 × 595
##   sample     TP53  KRAS PIK3CA  BRAF  LRP1B KMT2D   APC ARID1A KMT2C  PTEN  NRAS
##   <chr>     <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 TCGA-19-…  5.89  3.95   2.75  3.71  0.537  4.00 3.01    5.09  3.90  4.59  5.06
## 2 TCGA-S9-…  5.52  3.98   3.91  3.59  3.86   4.77 5.8     4.51  4.95  4.37  3.61
## 3 TCGA-G3-…  2.38  1.82   1.05  1.00 -4.61   1.25 0.576   2.28  1.55  3.86  2.55
## 4 TCGA-EK-…  4.59  3.61   2.48  2.15 -9.97   3.98 2.16    4.24  2.18  5.16  4.56
## 5 TCGA-44-…  5.30  4.06   3.35  3.47 -4.04   5.12 2.78    5.67  4.17  4.79  5.14
## 6 TCGA-F4-…  6.86  3.39   2.44  2.90 -9.97   4.66 1.18    4.34  3.94  5.28  4.44
## # … with 583 more variables: CTNNB1 <dbl>, CDKN2A <dbl>, IDH1 <dbl>, RB1 <dbl>,
## #   ATM <dbl>, NF1 <dbl>, SMAD4 <dbl>, FAT1 <dbl>, FBXW7 <dbl>, NOTCH1 <dbl>,
## #   FAT4 <dbl>, EGFR <dbl>, ARID2 <dbl>, CREBBP <dbl>, KDM6A <dbl>, FAT3 <dbl>,
## #   ATRX <dbl>, SETD2 <dbl>, ERBB2 <dbl>, NFE2L2 <dbl>, SMARCA4 <dbl>,
## #   CDH1 <dbl>, SF3B1 <dbl>, VHL <dbl>, KEAP1 <dbl>, SPOP <dbl>, PBRM1 <dbl>,
## #   EP300 <dbl>, FOXA1 <dbl>, PIK3R1 <dbl>, AR <dbl>, MAP3K1 <dbl>,
## #   TRRAP <dbl>, ERBB3 <dbl>, GATA3 <dbl>, FGFR3 <dbl>, HRAS <dbl>, …

Scaling

To obtain pretty HR value, here we scale gene expression to 1-5, so every 1 increase in gene expression indicates 20% expression increase. As for HR, e.g., HR = 2 means hazard ratio would double when a gene expression per 20% increase.

scaling <- function(x, by = 0.2) {
  y <- quantile(x, seq(0, 1, by)) # some variables may have same value for 10% 20% etc.
  y <- y[!c(FALSE, diff(y) == 0)]
  as.integer(cut(x, breaks = y, include.lowest = TRUE))
}

df <- df %>%
  mutate_at(gene_list, scaling)

Simple usage

In this section, we will show simple usage of ezcox by constructing single unvariable and multivariable Cox models.

Single unvariable model

Here we simply construct a Cox model for overall survival (OS) only with TP53.

rv <- ezcox(df, covariates = "TP53", time = "OS.time", status = "OS")
## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
rv
## # A tibble: 1 × 12
##   Variable is_control contrast_level ref_level n_contrast n_ref   beta    HR
##   <chr>    <lgl>      <chr>          <chr>          <int> <int>  <dbl> <dbl>
## 1 TP53     FALSE      TP53           TP53           10496 10496 0.0393  1.04
## # … with 4 more variables: lower_95 <dbl>, upper_95 <dbl>, p.value <dbl>,
## #   global.pval <dbl>

Useful information about the result model has been output as a data.frame.

If you just want to show a forest plot for this model, you can use the following command. You only need to change the function name!

show_forest(df, covariates = "TP53", time = "OS.time", status = "OS")
## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Resized limits to included dashed line in forest panel

To simplify, use:

show_forest(df,
  covariates = "TP53", time = "OS.time", status = "OS",
  merge_models = TRUE, add_caption = FALSE
)
## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Resized limits to included dashed line in forest panel

The data shows when TP53 gene expression increases, the patient would have poor clinical outcome.

Actually, you may want to obtain the model behind the plot, you can use the command to return model firstly:

rv <- ezcox(df, covariates = "TP53", time = "OS.time", status = "OS", return_models = TRUE)
## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
rv
## $res
## # A tibble: 1 × 12
##   Variable is_control contrast_level ref_level n_contrast n_ref   beta    HR
##   <chr>    <lgl>      <chr>          <chr>          <int> <int>  <dbl> <dbl>
## 1 TP53     FALSE      TP53           TP53           10496 10496 0.0393  1.04
## # … with 4 more variables: lower_95 <dbl>, upper_95 <dbl>, p.value <dbl>,
## #   global.pval <dbl>
## 
## $models
## # A tibble: 1 × 5
##   Variable control model_file                                      model  status
##   <chr>    <chr>   <chr>                                           <list> <lgl> 
## 1 TP53     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox/ezcox_… <coxp… TRUE  
## 
## attr(,"class")
## [1] "ezcox" "list" 
## attr(,"controls")
## character(0)

Then you can get it:

get_models(rv)
## $`Surv ~ TP53`
## Call:
## coxph(formula = fm, data = data)
## 
##         coef exp(coef) se(coef)     z       p
## TP53 0.03933   1.04012  0.01252 3.142 0.00168
## 
## Likelihood ratio test=9.88  on 1 df, p=0.001669
## n= 10434, number of events= 3251 
##    (因为不存在，62个观察量被删除了)
## 
## attr(,"class")
## [1] "ezcox_models" "list"        
## attr(,"has_control")
## [1] FALSE

Single multivariable model

It’s very easy to construct a multivariable model by introducing another parameter controls. For example, if we want to explore the TP53 expression with patients’ age and clinical stage also taking into consideration.

Let’s clean the data firstly:

df <- df %>%
  mutate(
    stage = case_when(
      clinical_stage %in% c("I", "Stage I", "Stage IA1", "Stage IA2", "Stage IB", "Stage IB1", "Stage IB2", "Stage IC") ~ "I",
      clinical_stage %in% c("IIa", "IIb", "Stage II", "Stage IIA", "Stage IIA1", "Stage IIA2", "Stage IIB", "Stage IIC") ~ "II",
      clinical_stage %in% c("III", "Stage III", "Stage IIIA", "Stage IIIB", "Stage IIIC", "Stage IIIC1", "Stage IIIC2") ~ "III",
      clinical_stage %in% c("IVa", "IVb", "Stage IV", "Stage IVA", "Stage IVB", "Stage IVC") ~ "IV",
      TRUE ~ NA_character_
    ),
    stage = factor(stage, levels = c("I", "II", "III", "IV"))
  )

rv <- ezcox(df,
  covariates = "TP53", controls = c("age", "stage"),
  time = "OS.time", status = "OS", return_models = TRUE
)
## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
rv
## $res
## # A tibble: 5 × 12
##   Variable is_control contrast_level ref_level n_contrast n_ref    beta    HR
##   <chr>    <lgl>      <chr>          <chr>          <dbl> <dbl>   <dbl> <dbl>
## 1 TP53     FALSE      TP53           TP53           10496 10496 -0.0404  0.96
## 2 TP53     TRUE       age            age            10445 10445  0.022   1.02
## 3 TP53     TRUE       II             I                423   463  0.457   1.58
## 4 TP53     TRUE       III            I                711   463  1.1     3   
## 5 TP53     TRUE       IV             I                445   463  1.37    3.93
## # … with 4 more variables: lower_95 <dbl>, upper_95 <dbl>, p.value <dbl>,
## #   global.pval <dbl>
## 
## $models
## # A tibble: 1 × 5
##   Variable control   model_file                                    model  status
##   <chr>    <chr>     <chr>                                         <list> <lgl> 
## 1 TP53     age,stage /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox/ezco… <coxp… TRUE  
## 
## attr(,"class")
## [1] "ezcox" "list" 
## attr(,"controls")
## [1] "age"   "stage"

The code constructed a model as below:

names(get_models(rv))
## [1] "Surv ~ TP53 + age + stage"

For single model, we can exchange the role of covariable and control variables, they are the same model. The key point is how what we want to explore and how we treat and explain the model.

# If you passing the parameters below to ezcox() will return the same model
covariates = "TP53", controls = c("age", "stage")
covariates = "age", controls = c("TP53", "stage")
covariates = "stage", controls = c("age", "TP53")

Still, let’s show the result more clearly with forest plot:

show_forest(df,
  covariates = "TP53", controls = c("age", "stage"),
  time = "OS.time", status = "OS", merge_models = TRUE,
  show_global_p = "bottom", add_caption = FALSE
)
## => Processing variable TP53
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Warning in recalculate_width_panels(panel_positions, mapped_text =
## mapped_text, : Unable to resize forest panel to be smaller than its heading;
## consider a smaller text size

Ooops! Why TP53 has no significant association with OS when the clinical features have been included?

There are some reasons:

The first reason we can read from the forest plots above. TP53 is not an independent variable, it has association with patient’s age and clinical stages. You could explore this with correlation analysis and box plots.
In the model above, we ignored a very strong factor, i.e., “cancer type”. We already know that different tissues have different gene expression status. Actually, TP53 plays different roles in different cancer types. You can read some papers for more details.

source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5352183/

Batch processing

Now, we are moving to the most powerful part of ezcox. Cox analysis is a common analysis technique to link variable to clinical outcome. In the omics era, Cox model batch processing is a basic strategy for screening clinical relevant variables, biomarker discovery and gene signature identification. However, all such analyses have been implemented with homebrew code in research community, thus lack of transparency and reproducibility. ezcox is born to address this issue for open science.

In batch processing, we provide execution in either sequential or parallel ways.

Considering only ~2,000 patients have stage status. The following code will only include age, cancer type, gender as control variables in the model construction, so most of the data (~10,000) can be utilized.

Sequential execution

The only parameter we need to change is covariates, instead one variable, we pass a vector for batch processing.

bt <- ezcox(df,
  covariates = gene_list, controls = c("age", "gender", "type"),
  time = "OS.time", status = "OS",
  keep_models = TRUE, return_models = FALSE
)
# don't return models if you have many models in processing

That’s it.

DT::datatable(bt$res, caption = "Cox batch processing result for ~500 genes in TCGA dataset")
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html

DT::datatable(bt$models, caption = "The models have been stored in local files")

To note, we did not directly store all models from the batch processing in the result object bt. Instead, we store the models to local file and link it to each covariates. On the one hand, we can store the modeling result for reproducible research and avoid repetitive calculation. On the other hand, we can filter model by simply filtering the data table bt$res and retrieve specific models we required for downstream analysis and visualization.

Parallel execution

For parallel execution, we only need to change the function from ezcox() to ezcox_parallel(). At default, it would use multiple sessions to process the task. batch_size is used to set the workload of each session.

bt2 <- ezcox_parallel(df,
  covariates = gene_list, controls = c("age", "gender", "type"),
  time = "OS.time", status = "OS",
  keep_models = TRUE, return_models = FALSE, batch_size = 50
)

all.equal(bt$res, bt2$res)
## [1] TRUE

Visualization filtering

There are so many variables, we can’t visualize them in one forest plot. Next we need to filter the result before plotting.

Firstly, we quickly filter out all controls.

bt3 <- filter_ezcox(bt)

Sort the result by p.value and then show it with table.

bt3$res <- bt3$res %>% arrange(p.value)
DT::datatable(bt3$res, caption = "Core result from batch processing")

How many variables are statistical significant?

bt3$res %>%
  filter(p.value < 0.05) %>%
  nrow()
## [1] 248

That’s so many. In practice, we generally limit the final gene number <10. So for illustration, here we just keep the top ten.

mds <- get_models(bt3, variables = bt3$res$Variable[1:10])

Next visualize the top 2:

show_models(mds[1:2],
  format_options = forestmodel::forest_model_format_options(point_size = 2)
)

The plot is very big, such figure is only suitable for one or two target variables.

If we want to focus the gene list and include all of them in such a plot, how could we do that?

show_models(
  mds,
  merge_models = TRUE,
  drop_controls = TRUE
)
## covariates=NULL but drop_controls=TRUE, detecting controls...
## Yes. Setting variables to keep...
## Done.
## Resized limits to included dashed line in forest panel

Group analysis

In our research experience, sometimes we need to explore one variable with Cox in different groups. Here, for example, could be sex, stage, cancer types. By this approach, we don’t need to adjust the group variable in Cox model any more, so we can focus on the variable importance in each subgroup.

Take KRAS as an example, here we draw how it affect survival in different cancer types with ezcox_group().

ezcox_group(df,
  grp_var = "type", covariate = "KRAS",
  time = "OS.time", status = "OS",
  format_options = forestmodel::forest_model_format_options(point_size = 2)
)
## => Processing variable ACC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BLCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BRCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable CESC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable CHOL
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable COAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable DLBC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable ESCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable GBM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable HNSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KICH
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRP
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LAML
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LGG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LIHC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable MESO
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable OV
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PAAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PCPG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PRAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable READ
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SARC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SKCM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable STAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable TGCT
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable THCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable THYM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UCEC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UCS
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UVM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## covariates=NULL but drop_controls=TRUE, detecting controls...
## No. Skipping...
## Done.
## Returns a list containing data and ggplot.
## $data
## $stats
## # A tibble: 33 × 13
##    Group Variable is_control contrast_level ref_level n_contrast n_ref     beta
##    <chr> <chr>    <lgl>      <chr>          <chr>          <int> <int>    <dbl>
##  1 ACC   KRAS     FALSE      ACC            ACC               77    77  0.611  
##  2 BLCA  KRAS     FALSE      BLCA           BLCA             426   426  0.0675 
##  3 BRCA  KRAS     FALSE      BRCA           BRCA            1211  1211  0.0419 
##  4 CESC  KRAS     FALSE      CESC           CESC             309   309  0.147  
##  5 CHOL  KRAS     FALSE      CHOL           CHOL              45    45 -0.0833 
##  6 COAD  KRAS     FALSE      COAD           COAD             329   329 -0.0532 
##  7 DLBC  KRAS     FALSE      DLBC           DLBC              47    47  0.106  
##  8 ESCA  KRAS     FALSE      ESCA           ESCA             195   195 -0.076  
##  9 GBM   KRAS     FALSE      GBM            GBM              165   165 -0.00971
## 10 HNSC  KRAS     FALSE      HNSC           HNSC             564   564 -0.0124 
## # … with 23 more rows, and 5 more variables: HR <dbl>, lower_95 <dbl>,
## #   upper_95 <dbl>, p.value <dbl>, global.pval <dbl>
## 
## $models
## # A tibble: 33 × 6
##    Group Variable control model_file                               model  status
##    <chr> <chr>    <chr>   <chr>                                    <list> <lgl> 
##  1 ACC   KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  2 BLCA  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  3 BRCA  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  4 CESC  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  5 CHOL  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  6 COAD  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  7 DLBC  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  8 ESCA  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
##  9 GBM   KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
## 10 HNSC  KRAS     <NA>    /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ezcox… <coxp… TRUE  
## # … with 23 more rows
## 
## attr(,"class")
## [1] "ezcox"
## 
## $plot

If you want to adjust other variables, that’s same as we did above.

ezcox_group(df,
  grp_var = "type",
  covariate = "KRAS", controls = c("age", "gender"),
  time = "OS.time", status = "OS",
  format_options = forestmodel::forest_model_format_options(point_size = 2)
)
## => Processing variable ACC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BLCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable BRCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable CESC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable CESC
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable CHOL
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable COAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable DLBC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable ESCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable GBM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable HNSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KICH
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable KIRP
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LAML
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LGG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LIHC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable LUSC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable MESO
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable OV
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable OV
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable PAAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PCPG
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable PRAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable PRAD
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable READ
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SARC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable SKCM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable STAD
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable TGCT
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable TGCT
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable THCA
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable THYM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## => Processing variable UCEC
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable UCEC
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable UCS
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Something wrong with variable UCS
## ====> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
## ==> Done.
## => Processing variable UVM
## ==> Building Surv object...
## ==> Building Cox model...
## ==> Done.
## Skipping the following failed variables:
##  KRAS, KRAS, KRAS, KRAS, KRAS, KRAS
## covariates=NULL but drop_controls=TRUE, detecting controls...
## Yes. Setting variables to keep...
## Done.
## Returns a list containing data and ggplot.
## $data
## $stats
## # A tibble: 93 × 13
##    Group Variable is_control contrast_level ref_level n_contrast n_ref     beta
##    <chr> <chr>    <lgl>      <chr>          <chr>          <dbl> <dbl>    <dbl>
##  1 ACC   KRAS     FALSE      ACC            ACC               77    77  0.611  
##  2 ACC   KRAS     TRUE       age            age               77    77  0.00895
##  3 ACC   KRAS     TRUE       MALE           FEMALE            31    46  0.2    
##  4 BLCA  KRAS     FALSE      BLCA           BLCA             426   426  0.0847 
##  5 BLCA  KRAS     TRUE       age            age              426   426  0.0323 
##  6 BLCA  KRAS     TRUE       MALE           FEMALE           311   115 -0.0917 
##  7 BRCA  KRAS     FALSE      BRCA           BRCA            1211  1211  0.0365 
##  8 BRCA  KRAS     TRUE       age            age             1210  1210  0.0289 
##  9 BRCA  KRAS     TRUE       MALE           FEMALE            13  1198 -0.731  
## 10 CESC  KRAS     FALSE      CESC           CESC             309   309 NA      
## # … with 83 more rows, and 5 more variables: HR <dbl>, lower_95 <dbl>,
## #   upper_95 <dbl>, p.value <dbl>, global.pval <dbl>
## 
## $models
## # A tibble: 33 × 6
##    Group Variable control    model_file                            model  status
##    <chr> <chr>    <chr>      <chr>                                 <list> <lgl> 
##  1 ACC   KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
##  2 BLCA  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
##  3 BRCA  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
##  4 CESC  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <NULL> FALSE 
##  5 CHOL  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
##  6 COAD  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
##  7 DLBC  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
##  8 ESCA  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
##  9 GBM   KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
## 10 HNSC  KRAS     age,gender /Volumes/Extra/R/Rtmp//RtmpCWQCzZ/ez… <coxp… TRUE  
## # … with 23 more rows
## 
## attr(,"class")
## [1] "ezcox"
## 
## $plot

If you don’t check the caption in the two forest plots above. They seems same! So please note when you do such analysis, we recommend you add the caption and describe how you apply ezcox in figure legend and method part, otherwise it’s very easy to misguide the readers.