cumulative-incidence

The goal of the cumulative-incidence project is to provide a tutorial for JHS-HWG members who want to create cumulative incidence plots using ggplot2. To demonstrate the code in this repository, we use the flchain data from the survival package. These data are loaded and printed here.

This is a stratified random sample containing 1/2 of the subjects from a study of the relationship between serum free light chain (FLC) and mortality. The original sample contains samples on approximately 2/3 of the residents of Olmsted County aged 50 or greater.

# the flchain data are in the survival package
library(survival)
# tidyverse is used for convenience
library(tidyverse)

# take a look at the data, 
# each column is printed with a few values to the right of its column name
glimpse(flchain)
#> Rows: 7,874
#> Columns: 11
#> $ age        <dbl> 97, 92, 94, 92, 93, 90, 90, 90, 93, 91, 96, 90, 90, 97, ...
#> $ sex        <fct> F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,...
#> $ sample.yr  <dbl> 1997, 2000, 1997, 1996, 1996, 1997, 1996, 1999, 1996, 19...
#> $ kappa      <dbl> 5.700, 0.870, 4.360, 2.420, 1.320, 2.010, 0.430, 2.470, ...
#> $ lambda     <dbl> 4.860, 0.683, 3.850, 2.220, 1.690, 1.860, 0.880, 2.700, ...
#> $ flc.grp    <dbl> 10, 1, 10, 9, 6, 9, 1, 10, 9, 6, 9, 10, 3, 10, 9, 6, 7, ...
#> $ creatinine <dbl> 1.7, 0.9, 1.4, 1.0, 1.1, 1.0, 0.8, 1.2, 1.2, 0.8, 1.3, 1...
#> $ mgus       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
#> $ futime     <int> 85, 1281, 69, 115, 1039, 1355, 2851, 372, 3309, 1326, 27...
#> $ death      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
#> $ chapter    <fct> Circulatory, Neoplasms, Circulatory, Circulatory, Circul...

Create cumulative incidence data

The cuminc() function in the cmprsk package can generate data on the cumulative incidence of a censored outcome, accounting for competing risks if necessary.

# loading the cmprsk package to gain access to cuminc() function
library(cmprsk)

cml_inc = cuminc(
 # the first argument is the failure times, i.e., the time to event variable
 ftime   = flchain$futime, 
 # the second argument is the status variable (1 = event, 0 = censored)
 # (if you have competing risks, you will have values of 1, 2, 3, ... etc 
 #  for the different event types)
 fstatus = flchain$death,
 # the third argument is the group variable, which will be used to create 
 # one cumulative incidence curve per group.
 group   = flchain$creatinine > 1
)
#> 1350 cases omitted due to missing values

Note here that we have 1350 observations as they had missing values for creatinine. You should be aware of this mechanic in case you have missing values in your grouping variable.

Next we use the ggcompetingrisks package to create a dataset with cumulative incidence estimates. ggcompetingrisks also makes plots, but they are a little harder to customize than a standard ggplot2 plot.

# load the survminer package to gain access to the
# ggcompetingrisks function 
library(survminer)
#> Warning: package 'survminer' was built under R version 4.0.3
#> Loading required package: ggpubr
#> Warning: package 'ggpubr' was built under R version 4.0.3

ggdat <- cml_inc %>%
  ggcompetingrisks(conf.int = T) %>%
  getElement('data') %>%
  as_tibble() %>%
  select(
    time,
    group, # same group as in cuminc(),
    est, # est is cumulative incidence,
    std # std is the standard deviation of est
  )

print(ggdat)
#> # A tibble: 3,538 x 4
#>     time group      est      std
#>    <dbl> <chr>    <dbl>    <dbl>
#>  1     0 FALSE 0        0       
#>  2     1 FALSE 0        0       
#>  3     1 FALSE 0.000873 0.000504
#>  4     2 FALSE 0.000873 0.000504
#>  5     2 FALSE 0.00116  0.000582
#>  6     3 FALSE 0.00116  0.000582
#>  7     3 FALSE 0.00175  0.000713
#>  8     4 FALSE 0.00175  0.000713
#>  9     4 FALSE 0.00204  0.000770
#> 10     6 FALSE 0.00204  0.000770
#> # ... with 3,528 more rows

one figure, two groups, no standard errors

This code will generate a figure that directly compares the incidence of an event between two or more groups. The code utilizes the data that we created using cuminc() above.

ggplot(ggdat, aes(x = time, y = est, col = group))+
 geom_line(size=0.9)+
 # This is a theme that is consistent with many journals expectations
 # for figures
 theme_bw()+
 theme(
  panel.grid = element_blank(),
  text = element_text(size = 13, color = 'black', face = 'bold')
 ) +
 scale_y_continuous(label=scales::percent) +
 labs(x='Time, years', y = 'Cumulative Incidence, stroke or CHD')

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
R		R
README_files/figure-gfm		README_files/figure-gfm
.gitignore		.gitignore
README.Rmd		README.Rmd
README.md		README.md
cumulative-incidence.Rproj		cumulative-incidence.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cumulative-incidence

Create cumulative incidence data

one figure, two groups, no standard errors

About

Releases

Packages

Contributors 2

Languages

jhs-hwg/cumulative-incidence

Folders and files

Latest commit

History

Repository files navigation

cumulative-incidence

Create cumulative incidence data

one figure, two groups, no standard errors

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages