Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Survival probabilities for stratified Cox model #63

Closed
hfrick opened this issue Jun 8, 2021 · 2 comments · Fixed by #66
Closed

Survival probabilities for stratified Cox model #63

hfrick opened this issue Jun 8, 2021 · 2 comments · Fixed by #66
Labels
bug an unexpected problem or unintended behavior

Comments

@hfrick
Copy link
Member

hfrick commented Jun 8, 2021

Something is wrong with the survival probabilities for the stratified Cox model (for both the "survival" and the "glmnet" engine): The prediction for time = 20 for the first observation is missing.

library(censored)
#> Loading required package: parsnip
library(survival)
library(tidyr)

# survival engine
spec_survival <- proportional_hazards() %>%
  set_mode("censored regression") %>%
  set_engine("survival")

set.seed(14)
fit_survival <- fit(spec_survival,
                    Surv(stop, event) ~ rx + size + number + strata(enum),
                    data = bladder)
#> Warning in .recacheSubclasses(def@className, def, env): undefined subclass
#> "numericVector" of class "Mnumeric"; definition not updated

pred_survival <- predict(fit_survival, new_data = bladder[1:2, ], 
                         type = "survival", time = c(10, 20))
pred_survival
#> # A tibble: 2 x 1
#>   .pred               
#>   <list>              
#> 1 <tibble[,2] [1 × 2]>
#> 2 <tibble[,2] [2 × 2]>
unnest(pred_survival, cols = c(".pred"))
#> # A tibble: 3 x 2
#>   .time .pred_survival
#>   <dbl>          <dbl>
#> 1    10          0.636
#> 2    10          0.934
#> 3    20          0.726

# glmnet engine
spec_glmnet <- proportional_hazards(penalty = 0.123) %>%
  set_mode("censored regression") %>%
  set_engine("glmnet")

set.seed(14)
fit_glmnet <- fit(spec_glmnet,
                  Surv(stop, event) ~ rx + size + number + strata(enum),
                  data = bladder)

pred_glmnet <- predict(fit_glmnet, new_data = bladder[1:2, ], type = "survival",
                       time = c(10, 20), penalty = 0.1)
pred_glmnet
#> # A tibble: 2 x 1
#>   .pred               
#>   <list>              
#> 1 <tibble[,2] [1 × 2]>
#> 2 <tibble[,2] [2 × 2]>
unnest(pred_glmnet, cols = c(".pred"))
#> # A tibble: 3 x 2
#>   .time .pred_survival
#>   <dbl>          <dbl>
#> 1    10          0.623
#> 2    10          0.929
#> 3    20          0.713

Created on 2021-06-08 by the reprex package (v2.0.0)

@hfrick hfrick added the bug an unexpected problem or unintended behavior label Jun 8, 2021
@hfrick
Copy link
Member Author

hfrick commented Jun 9, 2021

interpolate_km_values() relies on the event times (.time) in the KM estimator to be the same for all observations in new_data so that the join via .cuts works properly. For the stratified Cox model that is the case only within each stratum, thus we need to group/nest by strata.

For the glmnet engine, we already have the info of which observation in new_data is in which stratum and it speeds things up a lot compared to going back to the (slow but correct) nesting by observation. For the survival engine, we still need to access/generate that info.

@github-actions
Copy link

github-actions bot commented Nov 5, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant