Feature request: Predict survival time #37

Closed
hfrick opened this issue Jan 8, 2024 · 9 comments
Comments

@hfrick

hfrick commented Jan 8, 2024

This issue showed someone trying to predict survival time with aorsf via tidymodels. We currently only have predictions of the survival probability implemented in censored. Looking around aorsf I didn't see any prediction type that we could wrap for "survival time". Is that correct? Would you consider implementing that? 🙌

@bcjaeger
Collaborator

bcjaeger commented Jan 8, 2024

Thank you! I'd be happy to implement this. The biggest obstacle on my end is deciding how to do it. There are a few ways that could work:

  1. Compute median time-to-event in each predicted leaf and then aggregate (similar to bag_tree-rpart.R file in censored)
  2. Compute probability of censoring weights (PCW), then fit a regression forest with those weights (similar to how you compute C-stat/Brier score using inverse PCW, building on ideas in this paper)
  3. Compute predicted mortality with aorsf and then use one of the existing survival time prediction methods to convert the predicted mortality to predicted time to event.

My thoughts on these:

  • I'd estimate that option 1. would take the most time to develop, followed by option 2, and then option 3.
  • I think 1. would have to be implemented in aorsf, 2. could be implemented in either aorsf or censored, and so could 3.
  • I have no idea which method would actually work best! That's not ideal because I'm tempted to develop all three and then compare them, but I realize you may not want to wait that long =/

@hfrick, do you have thoughts or preferences on how I should proceed? My initial impression is that I like option 1 because it would be the most efficient computationally. However, it would also take me a little while to get it working and then run it through proper tests to make sure it's right.
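Option 1 can be illustrated with a minimal base-R sketch. This is not aorsf's implementation: the forest is represented by hypothetical leaf-id vectors, `km_median()` is an illustrative helper, and averaging the per-tree medians is an assumed aggregation rule.

```r
# Kaplan-Meier median: first event time at which S(t) drops to 0.5 or below.
# Returns NA when the curve never reaches 0.5 (e.g. heavy censoring).
km_median <- function(time, status) {
  surv <- 1
  for (tt in sort(unique(time[status == 1]))) {
    n_risk  <- sum(time >= tt)
    n_event <- sum(time == tt & status == 1)
    surv <- surv * (1 - n_event / n_risk)
    if (surv <= 0.5 + 1e-9) return(tt)  # small tolerance for rounding
  }
  NA_real_
}

# Toy training data (made up for illustration)
time   <- c(5, 8, 12, 20, 25, 30, 40, 50, 60, 70)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 1)

# Hypothetical forest: one vector of leaf ids per tree
leaves <- list(
  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3)
)

# Median time-to-event in every leaf of every tree
leaf_medians <- lapply(leaves, function(leaf) {
  tapply(seq_along(time), leaf, function(i) km_median(time[i], status[i]))
})

# A new observation that lands in leaf 1 of tree 1 and leaf 2 of tree 2
new_leaf <- c(1, 2)
per_tree <- vapply(seq_along(leaves), function(b) {
  as.numeric(leaf_medians[[b]][as.character(new_leaf[b])])
}, numeric(1))

# Aggregate across trees (mean is an assumption; a median would also work)
pred_time <- mean(per_tree, na.rm = TRUE)
```

Leaves whose Kaplan-Meier curve never reaches 0.5 return `NA` here, so a real implementation would need a rule for those cases.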

@hfrick
Author

hfrick commented Jan 10, 2024

Option 1 of median time-to-event is, I think, the most common option and sounds the most straightforward in terms of definition. It'd be great to see that feature live in aorsf given that I think it'd be attractive for users both of aorsf directly and via a framework. Re time: no particular rush. We are currently actively working on survival analysis in tidymodels and want to release a whole lot of new features across the framework in Q1 but we can integrate survival time prediction via aorsf in censored at any time.

@bcjaeger
Collaborator

Thank you! I appreciate your thoughts on this very much. I will move ahead using median time-to-event and keep you updated.

@hfrick
Author

hfrick commented Jan 10, 2024

Thanks so much for your willingness to implement this! 🙏

@bcjaeger
Collaborator

Hello @hfrick! I'm happy to share an update. With aorsf version 0.1.3 and higher, models can predict survival time (reprex below). I have done some preliminary assessment of the predicted survival times and they seem to be a little less effective at discriminating high versus low risk cases than the mortality (pred_type = 'mort') option. This makes sense to me. I think mortality predictions do a better job of quantifying observed events.

Do you think it would be feasible for me to propose making predicted mortality the default for aorsf in yardstick::concordance_survival(), instead of predicted time? If so, I'd be happy to work on a PR implementing that change. If not, I'm happy to at least resolve the compatibility issue noted in tidymodels/yardstick#475

library(aorsf)

fit_time <- orsf(pbc_orsf, time + status ~ . - id, 
                 oobag_pred_type = 'time')

predict(fit_time, new_data = pbc_orsf[1:3, ], pred_type = 'time')
#>          [,1]
#> [1,]  360.580
#> [2,] 2555.766
#> [3,] 1195.855

fit_time$eval_oobag$stat_values
#>           [,1]
#> [1,] 0.8360331

fit_mort <- orsf_update(fit_time, oobag_pred_type = 'mort')

fit_mort$eval_oobag$stat_values
#>           [,1]
#> [1,] 0.8435335

Created on 2024-01-22 with reprex v2.1.0
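When comparing discrimination between the two prediction types, orientation matters: a longer predicted time means lower risk, while higher predicted mortality means higher risk. The sketch below is a plain base-R version of Harrell's C-statistic on made-up data, not aorsf's or yardstick's internal code; `harrell_c()` and the toy vectors are illustrative only.

```r
# Harrell's C: among comparable pairs (i has an observed event before time j),
# the fraction where i was assigned the higher risk. Ties in risk count 0.5.
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (status[i] == 1 && time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5
        }
      }
    }
  }
  concordant / comparable
}

# Toy data (made up for illustration)
time      <- c(2, 4, 6, 8, 10)
status    <- c(1, 1, 0, 1, 1)
pred_time <- c(3, 5, 4, 9, 12)

# Predicted time must be negated to act as a risk score
c_time_as_risk <- harrell_c(time, status, -pred_time)

# Without the sign flip the statistic is mirrored around 0.5
c_unflipped <- harrell_c(time, status, pred_time)
```

With these toy values, 7 of the 8 comparable pairs are concordant after the sign flip.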

@hfrick
Author

hfrick commented Jan 23, 2024

That's awesome, thank you! 🎉 I've opened tidymodels/censored#301 to enable that in censored. Given that there is such a high focus on consistency across tidymodels, I don't think we are likely to change what the default is for any one engine. At that abstraction level, the goal is typically to not have to remember details about an engine. Mortality predictions are also currently not part of tidymodels but that is something that might change in future. If that happens, that would be the opportunity to enable that for aorsf and others and possibly revisit defaults.

@bcjaeger
Collaborator

I totally understand prioritizing consistency! This is a good incentive for me to investigate more thoughtful ways for aorsf to predict survival time. I will check out tidymodels/censored#301 and prepare a PR. If there is a deadline for that feature being in censored, just let me know and I'll be happy to coordinate.

Thanks for your help improving aorsf! It is great working with you.

@bblodfon

bblodfon commented Mar 1, 2024

Hi @bcjaeger! Sorry for intruding in this issue :)

  1. Could we maybe have the survival time in mlr3extralearners as well (this would be a response prediction type, see mlr3proba::.surv_return())? https://github.com/mlr-org/mlr3extralearners/blob/main/R/learner_aorsf_surv_aorsf.R#L178
  2. I was just reading a paper where the authors calculate survival time from a distribution S(t). In the end, a time-interval weighted approach might be applicable to aorsf and easy to implement, as you get the survival matrix S(t) (observations x times) and can easily implement equation (6) from that paper (I think it should have a denominator of (t_max - t_min) in there as well...). Of course, these types of calculations might not be ideal in cases where the distribution is improper, as was shown in the C-hacking paper, fig 2
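One common way to turn a survival matrix into a single time per observation is the area under each survival curve up to the last time point (a restricted mean survival time). The base-R sketch below shows that idea only; it is not necessarily the paper's equation (6), and `rmst_from_surv()` and the toy matrix are hypothetical.

```r
# Restricted mean survival time from a survival matrix.
# surv:  observations x times matrix of S(t) values
# times: increasing time points matching the columns of surv
rmst_from_surv <- function(surv, times) {
  stopifnot(ncol(surv) == length(times), !is.unsorted(times))
  t_grid <- c(0, times)     # start the curve at t = 0
  s_grid <- cbind(1, surv)  # with S(0) = 1
  widths <- diff(t_grid)    # interval lengths used as weights
  # Treat S(t) as a step function holding its left-endpoint value
  left <- s_grid[, -ncol(s_grid), drop = FALSE]
  as.numeric(left %*% widths)
}

# Toy survival matrix: 2 observations x 3 time points (made up)
times <- c(1, 2, 4)
surv  <- rbind(
  c(0.9, 0.5, 0.2),
  c(1.0, 0.8, 0.6)
)

pred_time <- rmst_from_surv(surv, times)
```

Because neither toy curve reaches zero by the last time point, both results are restricted to the window ending at `max(times)`, which is the improper-distribution caveat raised above.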

@bcjaeger
Collaborator

bcjaeger commented Mar 1, 2024

Very nice! I will try this out


3 participants