
Handling of observations with birth_time==death_time #1604

Open
user799595 opened this issue Apr 11, 2024 · 1 comment

@user799595

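import lifelines

# subject 0 enters at t=1 and dies at t=1; subject 1 enters at t=0 and is censored at t=2
kmf = lifelines.KaplanMeierFitter()
kmf.fit([1, 2], event_observed=[1, 0], entry=[1, 0])
print(kmf.survival_function_)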

Expected:

          KM_estimate
timeline             
0.0               1.0
1.0               0.5
2.0               0.5

Actual:

          KM_estimate
timeline             
0.0               1.0
1.0               0.0
2.0               0.0
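To see where the 0.0 comes from, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator with delayed entry. It is not lifelines' implementation; `km_with_entry` and its `entrants_before_deaths` flag are hypothetical names. With the flag on, a subject that enters at time t is already in the risk set when deaths at t are counted. With it off, entrants at t are only added after the deaths at t have been processed, which is our reading of the ordering described in the lifelines comment from #497 quoted below.

```python
def km_with_entry(durations, events, entry, entrants_before_deaths=True):
    """Product-limit estimate with delayed entry (left truncation).

    Hypothetical helper for illustration only; not part of lifelines.
    Returns {time: survival estimate} at each distinct duration.
    """
    survival = 1.0
    estimate = {}
    for t in sorted(set(durations)):
        # number of observed deaths exactly at time t
        d = sum(1 for dur, e in zip(durations, events) if dur == t and e)
        if entrants_before_deaths:
            # subjects entering at t are already at risk at t
            n = sum(1 for dur, b in zip(durations, entry) if b <= t <= dur)
        else:
            # subjects entering at t are only added after deaths at t are counted
            n = sum(1 for dur, b in zip(durations, entry) if b < t <= dur)
        if n > 0:
            # skip times with an empty risk set to avoid dividing by zero
            survival *= 1 - d / n
        estimate[t] = survival
    return estimate


print(km_with_entry([1, 2], [1, 0], [1, 0]))
print(km_with_entry([1, 2], [1, 0], [1, 0], entrants_before_deaths=False))
```

With the flag on, the sketch gives 0.5 at both t=1 and t=2, matching the expected table. With it off, subject 0's death at t=1 is counted against a risk set that holds only subject 1, so the factor is 1 - 1/1 = 0, matching the actual output.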

I've read #497 and the corresponding comment in the lifelines code:

        # Why subtract entrants like this? see https://github.com/CamDavidsonPilon/lifelines/issues/497
        # specifically, we kill people, compute the ratio, and then "add" the entrants.
        # This can cause a problem if there are late entrants that enter but population=0, as
        # then we have log(0 - 0). We later ffill to fix this.
        # The only exception to this rule is the first period, where entrants happen _prior_ to deaths.

But I can't wrap my head around what this is saying. How could entrants not happen prior to deaths? If I have an observation with birth_time==death_time, does that mean it died before it was born?

I thought that the likelihood contribution of an observation with entry (birth) time $b$ and death or censoring time $d$ is:

  • $P(T = d | T \ge b)$ for observed events
  • $P(T > d | T \ge b)$ for unobserved events
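Under that likelihood, the expected output can be checked by hand. Assume, purely for illustration, a discrete-time model on the times {0, 1, 2} with hazards $h_t = P(T = t \mid T \ge t)$, and write $b$ for an observation's entry time and $d$ for its death or censoring time. The two observations then contribute:

```latex
\begin{aligned}
P(T = 1 \mid T \ge 1) &= h_1 && \text{(observation 0: } b = d = 1\text{, observed)} \\
P(T > 2 \mid T \ge 0) &= (1 - h_0)(1 - h_1)(1 - h_2) && \text{(observation 1: } b = 0,\ d = 2\text{, censored)} \\
L(h) &= h_1 (1 - h_0)(1 - h_1)(1 - h_2) \\
\hat h_0 = \hat h_2 = 0,\quad \hat h_1 &= \arg\max_{h} \, h(1 - h) = \tfrac{1}{2} \\
\hat S(t) = \prod_{s \le t} (1 - \hat h_s) &\;\Rightarrow\; \hat S(0) = 1,\ \hat S(1) = \hat S(2) = \tfrac{1}{2}
\end{aligned}
```

This reproduces the expected table: 1.0 at t=0 and 0.5 at t=1 and t=2.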
@CamDavidsonPilon
Owner

This is an interesting issue, and I want to agree with your expected case. However, I'm also inclined to reject the case birth_time==death_time as pathological for lifelines. Based on that highlighted comment, it sounds like birth_time is effectively treated as birth_time + $\epsilon$. So if you want a true birth_time==death_time, you would add an epsilon to the death time:

import lifelines

kmf = lifelines.KaplanMeierFitter()
# shift the death time just past the entry time (1 -> 1 + 1e-10)
kmf.fit([1+1e-10, 2], event_observed=[1, 0], entry=[1, 0])
print(kmf.survival_function_)
          KM_estimate
timeline
0.0               1.0
1.0               1.0
1.0               0.5
2.0               0.5

This workaround is terrible, and not at all how I expect users to fix this. I'll have to think more about it.
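If the epsilon workaround is needed in the meantime, it can at least be applied mechanically rather than by hand. The sketch below uses a hypothetical helper, `nudge_zero_length` (not part of lifelines), that shifts every duration equal to its entry time by `eps`, matching the `1+1e-10` used above:

```python
def nudge_zero_length(durations, entry, eps=1e-10):
    """Shift durations that equal their entry time by eps.

    Hypothetical helper mirroring the manual 1 -> 1 + 1e-10 workaround;
    not a lifelines API. Other durations are returned unchanged.
    """
    return [d + eps if d == b else d for d, b in zip(durations, entry)]


print(nudge_zero_length([1, 2], [1, 0]))
```

The returned list can be passed as the durations to `kmf.fit(...)` together with the original `event_observed` and `entry`. One caveat with a fixed `eps`: for sufficiently large durations, `d + 1e-10` rounds back to `d` in double precision, so `math.nextafter(d, math.inf)` (Python 3.9+) is a more robust way to get the next representable time.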
