Treat unassigned leases (as well as expired ones) as available-to-be-taken #848
I've rebased this PR on top of the latest The tests that @shanmsac introduced with #861 still pass with this updated PR, though I have tweaked one assertion to cope with the fact that the updated I hope this PR is still worth looking at, though obviously, having already managed to break things once with #847, I can understand why you might not trust this subsequent PR!
    .filter(lease -> !lease.isUnassigned() && !availableLeasesSet.contains(lease))
    .collect(groupingBy(Lease::leaseOwner, summingInt(lease -> 1)));
Here filtering for `!lease.isUnassigned()` is what now stops the `groupingBy` `Lease::leaseOwner` from failing with the `NullPointerException: element cannot be mapped to a null key` error reported in #861 - given that unassigned leases are filtered out, there is no attempt to insert a `null` key entry in the map, so the code doesn't crash.
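To make the failure mode concrete, here is a minimal, self-contained sketch. The nested `Lease` class is a hypothetical stand-in with only the two accessors used in the snippet above, not the real KCL class, and the `availableLeasesSet` check is left out for brevity. It shows that `Collectors.groupingBy` rejects a `null` classifier result, and that filtering unassigned leases first avoids the exception.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LeaseCountSketch {

    // Hypothetical stand-in for the real Lease class: just a key and an owner.
    static final class Lease {
        private final String leaseKey;
        private final String leaseOwner; // null means "unassigned"

        Lease(String leaseKey, String leaseOwner) {
            this.leaseKey = leaseKey;
            this.leaseOwner = leaseOwner;
        }

        String leaseOwner() {
            return leaseOwner;
        }

        boolean isUnassigned() {
            return leaseOwner == null;
        }
    }

    // Counts leases per owner, skipping unassigned leases so that
    // groupingBy never sees a null key.
    static Map<String, Integer> countPerOwner(List<Lease> leases) {
        return leases.stream()
                .filter(lease -> !lease.isUnassigned())
                .collect(Collectors.groupingBy(Lease::leaseOwner, Collectors.summingInt(lease -> 1)));
    }

    // Same grouping without the filter: throws NullPointerException
    // as soon as an unassigned lease is encountered.
    static Map<String, Integer> countPerOwnerUnfiltered(List<Lease> leases) {
        return leases.stream()
                .collect(Collectors.groupingBy(Lease::leaseOwner, Collectors.summingInt(lease -> 1)));
    }

    public static void main(String[] args) {
        List<Lease> leases = Arrays.asList(
                new Lease("shard-1", "worker-a"),
                new Lease("shard-2", "worker-a"),
                new Lease("shard-3", null));

        System.out.println(countPerOwner(leases));

        try {
            countPerOwnerUnfiltered(leases);
        } catch (NullPointerException e) {
            System.out.println("Unfiltered grouping failed: " + e.getMessage());
        }
    }
}
```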
Do we still want to keep this change given that #861 has been merged?
…taken

If a lease is 'unassigned' (it has no lease owner) then it should be considered available for taking in `DynamoDBLeaseTaker`. Prior to this change, the only ways `DynamoDBLeaseTaker` could take leases for a scheduler were either by incremental lease stealing, or by waiting for the lease to expire by not having been updated in `failoverTimeMillis` - which could be slow if `failoverTimeMillis` was set reasonably high (with it set to just 30s, I've seen new instances take over 3 minutes to take all leases from old instances in a deployment).

This would be one half of a fix for awslabs#845 - the other half of the fix is invoking `evictLease()` (setting the lease owner to null) on graceful shutdown of a scheduler.
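The rule described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `DynamoDBLeaseTaker` code: the `Lease` class, the `isAvailable` helper, and the simple age-based expiry check are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AvailableLeaseSketch {

    // Hypothetical, simplified lease: owner plus the time it was last renewed.
    static final class Lease {
        final String leaseKey;
        final String leaseOwner; // null means "unassigned"
        final long lastCounterIncrementNanos;

        Lease(String leaseKey, String leaseOwner, long lastCounterIncrementNanos) {
            this.leaseKey = leaseKey;
            this.leaseOwner = leaseOwner;
            this.lastCounterIncrementNanos = lastCounterIncrementNanos;
        }

        boolean isUnassigned() {
            return leaseOwner == null;
        }

        // Assumed expiry rule: not renewed within the lease duration.
        boolean isExpired(long leaseDurationNanos, long asOfNanos) {
            return asOfNanos - lastCounterIncrementNanos > leaseDurationNanos;
        }
    }

    // A lease can be taken if nobody owns it, or if its owner stopped renewing it.
    static boolean isAvailable(Lease lease, long leaseDurationNanos, long asOfNanos) {
        return lease.isUnassigned() || lease.isExpired(leaseDurationNanos, asOfNanos);
    }

    static List<String> availableLeaseKeys(List<Lease> leases, long leaseDurationNanos, long asOfNanos) {
        List<String> keys = new ArrayList<>();
        for (Lease lease : leases) {
            if (isAvailable(lease, leaseDurationNanos, asOfNanos)) {
                keys.add(lease.leaseKey);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        long duration = 1_000L;
        long now = 10_000L;
        List<Lease> leases = Arrays.asList(
                new Lease("shard-1", "worker-a", 9_500L), // recently renewed
                new Lease("shard-2", "worker-b", 8_000L), // owner went quiet
                new Lease("shard-3", null, 9_900L));      // evicted on shutdown
        System.out.println(availableLeaseKeys(leases, duration, now));
    }
}
```

With only the expiry check, the third lease would stay untakeable until its age exceeded the lease duration; the extra `isUnassigned()` condition is what lets a new worker pick it up immediately.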
Evicting leases during graceful shutdown would break the assumption here:
Lines 334 to 336 in a3e51d5
    if (lease.leaseOwner() == null) {
        // if this new lease is unowned, it's never been renewed.
        lease.lastCounterIncrementNanos(0L);
I'm also wondering if we can make use of resetting `lastCounterIncrementNanos` so that we don't need extra checks on `leaseOwner`?
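One way to read this suggestion is sketched below. It assumes a hypothetical `evict` helper that clears the owner and also resets the renewal timestamp, plus a simplified age-based `isExpired` check; neither is taken from the KCL source. Under those assumptions, an evicted lease looks expired to the existing check, so no separate `leaseOwner` test is needed.

```java
final class EvictResetSketch {

    // Hypothetical mutable lease with only the fields relevant to this idea.
    static final class Lease {
        private String leaseOwner;
        private Long lastCounterIncrementNanos;

        Lease(String leaseOwner, Long lastCounterIncrementNanos) {
            this.leaseOwner = leaseOwner;
            this.lastCounterIncrementNanos = lastCounterIncrementNanos;
        }

        String leaseOwner() {
            return leaseOwner;
        }

        // Assumed expiry rule: never renewed, or not renewed within the lease duration.
        boolean isExpired(long leaseDurationNanos, long asOfNanos) {
            if (lastCounterIncrementNanos == null) {
                return true;
            }
            return asOfNanos - lastCounterIncrementNanos > leaseDurationNanos;
        }
    }

    // Hypothetical eviction: drop the owner and reset the renewal timestamp,
    // so the lease is treated like one that has never been renewed.
    static void evict(Lease lease) {
        lease.leaseOwner = null;
        lease.lastCounterIncrementNanos = 0L;
    }

    public static void main(String[] args) {
        long duration = 1_000L;
        long now = 1_000_500L;
        Lease lease = new Lease("worker-a", 1_000_000L);

        System.out.println("expired before evict: " + lease.isExpired(duration, now));
        evict(lease);
        System.out.println("expired after evict:  " + lease.isExpired(duration, now));
    }
}
```

This only works while the clock reading passed as `asOfNanos` is larger than the lease duration, which is an assumption of the sketch rather than something the comment states.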
If a lease is 'unassigned' (it has no lease owner) then it should be considered available for taking in `DynamoDBLeaseTaker`. Prior to this change, the only ways `DynamoDBLeaseTaker` could take leases for a scheduler was either by incremental lease stealing, or waiting for the lease to expire by not having been updated in `failoverTimeMillis` - which could be slow if `failoverTimeMillis` was set reasonably high (with it set to just 30s, I've seen new instances take over 3 minutes to take all leases from old instances in a deployment).

This would be one half of a fix for #845 - the other half is invoking `evictLease()` (setting the lease owner to null) on graceful shutdown of a scheduler.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.