eds: use-after-free in LEDS callback on subsequent EDS updates #43667

Merged
botengyao merged 2 commits into envoyproxy:main from
wdauchy:fix/leds-use-after-free
Mar 2, 2026
@wdauchy
Contributor

@wdauchy wdauchy commented Feb 26, 2026

Commit Message:
The LEDS subscription callback lambda captured used_load_assignment by value as a raw pointer to the object owned by the cluster_load_assignment_ unique_ptr. When a subsequent EDS update reassigned cluster_load_assignment_, the old object was destroyed but existing LEDS subscriptions (not recreated for unchanged configs) still held the dangling pointer. When the LEDS subscription later fired its callback (e.g. onConfigUpdateFailed), dereferencing this pointer caused a segfault.
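
The lifetime hazard described above can be reproduced in miniature. The following is a sketch under stated assumptions, not Envoy's code: `Assignment`, `Owner`, `update`, and `destroyedAfterSecondUpdate` are hypothetical stand-ins for the load assignment, the cluster that owns it, and the EDS update path. The sketch never dereferences the stale pointer; it only counts destructor calls to show that the old object is gone while the long-lived callback still holds its address.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for the load assignment; counts destructions.
struct Assignment {
  explicit Assignment(int* destroyed) : destroyed_(destroyed) {}
  ~Assignment() { ++*destroyed_; }
  int* destroyed_;
};

// Hypothetical owner that keeps the assignment in a unique_ptr.
struct Owner {
  std::unique_ptr<Assignment> assignment_;
  std::function<void()> callback_; // long-lived, like a subscription callback

  void update(int* destroyed) {
    // Reassigning the unique_ptr destroys the previously owned object.
    assignment_ = std::make_unique<Assignment>(destroyed);
    if (!callback_) {
      // Hazard: the raw pointer is captured by value once and is never
      // refreshed on later updates.
      Assignment* captured = assignment_.get();
      callback_ = [captured]() {
        (void)captured; // dereferencing here after a later update would be UB
      };
    }
  }
};

// Returns how many Assignment objects were destroyed after two updates.
int destroyedAfterSecondUpdate() {
  int destroyed = 0;
  Owner owner;
  owner.update(&destroyed); // first update: callback is created
  owner.update(&destroyed); // second update: old object destroyed, callback kept
  return destroyed;         // the callback now holds a dangling pointer
}
```

In this sketch the callback is deliberately never invoked after the second update, since invoking a version that dereferences `captured` is exactly the crash the stack trace shows.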

Stack trace:

  #0: [0x77b9d6de8330]
  #1: Envoy::Upstream::EdsClusterImpl::BatchUpdateHelper::batchUpdate()
  #2: Envoy::Upstream::PrioritySetImpl::batchHostUpdate()
  #3: std::__1::__function::__func<>::operator()()
  #4: Envoy::Upstream::LedsSubscription::onConfigUpdateFailed()
  #5: Envoy::Config::GrpcSubscriptionImpl::onConfigUpdateFailed()
  #6: event_process_active_single_queue
  #7: event_base_loop
  #8: Envoy::Server::InstanceBase::run()

Fix by capturing this and accessing cluster_load_assignment_ directly, which always reflects the current valid assignment.
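
A minimal sketch of the fix pattern, under the same caveat that this is illustrative and not the actual Envoy change: `Assignment`, `Owner`, `update`, `fire`, and the `version` field are hypothetical names. The callback captures `this` and reads the `unique_ptr` member at call time, so it sees whichever object is current when it fires.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for the load assignment; `version` is illustrative.
struct Assignment {
  int version;
};

// Hypothetical owner; the names here are not Envoy's.
class Owner {
public:
  void update(int version) {
    assignment_ = std::make_unique<Assignment>(Assignment{version});
    if (!callback_) {
      // Capture `this` and resolve the member when the callback runs,
      // instead of freezing a raw pointer at subscription time.
      callback_ = [this]() { return assignment_->version; };
    }
  }

  // Simulates the subscription firing its callback later.
  int fire() const { return callback_(); }

private:
  std::unique_ptr<Assignment> assignment_;
  std::function<int()> callback_;
};

// Returns the version the callback observes after a second update.
int versionSeenAfterSecondUpdate() {
  Owner owner;
  owner.update(1); // callback created here
  owner.update(2); // assignment replaced, callback kept
  return owner.fire();
}
```

This pattern assumes the owner outlives the callback and is not moved after the callback is created; otherwise the captured `this` would itself dangle.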
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #43667 was opened by wdauchy.

The LEDS subscription callback lambda captured `used_load_assignment`
by value as a raw pointer to the object owned by the
`cluster_load_assignment_` unique_ptr. When a subsequent EDS update
reassigned `cluster_load_assignment_`, the old object was destroyed
but existing LEDS subscriptions (not recreated for unchanged configs)
still held the dangling pointer. When the LEDS subscription later
fired its callback (e.g. onConfigUpdateFailed), dereferencing this
pointer caused a segfault.

Stack trace:

  #0: [0x77b9d6de8330]
  #1: Envoy::Upstream::EdsClusterImpl::BatchUpdateHelper::batchUpdate()
  #2: Envoy::Upstream::PrioritySetImpl::batchHostUpdate()
  #3: std::__1::__function::__func<>::operator()()
  #4: Envoy::Upstream::LedsSubscription::onConfigUpdateFailed()
  #5: Envoy::Config::GrpcSubscriptionImpl::onConfigUpdateFailed()
  #6: event_process_active_single_queue
  #7: event_base_loop
  #8: Envoy::Server::InstanceBase::run()

Fix by capturing `this` and accessing `cluster_load_assignment_`
directly, which always reflects the current valid assignment.

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
@wdauchy wdauchy force-pushed the fix/leds-use-after-free branch from 70e391e to ee9057a Compare February 27, 2026 08:14
@wdauchy
Contributor Author

wdauchy commented Feb 27, 2026

/retest transients

@wdauchy
Contributor Author

wdauchy commented Feb 27, 2026

/retest transients

@wdauchy wdauchy marked this pull request as ready for review February 27, 2026 09:16
@wdauchy
Contributor Author

wdauchy commented Feb 28, 2026

cc @botengyao

@botengyao botengyao merged commit a20c0ab into envoyproxy:main Mar 2, 2026
28 checks passed
bmjask pushed a commit to bmjask/envoy that referenced this pull request Mar 14, 2026

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
Signed-off-by: bjmask <11672696+bjmask@users.noreply.github.com>
bvandewalle pushed a commit to bvandewalle/envoy that referenced this pull request Mar 17, 2026

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>