Skip to content

Fix iOS termination crash in ProvisionalDispatcher#2056

Closed
jpsim wants to merge 1 commit intomainfrom
fix-ios-termination-crash
Closed

Fix iOS termination crash in ProvisionalDispatcher#2056
jpsim wants to merge 1 commit intomainfrom
fix-ios-termination-crash

Conversation

@jpsim
Copy link
Contributor

@jpsim jpsim commented Feb 14, 2022

When iOS is terminating a process, it may release some static memory, which could cause the event_dispatcher_ variable to point to invalid memory, leading to a crash.

A crash that occurs when an app is already in the process of being terminated isn't too bad, considering it's already shutting down, but this matters to some extent, considering that crash reporters will report this as an app crash, and the OS may choose to penalize your app by not proactively suggesting it.

This should fix a crash with a backtrace such as this one:

EXC_BAD_ACCESS: Attempted to dereference garbage pointer 0x8000000000000010.

0 Envoy::Event::ProvisionalDispatcher::post(std::__1::function<void ()>)
1 Envoy::EngineHandle::runOnEngineDispatcher(long, std::__1::function<void (Envoy::Engine&)>)
2 start_stream
3 -[EnvoyHTTPStreamImpl initWithHandle:callbacks:explicitFlowControl:]
4 -[EnvoyEngineImpl startStreamWithCallbacks:explicitFlowControl:]
5 StreamPrototype.start(queue:)

Signed-off-by: JP Simard jp@jpsim.com

When iOS is terminating a process, it may release some static memory,
which could cause the `event_dispatcher_` variable to point to invalid
memory, leading to a crash.

A crash that occurs when an app is already in the process of being
terminated isn't too bad, considering it's already shutting down, but
this matters to some extent, considering that crash reporters will
report this as an app crash, and the OS may choose to penalize your app
by not proactively suggesting it.

This should fix a crash with a backtrace such as this one:

```
EXC_BAD_ACCESS: Attempted to dereference garbage pointer 0x8000000000000010.

0 Envoy::Event::ProvisionalDispatcher::post(std::__1::function<void ()>)
1 Envoy::EngineHandle::runOnEngineDispatcher(long, std::__1::function<void (Envoy::Engine&)>)
2 start_stream
3 -[EnvoyHTTPStreamImpl initWithHandle:callbacks:explicitFlowControl:]
4 -[EnvoyEngineImpl startStreamWithCallbacks:explicitFlowControl:]
5 StreamPrototype.start(queue:)
```

Signed-off-by: JP Simard <jp@jpsim.com>
@jpsim
Copy link
Contributor Author

jpsim commented Feb 14, 2022

Caveat: I haven't been able to reproduce this crash, nor validate that this would actually fix it. I'm just making an educated guess based on the backtrace at the time of the crash.

@jpsim jpsim changed the title Fix iOS termination crash Fix iOS termination crash in ProvisionalDispatcher Feb 14, 2022
@jpsim jpsim requested a review from snowp February 14, 2022 21:20
if (drained_) {
event_dispatcher_->post(callback);
return ENVOY_SUCCESS;
if (event_dispatcher_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is for shutdown, when does this get set to null? I wouldn't expect the destructor to null this out

If this is post-destruction I think ideally you want to avoid calling post at all since we're in undefined behavior territory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that makes sense, but how could I detect that we're post-destruction at the call site here?

return e->dispatcher().post([func]() {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ way is to never have handles to anything that is deallocated, since there's really no way to inspect something post-destruction to see if destruction has happened (this is UB if done naively). What I would probably want to understand is how we can lock the shared ptr but end up with a handle that is not guaranteed to be valid for the duration of this shared ptr. If we were calling terminateEngine I can see how this happens (we call terminate even though there might be external references to the engine), but I think in the cases we're seeing the issue we're not explicitly terminating? So not sure how this plays out in that case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I'll think about this some more and re-open a new PR if I come up with a good way to handle this.

@jpsim jpsim closed this Feb 15, 2022
@jpsim jpsim deleted the fix-ios-termination-crash branch February 15, 2022 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants