[host] Prevent swift backtrace handler from firing when the runtime aborts on OSX #119429
For reasons yet to be confirmed, when an application throws an unhandled exception, the Swift backtrace handler appears to be enabled on various versions of macOS that we support. This leads to messy and unclear backtrace output on the main host process. It can be worked around by setting the environment variable SWIFT_BACKTRACE=enable=no. Contributes to dotnet#118823
Pull Request Overview
This PR addresses an issue where the Swift backtrace handler interferes with .NET host process exception handling on macOS, causing unclear output when applications throw unhandled exceptions. The fix disables the Swift backtrace by setting the SWIFT_BACKTRACE=enable=no environment variable early in the host process startup.
Key Changes
- Added macOS-specific environment variable setting to disable Swift backtrace functionality
- Positioned the fix at the beginning of the main function to ensure it takes effect before any exception handling
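As a rough illustration of the environment-variable approach summarized above, the sketch below sets SWIFT_BACKTRACE=enable=no before anything else in the process runs. The helper name disable_swift_backtrace is hypothetical, and the choice to respect a value the user has already exported (overwrite = 0) is an assumption, not something the PR states.

```c
#include <stdlib.h>

/* Hypothetical helper, not the PR's actual code: ask the Swift runtime not to
 * install its backtrace handler. Passing overwrite = 0 is an assumption made
 * here so that a value exported by the user is left untouched. */
static int disable_swift_backtrace(void)
{
    return setenv("SWIFT_BACKTRACE", "enable=no", 0);
}
```

A host would presumably call this as the first statement of main, guarded by a macOS-only check such as `#if defined(__APPLE__)`, so that the variable is in place before the Swift runtime reads it.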
On dummy signal handler vs. set env variable:
The signal handler gets installed here: By Which is called by This is the same place that the environment variable is read. It seems very likely to me, then, that the signal handler needs to be installed before the Swift runtime is loaded.
Can't the signal handler be overwritten with another one?
My read of it is that the Swift signal handler won't install if one is already set.
Fair to conclude we should go with the signal handler approach?
Yeah, but I mean, couldn't dotnet replace the already installed Swift one?
There is a range of options from small hammer to big hammer, with a number of options in between:
There is no obvious winner. If I were to pick, I would go with the smallest hammer possible. It works best for runtimes to be as minimally invasive as possible and avoid "owning the process", since that prevents multiple runtimes from peacefully co-existing in the same process.
I pushed a change that registers a noop handler in
LGTM
@jkotas @elinor-fung are you two good with this change? I'd like to merge and get the servicing approvals going.
I do not see this comment addressed:
LGTM, thank you!
I am not seeing a good explanation of what makes NativeAOT immune to this problem. Something seems to be missing from the explanation of what's going on.
After more review, I think the explanation is that the Swift backtrace handler gets linked out. I suspect that's always going to happen because the handler is never going to get hit by any of our code, since it only runs in a global constructor. To prove this, I modified my local sample to use
I do not think that's right. I have set a breakpoint at Swift backtrace initialization. I see it being called in both regular CoreCLR and NativeAOT. In NativeAOT, it is called during process startup at:
In regular CoreCLR, it gets called when libcoreclr.dylib is loaded at:
Swift backtrace initialization fails in NativeAOT HelloWorld binary. You can see the failure if you enable tracing via environment variable:
Here is the code that emits this warning: https://github.com/swiftlang/swift/blob/01d458b7a36c1fb971f32f4e6163b56d59b21d73/stdlib/public/runtime/Backtrace.cpp#L376-L392 . NativeAOT binaries are considered privileged since they are missing (This is what I have found so far. I plan to look into how we deal with
The issue is a regression introduced between the .NET 10 Preview 6 and .NET 10 Preview 7 SDKs. The app hosts produced by .NET 10 Preview 7 and newer SDKs have this issue, irrespective of the target runtime. For example, a .NET 8 app published by the .NET 10 SDK is going to hit this issue as well. See a log from my experiments below. @jtschuster @agocke This looks like a regression introduced by the .NET 10 managed signer (#108992 and follow-up PRs). Is it expected that the signatures produced by the managed signer result in behavior changes like this? (It is certainly an option to work around the regression in the runtime. The workaround for the regression will have to be backported to .NET 8 and .NET 9 to make sure that the .NET 10 SDK can continue to target older runtimes. It would be preferable to fix the root cause of the regression instead.) The issue does not reproduce for apps launched using
.NET 9 app built using .NET SDK 9.0.305 - not affected:
.NET 9 app built using .NET 10 Preview 6 - not affected:
.NET 9 app built using .NET 10 Preview 7 - affected:
Self-contained .NET 8 app published using .NET 10 Preview 7 - affected:
If this is related to entitlements, it could be a result of #116659 which preserves the entitlements of the apphost. This is the only expected different behavior between .NET 9 and 10 with the managed signer. If that's the issue, I'd expect it to repro if we re-sign
Yes, this makes
What is the user facing behavior improved by this PR? (In other words, what is going to break if this PR is reverted?)
The change was motivated by #113707. Apps that are signed with the hardened runtime would have to re-add the entitlements that were stripped from the apphost, which isn't obvious. There was at least one other internal team that is expecting the entitlements to be preserved after that PR, but I can notify them if we do need to revert the change.
I think we should revert the change in the signer.
If apps want to be signed with the hardened runtime, they should actively consider what entitlements they actually need and go with the minimal set possible. We should not be blindly copying everything for them. It is an insecure default.
This should be fixed by better documentation. The documentation should lead with recommending NativeAOT as the form factor to use for hardened environments. I believe that the NativeAOT runtime does not need any additional entitlements to function - is that right?
I agree copying all of the entitlements doesn't make sense, but it could make sense to add the JIT entitlement by default. I can't think of any way a framework-dependent or single-file application wouldn't need it. NativeAOT and interpreter-based applications are the only exceptions. NAOT doesn't use the managed signer; I don't know about interpreter apps.
Yes, I agree that it would make sense to add the minimum entitlements that are required for the system to function at all, such as the JIT entitlement for runtimes with the JIT.
Note that this is effectively mandatory. All apps must enable the hardened runtime to be notarized. I think we should ensure that, even if we’re stripping entitlements, we’re adding back the minimum set needed for .NET to function. However, I would be surprised if we’re adding more entitlements to the apphost than exactly that. I’m not sure which missing entitlement is blocking the Swift backtrace.
Presumably the
I guess we could keep the entitlement preservation change and remove that entitlement, but it wouldn’t help older frameworks. We could add a new set of entitlements during apphost publish. Large change maybe.
Is this "just" an entitlements problem? Presumably someone could apply
I suspect not, but I'm waiting until we align on the entitlements before going there.
I also thought ad hoc signing overwrote all these values. How is backtrace getting blocked when the apphost is ad hoc signed? Do some of the permissions still apply? Clearly JIT does because otherwise nothing from dotnet would work before the entitlements change.
It is not clear to me why all hardened apps need (I understand why runtime w/ JIT needs
There is no good answer here. If we go with the fix in this PR, somebody may want to get notified about all aborts by subscribing to the signal. This change will break them. I think we should shoot for a reasonable default experience (e.g. no Swift debugger by default) and stay out of the way as much as possible otherwise.
If we do more than strip all entitlements, I'd lean more towards this rather than tracking what to remove from the apphost entitlements. It also means we don't have to parse and edit the entitlements. But I think the "right" way to do this would be to create an entitlements itemgroup in the SDK which is passed to the managed signer, and a cross-cutting change like that would be a fair bit of new code, even if we make it off by default. But we also could just keep the minimal set hardcoded in the managed signer and add the itemgroup for 11.
It is not easy to reverse engineer all the details of how this works. (
Overall, moving from preserving all entitlements to preserving a small, predetermined set makes sense to me. However, I'm worried about the lack of the
I don't clearly remember it, but I think that entitlement
This may explain why our single-file SOS tests hit the floor on osx-arm64.
To be clear, I'm fine with blocking debugging by default for release bits as part of Mac's policy to use hardened runtime by default. What I don't want is for local builds to suddenly become undebuggable.
Closing as we went with #119824 instead |
A fairly recent change in the Swift runtime fires the Swift backtrace handler by default when a process has an unhandled exception. This applies to most or all of the macOS versions we support.
This leads to messy and unclear backtrace output on the main host process. We can work around it by setting a noop handler for SIGABRT before we call abort() in PROCAbort.
Contributes to #118823