Crash on macOS related to code signing and third-party JIT compilation #63952

Closed
ClearScriptLib opened this issue Jan 18, 2022 · 36 comments
Labels
area-Host untriaged New issue has not been triaged by the area owner

Comments

@ClearScriptLib

ClearScriptLib commented Jan 18, 2022

Description

Our .NET library hosts the V8 JavaScript engine – a native component that performs JIT compilation.

On macOS, .NET code that consumes our library works correctly when run as an application (dotnet run), but crashes when run from the test host (dotnet test) or as a tool (dotnet tool run).

The crash occurs when V8 jumps into its JIT-compiled code. According to the Console application, the exception type is "EXC_CRASH (SIGKILL (Code Signature Invalid))". The native library that hosts V8 is signed with an official Microsoft signature.

This affects arm64 (M1) hardware, and possibly x64 in some scenarios. It does not affect Linux or Windows.

This issue was reported for our project here and here.

Reproduction Steps

Andrey Taritsyn has provided a minimal sample. Here are his instructions:

  • Clone a repository.
    • mkdir TemporaryProjects && cd TemporaryProjects
    • git clone https://github.com/Taritsyn/ClearScriptV8Tester.DotNetTool
  • Build a NuGet package.
    • cd ClearScriptV8Tester.DotNetTool
    • dotnet pack
  • Install and run a tool locally.
    • cd ..
    • mkdir TestDotNetTool && cd TestDotNetTool
    • dotnet new tool-manifest
    • dotnet tool install ClearScriptV8Tester.DotNetTool --add-source ../ClearScriptV8Tester.DotNetTool/nupkg
    • dotnet tool run clearscript-v8-tester

Expected behavior

The tool should print out "Number of iterations: 999999" in green characters.

Actual behavior

The tool crashes abruptly.
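
When scripting the reproduction, the exit status can help distinguish this kind of kill from an ordinary failure. Shells such as bash and zsh report 128 plus the signal number for a child terminated by a signal, so SIGKILL (signal 9) shows up as 137. The sketch below assumes that dotnet propagates the child's status unchanged, which has not been verified in this thread; the helper name `describe_exit` is hypothetical.

```shell
# Hypothetical helper: describe an exit status as seen by the shell.
# bash/zsh report 128 + N for a child terminated by signal N; SIGKILL is 9.
describe_exit() {
    code="$1"
    if [ "$code" -eq 0 ]; then
        echo "ok"
    elif [ "$code" -eq 137 ]; then
        echo "killed by SIGKILL"
    elif [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
    else
        echo "failed with exit code $code"
    fi
}

# Usage (on the affected machine):
#   dotnet tool run clearscript-v8-tester; describe_exit "$?"
```

The crash reason itself ("Code Signature Invalid") is only visible in the crash report shown by the Console application, not in the exit status.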

Regression?

We've only reproduced this issue on arm64 (M1) hardware, so it appears to be new in .NET 6. Our tests on x64 and Rosetta did not crash with .NET 6 or .NET 5, but some users are reporting that the issue affects .NET 6 on x64.

Known Workarounds

In a different scenario (see this issue), we were able to eliminate the crash by overwriting the official Microsoft code signature with the ad hoc linker signature as follows:

codesign --sign - --options linker-signed --force ClearScriptV8.osx-arm64.dylib
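
The workaround can be wrapped so that the new signature is verified right after it is applied. This is only a sketch: the function name `resign_adhoc` and the `CODESIGN` override variable are hypothetical conveniences for dry runs, while the two codesign invocations are the ones quoted in this thread.

```shell
# CODESIGN can be overridden for a dry run, e.g. CODESIGN=echo.
CODESIGN="${CODESIGN:-codesign}"

# Replace the existing signature with an ad hoc linker-style signature
# (the workaround command), then verify the result.
resign_adhoc() {
    lib="$1"
    "$CODESIGN" --sign - --options linker-signed --force "$lib" || return 1
    "$CODESIGN" --verify --verbose "$lib"
}

# Usage (macOS only):
#   resign_adhoc ClearScriptV8.osx-arm64.dylib
```

Note that this discards the official signature, so it is a local diagnostic step rather than something to ship.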

Configuration

We've confirmed the crash with .NET 6 in macOS 12.1 on the arm64 (M1) architecture.

Other information

Our coded tests are not affected by this issue, apparently because they test only locally built, unsigned (or linker-signed) libraries.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jan 18, 2022
@ghost

ghost commented Jan 19, 2022

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

@agocke
Member

agocke commented Jan 19, 2022

@ClearScriptLib This sounds like the ARM64 signing restrictions to me. M1 Macs have new restrictions that force binaries to be code-signed, or macOS will reject them.

Could you run codesign -v on your binaries and see which ones have bad signatures? That should narrow down whether this is a redistributed original file, or part of your app.
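
A minimal sketch of that check applied to every native library under a build output directory. The function name `verify_dylibs` and the `CODESIGN` override are hypothetical; only `codesign -v` comes from the request above.

```shell
CODESIGN="${CODESIGN:-codesign}"

# Run `codesign -v` on every .dylib under a directory and report the result.
# Returns non-zero if any library fails verification.
verify_dylibs() {
    dir="${1:-.}"
    rc=0
    # A here-document keeps the loop in the current shell so $rc survives.
    while IFS= read -r lib; do
        [ -n "$lib" ] || continue
        if "$CODESIGN" -v "$lib" >/dev/null 2>&1 </dev/null; then
            echo "valid:   $lib"
        else
            echo "INVALID: $lib"
            rc=1
        fi
    done <<EOF
$(find "$dir" -type f -name '*.dylib' | sort)
EOF
    return "$rc"
}

# Usage (macOS only):
#   verify_dylibs bin/Debug/net6.0
```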

@ClearScriptLib
Author

ClearScriptLib commented Jan 20, 2022

Hi @agocke,

@ClearScriptLib This sounds like the ARM64 signing restrictions to me. M1 Macs have new restrictions that force binaries to be code-signed, or macOS will reject them.

That's what's so strange about this. Our library crashes only when it has an official Microsoft signature. And again, dotnet run always works; only dotnet test and dotnet tool run are affected.

Could you run codesign -v on your binaries and see which ones have bad signatures? That should narrow down whether this is a redistributed original file, or part of your app.

We've confirmed the crash using Andrey's reproduction steps above in conjunction with our official binaries. There are no bad signatures, unless the official Microsoft signature is bad somehow. Anyway, here's the codesign output:

% codesign --verify --verbose ClearScriptV8.osx-arm64.dylib
ClearScriptV8.osx-arm64.dylib: valid on disk
ClearScriptV8.osx-arm64.dylib: satisfies its Designated Requirement
%
% codesign --display --verbose=4 ClearScriptV8.osx-arm64.dylib
Executable=/Users/csl/test/ClearScriptV8Tester.DotNetTool/bin/Debug/net6.0/runtimes/osx-arm64/native/ClearScriptV8.osx-arm64.dylib
Identifier=ClearScriptV8.osx-arm64.dylib
Format=Mach-O thin (arm64)
CodeDirectory v=20200 size=145693 flags=0x0(none) hashes=4548+2 location=embedded
VersionPlatform=1
VersionMin=786432
VersionSDK=786688
Hash type=sha256 size=32
CandidateCDHash sha256=7dc4d026b1ae8974808b6cac5da2022c6c813dd2
CandidateCDHashFull sha256=7dc4d026b1ae8974808b6cac5da2022c6c813dd2c2e32d1e6865161d6218a204
Hash choices=sha256
CMSDigest=7dc4d026b1ae8974808b6cac5da2022c6c813dd2c2e32d1e6865161d6218a204
CMSDigestType=2
Page size=4096
CDHash=7dc4d026b1ae8974808b6cac5da2022c6c813dd2
Signature size=8980
Authority=Developer ID Application: Microsoft Corporation (UBF8T346G9)
Authority=Developer ID Certification Authority
Authority=Apple Root CA
Timestamp=Jan 11, 2022 at 10:11:50 AM
Info.plist=not bound
TeamIdentifier=UBF8T346G9
Sealed Resources=none
Internal requirements count=1 size=192
%
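
Since the thread contrasts officially signed, ad hoc signed, and linker-signed binaries, it can help to pull just the signature flags out of this report. The sketch below parses the `CodeDirectory` line shown above, where the official signature reports `flags=0x0(none)`. To the best of my knowledge an ad hoc signature reports `adhoc` and a linker signature reports `adhoc,linker-signed` in the same position, but that is not shown in this thread. The function name `signature_flags` is hypothetical.

```shell
# Extract the flag names from the "CodeDirectory" line of
# `codesign --display --verbose=4` output read on stdin.
# e.g. "CodeDirectory v=20200 ... flags=0x0(none) ..." -> "none"
signature_flags() {
    sed -n 's/^CodeDirectory .*flags=0x[0-9a-fA-F]*(\([^)]*\)).*/\1/p'
}

# Usage (macOS only; codesign writes this report to stderr, hence 2>&1):
#   codesign --display --verbose=4 ClearScriptV8.osx-arm64.dylib 2>&1 | signature_flags
```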

Please let us know if we can provide any additional information.

@agocke
Member

agocke commented Jan 20, 2022

I wonder if this has something to do with a missing JIT entitlement. What does the following print for you?

codesign -d --entitlements - /usr/local/share/dotnet/dotnet

@ClearScriptLib
Author

ClearScriptLib commented Jan 20, 2022

Here's the output for dotnet:

% codesign -d --entitlements - /usr/local/share/dotnet/dotnet
Executable=/usr/local/share/dotnet/dotnet
[Dict]
        [Key] com.apple.security.cs.allow-dyld-environment-variables
        [Value]
                [Bool] true
        [Key] com.apple.security.cs.allow-jit
        [Value]
                [Bool] true
        [Key] com.apple.security.cs.debugger
        [Value]
                [Bool] true
        [Key] com.apple.security.cs.disable-library-validation
        [Value]
                [Bool] true
        [Key] com.apple.security.get-task-allow
        [Value]
                [Bool] true
%

And here's the output for our officially signed native library:

% codesign -d --entitlements - ClearScriptV8.osx-arm64.dylib
Executable=/Users/csl/test/ClearScriptV8Tester.DotNetTool/bin/Debug/net6.0/runtimes/osx-arm64/native/ClearScriptV8.osx-arm64.dylib
%

Hmm, should our library also have the entitlements?
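
For comparing several binaries, the entitlement check can be scripted. This is a rough sketch that does a plain substring match on the output of the command above; the function name `has_entitlement` and the `CODESIGN` override are hypothetical.

```shell
CODESIGN="${CODESIGN:-codesign}"

# Succeeds if the signature of "$1" lists the entitlement key "$2".
# Both output streams are searched, since codesign splits its report across them.
has_entitlement() {
    "$CODESIGN" -d --entitlements - "$1" 2>&1 | grep -qF -- "$2"
}

# Usage (macOS only):
#   for bin in /usr/local/share/dotnet/dotnet ClearScriptV8.osx-arm64.dylib; do
#       if has_entitlement "$bin" com.apple.security.cs.allow-jit; then
#           echo "allow-jit present: $bin"
#       else
#           echo "allow-jit missing: $bin"
#       fi
#   done
```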

@janvorli
Member

I think your library needs the allow-jit entitlement.

@ClearScriptLib
Author

ClearScriptLib commented Jan 20, 2022

Hi @janvorli,

I think your library needs the allow-jit entitlement.

Interesting. We tried re-signing the library – first with only allow-jit and then with all the entitlements listed for the dotnet executable – but the crash still occurred.

We did use a local signing identity for this experiment, as opposed to an official Microsoft signature or a developer account. Could that still be a problem? If so, we could try an official Microsoft signature with entitlements, once we figure out how to do that 😂.

Thanks!

@janvorli
Member

Ok, thank you for trying.
Thinking about it more, I am actually not sure which executable is used to run dotnet test, but I believe that the tool would have an executable named after the tool. That is a renamed apphost executable, which I believe we don't sign. The reason is that the final executable belongs to the developer, so you are responsible for signing that one. We only sign the dotnet executable.
Maybe dotnet test also runs the tests under a host executable other than dotnet.

Can you try signing the actual binary of the tool that gets built, e.g. with the same entitlements that dotnet is signed with, and see if it helps?
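
One way to run that experiment is sketched below: generate an entitlements property list with the keys reported for dotnet, then sign the host executable with it. Everything here is an assumption rather than a confirmed procedure. The function names, the `CODESIGN` override, the signing identity, and the path to the tool's host executable are placeholders, and `--options runtime` (hardened runtime) is included on the assumption that the `com.apple.security.cs.*` entitlements only matter when it is enabled.

```shell
CODESIGN="${CODESIGN:-codesign}"

# Write an entitlements property list with the given keys, all set to true.
write_entitlements() {
    out="$1"; shift
    {
        echo '<?xml version="1.0" encoding="UTF-8"?>'
        echo '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">'
        echo '<plist version="1.0">'
        echo '<dict>'
        for key in "$@"; do
            echo "    <key>$key</key>"
            echo '    <true/>'
        done
        echo '</dict>'
        echo '</plist>'
    } > "$out"
}

# Sign an executable with the given identity and entitlements file.
# Assumption: the hardened runtime is wanted, hence "--options runtime".
sign_with_entitlements() {
    identity="$1"; plist="$2"; exe="$3"
    "$CODESIGN" --sign "$identity" --force --options runtime --entitlements "$plist" "$exe"
}

# Usage (macOS only; identity and executable path are placeholders):
#   write_entitlements host.entitlements \
#       com.apple.security.cs.allow-dyld-environment-variables \
#       com.apple.security.cs.allow-jit \
#       com.apple.security.cs.debugger \
#       com.apple.security.cs.disable-library-validation \
#       com.apple.security.get-task-allow
#   sign_with_entitlements "<signing identity>" host.entitlements <path to host executable>
```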

@vitek-karas
Member

@MarcoRossignoli should know for sure about the dotnet test - I think it uses an apphost derived testhost. But I'm not sure.

@ClearScriptLib
Author

Thinking about it more, I am actually not sure which executable is used to run dotnet test, but I believe that the tool would have an executable named after the tool. That is a renamed apphost executable, which I believe we don't sign. The reason is that the final executable belongs to the developer, so you are responsible for signing that one. We only sign the dotnet executable.

Hmm. This might not be the right place to look, but there's a JSON file in ~/.dotnet/toolResolverCache that specifies "Runner":"dotnet" and "PathToExecutable":"/Users/.../ClearScriptV8Tester.DotNetTool.dll". We can't codesign a DLL, right?

@agocke
Member

agocke commented Jan 20, 2022

No, I don’t think you can add entitlements for the libraries. The strange thing is: this repro passes on my M1 machine. There seems to be an apphost if I install globally, but not locally. Both configurations actually run fine for me.

@agocke
Member

agocke commented Jan 20, 2022

Btw the apphost when I installed globally was in ~/.dotnet/tools

@MarcoRossignoli
Member

MarcoRossignoli commented Jan 20, 2022

At the moment, on an arm64 M1 machine running the arm64 version of the muxer (dotnet), we start the test host using the muxer itself (the currently running process), with something like dotnet exec testhost.dll ...

To confirm that, can you append --diag:log to the test command?

dotnet test ... --diag:log.txt

This will produce logs where we can verify how the host is started (we also made some changes between 6.0.1 and the upcoming 6.0.2/7 SDK).

@ClearScriptLib
Author

Hi @agocke,

Both configurations actually run fine for me.

Wow, really? It's a 100% crash repro here. You're seeing the line "Number of iterations: 999999" in green? Perhaps V8 doesn't detect a hot path on your machine and doesn't bother with JIT compilation. Very odd.

@ClearScriptLib
Author

Hi @MarcoRossignoli,

To confirm that, can you append --diag:log to the test command?

Will do. BTW, does that not work with dotnet tool run?

@MarcoRossignoli
Member

MarcoRossignoli commented Jan 20, 2022

I did a test on my M1 and confirm:

.NET SDK (reflecting any global.json):
 Version:   6.0.101
 Commit:    ef49f6213a

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  11.4
 OS Platform: Darwin
 RID:         osx.11.0-arm64
 Base Path:   /usr/local/share/dotnet/sdk/6.0.101/

Host (useful for support):
  Version: 6.0.1
  Commit:  3a25a7f1cc

.NET SDKs installed:
  6.0.101 [/usr/local/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.1 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.1 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET runtimes or SDKs:
  https://aka.ms/dotnet-download

Repro

dotnet new mstest
dotnet test --diag:log.txt

Logs

TpTrace Verbose: 0 : 14725, 5, 2022/01/20, 09:32:58.538, 5702078212067541, vstest.console.dll, DotnetTestHostManager: Assume published test project, with test host path = /Users/.../sdk_63952/bin/Debug/net6.0/testhost.dll.
TpTrace Verbose: 0 : 14725, 5, 2022/01/20, 09:32:58.538, 5702078212132833, vstest.console.dll, DotnetTestHostmanager: Full path of testhost.dll is /Users/.../sdk_63952/bin/Debug/net6.0/testhost.dll
TpTrace Verbose: 0 : 14725, 5, 2022/01/20, 09:32:58.538, 5702078212160250, vstest.console.dll, DotnetTestHostmanager: Full path of host exe is /usr/local/share/dotnet/dotnet
TpTrace Verbose: 0 : 14725, 7, 2022/01/20, 09:32:58.540, 5702078214517583, vstest.console.dll, DotnetTestHostManager: Starting process '/usr/local/share/dotnet/dotnet' with command line 'exec --runtimeconfig "/Users/.../sdk_63952/bin/Debug/net6.0/sdk_63952.runtimeconfig.json" --depsfile "/Users/.../sdk_63952/bin/Debug/net6.0/sdk_63952.deps.json" "/Users/.../sdk_63952/bin/Debug/net6.0/testhost.dll" --port 59769 --endpoint 127.0.0.1:059769 --role client --parentprocessid 14725 --diag "/Users/.../sdk_63952/log.host.22-01-20_09-32-58_53119_5.txt" --tracelevel 4 --telemetryoptedin false'
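
To pull just the launcher out of a diag log like the one above, a small filter is enough. This sketch relies only on the "Starting process '...'" wording visible in these log lines, which may differ in other SDK versions; the function name `testhost_launcher` is hypothetical.

```shell
# Print the executable named in each "Starting process '<path>'" entry
# written by DotnetTestHostManager in a --diag log.
testhost_launcher() {
    sed -n "s/.*DotnetTestHostManager: Starting process '\([^']*\)'.*/\1/p" "$@"
}

# Usage:
#   dotnet test --diag:log.txt
#   testhost_launcher log.txt
```

If the printed path is the shared dotnet executable rather than an apphost, the test host runs under dotnet's own signature and entitlements.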

@MarcoRossignoli
Member

MarcoRossignoli commented Jan 20, 2022

Will do. BTW, does that not work with dotnet tool run?

@ClearScriptLib I don't think so; --diag is a flag used by the VSTest platform.

@ClearScriptLib
Author

ClearScriptLib commented Jan 20, 2022

Hi @MarcoRossignoli,

We have a trivial test project that confirms the crash via dotnet test. Here's the log: log.txt

Thanks!

@agocke
Member

agocke commented Jan 22, 2022

My results:

$ dotnet clearscript-v8-tester
Start script execution...
Number of iterations: 999999

$ dotnet tool run clearscript-v8-tester
Start script execution...
Number of iterations: 999999

Hmm, still trying to figure out what could be different.

@agocke
Member

agocke commented Jan 22, 2022

Ha! I updated to 6.0.1 and now it doesn't work. So something must have regressed.

@agocke
Member

agocke commented Jan 24, 2022

Welp, still haven't figured it out though. Mac reports everything seems to have the right entitlements and is signed appropriately...

I can confirm that dev-signing the dotnet host does make the problem go away though, so this is somehow related to the Mac permission system

@ClearScriptLib
Author

I can confirm that dev-signing the dotnet host does make the problem go away though, so this is somehow related to the Mac permission system

Interestingly, linker-signing our native library (ClearScriptV8.osx-arm64.dylib) also makes the problem go away, so the OS takes both signatures into consideration. Pinning down the exact ruleset might take some voodoo.

We've reached out and requested instructions for including entitlements with the official signature. Presumably that's possible, since dotnet does it 😅. Maybe everything will work if both dotnet and our library are signed identically.

@ClearScriptLib
Author

Maybe everything will work if both dotnet and our library are signed identically.

Nope. Just tried it. The crash still occurs.

@agocke
Member

agocke commented Jan 27, 2022

Ah, it looks like allow-unsigned-executable-memory is required. Unfortunately, I don't think adding that to dotnet as a whole is a good idea. We've tried hard to make dotnet use as few entitlements as possible.

Support could be implemented for dotnet tools by having the tool install always use an apphost for executing the tool. The apphost is ad hoc signed, so it will allow everything to execute.

I've confirmed that if you run the apphost created during tool install -g in ~/.dotnet/Tools/ClearScriptV8Tester

@janvorli
Member

allow-unsigned-executable-memory means:
A Boolean value that indicates whether the app may create writable and executable memory without the restrictions imposed by using the MAP_JIT flag.

It basically prevents macOS from checking that only memory explicitly allocated as executable can ever be executable. If that helps, it seems to indicate that the V8 engine doesn't use the MAP_JIT flag in its mmap calls for executable memory. But then I am confused by the fact that dotnet run works.

@agocke
Member

agocke commented Jan 27, 2022

I haven't actually tested this in dotnet, since using ad hoc signing seems to ignore entitlements entirely, but looking at electron, it appears that allow-unsigned-executable-memory is required for them, so I'm assuming it is for V8 as well. electron-userland/electron-builder#4040

@agocke
Member

agocke commented Jan 27, 2022

@janvorli I believe dotnet run should now use the apphost, which is ad hoc signed, so I think that would explain it.

@janvorli
Member

Ok, that seems to explain the problem then.

@ClearScriptLib
Author

Fascinating! If we understand correctly, we should be able to fix this by adding that entitlement to our dylib. We'll give it a try!

@agocke
Member

agocke commented Jan 27, 2022

I don't think that will work, Apple seems to state that entitlements must be in the executable:

https://developer.apple.com/documentation/security/hardened_runtime

You add entitlements only to executables. Shared libraries, frameworks, and in-process plug-ins inherit the entitlements of their host executable.

@agocke
Member

agocke commented Jan 27, 2022

I filed dotnet/sdk#23640 for the SDK to explore more options for dotnet tools

@ClearScriptLib
Author

I don't think that will work, Apple seems to state that entitlements must be in the executable:

But then why does the crash go away if we adhoc-sign our dylib?

@agocke
Member

agocke commented Jan 27, 2022

Ad hoc signing seems to bypass quite a lot of restrictions, so I'm not sure. 😅

@ClearScriptLib
Author

Yeah, adding that entitlement to our dylib doesn't work. Also, V8 does appear to be using MAP_JIT, but we haven't verified that in the debugger.

@agocke
Member

agocke commented Feb 7, 2022

I don't think there's anything more for us to do in the runtime, so I'm going to close this as a dup of dotnet/sdk#23640

@agocke agocke closed this as completed Feb 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 9, 2022