Catalina enable-hdmi20 CoreDisplay patch leads to Code Signing crash of WindowServer #1335

lambdaupb · 2020-11-28T08:29:54Z

DeskMini 310, i5-8500 UHD630, Catalina 10.15.7, OpenCore 0.6.3

related code (probably): https://github.com/acidanthera/WhateverGreen/blob/7d30dd8a624d0d3b2d4882fcc689b9db4964efd5/WhateverGreen/kern_cdf.cpp#L182

enable-hdmi20 patches CoreDisplay at runtime.
Under high memory pressure, the CoreDisplay library memory apparently gets moved to swap.

When the library memory is loaded back into RAM, a code signing check is performed and fails, causing a WindowServer crash.

I can reproduce this by running Prime95 > Torture Test > Large FFTs, which allocates almost all of system memory, and then doing UI work involving animations for about a minute.

Possible fixes

Document that users need to disable code signing (SIP?). I am not sure how to do that.
Maybe add the MAP_RESILIENT_CODESIGN flag to the mmap of the library/dyld cache (see "Crash on macOS (EXC_BAD_ACCESS (Code Signature Invalid))", VirusTotal/yara#1309). I have no idea whether that works for executable regions.

logs

Process:               WindowServer [5465]
Path:                  /System/Library/PrivateFrameworks/SkyLight.framework/Versions/A/Resources/WindowServer
Identifier:            WindowServer
Version:               600.00 (451.4)
Code Type:             X86-64 (Native)
Parent Process:        launchd [1]
Responsible:           WindowServer [5465]
User ID:               88

PlugIn Path:             /System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
PlugIn Identifier:       com.apple.CoreDisplay
PlugIn Version:          1.0 (186.6.15)

Date/Time:             2020-11-16 19:09:29.410 +0100
OS Version:            Mac OS X 10.15.7 (19H15)
Report Version:        12
Anonymous UUID:        066D0EDF-3DB8-4976-B736-5BD0416F165D

Sleep/Wake UUID:       E94190B2-19CB-47AB-B1AE-97DCA13B6988

Time Awake Since Boot: 150000 seconds
Time Since Wake:       100000 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (Code Signature Invalid)
Exception Codes:       0x0000000000000032, 0x00007fff347d72d9
Exception Note:        EXC_CORPSE_NOTIFY

Termination Reason:    Namespace CODESIGNING, Code 0x2

kernel messages:

VM Regions Near 0x7fff347d72d9:
    __TEXT                 00007fff347b8000-00007fff347d7000 [  124K] r-x/r-x SM=COW  /System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
--> __TEXT                 00007fff347d7000-00007fff347d8000 [    4K] r-x/rwx SM=COW  /System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
    Submap                 00007fff347d8000-00007fff40000000 [184.2M] r--/rwx SM=PRV  process-only VM submap

Application Specific Information:
StartTime:2020-11-16 18:31:50
GPU:IG
MetalDevice for accelerator(0x312b): 0x7ff210d29038 (MTLDevice: 0x7ff1e8048000)
IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/IGPU@2/AppleIntelFramebuffer@0

2020-11-17 01:00:58.772582+0100  localhost kernel[0]: CODE SIGNING: process 241[WindowServer]: rejecting invalid page at address 0x7fff330bf000 from offset 0xcfb7000 in file "/private/var/db/dyld/dyld_shared_cache_x86_64h" (cs_mtime:1605366281.472771946 == mtime:1605366281.472771946) (signed:0 validated:0 tainted:0 nx:0 wpmapped:0 dirty:1 depth:2)
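As a sanity check on the crash report above, the faulting address (the second exception code) can be rounded down to its page and compared with the region marked `-->`. This is only a sketch of that arithmetic; `page_base` and `in_region` are helper names made up for illustration, and the 4 KiB page size is taken from the `[4K]` region size in the report.

```python
PAGE_SIZE = 0x1000  # 4 KiB, matching the [4K] __TEXT region in the report


def page_base(address: int, page_size: int = PAGE_SIZE) -> int:
    """Round an address down to the start of the page that contains it."""
    return address & ~(page_size - 1)


def in_region(address: int, start: int, end: int) -> bool:
    """Return True if address lies in the half-open range [start, end)."""
    return start <= address < end


# Values copied from the crash report above.
fault_address = 0x00007FFF347D72D9                  # second exception code
marked_region = (0x00007FFF347D7000, 0x00007FFF347D8000)  # the "-->" region

print(hex(page_base(fault_address)))             # start of the faulting page
print(in_region(fault_address, *marked_region))  # is it inside the marked region?
```

The faulting page is the single 4K CoreDisplay `__TEXT` page whose maximum protection is `rwx` rather than `r-x`, which fits the theory that the page was modified at runtime.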


lambdaupb · 2020-11-29T03:59:58Z

/*
 * The MAP_RESILIENT_* flags can be used when the caller wants to map some
 * possibly unreliable memory and be able to access it safely, possibly
 * getting the wrong contents rather than raising any exception.
 * For safety reasons, such mappings have to be read-only (PROT_READ access
 * only).
 *
 * MAP_RESILIENT_CODESIGN:
 * 	accessing this mapping will not generate code-signing violations,
 *	even if the contents are tainted.
 * MAP_RESILIENT_MEDIA:
 *	accessing this mapping will not generate an exception if the contents
 *	are not available (unreachable removable or remote media, access beyond
 *	end-of-file, ...).  Missing contents will be replaced with zeroes.
 */
#define MAP_RESILIENT_CODESIGN	0x2000 /* no code-signing failures */
#define MAP_RESILIENT_MEDIA	0x4000 /* no backing-store failures */

It seems that this only works for read-only mappings.

vit9696 · 2020-11-30T08:34:55Z

That's very interesting, but I believe we cannot quite remap things here. Instead we should adjust the codesign flags as we already do, but perhaps in a slightly different manner. It may be possible that I missed some for the latest 10.15 version. Could you play with it and try setting/dropping different flags?

CC @usr-sse2 @osy86 @lvs1974 @07151129

al3xtjames · 2020-12-12T01:53:25Z

Can easily reproduce on 10.14.6 here: run P95 large FFTs until some swapping occurs, and then try to open About This Mac. This should cause WindowServer to crash.

sudo sysctl vm.cs_debug=255 adds some more info:

2020-12-11 19:35:59.509 Df kernel[0:1f4918] vm_fault: signed: no validate: no tainted: no wpmapped: no prot: 0x5
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODE SIGNING: cs_invalid_page(0x7fff3ad17000): p=38037[WindowServer]
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODE SIGNING: cs_invalid_page(0x7fff3ad17000): p=38037[WindowServer] final status 0x23007b01, denying page sending SIGKILL
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODE SIGNING: process 38037[WindowServer]: rejecting invalid page at address 0x7fff3ad17000 from offset 0xb89e000 in file "/private/var/db/dyld/dyld_shared_cache_x86_64h" (cs_mtime:1605723499.64038983 == mtime:1605723499.64038983) (signed:0 validated:0 tainted:0 nx:0 wpmapped:0 dirty:1 depth:2)
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODESIGNING: vm_fault_enter(0x7fff3ad17000): *** INVALID PAGE ***

sending SIGKILL means that CS_KILL was set (note that cs_invalid_page hasn't changed in 10.15).
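To make the final status in the log easier to read, here is a small sketch that splits 0x23007b01 into flag names. The flag values are written down from memory of xnu's cs_blobs.h, so treat them as an assumption and check them against the header for the kernel version in question. `decode_csflags` is a made-up helper name, and the table only lists the bits needed for this value.

```python
# Subset of code signing status flags. ASSUMPTION: values recalled from
# xnu's cs_blobs.h; verify against the header before relying on them.
CS_FLAGS = {
    0x00000001: "CS_VALID",
    0x00000100: "CS_HARD",
    0x00000200: "CS_KILL",
    0x00000800: "CS_RESTRICT",
    0x00001000: "CS_ENFORCEMENT",
    0x00002000: "CS_REQUIRE_LV",
    0x00004000: "CS_ENTITLEMENTS_VALIDATED",
    0x01000000: "CS_KILLED",
    0x02000000: "CS_DYLD_PLATFORM",
    0x20000000: "CS_SIGNED",
}


def decode_csflags(status: int):
    """Return (names of known flags set in status, remaining unknown bits)."""
    names = [name for bit, name in sorted(CS_FLAGS.items()) if status & bit]
    known_bits = sum(bit for bit in CS_FLAGS if status & bit)
    return names, status & ~known_bits


names, unknown = decode_csflags(0x23007B01)  # "final status" from the log
print(names)
print(hex(unknown))  # bits not covered by the table above
```

If the table is right, both CS_KILL and CS_KILLED show up in the result, which matches the "denying page sending SIGKILL" message.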

lvs1974 · 2020-12-12T06:58:25Z

@al3xtjames: try adding the boot-arg -liluuseroff.

vit9696 · 2020-12-13T15:23:06Z

@al3xtjames @lambdaupb could you check whether the offset found by UserPatcher::vmProtect is correct? Because it clearly strips CS_KILL from the process.

lambdaupb · 2020-12-13T16:25:20Z

I'm not a C programmer and have no real idea how to do that.
If I'm given step-by-step instructions, I can reproduce this, though.

This machine is my daily driver at the moment so I'm reluctant to dive into it since my issue was solved by removing the enable-hdmi20 setting.

vit9696 · 2020-12-13T16:38:42Z

The easiest test is to enable Lilu debug logging and create a debug log in /var/log/Lilu_x.x.x.txt via -liludbgall liludump=60 boot arguments. Upload it here, and perhaps it sheds some light on the issue.

al3xtjames · 2020-12-15T23:41:20Z

Lilu is using 308 as the offset for p_csflags.
Lilu_1.5.1_18.7.txt

stevezhengshiqi · 2020-12-16T17:19:57Z

@al3xtjames thanks a lot for the CoreDisplay fix in WEG. Would you mind providing some more information about the max-pixel-clock-frequency value? If you have time to update the Manual in WEG, that would be great.

zearp · 2020-12-20T13:08:02Z

I tried to reproduce on my NUC but couldn't. The system becomes laggy but not unresponsive, and it doesn't crash or even overheat. CPU usage went up and down; I guess that's part of the Large FFT torture test? I left it running for about 10 minutes whilst browsing GitHub and opening/closing the About This Mac dialog every now and then. My config can be found here.

As I mentioned here, I believe these forced logouts on 8th-gen NUCs are due to missing ACPI patches and/or the OpenCore configuration used. But that's just my guess, since I have no issues and run multiple NUCs. I stress tested them quite heavily with stress-ng a few months ago. No problems whatsoever; these Kaby Lake NUCs are rock solid with OpenCore for me.

I'm running the latest versions of OpenCore/Lilu/etc and compiling everything from source now but also had no problems when I didn't do that and just used the release versions. Are there any other ways for me to try and reproduce this?

lambdaupb · 2020-12-20T14:08:59Z

@zearp thank you for your attempt at reproducing this issue!

I think you have SIP disabled with

<key>csr-active-config</key>
<data>/wcAAA==</data>

where /wcAAA== (Base64) decodes to ff 07 00 00 in hex, which according to Dortania
https://dortania.github.io/OpenCore-Install-Guide/troubleshooting/extended/post-issues.html#disabling-sip

disables all of SIP on Mojave / Catalina.

So code signing would be disabled and not kill WindowServer.
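The Base64 decoding above can be double-checked with a few lines of Python. This is just a sketch: `decode_csr_active_config` is a name made up for illustration, and reading the four bytes as a little-endian 32-bit value is an assumption about how the variable is interpreted.

```python
import base64


def decode_csr_active_config(b64_value: str):
    """Decode a csr-active-config <data> value.

    Returns (raw bytes as a hex string, value read as a little-endian integer).
    """
    raw = base64.b64decode(b64_value)
    return raw.hex(" "), int.from_bytes(raw, "little")


hex_bytes, value = decode_csr_active_config("/wcAAA==")
print(hex_bytes)   # raw bytes in the order they are stored
print(hex(value))  # same bytes read as a little-endian integer
```

The raw bytes come out as ff 07 00 00, which as a little-endian integer is 0x7ff.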

zearp · 2020-12-20T14:53:05Z

@lambdaupb Good point! I have it disabled cuz I use VoltageShift. I just repeated the test with SIP enabled. It did run a little hotter but after ~10 minutes of running Prime95 and opening about this Mac and Launchpad/Notification Centre a bunch of times I didn't get any crash. The fading animation varies from smooth to choppy but nothing grinds to a halt.

I'm thinking that the logouts people experienced on the NUC may have nothing to do with this, which is why I can't reproduce. Unless it also happens to you on a NUC but it seems you're using a different mini computer. I'm only here cuz you mentioned this in a NUC issue I was still subscribed to haha. But I can't seem to reproduce it on my NUCs.

lambdaupb · 2020-12-20T15:14:02Z

@zearp I have little experience with that setting, but could you check whether SIP is really ~~disabled~~ enabled? The Dortania guide mentions that it will not overwrite old values in NVRAM unless the property is listed in the delete section as well.

Note: Disabling SIP with OpenCore is quite a bit different compared to Clover, specifically that NVRAM variables will not be overwritten unless explicitly told so under the Delete section. So if you've already set SIP once either via OpenCore or in macOS, you must override the variable:
NVRAM -> Block -> 7C436110-AB2A-4BBB-A880-FE41995C9F82 -> csr-active-config

zearp · 2020-12-20T15:21:59Z

@lambdaupb Yes, it was really enabled. I checked with csrutil status after rebooting and reset NVRAM in between boots for good measure. I was also prompted with a bunch of security warnings; those are due to VoltageShift, Intel Power Gadget and some other kexts I use. So my guess is that it's really turned on. Does this happen to you on a Kaby Lake NUC too, or only on your DeskMini?

lambdaupb · 2020-12-20T15:28:13Z

My deskmini has a Coffee Lake R (I think) i5-8500 CPU.

There might be something else going on as well.
The crash report of WindowServer clearly shows a code signing crash on the NUC

appleserial/NUC8I5BEH#13

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (Code Signature Invalid)
Exception Codes:       0x0000000000000032, 0x00007fff37028253
Exception Note:        EXC_CORPSE_NOTIFY

Termination Reason:    Namespace CODESIGNING, Code 0x2

kernel messages:

VM Regions Near 0x7fff37028253:
    __TEXT                 00007fff37009000-00007fff37028000 [  124K] r-x/r-x SM=COW  /System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
--> __TEXT                 00007fff37028000-00007fff37029000 [    4K] r-x/rwx SM=COW  /System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
    Submap                 00007fff37029000-00007fff40000000 [143.8M] r--/rwx SM=PRV  process-only VM submap

So the issue exists and is fixed by removing enable-hdmi20 for me on 10.15 and @al3xtjames on 10.14.

It might very well be a combination with another setting or ACPI patch that triggers it though.

zearp · 2020-12-20T15:39:44Z

> It might very well be a combination with another setting or ACPI patch that triggers it though.

@lambdaupb Yeah, that's my guess too. What I will do is try the EFI from the repo you linked and report back in a bit. When I wrote Kaby Lake I meant Coffee Lake, of course. I'm a pro at messing up those Intel codenames, sorry for any confusion it may have caused.

lambdaupb · 2020-12-20T15:41:28Z

Thanks for the help. I will try to reproduce this issue with opencore updated to 0.6.4 and all other modules updated as well.

vit9696 · 2020-12-20T15:55:36Z

Let me be clear:

The issue does exist and is specific to Lilu user patcher
Disabling SIP may hide the issue, but is not recommended
@al3xtjames provided an alternative to CDF patches
Lilu user patcher is not supported on 11.x, and that is unlikely to change (thus the issue is unlikely to be fixed)

zearp · 2020-12-20T16:16:34Z

@lambdaupb Just ran the same tests using the EFI from the repo you linked and again no crashes, SIP is enabled and the hdmi setting too. I'm thinking these random logouts people experienced on the NUC have nothing to do with this issue, which would explain my failure to reproduce it. But it doesn't mean there is no issue of course. I don't have a DeskMini 310 to play with but it looks like a fun little machine so I hope you can get this sorted.

The issue with the WindowServer crash you linked seems to be solved by a comment on a blog that's linked, but I can't read the comment because the comments are not loading for me for some reason. I've not done any upgrading from 10.14.x to 10.15.x and have only ever used Catalina and Big Sur on my NUCs. Maybe those crashes were related to the upgrade or something else in their setup? I think this specific issue isn't present on the NUC Coffee Lake models, but do let me know if there's anything else I can try.

likaci · 2021-01-01T05:51:58Z

> @lambdaupb Just ran the same tests using the EFI from the repo you linked and again no crashes, SIP is enabled and the hdmi setting too. I'm thinking these random logouts people experienced on the NUC have nothing to do with this issue, which would explain my failure to reproduce it. But it doesn't mean there is no issue of course. I don't have a DeskMini 310 to play with but it looks like a fun little machine so I hope you can get this sorted.
>
> The issue with the WindowServer crash you linked seems to be solved by a comment on a blog that's linked but I can't read the comment because the comments are not loading for me for some reason. I've not done any upgrading from 10.14.x to 10.15.x and only ever used Catalina and Big Sur on my NUCs. Maybe those crashes were related to the upgrade or something else in their setup? I think this specific issue isn't present on the NUC Coffee Lake models but do let me know if there's anything else I can try.

@zearp Hi,
I can reproduce WindowServer crash with your EFI and https://github.com/appleserial/NUC8I5BEH 's EFI by running "Large FFTs".
And my NUC is upgraded from 10.14 .
Can you post the blog link?
Thank you.

zearp · 2021-01-01T14:44:54Z

@likaci You can’t follow the link I referred to and find the blog post yourself? Please don't quote an entire post to only add a sentence.

Check whether you can also reproduce it on a system that wasn’t upgraded from 10.14.x, because no matter how long I let it run I get no crashes, and I directly installed Catalina on mine.

I don’t have a 10.14.x installer lying around to do a clean install with and then upgrade to Catalina, but I might try for the fun of it and see if I get crashes that way.

likaci · 2021-01-01T15:15:22Z

@zearp Sorry for the disturbance and my bad English.
I have read the entire page but can't find the link that mentions that upgrading from 10.14 may cause the problem.

I have only one NUC and it is running some services, so I can't reinstall it.
I confirmed that disabling SIP or disabling HDMI 2.0 avoids the problem.

Thank you for your help, Happy new year.

Sher1ocks · 2021-03-21T14:44:05Z

I also had this problem in Big Sur.
On my Skylake laptop, the CPU frequency stayed at 1.5 GHz or higher and the machine constantly overheated, leading to poor performance.
It was resolved by turning off the enable-hdmi20 option.
Thank you for the tip!

vit9696 · 2021-03-28T09:26:44Z

enable-hdmi20 is deprecated in favour of the max-pixel-clock feature (acidanthera/WhateverGreen#79). Although the issue is not exclusive to the CDF side of WEG, userspace patching is implemented differently on Big Sur and above, and is not affected by this issue. I no longer use Catalina or older, and thus decided not to address this issue. Closing.

zearp · 2021-03-30T15:01:29Z

Does this mean that enable-max-pixel-clock-override replaces the enable-hdmi20 option? Will the option stay or will it be removed in future builds?

Because at the moment removing enable-hdmi20 and replacing it with enable-max-pixel-clock-override breaks 4k on Catalina and earlier.

It seems it's not doing the same thing as the hdmi20 option did. But I may have misunderstood and/or not implemented it properly.

vit9696 · 2021-03-30T15:26:17Z

You may need higher max-pixel-clock-frequency (in Hz, defaults to 675000000). https://github.com/acidanthera/WhateverGreen/blob/master/Manual/FAQ.IntelHD.en.md#hdmi-in-uhd-resolution-with-60fps

vit9696 added the project:lilu label Nov 30, 2020

This was referenced Dec 9, 2020

Random logouts appleserial/NUC8I5BEH#17

Open

WindowServer keep crashing in 10.15.6 appleserial/NUC8I5BEH#13

Open

vit9696 closed this as completed Mar 28, 2021

zearp mentioned this issue Mar 29, 2021

DeviceProperties update to enable 4k, remove userspace dependencies for possible Big Sur zearp/OptiHack#33

Closed
