[PAL/Linux-SGX] AEX-Notify 5/5: Add AEX-Notify flows in exception handling #2037

dimakuv · 2024-10-16T13:31:28Z

Description of the changes

Part 5 in AEX-Notify series.

This PR adds the AEX-Notify flows inside the enclave.

The stage-1 signal handler is augmented as follows when AEX-Notify is enabled: manually restore the SSA[0] context, invoke the EDECCSSA instruction instead of EEXIT (to go from SSA[1] to SSA[0] without exiting the enclave) and finally jump to SSA[0].GPRSGX.RIP to resume enclave execution (it will resume in the stage-2 signal handler).

The stage-2 signal handler is augmented as follows: set bit 0 of SSA[0].GPRSGX.AEXNOTIFY (so that AEX-Notify starts working again for this thread), then apply AEX-Notify mitigations and finally restore regular enclave execution.
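As a rough illustration of the "set bit 0 of SSA[0].GPRSGX.AEXNOTIFY" step, here is a minimal sketch. The struct `gprsgx_sketch`, the constant `AEXNOTIFY_ENABLE_BIT` and the function `rearm_aex_notify` are hypothetical names used only for this example; they are not Gramine's definitions, and the real GPRSGX layout has many more fields.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the GPRSGX region of SSA[0].
 * Only the two fields mentioned in the description are modeled here. */
struct gprsgx_sketch {
    uint64_t rip;       /* where enclave execution resumes */
    uint8_t  aexnotify; /* bit 0: AEX-Notify enabled for this thread */
};

/* Illustrative name for bit 0 of the AEXNOTIFY field. */
#define AEXNOTIFY_ENABLE_BIT 0x1u

/* Re-arm AEX-Notify for the current thread by setting bit 0 and leaving
 * every other bit of the field untouched. */
void rearm_aex_notify(struct gprsgx_sketch* ssa0_gpr) {
    ssa0_gpr->aexnotify = (uint8_t)(ssa0_gpr->aexnotify | AEXNOTIFY_ENABLE_BIT);
}
```

In this sketch the stage-2 handler would call `rearm_aex_notify()` first, before running any mitigation, which mirrors the order given in the description.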

This PR does not add any real AEX-Notify mitigations. Instead, we count the number of AEX events reported inside the SGX enclave and print this number on enclave termination (if log level is at least "warning").
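The counting placeholder could look roughly like the sketch below. All names (`g_aex_events`, `count_aex_event`, `report_aex_events`, the `*_SKETCH` log levels) and the message text are assumptions made for illustration, not the identifiers or the log format used in the PR.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical log levels, ordered so that "at least warning" is a simple comparison. */
enum log_level_sketch { LOG_NONE_SKETCH, LOG_ERROR_SKETCH, LOG_WARNING_SKETCH, LOG_DEBUG_SKETCH };

/* Hypothetical enclave-wide counter of AEX events; atomic because several
 * enclave threads may report AEX events concurrently. */
atomic_ulong g_aex_events;

/* Would be called where a real mitigation would otherwise run. */
void count_aex_event(void) {
    atomic_fetch_add_explicit(&g_aex_events, 1, memory_order_relaxed);
}

/* Would be called on enclave termination. Formats the counter into `buf` only
 * if the log level is at least "warning"; returns the number of characters
 * written, or 0 if nothing is reported. */
int report_aex_events(char* buf, size_t size, enum log_level_sketch level) {
    if (level < LOG_WARNING_SKETCH)
        return 0;
    return snprintf(buf, size, "AEX events reported inside the enclave: %lu",
                    atomic_load(&g_aex_events));
}
```

A relaxed atomic increment is enough in this sketch because the counter is only read once, at termination, and does not order any other memory accesses.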

Note that the current implementation of AEX-Notify does not use the checkpoint mechanism described in the official AEX-Notify whitepaper. That checkpoint mechanism allows coalescing multiple AEX events that occur during the execution of mitigations. This saves some CPU cycles and some signal-handling stack space, but we leave implementing this optimization as future work.

How to test this PR?

AEX-Notify is enabled in all LibOS/PAL test manifests if the AEXNOTIFY=1 environment variable is set.

dimakuv

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)

a discussion (no related file):
Debug failure with GDB support (try LibOS regression tests that use GDB).

dimakuv

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
Debug some failures with EDMM (try LibOS regression tests).

dimakuv

Reviewable status: 0 of 11 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Debug failure with GDB support (try LibOS regression tests that use GDB).

Done. AEX-Notify is not really compatible with GDB, see e.g. the official whitepaper: https://cdrdv2-public.intel.com/736463/aex-notify-white-paper-public.pdf, Section 8.

libos/test/regression/manifest.template line 27 at r2 (raw file):

sgx.edmm_enable = {{ 'true' if env.get('EDMM', '0') == '1' else 'false' }}
sgx.use_exinfo = {{ 'true' if env.get('EDMM', '0') == '1' else 'false' }}
sgx.experimental_enable_aex_notify = {{ 'true' if env.get('AEXNOTIFY', '0') == '1' else 'false' }}

For now I added this only in this manifest file, but technically it should be added in all manifest files (similar to sgx.edmm_enable), and at least one CI pipeline should run with the AEXNOTIFY=1 envvar.

pal/src/host/linux-sgx/host_exception.c line 397 at r2 (raw file):

noreturn void fail_on_morphed_eresume(void) {
    log_error("Bug in AEX-Notify flows: ERESUME morphed into EENTER but then the enclave performed "
              "EEXIT instead of EDECCSSA. Please debug.");

This particular bug (a data race) made my brain boil for two days. I definitely want to keep this diagnostic for the future, in case we ever have more bugs like this.

Explaining this data race is hard, but basically:

1. AEX-Notify now allows ERESUME to morph into EENTER.
2. EENTER may be exited via EDECCSSA (assumed in AEX-Notify) or via EEXIT (legacy non-AEX-Notify flows).
3. The above implies that ERESUME can end up in EEXIT, and the enclave then jumps out to the "exit target" that, by Gramine's convention for EEXIT, is specified in the RDX register.
4. But Gramine also assumes that ERESUME never returns, and this assumption is now broken -> data race! Note that before, the RDX register held random garbage upon ERESUME (which made sense at the time, as ERESUME would never use RDX and would never return).
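The chain of reasoning above can be condensed into a toy model. This is not Gramine code: `leave_insn`, `STAYS_IN_ENCLAVE` and `host_landing_address` are hypothetical names, and the model only captures the convention stated above that the EEXIT target is taken from whatever the host had in RDX.

```c
#include <stdint.h>

/* How the enclave leaves the handler it entered through the morphed ERESUME. */
enum leave_insn { LEAVE_EDECCSSA, LEAVE_EEXIT };

/* Illustrative sentinel: control never comes back to the untrusted host. */
#define STAYS_IN_ENCLAVE UINT64_C(0)

/* Toy model: where does the untrusted host continue after it issued an
 * ERESUME that morphed into EENTER? `rdx_at_entry` is the value the host had
 * in RDX at that moment, which the enclave treats as the exit target. */
uint64_t host_landing_address(enum leave_insn how, uint64_t rdx_at_entry) {
    if (how == LEAVE_EDECCSSA)
        return STAYS_IN_ENCLAVE; /* expected AEX-Notify flow */
    return rdx_at_entry;         /* legacy EEXIT flow: jump to the exit target */
}
```

In this model, a garbage RDX combined with an unexpected EEXIT sends the host to an arbitrary address. Loading RDX with the address of a diagnostic routine before ERESUME would turn that case into a clean error message, which appears to be the role of `fail_on_morphed_eresume()` quoted above.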

pal/src/host/linux-sgx/pal_exception.c line 61 at r2 (raw file):

    MB();
    SET_ENCLAVE_TCB(ready_for_aex_notify, 0UL);
    MB();

I feel like this could be implemented in a cleaner way, maybe re-using some of the existing variables... I am definitely not proud of this stopping_aex_notify helper variable, but this seemed easy.

dimakuv

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Debug some failures with EDMM (try LibOS regression tests).

Done

dimakuv

Reviewable status: 0 of 61 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):
In my local tests, everything seems to work. I stress-tested all our Gramine LibOS and PAL tests, with and without EDMM.

To merge this PR, our CI should have AEX-Notify-supporting workers, and AEXNOTIFY=1 envvar must be set in at least one Jenkins pipeline. This enablement must be done similarly to the EDMM=1 one.

I currently put a blocking comment on this, not to forget about updating the CI.

libos/test/regression/manifest.template line 27 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

For now added only in this manifest file, but technically should add in all files (similar to sgx.edmm_enable) and have at least one CI pipeline with AEXNOTIFY=1 envvar.

Done, now added everywhere. The enablement is similar to EDMM (with its EDMM=1 envvar).

dimakuv

Reviewable status: 0 of 61 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):
Quick perf numbers for Gramine built in Release mode on Ubuntu 24.04 with Linux v6.11.

Not sure if they are useful, just wanted to post here. They show that the current AEX-Notify (with a dummy mitigation) has a small overhead.

make clean; AEXNOTIFY=0 EDMM=0 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 200.36s
make clean; AEXNOTIFY=0 EDMM=1 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 138.00s
make clean; AEXNOTIFY=1 EDMM=0 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 208.25s
make clean; AEXNOTIFY=1 EDMM=1 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 141.50s
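For reference, the relative overhead implied by these four timings can be computed as in the sketch below (the helper name `overhead_percent` is made up for this example). With the numbers above it comes out to roughly 3.9% without EDMM (200.36s vs 208.25s) and roughly 2.5% with EDMM (138.00s vs 141.50s). These are single runs of the whole test suite, so the figures are indicative only.

```c
/* Relative overhead of `with_seconds` over `base_seconds`, in percent. */
double overhead_percent(double base_seconds, double with_seconds) {
    return (with_seconds - base_seconds) / base_seconds * 100.0;
}
```

For example, `overhead_percent(200.36, 208.25)` gives the EDMM=0 figure and `overhead_percent(138.00, 141.50)` gives the EDMM=1 figure.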

This commit adds the AEX-Notify flows inside the enclave.

The stage-1 signal handler is augmented as follows when AEX-Notify is enabled: manually restore the SSA[0] context, invoke the EDECCSSA instruction instead of EEXIT (to go from SSA[1] to SSA[0] without exiting the enclave) and finally jump to SSA[0].GPRSGX.RIP to resume enclave execution (it will resume in the stage-2 signal handler).

The stage-2 signal handler is augmented as follows: set bit 0 of SSA[0].GPRSGX.AEXNOTIFY (so that AEX-Notify starts working again for this thread), then apply AEX-Notify mitigations and finally restore regular enclave execution.

This commit does not add any real AEX-Notify mitigations. Instead, we count the number of AEX events reported inside the SGX enclave and print this number on enclave termination (if log level is at least "warning").

Note that the current implementation of AEX-Notify does not use the checkpoint mechanism described in the official AEX-Notify whitepaper. That checkpoint mechanism allows coalescing multiple AEX events that occur during the execution of mitigations. This saves some CPU cycles and some signal-handling stack space, but we leave implementing this optimization as future work.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>

Fixed GDB issue. Fixed a SIGSEGV data race on thread termination (ERESUME morphs into EENTER but then performs EEXIT). Added AEXNOTIFY envvar to LibOS regression tests (but only to a subset from `manifest.template`, simply because changing all manifest template files would be a huge git diff).

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>

Fixed EDMM issue. Turned out to be a case of too many nested signal handlers inside Gramine's SGX PAL, which overflowed the SGX enclave signal stack.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>

This commit adds conditional AEX-Notify enablement to all Gramine tests. Run tests e.g. like this (on a machine that supports AEX-Notify both in hardware and in Linux kernel):

    $ EDMM=1 AEXNOTIFY=1 SGX=1 gramine-test pytest

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>

dimakuv

Reviewable status: 0 of 61 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):
Must be applied on top of #2036. Blocking.

scottconstable · 2024-10-25T23:40:12Z

Reviewable status: 0 of 61 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Quick perf numbers for Gramine built in Release mode on Ubuntu 24.04 with Linux v6.11.

Not sure if they are useful, just wanted to post here. They show that current AEX-Notify (with dummy mitigation) has small overhead.

make clean; AEXNOTIFY=0 EDMM=0 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 200.36s
make clean; AEXNOTIFY=0 EDMM=1 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 138.00s
make clean; AEXNOTIFY=1 EDMM=0 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 208.25s
make clean; AEXNOTIFY=1 EDMM=1 SGX=1 gramine-test pytest -k 'not TC_04_Attestation' -- done in 141.50s

Do you have any intuition as to why AEX-Notify introduces overhead for these tests? In the Intel SGX SDK implementation, we never observed overheads introduced by AEX-Notify, except in the microbenchmarks that repeatedly enter and exit the enclave and do nothing else.

scottconstable · 2024-10-25T23:45:48Z

Note that current implementation of AEX-Notify does not use the checkpoint mechanism described in the official AEX-Notify whitepaper. That checkpoint mechanism allows to coalesce multiple AEX events that occur during the execution of mitigations. This saves some CPU cycles and some signal-handling stack space, but we leave implementing this optimization as future work.

IMO the checkpoint mechanism is not strictly an optimization. In very specific circumstances, if the enclave repeatedly takes interrupts in the AEX-Notify handler that prevent the stack from unwinding, the enclave thread could run out of stack. This is a different user-observable behavior that would be introduced by this PR, and therefore the checkpoint mechanism--which prevents stack overflow--is not merely an optimization.

dimakuv

Reviewable status: 0 of 61 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @scottconstable)

a discussion (no related file):

Previously, scottconstable (Scott Constable) wrote…

Do you have any intuition as to why AEX-Notify introduces overhead for these tests? In the Intel SGX SDK implementation, we never observed overheads introduced by AEX-Notify, except in the microbenchmarks that repeatedly enter and exit the enclave and do nothing else.

To be honest, no, I don't know. I just assumed that this tiny overhead is normal for AEX-Notify. Also note that this is the complete start-to-end time from starting the first regression test to finishing the last regression test, so maybe some "enclave startup" AEX-Notify-specific logic explains this overhead?

a discussion (no related file):
@scottconstable wrote:

IMO the checkpoint mechanism is not strictly an optimization. In very specific circumstances, if the enclave repeatedly takes interrupts in the AEX-Notify handler that prevent the stack from unwinding, the enclave thread could run out of stack. This is a different user-observable behavior that would be introduced by this PR, and therefore the checkpoint mechanism--which prevents stack overflow--is not merely an optimization.

Yes, it's correct, the enclave thread could run out of stack. In this case, Gramine chooses a fail-fast approach and hangs such a thread:

gramine/pal/src/host/linux-sgx/enclave_entry.S
Lines 431 to 438 in 6be2b4a

    # Disallow too many nested exceptions. In normal Gramine flow, this should never happen. Since
    # addresses need to be canonical, this addition does not overflow.
    movq %gs:SGX_SIG_STACK_HIGH, %rax
    addq %gs:SGX_SIG_STACK_LOW, %rax
    shrq $1, %rax
    cmp %rax, %rsi
    jae 1f
    FAIL_LOOP

I hope Gramine never hits these "very specific circumstances" in benign executions. If it does, then yes, the checkpoint mechanism would need to be implemented on top of the current code.
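Read as C, the quoted assembly check amounts to the sketch below. It assumes that %rsi holds the candidate signal-stack pointer at that point and that the stack grows downwards; the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the candidate signal-stack pointer is still in the upper
 * half of the signal stack, i.e. less than half of the stack is consumed by
 * nested exceptions. A false result corresponds to reaching FAIL_LOOP. */
bool sig_stack_has_room(uint64_t stack_low, uint64_t stack_high, uint64_t sig_sp) {
    /* The addition cannot overflow for canonical user-space addresses. */
    uint64_t midpoint = (stack_high + stack_low) >> 1;
    return sig_sp >= midpoint; /* unsigned compare, like `cmp` followed by `jae` */
}
```

So the fail-fast limit is "half of the signal stack", not the full stack: nested handlers are cut off well before the stack is actually exhausted.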

scottconstable

Reviewable status: 0 of 61 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

To be honest, no, I don't know. I just assumed that this tiny overhead is normal for AEX-Notify. Also note that this is the complete start-to-end time from starting the first regression test to finishing the last regression test, so maybe some "enclave startup" AEX-Notify-specific logic explains this overhead?

I don't see any reason that AEX-Notify would introduce additional startup overhead. These performance results look highly suspicious to me. In our benchmarks on the Intel SGX SDK we observed a performance impact in only two scenarios, and this impact is explainable by the changes to the software architecture:

1. Resuming after an AEX triggered by an exception that the enclave is expected to handle is faster because EDECCSSA saves an additional round-trip out of the enclave.
2. Resuming after an AEX triggered by any other interrupt/exception is slower because software is responsible for doing the work (restoring state from the SSA, etc.) that would otherwise be done by ERESUME when AEX-Notify is not enabled.

When the exception handling flows in Gramine were changed to support AEX-Notify, I wonder if these changes introduced some new, unintended overhead. That is one possible explanation. Another is that Gramine takes many more exceptions that are not expected to be handled within the enclave, and therefore flow (2) is occurring much more frequently than it does with enclaves built using the Intel SGX SDK.

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@scottconstable wrote:

IMO the checkpoint mechanism is not strictly an optimization. In very specific circumstances, if the enclave repeatedly takes interrupts in the AEX-Notify handler that prevent the stack from unwinding, the enclave thread could run out of stack. This is a different user-observable behavior that would be introduced by this PR, and therefore the checkpoint mechanism--which prevents stack overflow--is not merely an optimization.

Yes, it's correct, the enclave thread could run out of stack. In this case, Gramine chooses a fail-fast approach and hangs such a thread:

gramine/pal/src/host/linux-sgx/enclave_entry.S
Lines 431 to 438 in 6be2b4a

    # Disallow too many nested exceptions. In normal Gramine flow, this should never happen. Since
    # addresses need to be canonical, this addition does not overflow.
    movq %gs:SGX_SIG_STACK_HIGH, %rax
    addq %gs:SGX_SIG_STACK_LOW, %rax
    shrq $1, %rax
    cmp %rax, %rsi
    jae 1f
    FAIL_LOOP

I hope Gramine never hits these "very specific circumstances" in benign executions. If it does, then yes, the checkpoint mechanism would need to be implemented on top of the current code.

I agree with your summary. The checkpoint mechanism is required for the AEX-Notify single-/zero-step mitigation that is used by the SGX SDK, because the mitigation can be trivially bypassed without the checkpoint. The only other benefit of the checkpoint is that it avoids this theoretical stack overflow, and I agree that this should never happen in benign circumstances (under malicious circumstances, it would be a DoS).

dimakuv force-pushed the dimakuv/aex-notify-part5 branch 2 times, most recently from 5a8651c to 4ea9dcb, October 16, 2024 14:37

Dmitrii Kuvaiskii added 4 commits October 22, 2024 00:27

fixup! [PAL/Linux-SGX] Add AEX-Notify flows in exception handling

0e99648

dimakuv force-pushed the dimakuv/aex-notify-part4 branch from 45f12b3 to 6b3950c, October 22, 2024 07:38

dimakuv changed the title ~~[PAL/Linux-SGX] Add AEX-Notify flows in exception handling~~ [PAL/Linux-SGX] AEX-Notify 5/5: Add AEX-Notify flows in exception handling Oct 22, 2024

dimakuv force-pushed the dimakuv/aex-notify-part5 branch from 3e518fc to 6504586, October 22, 2024 07:49
