Skip to content

Restore fixupBoehmStackPointer#13709

Merged
Mic92 merged 1 commit intomasterfrom
boehm-coroutines-sp
Aug 11, 2025
Merged

Restore fixupBoehmStackPointer#13709
Mic92 merged 1 commit intomasterfrom
boehm-coroutines-sp

Conversation

@edolstra
Copy link
Member

@edolstra edolstra commented Aug 7, 2025

Motivation

This was removed in #11152. However, we need it for the multi-threaded evaluator, because otherwise Boehm GC will crash while scanning the thread stack:

#0  GC_push_all_eager (bottom=<optimized out>, top=<optimized out>) at extra/../mark.c:1488
#1  0x00007ffff74691d5 in GC_push_all_stack_sections (lo=<optimized out>, hi=<optimized out>, traced_stack_sect=0x0) at extra/../mark_rts.c:704
#2  GC_push_all_stacks () at extra/../pthread_stop_world.c:876
#3  GC_default_push_other_roots () at extra/../os_dep.c:2893
#4  0x00007ffff746235c in GC_mark_some (cold_gc_frame=0x7ffee8ecaa50 "`\304G\367\377\177") at extra/../mark.c:374
#5  0x00007ffff7465a8d in GC_stopped_mark (stop_func=stop_func@entry=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:875
#6  0x00007ffff7466724 in GC_try_to_collect_inner (stop_func=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:624
#7  0x00007ffff7466a22 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, retry=retry@entry=0) at extra/../alloc.c:1688
#8  0x00007ffff746878f in GC_allocobj (gran=<optimized out>, kind=<optimized out>) at extra/../alloc.c:1798
#9  GC_generic_malloc_inner (lb=<optimized out>, k=k@entry=1) at extra/../malloc.c:193
#10 0x00007ffff746cd40 in GC_generic_malloc_many (lb=<optimized out>, k=<optimized out>, result=<optimized out>) at extra/../mallocx.c:477
#11 0x00007ffff746cf35 in GC_malloc_kind (bytes=120, kind=1) at extra/../thread_local_alloc.c:187
#12 0x00007ffff796ede5 in nix::allocBytes (n=<optimized out>, n=<optimized out>) at ../src/libexpr/include/nix/expr/eval-inline.hh:19

This is because it will use the stack pointer of the coroutine, so it will scan a region of memory that doesn't exist, e.g.

Stack for thread 0x7ffea4ff96c0 is [0x7ffe80197af0w,0x7ffea4ffa000)

(where 0x7ffe80197af0 is the sp of the coroutine and 0x7ffea4ffa000 is the base of the thread stack).

We don't scan coroutine stacks, because currently they don't have GC roots (there is no evaluation happening in coroutines). So there is currently no need to restore the other parts of the original patch, such as BoehmGCStackAllocator.

I'm not sure if this is an issue for the single-threaded evaluator, at least I haven't seen these crashes there. That might be luck (a GC is unlikely to be triggered if a coroutine is running, since they don't do any GC allocations) or it might be that the main thread stack is treated differently. Still doesn't hurt to be safe.

Cherry-picked from DeterminateSystems#125.

Context


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

This was removed in #11152. However,
we need it for the multi-threaded evaluator, because otherwise Boehm
GC will crash while scanning the thread stack:

  #0  GC_push_all_eager (bottom=<optimized out>, top=<optimized out>) at extra/../mark.c:1488
  #1  0x00007ffff74691d5 in GC_push_all_stack_sections (lo=<optimized out>, hi=<optimized out>, traced_stack_sect=0x0) at extra/../mark_rts.c:704
  #2  GC_push_all_stacks () at extra/../pthread_stop_world.c:876
  #3  GC_default_push_other_roots () at extra/../os_dep.c:2893
  #4  0x00007ffff746235c in GC_mark_some (cold_gc_frame=0x7ffee8ecaa50 "`\304G\367\377\177") at extra/../mark.c:374
  #5  0x00007ffff7465a8d in GC_stopped_mark (stop_func=stop_func@entry=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:875
  #6  0x00007ffff7466724 in GC_try_to_collect_inner (stop_func=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:624
  #7  0x00007ffff7466a22 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, retry=retry@entry=0) at extra/../alloc.c:1688
  #8  0x00007ffff746878f in GC_allocobj (gran=<optimized out>, kind=<optimized out>) at extra/../alloc.c:1798
  #9  GC_generic_malloc_inner (lb=<optimized out>, k=k@entry=1) at extra/../malloc.c:193
  #10 0x00007ffff746cd40 in GC_generic_malloc_many (lb=<optimized out>, k=<optimized out>, result=<optimized out>) at extra/../mallocx.c:477
  #11 0x00007ffff746cf35 in GC_malloc_kind (bytes=120, kind=1) at extra/../thread_local_alloc.c:187
  #12 0x00007ffff796ede5 in nix::allocBytes (n=<optimized out>, n=<optimized out>) at ../src/libexpr/include/nix/expr/eval-inline.hh:19

This is because it will use the stack pointer of the coroutine, so it
will scan a region of memory that doesn't exist, e.g.

  Stack for thread 0x7ffea4ff96c0 is [0x7ffe80197af0w,0x7ffea4ffa000)

(where 0x7ffe80197af0w is the sp of the coroutine and 0x7ffea4ffa000
is the base of the thread stack).

We don't scan coroutine stacks, because currently they don't have GC
roots (there is no evaluation happening in coroutines). So there is
currently no need to restore the other parts of the original patch,
such as BoehmGCStackAllocator.
@edolstra edolstra requested a review from roberth August 7, 2025 10:50
@Mic92 Mic92 merged commit 4b3ca9b into master Aug 11, 2025
29 checks passed
@Mic92 Mic92 deleted the boehm-coroutines-sp branch August 11, 2025 14:17
Mic92 added a commit to Mic92/nix-1 that referenced this pull request Aug 11, 2025
This reverts commit 4b3ca9b, reversing
changes made to 867b69f.

Since this commit we get reproducible segfaults building Nix ci in macos github runners:
https://github.com/NixOS/nix/actions/runs/16885882321/job/47837390248
@Mic92
Copy link
Member

Mic92 commented Aug 11, 2025

Mic92 added a commit that referenced this pull request Aug 11, 2025
Revert "Merge pull request #13709 from NixOS/boehm-coroutines-sp"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants