Skip to content

libfetchers: Add unit test#6

Closed
roberth wants to merge 1 commit intoedolstra:lazy-treesfrom
hercules-ci:lazy-tree-testing
Closed

libfetchers: Add unit test#6
roberth wants to merge 1 commit intoedolstra:lazy-treesfrom
hercules-ci:lazy-tree-testing

Conversation

@roberth
Copy link

@roberth roberth commented Feb 11, 2023

Findings:

  • The error for an empty work tree isn't great, and should perhaps
    not be an error at all.

  • It would be nice to use a C++ wrapper library. Writing this in C style
    is a little unpleasant. Would make sense to use that in the main code
    as well, unless we want to keep the untested build from source closure
    small and not vendor the wrapper.
    An example of such a wrapper is https://github.com/AndreyG/libgit2cpp
    Alternatively, we might do a lightweight wrapper of our own that only
    adds constructors and destructors.

  • Alternatively, we could use a store-only CLI command to test fetchers.
    This won't let us test the intricacies of laziness, but does simplify
    the setup of such things as git repositories by an order of magnitude.
    In this case, I'd keep the test suite around for tests that are specific
    to the input accessor methods, insofar as those aren't covered by the
    such CLI-based tests.

  • Nix couldn't find a suitable getCacheDir() when run in the sandbox.
    I think a fixed location in /tmp is a sensible behavior in this case.

Findings:

 - The error for an empty work tree isn't great, and should perhaps
   not be an error at all.

 - It would be nice to use a C++ wrapper library. Writing this in C style
   is a little unpleasant. Would make sense to use that in the main code
   as well, unless we want to keep the untested build from source closure
   small and not vendor the wrapper.
   An example of such a wrapper is https://github.com/AndreyG/libgit2cpp/tree/master/src
   Alternatively, we might do a lightweight wrapper of our own that only
   adds constructors and destructors.

 - Alternatively, we could use a store-only CLI command to test fetchers.
   This won't let us test the intricacies of laziness, but does simplify
   the setup of such things as git repositories by an order of magnitude.
   In this case, I'd keep the test suite around for tests that are specific
   to the input accessor methods, insofar as those aren't covered by the
   such CLI-based tests.
@roberth roberth requested a review from edolstra as a code owner February 11, 2023 15:22
@roberth
Copy link
Author

roberth commented Feb 11, 2023

Cost breakdown of last unit test, repo with 1 file in HEAD.

TmpRepo init: 1ms
TmpStore init: 36ms
Adding 1 file to HEAD: 1.2ms
Acquire an InputAccessor: instant
readDirectory: 7ms
fetchToSTore (after readDirectory): 4ms

Total including cleanup: 68ms

Tests could be sped up 2× by using a global store.

Sometimes the test takes twice as long. Maybe libgit2 tries to auto gc or something.

@roberth roberth mentioned this pull request Feb 11, 2023
9 tasks
@Ericson2314
Copy link

Oh did this work ever end up anywhere?

@roberth
Copy link
Author

roberth commented Jan 3, 2024

It did not. I've added it to the board just now: Nix team (view)

EDIT: was wrong board

@edolstra edolstra closed this Jan 5, 2024
@edolstra
Copy link
Owner

edolstra commented Jan 5, 2024

Will move this to the NixOS/nix repo.

edolstra pushed a commit that referenced this pull request Sep 29, 2025
We now see exception beeing thrown when remote building in master
because of writing to a non-blocking file descriptor from our json logger.

> #0  0x00007f2ea97aea9c in __pthread_kill_implementation () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> NixOS#1  0x00007f2ea975c576 in raise () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #2  0x00007f2ea9744935 in abort () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
> #3  0x00007f2ea99e8c2b in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #4  0x00007f2ea99f820a in __cxxabiv1::__terminate(void (*)()) () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #5  0x00007f2ea99f8275 in std::terminate() () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #6  0x00007f2ea99f84c7 in __cxa_throw () from /nix/store/ybjcla5bhj8g1y84998pn4a2drfxybkv-gcc-13.3.0-lib/lib/libstdc++.so.6
> #7  0x00007f2eaa5035c2 in nix::writeFull (fd=2, s=..., allowInterrupts=true) at ../unix/file-descriptor.cc:43
> #8  0x00007f2eaa5633c4 in nix::JSONLogger::write (this=this@entry=0x249a7d40, json=...) at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/char_traits.h:358
> #9  0x00007f2eaa5658d7 in nix::JSONLogger::logEI (this=<optimized out>, ei=...) at ../logging.cc:242
> #10 0x00007f2ea9c5d048 in nix::Logger::logEI (ei=..., lvl=nix::lvlError, this=0x249a7d40) at /nix/store/a7cq5bqh0ryvnkv4m19ffchnvi8l9qx6-nix-util-2.27.0-dev/include/nix/logging.hh:108
> #11 nix::handleExceptions (programName="nix", fun=...) at ../shared.cc:343
> #12 0x0000000000465b1f in main (argc=<optimized out>, argv=<optimized out>) at /nix/store/4krab2h0hd4wvxxmscxrw21pl77j4i7j-gcc-13.3.0/include/c++/13.3.0/bits/allocator.h:163
> (gdb) frame 10
> #10 0x00007f2ea9c5d048 in nix::Logger::logEI (ei=..., lvl=nix::lvlError, this=0x249a7d40) at /nix/store/a7cq5bqh0ryvnkv4m19ffchnvi8l9qx6-nix-util-2.27.0-dev/include/nix/logging.hh:108
> 108             logEI(ei);

So far only drainFD sets the non-blocking flag on a "readable" file descriptor,
while this is a "writeable" file descriptor.
It's not clear to me yet, why we see logs after that point, but it's
also not that bad to handle EAGAIN in read/write functions after all.
edolstra added a commit that referenced this pull request Sep 29, 2025
This makes it behave the same as nix-daemon. Opening the store in the
parent can cause a SIGBUS in libsqlite in the child:

  #0  0x00007f141cf6f789 in __memset_avx2_unaligned_erms () from /nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libc.so.6
  NixOS#1  0x00007f141c322fe8 in walIndexAppend () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #2  0x00007f141c3711a2 in pagerWalFrames () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #3  0x00007f141c38317e in sqlite3PagerCommitPhaseOne.part.0 () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #4  0x00007f141c383555 in sqlite3BtreeCommitPhaseOne.part.0 () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #5  0x00007f141c384797 in sqlite3VdbeHalt () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #6  0x00007f141c3b8f60 in sqlite3VdbeExec () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #7  0x00007f141c3bbfef in sqlite3_step () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #8  0x00007f141c3bd0e5 in sqlite3_exec () from /nix/store/bbd59cqw259149r2ddk4w1q0lr2fch8c-sqlite-3.46.1/lib/libsqlite3.so.0
  #9  0x00007f141da140e0 in nix::SQLiteTxn::commit() () from /nix/store/1m4r8s7s1v54zq9isncvjgia02bffxlz-determinate-nix-store-3.1.0/lib/libnixstore.so
  #10 0x00007f141d9ce69c in nix::LocalStore::registerValidPaths(std::map<nix::StorePath, nix::ValidPathInfo, std::less<nix::StorePath>, std::allocator<std::pair<nix::StorePath const, nix::ValidPathInfo> > > const&)::{lambda()NixOS#1}::operator()() const () from /nix/store/1m4r8s7s1v54zq9isncvjgia02bffxlz-determinate-nix-store-3.1.0/lib/libnixstore.so
edolstra added a commit that referenced this pull request Sep 29, 2025
This was removed in NixOS#11152. However,
we need it for the multi-threaded evaluator, because otherwise Boehm
GC will crash while scanning the thread stack:

  #0  GC_push_all_eager (bottom=<optimized out>, top=<optimized out>) at extra/../mark.c:1488
  NixOS#1  0x00007ffff74691d5 in GC_push_all_stack_sections (lo=<optimized out>, hi=<optimized out>, traced_stack_sect=0x0) at extra/../mark_rts.c:704
  #2  GC_push_all_stacks () at extra/../pthread_stop_world.c:876
  #3  GC_default_push_other_roots () at extra/../os_dep.c:2893
  #4  0x00007ffff746235c in GC_mark_some (cold_gc_frame=0x7ffee8ecaa50 "`\304G\367\377\177") at extra/../mark.c:374
  #5  0x00007ffff7465a8d in GC_stopped_mark (stop_func=stop_func@entry=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:875
  #6  0x00007ffff7466724 in GC_try_to_collect_inner (stop_func=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:624
  #7  0x00007ffff7466a22 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, retry=retry@entry=0) at extra/../alloc.c:1688
  #8  0x00007ffff746878f in GC_allocobj (gran=<optimized out>, kind=<optimized out>) at extra/../alloc.c:1798
  #9  GC_generic_malloc_inner (lb=<optimized out>, k=k@entry=1) at extra/../malloc.c:193
  #10 0x00007ffff746cd40 in GC_generic_malloc_many (lb=<optimized out>, k=<optimized out>, result=<optimized out>) at extra/../mallocx.c:477
  #11 0x00007ffff746cf35 in GC_malloc_kind (bytes=120, kind=1) at extra/../thread_local_alloc.c:187
  #12 0x00007ffff796ede5 in nix::allocBytes (n=<optimized out>, n=<optimized out>) at ../src/libexpr/include/nix/expr/eval-inline.hh:19

This is because it will use the stack pointer of the coroutine, so it
will scan a region of memory that doesn't exist, e.g.

  Stack for thread 0x7ffea4ff96c0 is [0x7ffe80197af0w,0x7ffea4ffa000)

(where 0x7ffe80197af0w is the sp of the coroutine and 0x7ffea4ffa000
is the base of the thread stack).

We don't scan coroutine stacks, because currently they don't have GC
roots (there is no evaluation happening in coroutines). So there is
currently no need to restore the other parts of the original patch,
such as BoehmGCStackAllocator.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants