Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endless loop when first started with empty filesystem (0.6.1) #93

Open
charles-dyfis-net opened this issue Nov 27, 2018 · 5 comments
Open

Comments

@charles-dyfis-net
Copy link

This was discovered when writing a NixOS system test for the bees module.

The testing logic in https://github.com/NixOS/nixpkgs/blob/923a3e4970226293e4698e44e3e5d5ccf7487603/nixos/tests/bees.nix consistently succeeds every time: This code first creates files on a new filesystem, then starts the bees service. That (passing) test is roughly equivalent to the following shell script:

any_shared_space() {
  [[ $(btrfs fi du -s --raw "$@" | awk 'NR>1 { print $3 }' | grep -E '^0$' | wc -l) -eq 0 ]]
}
die() { echo "$*" >&2; exit 1; }

mkfs.btrfs -f -L aux /dev/vdb || die
mount /dev/vdb /home || die
dd if=/dev/urandom of=/home/dedup-me-1 bs=1M count=8 || die
cp --reflink=never /home/dedup-me-1 /home/dedup-me-2 || die
any_shared_space /home/dedup-me-* && die "ERROR: Detecting shared space before any deduplication has been done"
sync
systemctl start [email protected]
sleep 10
any_shared_space /home/dedup-me-* || die "ERROR: No shared space detected even after bees is running"

By contrast, a test akin to the following -- which starts the service after the filesystem is created and initially mounted, but before any content has been created -- consistently fails, with bees running in a loop which is trying to poll the status of a file descriptor referring to a file that doesn't exist:

any_shared_space() {
  [[ $(btrfs fi du -s --raw "$@" | awk 'NR>1 { print $3 }' | grep -E '^0$' | wc -l) -eq 0 ]]
}
die() { echo "$*" >&2; exit 1; }

mkfs.btrfs -f -L aux /dev/vdb || die
mount /dev/vdb /home || die
systemctl start [email protected]
sleep 1
dd if=/dev/urandom of=/home/dedup-me-1 bs=1M count=8 || die
cp --reflink=never /home/dedup-me-1 /home/dedup-me-2 || die
sync
sleep 10
any_shared_space /home/dedup-me-* || die "ERROR: No shared space detected even after bees is running"

The actual failing test can be found at https://github.com/charles-dyfis-net/nixpkgs/blob/bees-test-failing/nixos/tests/bees.nix; if checking out the relevant branch of nixpkgs, it can be run (from the root of that working tree) with nix-build -I nixpkgs="$PWD" ./nixos/tests/bees.nix.

strace of the loop taking place when in that failed state can be seen at https://gist.github.com/charles-dyfis-net/34ac2e4d2bada0c3a3c8632cab98c8d9

@charles-dyfis-net
Copy link
Author

The title is a bit misleading, as the filesystem is not quite empty even in the failing case -- we've still had a .beeshome subvol created by the wrapper, after all.

@Zygo
Copy link
Owner

Zygo commented Nov 27, 2018

Do you have the log output from the failing case? Among other things it will have the context leading up to the exception.

@charles-dyfis-net
Copy link
Author

Edited an additional file log.txt into the previously-linked gist.

@Zygo
Copy link
Owner

Zygo commented Nov 27, 2018

So it's failing to find the extent tree in the root tree. Fascinating.

Maybe there's some race condition during the first btrfs mount where transid 1 or the extent tree doesn't exist yet?

In any case, some other things are going wrong after that: somehow we get a transid_max of 18446744073709551615 (maybe need something like 9013218 for transid_max()). After that, nothing works because we're assuming the largest possible transaction rate and won't accept any smaller transid than the maximum possible transid.

It should be possible to work around this by forcing a sync after creating .beeshome.

@charles-dyfis-net
Copy link
Author

Confirmed: Adding that sync (as in charles-dyfis-net/nixpkgs@5ed17df) does indeed work around the immediate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants