Skip to content

ZFS improvements#4807

Merged
wmertens merged 5 commits intoNixOS:masterfrom
wizeman:u/zfs-improvements
Nov 13, 2014
Merged

ZFS improvements#4807
wmertens merged 5 commits intoNixOS:masterfrom
wizeman:u/zfs-improvements

Conversation

@wizeman
Copy link
Member

@wizeman wizeman commented Nov 3, 2014

This PR does the following:

  • Uses the upstream systemd services by default (including the one for zed - the ZFS Event Daemon)
  • Reimplements the ZFS pool import service so that:
    • We do not fail if there are no ZFS pools to import
    • We not rely on /etc/zfs/zpool.cache, which eliminates some issues
    • Instead, the ZFS pools to import have to be specified in configuration.nix
    • To avoid issues and make configuration simpler, NixOS will automatically infer the necessary ZFS pools to import for ZFS filesystems which are defined in fileSystems.*
    • For ZFS pools which are not mentioned in fileSystems.*, the user will have to add them to boot.zfs.extraPools if they want them to be automatically imported on every boot
    • Please note that the way in which this PR is now importing ZFS pools is more or less the direction in which upstream ZFS seems to be moving towards
  • Added option to disable force import (i.e. --force / -f) of ZFS pools. Currently, we still force-import by default to maintain backwards compatibility with NixOS systems that don't have the host ID recorded in their ZFS pool labels, but once most people upgrade to this PR we should disable force-importing by default
  • Reimplement SPL's host ID configuration so that:
    • The SPL now correctly picks up the host ID defined in configuration.nix (this wasn't happening before due to a bug in the SPL)
    • The host ID is now configured system-wide by writing it to /etc/hostid, which makes the ZFS userspace tools agree with the kernel as to what the host ID is. It also makes glibc (and therefore, other unrelated applications) pick up the host ID defined in configuration.nix
    • Renamed boot.spl.hostid to networking.hostId, to make it clear that the host ID is a system-wide config option and not tied to the SPL. I've put it under networking because it closely resembles networking.hostName in terms of purpose
    • Only allow ZFS to be enabled if networking.hostId is set. Otherwise, ZFS pools will not record the host ID in the vdev labels and it will be impossible to open or import a ZFS pool without passing the --force / -f parameter
  • Make nixos-generate-config generate a random 32-bit host ID by default (when installing NixOS). This makes the host ID more reliable than falling back to the default glibc behavior of generating a host ID based on the IP address, which can change quite often and conflict with other machines in the same network in some cases. It also makes using ZFS easier by not requiring the user to worry about manually generating a host ID themselves and adding it to configuration.nix

The commit messages contain a bit more details regarding the issues above.

@domenkozar
Copy link
Member

note that this has stdenv changes.

@domenkozar domenkozar added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS mass-rebuild labels Nov 3, 2014
@wizeman
Copy link
Member Author

wizeman commented Nov 3, 2014

@iElectric Oh, is it because of adding the isBigEndian definition? I'm wondering if there is a better way to do that then, seeing that it doesn't make sense to rebuild everything just because I'm adding a utility function...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not urandom? http://www.2uo.de/myths-about-urandom/

Also, perhaps you could use the ethernet MAC address first? That's more unique-per-host? Too bad the id is only 32 bit, it would be nice to include the timestamp.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using /dev/urandom is fine by me :) I will fix that, thanks!

Regarding MAC address, what if you generate a host ID using the MAC of machine 1's network adapter, then move that network adapter to machine 2 (because you are doing maintenance and reinstalling machine 2) and then generate a new host ID in machine 2? They would end up with the same host ID?

To me, using /dev/(u)random seems less error-prone, and by design it has to be seeded with as much unique information as possible. It is true that it doesn't guarantee uniqueness due to random collisions, but I'm not sure there is anything that is truly reliably unique, especially with such a small 32-bit integer space...

According to my calculations (using birthday paradox formulas), if you generate 32-bit host IDs randomly, then if you have 10 machines, it's only a 0.000001% chance that 2 of those machines will end up with the same host ID. If you have 100 machines there is only a 0.0001% chance.

At least with random host ID generation we can calculate the collision probability exactly, while if it's not truly random, then it's much harder... I think, more likely the probability of collision will be higher due to unforeseen events, depending on how you generate it.

Consider, for example, the above case of moving network adapters from one machine to another, or moving a machine from one network to another, or using exactly the same simple hardware (such as a Raspberry Pi) in 2 different machines, ...

@wmertens
Copy link
Contributor

wmertens commented Nov 3, 2014

Nice! 👍

@wizeman wizeman force-pushed the u/zfs-improvements branch from e8e6f99 to 99695af Compare November 4, 2014 18:11
@wizeman
Copy link
Member Author

wizeman commented Nov 4, 2014

Fixed the following issues as suggested by @wmertens:

  • Use /dev/urandom instead of /dev/random
  • Simplify generation of /etc/hostid (no point in using sed) and make big endian and little endian expressions match each other for symmetry

I still don't know if the isBigEndian utility function belongs in stdenv, or if it should be defined somewhere else so that stdenv doesn't get rebuilt...

@wizeman wizeman force-pushed the u/zfs-improvements branch from 99695af to 192d311 Compare November 4, 2014 18:17
@domenkozar
Copy link
Member

Note: this PR is required to fix tests on nixos-combined job. cc @edolstra

@wmertens
Copy link
Contributor

wmertens commented Nov 5, 2014

Regarding stdenv, I'm surprised that adding an attribute (function) to it
would cause a world rebuild, is that really the case?

On Wed Nov 05 2014 at 12:40:26 PM Domen Kožar notifications@github.com
wrote:

Note: this PR is required to fix tests on nixos-combined job. cc @edolstra
https://github.com/edolstra


Reply to this email directly or view it on GitHub
#4807 (comment).

@domenkozar
Copy link
Member

@wmertens you're right, it doesn't.

@domenkozar
Copy link
Member

nox-review wip complains about missing grsecurity patches (404) :(

@wizeman
Copy link
Member Author

wizeman commented Nov 9, 2014

@iElectric I have pushed updates for grsec. Anyway, this is a bit unrelated to this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, aren't hostId and byte order fully defined when this gets built and could this section therefore be replaced by a @setHostId@ string to be defined in the module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right...
But in which module would I define setHostId? Can I define a function in one module, and then use it in another one?

@wmertens
Copy link
Contributor

...starting to look excellent! 😁

domenkozar referenced this pull request Nov 12, 2014
Also remove custom zfs services from NixOS.  This makes NixOS more aligned with
upstream.

More importantly, it prepares the way for NixOS to use ZED (the ZFS event
daemon). This service will automatically be enabled but it is not possible to
configure it via configuration.nix yet.
It removes duplicate elements from a list.
It turns out that the upstream systemd services that import ZFS pools contain
serious bugs. The first major problem is that importing pools fails if there
are no pools to import. The second major problem is that if a pool ends up in
/etc/zfs/zpool.cache but it disappears from the system (e.g. if you
reboot but during the reboot you unplug your ZFS-formatted USB pen drive),
then the import service will always fail and it will be impossible to get rid
of the pool from the cache (unless you manually delete the cache).

Also, the upstream service would always import all available ZFS pools every
boot, which may not be what is desired in some cases.

This commit will solve these problems in the following ways:

1. Ignore /etc/zfs/zpool.cache. This seems to be a major source of
issues, and also does not play well with NixOS's philosophy of
reproducible configurations. Instead, on every boot NixOS will try to import
the set of pools that are specified in its configuration.  This is also the
direction that upstream is moving towards.

2. Instead of trying to import all ZFS pools, only import those that are
actually necessary. NixOS will automatically determine these from the
config.fileSystems.* option. Also, the user can import any additional
pools every boot by adding them to the config.boot.zfs.extraPools
option, but this is only necessary if their filesystems are not
specified in config.fileSystems.*.

3. Added options to configure if ZFS should force-import ZFS pools. This may
currently be necessary, especially if your pools have not been correctly
imported with a proper host id configuration (which is probably true for 99% of
current NixOS ZFS users). Once host id configuration becomes mandatory when
using ZFS in NixOS and we are sure that most users have updated their
configurations and rebooted at least once, we should disable force-import by
default. Probably, this shouldn't be done before the next stable release.

WARNING: This commit may change the order in which your non-ZFS vs ZFS
filesystems are mounted.  To avoid this problem (now or in the future)
it is recommended that you set the 'mountpoint' property of your ZFS
filesystems to 'legacy', and that you manage them using
config.fileSystems, just like any other non-ZFS filesystem is usually
managed in NixOS.
The old boot.spl.hostid option was not working correctly due to an
upstream bug.

Instead, now we will create the /etc/hostid file so that all applications
(including the ZFS kernel modules, ZFS user-space applications and other
unrelated programs) pick-up the same system-wide host id. Note that glibc
(and by extension, the `hostid` program) also respect the host id configured in
/etc/hostid, if it exists.

The hostid option is now mandatory when using ZFS because otherwise, ZFS will
require you to force-import your ZFS pools if you want to use them, which is
undesirable because it disables some of the checks that ZFS does to make sure it
is safe to import a ZFS pool.

The /etc/hostid file must also exist when booting the initrd, before the SPL
kernel module is loaded, so that ZFS picks up the hostid correctly.

The complexity in creating the /etc/hostid file is due to having to
write the host ID as a 32-bit binary value, taking into account the
endianness of the machine, while using only shell commands and/or simple
utilities (to avoid exploding the size of the initrd).
The host id value gets generated by reading a 32-bit value from
/dev/urandom.

This makes programs that rely on a correct host id more reliable.

It also makes using ZFS more seamless, as you don't need to configure
the hostId manually; instead, it becomes part of your config from the
moment you install NixOS.
@wizeman
Copy link
Member Author

wizeman commented Nov 12, 2014

Updated to implement the latest suggestions by @wmertens. Thanks!

@wmertens
Copy link
Contributor

One last nitpick: The commit for the unique function comes after its first use 😅. Can you reorder that?

Other than that, 😍.

@domenkozar
Copy link
Member

cc @dysinger

@wizeman
Copy link
Member Author

wizeman commented Nov 12, 2014

@wmertens Actually, the commits are ordered correctly:

The unique function's commit is 1fea586
The "improve ZFS boot process"'s commit is 12e77fd and its parent is 1fea586

It's just github who is screwing up how they are displayed...

@wmertens
Copy link
Contributor

...and sold, to the gentleman with the hat!

wmertens added a commit that referenced this pull request Nov 13, 2014
@wmertens wmertens merged commit 5c19521 into NixOS:master Nov 13, 2014
wmertens added a commit that referenced this pull request Nov 13, 2014
@wizeman wizeman deleted the u/zfs-improvements branch November 21, 2014 00:46
@Havvy
Copy link
Contributor

Havvy commented Aug 13, 2015

I was told to leave a comment here. Not sure why I'm not opening a new issue, but whatever.

The docs for networking.hostId are confusing. They say it should be a unique number, but it doesn't exactly state why. It's apparently used by ZFS for now, but I had to find that out by asking on IRC. It should state that if you aren't using ZFS or whatever uses it, you can safely remove it. Especially since it's a part of the nix-generate-config generated configuration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants