Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/mysql: replace initialDatabases, ensureDatabases, and ensureUsers options with activationScripts option #107342

Closed
wants to merge 3 commits into from

Conversation

aanderse
Copy link
Member

Motivation for this change

I have been brainwashed convinced by @grahamc that the ensure* options aren't a good fit for NixOS. I'm seeking to remove these options from both the mysql and postgresql modules, but still leave other module authors who wish to provision databases with a reasonable way to do so. My plan for postgresql and mysql was as follows:

  • remove the initialDatabases, ensureDatabases and ensureUsers options
  • add an activationScripts option which module authors (and NixOS users) can use to write idempotent sql statements, via shell code, which will be executed as the mysql/postgresql super admin user to do things like provision local accounts, databases, etc... (ie. replace the ensureDatabases and ensureUsers options with a new option that provides the same functionality, but with even more flexibility)
  • replace all usage of initialDatabases, ensureDatabases and ensureUsers in NixOS with the new activationScripts option
  • strongly caution NixOS users against carelessly using the activationScripts option, very explicitly explaining how it leads to systems which aren't reproducible and why that is bad
  • separate the execution of the sql in the activationScripts option from the postgresql/mysql systemd service so a full restart of postgresql/mysql is not required every time someone alters activationScripts - which is a significant advantage over simply tacking more snippets onto the .postStart of the database systemd units

You might ask why should we get rid of the ensure* options?
Aside from promoting systems which aren't reproducible, the ensure* options have a few other major problems with postgresql, mainly dealing with encoding and extensions. Currently all attempts to extend the ensureDatabase options have led to scenarios where a users configuration.nix could very easily not match the reality of what the database actually looks like. After investigating what it would take to make the ensure* options truly reproducible and actually enforce what they describe I came to the conclusion this is simply a bad idea because any database changes we might make can lead to data corruption and require a backup before making them - something we don't want to do every time the user does a rebuild.

While some will argue that users can simply add sql statements to the .postStart of the mysql/postgresql systemd units I do see this as somewhat hacky, providing no documentation, and causes mysql/postgresql restarts with no easy and user-facing-api friendly way correct that.

NOTE: this is to replace #84146 and #94048. The postgresql variant of this PR will follow shortly after, if this is merged.

NOTE: @aszlig - I didn't provision a separate systemd unit for each snippet because I wanted to mimic system.activationScripts - I found that to be a reasonable interface with a good set of trade-offs.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Copy link
Member

@aszlig aszlig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have very mixed feelings about this, since this crams everything into one single unit running in a specific execution environment and also makes ordering way less flexible than with separate systemd service units.

However since I'm not using either the ensure* options nor this mechanism, I'm not having a particularly strong opinion on this.

description = "The content of the script.";
};
};
in either str (submodule { options = scriptOptions; });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use coercedTo here?

Eg. like this:

Suggested change
in either str (submodule { options = scriptOptions; });
in coercedTo str (text: { inherit text; }) (submodule { options = scriptOptions; });

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation taken directly from system.activationScript, so I didn't think too hard about it. Thanks for the recommendation.

Comment on lines +263 to +264
set' = mapAttrs (n: v: if isString v then noDepEntry v else v) set;
withHeadlines = addAttributeName set';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above change, this would be just:

Suggested change
set' = mapAttrs (n: v: if isString v then noDepEntry v else v) set;
withHeadlines = addAttributeName set';
withHeadlines = addAttributeName set;

wantedBy = [ "multi-user.target" ];

serviceConfig = mkMerge [
commonServiceConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean that it would run in the same environment as the MySQL service (eg. same user/group, restricted system, etc...), so if you want to populate the database based on state data in another application with another user, you need to either override these options (which will then affect all activation scripts) or use a separate systemd service anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the script runs as any user othere than mysql database permissions would not exist to create databases, etc... You always want an activationScript to run as mysql, never as another user.

Type = "oneshot";
RemainAfterExit = true;
SyslogIdentifier = "mysql-activation-scripts";
ExecStart = pkgs.writeScript "mysql-activation-scripts.sh" cfg.activationScripts.script;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this directly in the service definition:

{
  description = "Run MySQL-specific NixOS activation";
  after = [ "mysql.service" ];
  wantedBy = [ "multi-user.target" ];

  inherit (cfg.activationScripts) script;

  serviceConfig = ...;
}

You then could also remove the #! ${pkgs.runtimeShell} above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was added because script doesn't allow trap error handling, or something along those lines.

_localstatus=0
${v.text}
if (( _localstatus > 0 )); then
printf "MySQL activation script snippet '%s' failed (%s)\n" "${a}" "$_localstatus"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
printf "MySQL activation script snippet '%s' failed (%s)\n" "${a}" "$_localstatus"
printf "MySQL activation script snippet '%s' failed (%s)\n" ${lib.escapeShellArg a} "$_localstatus"

@aanderse
Copy link
Member Author

I still have very mixed feelings about this, since this crams everything into one single unit running in a specific execution environment and also makes ordering way less flexible than with separate systemd service units.

However since I'm not using either the ensure* options nor this mechanism, I'm not having a particularly strong opinion on this.

The same as system.activationScripts. This option can really promote system configurations which aren't reproducible and work against what NixOS is trying to achieve, so we don't want to give false impressions that this option is rock solid or anything. I feel like I have added adequate warnings to the documentation.

@peterhoeg
Copy link
Member

This is good (and necessary!) work, thank you.

There is a lot of duplication to set up the the auth socket - can't we handle this as part of the activation script's environment instead so we only need to do it once?

@paulreimer
Copy link
Contributor

I'm curious how the deprecated options lead to a not-reproducible system? (I'm still new to nix principles, was about to use them for a postgres for matrix-synapse)

@peterhoeg
Copy link
Member

peterhoeg commented May 27, 2021 via email

@aanderse
Copy link
Member Author

Because the corresponding sql statements are only executed on first run.

@peterhoeg you're absolutely correct about the initialScript option.

@paulreimer additional to the initialScript option there are the ensure* options (ensureDatabases, ensureUsers, and ensurePermissions). Databases are almost entirely state. You shouldn't try to describe state with nix... it often ends poorly. @grahamc offers one example of how this can go bad here. Once you start needing anything above and beyond a basic, default encoded database, the ensure* options really show how dangerous they can be. Specifically, if you provision a database for matrix-synapse with the ensureDatabases option I believe the collate or encoding will be incorrect. If we added the ability to set the encoding or collate in the ensureDatabases options what happens when the sysadmin changes the values? We cannot write scripts to automatically change encodings or collate (those are relatively "dangerous" operations and should always have a database backup taken beforehand). We dig deeper and deeper into the rabbit hole of managing state, which we shouldn't do. Hence we end up in a position where configuration.nix no longer reflects the state of your system. At this point NixOS doesn't add value and you are arguably no better off than some of the other devops tools that we all think NixOS is better than 😉.

Does this help answer the question?

@aanderse
Copy link
Member Author

Closing due to lack on consensus from the community. Unfortunate.

@roberth
Copy link
Member

roberth commented Nov 17, 2022

After investigating what it would take to make the ensure* options truly reproducible and actually enforce what they describe I came to the conclusion this is simply a bad idea because any database changes we might make can lead to data corruption and require a backup before making them - something we don't want to do every time the user does a rebuild.

Do you have a record of this investigation? Are we tracking these issues?

In this conversation, I've found

The community does not seem to be excited about the suggested solution, and neither am I. Nonetheless, some issues do exist, and I believe we can address them, either by reporting state problems, or by solving the problems in new and better ways.

@RaitoBezarius
Copy link
Member

Tracked in #206467.

@luochen1990
Copy link
Contributor

Currently, all attempts to extend the ensureDatabase options have led to scenarios where a user's configuration.nix could very easily not match the reality of what the database actually looks like.

That's true. However, we also notice that many database operation libraries in different programming languages provide similar ensure* operations, and they have not removed these operations just because of this difficulty.

If you provision a database for matrix-synapse with the ensureDatabases option, I believe the collation or encoding will be incorrect

This is the same issue as above, and it won't be a problem as long as we don't consider extending the ensureDatabases option to be a difficult task.

In fact, we already have a PR: #339977

Separate the execution of the SQL in the activationScripts option from the postgresql/mysql systemd service so that a full restart of postgresql/mysql is not required every time someone alters activationScripts - which is a significant advantage over simply tacking more snippets onto the .postStart of the database systemd units

Very good idea! But in fact, I think it's just an implementation issue, and it has nothing to do with how the options are designed. We can definitely do this on the existing options.

We delve deeper and deeper into the rabbit hole of managing state, which we shouldn't do. Hence, we end up in a position where configuration.nix no longer reflects the state of your system.

Ensuring state consistency declaratively is indeed challenging, but this does not mean there are no simplified ways to handle it. In fact, by looking at various database libraries across different languages, we can find many approaches rather than a binary choice. For example, we can assert that the current state of the library (if it already exists) is consistent with the declared options. If not, we can throw an error and refuse to start, prompting the user to manually adjust the relevant options. This sacrifices some availability during upgrades to enhance data consistency. Which I think will be a good balance.

@Ma27
Copy link
Member

Ma27 commented Sep 21, 2024

database operation libraries in different programming languages provide similar ensure* operations

Not sure which tools you're talking about specifically, but let's not forget that we follow a declarative paradigm and changing options with nothing happening afterwards will be pretty unintuitive.

Like, it was already a surprise to people when the very first nextcloud module had configuration parameters that couldn't be changed after first install and this module has way less users than the postgresql module.

and it won't be a problem as long as we don't consider extending the ensureDatabases option to be a difficult task.

I'm afraid I don't understand what you want to say here.

Yes, I do think it's a non-trivial task. Yes, it caused pain for the maintainers during the past two releases.

If not, we can throw an error and refuse to start, prompting the user to manually adjust the relevant options.

I think it's way too harsh to break the entire DB server if a single database is having problems.

But in fact, I think it's just an implementation issue, and it has nothing to do with how the options are designed.

What's the improvement if the semantics of the options is the exact same, the code is just executed at a different time?


The core issue we have is that users and databases are not configuration, but data in case of database software. So our current tooling is limited here, unfortunately.

There are ideas laid out #206467. My preference is to have useful tooling for state-management in NixOS in general. Not some ad-hoc solution hacked down in the prestart hook that has the potential of confusing users and being a pain to maintainers.

While I have rough plans for it, I don't know if I get to that soonish. If anybody is faster, feel free!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants