-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable network config generation via netdog
#2066
Conversation
^ Push above:
|
packages/os/network-config.service
Outdated
StandardError=journal+console | ||
|
||
[Install] | ||
WantedBy=multi-user.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this directive is required, if you already made this unit RequiredBy=preconfigured.target
sources/api/netdog/README.md
Outdated
@@ -14,6 +14,10 @@ It contains two subcommands meant for use as settings generators: | |||
|
|||
The subcommand `set-hostname` sets the hostname for the system. | |||
|
|||
The subcommand `generate-net-config` generates the network interface configuration for the host. | |||
If a `net.toml` exists, it is used to generate the configuration, otherwise a default configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
If a `net.toml` exists, it is used to generate the configuration, otherwise a default configuration | |
If a `net.toml` file exists in `/var/lib/bottlerocket`, it is used to generate the configuration, otherwise a default configuration |
packages/os/network-config.service
Outdated
@@ -0,0 +1,20 @@ | |||
[Unit] | |||
Description=Bottlerocket network configuration system | |||
Requires=var-lib-bottlerocket.mount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe a note here saying that the mount should be there because the service will attempt to read from it? Also in other units we use this when we need a mount to be present:
Requires=var-lib-bottlerocket.mount | |
RequiresMountsFor=/var/lib/bottlerocket |
sources/api/netdog/build.rs
Outdated
if variant.starts_with("aws") { | ||
println!("cargo:rustc-cfg=bottlerocket_platform=\"aws\""); | ||
} else if variant.starts_with("vmware") { | ||
println!("cargo:rustc-cfg=bottlerocket_platform=\"vmware\""); | ||
} else if variant.starts_with("metal") { | ||
println!("cargo:rustc-cfg=bottlerocket_platform=\"metal\""); | ||
} else { | ||
eprintln!( | ||
"For local builds, you must set the 'VARIANT' environment variable so we know which data \ | ||
provider to build. Valid values are the directories in models/src/variants/, for \ | ||
example 'aws-ecs-1'." | ||
); | ||
process::exit(1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to do something similar somewhere else (not yet in the code), what do you think about?
if variant.starts_with("aws") { | |
println!("cargo:rustc-cfg=bottlerocket_platform=\"aws\""); | |
} else if variant.starts_with("vmware") { | |
println!("cargo:rustc-cfg=bottlerocket_platform=\"vmware\""); | |
} else if variant.starts_with("metal") { | |
println!("cargo:rustc-cfg=bottlerocket_platform=\"metal\""); | |
} else { | |
eprintln!( | |
"For local builds, you must set the 'VARIANT' environment variable so we know which data \ | |
provider to build. Valid values are the directories in models/src/variants/, for \ | |
example 'aws-ecs-1'." | |
); | |
process::exit(1); | |
} | |
let platform = variant.split('-').collect::<Vec<&str>>()[0].to_string(); | |
println!("cargo:rustc-cfg=bottlerocket_platform=\"{}\"", platform); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely less code, and I like the fact that we don't have to specify all the variants, but I could see a few issues.
- We don't get a nice error message if someone misspells the variant - we get a more cryptic compiler message
- I think it's reasonable to assume that a variant name wouldn't be zero-length (covering the index issue), but we get the same issue as above where the build will happen with a misspelled
cargo:rustc-cfg=bottlerocket_platform="awsk8s121"
and we'd get a less than friendly error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For 1), in the current implementation I can do
aws-not-existing
and get a compiler error - For 2), in the current implementation,
awsk8s121
will cause the same error as 1)
So the question is how much validation do we want to do while parsing the variant's name to provide a friendly error?, or else we have to trade off some errors for others when the user passes an invalid variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented your suggestion, and opted to add a compile_error!()
after a cfg
directive - see wicked.rs
For the lack of a better place to add this comment: Nit: I believe the commit message in
|
sources/api/netdog/src/main.rs
Outdated
mod net_config; | ||
use net_config::NetConfig; | ||
|
||
mod wicked; | ||
use wicked::WickedInterface; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there a reason to keep these use statements separate from all the others? rust fmt will just sort the lists of mods and uses if they are aggregated.
^ The above fixes a few typos and adds a |
Flipping this out of draft since I feel pretty comfortable with the functionality. |
^ fixed the test data 🤦 |
^ The above push:
|
^ Adds a quick net config version check. This check will be removed when we handle versioning (already in progress). I believe at this point I've addressed all of the comments minus @webern 's comment about testing the |
^ This force push was a pretty good sized change, hence requesting all reviewers again.
|
I'll spin back through the variants again to make sure they're all tested, but my smoke testing of |
sources/api/netdog/src/main.rs
Outdated
if config.dhcp4.is_none() && config.dhcp6.is_none() { | ||
return error::InvalidNetConfigSnafu { | ||
interface: name.to_string(), | ||
reason: "must configure dhcp4 or dhcp6, or both", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't specify either DHCP4 or DHCP6 in the wicked interface XML file, what does wicked do? Bring it up with just local addresses? Leave it down? Error out?
Addresses all of @bcressey 's comments!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥅 🐶
This commit adds a subcommand `generate-net-config` to `netdog`. It uses a user-provided TOML file `net.toml` to generate the proper wicked configuration files for the network interfaces in the system. This file lives on the BOTTLEROCKET-PRIVATE partition. If no file is found, `netdog` looks at the kernel command line (/proc/cmdline) for the `netdog.default-interface` prefix and configures the interface listed there. A single default interface may be defined on the kernel command line with the format: `netdog.default-interface=interface name:option1,option2`. "interface name" is the name of the interface, and valid options are "dhcp4" and "dhcp6". A "?" may be added to the option to signify that the lease for the protocol is optional and the system chouldn't wait for it. If the prefix does not exist on the kernel command line, no network configuration is generated. At configuration generation time, `netdog` writes the primary interface to `/var/lib/netdog/primary_interface`. Primary interface may be set in `net.toml`; if it is absent in the file the first interface in the list is assumed to be the primary. If the interface is passed via kernel command line, it is assumed to be the default. Previously, the `netdog install` command, which is called by wicked for configured interfaces, matched on the name of the interface (eth0) before it took any action. The `install` command writes the resolv.conf, and the current_ip file. For AWS/VMware we are explicitly only configuring `eth0`, and for Metal variants we will not be able to determine the configured interface names. This commit changes the behavior. Rather than filtering on interface name in order to write `resolv.conf` and the current IP file, we only write the files if the interface currently being configured is the primary interface (read from the previously written `primary_interface` file). Note: Previously, our network configuration for `eth0` included a few options, `arp-verify` and `arp-notify`. These options were (theoretically) meant to speed up boot time in EC2 since we know our IP won't be a duplcate and therefore we don't need to ARP for it. (That's not how networking in EC2 works) In our testing, `arp-very/notify` made no significantly measureable difference in the time it took to get from "lease request" to "committed lease". From this commit on, these options aren't part of our generated config for any variant.
This commit removes the hard-coded `eth0.xml` network configuration file in favor of using `netdog` to generate the network configuration for all variants. It adds a new systemd unit file to run `netdog generate-net-config` early in boot, before the network is up. To generate the network configuration for AWS/VMware variants, we pass `netdog.default-interface=eth0:dhcp4,dhcp6?` which `netdog` interprets and generates network config similar to the hardcoded file previously used for these variants. For metal variants, we decided to use systemd-udevd's predictable device naming so we can count on network devices being named identically every boot. We currently pass `net.ifnames=0` on the kernel command line which disables predictable naming, which is fine for AWS and VMware variants as hardware is controlled and instances typically initially come up with a single interface in the same PCIe location. In order to continue using `net.ifnames=0` for AWS/VMware, we move this parameter out of the default kernel command line to the KERNEL_PARAMETERS section of each variants `Cargo.toml`. Aside: We previously passed `biosdevname=0` on the kernel command line as well. `biosdevname` is a udev helper utility written by Dell for consistent device naming based on SMBIOS info. We don't currently package the helper or include the udev rule that uses it, so we have removed the `biosdevname` parameter entirely.
^ Fixed the last couple nits! |
Description of changes:
Until now, metal variants were using the same hard-coded
eth0.xml
Wicked configuration file as other variants. This set of commits adds the initial support for user-supplied network interface configuration. Additional options and documentation will come after some additional use and feedback - this PR enables the feature.Testing done:
Boot the following variants and ensure they come up correctly, network config is written properly, and they join the cluster, run pods, etc. Also ensure that the generated network configuration matches what we currently use today, minus the arp-verify/notify bits (see netdog commit message above). Ensure
primary_interface
is written.For metal (tested on on my SuperMicro SYS-E200-8D):
net.toml
/kernel argnet.toml
and test all 4 interfaces workprimary
key the first interface is chosencurrent_ip
is written with the IP of the primary interfacenet.toml
get a configuration file written, and if they do not have an active link that wicked waits for themTerms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.