
Conversation

Collaborator

@smklein smklein commented Apr 24, 2022

Addressing

  • Create common "address" module for determining well-known addresses
  • Provide utilities for
    • Accessing the "well-known IP addresses" of the DNS servers within an AZ.
  • Update all addresses (once again) to make room for a "reserved" rack subnet which can be "known" within an AZ (see the address-layout sketch after this description).

Internal DNS service

  • Consume IPv6 addresses through the CLI, as their configuration may be deferred until runtime; they now depend on the rack's AZ IP address.
  • During rack setup, explicitly assign the internal DNS service to the first sleds seen.

Sled Agent

  • Provide plumbing to request the allocation of GZ IP addresses when creating services. This is necessary for routing between subnets within the rack - for example, from the "reserved" rack subnet (containing the DNS service) to the "sled-specific" subnets (Nexus, Sled Agent, etc.) and vice versa.
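
For reference, here is a minimal, illustrative sketch of the layout this implies - the constants and the helper name below are hypothetical stand-ins, and the real definitions live in the new common "address" module:

use std::net::Ipv6Addr;

// Illustrative prefix lengths for the hierarchy used throughout this PR.
const AZ_PREFIX: u8 = 48;   // one /48 per availability zone
const RACK_PREFIX: u8 = 56; // one /56 per rack, carved from the AZ /48
const SLED_PREFIX: u8 = 64; // one /64 per sled, carved from a rack /56

/// Hypothetical helper: given any address inside the AZ, derive the base of
/// the "reserved" rack subnet, i.e. the zeroth /56 of the AZ's /48.
fn reserved_rack_subnet(addr_in_az: Ipv6Addr) -> Ipv6Addr {
    let mut octets = addr_in_az.octets();
    // Keep the first 48 bits (6 bytes) and zero the rest; the zeroth /56
    // shares the AZ prefix and has a zero rack byte.
    for octet in octets.iter_mut().skip(6) {
        *octet = 0;
    }
    Ipv6Addr::from(octets)
}

fn main() {
    println!("AZ /{}, rack /{}, sled /{}", AZ_PREFIX, RACK_PREFIX, SLED_PREFIX);
    let sled_addr: Ipv6Addr = "fd00:1122:3344:0101::1".parse().unwrap();
    // Prints "fd00:1122:3344::/56" - the reserved rack subnet for this AZ.
    println!("{}/{}", reserved_rack_subnet(sled_addr), RACK_PREFIX);
}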

@smklein smklein changed the base branch from main to service-discovery-in-a-zone April 24, 2022 04:12
@smklein smklein changed the title Internal dns assigned ips [internal-dns][sled agent] Assign IPs to the internal DNS service from the reserved rack subnet Apr 24, 2022
@smklein smklein changed the title [internal-dns][sled agent] Assign IPs to the internal DNS service from the reserved rack subnet [internal-dns][sled agent] Assign IPs to DNS service from the reserved subnet Apr 24, 2022
@smklein smklein changed the title [internal-dns][sled agent] Assign IPs to DNS service from the reserved subnet [internal-dns][sled agent] Assign IPs to DNS service from reserved subnet Apr 24, 2022
@smklein smklein marked this pull request as ready for review April 25, 2022 01:23
Base automatically changed from service-discovery-in-a-zone to main April 25, 2022 01:24
}

info!(self.log, "GZ addresses: {:#?}", service.gz_addresses);
for addr in &service.gz_addresses {
Collaborator

Can you explain the new GZ address code a bit more? We were previously creating a GZ IPv6 address for the sled agent itself, which I think still happens when the sled agent is started by the RSS. What is the distinction between that address and these?

Collaborator Author

I left a comment on the struct:

// The addresses in the global zone which should be created, if necessary
// to route to the service.
#[serde(default)]
pub gz_addresses: Vec<Ipv6Addr>,

But I'll elaborate more on that comment. The TL;DR: Most services don't need to use this field, but services using addresses outside the sled's typical /64 - such as the DNS service - do need the extra address for routing.
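
As a rough illustration of how that plays out in a service request (everything here other than gz_addresses is a hypothetical, trimmed-down stand-in), a sled-local service leaves the field empty, while the internal DNS request carries an extra routing address from the reserved rack subnet - the exact addressing scheme is spelled out later in this thread:

use std::net::Ipv6Addr;

// Hypothetical, trimmed-down service request; only `gz_addresses` is taken
// from the struct quoted above.
struct ServiceRequest {
    name: String,
    addresses: Vec<Ipv6Addr>,    // addresses used inside the service's zone
    gz_addresses: Vec<Ipv6Addr>, // extra GZ addresses, purely for routing
}

fn example_requests() -> Vec<ServiceRequest> {
    vec![
        // A sled-local service lives in the sled's own /64, so no GZ
        // address is needed to reach it.
        ServiceRequest {
            name: "nexus".to_string(),
            addresses: vec!["fd00:1122:3344:0101::5".parse().unwrap()],
            gz_addresses: vec![],
        },
        // The internal DNS service sits in a DNS /64 carved from the
        // reserved rack subnet, so the GZ needs an address in that same
        // /64 to route traffic into the zone.
        ServiceRequest {
            name: "internal-dns".to_string(),
            addresses: vec!["fd00:1122:3344:0001::1".parse().unwrap()],
            gz_addresses: vec!["fd00:1122:3344:0001::2".parse().unwrap()],
        },
    ]
}

fn main() {
    for req in example_requests() {
        println!("{}: gz_addresses = {:?}", req.name, req.gz_addresses);
    }
}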

Collaborator

I think I understand, but let me double-check. The DNS server may be listening on an address that's in a distinct /64 from the sled's actual /64 prefix, since it's an AZ-wide service. This address is then the actual IPv6 for that DNS server. For example, the sled may have prefix fd00:1122:3344:0101::/64, and this address could be fd00:1122:3344:0202::/64. (I may have the shared prefix length wrong there, can't remember if it's /48 or /56.) The DNS server would be listening on [fd00:1122:3344:0202::1]:53 in that case, and that address (minus the port) is in this gz_addresses field. Is that all right?

Contributor

I think the AZ-wide address will always be in the 0th "reserved rack". So changing from fd00:1122:3344:0202::/64 to fd00:1122:3344:0002::/64 would make the example align a bit more with the current scheme.

Collaborator Author

The DNS server may be listening on an address that's in a distinct /64 from the sled's actual /64 prefix, since it's an AZ-wide service.

That's correct.

This address is then the actual IPv6 for that DNS server.

I'm not sure I agree with this part - there are two distinct addresses here. One is for the DNS server, and one in the Global Zone is purely for routing. Outside of routing, the GZ address is currently unused.

The structures in common/src/address.rs define these conversions, which should hopefully centralize this logic.

To clarify:

  • The AZ subnet is a /48

  • Rack subnets are /56

  • Sled subnets are /64.

  • If we have an AZ of fd00:1122:3344::/48...

  • ... then the "reserved", zeroth rack subnet would be fd00:1122:3344:0000::/56

  • ... the first "rack subnet which would actually be used by a rack" would be fd00:1122:3344:0100::/56.

  • ... the first "sled subnet which would actually be used by a sled" would be fd00:1122:3344:0101::/64.

Within that reserved rack subnet of fd00:1122:3344:0000::/56, each DNS server can sit on a distinct /64. This is encapsulated in the ReservedRackSubnet::get_dns_subnets method in address.rs. If we had three DNS servers, they would be using:

  • fd00:1122:3344:0001::/64
  • fd00:1122:3344:0002::/64
  • fd00:1122:3344:0003::/64

These DNS subnets are fully independent of the sleds on which the servers will execute.

As an arbitrary policy, the first address (skipping anycast) in each subnet is used for the DNS server, and the "second" address is assigned to the GZ so requests to the DNS server can be routed through the GZ to the designated zone.

For the aforementioned example subnets, that would look like the following (a short code sketch of this derivation follows the list):

  • Subnet: fd00:1122:3344:0001::/64
    • DNS address: fd00:1122:3344:0001::1
    • GZ address : fd00:1122:3344:0001::2
  • Subnet: fd00:1122:3344:0002::/64
    • DNS address: fd00:1122:3344:0002::1
    • GZ address : fd00:1122:3344:0002::2
  • Subnet: fd00:1122:3344:0003::/64
    • DNS address: fd00:1122:3344:0003::1
    • GZ address : fd00:1122:3344:0003::2
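
A minimal sketch of that derivation using std::net directly (the helper name and DNS server count below are illustrative; the real logic lives in ReservedRackSubnet::get_dns_subnets in common/src/address.rs):

use std::net::Ipv6Addr;

// Illustrative count of internal DNS servers within the AZ.
const DNS_REDUNDANCY: u16 = 3;

/// Hypothetical helper: the i-th (1-indexed) DNS /64 within the reserved
/// rack /56, plus the DNS server address (::1) and the GZ routing
/// address (::2) inside that /64.
fn dns_subnet(reserved_rack: Ipv6Addr, i: u16) -> (Ipv6Addr, Ipv6Addr, Ipv6Addr) {
    // The fourth 16-bit group selects a /64; within the reserved (zeroth)
    // /56 only its low byte may vary, so i must stay below 256.
    assert!(i < 256);
    let mut segments = reserved_rack.segments();
    segments[3] = i;
    let subnet = Ipv6Addr::from(segments);
    segments[7] = 1;
    let dns_addr = Ipv6Addr::from(segments);
    segments[7] = 2;
    let gz_addr = Ipv6Addr::from(segments);
    (subnet, dns_addr, gz_addr)
}

fn main() {
    let reserved_rack: Ipv6Addr = "fd00:1122:3344::".parse().unwrap();
    for i in 1..=DNS_REDUNDANCY {
        let (subnet, dns, gz) = dns_subnet(reserved_rack, i);
        // e.g. "fd00:1122:3344:1::/64 -> DNS fd00:1122:3344:1::1, GZ fd00:1122:3344:1::2"
        println!("{}/64 -> DNS {}, GZ {}", subnet, dns, gz);
    }
}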

Collaborator Author

@smklein smklein May 4, 2022

I wrote this out because in your example, you mentioned that the DNS address might be part of fd00:1122:3344:0202::/64.

I don't think this could happen - if this were a sled subnet, the AZ subnet would be:
fd00:1122:3344::/48
So the "reserved rack subnet" would be the first /56 in this /48, which would be:
fd00:1122:3344:0000::/56

Meaning the reserved rack subnet address range stretches from fd00:1122:3344:: to fd00:1122:3344:00FF:FFFF:FFFF:FFFF:FFFF.

So fd00:1122:3344:0202::1 would be outside this range - that's an address belonging to a "real" rack and a "real" sled, not the reserved rack subnet.
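
That range check is easy to sanity-check with a quick prefix comparison (a small sketch; the helper name is illustrative):

use std::net::Ipv6Addr;

/// True if `addr` falls within the /56 rooted at `base`.
fn within_slash_56(base: Ipv6Addr, addr: Ipv6Addr) -> bool {
    // A /56 covers the first 7 bytes of an IPv6 address; compare those.
    let b = base.octets();
    let a = addr.octets();
    b[..7] == a[..7]
}

fn main() {
    let reserved: Ipv6Addr = "fd00:1122:3344::".parse().unwrap();
    let dns_addr: Ipv6Addr = "fd00:1122:3344:1::1".parse().unwrap();
    let other: Ipv6Addr = "fd00:1122:3344:202::1".parse().unwrap();

    // Byte 6 is 0x00 for the DNS address, so it sits inside the reserved /56...
    assert!(within_slash_56(reserved, dns_addr));
    // ...but byte 6 is 0x02 for fd00:1122:3344:0202::1, which lands in a
    // "real" rack's subnet instead.
    assert!(!within_slash_56(reserved, other));
    println!("containment checks passed");
}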

Collaborator

Yeah, sorry -- I didn't mean to imply that the DNS prefixes aren't the zeroth subnet. That was a (bad) example.

But I think this does clarify the point I was missing: DNS is run in a non-global zone. That means the GZ address here is needed to be able to route any traffic from the GZ into the non-global DNS zone. It seems like this GZ address bit will go away when we get the etherstub-based VNICs for all our services plumbed through.

Thanks, all good now.

Collaborator

@bnaecker bnaecker left a comment

Looks good overall, a few clarifying questions and one nit. Thanks for the work here!

@smklein smklein merged commit 962b66b into main May 4, 2022
@smklein smklein deleted the internal-dns-assigned-ips branch May 4, 2022 18:35
leftwo pushed a commit that referenced this pull request Nov 11, 2025
Crucible changes are:
Print file name for extents (#1811)
Add threads argument to `crucible-downstairs verify` (#1807)
Add `--verbose` option to `crucible-verify-raw` (#1806)
Restore `--gen` argument for binaries (#1805)
Bump to 2024 edition (#1799)
Perform reconciliation if all three downstairs are in live-repair (#1784)
Rename crucible-dtrace -> crucible-utils (#1803)
Add `crucible-verify-raw` and `crucible-raw-extent` packages (#1800)
Added extent-info to dump out region/extent/block specific offsets (#1797)

Propolis changes are:
Rework resource accessors to alleviate lock contention
Implement NVMe Doorbell Buffer feature
Overhaul block attachment and request dispatch
propolis-cli should be able to send TOML-defined CPU profiles (#943)
nvme: CQEs with command-specific error 0 are acceptable (#965)
leftwo added a commit that referenced this pull request Nov 13, 2025
Update Propolis and Crucible
    
Crucible changes are:
Print file name for extents (#1811)
Add threads argument to `crucible-downstairs verify` (#1807)
Add `--verbose` option to `crucible-verify-raw` (#1806)
Restore `--gen` argument for binaries (#1805)
Bump to 2024 edition (#1799)
Perform reconciliation if all three downstairs are in live-repair (#1784)
Rename crucible-dtrace -> crucible-utils (#1803)
Add `crucible-verify-raw` and `crucible-raw-extent` packages (#1800)
Added extent-info to dump out region/extent/block specific offsets (#1797)
    
Propolis changes are:
Rework resource accessors to alleviate lock contention
Implement NVMe Doorbell Buffer feature
Overhaul block attachment and request dispatch
propolis-cli should be able to send TOML-defined CPU profiles (#943)
nvme: CQEs with command-specific error 0 are acceptable (#965)

I also changed a bunch of `gen` -> `generation` as that is now what
Crucible has.

---------

Co-authored-by: Alan Hanson <[email protected]>