[nexus] Add a new user for service balancing #1234
Changes from 51 commits
@@ -24,6 +24,13 @@ lazy_static! {
             *FLEET_ID,
             role_builtin::FLEET_ADMIN.role_name,
         ),
+        RoleAssignment::new(
+            IdentityType::UserBuiltin,
+            user_builtin::USER_SERVICE_BALANCER.id,
+            role_builtin::FLEET_ADMIN.resource_type,
+            *FLEET_ID,
+            role_builtin::FLEET_ADMIN.role_name,
+        ),
         // The "internal-read" user gets the "viewer" role on the sole
         // Fleet. This will grant them the ability to read various control

Collaborator

I wonder if we can limit privileges more than this...but I imagine it's not worth much of our time right now to pick this apart. Up to you.

Author

I think I'm gonna defer this while we're still sorting out the fundamental operations Nexus needs to take. Right now, everything seems to be lumped into the
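For context, here is a rough, self-contained sketch of what a built-in user entry like `user_builtin::USER_SERVICE_BALANCER` might look like. The struct, field names, and placeholder ID below are assumptions made for illustration; the real definition lives in Nexus's fixed-data module and may differ.

```rust
// Illustrative sketch only: these types and values are invented here and
// are not Nexus's actual fixed_data definitions.
use lazy_static::lazy_static;
use uuid::Uuid;

pub struct UserBuiltin {
    pub id: Uuid,
    pub name: &'static str,
    pub description: &'static str,
}

lazy_static! {
    // A built-in user needs a well-known, fixed ID so that every deployment
    // (and every Nexus instance) agrees on it; Uuid::nil() is a placeholder.
    pub static ref USER_SERVICE_BALANCER: UserBuiltin = UserBuiltin {
        id: Uuid::nil(),
        name: "service-balancer",
        description: "built-in user for automated service balancing",
    };
}
```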
Collaborator

Is this really supposed to be a rack, or is this the instance of the control plane (which we currently take to be the Fleet, although that's not quite right either)? (I think it'd be a bad idea to depend on `rack_id` == a unique identifier for the control plane, and I've been trying to avoid that.)
Author
It was intended to identify the rack on which this Nexus is running. RSS initializes this value and transfers it to `nexus`, as of #1217.

Note, the major change of that patch is the source of that rack ID, rather than the existence of a rack ID at all. Previously the rack ID was randomly generated by Nexus in omicron/nexus/src/lib.rs (line 170 in 9a4e7bf).

So, that change moves the source of the rack ID from "random every time Nexus boots" to "set once during RSS initialization, then stored in the DB".
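To make the before/after concrete, here is a minimal sketch of the two sources of the rack ID described above; the function names are made up for illustration.

```rust
use uuid::Uuid;

// Before #1217: Nexus generated a fresh rack ID on every boot.
fn rack_id_before() -> Uuid {
    Uuid::new_v4()
}

// After #1217: RSS chooses the rack ID once during rack initialization and
// hands it to Nexus (which persists it in the database), so Nexus simply
// uses the value it was given.
fn rack_id_after(rack_id_from_rss: Uuid) -> Uuid {
    rack_id_from_rss
}
```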
Collaborator
Got it. If I'm reading the code in "main" correctly right now, it seems like maybe this can go away entirely? We store `rack_id` in the `Nexus` struct, but only so that we can implement `racks_list` and `rack_get`. If we're moving the rack record to the database, then I imagine we don't need these any more?

Author
We still need a way to instruct Nexus "which rack should we be querying from the DB" to check the RSS handoff described in RFD 278. This is currently done by passing this `rack_id` value, and checking it on boot.

Additionally, when performing background tasks in a subsequent PR (#1241), Nexus attempts to manipulate state which is local to the current rack, such as asking the question, "do I have enough CRDB instances on my rack?"
Some state is still larger than rack scope. For example, internal DNS servers are allocated within a subnet that is shared AZ-wide, so they are allocated without referencing the rack ID.
However, in general, I had the expectation that "each rack would be running at least one Nexus," so it could be in charge of managing rack-wide state. Is this wrong? Should a single Nexus be trying to ensure that all services are running across all racks?
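As a sketch of the kind of per-rack question such a background task might ask (the types here are invented for illustration, not the actual Nexus API):

```rust
use uuid::Uuid;

// Invented types for illustration only.
#[derive(PartialEq)]
enum ServiceKind {
    Cockroach,
    Nexus,
}

struct RunningService {
    kind: ServiceKind,
    rack_id: Uuid,
}

// "Do I have enough CRDB instances on my rack?": count the CockroachDB
// services recorded for this rack and report how many are still missing.
fn cockroach_deficit(
    services: &[RunningService],
    my_rack_id: Uuid,
    desired: usize,
) -> usize {
    let running = services
        .iter()
        .filter(|s| s.rack_id == my_rack_id && s.kind == ServiceKind::Cockroach)
        .count();
    desired.saturating_sub(running)
}
```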
Collaborator
Are you talking about this part of RFD 278:
I think the scope of this is "the control plane instance", not "the rack". If we had two racks, whether they were set up together or not, I think we're only going to go through the RFD 57 process (and Nexus handoff) once. So I think this state should probably be global in CockroachDB, not scoped to a rack.
I'd assumed it was the latter. The original goal was that Nexus instances are fungible at least within an AZ. That's also why RFD 248 considers "nexus" a single DNS service, rather than having each group-of-Nexuses-within-a-rack have its own DNS name.
I gather you're assuming (1) there's at least one Nexus in each rack and (2) Nexus instances are only responsible for managing their own rack. I think that's got several downsides. We can have multiple Nexus instances within a rack, so we still need to build some coordination to avoid them stomping over each other. Once we've built that, it's extra work (that we otherwise wouldn't need to do) to constrain their responsibilities to only cover their own rack. It's also extra work to enforce the deployment constraints and verify them with runtime monitoring. I think it'd be a lot simpler to say that the Nexus instances need to coordinate on changes (which they need to do anyway, as I mentioned) and then all we need to ensure is that there are enough Nexus instances in enough different racks to maintain our availability goals. That leaves the deployment system with much more flexibility about where to put them.
I think eliminating constraint (2) also has a big impact on efficiency. With that constraint, we basically need two Nexus instances in each rack to survive a single failure and we need three instances to survive a double failure. So a 10-rack deployment would need 30 Nexus instances to survive 2 failures. Without this constraint (i.e., letting Nexus instances operate on all racks, with coordination) you'd only need 5 instances to survive any two failures.
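The arithmetic above, written out (the fleet-wide figure of 5 matches a 2f+1 quorum-style assumption; that reading is mine, not stated explicitly in the comment):

```rust
// Per-rack model: every rack needs (failures + 1) Nexus instances.
fn per_rack_total(racks: u32, failures_to_survive: u32) -> u32 {
    racks * (failures_to_survive + 1)
}

// Fleet-wide model: one coordinated pool of instances; 2f + 1 reproduces
// the "5 instances to survive any two failures" figure above.
fn fleet_wide_total(failures_to_survive: u32) -> u32 {
    2 * failures_to_survive + 1
}

fn main() {
    assert_eq!(per_rack_total(10, 2), 30); // 10 racks x 3 instances each
    assert_eq!(fleet_wide_total(2), 5);
}
```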
Collaborator
This is called a `ClusterId` in RFD 238.
Author
I don't think I understand how you want me to eliminate this constraint in the current implementation.
To fully satisfy the one-to-many relationship of Nexuses to racks, we'd need to:
To be clear, I think that is the right long-term direction, but I'm also concerned it's a lot of work for something that isn't relevant for v1.
My prototype implementation of service management within Nexus can specify `ServiceRedundancy` as an enum, and it explicitly specifies `Per-Rack`, because that's all we have right now. When we approach the multi-rack case, we could change this redundancy value to "per-fleet" or "per-az", but I hesitate to include that now, since we won't really be able to validate it without building out more genuine end-to-end multi-rack support.

So, backing up: what's the action item here? Should I be deferring the RSS handoff patches, and instead favor deleting all locally-stored notions of "Rack ID"?
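A rough sketch of the redundancy-policy shape being described; the variant and field names are guesses rather than the prototype's actual definitions.

```rust
// Sketch only: how a per-rack redundancy policy could be expressed today,
// with room for broader scopes once multi-rack support is real.
pub enum ServiceRedundancy {
    /// Run a fixed number of instances of the service on every rack.
    PerRack(u32),
    // Possible future variants, deliberately left out for now:
    // PerAz(u32),
    // PerFleet(u32),
}

// Example policy (numbers are illustrative): two instances per rack.
pub const EXAMPLE_POLICY: ServiceRedundancy = ServiceRedundancy::PerRack(2);
```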
Collaborator
It seems like I'm missing something important about RSS's involvement in a multi-rack world, so let's table this for now.
I remain a little worried about this:
I think we're absolutely going to need to support multiple Nexuses in the v1 product, even with only one rack. Otherwise, the control plane can't survive a single failure. I don't really understand the pieces you said have to happen. Maybe we can discuss at some point.
Author
Ah, to be explicit, in the current implementation:
When RSS initializes a new rack, it executes operations like "create a new CRDB instance, and format all storage - resulting in an empty cluster".
For a single-rack use-case, this works fine - if we're initializing the rack, there is no other fleet-wide metadata storage to consider.
For a multi-rack scenario, it isn't always appropriate to initialize CRDB "from scratch" here. If other racks exist, we wouldn't want to create partitioned clusters of CRDB. As you referenced in RFD 61:
I assume we'd want some way of specifying: "here is the region, here is the existing fleet, if CRDB nodes already exist, join them instead of starting a totally new cluster".
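A hedged sketch of the kind of knob being suggested; the names are illustrative and this is not part of the current RSS implementation.

```rust
use std::net::SocketAddr;

// Sketch only: how RSS could be told whether to bootstrap CockroachDB from
// scratch or join an existing fleet-wide cluster.
pub enum CockroachSetup {
    /// First rack in the fleet: format storage and start a brand-new cluster.
    BootstrapNewCluster,
    /// Additional rack joining an existing fleet: point the new nodes at
    /// already-running CRDB members instead of creating a second,
    /// partitioned cluster.
    JoinExisting { seed_nodes: Vec<SocketAddr> },
}
```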
Author
Filed #1276