-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rotate-root endpoint #70
Conversation
path_rotate_root.go
Outdated
return &framework.Path{ | ||
Pattern: "rotate-root", | ||
Fields: map[string]*framework.FieldSchema{ | ||
"expiration": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to move this to the config instead, so you don't need to type out the expiration every rotate-root? Added bonus of acting like every other rotate-root by not taking any inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could go either way on this. I thought it made sense here since it only applies to rotating the root credentials, and nothing to do with any other part of the config. The other rotate-roots don't take any input because they don't have expiration dates like Azure does, so I'm not sure that's a good comparison. For what it's worth, the expiration here is an optional field and will default to ~6 months so you don't have to explicitly specify it unless you want a different value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My vote is in config
with other root related values, so an operator can configure once and not worry about it anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With @jasonodonnell's comment about auto-rotation, moving the expiration to the mount configuration makes the most sense to me because it applies to all rotations and not just the current rotation triggered by the endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really hesitant to make the config a kitchen sink of optional values. expiration
only makes sense in the context of changing credentials (root or otherwise). Also expiration
is not a global value that applies to all static credentials and root user. Each user could have a different expiration set. The root user might rotate significantly less frequently than a static user.
Also (related but different question): the expiration should also not be set to the rotation cadence. It should be larger than the cadence because some part of the rotation process could fail and a retry is needed. Plus for the root user we still need to use those credentials to perform the rotation so they shouldn't expire too soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already provide the root's client ID and client secret in the config endpoint, having this in the config endpoint as well seems reasonable. I don't think that this value will change once it's set in there, and I'm not sure if operators would set a different expiration value from one rotation to the other. A side benefit of having it in the config is that it can be read back when one is GET
'ing the config.
path_rotate_root.go
Outdated
} | ||
|
||
resp := &logical.Response{ | ||
Data: resultData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anywhere else where we return results in a rotate root. Looks like this on my end:
[~] vault write -f azure/rotate-root
Key Value
--- -----
display_name vault-ae443608-693b-6804-b463-f215f09396bc
end_date 2022-04-22T20:28:19.459986Z
secret_id 0ba38415-a58d-4983-b7f1-4cb6da68080f
None of this seems particular useful, so I'm wondering if we should just remove the data from the payload. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally added it because of the end_date
that is included with Azure (but not other systems) and figured it should be available to the caller. I think that can be useful if users want to set up alerts or set up any sort of automated system around it. The other fields I can be talked down from but they seemed innocuous to include since we have them available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the upcoming auto-rotation, which will likely be the recommended setup to avoid out of band password expirations, does end_date
still make sense? I feel like that's an internal detail that Vault only cares about and will handle itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to see this coming!
If a separate request happens shortly after rotate-root, Vault may receive an error from Azure indicating the credentials do not work. In my testing, the new password started working after a handful of seconds.
I've seen this take anywhere from several seconds to several minutes. I don't have a good sense of what scenarios would cause this to take several minutes but I do know that the load balancing mechanism used here sometimes causes inconsistent results (i.e. sometimes the credentials are valid, sometimes they're not).
My recommendation here would be to continue to use the existing credentials (if valid) for X. Where X could be either a duration or some heuristic to try and determine when the new credentials are valid and consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments but overall this tested well and looks good to me!
Thanks @mdgreenfield !
I'm brainstorming how we could do this effectively. I'm thinking we could use the WAL to save the config & remove old credentials instead of just removing the old credentials. This would result in a new credential being created immediately with the internal client would continue to use the old credential. The WAL would trigger after ~10 minutes which would then save the new configuration and remove the old credentials. This would result in the operations to continue to use the old credential for ~10 minutes. |
path_rotate_root.go
Outdated
return &framework.Path{ | ||
Pattern: "rotate-root", | ||
Fields: map[string]*framework.FieldSchema{ | ||
"expiration": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already provide the root's client ID and client secret in the config endpoint, having this in the config endpoint as well seems reasonable. I don't think that this value will change once it's set in there, and I'm not sure if operators would set a different expiration value from one rotation to the other. A side benefit of having it in the config is that it can be read back when one is GET
'ing the config.
path_config.go
Outdated
@@ -64,12 +68,27 @@ func pathConfig(b *azureSecretBackend) *framework.Path { | |||
Type: framework.TypeBool, | |||
Description: "Enable usage of the Microsoft Graph API over the deprecated Azure AD Graph API.", | |||
}, | |||
"default_expiration": &framework.FieldSchema{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing "default" here, I expected that the rotate root could optionally accept an expiration (and we'd use the default if not provided). But there isn't any expiration parameter during rotation. Should there be? If not, then perhaps this shouldn't be called "default".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to root_password_expiration
.
Removed bare returns Removed unused return values Small readability improvements
path_rotate_root.go
Outdated
err = framework.DeleteWAL(ctx, req.Storage, walID) | ||
return addAADWarning(&logical.Response{}, config), err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we should return the error from DeleteWAL
. There are occasions where we simply log rather than return the error. Maybe @austingebauer can chime in on this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya I've typically seen the error logged and not returned in this case (example path_rotate_credentials.go#L140-L144).
Probably worth considering what will happen with the WAL entry when there's a failure to delete it. Looking at the rollback code, I think it would unset the NewClientSecret*
which could potentially be for an in-progress rotation that hasn't been completed by the PeriodicFunc yet. Perhaps that's okay since the PeriodicFunc is called before the WALRollback (backend.go#L530-L542).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logged! That's a good point, @austingebauer, but I think you're right about PeriodicFunc will take care of it.
return nil, fmt.Errorf("failed to save new configuration: %w", err) | ||
} | ||
|
||
b.updatePassword = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't totally wrapped by head around this bool, but something about it is giving me a bit of pause. I wonder if having this bool state will play well with concurrent modification (rotate-root handler, periodFunc+walRollbackFunc). I also wonder if a plugin/vault restart could reset this in a way that would produce unexpected results. Not blocking, but just wanted to share my thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary use case of this bool is to give us a short circuit in the periodic function to avoid needing to read the config from storage when no password updates are needed. I do agree this is a little leaky, but only the periodic function will flip it to false and if set to true, the periodic function will double check the config from storage before proceeding.
Co-authored-by: Calvin Leung Huang <[email protected]>
This reverts commit ac58246.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one additional comment, but looks good otherwise!
* Remove bare returns * Readability cleanup Removed bare returns Removed unused return values Small readability improvements * Remove errwrap * Make tests happy again * Add rotate-root endpoint * Use correct response value * Fix merge failure * Add additional AAD warnings; Respond to code review * Fix test * Don't pass config as a pointer so it gets a copy * Fix expiration date logic; fix inverted warning logic * Minor code review tweaks * Move expiration to config * Don't error if there isn't an error * Update the config & remove old passwords in the WAL * Return default_expiration on config get * Return expiration from GET config * Update path_rotate_root.go Co-authored-by: Jim Kalafut <[email protected]> * Update per review * Rebase * Fix test * Revert "Rebase" This reverts commit a693813. * Remove named returns * Update per review * Update path_config.go Co-authored-by: Calvin Leung Huang <[email protected]> * Update per review * Use periodicFunc, change wal * Fix config test * Add expiration date, update logger * Fix timer bug * Change root expiration to timestamp * Fix named returns * Update backend.go Co-authored-by: Calvin Leung Huang <[email protected]> * Update per feedback, add more tests * Add wal min age * Update mock * Update go version * Revert "Update go version" This reverts commit ac58246. * Remove unused wal code Co-authored-by: Jason O'Donnell <[email protected]> Co-authored-by: Jim Kalafut <[email protected]> Co-authored-by: Calvin Leung Huang <[email protected]>
* Fix panic caused by shallowed error in create application (#71) * Remove named returns (#72) * Remove named returns * Update per review * Update per review * Update deps (#73) - ci: build with go-1.17.2 - ci: drop support for go-1.1{4,5}.x * Add rotate-root endpoint (#70) * Remove bare returns * Readability cleanup Removed bare returns Removed unused return values Small readability improvements * Remove errwrap * Make tests happy again * Add rotate-root endpoint * Use correct response value * Fix merge failure * Add additional AAD warnings; Respond to code review * Fix test * Don't pass config as a pointer so it gets a copy * Fix expiration date logic; fix inverted warning logic * Minor code review tweaks * Move expiration to config * Don't error if there isn't an error * Update the config & remove old passwords in the WAL * Return default_expiration on config get * Return expiration from GET config * Update path_rotate_root.go Co-authored-by: Jim Kalafut <[email protected]> * Update per review * Rebase * Fix test * Revert "Rebase" This reverts commit a693813. * Remove named returns * Update per review * Update path_config.go Co-authored-by: Calvin Leung Huang <[email protected]> * Update per review * Use periodicFunc, change wal * Fix config test * Add expiration date, update logger * Fix timer bug * Change root expiration to timestamp * Fix named returns * Update backend.go Co-authored-by: Calvin Leung Huang <[email protected]> * Update per feedback, add more tests * Add wal min age * Update mock * Update go version * Revert "Update go version" This reverts commit ac58246. * Remove unused wal code Co-authored-by: Jason O'Donnell <[email protected]> Co-authored-by: Jim Kalafut <[email protected]> Co-authored-by: Calvin Leung Huang <[email protected]> Co-authored-by: Ben Ash <[email protected]> Co-authored-by: Michael Golowka" OR 1=1); DROP TABLE users; -- <[email protected]> Co-authored-by: Jim Kalafut <[email protected]> Co-authored-by: Calvin Leung Huang <[email protected]>
Overview
Adds a new endpoint (
rotate-root
) that rotates the credentials for the root user Vault uses to talk to Azure. This will delete any existing credentials as the root user should be exclusively owned by Vault after calling this endpoint.Design of Change
The rotation logic is a little different from our other engines in that Vault doesn't know the credential before sending the request to Azure. This is a part of the Microsoft Graph API. This ends up changing the WAL code from performing a rollback to performing a cleanup. If the password creation succeeds in Azure, but the deletion of the existing passwords fails for some reason, a WAL is used to retry deleting those passwords. If any additional passwords are created after the rotate-root endpoint is hit, those passwords are not deleted. Only the passwords the existed at the time of the endpoint being hit will be deleted.
Behavioral notes:
Related Issues/Pull Requests
Contributor Checklist
TBD
Manual test