-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix building unified delta WAL, unified delta CRLs #20058
Conversation
9515ce8
to
6c36430
Compare
return fmt.Errorf("unable to create cross-cluster unified last delta CRL WAL entry: %w", err) | ||
} | ||
if err = sc.Storage.Put(sc.Context, lastWALEntry); err != nil { | ||
return fmt.Errorf("error saving cross-cluster unified last delta CRL WAL entry: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this Put operation could roll us back to an older wal entry if newer ones are coming in while we are recovering from a previous disconnect. Not sure if re-reading the local values before putting to see if they are the same would be a good enough test to avoid the race?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevendpclark From our call, I've pushed the change identified about running periodic
when the listing was different. In this case, this Put
call on the old entry could at worse cause us to run again (if we encounter an error) and force more than one delta CRL rebuild (of otherwise identical delta CRLs). We have this scenario in a few other places, so I think its fine to leave this as-is, without the re-read.
In the event of issues with replication, most users will likely want to issue a full CRL rebuild on all clusters (including unified and local CRLs).
When building the unified delta CRL, WAL entries from the non-primary cluster were ignored. This resulted in an incomplete delta CRL, preventing some entries from appearing. Signed-off-by: Alexander Scheel <[email protected]>
When deciding if the Unified Delta CRL should be rebuilt, we need to check the status of all clusters and their last revoked serial numbers. If any new serial has been revoked on any cluster, we should rebuild the unified delta CRLs. Signed-off-by: Alexander Scheel <[email protected]>
When building the unified CRL, we need to read the last seen serial number from all clusters, not just the present cluster, and write it to the last built serial for that cluster's unified delta WAL entry. This prevents us from continuously rebuilding unified CRLs now that we have fixed our rebuild heuristic. Signed-off-by: Alexander Scheel <[email protected]>
getLastWALSerial ignored its path argument, preventing it from reading the specified cluster-specific WAL entry. On the primary cluster, this was mostly equivalent, but now that we're correctly reading WAL entries and revocations for other clusters, we need to handle reading these entries correctly. Signed-off-by: Alexander Scheel <[email protected]>
6c36430
to
74b40f9
Compare
Any local delta WAL should be persisted to unified delta WAL space as well. If such unified persistence fails, we need to ensure that they get eventually moved up, otherwise they'll remain missing until the next full CRL rebuild occurs, which might be significantly longer than when the next delta CRL rebuild would otherwise occur. runUnifiedTransfer already handles this for us, but it lacked logic for delta WAL serials. The only interesting catch here is that we refuse to copy any entries whose full unified revocation entry has not also been written. Signed-off-by: Alexander Scheel <[email protected]>
This message is mostly an error and would always be helpful information to have when troubleshooting failures. Signed-off-by: Alexander Scheel <[email protected]>
When revoking certificates, we log cross-cluster revocation failures, but we should really expose this information to the caller, that their local revocation was successful, but their cross-cluster revocation failed. Signed-off-by: Alexander Scheel <[email protected]>
Delta WAL entries are empty files whose only information (a revoked serial number) is contained in the file path. These depend implicitly on a full revocation entry existing for this file (whether a cross-cluster unified entry or a local entry). We should not write unified delta WAL entries without the corresponding full unified revocation entry existing. Add a warning in this case. Signed-off-by: Alexander Scheel <[email protected]>
Signed-off-by: Alexander Scheel <[email protected]>
74b40f9
to
09d4c70
Compare
Built on top of #20057; will be rebased once that merges.
Perhaps best to review this commit by commit as it should make most sense that way. If short summaries are nice:
But due to an obvious failure in the PBPWF writer, we weren't going to see any of these bugs.