Skip to content

Connect: Detect & reissue expired db certs#17950

Merged
ravicious merged 20 commits intomasterfrom
ravicious/expired-db-certs
Dec 6, 2022
Merged

Connect: Detect & reissue expired db certs#17950
ravicious merged 20 commits intomasterfrom
ravicious/expired-db-certs

Conversation

@ravicious
Copy link
Copy Markdown
Member

@ravicious ravicious commented Oct 31, 2022

Closes gravitational/webapps.e#299 & closes gravitational/webapps.e#360.

Needs #18740 to be merged first.

webapps counterpart: gravitational/webapps#1383

Out of 2332 additions, ~1500 are proto updates and 481 are tests, leaving us with about 350 lines of actual implementation code. Best reviewed commit by commit.

Why

If the user attempts to make a connection through a local db proxy but the db & user certs have expired, we want Connect to bring focus to the Connect app, let the user relogin and then proxy the connection.

This is how the end result looks like:

connect-cert-renewal.mov

#18259 added a custom middleware for alpnproxy.LocalProxy which checks the cert used by the gateway when there's an incoming connection. If the cert has expired, the middleware fires the callback it received.

#18259 added that middleware but it wasn't actually used in lib/teleterm. This PR makes use of that middleware to implement the business logic required to reissue the gateway cert.

How

The happy path of the whole flow looks like this:

sequenceDiagram
    participant user as db client
    participant app
    participant tsh as gateway goroutine
    participant cluster
    note over user,cluster: Expired gateway cert, expired cluster cert
    user->>tsh: Make a connection to the db proxy
    activate user
    tsh->>tsh: Notice that the db cert has expired
    tsh->>cluster: Reissue the cert
    cluster->>tsh: Error (expired cluster cert)
    tsh->>app: Ask for relogin
    activate app
    app->>cluster: Log in
    cluster->>app: Success
    app->>tsh: Success
    deactivate app
    tsh->>cluster: Reissue the cert
    cluster->>tsh: Success
    tsh->>user: Proxy the connection
    deactivate user
Loading

The initial call to reissue the certs is done in case the cert used by the gateway has expired but the user cert is valid. This can happen if you have two gateways running. The first one will run through the flow shown above, while the second will go through the one below:

sequenceDiagram
    participant user as db client
    participant app
    participant tsh as gateway goroutine
    participant cluster
    note over user,cluster: Expired gateway cert, valid cluster cert
    user->>tsh: Make a connection to the db proxy
    activate user
    tsh->>tsh: Notice that the db cert has expired
    tsh->>cluster: Reissue the cert
    cluster->>tsh: Success
    tsh->>user: Proxy the connection
    deactivate user
Loading

Error handling

Any action taken by the gateway goroutine (outgoing arrows) can fail. If it fails, the connection to the db is closed. Since the action is performed in a totally separate goroutine, the user would have no way of seeing the exact error, other than looking at the logs.

To make it clearer for the user as to why the connection has failed, the struct performing the actual action forwards any error to the Electron app so that it can be shown to the user in form of a notification.

Mutex on relogin request

Since we can only show one modal at a time but multiple clients can attempt to make simultaneous connections through proxies set up by Connect, the action of asking the Electron app to show the login modal is guarded by a mutex. If there's already a relogin request in flight, any other attempt will fail immediately with an appropriate error message.

@ravicious ravicious force-pushed the ravicious/expired-db-certs branch 2 times, most recently from 9bb2795 to 9338cf1 Compare November 3, 2022 14:58
@ravicious ravicious changed the title ravicious/expired-db-certs Connect: Reissue certs for db proxies Nov 8, 2022
@ravicious ravicious force-pushed the ravicious/expired-db-certs branch 2 times, most recently from 499797b to f9dfbf5 Compare November 8, 2022 16:39
@ravicious ravicious force-pushed the ravicious/expired-db-certs branch 4 times, most recently from d094631 to f8ff02f Compare November 25, 2022 17:14
@ravicious ravicious changed the base branch from master to ravicious/check-cert-subject November 25, 2022 17:15
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we pass a structured message as the reason so that the Electron app can show an appropriately formatted message in the login modal.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with the notification subject: instead of accepting a string, we accept a structured message and let the Electron app handle exact message in the UI.

Comment thread lib/teleterm/daemon/gateway_cert_reissuer_test.go Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the method that implements the logic described in the PR description.

@ravicious ravicious changed the title Connect: Reissue certs for db proxies Connect: detect & reissue expired db certs Nov 25, 2022
@ravicious ravicious changed the title Connect: detect & reissue expired db certs Connect: Detect & reissue expired db certs Nov 25, 2022
@ravicious ravicious marked this pull request as ready for review November 25, 2022 18:14
@github-actions github-actions Bot requested review from AntonAM and r0mant November 25, 2022 18:14
Copy link
Copy Markdown
Contributor

@AntonAM AntonAM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Comment thread lib/teleterm/daemon/daemon.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe then rename it to something like ResolveClusterFromStorage?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe then rename it to something like ResolveClusterFromStorage?

Good idea! I'll create a separate PR for that to limit the number of unrelated changes in this one.

Comment thread lib/teleterm/api/proto/v1/tshd_events_service.proto Outdated
Comment thread lib/teleterm/daemon/daemon.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer_test.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go Outdated
Base automatically changed from ravicious/check-cert-subject to master November 29, 2022 11:20
alpnproxy.LocalProxy.CheckDBCerts already emits a log statement, so this
line was just noise.
This is irrelevant to this PR but I figured I'd do this while I'm editing
this file.
This will let us create expired user certs by providing a negative TTL.
@ravicious ravicious force-pushed the ravicious/expired-db-certs branch from f8ff02f to 3dc875a Compare November 29, 2022 11:28
@ravicious ravicious force-pushed the ravicious/expired-db-certs branch from 2f21e15 to 90cca1e Compare November 29, 2022 11:33
@ravicious ravicious requested a review from AntonAM November 29, 2022 12:17
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go
Copy link
Copy Markdown
Contributor

@GavinFrazar GavinFrazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just have a few nits and suggestions

Comment thread lib/teleterm/daemon/daemon.go Outdated
Comment thread lib/teleterm/daemon/daemon.go
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer_test.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer_test.go Outdated
Comment thread lib/teleterm/daemon/gateway_cert_reissuer_test.go Outdated
Comment thread integration/teleterm_test.go Outdated
"github.com/gravitational/teleport/lib/teleterm/daemon"
)

func TestTeletermGatewayCertRenewal(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test took 17s on my machine, which is bit heavy. Ideally we could speed it up a bit, but at a minimum we should see if we can make it a parallel test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test took 17s on my machine, which is bit heavy. Ideally we could speed it up a bit, but at a minimum we should see if we can make it a parallel test.

I have some ideas on how to make it faster, I have just asked the db access team about feedback on them.

Is it okay if I address that in a separate PR or should wait with merging this? Making the test faster would likely require changes in shared helpers for which I'd want to get some feedback from the db access team.

GCB says the past two runs of this test took ~19s. I think I could cut this time by about 40%

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw 17s as well, but 16.5s of that was just setting up the test clusters. This test doesn't actually need the leaf cluster either, and by commenting out the code that sets up leaf cluster and trust in the setup the time was reduced to ~8 seconds. We should really make that leaf cluster optional.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, we should test Connect gateways with a leaf cluster too. I'll move this test into a separate file in integration/proxy and run it as part of TestALPNSNIProxyDatabaseAccess so that the setup time is amortized.

@github-actions github-actions Bot removed the request for review from r0mant December 5, 2022 16:02
@ravicious ravicious merged commit cb3e4d9 into master Dec 6, 2022
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 6, 2022

@ravicious See the table below for backport results.

Branch Result
branch/v11 Failed

@ravicious ravicious deleted the ravicious/expired-db-certs branch December 6, 2022 11:33
ravicious added a commit that referenced this pull request Dec 6, 2022
* Add TTL field to integration/helpers.UserCredsRequest

This will let us create expired user certs by providing a negative TTL.

* Reissue gateway cert if middleware detects it expired

* Add integration test for gateway cert renewal
ravicious added a commit that referenced this pull request Dec 12, 2022
* Add TTL field to integration/helpers.UserCredsRequest

This will let us create expired user certs by providing a negative TTL.

* Reissue gateway cert if middleware detects it expired

* Add integration test for gateway cert renewal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants