Skip to content

Cache PIV connections to share across the program execution#47091

Merged
gzdunek merged 18 commits intomasterfrom
gzdunek/concurrent-yubikey-access
Oct 16, 2024
Merged

Cache PIV connections to share across the program execution#47091
gzdunek merged 18 commits intomasterfrom
gzdunek/concurrent-yubikey-access

Conversation

@gzdunek
Copy link
Copy Markdown
Contributor

@gzdunek gzdunek commented Oct 2, 2024

Closes #46372
I think this fix should close the linked issue, but unfortunately I couldn't reproduce it.
This also provides a foundation for the hardware keys support in Connect which issues many requests asynchronously.

This PR is in 90% based on @Joerger work on caching PIV connections. Thanks to it, subsequent calls to the YubiKey don't have to constantly open & close a connection (which is time-consuming and blocks the key), but instead use a shared connection directly.
However, the shared connection can't be kept open for the lifetime of the program, because it would block access to the YubiKey for other programs (for example, another instance of tsh or tctl). Therefore, the connection is released after 5 seconds.

In addition to Brian's changes, I added a "warm up" call to the YubiKey, as proposed by Nic. This is mainly needed for Connect so that we can wait indefinitely for the user to enter a PIN/touch. But what's more important, if the user cancels the prompt, we can return early, before we start dialing. Otherwise, the gRPC retry mechanism will try to establish a connection, showing the same prompt to the user over and over again.
I think "warming up" the key is useful for tsh too.

Best to review commit by commit.

changelog: Resolved an issue that caused false positive errors incorrectly indicating that the YubiKey was in use by another application, while only tsh was accessing it

Comment thread api/utils/keys/yubikey.go
// Backoff and retry for up to 1 second.
retryCtx, cancel := context.WithTimeout(context.Background(), time.Second)
// Backoff and retry for a time slightly greater than releaseConnectionDelay.
retryCtx, cancel := context.WithTimeout(context.Background(), releaseConnectionDelay+100*time.Millisecond)
Copy link
Copy Markdown
Contributor Author

@gzdunek gzdunek Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this timeout needs to greater than the timeout for releasing a connection. Otherwise, a caller that doesn't use the shared connection has basically no chance to open the YubiKey.

Unfortunately, this also made the tests take much longer (6-11 seconds for a test). I tried to reuse the same YubiKey object as much as possible in the tests, but keys.GetYubiKeyPrivateKey() opens its own connection, and I'm not able to easily pass YubiKey that I get from keys.FindYubiKey() earlier.

Comment thread api/client/proxy/client.go Outdated
Comment thread api/client/proxy/client.go Outdated
@gzdunek gzdunek requested review from Joerger and nklaassen October 2, 2024 16:19
@gzdunek gzdunek marked this pull request as ready for review October 2, 2024 16:19
@github-actions github-actions Bot requested review from rudream and zmb3 October 2, 2024 16:20
@gzdunek gzdunek removed request for rudream and zmb3 October 2, 2024 16:21
@gzdunek
Copy link
Copy Markdown
Contributor Author

gzdunek commented Oct 7, 2024

Friendly ping @nklaassen, @Joerger

Comment thread api/client/proxy/client.go Outdated
Comment thread api/utils/keys/yubikey.go Outdated
Comment thread api/client/proxy/client.go
Copy link
Copy Markdown
Contributor

@Joerger Joerger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested it with tsh, LGTM, barring some minor comments.

Comment thread api/utils/keys/yubikey.go Outdated
Comment thread api/utils/keys/yubikey.go Outdated
Comment thread api/client/proxy/client.go
Comment thread api/utils/keys/yubikey.go Outdated
@gzdunek gzdunek requested a review from Joerger October 10, 2024 10:44
Comment thread api/utils/keys/yubikey.go
Comment on lines +694 to +718
release := func() {
c.mu.Lock()
defer c.mu.Unlock()
if c.activeConnections > 0 {
c.activeConnections--
}

// If this is the last active connection, close after a delay,
// giving other callers a chance to claim the connection first.
if c.activeConnections == 0 {
go func() {
time.Sleep(releaseConnectionDelay)

c.mu.Lock()
defer c.mu.Unlock()

// Close the shared connection if there are still no new active connections,
// and the connection hasn't been yet closed by another goroutine.
if c.activeConnections == 0 && c.conn != nil {
c.conn.Close()
c.conn = nil
}
}()
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One fix and a couple nits.

fix: We should actually sleep from the start, before decrementing active connections. Otherwise, we could have the following situation: client connects, client disconnects. 4 seconds later, second client connects, and disconnects. 1 second later, the first client closes the connection. Instead, we should wait 5 seconds from the second client disconnect before closing. So we need to run the goroutine+sleep before decrementing the active connections.

nit: Now that we are decrementing and closing under the same lock, we can safely remove the c.conn != nil check, this should never be the case anymore.

nit2: We also don't need the if c.activeConnections > 0 check. Every connect should be accompanied by release, so active connections will always be > 0 in this check, before it's decremented to its minimum value 0.

IMO the nits make it easier to implement bugs in the future, I'd rather the system fail loudly with faulty code (panic) so we fix it promptly.

Suggested change
release := func() {
c.mu.Lock()
defer c.mu.Unlock()
if c.activeConnections > 0 {
c.activeConnections--
}
// If this is the last active connection, close after a delay,
// giving other callers a chance to claim the connection first.
if c.activeConnections == 0 {
go func() {
time.Sleep(releaseConnectionDelay)
c.mu.Lock()
defer c.mu.Unlock()
// Close the shared connection if there are still no new active connections,
// and the connection hasn't been yet closed by another goroutine.
if c.activeConnections == 0 && c.conn != nil {
c.conn.Close()
c.conn = nil
}
}()
}
}
release := func() {
go func() {
time.Sleep(releaseConnectionDelay)
c.mu.Lock()
defer c.mu.Unlock()
c.activeConnections--
if c.activeConnections == 0 {
c.conn.Close()
c.conn = nil
}
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, we could have the following situation: client connects, client disconnects. 4 seconds later, second client connects, and disconnects. 1 second later, the first client closes the connection. Instead, we should wait 5 seconds from the second client disconnect before closing.

Yeah, I thought about it, but for some reason it seemed to me that the correct behavior is to close the connection only after waiting for the first client to release 🤦. But I agree, it makes much more sense to extend the duration with each connected client.
Fix applied.

Copy link
Copy Markdown
Contributor

@nklaassen nklaassen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm assuming Brian's suggesting here gets accepted/resolved https://github.com/gravitational/teleport/pull/47091/files#r1797232730

@gzdunek gzdunek requested a review from Joerger October 14, 2024 13:17
@gzdunek gzdunek enabled auto-merge October 16, 2024 06:37
@gzdunek gzdunek added this pull request to the merge queue Oct 16, 2024
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Oct 16, 2024
@gzdunek gzdunek added this pull request to the merge queue Oct 16, 2024
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Oct 16, 2024
@gzdunek gzdunek added this pull request to the merge queue Oct 16, 2024
Merged via the queue into master with commit bd6fdbf Oct 16, 2024
@gzdunek gzdunek deleted the gzdunek/concurrent-yubikey-access branch October 16, 2024 08:20
@public-teleport-github-review-bot
Copy link
Copy Markdown

@gzdunek See the table below for backport results.

Branch Result
branch/v14 Failed
branch/v15 Failed
branch/v16 Failed

gzdunek added a commit that referenced this pull request Oct 16, 2024
* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>

(cherry picked from commit bd6fdbf)
mvbrock pushed a commit that referenced this pull request Oct 16, 2024
* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>
gzdunek added a commit that referenced this pull request Oct 25, 2024
* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>

(cherry picked from commit bd6fdbf)
gzdunek added a commit that referenced this pull request Oct 25, 2024
* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>

(cherry picked from commit bd6fdbf)
gzdunek added a commit that referenced this pull request Oct 25, 2024
* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>

(cherry picked from commit bd6fdbf)
@Joerger Joerger mentioned this pull request Nov 5, 2024
github-merge-queue Bot pushed a commit that referenced this pull request Nov 21, 2024
…7953)

* Cache PIV connections to share across the program execution (#47091)

* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>

(cherry picked from commit bd6fdbf)

* Sign a hashed message in hardware key warmup call (#48206)

Otherwise, signing may fail with "input must be a hashed message" error.

(cherry picked from commit 47494db)

* Remove delayed closing of yubikey connection to prevent the connection from leaking after program execution. (#48414)

(cherry picked from commit b7c0e79)

---------

Co-authored-by: Brian Joerger <bjoerger@goteleport.com>
github-merge-queue Bot pushed a commit that referenced this pull request Nov 21, 2024
…7952)

* Cache PIV connections to share across the program execution (#47091)

* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>

(cherry picked from commit bd6fdbf)

* Sign a hashed message in hardware key warmup call (#48206)

Otherwise, signing may fail with "input must be a hashed message" error.

(cherry picked from commit 47494db)

* Remove delayed closing of yubikey connection to prevent the connection from leaking after program execution. (#48414)

(cherry picked from commit b7c0e79)

---------

Co-authored-by: Brian Joerger <bjoerger@goteleport.com>
github-merge-queue Bot pushed a commit that referenced this pull request Nov 21, 2024
…7954)

* Cache PIV connections to share across the program execution (#47091)

* Cache yubikey objects.

* Cache PIV connections to share across the program execution.

* Do not release the connection until `sign` returns

* Do not ignore errors

* Perform a "warm up" call to YubiKey

* Fix tests

* Use a specific interface to check if the key can be "warmed up"

* Allow abandoning `signer.Sign` call when context is canceled

* Make sure that the cached key is valid for the given private key policy

The reason for adding this check was failing `invalid key policies` test.

* Make `hardwareKeyWarmer` private

* Force callers to release connection

* Improve comments

* Fix lint

* Improve `connect` comment

* Fix race condition

* Simplify `release` logic

* Trigger license/cla

---------

Co-authored-by: joerger <bjoerger@goteleport.com>

(cherry picked from commit bd6fdbf)

* Sign a hashed message in hardware key warmup call (#48206)

Otherwise, signing may fail with "input must be a hashed message" error.

(cherry picked from commit 47494db)

* Remove delayed closing of yubikey connection to prevent the connection from leaking after program execution. (#48414)

(cherry picked from commit b7c0e79)

---------

Co-authored-by: Brian Joerger <bjoerger@goteleport.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Yubikey race condition using tsh

3 participants