Skip to content

Properly apply client_idle_timeout to database access sessions#32485

Merged
Tener merged 3 commits intomasterfrom
tener/db-monitor-conn-fix
Sep 27, 2023
Merged

Properly apply client_idle_timeout to database access sessions#32485
Tener merged 3 commits intomasterfrom
tener/db-monitor-conn-fix

Conversation

@Tener
Copy link
Copy Markdown
Contributor

@Tener Tener commented Sep 25, 2023

In database service, clientConn returned from MonitorConn was never used, causing unjustified idle timeouts.

Fixes #32073.

@github-actions github-actions Bot added database-access Database access related issues and PRs size/sm labels Sep 25, 2023
@Tener Tener changed the title Properly apply client_idle_timeout to database access sessions. Properly apply client_idle_timeout to database access sessions Sep 25, 2023
@Tener Tener requested review from greedy52, r0mant and smallinsky and removed request for ryanclark and xacrimon September 25, 2023 14:53
Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add test coverage?

Copy link
Copy Markdown
Contributor

@greedy52 greedy52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a result from this change but when reviewing this function, should _, diagErr := s.cfg.AuthClient.AppendDiagnosticTrace(ctx, use ctx? I am worried ctx can be canceled when uploading diagnostic.

@Tener
Copy link
Copy Markdown
Contributor Author

Tener commented Sep 25, 2023

Can we add test coverage?

Sure, will do.

@Tener
Copy link
Copy Markdown
Contributor Author

Tener commented Sep 25, 2023

@greedy52

not a result from this change but when reviewing this function, should _, diagErr := s.cfg.AuthClient.AppendDiagnosticTrace(ctx, use ctx? I am worried ctx can be canceled when uploading diagnostic.

Could be a valid worry, although I'm not 100% certain. I think cancelCtx or s.closeContext might be safer here?

@greedy52
Copy link
Copy Markdown
Contributor

greedy52 commented Sep 25, 2023

Could be a valid worry, although I'm not 100% certain. I think cancelCtx or s.closeContext might be safer here?

I believe ctx can be cancelled for example when the user is locked. cancelCtx should be good.

@Tener Tener requested a review from r0mant September 26, 2023 13:59
@Tener
Copy link
Copy Markdown
Contributor Author

Tener commented Sep 26, 2023

Can we add test coverage?

@r0mant Added, PTAL.

@Tener
Copy link
Copy Markdown
Contributor Author

Tener commented Sep 26, 2023

Hmm...

WARNING: DATA RACE
Read at 0x00c005f72880 by goroutine 7359:
  github.com/jackc/pgconn.(*PgConn).IsClosed()
      /go/pkg/mod/github.com/jackc/pgconn@v1.14.1/pgconn.go:695 +0x49d
  github.com/gravitational/teleport/lib/srv/db/postgres.(*Engine).receiveFromServer.func1()
      /__w/teleport/teleport/lib/srv/db/postgres/engine.go:436 +0x4aa

Previous write at 0x00c005f72880 by goroutine 7325:
  github.com/jackc/pgconn.(*PgConn).Close()
      /go/pkg/mod/github.com/jackc/pgconn@v1.14.1/pgconn.go:627 +0x92
  github.com/gravitational/teleport/lib/srv/db/postgres.(*Engine).HandleConnection.func2()
      /__w/teleport/teleport/lib/srv/db/postgres/engine.go:167 +0x64
  runtime.deferreturn()
      /opt/go/src/runtime/panic.go:477 +0x30
  github.com/gravitational/teleport/lib/srv/db/common.(*reportingEngine).HandleConnection()
      /__w/teleport/teleport/lib/srv/db/common/reporting.go:87 +0x21a
  github.com/gravitational/teleport/lib/srv/db.(*Server).handleConnection()
      /__w/teleport/teleport/lib/srv/db/server.go:1021 +0x764
  github.com/gravitational/teleport/lib/srv/db.(*Server).HandleConnection()
      /__w/teleport/teleport/lib/srv/db/server.go:945 +0x95a
  github.com/gravitational/teleport/lib/srv/db.(*testContext).startHandlingConnections.func1()
      /__w/teleport/teleport/lib/srv/db/access_test.go:1488 +0x4f

Goroutine 7359 (running) created at:
  github.com/gravitational/teleport/lib/srv/db/postgres.(*Engine).receiveFromServer()
      /__w/teleport/teleport/lib/srv/db/postgres/engine.go:[421](https://github.com/gravitational/teleport/actions/runs/6313772578/job/17142603054?pr=32485#step:10:425) +0x4cf
  github.com/gravitational/teleport/lib/srv/db/postgres.(*Engine).HandleConnection.func6()
      /__w/teleport/teleport/lib/srv/db/postgres/engine.go:180 +0x5d

Goroutine 7325 (running) created at:
  github.com/gravitational/teleport/lib/srv/db.(*testContext).startHandlingConnections()
      /__w/teleport/teleport/lib/srv/db/access_test.go:1488 +0x84
  github.com/gravitational/teleport/lib/srv/db.TestDatabaseServerAutoDisconnect.func4()
      /__w/teleport/teleport/lib/srv/db/server_test.go:201 +0x33

@Tener
Copy link
Copy Markdown
Contributor Author

Tener commented Sep 26, 2023

The race above is preexisting and genuine, if benign. I'll raise a separate PR with the fix. This one will be blocked before that one lands.

Comment thread lib/srv/db/server_test.go
@Tener
Copy link
Copy Markdown
Contributor Author

Tener commented Sep 27, 2023

After changes, the test above no longer triggers the data race. Therefore, I think it is OK to merge it as is. The data race fix is pending in another PR.

@Tener Tener added this pull request to the merge queue Sep 27, 2023
@Tener Tener removed this pull request from the merge queue due to a manual request Sep 27, 2023
@Tener Tener enabled auto-merge September 27, 2023 14:18
@Tener Tener added this pull request to the merge queue Sep 27, 2023
Merged via the queue into master with commit f47a1b4 Sep 27, 2023
@Tener Tener deleted the tener/db-monitor-conn-fix branch September 27, 2023 14:37
@public-teleport-github-review-bot
Copy link
Copy Markdown

@Tener See the table below for backport results.

Branch Result
branch/v12 Failed
branch/v13 Failed
branch/v14 Create PR

Tener added a commit that referenced this pull request Sep 28, 2023
)

* Add test for auto disconnect (disconnect happens, query updates the timer).

* Fix: In database service, `clientConn` returned from `MonitorConn`
was never used, causing unjustified idle timeouts.
Tener added a commit that referenced this pull request Sep 28, 2023
)

* Add test for auto disconnect (disconnect happens, query updates the timer).

* Fix: In database service, `clientConn` returned from `MonitorConn`
was never used, causing unjustified idle timeouts.
github-merge-queue Bot pushed a commit that referenced this pull request Sep 28, 2023
) (#32726)

* Add test for auto disconnect (disconnect happens, query updates the timer).

* Fix: In database service, `clientConn` returned from `MonitorConn`
was never used, causing unjustified idle timeouts.
github-merge-queue Bot pushed a commit that referenced this pull request Sep 28, 2023
) (#32725)

* Add test for auto disconnect (disconnect happens, query updates the timer).

* Fix: In database service, `clientConn` returned from `MonitorConn`
was never used, causing unjustified idle timeouts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

database-access Database access related issues and PRs size/sm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

client_idle_timeout is not applied correctly to database access sessions

5 participants