
"error: database is closed" when running with SQLite backend in Kubernetes #4038

Open · webvictim opened this issue Jul 14, 2020 · 6 comments
Labels: bug, robustness (Resistance to crashes and reliability)

@webvictim (Contributor) commented Jul 14, 2020

What happened: A couple of customers have reported the same issue when running Teleport in Kubernetes with the SQLite backend.

Customer 1 (using their own Helm chart, the Deployment is functionally very similar to ours):

INFO [PROC:2]    The new service has started successfully. Starting syncing rotation status with period 10m0s. service/connect.go:433
INFO [PROXY:AGE] Starting reverse tunnel agent pool. service/service.go:2226
INFO [PROXY:KUB] Starting Kube proxy on 0.0.0.0:3026. service/service.go:2274
INFO [PROC]      New service has started successfully. service/service.go:468
INFO [PROC]      Shutting down the old service with timeout 5m0s. service/service.go:478
WARN [AUTH:1:CA] Re-init the cache on error: context canceled. cache/cache.go:333
WARN [REVERSE:T] Re-init the cache on error: context canceled. cache/cache.go:333
WARN [PROXY:1:C] Re-init the cache on error: context canceled. cache/cache.go:333
WARN [PROXY:1:C] Re-init the cache on error: context canceled. cache/cache.go:333
WARN [PROXY:KUB] Kube TLS server exited with error: accept tcp [::]:3026: use of closed network connection. service/service.go:2276
INFO [PROXY:SER] Shutting down gracefully. service/service.go:2311
INFO [AUTH:1]    Shutting down gracefully. service/service.go:1210
INFO [AUDIT:1]   Shutting down. events/uploader.go:316
INFO [AUDIT:1]   Exited. service/service.go:1690
WARN [PROXY:1]   Re-init the watcher on error: context canceled. services/proxywatcher.go:180
INFO             [WEB] closing session cache web/sessions.go:354
INFO [PROXY:SER] Exited. service/service.go:2327
INFO [PROXY:SER] Exited. service/service.go:2164
INFO             [WEB] closing session cache web/sessions.go:354
WARN [PROXY:BEA] Heartbeat failed Post https://teleport.cluster.local/v2/proxies: EOF. srv/heartbeat.go:240
WARN [PROXY:2:C] Re-init the cache on error: EOF. cache/cache.go:333
WARN [PROXY:2]   Re-init the watcher on error: EOF. services/proxywatcher.go:180
WARN [REVERSE:T] Re-init the cache on error: context is closing. cache/cache.go:333
WARN [PROXY:2:C] Re-init the cache on error: context is closing. cache/cache.go:333
WARN [PROXY:2]   Re-init the watcher on error: context is closing. services/proxywatcher.go:180
INFO [AUTH:1]    Exited. service/service.go:1214
INFO [PROC]      The old service was successfully shut down gracefully. service/service.go:501
INFO [KEYGEN]    Stopping key precomputation routine. native/native.go:144
INFO [AUTH]      Updated rotation state, set current phase to: "init". type:host auth/rotate.go:235
INFO [AUTH]      Updated rotation state, set current phase to: "init". type:user auth/rotate.go:235
ERRO             "\nERROR REPORT:\nOriginal Error: *trace.ConnectionProblemError sql: database is closed\nStack Trace:\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:881 github.com/gravitational/teleport/lib/backend/lite.convertError\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:822 github.com/gravitational/teleport/lib/backend/lite.(*LiteBackend).inTransaction\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:569 github.com/gravitational/teleport/lib/backend/lite.(*LiteBackend).Get\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/sanitize.go:97 github.com/gravitational/teleport/lib/backend.(*Sanitizer).Get\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/report.go:130 github.com/gravitational/teleport/lib/backend.(*Reporter).Get\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:361 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).getLicenseCheckHeartbeat\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:374 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).GetLicenseCheckResult\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:400 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).processLicenseCheckResult\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:125 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).startRecordingUsage\n\t/opt/go/src/runtime/asm_amd64.s:1358 runtime.goexit\nUser Message: database is closed\n" pro/enforcer.go:127
ERRO             Failed to retrieve existing usage record. error:database is closed pro/enforcer.go:165
ERRO             "\nERROR REPORT:\nOriginal Error: *trace.ConnectionProblemError sql: database is closed\nStack Trace:\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:881 github.com/gravitational/teleport/lib/backend/lite.convertError\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:822 github.com/gravitational/teleport/lib/backend/lite.(*LiteBackend).inTransaction\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:569 github.com/gravitational/teleport/lib/backend/lite.(*LiteBackend).Get\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/sanitize.go:97 github.com/gravitational/teleport/lib/backend.(*Sanitizer).Get\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/report.go:130 github.com/gravitational/teleport/lib/backend.(*Reporter).Get\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:361 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).getLicenseCheckHeartbeat\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:374 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).GetLicenseCheckResult\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:400 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).processLicenseCheckResult\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:125 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).startRecordingUsage\n\t/opt/go/src/runtime/asm_amd64.s:1358 runtime.goexit\nUser Message: database is closed\n" pro/enforcer.go:127
ERRO             Failed to retrieve existing usage record. error:database is closed pro/enforcer.go:165

Customer 2 (using our Helm chart):

INFO [AUTH] Updating cluster configuration: StaticTokens([]). auth/init.go:273
INFO [AUTH] Updating cluster configuration: AuthPreference(Type="local",SecondFactor="otp"). auth/init.go:280
INFO [AUTH] Created namespace: "default". auth/init.go:287
INFO [AUTH] Auth server is running periodic operations. auth/init.go:427
INFO [PROC:1] Service auth is creating new listener on 0.0.0.0:3025. service/signals.go:213
INFO [AUTH:1] Starting Auth service with PROXY protocol support. service/service.go:1069
INFO [PROCESS] Using license from /var/lib/license/license-enterprise.pem expires at 2119-07-06 15:59:03.018040894 +0000 UTC,reports usage,supports kubernetes. pro/process.go:182
[AUTH] Auth service is starting on 0.0.0.0:3025.
INFO [AUTH] Auth service is starting on 0.0.0.0:3025. utils/cli.go:177
INFO [PROC:1] The new service has started successfully. Starting syncing rotation status with period 10m0s. service/connect.go:433
INFO [PROC:1] Service Admin has updated principals to ["teleport-tic-04-fra-02-f6fd8bf5d-tfc4z" "teleport.tic-04.fra-02.eu.example.com"], DNS Names to [], going to request new principals and update. service/connect.go:680
INFO [CA] Generating TLS certificate {0x3771a40 0xc00074e7f0 CN=8abd4389-a9fa-44a7-9236-e9c7b2b14af3.tic-04-fra-02,O=Admin,POSTALCODE=null,STREET= 2030-04-08 09:31:08.843048279 +0000 UTC [teleport-tic-04-fra-02-f6fd8bf5d-tfc4z teleport.tic-04.fra-02.eu.example.com *.teleport.cluster.local teleport.cluster.local]}. common_name:8abd4389-a9fa-44a7-9236-e9c7b2b14af3.tic-04-fra-02 dns_names:[teleport-tic-04-fra-02-f6fd8bf5d-tfc4z teleport.tic-04.fra-02.eu.example.com *.teleport.cluster.local teleport.cluster.local] locality:[] not_after:2030-04-08 09:31:08.843048279 +0000 UTC org:[Admin] org_unit:[] tlsca/ca.go:266
INFO [PROC:1] Exiting signal handler: process has started internal reload. service/signals.go:126
INFO [PROC] Started in-process service reload. service/service.go:434
INFO [AUTH] Updating cluster configuration: StaticTokens([]). auth/init.go:273
INFO [AUTH] Updating cluster configuration: AuthPreference(Type="local",SecondFactor="otp"). auth/init.go:280
INFO [AUTH] Created namespace: "default". auth/init.go:287
INFO [AUTH] Auth server is running periodic operations. auth/init.go:427
INFO [PROC:2] Using file descriptor auth 0.0.0.0:3025 passed by the parent process. service/signals.go:207
INFO [AUTH:2] Starting Auth service with PROXY protocol support. service/service.go:1069
INFO [PROCESS] Using license from /var/lib/license/license-enterprise.pem expires at 2119-07-06 15:59:03.018040894 +0000 UTC,reports usage,supports kubernetes. pro/process.go:182
INFO [PROC] Created new process. service/service.go:447
INFO [AUTH] Auth service is starting on 0.0.0.0:3025. utils/cli.go:177
[AUTH] Auth service is starting on 0.0.0.0:3025.
INFO [PROC:2] The new service has started successfully. Starting syncing rotation status with period 10m0s. service/connect.go:433
INFO [PROC] New service has started successfully. service/service.go:465
INFO [PROC] Shutting down the old service with timeout 5m0s. service/service.go:475
WARN [AUTH:1:CA] Re-init the cache on error: watcher is closed. cache/cache.go:333
INFO [AUTH:1] Shutting down gracefully. service/service.go:1195
INFO [AUTH:1] Exited. service/service.go:1199
INFO [PROC] The old service was successfully shut down gracefully. service/service.go:498
INFO [KEYGEN] Stopping key precomputation routine. native/native.go:144
2020-04-10 09:33:01.996576 I | http: TLS handshake error from 10.156.15.196:57716: read tcp 10.4.2.233:3025->10.156.15.196:57716: read: connection reset by peer
ERRO "\nERROR REPORT:\nOriginal Error: *trace.ConnectionProblemError sql: database is closed\nStack Trace:\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:888 github.com/gravitational/teleport/lib/backend/lite.convertError\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:824 github.com/gravitational/teleport/lib/backend/lite.(*LiteBackend).inTransaction\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/lite/lite.go:571 github.com/gravitational/teleport/lib/backend/lite.(*LiteBackend).Get\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/sanitize.go:103 github.com/gravitational/teleport/lib/backend.(*Sanitizer).Get\n\t/gopath/src/github.com/gravitational/teleport/lib/backend/report.go:130 github.com/gravitational/teleport/lib/backend.(*Reporter).Get\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:361 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).getLicenseCheckHeartbeat\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:374 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).GetLicenseCheckResult\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:400 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).processLicenseCheckResult\n\t/gopath/src/github.com/gravitational/teleport/e/lib/pro/enforcer.go:125 github.com/gravitational/teleport/e/lib/pro.(*Enforcer).startRecordingUsage\n\t/opt/go/src/runtime/asm_amd64.s:1358 runtime.goexit\nUser Message: database is closed\n" pro/enforcer.go:127
ERRO Failed to retrieve existing usage record. error:database is closed pro/enforcer.go:165

What you expected to happen: No error.

How to reproduce it (as minimally and precisely as possible): Run Teleport in Kubernetes; the rest is as yet uncertain.

Environment

  • Teleport version (use teleport version):
    Customer 1: Teleport 4.3.0
    Customer 2: Teleport Enterprise v4.2.7 git:v4.2.7-0-gbc57b85f go1.13.2
@webvictim webvictim added the bug label Jul 14, 2020
@russjones russjones added this to the 4.4 "Rome" milestone Jul 21, 2020
@russjones (Contributor):

Best: 2
Worst: 5

@webvictim (Contributor, Author):

I've had a brief look into this. At a guess, when we reload the Teleport process (using kill -HUP $(pidof teleport) or similar), the SQLite backend gets closed and the enforcer attempts to get a usage record before the new backend has spun up. We might need to look at adding some kind of backoff to the enforcer so that it waits a few seconds and retries when the database is closed, rather than immediately erroring out. Something like the sketch below.
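
Very rough sketch of what I mean (untested; `getWithRetry` is just an illustrative helper, the backoff timings are arbitrary, and the `backend.Backend.Get` usage follows the 4.x interface):

```go
import (
	"context"
	"time"

	"github.com/gravitational/teleport/lib/backend"
	"github.com/gravitational/trace"
)

// getWithRetry retries a backend read a few times when the error is the
// ConnectionProblemError that lite.convertError returns for "database is
// closed", instead of surfacing it immediately.
func getWithRetry(ctx context.Context, bk backend.Backend, key []byte) (*backend.Item, error) {
	var lastErr error
	for i := 0; i < 5; i++ {
		item, err := bk.Get(ctx, key)
		if err == nil {
			return item, nil
		}
		if !trace.IsConnectionProblem(err) {
			// Not a closed/unavailable backend; fail as usual.
			return nil, trace.Wrap(err)
		}
		lastErr = err
		select {
		case <-ctx.Done():
			return nil, trace.Wrap(ctx.Err())
		case <-time.After(time.Second << uint(i)): // 1s, 2s, 4s, ...
		}
	}
	return nil, trace.Wrap(lastErr)
}
```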

@dove-young:

I have also seen this kind of message from my Teleport instance running in Kubernetes.

@webvictim (Contributor, Author):

@espadolini Do you think your recent fixes might have addressed this?

@espadolini (Contributor):

Looks unrelated; nothing prevents the enforcer from being halfway through a call when we hit process.storage.Close() in (*TeleportProcess).StartShutdown() or (*TeleportProcess).Close().

The latter is only called during a scram, so it's not a big deal to have some errors there, but for the former we need to tie a clean enforcer shutdown (which still needs to be written) into the supervisor machinery, probably with an OnExit call, along the lines of the sketch below.
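
Rough sketch of the wiring; `Enforcer.Stop()` is hypothetical and still needs to be written, `"pro.enforcer"` is just an illustrative service name, and the exact `OnExit(serviceName, callback)` shape is from memory:

```go
import "github.com/gravitational/teleport/lib/service"

// registerEnforcerShutdown ties a clean enforcer stop into the supervisor
// so the enforcer is quiesced before process.storage.Close() runs.
func registerEnforcerShutdown(process *service.TeleportProcess, enf interface{ Stop() }) {
	process.OnExit("pro.enforcer", func(payload interface{}) {
		// Hypothetical: stop the license/usage loops and wait for any
		// in-flight backend call to return before the backend closes.
		enf.Stop()
	})
}
```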

@espadolini (Contributor):

...or we could just catch "backend is closed", swallow it, and stop there, since we can't possibly proceed anyway. Something like the check below.
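
Sketch only; matching trace.IsConnectionProblem plus the message text is an assumption based on the stack traces above, and a dedicated sentinel error from lib/backend would be cleaner:

```go
import (
	"strings"

	"github.com/gravitational/trace"
)

// isBackendClosed reports whether err looks like the ConnectionProblemError
// that lite.convertError produces once SQLite has been closed under us.
func isBackendClosed(err error) bool {
	return err != nil &&
		trace.IsConnectionProblem(err) &&
		strings.Contains(err.Error(), "database is closed")
}
```

The enforcer's usage-recording loop would then simply return when isBackendClosed(err) is true, instead of logging the ERROR report.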
