0.10.3 re-seals if it cant revoke DB creds #4846

Closed
myoung34 opened this issue Jun 28, 2018 · 8 comments

@myoung34

Describe the bug

2018-06-28T03:00:03.448Z [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/c273036b-552f-9977-e17a-c552f83f3b2c error="failed to revoke entry: resp: (*logical.Response)(nil) err: error during revoke: could not find role with name "rds-staging" or embedded revocation db name data"
2018-06-28T03:00:03.448Z [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/a7d43492-1646-e0b9-2621-9390666d81ea error="failed to revoke entry: resp: (*logical.Response)(nil) err: error during revoke: could not find role with name "rds-staging" or embedded revocation db name data"
2018-06-28T03:00:03.449Z [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/c0e9bd84-44b3-d3a5-aa5b-d3318ddb0a97 error="failed to revoke entry: resp: (*logical.Response)(nil) err: error during revoke: could not find role with name "rds-staging" or embedded revocation db name data"
panic: interface conversion: interface {} is []interface {}, not []string
goroutine 1305 [running]:
github.com/hashicorp/vault/builtin/logical/database.(*databaseBackend).secretCredsRevoke.func1(0x2586340, 0xc420de3f40, 0xc4219217a0, 0xc420770e30, 0x0, 0x0, 0x0)
        /gopath/src/github.com/hashicorp/vault/builtin/logical/database/secret_creds.go:114 +0x7e8
github.com/hashicorp/vault/logical/framework.(*Secret).HandleRevoke(0xc420d84870, 0x2586340, 0xc420de3f40, 0xc4219217a0, 0xc4219214f8, 0x10094709b11fb01, 0x0)
        /gopath/src/github.com/hashicorp/vault/logical/framework/secret.go:87 +0x9a
github.com/hashicorp/vault/logical/framework.(*Backend).handleRevokeRenew(0xc420972a90, 0x2586340, 0xc420de3f40, 0xc4219217a0, 0x64363966c156b618, 0x6664612d37613835, 0xc421495988)
        /gopath/src/github.com/hashicorp/vault/logical/framework/backend.go:394 +0x1fb
github.com/hashicorp/vault/logical/framework.(*Backend).HandleRequest(0xc420972a90, 0x2586340, 0xc420de3f40, 0xc4219217a0, 0x0, 0x0, 0x0)
        /gopath/src/github.com/hashicorp/vault/logical/framework/backend.go:169 +0x6fb
github.com/hashicorp/vault/vault.(*Router).routeCommon(0xc4208e0f00, 0x2586340, 0xc420de3f40, 0xc4219217a0, 0x410600, 0x0, 0x22e0000, 0x0, 0x0)
        /gopath/src/github.com/hashicorp/vault/vault/router.go:523 +0x7c2
github.com/hashicorp/vault/vault.(*Router).Route(0xc4208e0f00, 0x2586340, 0xc420de3f40, 0xc4219217a0, 0x0, 0xc421495dc0, 0x19de3e1)
        /gopath/src/github.com/hashicorp/vault/vault/router.go:378 +0x4e
github.com/hashicorp/vault/vault.(*ExpirationManager).revokeEntry(0xc420621200, 0xc4213550e0, 0x3f, 0xc4213550e0)
        /gopath/src/github.com/hashicorp/vault/vault/expiration.go:1092 +0x1c7
github.com/hashicorp/vault/vault.(*ExpirationManager).revokeCommon(0xc420621200, 0xc42011b440, 0x3f, 0x0, 0x0, 0x0)
        /gopath/src/github.com/hashicorp/vault/vault/expiration.go:510 +0x491
github.com/hashicorp/vault/vault.(*ExpirationManager).Revoke(0xc420621200, 0xc42011b440, 0x3f, 0x0, 0x0)
        /gopath/src/github.com/hashicorp/vault/vault/expiration.go:489 +0x112
github.com/hashicorp/vault/vault.(*ExpirationManager).expireID(0xc420621200, 0xc42011b440, 0x3f)
        /gopath/src/github.com/hashicorp/vault/vault/expiration.go:1066 +0x1ce
github.com/hashicorp/vault/vault.(*ExpirationManager).updatePendingInternal.func1()
        /gopath/src/github.com/hashicorp/vault/vault/expiration.go:1031 +0x3f
created by time.goFunc
        /goroot/src/time/sleep.go:172 +0x44
==> Vault server configuration:
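
For readers unfamiliar with the panic message in the trace above: Go raises it when a value held in an `interface{}` is asserted to `[]string` but actually holds `[]interface{}`, which is the shape `encoding/json` produces for any JSON array. The sketch below reproduces that situation in isolation and shows a conversion that returns an error instead of panicking. It assumes the value went through a JSON round trip, which the thread does not confirm; `toStringSlice` and the `revocation_statements` key are hypothetical names chosen for illustration, not Vault's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toStringSlice is a hypothetical helper. It accepts either a []string or
// the []interface{} that encoding/json yields for a JSON array, and returns
// an error rather than panicking when the shape is unexpected.
func toStringSlice(v interface{}) ([]string, error) {
	switch t := v.(type) {
	case []string:
		return t, nil
	case []interface{}:
		out := make([]string, 0, len(t))
		for i, elem := range t {
			s, ok := elem.(string)
			if !ok {
				return nil, fmt.Errorf("element %d is %T, not string", i, elem)
			}
			out = append(out, s)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unexpected type %T", v)
	}
}

func main() {
	// Illustrative data only; the key name is an assumption.
	raw := []byte(`{"revocation_statements": ["DROP ROLE example;"]}`)

	var data map[string]interface{}
	if err := json.Unmarshal(raw, &data); err != nil {
		fmt.Println("decode failed:", err)
		return
	}

	// A direct assertion such as data["revocation_statements"].([]string)
	// would panic here, because the decoded value is a []interface{}.
	stmts, err := toStringSlice(data["revocation_statements"])
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(stmts)
}
```

The two-value form of a type assertion (`s, ok := elem.(string)`) never panics, which is why the helper can report a bad element as an ordinary error.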

To Reproduce
Steps to reproduce the behavior:

  1. Create temporary DB credentials (PostgreSQL in this case) via the database backend.
  2. Tear down the database.

Expected behavior
In 0.9.6 nothing happened beyond the logged revocation errors.
In 0.10.3 the server panics, bounces, and re-seals.

Environment:

  • 0.10.3

Vault server configuration file(s):

n/a

Additional context

We tear down staging nightly. During the day we let devs create temp creds using the database provider

@myoung34
Author

myoung34 commented Jun 28, 2018

Downgrading the server back to 0.9.6 works fine

 find role with name rds-staging
2018/06/28 03:09:29.776099 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/ffb365de-abea-6b62-e05a-6a398facb379 error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:29.777720 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/redshift-staging/81f55018-faed-482f-7283-c70ad985501c error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name redshift-staging
2018/06/28 03:09:29.781611 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/dbbdf37a-750a-383b-cd06-a5c94057737d error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:29.781818 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/659f8b2d-dc83-76e0-c7be-529933c6c058 error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:29.782111 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/150c2ec6-dfcd-7abe-2dea-09e8ff39536d error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:29.785203 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/12e35756-2745-ff5e-ae59-d4ce60bc33f9 error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:30.031191 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/f6116095-fecc-995b-aea6-1cb352ce133e error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:30.037821 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/d69d4920-ddbb-4308-e16f-1fff0b6d87be error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:30.059769 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/fdcade8d-ad60-5cc5-90d5-ab5818bc6e88 error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:30.074162 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/e636bad8-0df6-b3c7-be69-3d044a752800 error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging
2018/06/28 03:09:30.241451 [ERROR] expiration: failed to revoke lease: lease_id=database/creds/rds-staging/9c6da87a-0169-4800-6e23-59820214b9d9 error=failed to revoke entry: resp:(*logical.Response)(nil) err:error during revoke: could not find role with name rds-staging

However, the server never bounces and stays unsealed, as expected.

kalafut pushed a commit that referenced this issue Jun 28, 2018
briankassouf added this to the 0.10.4 milestone Jun 28, 2018
@briankassouf
Contributor

@myoung34 Thanks for the report, we will work on a fix for this.

This code is only hit when a lease is revoked after the role has been deleted from Vault. As a workaround, you could make sure all leases are removed before deleting the role.

Just out of curiosity, how are you tearing down the database?

@myoung34
Author

@briankassouf We spin up staging daily from prod snapshots using Terraform, database included.

The Vault resources (secrets, roles, etc.) are created from Terraform as well, similar to:

resource "vault_generic_secret" "rds_access_config" {
  path = "database/config/rds-staging"

  data_json = <<EOF
{
  "plugin_name": "postgresql-database-plugin",
  "allowed_roles": "rds-staging",
  "connection_url": "postgresql://${data.vault_generic_secret.rds_user.data["value"]}:....snip...."
}
EOF
}

resource "vault_generic_secret" "rds_access_role" {
  path = "database/roles/rds-staging"

  data_json = <<EOF
{
    "db_name": "rds-staging",
    "creation_statements": "...snip...",
    "default_ttl": "10h",
    "max_ttl": "24h",
    "revocation_statements": "...snip..."
}
EOF
}

resource "vault_generic_secret" "rds_hostname" {
  path = "secret/staging/shared/DB_DEFAULT_HOST"

  data_json = <<EOF
{
  "value": "${module.rds.address}"
}
EOF
}

@josegonzalez
Contributor

> This code is only hit when a lease is revoked after the role has been deleted from Vault. As a workaround, you could make sure all leases are removed before deleting the role.

You would see this sort of issue if, for whatever reason, you had to roll back a database to an earlier snapshot without those roles. I can see that making disaster recovery extremely painful.

@briankassouf
Contributor

@myoung34 I see, so Terraform is likely removing the role and then unmounting the backend, which will try to revoke all the leases.

This was introduced in a recent improvement that attempts to revoke the lease (and subsequently remove the users from the configured database) even if the role no longer exists. Prior to this change, tearing down the database backend like that would have left users lingering in the database.
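
The behavior described above, together with the logged error ("could not find role with name ... or embedded revocation db name data"), suggests a lookup order of live role first and data embedded in the lease second. The sketch below illustrates that ordering only. The `role` and `leaseData` types and the `resolveRevocation` function are hypothetical stand-ins invented for this example, not Vault's real types, and the actual implementation may differ.

```go
package main

import "fmt"

// role is a hypothetical stand-in for a configured database role.
type role struct {
	DBName               string
	RevocationStatements []string
}

// leaseData is a hypothetical stand-in for what a lease carries with it.
// DBName and RevocationStatements are copies taken when the credentials
// were issued, and may be empty for leases created by older versions.
type leaseData struct {
	RoleName             string
	DBName               string
	RevocationStatements []string
}

// resolveRevocation prefers the live role and falls back to the copy
// embedded in the lease. If neither is available it returns an error,
// so the caller can log the failure instead of crashing.
func resolveRevocation(roles map[string]role, l leaseData) (string, []string, error) {
	if r, ok := roles[l.RoleName]; ok {
		return r.DBName, r.RevocationStatements, nil
	}
	if l.DBName != "" {
		return l.DBName, l.RevocationStatements, nil
	}
	return "", nil, fmt.Errorf("could not find role with name %q or embedded revocation db name data", l.RoleName)
}

func main() {
	roles := map[string]role{} // the role has already been deleted

	newLease := leaseData{
		RoleName:             "rds-staging",
		DBName:               "rds-staging",
		RevocationStatements: []string{`DROP ROLE "v-example";`},
	}
	oldLease := leaseData{RoleName: "rds-staging"} // no embedded data

	for _, l := range []leaseData{newLease, oldLease} {
		db, stmts, err := resolveRevocation(roles, l)
		if err != nil {
			fmt.Println("revoke skipped:", err)
			continue
		}
		fmt.Println("revoke against", db, "using", stmts)
	}
}
```

Returning an error in the last case keeps the failure at the level of a log line, which matches what the reporter observed on 0.9.6.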

@myoung34
Author

We tear down the database, so lingering users are a non-issue in this context. The production database is not torn down, so leases work there.

As long as Vault doesn't panic, bounce, and re-seal, that resolves my current issues with upgrading.

@briankassouf
Contributor

briankassouf commented Jun 28, 2018

@josegonzalez

> You would see this sort of issue if, for whatever reason, you had to roll back a database to an earlier snapshot without those roles. I can see that making disaster recovery extremely painful.

If you rolled back the whole datastore, then the leases would also be rolled back. But this panic is now fixed, so that shouldn't be an issue going forward.

@Jeff-Hanson

We just encountered this issue in our dev environment. We triggered it by renaming a role for database creds in Terraform and then applying the change to Vault while there were still outstanding leases for the old role.
Terraform removed the old role, which caused Vault to hit this issue and fail to stay unsealed. As a workaround, we went into the Consul backend for Vault and deleted the Consul folder for those leases. The path in Consul was something like /vault/sys/expire/id/database/creds/ . This appeared to work, as Vault was able to come up and stay unsealed. I did need to clean up the users in the DB manually, but it appears to be using the new role just fine.
