Cassandra database plugin intermittent timeouts connection to Cassandra #8527

Closed
nmylesnbx opened this issue Mar 10, 2020 · 6 comments

Comments

@nmylesnbx

Describe the bug
I'm getting intermittent timeout errors from the Cassandra database plugin when it connects to Cassandra. This happens when calling database/creds/<role>. Most calls succeed, but roughly one in every 10 requests for new database credentials fails.

hvac python stacktrace:

File "/usr/local/lib/python3.7/site-packages/hvac/v1/__init__.py", line 195, in read
   return self._adapter.get('/v1/{0}'.format(path), wrap_ttl=wrap_ttl).json()
File "/usr/local/lib/python3.7/site-packages/hvac/adapters.py", line 93, in get
   return self.request('get', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/hvac/adapters.py", line 276, in request
  utils.raise_for_error(response.status_code, text, errors=errors)
File "/usr/local/lib/python3.7/site-packages/hvac/utils.py", line 39, in raise_for_error
   raise exceptions.InternalServerError(message, errors=errors)
hvac.exceptions.InternalServerError: 1 error occurred:
  * read tcp 192.168.144.4:37610->192.168.144.2:9042: i/o timeout

To Reproduce
Steps to reproduce the behavior:

  1. Run vault secrets enable database
  2. Run:
vault write database/config/cassandra \
    plugin_name="cassandra-database-plugin" \
    hosts=cassandra \
    protocol_version=4 \
    username=username \
    password=password \
    allowed_roles='*'
  3. Run:
vault write database/roles/role \
    db_name=cassandra \
    creation_statements="CREATE USER '{{username}}' WITH PASSWORD '{{password}}' NOSUPERUSER; \
          GRANT SELECT ON ALL KEYSPACES TO {{username}};" \
    default_ttl="1h" \
    max_ttl="24h"
  4. Call API multiple times to retrieve creds: GET http://vault:8200/v1/database/creds/role
  5. See intermittent 500 errors
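
The last two steps can be scripted to measure how often the error occurs. This is a minimal sketch, not part of the original report: it assumes Vault is reachable at the address used above and that a valid token is exported as VAULT_TOKEN. The helper names fetch_creds and count_failures are made up for illustration.

```python
import os
import urllib.error
import urllib.request

VAULT_ADDR = "http://vault:8200"  # address taken from the reproduction step


def fetch_creds(role="role"):
    """GET /v1/database/creds/<role> and return the HTTP status code."""
    req = urllib.request.Request(
        f"{VAULT_ADDR}/v1/database/creds/{role}",
        headers={"X-Vault-Token": os.environ["VAULT_TOKEN"]},  # assumed env var
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 when the plugin hits the i/o timeout


def count_failures(fetch, calls=50):
    """Call fetch() `calls` times and count the responses that are not 200."""
    return sum(1 for _ in range(calls) if fetch() != 200)


# Usage against a running Vault:
#   print(count_failures(fetch_creds), "of 50 calls failed")
```

Note that every successful call creates a real Cassandra user, so the call count is best kept small on a shared cluster.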

Expected behavior
I would expect timeouts to be retried so that callers don't see intermittent 500s.
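
Until retries happen server-side, a caller can work around the problem by retrying on the client. The following is only a sketch of that idea, not something Vault or hvac provides: the retry helper and its parameters are hypothetical, and the hvac usage in the trailing comment assumes an already authenticated client.

```python
import time


def retry(fn, attempts=3, base_delay=0.5, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    The delay doubles after each failed attempt (0.5s, 1s, 2s, ...).
    The last exception is re-raised once `attempts` is exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage with hvac (assumes an authenticated hvac.Client named `client`):
#   import hvac
#   creds = retry(lambda: client.read("database/creds/role"),
#                 retry_on=(hvac.exceptions.InternalServerError,))
```

Retrying on InternalServerError is broad: it also repeats requests that failed for reasons other than the i/o timeout, so a small attempt count is the safer choice.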

Environment:
This happens both locally (docker-compose) and in a deployed environment

  • Vault Server Version (retrieve with vault status): 1.2.2
  • Vault CLI Version (retrieve with vault version): 1.2.2
  • Server Operating System/Architecture: official vault docker image

Vault cassandra plugin configuration was something like this:

vault write database/config/cassandra \
    plugin_name="cassandra-database-plugin" \
    hosts=cassandra \
    protocol_version=4 \
    username=username \
    password=password \
    allowed_roles='*'
@tyrannosaurus-becks
Contributor

Marking as an enhancement because the request appears to be to add retrying when Cassandra returns 500's.

@captain-kark

If work is going to be done under the hood to fix this, it might help to stem the number of total roles that this plugin generates and maintains.

If some kind of ALTER ROLE strategy can be used in place of DROP ROLE when revoking a role, it would allow a role to be "removed" when a TTL expires without introducing a NULL in the column. In addition, this would also allow for re-instating a role should the exact same access configuration be requested again later (limit one in-flight per TTL, else generate a second copy of the same account), which in my use case is rather common.

Using this plugin with DROP ROLE as a revocation statement can tombstone your system_auth.role_permissions table, which in my experience is one way to get these timeouts to appear.

@munjalpatel

Any updates here? I am also running into this exact issue. @nmylesnbx were you able to resolve / work around this?

@denglertai

denglertai commented Mar 1, 2021

@munjalpatel it is possible to increase the connect_timeout within the database secret engine configuration settings.
Unfortunately this does not seem to be documented at all.

In my case that solved the problem.
But it was probably related to a slower Cassandra database rather than a problem with Vault itself.
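
For anyone else trying this, here is a rough sketch of what setting connect_timeout could look like through the HTTP API. It reuses the placeholder connection values from the original report; the "10s" value, the function name, and the address and token arguments are illustrative assumptions, not recommendations.

```python
import json
import urllib.request


def cassandra_config_request(vault_addr, token, connect_timeout="10s"):
    """Build (but do not send) the POST that reconfigures the connection."""
    payload = {
        "plugin_name": "cassandra-database-plugin",
        "hosts": "cassandra",
        "protocol_version": 4,
        "username": "username",
        "password": "password",
        "allowed_roles": "*",
        "connect_timeout": connect_timeout,  # "10s" is an illustrative value
    }
    return urllib.request.Request(
        f"{vault_addr}/v1/database/config/cassandra",
        data=json.dumps(payload).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# To apply it against a running Vault:
#   urllib.request.urlopen(cassandra_config_request("http://vault:8200", token))
```

The same setting can be passed on the command line by adding connect_timeout=10s to the vault write database/config/cassandra command from the report.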

viljoviitanen added a commit to viljoviitanen/vault that referenced this issue Aug 26, 2021
Documentation says timeout is 5s, but code uses 0s, which is too short for any real-world usage, causing issues.
https://www.vaultproject.io/api/secret/databases/cassandra#connect_timeout
issues: hashicorp#8527 hashicorp#9400
imthaghost pushed a commit that referenced this issue Feb 15, 2022
* fix cassandra db plugin timeout to 5s as in docs

Documentation says timeout is 5s, but code uses 0s, which is too short for any real-world usage, causing issues.
https://www.vaultproject.io/api/secret/databases/cassandra#connect_timeout
issues: #8527 #9400

* Create 12443.txt

changelog entry
@aphorise
Contributor

aphorise commented Sep 3, 2022

Hey folks - I feel that this issue is no longer relevant and may be closed. @nmylesnbx any chance you can confirm if / how this may still be applicable for you?

On a related note, #10467 is expected in the next major release, 1.12, and adds initial_connection_timeout and simple_retry_policy_retries.

PS - this request seems somewhat related to #15899

@aphorise
Contributor

aphorise commented Sep 9, 2022

Closing due to no response. Please re-open if still relevant.

@aphorise aphorise closed this as completed Sep 9, 2022