Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VAULT-6368 Metrics-only listener for Agent #18101

Merged
merged 8 commits into from
Nov 25, 2022
Merged

Conversation

VioletHynes
Copy link
Contributor

This introduces a new option for the Agent HCL config, metrics_only_listener. It defaults to false. If set to true, it will make the defined listener only serve metrics. It can be put alongside other listener blocks, too, like so:

cache {
 enforce_consistency = "always"
 use_auto_auth_token = true
}

listener "tcp" {
    address = "127.0.0.1:8100"
    tls_disable = true
}

listener "tcp" {
    address = "127.0.0.1:3000"
    tls_disable = true
    telemetry {
      metrics_only_listener = true
    }
}

Using the above, you can e.g. cURL the metrics (but nothing else). This shows an example with an auth failure:

curl http://127.0.0.1:3000/agent/v1/metrics
{"Timestamp":"2022-11-23 17:41:50 +0000 UTC","Gauges":[{"Name":"vault.runtime.alloc_bytes","Value":10344560,"Labels":{}},{"Name":"vault.runtime.free_count","Value":52367,"Labels":{}},{"Name":"vault.runtime.heap_objects","Value":45838,"Labels":{}},{"Name":"vault.runtime.malloc_count","Value":98205,"Labels":{}},{"Name":"vault.runtime.num_goroutines","Value":13,"Labels":{}},{"Name":"vault.runtime.sys_bytes","Value":30328072,"Labels":{}},{"Name":"vault.runtime.total_gc_pause_ns","Value":375709,"Labels":{}},{"Name":"vault.runtime.total_gc_runs","Value":5,"Labels":{}}],"Points":[],"Counters":[{"Name":"vault.agent.auth.failure","Count":1,"Rate":0.1,"Sum":1,"Min":1,"Max":1,"Mean":1,"Stddev":0,"Labels":{}}],"Samples":[]}

Note that we still do need a cache block for this listener, though we have a separate work item tracked to enable listeners without a cache, and I felt it best to keep the two work items distinct.

@@ -47,25 +48,6 @@ type SharedConfig struct {
ClusterName string `hcl:"cluster_name"`
}

// LoadConfigFile loads the configuration from the given file.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions were not used anywhere, so I cleaned them up.

@ncabatoff
Copy link
Collaborator

I have other things to get to before I review this, but just one question based on the description: why do we need metrics_only_listener? Can't we rely solely on the presence of a telemetry block? I just worry that otherwise it may seem that you can have a telemetry block with metrics_only_listener=false.

Alternatively, maybe the top-level block should be named metrics_listener instead of listener?

Also, can you look at the telemetry block in Vault server config and see what we should adopt from there?

@VioletHynes
Copy link
Contributor Author

I had originally looked at potentially creating a metrics_listener construct, but as I worked through it, it did not seem to make less and less sense. The listener blocks you can define for Vault Agent use the same shared config as Vault itself does. Keeping it this way means that you can utilise all of the same options as you can with Vault Server listener blocks.

So, to the point of:
Also, can you look at the telemetry block in Vault server config and see what we should adopt from there?
There is one:
unauthenticated_metrics_access
You can specify it and it parses fine (since we share the listener blocks), but it doesn't actually do anything for Agent.

I'd rather not diverge the Agent listener config from the Vault listener config, so I'd rather the presence of a telemetry block did not have special behaviour for Agent that it doesn't for Vault, for example. I also think we gain a lot from sharing the listener config.

@ncabatoff
Copy link
Collaborator

Ok. What about the non-listener telemetry stanza, i.e. https://developer.hashicorp.com/vault/docs/configuration/telemetry ?

@VioletHynes
Copy link
Contributor Author

I'm a little confused as to what you mean - we can't rely on items inside the telemetry stanza to determine how listeners should behave. Agent supports the telemetry stanza in the same way as Server - if we're missing functionality, that would be news to me.

See also: https://developer.hashicorp.com/vault/docs/agent#telemetry-stanza

Perhaps I misunderstood what you meant, though?

@ncabatoff
Copy link
Collaborator

I'm a little confused as to what you mean - we can't rely on items inside the telemetry stanza to determine how listeners should behave. Agent supports the telemetry stanza in the same way as Server - if we're missing functionality, that would be news to me.

See also: https://developer.hashicorp.com/vault/docs/agent#telemetry-stanza

Perhaps I misunderstood what you meant, though?

Nope, I just forgot that we'd already added some metrics support to Agent. Nevermind!

@ncabatoff
Copy link
Collaborator

Another high-level thought: rather than a bool, how about introducing a string field? So instead of metrics_only_listener, make it a top-level listener field e.g. listener_function which can have a value of either telemetry or caching-proxy - or possible also proxy, which might help us address #8953.

@peteski22
Copy link

listener_function

Sorry, late to the party, is this to imply the role the listener would be performing?

Would this look something like this:

listener "tcp" {
    address = "127.0.0.1:3000"
    listener_function = "telemetry"
}

@VioletHynes
Copy link
Contributor Author

VioletHynes commented Nov 24, 2022

Ideally I'd like this functionality to be potentially-extendable to Vault-actual if possible. While I don't know of any requests for this feature for Vault server, the listener block is shared between the two and I'd rather not have separate configs in Agent/Vault as a result.

Part of the reason I'm in favour of a boolean over a string is that I see 'metrics only listener' is a very common use case and pattern that (in my mind) is distinct to other software-specific configuration options. In other words, I definitely see this kind of thing as a 'metrics only mode' that you can opt into for security reasons as opposed to something we'll apply to other APIs.

As a result, I do have a preference for the bool, as I feel like a lot of the string options read awkwardly, but I don't feel massively strongly about it and open to changing. If we do, I'd like to make sure it keeps the potential of extendability to Vault server.

I tried to come up with a nice sounding string option, maybe something like these?:

apis_served = "all"
apis_served = "metrics"
listener_function = "normal"
listener_function = "metrics"
listener_type = "regular"
listener_type = "metrics"

Looking for additional thoughts!

@peteski22
Copy link

peteski22 commented Nov 24, 2022

Could you not just have default and metrics (where default is also what we get when the field isn't specified)? (Edit: We do seem to have docs that allow multiple things, but then just say: only this one is supported atm, so we could just say 'only metrics is supported if you want to deviate, and add a different role).

Really sorry if I'm being dumb.

Metrics Listener:

listener "tcp" {
    address = "127.0.0.1:3000"
    role = "metrics"
}

Business as usual:

listener "tcp" {
    address = "127.0.0.1:3000"
}

or

listener "tcp" {
    address = "127.0.0.1:3000"
    role = "default"
}

I'm using role as it's the role that listener is going to perform, but also we don't seem to use listener_address for example, just address. So it feels (to me) like we should pick something from the second part of the names (role, type, function).

I guess using a string vs bool type does give you flexibility to add other roles later. That is what Nick was suggesting, that we could add ones to resolve an issue where one config stanza (listener) is ignored unless some other stanza (cache) is also provided?

@VioletHynes
Copy link
Contributor Author

VioletHynes commented Nov 24, 2022

I think this issue is separate from the issue where the listener stanza is currently linked to the cache stanza. I mentioned it briefly in the description, but I don't think that this really ties into that work at all (if we do enable listener without cache, we should probably just use the cache if we have it and don't if we don't).

I opted not to try and tackle the listener/cache entanglement here to keep those pieces of work distinct, but when we do that work, I think it would be a bad idea to require a new type of listener for it - we should support listeners without the cache without any other special configuration. Customers broadly just want it to work without needing to specify a cache, I think requiring a new kind of listener would still be kind of confusing (it's more that it's confusing that you need a cache to specify a listener, and it has been reported as a bug a few times), so I think that's how we should support it.

The above approach of having a role be something like caching-proxy also makes this option incompatible with Vault's listener stanza, which I don't particularly like, and makes it hard to introduce roles for Vault listeners (we'd need to completely disentangle the code).

We can't use type, for the name, as that's taken, and I guess role works, but I'm currently a little unconvinced that there are is a use case for having more roles to begin with ('metrics only' is a very specific use case that's supported by a lot of applications), and that whatever those roles are are incompatible with the other listener roles. To me, it also makes it a little less immediately clear to me that this turns off all of the other APIs, but I guess we could use metrics_only or similar to make it clearer.

If you both think that the string approach is the right approach, and we think role and default/metrics_only is clear enough for both Vault and Vault Agent, I'm happy to do it!

internalshared/configutil/listener.go Outdated Show resolved Hide resolved
internalshared/configutil/listener.go Outdated Show resolved Hide resolved
@ncabatoff
Copy link
Collaborator

If you both think that the string approach is the right approach, and we think role and default/metrics_only is clear enough for both Vault and Vault Agent, I'm happy to do it!

That works for me. There are some other examples of listeners with specific purposes in Vault already, they're just not configurable as listeners in HCL: the cluster port (8201) and per-KMIP-mount listeners. Conceivably one day we might want to allow users to control some aspects of these in HCL and we could use role for this.

@VioletHynes
Copy link
Contributor Author

Updated to use role, with default and metrics_only options! Thanks for the comments.

website/content/docs/agent/caching/index.mdx Outdated Show resolved Hide resolved
command/agent.go Outdated Show resolved Hide resolved
@VioletHynes VioletHynes merged commit a4a23f7 into main Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants