Expose Consul health check identifier #4020

Open
horalstvo opened this issue Mar 21, 2018 · 6 comments

@horalstvo

We want to use Nomad together with Consul. One of our libraries registers a session with Consul and adds additional checks (serf and service) to it. The Consul API documentation is not very specific about this, but observed behavior suggests that the entries in the session's checks list are actually check IDs (https://www.consul.io/api/session.html#checks).

Our current implementation does not use Nomad. It registers checks manually on Consul and it sets the ID which is then used by the library. The Consul API permits that (https://www.consul.io/docs/agent/checks.html).

When running with Nomad the ID of the Consul check is generated. Hence the library doesn't know it. I would like to set the ID of the check - currently Nomad does not allow that: https://www.nomadproject.io/docs/job-specification/service.html#check-parameters

Alternatively, since I set the ID of the additional check as an env variable, if Nomad exports the generated value in some way I can use that.

@chelseakomlo
Contributor

Hi, thanks for opening this issue. We don't currently have the ability to look up health check identifiers in Nomad, but we have plans to make this better in the future.

As you mentioned, in the meantime you can look up the health check identifier in Consul after Nomad has registered it via Consul's check API: https://www.consul.io/docs/agent/checks.html.

Let us know if there is anything else we can help with. We are leaving this issue open to track future improvements.

@chelseakomlo changed the title from "Support setting ID on the Consul checks in service stanza" to "Expose Consul health check identifier" on Mar 22, 2018
@horalstvo
Author

horalstvo commented Mar 26, 2018

Just for the record. In our case we use Java so I used the consul-client library to communicate with the agent. And I did:

    /**
     * Get checks of the agent and look up a health check by name or ID.
     * We look by name to account for generated IDs of health checks defined from Nomad.
     */
    private String resolveHealthCheckId(String additionalHealthCheck, String applicationName) {
        Map<String, HealthCheck> healthChecks = consul.agentClient().getChecks();

        String healthCheck = healthChecks.values().stream()
                .filter(check -> check.getServiceName().equals(Optional.of(applicationName)))
                .filter(check -> check.getName().equals(additionalHealthCheck)
                                        || check.getCheckId().equals(additionalHealthCheck))
                .map(HealthCheck::getCheckId)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException(
                        String.format("The check with ID or name '%s' must be registered for application '%s'",
                                additionalHealthCheck, applicationName)));

        LOG.info("Additional health check {} for application {} resolved. ID '{}'.",
                additionalHealthCheck, applicationName, healthCheck);

        return healthCheck;
    }

Two highlights:

  • We can only match on serviceName on the check, since a service registered through Nomad has a generated ID, while its name is the one we specified in the Nomad config.
  • We must query agentClient for checks, since that returns only the checks bound to the local Nomad worker. If I fetched the checks for a service instead (in my case using consul.healthClient().getServiceChecks(applicationName)), I would get all the checks for that service, that is, for all nodes.

Now I believe I still have one problem. In our current setup we run a Consul agent per Nomad worker, so this agent gets the registrations of all services running on that host. If a service has two instances on one worker, the agent's checks would include two health checks with the same name and service name.
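
The ambiguity described above can at least be detected instead of being silently resolved by findFirst(). What follows is a minimal sketch, not the library's API: the Check record is a hypothetical stand-in for the fields of consul-client's HealthCheck used earlier, so that the example runs without a Consul agent, and the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class UniqueCheckResolver {

    /** Hypothetical stand-in for the fields of an agent check used here. */
    public record Check(String checkId, String name, String serviceName) {}

    /**
     * Returns the ID of the single check that belongs to the given service
     * and matches the given check name or ID. Throws if there is no match
     * or if more than one check matches.
     */
    public static String resolveUnique(List<Check> checks, String nameOrId, String serviceName) {
        List<String> ids = checks.stream()
                .filter(c -> serviceName.equals(c.serviceName()))
                .filter(c -> nameOrId.equals(c.name()) || nameOrId.equals(c.checkId()))
                .map(Check::checkId)
                .collect(Collectors.toList());

        if (ids.size() != 1) {
            throw new IllegalStateException(String.format(
                    "Expected exactly one check '%s' for service '%s' but found %d: %s",
                    nameOrId, serviceName, ids.size(), ids));
        }
        return ids.get(0);
    }

    public static void main(String[] args) {
        // Made-up sample data: one check per service, so the lookup is unambiguous.
        List<Check> checks = List.of(
                new Check("_nomad-check-aaa", "alive", "my-app"),
                new Check("_nomad-check-bbb", "alive", "other-app"));
        System.out.println(resolveUnique(checks, "alive", "my-app"));
    }
}
```

With two instances of the same service on one worker, this version fails with an explicit error rather than picking whichever check happens to come first.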

I can see three ways out of this:

  • Run a Consul agent as a sidecar (a bit wasteful). It would be great if there were an example of this in the Nomad documentation (one could reuse the example for a log shipper, but there is no reasonable example of that either).
  • Set the constraints of the job to never schedule two instances on one worker. We may want to do this anyway, but I find it wrong to make this problem the reason for doing it.
  • Get the Consul service ID, the check ID, or both as environment variables set by Nomad (by fixing this ticket :) ).

Please let me know if I missed anything.

@Fuco1
Contributor

Fuco1 commented Oct 12, 2019

I'm using Telegraf's Consul input plugin, which generates health metrics based on Consul checks. I register my services through Nomad, and it generates new check IDs every time. This makes the tags in InfluxDB take on tens of thousands of values, which makes the monitoring completely useless, as there is no way to associate anything with anything.

All that is needed is to allow an id => "my custom id" on the check part of the service stanza. If I add checks manually in Consul, this is possible.

@Fuco1
Contributor

Fuco1 commented Nov 3, 2020

Any updates on this? We would still love to use Consul's telemetry for health checks.

@imcom

imcom commented Feb 9, 2021

@Fuco1 We are facing exactly this issue. Since the community is not actively looking into it, we are planning to prepare a patch. Would you be interested?

@danlmr2

danlmr2 commented Oct 4, 2024

We've also hit this. At the moment, we've settled for either reserving a static port in our job spec or using distinct_hosts = true so we can rely on locally-registered checks belonging to the right service instance.

If you're willing to rely on a naming convention, it is possible to retrieve the checks for an allocation like this:

$ curl -s "http://127.0.0.1:8500/v1/health/checks/$NOMAD_JOB_NAME" | jq -r '.[] | select(.ServiceID | contains("'"$NOMAD_ALLOC_ID"'")).CheckID'
_nomad-check-a323981091ed884a533faa3c8bd57108563c33a5
_nomad-check-d7d37722cee669a96a6e80b40bfd0d0fd183b832

This relies on Nomad embedding the allocation ID into the Consul service ID, which I don't believe is documented/guaranteed, but does seem to be the case, at least in the versions of Nomad we've tried.
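
For the Java setup discussed earlier in this thread, the same filter could look roughly like the sketch below. It only mirrors the jq expression: the Check record is a hypothetical stand-in for one entry of the /v1/health/checks/<service> response, the sample IDs are made up, and it carries the same undocumented assumption that the allocation ID appears inside the Consul service ID.

```java
import java.util.List;
import java.util.stream.Collectors;

public class AllocationCheckFilter {

    /** Hypothetical stand-in for the CheckID and ServiceID fields of a check entry. */
    public record Check(String checkId, String serviceId) {}

    /**
     * Keeps the IDs of the checks whose service ID contains the allocation ID.
     * Assumption (not documented by Nomad): the allocation ID is embedded in
     * the Consul service ID.
     */
    public static List<String> checkIdsForAllocation(List<Check> checks, String allocId) {
        return checks.stream()
                .filter(c -> c.serviceId() != null && c.serviceId().contains(allocId))
                .map(Check::checkId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Inside a Nomad task the allocation ID would come from the
        // NOMAD_ALLOC_ID environment variable; a made-up value is used here.
        String allocId = "alloc-1111";

        // Made-up sample data standing in for the Consul response.
        List<Check> checks = List.of(
                new Check("check-a", "svc-alloc-1111-web"),
                new Check("check-b", "svc-alloc-2222-web"));

        System.out.println(checkIdsForAllocation(checks, allocId));
    }
}
```

Only the check registered for the current allocation survives the filter, even when another instance of the same service runs on the same worker.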
