Nomad Periodic job enabled=false, recommended way to gain insights into this? #24119
Comments
Hi @dmclf and thanks for raising this issue. I have tried to respond to each question below, with a further note on a particular line that caught my attention.
The question that comes to mind when reading this is why do you need to raise alerts on this? If periodic jobs are having their enabled flag altered (or deployed with the wrong setting) and that is cause to trigger an alert, I would lean towards tighter access control on the Nomad cluster, job specifications within source control, and CI/CD automation for job deployments.
The periodic block is only available when reading the job specification from Nomad and is not available within the listing. In order to display a button on the job list page, the UI would need to list and then read every job. This is expensive in time, network traffic, and computation, and it must be done for every job, regardless of whether it is periodic or not, in order to discover this fact.
The API is likely the better tool to use for this kind of work. The following example utilizes curl and jq to get the periodic enabled value of each running job, printing out the information in a readable manner:
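A minimal sketch of that approach, written in Python rather than curl and jq. It assumes the agent address is in `NOMAD_ADDR` (falling back to the local default), that no ACL token is required, and that only the default namespace matters. The helper names (`get_json`, `periodic_enabled`, `report`, `main`) are hypothetical and chosen only for illustration.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: agent address from NOMAD_ADDR; no ACL token, default namespace only.
NOMAD_ADDR = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")


def get_json(path):
    """GET a path from the Nomad HTTP API and decode the JSON body."""
    with urllib.request.urlopen(NOMAD_ADDR + path) as resp:
        return json.load(resp)


def periodic_enabled(job):
    """Return Periodic.Enabled for a fully read job, or None if it has no periodic block."""
    periodic = job.get("Periodic")
    if not periodic:
        return None
    return bool(periodic.get("Enabled"))


def report(stubs, read_job):
    """Return sorted (job ID, enabled) pairs for running jobs that have a periodic block.

    stubs is the job listing; read_job is a callable that reads one job by ID,
    because the listing alone does not carry the periodic block.
    """
    rows = []
    for stub in stubs:
        if stub.get("Status") != "running":
            continue
        enabled = periodic_enabled(read_job(stub["ID"]))
        if enabled is not None:
            rows.append((stub["ID"], enabled))
    return sorted(rows)


def main():
    """List jobs, read each running one, and print its periodic enabled value."""
    stubs = get_json("/v1/jobs")
    # Job IDs may contain "/", so they are fully percent-encoded in the path.
    rows = report(
        stubs,
        lambda job_id: get_json("/v1/job/" + urllib.parse.quote(job_id, safe="")),
    )
    for job_id, enabled in rows:
        print(f"{job_id}\tenabled={enabled}")
```

Calling `main()` against a reachable agent prints one line per running periodic job. The list-then-read pattern is the same one described above as too expensive for the UI, so a script like this is better run on a modest interval than continuously.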
This would require the Nomad servers to emit telemetry based on static job specification parameters rather than runtime information. This raises numerous questions, adds considerable computational overhead, and would increase the cardinality of our metrics, as 100 periodic jobs would produce 100 new data points. If this were the route you wanted to go, I would first consider building a small Prometheus sidecar exporter, which could consume the Nomad API and present the required data for scraping. I would also be worried about the potential for this enhancement to creep to other job specification parameters if we chose to include it.
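As a rough illustration of the sidecar exporter idea, the sketch below renders per-job enabled states in the Prometheus text exposition format and serves them on `/metrics` using only the Python standard library. The metric name `nomad_periodic_job_enabled`, the `job_id` label, the port, and the `collect` callable (assumed to return a `{job_id: enabled}` mapping gathered from the Nomad API) are all hypothetical choices, not anything Nomad provides.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(states):
    """Render a {job_id: enabled} mapping as text-format gauge samples."""
    lines = [
        # Hypothetical metric name; one sample per periodic job.
        "# HELP nomad_periodic_job_enabled Whether the periodic block is enabled (1) or disabled (0).",
        "# TYPE nomad_periodic_job_enabled gauge",
    ]
    for job_id in sorted(states):
        value = 1 if states[job_id] else 0
        lines.append(
            f'nomad_periodic_job_enabled{{job_id="{escape_label(job_id)}"}} {value}'
        )
    return "\n".join(lines) + "\n"


def make_handler(collect):
    """Build a request handler that calls collect() on every scrape."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(collect()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(collect, port=9199):
    """Serve metrics until interrupted; the port is an arbitrary example."""
    HTTPServer(("", port), make_handler(collect)).serve_forever()
```

An alert could then fire on any series with value 0. Because `collect` runs on every scrape, caching its result for a short period would keep load on the Nomad servers low.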
Hi @jrasell, re 1: the jobs are actually fully in CI/CD, including job deployments. If the flag does get disabled, it will go unnoticed (the job may look to be 'running' at first glance), and for infrequent jobs there won't be children because of GC, so it will be hard to notice. I will likely explore options for custom API monitoring. Thank you for confirming this path.
Another thought is that this is the sort of thing you can monitor with Sentinel, for Nomad Enterprise.
Ok, just to briefly comment on Nomad (and Consul/Vault) Enterprise: for our use case, the cost/benefit ratio was disproportionate. For comparison with other players in the market, that other container orchestration platform does not seem to have issues with this being a specific metric (that does not make them right, nor a standard).
Anyway, the recommended path was clarified, so case closed.
Following issue 19671, I am wondering more about the HashiCorp Nomad guidelines for becoming aware of a disabled periodic job.
As of Nomad 1.8.3, when you have a periodic job that is disabled:
nomad job status
marked as Running
nomad job inspect jobname my-disabled-periodic-job | jq '.Job.Periodic.Enabled'
but that also may not be 'convenient' to monitor on a 24x7 basis and raise alerts
So then a generic question:
(As issue 19671 was closed as 'not worth doing')
(Or is it not cleaner in that case to deprecate the 'enabled' flag from the periodic jobspec, to avoid confusion?)