batch: reduce server overhead when dispatching jobs from CLI #27631

Merged
tgross merged 1 commit into main from NMD396-job-dispatch-cli-overhead
Mar 6, 2026

@tgross tgross commented Mar 5, 2026

When dispatching parameterized jobs or forcing periodic jobs to run via the CLI, we do a prefix lookup as we do with most other commands. But in this case we end up getting a potentially very large set of jobs back from the server, even if we have an exact match for the prefix. This can cause excess CPU/memory load in the RPC and HTTP API handlers as we have to serialize these large sets just to report the error to the user.

Update the CLI so that it uses a go-bexpr filter to filter down to the set of jobs we need for these operations. This requires an update to go-bexpr to support nil checking on pointers in structs. Also add a page size to the list results to reduce the load for all commands that need to do a prefix lookup on jobs.

Ref: https://hashicorp.atlassian.net/browse/NMD-941
Ref: hashicorp/go-bexpr#129
Fixes: #26653

Testing & Reproduction steps

$ nomad job run ./jobs/mini-dispatch.nomad.hcl
Job registration successful

$ nomad job dispatch -detach ex
Dispatched Job ID = example/dispatch-1772727387-5152f356
Evaluation ID     = e89062dc

$ nomad job dispatch -detach example/dispatch
No parameterized job(s) with prefix or ID "example/dispatch" found

$ nomad job run ./jobs/periodic.nomad.hcl
Job registration successful
Approximate next launch time: 2026-03-05T21:30:00-08:00 (13h12m42s from now)

$ nomad job periodic force myperiodic
==> 2026-03-05T11:17:30-05:00: Monitoring evaluation "57e5796f"
    2026-03-05T11:17:30-05:00: Evaluation triggered by job "myperiodic/periodic-1772727450"
    2026-03-05T11:17:31-05:00: Allocation "ebbb6ef4" created: node "fd15cdd9", group "group"
    2026-03-05T11:17:31-05:00: Evaluation status changed: "pending" -> "complete"
==> 2026-03-05T11:17:31-05:00: Evaluation "57e5796f" finished with status "complete"

$ nomad job periodic force example
No periodic job(s) with prefix or ID "example" found

Contributor Checklist

  • Changelog Entry: If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing: Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation: If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad product documentation, which is stored in the
    web-unified-docs repo. Refer to the web-unified-docs contributor guide for docs guidelines.
    Please also consider whether the change requires notes within the upgrade guide.
    If you would like help with the docs, tag the nomad-docs team in this PR.

Reviewer Checklist

  • Backport Labels: Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type: Ensure the correct merge method is selected, which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs: If this is an enterprise-only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

tgross added a commit to hashicorp/go-bexpr that referenced this pull request Mar 5, 2026
While working on an issue to reduce load the Nomad CLI can place on the server,
I discovered that go-bexpr does not handle pointers in structs usefully or even
safely.

Without an option for "is nil", users are likely to try "is empty" on a pointer
object. But an expression like "/TopValue/MaybeNilValue is empty" panics because
the handler for empty only works for collections. Fortunately in Nomad we don't
really trust go-bexpr not to panic and have recover handling, so this returns an
error rather than crashing the control plane.

Add "is nil" and "is not nil" to the grammar. Make "is empty" handle
non-collections safely and do the intuitive thing when given a pointer to a
struct.

Ref: https://hashicorp.atlassian.net/browse/NMD-941
Ref: hashicorp/nomad#26653
Ref: hashicorp/nomad#27631
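
The failure mode in this commit message can be reproduced with the standard `reflect` package alone. The sketch below is not go-bexpr's implementation: `naiveIsEmpty`, `safeIsEmpty`, `evalRecover`, and `paramConfig` are hypothetical names, and treating a nil pointer as "empty" is an assumption about what the intuitive behavior is.

```go
package main

import (
	"fmt"
	"reflect"
)

// paramConfig is a hypothetical stand-in for a struct reached via a pointer.
type paramConfig struct{ Payload string }

// naiveIsEmpty treats every value as a collection. reflect.Value.Len panics
// for kinds such as a pointer to a struct.
func naiveIsEmpty(v any) bool {
	return reflect.ValueOf(v).Len() == 0
}

// safeIsEmpty checks the kind first: collections are empty at length zero,
// nil-able kinds are empty when nil, and scalars are never empty here.
func safeIsEmpty(v any) bool {
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Invalid: // untyped nil
		return true
	case reflect.Array, reflect.Chan, reflect.Map, reflect.Slice, reflect.String:
		return rv.Len() == 0
	case reflect.Pointer, reflect.Interface, reflect.Func:
		return rv.IsNil()
	default:
		return false
	}
}

// evalRecover converts a panic raised during evaluation into an error,
// similar in spirit to the recover handling mentioned above.
func evalRecover(eval func() bool) (result bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("filter evaluation panicked: %v", r)
		}
	}()
	return eval(), nil
}

func main() {
	var cfg *paramConfig // nil pointer, like an unset optional block

	_, err := evalRecover(func() bool { return naiveIsEmpty(cfg) })
	fmt.Println("naive:", err)
	fmt.Println("safe: ", safeIsEmpty(cfg))
}
```

The recover wrapper keeps a bad expression from taking down the caller, but the kind check is what makes the expression usable at all.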
@tgross tgross mentioned this pull request Mar 5, 2026
3 tasks
@tgross tgross force-pushed the NMD396-job-dispatch-cli-overhead branch from 71dc902 to 16045b9 Compare March 5, 2026 21:06
@tgross tgross added the backport/1.11.x backport to 1.11.x release line label Mar 5, 2026
@tgross tgross force-pushed the NMD396-job-dispatch-cli-overhead branch from 16045b9 to 990ff2c Compare March 5, 2026 21:10
@tgross tgross marked this pull request as ready for review March 5, 2026 21:34
@tgross tgross requested review from a team as code owners March 5, 2026 21:34
Contributor

@allisonlarson allisonlarson left a comment

Neat!! I have one clarifying question for my understanding, but otherwise 👍

Comment thread command/job_dispatch.go
return j.ParameterizedJob
})
jobID, namespace, err := c.JobIDByPrefix(client, jobIDPrefix,
`ParentID == "" and ParameterizedJob is not nil`)
Contributor

I poked around and saw that JobListStub.ParameterizedJob is a bool; are there cases where the value would be false instead of nil? Or does it always get serialized as nil if the actual ParameterizedJobConfig isn't present on the Job?

Member Author

I got caught by this too when I was working on it, because it's a little surprising: the filter expression is not on the stub but on the full object. Ref https://developer.hashicorp.com/nomad/api-docs#list-stubs

"Some list endpoints return a reduced version of the resource being queried. This smaller version is called a stub and may have different fields than the full resource definition. To allow more expressive filtering operations, the filter is applied to the full version, not the stub."

This is a particularly weird case because we have a field with the same name that's of a different type!
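
To make the stub-versus-full-object distinction concrete, here is a minimal sketch with simplified stand-in types. The field names follow the discussion above, but the struct definitions, `toStub`, `dispatchable`, and `listJobs` are hypothetical, and deriving the stub's boolean from the pointer being non-nil is an assumption.

```go
package main

import "fmt"

// ParameterizedJobConfig is a simplified stand-in for the real config block.
type ParameterizedJobConfig struct{ Payload string }

// Job is the full object: ParameterizedJob is a pointer and may be nil.
type Job struct {
	ID               string
	ParentID         string
	ParameterizedJob *ParameterizedJobConfig
}

// JobListStub is the reduced list form: same field name, but a bool.
type JobListStub struct {
	ID               string
	ParentID         string
	ParameterizedJob bool
}

// toStub reduces a full job to its stub (assumed mapping for this sketch).
func toStub(j Job) JobListStub {
	return JobListStub{ID: j.ID, ParentID: j.ParentID, ParameterizedJob: j.ParameterizedJob != nil}
}

// dispatchable is a Go rendering of the filter expression
// `ParentID == "" and ParameterizedJob is not nil`.
func dispatchable(j Job) bool {
	return j.ParentID == "" && j.ParameterizedJob != nil
}

// listJobs applies the filter to full jobs and only then reduces to stubs.
func listJobs(jobs []Job, filter func(Job) bool) []JobListStub {
	var out []JobListStub
	for _, j := range jobs {
		if filter(j) {
			out = append(out, toStub(j))
		}
	}
	return out
}

func main() {
	jobs := []Job{
		{ID: "example", ParameterizedJob: &ParameterizedJobConfig{}},
		{ID: "web"},
	}
	for _, s := range listJobs(jobs, dispatchable) {
		fmt.Printf("%s parameterized=%t\n", s.ID, s.ParameterizedJob)
	}
}
```

Because the predicate runs before the reduction, an `is not nil` check on the pointer is meaningful even though the caller only ever sees the boolean.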

Contributor

Oh wow! There it is 😅 Yeah, just poking through the code didn't make that clear at all. Thanks!! This makes a lot more sense, and makes the filtering way more powerful here than I thought

@tgross tgross merged commit b7fb2a0 into main Mar 6, 2026
37 checks passed
@tgross tgross deleted the NMD396-job-dispatch-cli-overhead branch March 6, 2026 13:21
tgross added a commit to hashicorp/web-unified-docs that referenced this pull request Mar 11, 2026
We've added "is nil" and "is not nil" to the `bexpr` filter expression language,
and shipped this in Nomad 1.11.3.

Ref: hashicorp/nomad#27631
tgross added a commit to hashicorp/web-unified-docs that referenced this pull request Mar 12, 2026
We've added "is nil" and "is not nil" to the `bexpr` filter expression language,
and shipped this in Nomad 1.11.3.

Ref: hashicorp/nomad#27631
Development

Successfully merging this pull request may close these issues.

job dispatch CLI creates high server overhead at high volume
