batch: reduce server overhead when dispatching jobs from CLI#27631
batch: reduce server overhead when dispatching jobs from CLI#27631
Conversation
While working on an issue to reduce load the Nomad CLI can place on the server, I discovered that go-bexpr does not handle pointers in structs usefully or even safely. Without an option for "is nil", users are likely to try "is empty" on a pointer object. But an expression like "/TopValue/MaybeNilValue is empty" panics because the handler for empty only works for collections. Fortunately in Nomad we don't really trust go-bexpr not to panic and have recover handling, so this returns an error rather than crashing the control plane. Add "is nil" and "is not nil" to the grammar. Make "is empty" handle non-collections safely and do the intuitive thing when given a pointer to a struct. Ref: https://hashicorp.atlassian.net/browse/NMD-941 Ref: hashicorp/nomad#26653 Ref: hashicorp/nomad#27631
While working on an issue to reduce load the Nomad CLI can place on the server, I discovered that go-bexpr does not handle pointers in structs usefully or even safely. Without an option for "is nil", users are likely to try "is empty" on a pointer object. But an expression like "/TopValue/MaybeNilValue is empty" panics because the handler for empty only works for collections. Fortunately in Nomad we don't really trust go-bexpr not to panic and have recover handling, so this returns an error rather than crashing the control plane. Add "is nil" and "is not nil" to the grammar. Make "is empty" handle non-collections safely and do the intuitive thing when given a pointer to a struct. Ref: https://hashicorp.atlassian.net/browse/NMD-941 Ref: hashicorp/nomad#26653 Ref: hashicorp/nomad#27631
71dc902 to
16045b9
Compare
When dispatching parameterized jobs or forcing periodic jobs to run via the CLI, we do a prefix lookup as we do with most other commands. But in this case we end up getting a potentially very large set of jobs back from the server, even if we have an exact match for the prefix. This can cause excess CPU/memory load in the RPC and HTTP API handlers as we have to serialize these large sets just to report the error to the user. Update the CLI so that it uses a go-bexpr filter to filter down to the set of jobs we need for these operations. This requires an update to go-bexpr to support nil checking on pointers in structs. Also add a page size to the list results to reduce the load for all commands that need to do a prefix lookup on jobs. Ref: https://hashicorp.atlassian.net/browse/NMD-941 Fixes: #26653
16045b9 to
990ff2c
Compare
allisonlarson
left a comment
There was a problem hiding this comment.
Neat!! I have one clarifying question for my understanding, but otherwise 👍
| return j.ParameterizedJob | ||
| }) | ||
| jobID, namespace, err := c.JobIDByPrefix(client, jobIDPrefix, | ||
| `ParentID == "" and ParameterizedJob is not nil`) |
There was a problem hiding this comment.
I poked around and saw that JobListStub.ParameterizedJob is a bool; are there cases where the value would be false instead of nil? Or does it always get serialized as nil if the actual ParameterizedJobConfig isn't present on the Job?
There was a problem hiding this comment.
I got caught by this too when I was working on it, because it's a little surprising: the filter expression is not on the stub but on the full object. Ref https://developer.hashicorp.com/nomad/api-docs#list-stubs
Some list endpoints return a reduced version of the resource being queried. This smaller version is called a stub and may have different fields than the full resource definition. To allow more expressive filtering operations, the filter is applied to the full version, not the stub.
This is a particularly weird case because we have a field with the same name that's of a different type!
There was a problem hiding this comment.
Oh wow! There it is 😅 Yeah, just poking through the code didn't make that clear at all. Thanks!! This makes a lot more sense, and makes the filtering way more powerful here than I thought
We've added "is nil" and "is not nil" to the `bexpr` filter expression language, and shipped this in Nomad 1.11.3. Ref: hashicorp/nomad#27631
We've added "is nil" and "is not nil" to the `bexpr` filter expression language, and shipped this in Nomad 1.11.3. Ref: hashicorp/nomad#27631
When dispatching parameterized jobs or forcing periodic jobs to run via the CLI, we do a prefix lookup as we do with most other commands. But in this case we end up getting a potentially very large set of jobs back from the server, even if we have an exact match for the prefix. This can cause excess CPU/memory load in the RPC and HTTP API handlers as we have to serialize these large sets just to report the error to the user.
Update the CLI so that it uses a go-bexpr filter to filter down to the set of jobs we need for these operations. This requires an update to go-bexpr to support nil checking on pointers in structs. Also add a page size to the list results to reduce the load for all commands that need to do a prefix lookup on jobs.
Ref: https://hashicorp.atlassian.net/browse/NMD-941
Ref: hashicorp/go-bexpr#129
Fixes: #26653
Testing & Reproduction steps
Contributor Checklist
changelog entry using the
make clcommand.ensure regressions will be caught.
and job configuration, please update the Nomad product documentation, which is stored in the
web-unified-docsrepo. Refer to theweb-unified-docscontributor guide for docs guidelines.Please also consider whether the change requires notes within the upgrade
guide. If you would like help with the docs, tag the
nomad-docsteam in this PR.Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.