
sam-maloney
Contributor

@sam-maloney sam-maloney commented Oct 8, 2025

This PR adds more thorough checks for min/max/operand/operator combinations for resource count complex ranges, accompanied by new/modified "invalid" jobspec examples to match the validation and increase coverage. The penultimate commit fixes some "typos" in error messages, and the final commit simplifies the validation of required top-level keys using positional arguments in the class constructors (feel free to revert this if you disagree, it just felt much cleaner when I was examining how everything was implemented).

One point that merits attention is that I have made the max key always optional in the complex range. This disagrees with the current RFC14 (and my earlier change in flux-framework/rfc#444), but after implementing the count infrastructure and the range string spec in RFC45 it seems clear to me that the max key should always be optional to indicate an unbounded/infinite/give_me_everything_you_got request, without the user or CLI tools otherwise having to insert an arbitrarily large number whenever a non-default operand/operator combination is used. If this seems sensible to others I am of course happy to tweak RFC14 to match (and I suppose also update my changes from flux-framework/flux-sched#1341).

@grondo
Contributor

grondo commented Oct 8, 2025

Thanks @sam-maloney! These changes look great! An optional max parameter for complex ranges makes a lot of sense to me. However, I would want @trws or @milroy to comment about Fluxion behavior before merging.

How do you envision this looking on the command line, i.e. the difference between specifying an exact count vs a minimum with unlimited maximum? (Sorry if you've already explained this elsewhere, I probably lost track of any discussion here.)

@sam-maloney
Contributor Author

sam-maloney commented Oct 9, 2025

How do you envision this looking on the command line, i.e. the difference between specifying an exact count vs a minimum with unlimited maximum? (Sorry if you've already explained this elsewhere, I probably lost track of any discussion here.)

@grondo My initial thought would be simply allowing RFC45 strings to be used for current CLI options, e.g.

flux run -n 1+ hostname

or

flux run -n 2+:2:* hostname

where in the second example it's implicit in RFC45 that the max is optional even when non-default operator/operand combinations are used, so that a user never needs to write something like flux run -N 2-RANGE_MAX:2:* hostname which would require knowledge (or guesswork) as to what RANGE_MAX should be, or what the full system size is.

Eventually a CLI option/plugin to allow use of RFC46 strings would also be nice for testing or power users, something like

flux run --resources="slot=2+:2:*/core" hostname
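
For illustration only, here is a minimal Python sketch of how count strings like the ones above might map onto the min/max/operand/operator keys of a complex range. The grammar is an assumption inferred solely from the examples in this comment (`MIN`, `MIN+`, `MIN-MAX`, optionally followed by `:OPERAND` and `:OPERATOR`); it is not taken from RFC45 itself, and `parse_count` is a hypothetical helper, not part of flux-core.

```python
import re

# Assumed grammar, inferred only from the examples in this thread:
#   MIN                      exact count (returned as a plain int)
#   MIN+                     MIN or more, with no upper bound
#   MIN-MAX                  bounded range
# A range may be followed by ":OPERAND" and then ":OPERATOR".
_COUNT_RE = re.compile(
    r"(?P<min>[0-9]+)"
    r"(?:(?:(?P<plus>\+)|-(?P<max>[0-9]+))"
    r"(?::(?P<operand>[0-9]+)(?::(?P<operator>[+*^]))?)?)?"
)


def parse_count(text):
    """Hypothetical parser: turn a count string into an int or a range dict."""
    m = _COUNT_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognized count string: {text!r}")
    if m["plus"] is None and m["max"] is None:
        return int(m["min"])  # plain integer means an exact count
    count = {"min": int(m["min"])}
    if m["max"] is not None:
        count["max"] = int(m["max"])
    # With "MIN+" the "max" key is simply omitted: the request is unbounded.
    if m["operand"] is not None:
        count["operand"] = int(m["operand"])
    if m["operator"] is not None:
        count["operator"] = m["operator"]
    return count
```

Under these assumptions, `parse_count("2+:2:*")` yields a range with `min`, `operand` and `operator` but no `max`, which is the case this PR makes valid.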

@sam-maloney
Contributor Author

However, I would want @trws or @milroy to comment about Fluxion behavior before merging.

FWIW, Fluxion originally used defaults for all three of max/operand/operator and I was the one who added the constraints to match the flux-core validation. So this would simply be a partial revert to separate the requirement for the max key from the other two, while keeping the improved validation of operand/operator combinations. Still good to have their input, but not really anything new, just me understanding better what actually makes sense, haha!

@trws
Member

trws commented Oct 9, 2025

That sounds right. The fluxion code that reads from these always uses the max currently though, and somewhat relies on there being an upper bound. We can certainly make the max optional, but the effect will be that the min will become the max until we have time to work through how to handle unbounded requests.

grondo
grondo previously approved these changes Oct 9, 2025
Contributor

@grondo grondo left a comment

Looks good to me on the flux-core side! Thanks @sam-maloney and @trws!

@grondo
Contributor

grondo commented Oct 13, 2025

Setting MWP here. Thanks again @sam-maloney

Contributor

mergify bot commented Oct 13, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks.

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@mergify mergify bot removed the queued label Oct 13, 2025
Problem: complex ranges validation is not as complete as it should be

Add more thorough checks of min/max/operand/operator combinations
Problem: "max" key should always be optional in complex ranges

Remove the previously "invalid" jobspec from the testsuite related to
"max" key being missing
Problem: several tests in validator for invalid jobspec are not covered

Add invalid inputs for the following:
- invalid operator in resource count complex range
- zero value for min in resource count complex range
- type not int for min in resource count complex range
- max < min in resource count complex range
- min = 1 with "^" operator in resource count complex range
- operand = 1 with "*" operator in resource count complex range
- operand = 0 in resource count complex range
- operator not a single char in resource count complex range
- resource count negative for resource not in ["node", "slot", "core"]
- resource type not a string
- version = 0
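
The range-related cases in the list above can be sketched as a small standalone check. This is a hedged illustration, not the code in `Jobspec.py`: the function name `validate_count_range`, the choice of exception types, the defaults (`operator="+"`, `operand=1`), and the treatment of negative values as invalid are all assumptions made for the example.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def validate_count_range(count):
    """Hypothetical validator for a resource count complex range (a dict)."""
    if "min" not in count:
        raise ValueError("count range must contain 'min'")
    minimum = count["min"]
    if not _is_int(minimum):
        raise TypeError("min must be an int")
    if minimum < 1:
        raise ValueError("min must be > 0")

    # "max" is optional; a missing key means an unbounded request
    maximum = count.get("max")
    if maximum is not None:
        if not _is_int(maximum):
            raise TypeError("max must be an int")
        if maximum < minimum:
            raise ValueError("max must be >= min")

    operator = count.get("operator", "+")  # assumed default
    operand = count.get("operand", 1)      # assumed default
    if not isinstance(operator, str) or len(operator) != 1:
        raise TypeError("operator must be a single character")
    if operator not in "+*^":
        raise ValueError("operator must be one of '+', '*', '^'")
    if not _is_int(operand):
        raise TypeError("operand must be an int")
    if operand < 1:
        raise ValueError("operand must be > 0")

    # Combinations that could never advance past min
    if operator == "*" and operand == 1:
        raise ValueError("operand must be > 1 with the '*' operator")
    if operator == "^" and minimum == 1:
        raise ValueError("min must be > 1 with the '^' operator")
```

In this sketch a range such as `{"min": 2, "operand": 2, "operator": "*"}` passes without a `max` key, while each of the range cases listed in the commit message raises.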
Problem: t/jobspec/invalid/resource_slot_with_zero.yaml is a duplicate
of t/jobspec/invalid/resource_count_is_zero.yaml

Remove the duplicate jobspec
Problem: message for wrong 'attributes' type incorrectly says 'count'
and message for resource counts being > 0 omits 'core'

Change 'count' to 'attributes' and add 'core' to list of resource types
in respective error messages
Problem: validation of required keys is inconsistent; "resources" and
"tasks" are required positional arguments to the constructors, while
"attributes" and "version" are kwargs, which require additional checks.

Make "attributes" and "version" also positional arguments, with a
default value for "version" in the JobspecV1 constructor so it remains
optional there. This greatly simplifies the validation of these
mandatory top-level keys, and improves consistency.
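
As a rough illustration of the idea in this last commit message, the sketch below shows how making every required key a positional constructor argument lets Python itself report a missing key. The classes `Spec` and `SpecV1` and the helper `from_dict` are hypothetical stand-ins, not the actual flux-core classes or their signatures.

```python
class Spec:
    """Hypothetical base class: all four top-level keys are required."""

    def __init__(self, resources, tasks, attributes, version):
        self.resources = resources
        self.tasks = tasks
        self.attributes = attributes
        self.version = version


class SpecV1(Spec):
    """Hypothetical V1 class: 'version' has a default, so it stays optional."""

    def __init__(self, resources, tasks, attributes, version=1):
        super().__init__(resources, tasks, attributes, version)


def from_dict(cls, data):
    """Build a spec from a dict of top-level keys.

    Unpacking with ** means a missing required key surfaces as a TypeError
    from the constructor call, so no hand-written "is this key present"
    check is needed for each key.
    """
    try:
        return cls(**data)
    except TypeError as exc:
        raise ValueError(f"invalid jobspec: {exc}") from exc
```

With this layout, `from_dict(SpecV1, {"resources": [], "tasks": []})` fails with a message naming the missing `attributes` argument, while omitting `version` is accepted for `SpecV1` only.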
@grondo grondo force-pushed the cleanup-validator branch from 57ef97e to 6b136a9 Compare October 14, 2025 16:52
@mergify mergify bot dismissed grondo’s stale review October 14, 2025 16:52

Approving reviews have been dismissed because this pull request
was updated.

@grondo
Contributor

grondo commented Oct 14, 2025

@Mergifyio requeue

Contributor

mergify bot commented Oct 14, 2025

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

@mergify mergify bot added the queued label Oct 14, 2025
@mergify mergify bot merged commit ec4cdba into flux-framework:master Oct 14, 2025
35 checks passed
@mergify mergify bot removed the queued label Oct 14, 2025

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.86%. Comparing base (7a81d53) to head (6b136a9).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
src/bindings/python/flux/job/Jobspec.py 96.77% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7130   +/-   ##
=======================================
  Coverage   83.85%   83.86%           
=======================================
  Files         551      551           
  Lines       93303    93304    +1     
=======================================
+ Hits        78241    78250    +9     
+ Misses      15062    15054    -8     
Files with missing lines Coverage Δ
src/bindings/python/flux/job/Jobspec.py 92.95% <96.77%> (+1.18%) ⬆️

... and 10 files with indirect coverage changes


@sam-maloney sam-maloney deleted the cleanup-validator branch October 15, 2025 05:08