Apparently simple resource request not possible in canonical jobspec? #150

@SteVwonder

@grondo's original suggestion for a feature to flux-jobspec:

... isn't the key difference between flux jobspec srun and sbatch that sbatch should insert a flux start or flux broker before the provided COMMAND (which is assumed to be a batch script), and ensure that the tasks run once per node? (or something similar)

@SteVwonder's reply:

... To make that last piece about one per node work, I think --nodes would have to be a required argument. If it is optional and the user only specifies --cores, then the tool will have to make a determination/assumption about the number of cores per node... Actually, now that I think about this, is it even possible to specify that kind of request in the canonical jobspec? The request I'm thinking of is along the lines of: "I want N cores. I don't care about the number of nodes, but I want 1 process per node."
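To make the problem concrete, here is a minimal Python sketch of what a tool would have to do today. It assumes the RFC 14 style of vertices (`type`, `count`, `with`, `label`) and tasks (`slot`, `count.per_slot`); the dict is simplified and not validated against the schema, and the helper names `node_slot_jobspec` and `from_total_cores` are hypothetical, not part of flux-jobspec.

```python
def node_slot_jobspec(command, nnodes, cores_per_node):
    """Build a simplified jobspec-like dict: one slot per node, one task per slot."""
    return {
        "version": 1,
        "resources": [
            {
                "type": "slot",
                "count": nnodes,  # the slot shape forces an exact node count
                "label": "task",
                "with": [
                    {
                        "type": "node",
                        "count": 1,
                        "with": [{"type": "core", "count": cores_per_node}],
                    }
                ],
            }
        ],
        "tasks": [
            {"command": command, "slot": "task", "count": {"per_slot": 1}}
        ],
    }


def from_total_cores(command, total_cores, assumed_cores_per_node):
    """Only --cores was given, so the tool must guess the cores per node."""
    nnodes = -(-total_cores // assumed_cores_per_node)  # ceiling division
    return node_slot_jobspec(command, nnodes, assumed_cores_per_node)
```

With this approach, "N cores, any number of nodes" is never actually expressed: a request for 16 cores with an assumed 5 cores per node becomes 4 nodes of 5 cores each, which is both a fixed node count and 20 cores rather than 16.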

@grondo's reply:

It does feel like we got ourselves into a bind by requiring that the slot shape be defined as part of the resource request. There are certain things that are now not possible or very difficult to express (though it could just be my misunderstanding of the current jobspec definition). In general, I don't see how you can define a slot at a parent of the resource type you want to request without affecting the exact count (and exclusivity) of the parent resource type, if that makes any sense. The specific example is what we're discussing here, where you want N cores with a slot of a node, but don't care how many nodes. In fact, the exact slot shape can only be determined from a concrete resource set.

A couple ideas on how to handle this come to mind:

  • add a concept of predefined slots that match resource names, e.g. node, socket, core, which would allow the slot to be left out of the jobspec resources section. For example, in the case of flux jobspec sbatch you could emit a resources section with no slot and a task: whose slot: is the label node.
  • add a special value for count:, like any or 0, that indicates no preference for the number of this type. Any slot that contains an any resource as a direct child would by definition apply to exactly 1 of that resource.
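As a rough illustration only, the two ideas above might produce documents shaped like the following Python dicts. Neither shape is valid in the current canonical jobspec; the function names are hypothetical, and under the second idea the comment does not say where the total core count would live, so treating the core count as a job-wide total is an assumption of this sketch.

```python
def predefined_slot_jobspec(command, total_cores):
    """Idea 1 (hypothetical): no slot vertex; the task names a predefined 'node' slot."""
    return {
        "resources": [{"type": "core", "count": total_cores}],
        "tasks": [
            {"command": command, "slot": "node", "count": {"per_slot": 1}}
        ],
    }


def any_count_jobspec(command, total_cores):
    """Idea 2 (hypothetical): a node vertex with count 'any' directly under the slot."""
    return {
        "resources": [
            {
                "type": "slot",
                "label": "task",
                "with": [
                    {
                        "type": "node",
                        "count": "any",  # no preference for the number of nodes
                        # Assumption: interpreted as a total across all nodes.
                        "with": [{"type": "core", "count": total_cores}],
                    }
                ],
            }
        ],
        "tasks": [
            {"command": command, "slot": "task", "count": {"per_slot": 1}}
        ],
    }
```

In both shapes the request carries only the core total and the "one task per node" intent, leaving the number of nodes to be decided when a concrete resource set is matched.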
