Make Gang Jobs Work in The simulator #4024

Merged: 12 commits, Oct 25, 2024

@d80tb7 d80tb7 commented Oct 24, 2024

  • You can now define a GangCardinality label on a JobTemplate. If it is specified, the jobs are divided into gangs with the given cardinality.
  • Nodes get a label of armadaproject.io/clusterName so that the restriction of one gang per cluster works.
  • Gang accounting maps are now populated so that gang preemption works as expected.
  • There are a couple of unit tests to assert the above.
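The first two bullets can be illustrated with a minimal sketch. This is not the code from this PR: `SimJob`, `assign_gangs` and `in_single_cluster` are hypothetical names, and the behaviour when the job count is not a multiple of the cardinality (raise an error) is an assumption. Only the label key `armadaproject.io/clusterName` comes from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Label key taken from the PR description; everything else here is illustrative.
CLUSTER_LABEL = "armadaproject.io/clusterName"


@dataclass
class SimJob:
    """Hypothetical stand-in for a simulated job; not an Armada type."""
    job_id: str
    gang_id: Optional[str] = None
    gang_cardinality: int = 1


def assign_gangs(job_ids: List[str], cardinality: int, prefix: str = "gang") -> List[SimJob]:
    """Group consecutive jobs into gangs of `cardinality` members.

    Assumption: a job count that is not a multiple of the cardinality is
    rejected, because a partial gang could never be scheduled.
    """
    if cardinality <= 1:
        # No gang requested: every job stands alone.
        return [SimJob(job_id=j) for j in job_ids]
    if len(job_ids) % cardinality != 0:
        raise ValueError("number of jobs must be a multiple of the gang cardinality")
    jobs = []
    for i, job_id in enumerate(job_ids):
        gang_index = i // cardinality  # jobs 0..k-1 -> gang 0, k..2k-1 -> gang 1, ...
        jobs.append(SimJob(job_id, f"{prefix}-{gang_index}", cardinality))
    return jobs


def in_single_cluster(node_labels: List[Dict[str, str]]) -> bool:
    """True if every node carries the same value for the cluster label."""
    clusters = {labels.get(CLUSTER_LABEL) for labels in node_labels}
    return len(clusters) == 1 and None not in clusters
```

For example, `assign_gangs(["a", "b", "c", "d"], 2)` puts `a` and `b` in `gang-0` and `c` and `d` in `gang-1`. A placement of one gang would pass `in_single_cluster` only if all chosen nodes share one `armadaproject.io/clusterName` value.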

Outstanding issues are:

  • Gang job durations are not correlated, i.e. each job's duration is drawn independently from the workflow's shifted exponential distribution. I think the simplest thing to do is to have all jobs in a given gang finish at the same time. A slightly better approach would be to have them all finish within a short time of one another.
  • Resubmission after preemption will not work properly if some jobs in the gang have finished but others haven't. That's because the resubmission code only resubmits the preempted jobs, so a partial gang is resubmitted that can never be scheduled. The solution is to make the resubmission code resubmit the whole gang, but I want to first break that logic out into a simple workflow manager struct, so that workflow management is separated from the main simulator.
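The two proposed fixes could look roughly like the sketch below. It is an illustration under stated assumptions, not the simulator's code: the function names, the `gang_of` mapping (job id to gang id) and the `minimum` / `tail_mean` parameters of the shifted exponential are all hypothetical. It also assumes that jobs in a gang start together, so equal durations mean equal finish times.

```python
import random
from typing import Dict, Iterable, List


def shifted_exponential(rng: random.Random, minimum: float, tail_mean: float) -> float:
    """Sample minimum + Exp(mean=tail_mean); parameter names are assumptions."""
    if tail_mean <= 0:
        return minimum
    # expovariate takes the rate, i.e. 1 / mean.
    return minimum + rng.expovariate(1.0 / tail_mean)


def gang_durations(
    gang_of: Dict[str, str], rng: random.Random, minimum: float, tail_mean: float
) -> Dict[str, float]:
    """Draw one duration per gang and share it with every member of that gang."""
    per_gang: Dict[str, float] = {}
    durations: Dict[str, float] = {}
    for job_id, gang_id in gang_of.items():
        if gang_id not in per_gang:
            per_gang[gang_id] = shifted_exponential(rng, minimum, tail_mean)
        durations[job_id] = per_gang[gang_id]
    return durations


def jobs_to_resubmit(gang_of: Dict[str, str], preempted: Iterable[str]) -> List[str]:
    """Expand the preempted jobs to every member of their gangs.

    Members that already finished are included too, so a complete gang is
    resubmitted rather than a partial one that can never be scheduled.
    """
    hit_gangs = {gang_of[job_id] for job_id in preempted}
    return [job_id for job_id, gang_id in gang_of.items() if gang_id in hit_gangs]
```

With `gang_of = {"a": "g0", "b": "g0", "c": "g1"}`, preempting only `a` yields `["a", "b"]` from `jobs_to_resubmit`, even if `b` had already finished. The "finish within a short time" variant would add a small per-job jitter to the shared gang duration.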

d80tb7 and others added 12 commits October 23, 2024 08:25
Signed-off-by: Chris Martin <[email protected]>
@d80tb7 d80tb7 changed the title [WIP] Make Gang Jobs Work in The simulator Make Gang Jobs Work in The simulator Oct 25, 2024
@d80tb7 d80tb7 enabled auto-merge (squash) October 25, 2024 09:31
@d80tb7 d80tb7 merged commit 95fa3b2 into master Oct 25, 2024
20 checks passed
@d80tb7 d80tb7 deleted the f/chrisma/simulate-gangs branch October 25, 2024 09:41