Closed
Labels: kind/feature, needs-triage, sig/apps, wg/batch
Description
What would you like to be added?
A mode of operation for Jobs with .spec.completionMode="Indexed" that allows every index to execute.
Currently this is not possible: when a Job reaches its .spec.backoffLimit, its active pods are deleted and the Job is declared failed, so no new pods are created for the indexes that have not executed yet (this happens more often when parallelism < completions).
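To make the current behavior concrete, here is a deliberately simplified Go model, not the actual job controller. It assumes a Job-wide failure counter that fails the Job once it exceeds backoffLimit, runs indexes in waves of at most parallelism pods, and retries a failed index before starting new ones; pod backoff delays are ignored. The function simulateJobLevelBackoff and its alwaysFails input are hypothetical names for illustration.

```go
package main

import "fmt"

// simulateJobLevelBackoff is a hypothetical toy model of an Indexed Job whose
// backoffLimit counts failures across the whole Job (not per index).
// Assumptions: pods run in waves of at most `parallelism`, a failed index is
// retried before new indexes start, and real pod backoff delays are ignored.
func simulateJobLevelBackoff(completions, parallelism, backoffLimit int, alwaysFails map[int]bool) (succeeded, neverStarted []int, jobFailed bool) {
	pending := make([]int, 0, completions)
	for i := 0; i < completions; i++ {
		pending = append(pending, i)
	}
	started := make(map[int]bool)
	failures := 0 // Job-wide failure counter shared by all indexes

	for len(pending) > 0 {
		n := parallelism
		if n > len(pending) {
			n = len(pending)
		}
		wave := pending[:n]
		rest := append([]int(nil), pending[n:]...)

		var retry []int
		for _, idx := range wave {
			started[idx] = true
			if alwaysFails[idx] {
				failures++
				retry = append(retry, idx)
			} else {
				succeeded = append(succeeded, idx)
			}
		}
		// Once the shared counter exceeds backoffLimit the whole Job fails,
		// and no pods are created for indexes still waiting in the queue.
		if failures > backoffLimit {
			jobFailed = true
			break
		}
		pending = append(retry, rest...)
	}

	for i := 0; i < completions; i++ {
		if !started[i] {
			neverStarted = append(neverStarted, i)
		}
	}
	return succeeded, neverStarted, jobFailed
}

func main() {
	succeeded, neverStarted, jobFailed := simulateJobLevelBackoff(6, 2, 1, map[int]bool{0: true})
	fmt.Println(succeeded, neverStarted, jobFailed) // [1 2] [3 4 5] true
}
```

In this model, with completions=6, parallelism=2 and backoffLimit=1, a single always-failing index 0 stops the Job after the second wave, so indexes 3, 4 and 5 never start even though they are independent of index 0.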
I can think of two open issues:
- How to decide when to stop retrying a failed index. One approach is to apply backoffLimit at the index level. Tracking retries per index in the Job status would be challenging, but one solution is to give backoffLimit "min" semantics: each index is guaranteed at least backoffLimit retries, the per-index retry counts are tracked in the job-controller's memory, and the status only records, as a bitmap, which indexes reached the limit.
- Job failure status: in the simplest case, the Job could be declared failed if at least one index failed, but we could also introduce an API that lets users tune this (perhaps based on a percentage or a minimum number of failed indexes).
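Both open questions can be sketched together in Go. This is a minimal illustration under stated assumptions, not the job controller's code: the type indexTracker, its methods and the maxFailedIndexes policy parameter are hypothetical, a math/big integer stands in for the status bitmap, and "min semantics" is read as "counts live only in memory, so a controller restart can add retries but never remove them".

```go
package main

import (
	"fmt"
	"math/big"
)

// indexTracker is a hypothetical sketch of the per-index bookkeeping proposed
// above. Retry counts live only in controller memory; the bitmap of indexes
// that exhausted their retries is the part that would go into the Job status.
type indexTracker struct {
	backoffLimit  int         // retries guaranteed to every index
	failures      map[int]int // in-memory only; lost if the controller restarts
	failedIndexes *big.Int    // bit i set => index i reached its limit
}

func newIndexTracker(backoffLimit int) *indexTracker {
	return &indexTracker{
		backoffLimit:  backoffLimit,
		failures:      make(map[int]int),
		failedIndexes: new(big.Int),
	}
}

// recordFailure registers a failed pod for index and reports whether the
// index should be retried. Because counts are not persisted, a controller
// restart resets them, so an index gets at least backoffLimit retries.
func (t *indexTracker) recordFailure(index int) bool {
	if t.failedIndexes.Bit(index) == 1 {
		return false // already marked as failed; never retried again
	}
	t.failures[index]++
	if t.failures[index] > t.backoffLimit {
		t.failedIndexes.SetBit(t.failedIndexes, index, 1)
		return false
	}
	return true
}

// failedList decodes the bitmap into the sorted list of failed indexes.
func (t *indexTracker) failedList() []int {
	var out []int
	for i := 0; i < t.failedIndexes.BitLen(); i++ {
		if t.failedIndexes.Bit(i) == 1 {
			out = append(out, i)
		}
	}
	return out
}

// jobFailed applies a hypothetical failure policy: the Job fails once at least
// maxFailedIndexes indexes have failed. maxFailedIndexes == 1 is the simplest
// "fail if any index failed" rule.
func (t *indexTracker) jobFailed(maxFailedIndexes int) bool {
	return len(t.failedList()) >= maxFailedIndexes
}

func main() {
	t := newIndexTracker(1)
	fmt.Println(t.recordFailure(3)) // true: first failure, retry allowed
	fmt.Println(t.recordFailure(3)) // false: the single retry also failed
	fmt.Println(t.recordFailure(5)) // true
	fmt.Println(t.failedList(), t.jobFailed(1), t.jobFailed(2)) // [3] true false
}
```

A percentage-based policy would compare len(failedList()) against a fraction of .spec.completions instead of a fixed count; the bitmap stays compact either way because it stores one bit per index rather than a retry counter.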
Related comment to this issue: #109131 (comment)
Why is this needed?
There are cases where the indexes represent independent operations, so it is desirable to execute all of them before declaring the Job complete.