
Use pod group instead of PDB for gang scheduling #916

Closed
zionwu opened this issue Jan 23, 2019 · 33 comments · Fixed by #954

Comments

@zionwu
Contributor

zionwu commented Jan 23, 2019

The latest version of kube-batch uses a PodGroup instead of a PDB.
With a PodGroup, users can specify the queue for the job, even when jobs are in different namespaces. With a PDB, we can't specify the queue; it defaults to the job's namespace.

PDB is still supported in kube-batch for backward compatibility; however, it will be removed once v1alpha1 is finalized. I think tf-operator should support PodGroup for more powerful scheduling.
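
A minimal Go sketch of the replacement, assuming kube-batch's scheduling/v1alpha1 API; the package and function names here are illustrative, not the actual operator code:

```go
package controller

import (
	"github.com/kubernetes-sigs/kube-batch/pkg/apis/scheduling/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newPodGroup sketches the object the operator would create in place of a
// PDB. minMember is the gang size: the scheduler only binds pods once all
// minMember of them can run. queue may target any kube-batch Queue, which
// the PDB-based approach cannot express.
func newPodGroup(name, namespace, queue string, minMember int32) *v1alpha1.PodGroup {
	return &v1alpha1.PodGroup{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
		},
		Spec: v1alpha1.PodGroupSpec{
			MinMember: minMember, // all-or-nothing scheduling threshold
			Queue:     queue,     // empty string falls back to the default queue
		},
	}
}
```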

@k82cn
Collaborator

k82cn commented Jan 23, 2019

kubernetes/enhancements#639 was merged; I think it's time to replace PDB with PodGroup. In addition, it will also support phase/condition/status for PodGroup, e.g. unschedulable, which is helpful for the lifecycle management of a tfjob.
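
For illustration, a sketch of how the operator could consume that status for tfjob lifecycle management; the condition type and constant names are assumptions based on the kube-batch v1alpha1 types of that era:

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"

	"github.com/kubernetes-sigs/kube-batch/pkg/apis/scheduling/v1alpha1"
)

// isUnschedulable checks the PodGroup's conditions so the operator can
// surface an event or update the TFJob status when the gang cannot be
// placed, instead of leaving pods silently pending.
func isUnschedulable(pg *v1alpha1.PodGroup) bool {
	for _, c := range pg.Status.Conditions {
		if c.Type == v1alpha1.PodGroupUnschedulableType && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}
```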

@k82cn
Collaborator

k82cn commented Jan 24, 2019

If there's no objection, I'd like to create a PR for that :)

@zionwu
Contributor Author

zionwu commented Jan 25, 2019

@k82cn I know kube-batch uses the namespace as the queue by default, but it also supports creating a Queue CRD and specifying the queue in the PodGroup. It would be great if I could specify the queue for my TFJob; to do that we can add a field called queue to the TFJobSpec. What do you think?
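
A sketch of the proposed field; the placement, doc comment, and JSON tag are illustrative, and the package name stands in for whichever API version would carry it:

```go
package v1beta2

// TFJobSpec is abridged; existing fields such as TFReplicaSpecs are elided.
type TFJobSpec struct {
	// ... existing fields ...

	// Queue names the kube-batch queue this job is submitted to via its
	// generated PodGroup. If empty, the default queue is used.
	Queue string `json:"queue,omitempty"`
}
```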

@k82cn
Collaborator

k82cn commented Jan 25, 2019

add a field called queue to the TFJobSpec

I'm OK with that; for TFJobSpec, it's better to get @jlewi 's feedback :)

@k82cn
Collaborator

k82cn commented Jan 25, 2019

xref kubernetes-retired/kube-batch#465

@adam-marek and I talked about Queue support in tf-operator; it needs the community's feedback on TFJob API changes.

Or maybe we can introduce an annotation as an alpha feature. @jlewi, WDYT?
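
A sketch of the annotation-based variant; the annotation key below is hypothetical, no name was agreed in this thread:

```go
package controller

// queueAnnotation is a hypothetical key for the alpha, annotation-based
// variant of queue selection.
const queueAnnotation = "scheduling.kubeflow.org/queue"

// queueFromAnnotations resolves the target queue without touching the
// TFJob API surface; an unset annotation yields "" (the default queue).
func queueFromAnnotations(annotations map[string]string) string {
	return annotations[queueAnnotation]
}
```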

@jlewi
Contributor

jlewi commented Jan 28, 2019

@richardsliu @johnugeorge WDYT?

@johnugeorge
Member

@zionwu Isn't the queue name part of the PodGroupSpec? https://github.com/kubernetes-sigs/kube-batch/blob/master/pkg/apis/scheduling/v1alpha1/types.go#L116

In such a case, why do we need a separate queue field after we set the PodGroup for the TFJob?

@richardsliu
Contributor

Agree with @johnugeorge on the queue name.

I would like to keep k8s details out of the TFJob spec if possible. Ideally, TFJob should contain just TF-specific fields.

@zionwu
Contributor Author

zionwu commented Jan 29, 2019

@johnugeorge, @richardsliu, just like the PDB, the PodGroup will be created by the TF-Operator. If the user can only specify the queue name in the PodGroup, the process to specify the queue name is:

  1. The user submits a TFJob.
  2. The TF-Operator creates a PodGroup with the default queue name.
  3. The user has to find the PodGroup and modify the queue name.
     However, this may not work, as the job may already be scheduled before step 3.

I think the queue name should be decided at the creation of the PodGroup. Since the PodGroup is created by the TF-Operator, the only way a user can specify the queue name is by setting it on the TFJob spec (see the sketch below); the TF-Operator can then set the queue name on the PodGroup accordingly.
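
A minimal sketch of that timing argument, assuming the proposed TFJob queue field exists; the function name is illustrative:

```go
package controller

import "github.com/kubernetes-sigs/kube-batch/pkg/apis/scheduling/v1alpha1"

// stampQueue fixes the queue at PodGroup creation time (step 2 above),
// copying the user-supplied value from the proposed TFJob field, so the
// user never has to race the scheduler by editing the PodGroup afterwards.
func stampQueue(pg *v1alpha1.PodGroup, userQueue string) {
	if userQueue != "" {
		pg.Spec.Queue = userQueue
	}
}
```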

@gaocegege
Member

Is pod group a standard resource in K8s? I am not sure what will happen if the user does not register the resource.

@k82cn
Collaborator

k82cn commented Jan 29, 2019

Is pod group a standard resource in K8s?

No, not for now; the CRD should be deployed together with kube-batch. If the operator fails to create the PodGroup, it just logs an error message.
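
Sketched in Go, assuming a generated kube-batch clientset; the import path and interface follow standard code-generation conventions and are assumptions, not verified paths:

```go
package controller

import (
	"github.com/kubernetes-sigs/kube-batch/pkg/apis/scheduling/v1alpha1"
	kubebatch "github.com/kubernetes-sigs/kube-batch/pkg/client/clientset/versioned"
	"k8s.io/klog"
)

// createPodGroupBestEffort mirrors the behavior described above: if the
// PodGroup CRD is not registered (kube-batch not deployed), the create
// fails and we only log, rather than failing the whole TFJob sync.
func createPodGroupBestEffort(client kubebatch.Interface, pg *v1alpha1.PodGroup) {
	if _, err := client.SchedulingV1alpha1().PodGroups(pg.Namespace).Create(pg); err != nil {
		klog.Errorf("failed to create PodGroup %s/%s: %v", pg.Namespace, pg.Name, err)
	}
}
```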

@k82cn
Collaborator

k82cn commented Jan 29, 2019

I would like to keep k8s details out of the TFJob spec if possible. Ideally, TFJob should contain just TF-specific fields.

If tf-operator needs resource fair-sharing from the scheduler, Queue is one option; and Queue is a common concept in 'batch' systems, e.g. Job, Queue.
If we have concerns about the API changes, how about annotations as an alpha feature?

@johnugeorge
Member

johnugeorge commented Jan 29, 2019

@zionwu I am a little skeptical about adding non-TF fields to the TFJob API. We would then be forced to change the TFJob API whenever the upstream implementation changes.

As @k82cn suggested, how about considering annotations for now? @gaocegege @richardsliu

@richardsliu
Contributor

Would it be possible to introduce a field in https://github.com/kubeflow/tf-operator/blob/master/pkg/apis/common/v1beta1/common_types.go? For example, something like "SchedulingPolicy". That way all operators could share the same implementation.
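
A hypothetical shape for such a field; the struct and field names are illustrative, not an agreed API:

```go
// Package v1beta1 placement mirrors the common_types.go file linked above.
package v1beta1

// SchedulingPolicy gathers scheduler-facing knobs so every operator can
// share one implementation instead of each growing its own fields.
type SchedulingPolicy struct {
	// MinAvailable is the gang size copied into the PodGroup's MinMember.
	MinAvailable *int32 `json:"minAvailable,omitempty"`
	// Queue routes the job to a kube-batch queue.
	Queue string `json:"queue,omitempty"`
}
```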

@k82cn
Collaborator

k82cn commented Jan 30, 2019

For example something like "SchedulingPolicy".

+1; in the previous version of the gang-scheduling design doc, I proposed introducing SchedulingSpec/PodSchedulingGroup as a bridge between the operator and the scheduler, and providing a default implementation of QueueJob. Currently, PodGroup plays a similar role; it's up to the community to wrap it as a new API and use PodGroup to communicate with the scheduler.

@johnugeorge johnugeorge mentioned this issue Feb 6, 2019
@johnugeorge
Member

/area 0.5.0

@ChanYiLin
Member

@k82cn @richardsliu @gaocegege
I remember that we discussed whether we should add a common schedulerName to the TFJob Spec or not.
In v1alpha2 we decided not to, and users need to add schedulerName to each replica's (pod) spec.

How about in v1beta2: should we also ask users to add schedulerName to each replica's (pod) spec, or change to using an annotation to indicate the scheduler?
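
For context, what the v1alpha2 decision asks of users today, sketched in Go; the helper name is illustrative:

```go
package controller

import corev1 "k8s.io/api/core/v1"

// withKubeBatch sets the scheduler on one replica's pod template; under
// the v1alpha2 decision this must be repeated on every replica spec for
// kube-batch to pick the pods up.
func withKubeBatch(tmpl *corev1.PodTemplateSpec) {
	tmpl.Spec.SchedulerName = "kube-batch"
}
```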

@gaocegege
Member

@ChanYiLin Ref #920

We decided to simplify the process.

@johnugeorge
Member

@k82cn Are you working on this?

@k82cn
Collaborator

k82cn commented Feb 18, 2019

Yes, I'm working on this one :)

@richardsliu
Contributor

@k82cn Any updates on this?

@richardsliu
Contributor

richardsliu commented Mar 4, 2019

@k82cn Is this still on track to be finished in 0.5.0? We are trying to reach code complete by 3/15.

@k82cn
Collaborator

k82cn commented Mar 5, 2019

@richardsliu, sorry for the delayed response. Let me try to submit a PR this week.

@richardsliu
Contributor

@thandayuthapani @k82cn Any more pending work for this item? This was automatically closed by the k8s bot.

@thandayuthapani
Contributor

@richardsliu Yeah, on the v1beta2 side, it uses PodGroup instead of PDB during gang scheduling.

@johnugeorge
Member

@thandayuthapani Are you working on SchedulingPolicy? (From #916 (comment) )

@k82cn
Collaborator

k82cn commented Mar 20, 2019

Any more pending work for this item?

@richardsliu, no more items for the 0.5 release.

Are you working on SchedulingPolicy?

@johnugeorge , we'll do some investigation after 0.5 release.

@richardsliu richardsliu mentioned this issue Mar 27, 2019
@jlewi
Contributor

jlewi commented May 13, 2019

@johnugeorge @k82cn @richardsliu Is there more work to be done before we close out this issue? If there's more work can we clarify what work is needed and if anyone is planning on taking it on?

@richardsliu
Contributor

@k82cn Is there any more we want to do for 0.6?

Note that if the scheduling policy changes need to go in the common API repo, then it can wait until after 0.7.

@gaocegege
Member

Ref kubeflow/common#46

It is proposed in common.

@jtfogarty

/kind feature

@stale

stale bot commented Apr 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot closed this as completed Apr 27, 2020