Request cpu / memory for containers in Kubeflow #3194
@easadler This is already possible, using the example below. Something similar can be done for anything that the Python k8s client can handle.
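The example originally attached to this comment was not captured in this thread. As a hedged reconstruction of the pattern being described (pipeline-level resource settings via TFX's `pipeline_operator_funcs` hook, assuming a TFX 0.x `KubeflowDagRunner` and kfp-v1-style `ContainerOp` objects; the resource values are illustrative):

```python
from tfx.orchestration.kubeflow import kubeflow_dag_runner

def _set_resources(container_op):
    """Apply the same CPU/memory requests and limits to every container op."""
    container_op.container.set_cpu_request('1')
    container_op.container.set_cpu_limit('8')
    container_op.container.set_memory_request('4G')
    container_op.container.set_memory_limit('16G')

# The func runs over every ContainerOp the runner emits, so this sets
# resources pipeline-wide rather than per component.
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=(
        kubeflow_dag_runner.get_default_pipeline_operator_funcs()
        + [_set_resources]
    ),
)
# runner = kubeflow_dag_runner.KubeflowDagRunner(config=runner_config)
# runner.run(pipeline)
```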
Oh that is better than nothing! Thank you for the response.
Any examples on how to set different CPU/memory requests for, say, Trainer / ModelResolver, which have very different requirements, would be great. Currently I set memory/CPU requests for the Trainer, but this slows down some of the lighter components which don't need extra processing. It also means ExampleGen (which may be a long stage that just reads from a DB) over-requests resources.
@vaskozl This isn't possible with current versions; see my previous answer. However, I have used the request and limit params at the pipeline level, and that results in pods using as much resources as they can, but not more than they need. So if you set the CPU request to 1 and the limit to 8, a resolver will use ~1 CPU while Trainer will max out as much as it can. This has worked quite well for me. On a side note, TFX is working towards a new intermediate representation of its pipelines that is more in line with native Kubeflow pipelines, and there you can specify resources at the component level, so hopefully this is coming.
I'd also like this feature. My use case: I'm implementing a data cleaning pipeline to process and label the data before passing it into ExampleGen. I'm currently using custom Python function components for most of the cleaning steps and I'm running into all kinds of memory limitations. I understand this is not what these components are made for. As a workaround I ended up using Kubeflow pipelines for the data cleaning part of my data flow, but it's a pretty young product too and I'm missing the local debugging (with breakpoints) feature.
We are definitely waiting for this kind of feature too!
Is there a way to do something similar with
Hi, I found a (not so nice) workaround for this issue that I thought some of you might be interested in. The trick is to check the name specified for each container. Instead of using only 'trainer' as in the example below, one could use an entire list of component names. This example is for GPUs, but it works the same way for memory requests.
@chris-r-99 - Just seeing your answer. Thanks, we will take a look at it!
@axelborja did you ever find a solution to do this with
@lre not yet, have you? (btw sorry for the super late reply)
Hello, after investigating with @axelborja, he discovered that it works if you manually modify the compiled JSON and edit the spec:
and then give that modified JSON to the Vertex AI pipeline; when started, it will use a correct instance that respects those conditions. Those could have been set here:
The param name can be seen here: https://github.com/kubeflow/pipelines/blob/7d5690a21cf8e8c464a6ddba520879bd30fd2ddc/api/v2alpha1/pipeline_spec.proto#L638. Could someone from Vertex AI confirm this workaround is valid? Thanks. @chongkong, I think you wrote that code; do you think we could allow setting those directly when creating the step? Thanks a lot!
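As a hedged illustration of the JSON workaround described above: after compiling the pipeline, one can patch a `resources` field into an executor's container spec before submitting the file to Vertex AI. The key names below follow the camelCase JSON form of the v2alpha1 PipelineSpec proto linked above, and the exact nesting (e.g. a top-level `pipelineSpec` key inside a PipelineJob) depends on how the file was produced, so verify both against your own compiled JSON:

```python
def set_executor_resources(job, executor_id, cpu_limit, memory_limit_gb):
    """Patch CPU/memory limits into one executor of a compiled pipeline JSON.

    Assumes the layout
    job['pipelineSpec']['deploymentSpec']['executors'][id]['container'],
    which is what a v2-style compiler emits; adjust if yours differs.
    """
    container = (job['pipelineSpec']['deploymentSpec']
                 ['executors'][executor_id]['container'])
    container['resources'] = {
        'cpuLimit': cpu_limit,           # vCPUs, as a number
        'memoryLimit': memory_limit_gb,  # memory in GiB, as a number
    }
    return job
```

Typical use: `json.load` the compiled file, call this for the heavy executors, `json.dump` it back, then submit the patched file to Vertex AI Pipelines.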
This should be available now via this commit:
This issue has been marked stale because it has had no recent activity in the past 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.
@vrooomurthy, the commit was never pushed to the master branch. Any plans to create a pull request?
It turns out you can specify CPU and RAM on custom components on Vertex like so:
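The snippet from this comment did not survive the thread's formatting. What it likely showed resembles the KFP v2 task-level API, which Vertex AI Pipelines honors; a sketch under that assumption (component, pipeline name, and values are illustrative, and this may not be the exact TFX custom-component API the commenter used):

```python
from kfp import dsl

@dsl.component
def clean_data(rows: int) -> int:
    """Placeholder for a resource-hungry custom step."""
    return rows

@dsl.pipeline(name='resource-demo')
def resource_demo():
    task = clean_data(rows=1000)
    # Per-task CPU/RAM hints; on Vertex these influence machine selection.
    task.set_cpu_limit('8')
    task.set_memory_limit('32G')
```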
However, the ideal would be to set the machine type directly. From initial experiments, it seems like Vertex picks the closest machine type that fits the requested resources.
I am curious if folks are thinking about allowing users to set the CPU/memory requirements for containers in Kubeflow on TFX components.
We can get around needing larger machines by using AI Platform or Dataflow, but our custom components don't really fit into that paradigm. I'm also curious whether the inability to set compute requirements means TFX can't take advantage of AI Platform Pipelines' ability to autoscale nodes if we kick off a ton of pipelines at once.
Anyway, please let me know if this is possible and I missed it! Maybe it is possible with the upcoming V2 runner.