
Request cpu / memory for containers in Kubeflow #3194

Closed
easadler opened this issue Feb 3, 2021 · 18 comments

Comments

@easadler

easadler commented Feb 3, 2021

I am curious whether folks are thinking about allowing users to set CPU / memory requirements for TFX component containers in Kubeflow?

We can get around needing larger machines by using ai-platform or Dataflow, but our custom components don't really fit into that paradigm. I'm curious if the inability to set compute requirements means TFX can't take advantage of ai-platform pipeline's ability to autoscale nodes if we kick off a ton of pipelines at once.

Anyway, please let me know if this is possible and I missed it! Maybe it will be possible with the upcoming V2 runner.

@ConverJens
Contributor

ConverJens commented Feb 16, 2021

@easadler This is already possible, using the pipeline_operator_funcs argument when compiling the pipeline. The downside is that you can't specify it on a component level, only on the entire pipeline.

Example below. Something similar can be done for anything the Python k8s client can handle.

def set_memory_request_and_limits(memory_request, memory_limit):
    def _set_memory_request_and_limits(task):
        return (
            task.container.set_memory_request(memory_request)
                .set_memory_limit(memory_limit)
        )

    return _set_memory_request_and_limits
...
kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=[set_memory_request_and_limits(memory_request_param, memory_limit_param)]
)

@easadler
Author

Oh that is better than nothing! Thank you for the response.

@vaskozl

vaskozl commented Mar 15, 2021

Any examples of how to set different CPU/memory requests for, say, Trainer / ModelResolver, which have very different requirements, would be great. Currently I set memory/CPU requests for the Trainer, but this slows down some of the lighter components which don't need extra processing. It also means the ExampleGen (which may be a long stage that just reads from a DB) over-requests resources.

@ConverJens
Contributor

ConverJens commented Mar 19, 2021

@vaskozl This isn't possible with current versions. See my previous answer.

However, I have used the request and limit params on pipeline level and that results in pods using as much resources as they can but not more than they need. So if you set CPU request to 1 and limit to 8, a resolver will use ~1 while Trainer will max out as much as it can. This has worked quite well for me.
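To illustrate, a pipeline-level setter for both CPU and memory along these lines might look like the following (a sketch in the same kfp v1 ContainerOp style as the earlier memory-only example; the values are illustrative, not recommendations):

```python
# Sketch: one pipeline_operator_funcs entry that sets CPU and memory
# requests/limits on every task's container (kfp v1 ContainerOp API).
def set_resources(cpu_request, cpu_limit, memory_request, memory_limit):
    def _set_resources(task):
        # The kfp v1 container setters are chainable, as in the
        # memory-only example above.
        (task.container
             .set_cpu_request(cpu_request)
             .set_cpu_limit(cpu_limit)
             .set_memory_request(memory_request)
             .set_memory_limit(memory_limit))
        return task
    return _set_resources

# Usage (hypothetical values):
# kubeflow_dag_runner.KubeflowDagRunnerConfig(
#     pipeline_operator_funcs=[set_resources('1', '8', '4G', '32G')]
# )
```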

On a side note, TFX is working towards a new intermediate representation of their pipelines which is more in line with native Kubeflow pipelines; in those you can specify resources on a component level, so hopefully this is coming.

@SuperCorks

I'd also like this feature.

My use case: I'm implementing a data cleaning pipeline to process and label the data before passing it into ExampleGen. I'm currently using custom Python function components for most of the cleaning steps and I'm running into all kinds of memory limitations. I understand this is not what these components are made for, and I could use Transform components, but I haven't seen any way to set memory/CPU constraints on Transform components either (please let me know if it's possible).

As a workaround I ended up using Kubeflow pipelines for the data cleaning part of my data flow, but it's a pretty young product too and I'm missing the local debugging (with breakpoints) feature that tfx.LocalDagRunner already has.

@axelborja

axelborja commented Jul 19, 2021

We are definitely waiting for this kind of feature too!

@axelborja

axelborja commented Jul 19, 2021


Is there a way to do something similar to @ConverJens's pipeline_operator_funcs example above with KubeflowV2DagRunnerConfig?

@chris-r-99

chris-r-99 commented Aug 18, 2021

Hi, I found a (not so nice) workaround for this issue. I thought some of you might be interested in it. The trick is to check the name of each container. Instead of matching only 'trainer' as in the example below, one could check against an entire list of component names. This example is for GPUs, but it works the same way for memory requests.

def get_gpu():
    def _set_gpu_limit(container_op):
        print(container_op.name)
        if container_op.name == 'trainer':
            container_op.set_gpu_limit(1)
        return container_op

    return _set_gpu_limit

pipeline_operator_funcs = kubeflow_dag_runner.get_default_pipeline_operator_funcs()
pipeline_operator_funcs.append(get_gpu())

runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=pipeline_operator_funcs,
    ...
)
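The "entire list of component names" variant mentioned above might be sketched as follows (same kfp v1 container-op pattern; the component names and GPU count are illustrative):

```python
# Sketch: set a GPU limit only on containers whose name is in a given set.
def set_gpu_limit_for(component_names, gpu_limit=1):
    def _set_gpu_limit(container_op):
        if container_op.name in component_names:
            container_op.set_gpu_limit(gpu_limit)
        return container_op
    return _set_gpu_limit

# Usage:
# pipeline_operator_funcs.append(set_gpu_limit_for({'trainer', 'tuner'}))
```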

@axelborja

@chris-r-99 - Just seeing your answer. Thanks, we will take a look at it!

@lre

lre commented Nov 26, 2021

@axelborja did you ever find a solution to do this with KubeflowV2DagRunnerConfig ?

@axelborja

axelborja commented Jan 20, 2022

@lre not yet, have you? (btw sorry for the super late reply)

@tanguycdls

tanguycdls commented Jan 26, 2022

Hello, after some investigation with @axelborja, we discovered that it works if you manually modify the JSON spec:

with open(ORIGINAL_PIPELINE_FILE) as json_file:
    data = json.load(json_file)

for component_name, component_executor_spec in data["pipelineSpec"]["deploymentSpec"]["executors"].items():
    if component_name.startswith("the_name_of_the_component_you_wish_to_extend"):
        component_executor_spec["container"]["resources"] = {
            "cpuLimit": 16.0,
            "memoryLimit": 64.0,
        }

and then give that modified JSON to Vertex AI Pipelines; when started, it will use an instance that respects those constraints.
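For completeness, the patch step described above could be wrapped up as a small helper that also writes the modified JSON back out before submitting it to Vertex AI (a sketch; the function name, file names, component prefix, and limits are all hypothetical):

```python
import json

def patch_executor_resources(pipeline_file, patched_file, component_prefix,
                             cpu_limit, memory_limit):
    """Set cpuLimit/memoryLimit on every executor whose name starts with
    component_prefix, then write the patched spec to patched_file."""
    with open(pipeline_file) as f:
        data = json.load(f)
    executors = data["pipelineSpec"]["deploymentSpec"]["executors"]
    for name, executor in executors.items():
        if name.startswith(component_prefix):
            executor["container"]["resources"] = {
                "cpuLimit": cpu_limit,
                "memoryLimit": memory_limit,
            }
    with open(patched_file, "w") as f:
        json.dump(data, f, indent=2)
    return data

# Usage (hypothetical names):
# patch_executor_resources("pipeline.json", "pipeline_patched.json",
#                          "trainer", cpu_limit=16.0, memory_limit=64.0)
```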

Those could have been set where TFX constructs the ContainerSpec:

result = ContainerSpec(

The param name can be seen here:
https://github.com/kubeflow/pipelines/blob/7d5690a21cf8e8c464a6ddba520879bd30fd2ddc/api/v2alpha1/pipeline_spec.proto#L638

Could someone from Vertex AI confirm this workaround is valid? Thanks.

@chongkong, I think you wrote that code; do you think we could allow setting those directly when creating the step?

thanks a lot !

@murthy-varuns

This should be available now via this commit:
b15d592.

@singhniraj08 singhniraj08 self-assigned this Jan 5, 2023
@singhniraj08
Contributor

@easadler,

This commit b15d592 should enable custom resource settings (vCPU and RAM) for containers orchestrated on Vertex AI. Kindly let us know if this helps your use case. Thank you!

@github-actions
Contributor

github-actions bot commented Apr 6, 2023

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Apr 6, 2023
@github-actions
Contributor

This issue was closed due to lack of activity after being marked stale for past 7 days.

@axeltidemann
Contributor

> This should be available now via this commit: b15d592.

@vrooomurthy, the commit was never pushed to the master branch. Any plans to create a pull request?

@axeltidemann
Contributor

It turns out you can specify CPU and RAM on custom components on Vertex like so:

from kfp.pipeline_spec import pipeline_spec_pb2 as pipeline_pb2

my_component = MyComponent().with_platform_config(
        pipeline_pb2.PipelineDeploymentConfig.PipelineContainerSpec
        .ResourceSpec(cpu_limit=2.0, memory_limit=4.0))

However, the ideal would be to set the machine type directly. From initial experiments it seems like Vertex finds the closest e2 configuration that matches the requirements.
