
Request cpu / memory for containers in Kubeflow #3194

Closed
easadler opened this issue Feb 3, 2021 · 18 comments

Comments

@easadler

easadler commented Feb 3, 2021

I am curious whether folks are thinking about allowing users to set CPU / memory requirements for TFX component containers in Kubeflow?

We can get around needing larger machines by using ai-platform or Dataflow, but our custom components don't really fit into that paradigm. I'm curious if the inability to set compute requirements means TFX can't take advantage of ai-platform pipeline's ability to autoscale nodes if we kick off a ton of pipelines at once.

Anyway, please let me know if this is possible and I missed it! Maybe it will be possible with the upcoming V2 runner.

@ConverJens
Contributor

ConverJens commented Feb 16, 2021

@easadler This is already possible, using the pipeline_operator_funcs argument when compiling the pipeline. The downside is that you can't specify it on a component level, only on the entire pipeline.

Example below. Something similar can be done for anything the Python k8s client can handle.

def set_memory_request_and_limits(memory_request, memory_limit):
    def _set_memory_request_and_limits(task):
        return (
            task.container.set_memory_request(memory_request)
                .set_memory_limit(memory_limit)
        )

    return _set_memory_request_and_limits
...
kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=[set_memory_request_and_limits(memory_request_param, memory_limit_param)]
)

@easadler
Author

Oh that is better than nothing! Thank you for the response.

@vaskozl

vaskozl commented Mar 15, 2021

Any examples of how to set different CPU/memory requests for, say, Trainer / ModelResolver, which have very different requirements, would be great. Currently I set memory/CPU requests for the Trainer, but this slows down some of the lighter components which don't need extra processing. It also means the ExampleGen (which may be a long stage that just reads from a DB) over-requests resources.

@ConverJens
Contributor

ConverJens commented Mar 19, 2021

@vaskozl This isn't possible with current versions. See my previous answer.

However, I have used the request and limit params on pipeline level and that results in pods using as much resources as they can but not more than they need. So if you set CPU request to 1 and limit to 8, a resolver will use ~1 while Trainer will max out as much as it can. This has worked quite well for me.
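To illustrate, a pipeline-level setter for both CPU and memory along these lines might look like the following (a sketch in the same kfp v1 ContainerOp style as the earlier memory-only example; the values are illustrative, not recommendations):

```python
# Sketch: one pipeline_operator_funcs entry that sets CPU and memory
# requests/limits on every task's container (kfp v1 ContainerOp API).
def set_resources(cpu_request, cpu_limit, memory_request, memory_limit):
    def _set_resources(task):
        # The kfp v1 container setters are chainable, as in the
        # memory-only example above.
        (task.container
             .set_cpu_request(cpu_request)
             .set_cpu_limit(cpu_limit)
             .set_memory_request(memory_request)
             .set_memory_limit(memory_limit))
        return task
    return _set_resources

# Usage (hypothetical values):
# kubeflow_dag_runner.KubeflowDagRunnerConfig(
#     pipeline_operator_funcs=[set_resources('1', '8', '4G', '32G')]
# )
```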

On a side note, TFX is working towards a new intermediate representation of their pipelines which is more in line with native Kubeflow pipelines; in those you can specify resources on a component level, so hopefully this is coming.

@SuperCorks

I'd also like this feature.

My use case: I'm implementing a data cleaning pipeline to process and label the data before passing it into ExampleGen. I'm currently using custom Python function components for most of the cleaning steps and I'm running into all kinds of memory limitations. I understand this is not what these components are made for, and I could use Transform components, but I haven't seen any way to set memory/CPU constraints on Transform components either (please let me know if it's possible).

As a workaround I ended up using Kubeflow pipelines for the data cleaning part of my data flow, but it's a pretty young product too and I'm missing the local debugging (with breakpoints) feature that tfx.LocalDagRunner already has.

@axelborja

axelborja commented Jul 19, 2021

We are definitely waiting for this kind of feature too!

@axelborja

axelborja commented Jul 19, 2021


Is there a way to do something similar to @ConverJens's pipeline_operator_funcs example above with KubeflowV2DagRunnerConfig?

@chris-r-99

chris-r-99 commented Aug 18, 2021

Hi, I found a (not so nice) workaround for this issue. I thought some of you might be interested in it. The trick is to check the name of each container. Instead of matching only 'trainer' as in the example below, one could check against an entire list of component names. This example is for GPUs, but it works the same way for memory requests.

def get_gpu():
    def _set_gpu_limit(container_op):
        print(container_op.name)
        if container_op.name == 'trainer':
            container_op.set_gpu_limit(1)
        return container_op

    return _set_gpu_limit

pipeline_operator_funcs = kubeflow_dag_runner.get_default_pipeline_operator_funcs()
pipeline_operator_funcs.append(get_gpu())

runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=pipeline_operator_funcs,
    ...
)
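The "entire list of component names" variant mentioned above might be sketched as follows (same kfp v1 container-op pattern; the component names and GPU count are illustrative):

```python
# Sketch: set a GPU limit only on containers whose name is in a given set.
def set_gpu_limit_for(component_names, gpu_limit=1):
    def _set_gpu_limit(container_op):
        if container_op.name in component_names:
            container_op.set_gpu_limit(gpu_limit)
        return container_op
    return _set_gpu_limit

# Usage:
# pipeline_operator_funcs.append(set_gpu_limit_for({'trainer', 'tuner'}))
```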

@axelborja

@chris-r-99 - Just seeing your answer. Thanks, we will take a look at it!

@lre

lre commented Nov 26, 2021

@axelborja did you ever find a solution to do this with KubeflowV2DagRunnerConfig ?

@axelborja

axelborja commented Jan 20, 2022

@lre not yet, have you? (btw sorry for the super late reply)

@tanguycdls

tanguycdls commented Jan 26, 2022

Hello, after some investigation with @axelborja, we discovered that it works if you manually modify the JSON spec:

with open(ORIGINAL_PIPELINE_FILE) as json_file:
    data = json.load(json_file)

for component_name, component_executor_spec in data["pipelineSpec"]["deploymentSpec"]["executors"].items():
    if component_name.startswith("the_name_of_the_component_you_wish_to_extend"):
        component_executor_spec["container"]["resources"] = {
            "cpuLimit": 16.0,
            "memoryLimit": 64.0,
        }

and then give that modified JSON to Vertex AI Pipelines; when started, it will use an instance that respects those constraints.
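For completeness, the patch step described above could be wrapped up as a small helper that also writes the modified JSON back out before submitting it to Vertex AI (a sketch; the function name, file names, component prefix, and limits are all hypothetical):

```python
import json

def patch_executor_resources(pipeline_file, patched_file, component_prefix,
                             cpu_limit, memory_limit):
    """Set cpuLimit/memoryLimit on every executor whose name starts with
    component_prefix, then write the patched spec to patched_file."""
    with open(pipeline_file) as f:
        data = json.load(f)
    executors = data["pipelineSpec"]["deploymentSpec"]["executors"]
    for name, executor in executors.items():
        if name.startswith(component_prefix):
            executor["container"]["resources"] = {
                "cpuLimit": cpu_limit,
                "memoryLimit": memory_limit,
            }
    with open(patched_file, "w") as f:
        json.dump(data, f, indent=2)
    return data

# Usage (hypothetical names):
# patch_executor_resources("pipeline.json", "pipeline_patched.json",
#                          "trainer", cpu_limit=16.0, memory_limit=64.0)
```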

Those could have been set where TFX constructs the ContainerSpec:

result = ContainerSpec(

The param name can be seen here:
https://github.com/kubeflow/pipelines/blob/7d5690a21cf8e8c464a6ddba520879bd30fd2ddc/api/v2alpha1/pipeline_spec.proto#L638

Could someone from Vertex AI confirm this workaround is valid? Thanks.

@chongkong, I think you wrote that code; do you think we could allow setting those directly when creating the step?

thanks a lot !

@murthy-varuns

This should be available now via this commit:
b15d592.

@singhniraj08 singhniraj08 self-assigned this Jan 5, 2023
@singhniraj08
Contributor

@easadler,

This commit b15d592 should enable custom resource settings (vCPU and RAM) for containers orchestrated on Vertex AI. Kindly let us know if this helps your use case. Thank you!

@github-actions
Contributor

github-actions bot commented Apr 6, 2023

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Apr 6, 2023
@github-actions
Contributor

This issue was closed due to lack of activity after being marked stale for past 7 days.

@axeltidemann
Contributor

> This should be available now via this commit: b15d592.

@vrooomurthy, the commit was never pushed to the master branch. Any plans to create a pull request?

@axeltidemann
Contributor

It turns out you can specify CPU and RAM on custom components on Vertex like so:

from kfp.pipeline_spec import pipeline_spec_pb2 as pipeline_pb2

my_component = MyComponent().with_platform_config(
        pipeline_pb2.PipelineDeploymentConfig.PipelineContainerSpec
        .ResourceSpec(cpu_limit=2.0, memory_limit=4.0))

However, the ideal would be to set the machine type directly. From initial experiments it seems like Vertex finds the closest e2 configuration that matches the requirements.
