@mprahl
Collaborator

mprahl commented Aug 26, 2025

Description of your changes:

This PR adds a feature that lets KFP tasks forward their task configuration (resources, environment variables, node selectors, tolerations, affinity, volumes) to external workloads such as Kubeflow TrainJobs, rather than applying it only to the task pod itself. This is particularly useful when the KFP task launches another Kubernetes workload (e.g., a TrainJob or a Job).
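
As a rough illustration of the idea (not the actual KFP SDK or driver code), the sketch below models a task configuration as a plain dictionary and splits it into the part applied to the task pod and the part forwarded to the external workload. The names `Passthrough`, `split_task_config`, and the field strings are hypothetical; only the `apply_to_task` flag is taken from this PR's review discussion.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Passthrough:
    """Hypothetical stand-in for a passthrough entry; not the KFP API."""
    field: str                   # e.g. "resources", "env", "tolerations"
    apply_to_task: bool = False  # also apply this field to the task pod?


def split_task_config(
    config: Dict[str, Any],
    passthroughs: List[Passthrough],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (applied_to_pod, forwarded) for a task configuration."""
    by_field = {p.field: p for p in passthroughs}
    applied_to_pod: Dict[str, Any] = {}
    forwarded: Dict[str, Any] = {}
    for name, value in config.items():
        entry = by_field.get(name)
        if entry is None:
            # Not marked for passthrough: applied to the task pod as usual.
            applied_to_pod[name] = value
            continue
        # Marked for passthrough: forwarded to the external workload.
        forwarded[name] = value
        if entry.apply_to_task:
            # Explicitly requested on the task pod as well.
            applied_to_pod[name] = value
    return applied_to_pod, forwarded


# Illustrative values only.
config = {
    "resources": {"cpu_limit": "2", "memory_limit": "4Gi"},
    "node_selector": {"gpu": "true"},
    "env": {"LOG_LEVEL": "debug"},
}
pod, external = split_task_config(
    config,
    [Passthrough("resources"), Passthrough("env", apply_to_task=True)],
)
```

In this sketch, `resources` reaches only the external workload, `env` reaches both, and `node_selector` stays on the task pod because it was not listed as a passthrough.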

@google-oss-prow

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mprahl force-pushed the trainer-integration branch 4 times, most recently from 8f56491 to 8daf9a9, on August 28, 2025 00:19
@mprahl marked this pull request as ready for review on August 28, 2025 01:02
@google-oss-prow bot requested review from gmfrasca and rimolive on August 28, 2025 01:02
@mprahl
Collaborator Author

mprahl commented Aug 28, 2025

/cc @HumairAK

@google-oss-prow bot requested a review from HumairAK on August 28, 2025 01:03
@mprahl changed the title from "WIP: feat(sdk/backend): Support forwarding Kubernetes task configuration to external workloads" to "feat(sdk/backend): Support forwarding Kubernetes task configuration to external workloads" on Aug 28, 2025
@mprahl force-pushed the trainer-integration branch 9 times, most recently from d7b4db5 to a56f03f, on September 4, 2025 13:52
@mprahl force-pushed the trainer-integration branch 2 times, most recently from 5fee341 to ed52a40, on September 4, 2025 16:56
@mprahl marked this pull request as ready for review on September 4, 2025 16:57

// Throwaway default value.
NONE = 0;
// Indicates that the resource limits and requests should be passed through to the external workload.
// Be cautious about also setting apply_to_task=true since that may double apply the
Collaborator

Suggested change
- // Be cautious about also setting apply_to_task=true since that may double apply the
+ // Be cautious about also setting apply_to_task=true since that will double apply
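
To make the caution concrete, here is a small arithmetic sketch. It assumes the truncated comment refers to resource requests being claimed by both the task pod and the external workload; the function `total_cpu_requested` and the numbers are illustrative only and are not part of KFP.

```python
def total_cpu_requested(cpu_request: float, apply_to_task: bool) -> float:
    """CPU requested in total when a resource request is passed through.

    Assumption: the external workload always receives the request, and the
    task pod receives the same request again only when apply_to_task is true.
    """
    external_workload = cpu_request
    extra_on_task_pod = cpu_request if apply_to_task else 0.0
    return external_workload + extra_on_task_pod


passthrough_only = total_cpu_requested(1.0, apply_to_task=False)
both = total_cpu_requested(1.0, apply_to_task=True)
```

Under that assumption, enabling `apply_to_task` for resources doubles what the scheduler must find for a single logical task.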

pip_trusted_hosts: Optional[List[str]] = None,
use_venv: bool = False,
additional_funcs: Optional[List[Callable]] = None,
task_config_passthroughs: Optional[List[Union[TaskConfigPassthrough,
Collaborator

missing this parameter in the docstring

Comment on lines 680 to 683
CpuLimit: 2.0,
MemoryLimit: 1.5,
CpuRequest: 1.0,
MemoryRequest: 0.65,
Collaborator

can we use the non-deprecated fields?

Introduce dsl.TaskConfig to capture task-level resource settings for
pass-through to externally launched workloads (e.g., trainers).

The driver now updates container execution and Kubernetes handling to inject
the Kubernetes config into executor inputs without applying disallowed fields
to the task pod.

Signed-off-by: mprahl <[email protected]>
@mprahl force-pushed the trainer-integration branch from ed52a40 to 5393888 on September 4, 2025 20:56
@mprahl requested a review from HumairAK on September 4, 2025 20:57
@HumairAK
Collaborator

HumairAK commented Sep 4, 2025

/lgtm
/approve

niceee

@google-oss-prow bot added the lgtm label on Sep 4, 2025
@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HumairAK

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow bot merged commit a6f7fb6 into kubeflow:master on Sep 4, 2025
102 checks passed
krishanbhasin-px pushed a commit to krishanbhasin-px/kubeflow-pipelines that referenced this pull request Sep 18, 2025
…workloads (kubeflow#12185)

* Add missing PULL_NUMBER env var to sample tests CI

Signed-off-by: mprahl <[email protected]>

* Use the local API package for the upgrade test

Signed-off-by: mprahl <[email protected]>

* Support forwarding task configuration to external workloads

Introduce dsl.TaskConfig to capture task-level resource settings for
pass-through to externally launched workloads (e.g., trainers).

The driver now updates container execution and Kubernetes handling to inject
the Kubernetes config into executor inputs without applying disallowed fields
to the task pod.

Signed-off-by: mprahl <[email protected]>

* Fix dependencies issue in Read The Docs build

Signed-off-by: mprahl <[email protected]>

---------

Signed-off-by: mprahl <[email protected]>