@Fiona-Waters

Description of Changes

Added three user guides for the local process, Docker, and Podman local mode backends. This PR should not be merged until kubeflow/sdk#119 is merged.

Related Issues

Closes: #

Related: kubeflow/sdk#95 (PR). The details from that PR have been included in this PR, as per this comment.

Checklist

@google-oss-prow google-oss-prow bot added do-not-merge/work-in-progress area/trainer AREA: Kubeflow Trainer / Kubeflow Training Operator labels Oct 24, 2025
@google-oss-prow

Hi @Fiona-Waters. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

🚫 This command cannot be processed. Only organization members or owners can use the commands.

Member

@Arhell Arhell left a comment

/ok-to-test

@astefanutti
Contributor

Thanks @Fiona-Waters awesome work, very useful!

/lgtm

Member

@andreyvelich andreyvelich left a comment

@Fiona-Waters Thanks for this awesome work, I will take a look soon!
Shall we merge it once the Container PR is complete: kubeflow/sdk#119?
/hold

@Fiona-Waters
Author

> @Fiona-Waters Thanks for this awesome work, I will take a look soon! Shall we merge it once the Container PR is complete: kubeflow/sdk#119? /hold

@andreyvelich Yes, this can be merged after the Container PR. I will also have a PR in the trainer repo for examples soon, which can be merged afterwards too. Thanks

Signed-off-by: Fiona Waters <[email protected]>

Co-authored-by: Saad Zaher <[email protected]>
@Fiona-Waters Fiona-Waters changed the title [WIP] Trainer: Adding local mode guides Trainer: Adding local mode guides Nov 3, 2025
@astefanutti
Contributor

Thanks @Fiona-Waters!

/lgtm
/assign @andreyvelich

@google-oss-prow google-oss-prow bot added the lgtm label Nov 3, 2025
@andreyvelich andreyvelich changed the title Trainer: Adding local mode guides trainer: Adding local mode guides Nov 6, 2025
Member

@andreyvelich andreyvelich left a comment

Thank you for this @Fiona-Waters!
Overall it looks great; I would move the common sections to the overview page to reduce duplicated content.
cc @kubeflow/kubeflow-sdk-team @kubeflow/kubeflow-trainer-team

rm -rf /tmp/a1b2c3d4e5f_xyz/
```

## Architecture
Member

We might need to redraw it as a diagram in https://www.drawio.com/
We can do that later.

Author

I think I'll remove this section for now and open a follow-up issue to create and add a proper diagram. wdyt?

Member

Sure, thank you!

Signed-off-by: Fiona Waters <[email protected]>
@google-oss-prow google-oss-prow bot removed the lgtm label Nov 7, 2025
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andreyvelich. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Fiona-Waters
Author

@andreyvelich thanks for the review - I've addressed your feedback, please take a look. Thanks!

Member

@andreyvelich andreyvelich left a comment

Thanks for this great effort @Fiona-Waters!
/lgtm
/assign @kramaranya @szaher @astefanutti

Comment on lines +47 to +48
from kubeflow.trainer import CustomTrainer, TrainerClient
from kubeflow.trainer.backends.container.types import ContainerBackendConfig
Contributor

Suggested change
- from kubeflow.trainer import CustomTrainer, TrainerClient
- from kubeflow.trainer.backends.container.types import ContainerBackendConfig
+ from kubeflow.trainer import CustomTrainer, TrainerClient, ContainerBackendConfig
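
To illustrate the import change suggested here: a documentation snippet that has to run against SDK versions with either layout could resolve the class from whichever module exposes it. The following is only a minimal sketch. `import_first` is a hypothetical helper, not part of the Kubeflow SDK, and whether `kubeflow.trainer` re-exports `ContainerBackendConfig` at the top level is an assumption that depends on the installed version.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it.

    Returns None when no listed module can be imported or none defines `name`.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module (or its parent package) is not installed
        if hasattr(module, name):
            return getattr(module, name)
    return None


# Prefer the short top-level import, then fall back to the longer module path.
# Both paths are taken from the review thread; neither is verified here.
ContainerBackendConfig = import_first(
    "ContainerBackendConfig",
    ["kubeflow.trainer", "kubeflow.trainer.backends.container.types"],
)
```

If the SDK is not installed at all, `ContainerBackendConfig` is simply `None`, so a caller would need to check for that before using it.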

- **Python 3.9+**
- **Kubeflow SDK**: Install with Docker support:
```bash
pip install kubeflow[docker]
Contributor

Suggested change
- pip install kubeflow[docker]
+ pip install "kubeflow[docker]"

This avoids issues with shell expansion, for example in zsh, where unquoted square brackets are treated as a glob pattern.
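
The quoting concern can be shown without a shell. The sketch below uses Python's standard `fnmatch` and `shlex` modules to show that `kubeflow[docker]`, read as a glob, does not match itself, and to build a safely quoted command. It only approximates shell behaviour; the variable names are illustrative.

```python
import fnmatch
import shlex

requirement = "kubeflow[docker]"

# Read as a glob, "[docker]" is a character class matching ONE of d, o, c, k, e, r,
# so the pattern matches names like "kubeflowd" but not the literal requirement.
matches_literal = fnmatch.fnmatchcase(requirement, requirement)
matches_other = fnmatch.fnmatchcase("kubeflowd", requirement)

# shlex.quote adds the quoting needed to pass the string through a POSIX shell unchanged.
safe_command = "pip install " + shlex.quote(requirement)
print(matches_literal, matches_other, safe_command)
```

`shlex.quote` emits single quotes rather than the double quotes used in the suggestion; both forms prevent the brackets from being expanded.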

Comment on lines +150 to +151
from kubeflow.trainer import CustomTrainer, TrainerClient
from kubeflow.trainer.backends.container.types import ContainerBackendConfig
Contributor

Suggested change
- from kubeflow.trainer import CustomTrainer, TrainerClient
- from kubeflow.trainer.backends.container.types import ContainerBackendConfig
+ from kubeflow.trainer import CustomTrainer, TrainerClient, ContainerBackendConfig

Comment on lines +45 to +46
from kubeflow.trainer import CustomTrainer, TrainerClient
from kubeflow.trainer.backends.localprocess import LocalProcessBackendConfig
Contributor

Suggested change
- from kubeflow.trainer import CustomTrainer, TrainerClient
- from kubeflow.trainer.backends.localprocess import LocalProcessBackendConfig
+ from kubeflow.trainer import CustomTrainer, TrainerClient, LocalProcessBackendConfig

**Example:**

```python
from kubeflow.trainer.backends.localprocess import LocalProcessBackendConfig
Contributor

Suggested change
- from kubeflow.trainer.backends.localprocess import LocalProcessBackendConfig
+ from kubeflow.trainer import LocalProcessBackendConfig

- **Python 3.9+**
- **Kubeflow SDK**: Install with Podman support:
```bash
pip install kubeflow[podman]
Contributor

Suggested change
- pip install kubeflow[podman]
+ pip install "kubeflow[podman]"


```bash
# Install Podman
brew install podman
Contributor

Maybe we want to follow the official recommendation:

"Though not recommended, Podman can also be obtained through Homebrew"

Also, maybe linking to the official installation instructions would be simpler.

Comment on lines +85 to +86
from kubeflow.trainer import CustomTrainer, TrainerClient
from kubeflow.trainer.backends.container.types import ContainerBackendConfig
Contributor

Suggested change
- from kubeflow.trainer import CustomTrainer, TrainerClient
- from kubeflow.trainer.backends.container.types import ContainerBackendConfig
+ from kubeflow.trainer import CustomTrainer, TrainerClient, ContainerBackendConfig

Comment on lines +207 to +208
from kubeflow.trainer import CustomTrainer, TrainerClient
from kubeflow.trainer.backends.container.types import ContainerBackendConfig
Contributor

Suggested change
- from kubeflow.trainer import CustomTrainer, TrainerClient
- from kubeflow.trainer.backends.container.types import ContainerBackendConfig
+ from kubeflow.trainer import CustomTrainer, TrainerClient, ContainerBackendConfig

Comment on lines +195 to +199
```python
backend_config = ContainerBackendConfig(
container_runtime="podman",
auto_remove=False # Containers remain after job completion
)
Contributor

The closing "```" is missing.
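
A missing closing fence like this one can be caught mechanically before review. The sketch below is a simplified check, assuming fences are always written as a line starting with three backticks; it ignores tilde fences and longer nested fences. `find_unclosed_fence` is a hypothetical helper, not an existing tool in the Kubeflow website repository.

```python
def find_unclosed_fence(markdown_text):
    """Return the 1-based line number of a ``` fence that is never closed, or None."""
    open_line = None
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            # A fence line toggles state: it opens a block or closes the open one.
            open_line = number if open_line is None else None
    return open_line


# The snippet from the review comment, without its closing fence.
snippet = "\n".join([
    "```python",
    "backend_config = ContainerBackendConfig(",
    '    container_runtime="podman",',
    "    auto_remove=False  # Containers remain after job completion",
    ")",
])

print(find_unclosed_fence(snippet))
print(find_unclosed_fence(snippet + "\n```"))
```

The first call reports the line of the opening fence, and the second reports nothing once the closing fence is appended.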

Labels

area/trainer AREA: Kubeflow Trainer / Kubeflow Training Operator lgtm ok-to-test size/XXL

6 participants