@Fiona-Waters
Contributor

@Fiona-Waters Fiona-Waters commented Oct 29, 2025

What this PR does / why we need it:
This PR adds two example notebooks, one for the local process backend and one for the container backend (Podman or Docker); both run single-node training.
I have also updated the existing image-classification example to use the local trainer client.
I've added the two new notebooks to the e2e tests, but I have not verified that they run there.
I have also included cell output, but in local-training-mnist.ipynb I think the logs are too long to include and should be removed.

This PR is related.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2868

Checklist:

  • Docs included if any changes are user facing


@Fiona-Waters Fiona-Waters changed the title Adding local execution example notebook feat: Adding local execution example notebook Oct 29, 2025
@astefanutti
Contributor

Thanks @Fiona-Waters this is awesome!

/lgtm

Contributor

@kramaranya kramaranya left a comment


Thank you, @Fiona-Waters!
Could you please run this notebook and keep the output of the cells?

Comment on lines 49 to 50
"# %pip install -U kubeflow[docker] # For Docker\n",
"# %pip install -U kubeflow[podman] # For Podman"
Contributor


Why not use ! as in other examples?

Suggested change
- "# %pip install -U kubeflow[docker] # For Docker\n",
- "# %pip install -U kubeflow[podman] # For Podman"
+ "# !pip install -U kubeflow[docker] # For Docker\n",
+ "# !pip install -U kubeflow[podman] # For Podman"
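
For context on the %pip versus ! question: in IPython, %pip installs into the environment of the running kernel, whereas !pip runs whichever pip executable comes first on the shell's PATH, which may belong to a different environment. The sketch below is not taken from the PR; it only illustrates the kernel-scoped form by building the equivalent command for the current interpreter. The helper name pip_install_command is hypothetical, and the command is built but never executed.

```python
import sys
from typing import List, Optional


def pip_install_command(package: str, extra: Optional[str] = None) -> List[str]:
    """Build a pip command bound to the interpreter running this code.

    Hypothetical helper for illustration only; it returns the argument
    list and does not run pip.
    """
    requirement = f"{package}[{extra}]" if extra else package
    # "python -m pip" targets the same environment as the running kernel.
    return [sys.executable, "-m", "pip", "install", "-U", requirement]


docker_cmd = pip_install_command("kubeflow", "docker")
podman_cmd = pip_install_command("kubeflow", "podman")
print(" ".join(docker_cmd[1:]))
print(" ".join(podman_cmd[1:]))
```

Passing the list to subprocess.run would perform the install; that step is left out here so the sketch has no side effects.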

@Fiona-Waters
Contributor Author

> Thank you, @Fiona-Waters! Could you please run this notebook and keep the output of the cells?

In my humble opinion, I think that including the output of the cells creates quite a cluttered example and it's not necessary - but I am open to other opinions here. @astefanutti @andreyvelich

@astefanutti
Contributor

> > Thank you, @Fiona-Waters! Could you please run this notebook and keep the output of the cells?
>
> In my humble opinion, I think that including the output of the cells creates quite a cluttered example and it's not necessary - but I am open to other opinions here. @astefanutti @andreyvelich

Yes, IMHO that's a matter of compromise and finding the right balance. I agree outputting entire log streams can clutter the examples, but when kept concise those outputs can add useful info when the notebooks are also used as documentation (not executed).

@Fiona-Waters
Contributor Author

> > > Thank you, @Fiona-Waters! Could you please run this notebook and keep the output of the cells?
> >
> > In my humble opinion, I think that including the output of the cells creates quite a cluttered example and it's not necessary - but I am open to other opinions here. @astefanutti @andreyvelich
>
> Yes, IMHO that's a matter of compromise and finding the right balance. I agree outputting entire log streams can clutter the examples, but when kept concise those outputs can add useful info when the notebooks are also used as documentation (not executed).

Agreed, will update and try to find the right balance.
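
One way to keep cell output while avoiding long log streams, sketched here as an illustration and not as what the PR actually does: shorten each stream output in the notebook JSON to its first few lines. The function name truncate_stream_outputs and the max_lines default are assumptions; the only thing relied on is the nbformat v4 layout, where a code cell has an "outputs" list and a stream output stores its "text" as a string or a list of strings.

```python
import json


def truncate_stream_outputs(notebook: dict, max_lines: int = 20) -> dict:
    """Return a copy of an nbformat v4 notebook dict with long stream outputs cut.

    Hypothetical helper: only "stream" outputs (stdout/stderr) are shortened;
    rich outputs and error tracebacks are left untouched.
    """
    nb = json.loads(json.dumps(notebook))  # deep copy via a JSON round-trip
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") != "stream":
                continue
            text = output.get("text", "")
            # "text" may be a single string or a list of strings.
            joined = "".join(text) if isinstance(text, list) else text
            lines = joined.splitlines(keepends=True)
            if len(lines) > max_lines:
                dropped = len(lines) - max_lines
                output["text"] = lines[:max_lines] + [
                    f"... ({dropped} more lines truncated)\n"
                ]
    return nb


# Small in-memory notebook with a 100-line training log as a demonstration.
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('training...')"],
            "outputs": [
                {
                    "output_type": "stream",
                    "name": "stdout",
                    "text": [f"epoch {i}\n" for i in range(100)],
                }
            ],
        }
    ],
}

trimmed = truncate_stream_outputs(demo, max_lines=3)
print("".join(trimmed["cells"][0]["outputs"][0]["text"]))
```

To apply this to a file, load the .ipynb with json.load, pass the dict through the function, and write the result back with json.dump.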

@kramaranya
Contributor

> Agreed, will update and try to find the right balance.

I'd suggest aligning with other examples, for example https://github.com/kubeflow/trainer/blob/master/examples/pytorch/image-classification/mnist.ipynb

@coveralls

coveralls commented Nov 4, 2025

Pull Request Test Coverage Report for Build 19111473416

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 51.477%

Totals Coverage Status
Change from base Build 19074062868: 0.0%
Covered Lines: 1237
Relevant Lines: 2403


@Fiona-Waters Fiona-Waters force-pushed the add-local-example branch 2 times, most recently from 7908133 to 03b8791 Compare November 5, 2025 17:17
@google-oss-prow google-oss-prow bot added size/XXL and removed size/XL labels Nov 5, 2025
@Fiona-Waters
Contributor Author

@andreyvelich @kramaranya @astefanutti I've updated the PR to include two notebooks, one for the local process backend and one for the container backend, both single node. I also updated the image-classification example as Andrey requested.
I have added both notebooks to the existing e2e workflow, but I have not tested that this works.
I have also included cell output, but in local-training-mnist.ipynb I think the logs are too long to include and should be removed.
Please review, thank you!

@andreyvelich
Member

/ok-to-test

Co-authored-by Brian Gallagher <[email protected]>

Signed-off-by: Fiona Waters <[email protected]>
@andreyvelich
Member

/rerun

@Fiona-Waters
Contributor Author

Tests are failing because the options PR has been merged in the SDK and the container backend doesn't include the param. I need to drop off for a while, but I will try to update the SDK later this evening and then we can re-run, unless someone else has time to do it before me. @andreyvelich

@andreyvelich
Member

/retest

Member

@andreyvelich andreyvelich left a comment


Looks to be working 🎉
Feel free to unhold when you are ready.
It would also be good to add those notebooks to the SDK E2Es: https://github.com/kubeflow/sdk/blob/main/.github/workflows/test-e2e.yaml#L57-L65

/lgtm
/approve
/hold
/cc @astefanutti @kramaranya

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich


@kramaranya
Contributor

Thank you, @Fiona-Waters!
/lgtm

Member

@andreyvelich andreyvelich left a comment


/hold cancel

@google-oss-prow google-oss-prow bot merged commit 803ca66 into kubeflow:master Nov 6, 2025
46 of 55 checks passed
@google-oss-prow google-oss-prow bot added this to the v2.1 milestone Nov 6, 2025
@andreyvelich
Member

/cherry-pick release-2.1

@google-oss-robot

@andreyvelich: new pull request created: #2924

In response to this:

/cherry-pick release-2.1

