Distributions and Kubeflow 1.6 release #2221

Closed
annajung opened this issue Jun 13, 2022 · 44 comments

@annajung
Member

annajung commented Jun 13, 2022

The goal of this issue is to track the progress of distributions alongside the 1.6 release

While we hope all distros will manage to be ready when the KF 1.6 release is out, this is sometimes impossible to achieve. In this issue, we want to both keep track of the progress of distributions toward the KF 1.6 release and also which of the distros will be working on KF 1.6 (testing during the distribution testing cycle) even if they can't meet the KF 1.6 deadline.

Tagging distribution owners identified in the kubeflow/community#560 (Any new or missed distro owners, please comment on the issue to be tracked with the 1.6 release)

| Distribution | Representatives | State |
|---|---|---|
| Arrikto EKF | @kimwnasptd | (stretch goal) participating in 1.6 |
| Arrikto MiniKF | @kimwnasptd | (stretch goal) participating in 1.6 |
| AWS | @surajkota | helping with testing in 1.6 |
| Charmed Kubeflow | @DomFleischmann | participating in 1.6 |
| Google Cloud | @zijianjoy @gkcalat | participating in 1.6 |
| IBM | @yhwang | participating in 1.6 |
| Nutanix | @johnugeorge | participating in 1.6 |
| Kubeflow with Argo CD | @davidspek | |
| Openshift | @VaishnaviHire @LaVLaS | participating in 1.6 |
| Oracle Cloud Infrastructure | @julioo | participating in 1.6 |

Please let us know if you'll be participating in the 1.6 release by answering the following questions:

  • Are you planning on having your distro ready in sync with the KF 1.6 release?
  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
  • If you cannot participate, when can the community expect your distro to be ready for release 1.6?

Distribution testing was originally scheduled to take place from July 6th to July 27th (ref kubeflow/community#561).
[Update on June 14th] Note: After the 2-week delay, distribution testing is now scheduled to take place from July 20th to August 10th.

cc @kubeflow/release-team @jbottum

@yhwang
Member

yhwang commented Jun 13, 2022

For IBM IKS,

  • Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@johnugeorge
Member

For Nutanix Karbon,

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@surajkota
Contributor

For AWS,

Are you planning on having your distro ready in sync with the KF 1.6 release?

TBD. If not in sync, we will follow up

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@DomFleischmann
Contributor

For Canonical's Charmed Kubeflow

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@annajung
Member Author

Hi distribution owners! After checking with all WGs, the release team has decided to extend all release deadlines by 2 more weeks.

Email announcement: https://groups.google.com/g/kubeflow-discuss/c/I4l97HvrGEA/m/227aCe_mCgAJ
New schedule PR: kubeflow/community#562

Distribution testing is now scheduled to take place from July 20th to August 10th
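
As a quick sanity check on the dates, the new window is the earlier July 6th to July 27th window moved later by two weeks. The sketch below only does the date arithmetic; the helper name `shift_window` is made up for illustration and is not part of any Kubeflow tooling.

```python
from datetime import date, timedelta
from typing import Tuple


def shift_window(start: date, end: date, delay: timedelta) -> Tuple[date, date]:
    """Move both ends of a testing window later by the same delay."""
    return start + delay, end + delay


# Earlier distribution testing window mentioned in this thread (2022).
original_start, original_end = date(2022, 7, 6), date(2022, 7, 27)

new_start, new_end = shift_window(original_start, original_end, timedelta(weeks=2))
print(new_start.isoformat(), new_end.isoformat())  # 2022-07-20 2022-08-10
```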

@LaVLaS
Contributor

LaVLaS commented Jun 22, 2022

For OpenShift,

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@zijianjoy
Contributor

For Google Cloud

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@julioo

julioo commented Jun 27, 2022

For Oracle Cloud Infrastructure

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@zijianjoy
Contributor

cc @gkcalat for working on Kubeflow on Google Cloud release.

@kimwnasptd
Member

A little bit late to the party, but for Arrikto EKF and MiniKF

Are you planning on having your distro ready in sync with the KF 1.6 release?

It will be a stretch, but this will be our goal.

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

@kimwnasptd
Member

Also heads up to everyone for the following items from Notebooks and Manifests WG:

  1. Status with K8s 1.22 and Notebooks: Notebook WG and Kubeflow 1.6 release #2199 (comment)
  2. We are targeting Istio 1.14 instead of 1.13: Manifests WG and Kubeflow 1.6 release #2200 (comment)
  3. We are targeting Knative 1.4: Kubeflow 1.6 Dependency Versions #2207 (comment)

We'll also be on the lookout during the feature freeze for any bug that could arise from any of the above updates, but we are confident there won't be any major issues. Of course, don't hesitate to report and ping us if you bump into anything unexpected!

@annajung
Member Author

Hi Distribution owners, sorry for the delay in providing you with a new RC to test with.

There was a bug identified for Notebooks WG and they're currently working on providing the release team with a new release to be used to cut a new 1.6 RC.

We hope to have the 1.6 RC1 that contains the fix for the bug identified available for you soon. Once the new RC is available, I'll leave an update here and send out an announcement to kubeflow-discuss.

If you want to get started with testing, please note the issue with Jupyter web app.

In addition, here are the PRs that would be included in the new RC.

@annajung
Member Author

Hi Distribution owners, providing you with another update on the RC.

As discussed in the release team meeting today (July 25th), we hope to have a new RC available for everyone early this week. We are waiting for kubeflow/kubeflow#6591 to merge, as it aims to address the problem with building images using GitHub Actions. Once a new notebook release is available, a PR then needs to be created against the manifest repo.

The release team would like to stick with the current schedule and keep the distribution testing till August 10th as planned. However, with the delay in getting the new RC out, we also would like to gather your feedback on the current timeline and if you think it would be necessary to delay the release to increase the time for distribution testing. If you have any concerns with the current release timeline, please reach out soon to ensure your concerns are reviewed in advance before the end of distribution testing.

@annajung
Member Author

Kubeflow v1.6.0-rc.1 is now available!

@annajung
Member Author

annajung commented Aug 4, 2022

Hi Distribution owners, friendly reminder to share any issues you ran into when testing and to update the kubeflow distribution docs

Distribution testing and Doc updates are both scheduled to end on Wed Aug 10th 2022.

@johnugeorge
Member

@annajung In the last community meeting, there was a discussion to extend by one extra week

@surajkota
Contributor

surajkota commented Aug 4, 2022

Testing is in progress on the AWS side, with no new issues so far. Will post an update by early next week. @annajung when do we expect the final RC to be out? I couldn't get a clear idea from the community meeting notes.

awslabs/kubeflow-manifests#309

@gkcalat
Member

gkcalat commented Aug 4, 2022

Testing on GCP. We are observing profiles-deployment crashing. Could it be related to #2263? Has anyone else experienced it? Besides, we need the latest changes to contrib/metacontroller to be included in 1.6.0. They were not included in v1.6.0-rc.1.
Thank you!

@julioo

julioo commented Aug 5, 2022

Testing on OCI. Will report status early next week.
Inspired by IBM#47

@annajung
Member Author

annajung commented Aug 9, 2022

> @annajung In the last community meeting, there was a discussion to extend by one extra week

Thanks @johnugeorge for raising this! I was not able to attend the last community meeting, but other release team members did inform me that distribution owners who were present in the meeting asked for an extension.

In addition to that, during the August 8th release team meeting, the release team discussed the following issues identified based on issues/comments mentioned in the distribution tracking and release tracking

Based on the extension request and a need for a new RC, we are working on a new release timeline to provide to the community. We contacted the Notebook WG to determine if the issues identified are release blocking issues and if they will be providing another RC for the release.

Unless there are release blocking issues, we'll stick to the date that was agreed on during the community meeting last week which is August 17th for distribution testing to end.

I plan to send out an official announcement to kubeflow-discuss about the new timeline after catching up with the notebook WG or before the 10th whichever comes first.

@annajung
Member Author

Hi everyone, I owe an update here - will be sending out a message on kubeflow-discuss today as well.

After catching up with notebook WG and investigating the three issues identified, here is where we are now.

  1. Missing Notebook image group
     • This has been identified as a release-blocking issue. There is a PR open that might fix the issue. However, even if we get this merged, the lead who can provide the release team with a new Notebook RC is not available until the week of Aug 22nd.
  2. Duplicate liveness probe in Notebook controller manager
  3. Metacontroller update not included in the RC 1
     • This PR adds the metacontroller into the /contrib directory which does not get used by default in the pipeline installation. By default, the metacontroller from /third-party is used and it already has the update that was made to the /contrib. This means that the current RC already contains this change, therefore, there are no changes to any functionality. I reached out to the pipeline team to get their feedback. Until we hear otherwise, the release team has labeled this as non-blocking-issue and has no plans to cut a new RC for this change.

Overall,

  • There is no plan for a new RC until the fix for the notebook issue (1) is available
  • We will be extending the release until the blocking issue is resolved. The new release date is TBD; we will propose a date to the community for feedback
  • I do not believe cutting a new RC with a notebook issue fixed will have a huge impact on distributions. As of now, the plan is only to include the fix for the Notebook image group unless other release-blocking issues are identified
  • Please keep providing issues you have identified while testing
  • With the extra time, don't forget to update the kubeflow distribution docs as well

Thanks everyone!

cc @kubeflow/release-team

@annajung
Member Author

The official announcement for the release delay has been sent to the kubeflow-discuss mailing list. The proposed timeline PR is also available if distribution owners would like to provide their feedback.

The proposed timeline moves the distribution testing end date to August 31st

@DnPlas
Contributor

DnPlas commented Aug 11, 2022

@ryansteakley

ryansteakley commented Aug 20, 2022

Hello, here is an issue the AWS team has found on v1.6.0-rc.1. Currently we consider this a release blocker, as it is a feature regression. We are looking into it; any help from the community would be welcome and appreciated.

@julioo

julioo commented Aug 23, 2022

Hello, I successfully installed KF v1.6.0-rc.1 on Oracle Infrastructure (OKE 1.22.5, 1.23.4 and 1.24.1).

  • Exposed KF dashboard using LB and Istio gateway.
  • Ran demo pipelines such as [Demo] XGBoost - Iterative model training and [Tutorial] Data passing in python components.
  • Created Notebook through web UI with success.
    • One problem with the Mnist E2E Vanilla demo, related to the ipykernel/iostream.py version; I will create an issue to document it.

I am waiting to test the next RC and to share the Github page with OCI documentation.

@surajkota
Contributor

Created kubeflow/kubeflow#6624 to address kubeflow/kubeflow#6618

@julioo

julioo commented Aug 24, 2022

  • One problem with Mnist E2E Vanilla demo but related to ipykernel/iostream.py version will create an issue to document.
    Created kubeflow/examples/issues/993 to document the issue

@surajkota
Contributor

surajkota commented Aug 25, 2022

@kimwnasptd @yuzisun It would be great to consider this PR kubeflow/kubeflow#6627 for this release. Details in the issue

@annajung
Member Author

Hi distribution owners, a new notebook RC with the fix for the image group issue is planned to be cut by this upcoming Tuesday.

The new RC might include more than the image group fix. It may include the fix for the profiler issue kubeflow/kubeflow#6618 as well; I hope notebook WG lead @kimwnasptd can share more.

As for the other issues that were raised, none of them have been labeled as blocker issues by the WG leads so far. Therefore, they are not being tracked as release-blocking issues for this release.

Please don't forget to keep your distribution docs updated by making updates to the following docs before the docs deadline EOD Aug 31st.

@VaishnaviHire

Hello, successfully installed KF v1.6.0-rc.1 on OCP 4.9. The ongoing testing issue can be tracked here - opendatahub-io#99.

@johnugeorge
Member

1.6.rc1 is tested against Nutanix NKE 2.5.0 with k8s 1.22
Ref: nutanix/karbon-platform-services#94

@annajung
Member Author

Hello distribution owners,

For those who missed the Aug 30th community meeting, please check out the discussion in the meeting notes / recording.

TL;DR

  • Based on the feedback from WG leads, distribution owners, and community members, we have decided to continue distribution testing and docs updates until the day before the release to accommodate for the delay in the Kubeflow RC2.
  • Kubeflow 1.6 RC2 is still in progress. We were hoping to have it today but ran into an issue with Notebook images (release: Images for the v1.6.0-rc.2 tag kubeflow#6631) which requires Notebook WG to take a look.
  • As for docs, you can continue updating the docs until the day before the release. The plan is to cut the kubeflow/website release branch on the release day to give everyone until the last day to update the docs.

More details can be found in the announcement: https://groups.google.com/g/kubeflow-discuss/c/TNwRfoq3Pk4/m/r5aIGS2XBAAJ

@annajung
Member Author

annajung commented Sep 1, 2022

Hi Distribution owners,

Kubeflow RC.2 is now available!

Announcement: https://groups.google.com/g/kubeflow-discuss/c/S79XhJYIkC8/m/GBvvWasoBQAJ

@gkcalat
Member

gkcalat commented Sep 2, 2022

@annajung deployment of the rc.2 on GCP fails because of kubeflow/kubeflow#6604.

@annajung
Member Author

annajung commented Sep 2, 2022

Hey @gkcalat, manifest WG only supports Kustomize 3.2.0 and any other version especially Kustomize 4.x is not supported. My understanding is that they are waiting for #1797 (comment) to be resolved before being able to support Kustomize 4.x

kubeflow/kubeflow#6604 is a bug but it's a non-issue for those using Kustomize 3.2.0 and was not given priority or labeled as release-blocking due to those reasons.
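
For anyone who wants to guard against this in their own install scripts, here is a minimal sketch of a version check. It assumes, per the comment above, that 3.2.0 is the only supported Kustomize version. It simply takes the first MAJOR.MINOR.PATCH triple it finds in a string, so the exact format printed by your `kustomize` binary is not assumed; the sample strings and the function names are illustrative only.

```python
import re
from typing import Optional, Tuple

# Assumption taken from the comment above: only Kustomize 3.2.0 is supported.
SUPPORTED_KUSTOMIZE = (3, 2, 0)


def parse_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return the first MAJOR.MINOR.PATCH triple found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def is_supported(version_output: str) -> bool:
    """True only when the detected version equals the supported one."""
    return parse_version(version_output) == SUPPORTED_KUSTOMIZE


# Hypothetical version strings, not captured from a real kustomize binary.
for sample in ("KustomizeVersion:3.2.0", "kustomize/v4.5.7", "no version here"):
    print(sample, "->", is_supported(sample))
```

In practice the string passed to `is_supported` would be whatever your local `kustomize` prints for its version.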

tagging Manifest WG to chime in more if needed @kubeflow/wg-manifests-leads @kimwnasptd

@gkcalat
Member

gkcalat commented Sep 2, 2022

> Hey @gkcalat, manifest WG only supports Kustomize 3.2.0 and any other version especially Kustomize 4.x is not supported. My understanding is that they are waiting for #1797 (comment) to be resolved before being able to support Kustomize 4.x
>
> kubeflow/kubeflow#6604 is a bug but it's a non-issue for those using Kustomize 3.2.0 and was not given priority or labeled as release-blocking due to those reasons.
>
> tagging Manifest WG to chime in more if needed @kubeflow/wg-manifests-leads @kimwnasptd

Thank you.

Are we going to leave users who used newer kustomize versions outside the boat? Are we going to ask users who have KF 1.5 to downgrade kustomize?

That comment on kustomize 4.x support is a year old. The issue I mentioned is the only blocker that was introduced in a recent PR. Why don't we resolve that small bug (literally a few lines in a single file) and move forward with the release?

@annajung
Member Author

annajung commented Sep 2, 2022

Hi everyone, it looks like the Notebook WG was already looking at the PR and had finished testing the issue. Therefore, they were able to provide a new Notebook RC.3 that includes the fix for kubeflow/kubeflow#6604.

With that, we created a new Kubeflow RC.3. It updates the notebook version to RC.3, which only includes one fix: https://github.com/kubeflow/manifests/releases/tag/v1.6.0-rc.3

Thanks @kimwnasptd for your help today!

@yhwang
Member

yhwang commented Sep 2, 2022

Finished RC.3 testing on the IKS distribution, all good!
Actually, I tested against RC.2, and the deployment YAML files generated by kustomize for RC.2 and RC.3 are identical. So consider RC.3 testing finished!

Thanks to the release team for their effort in making RC.3 available.
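
For other distributions that want to repeat this kind of comparison, a minimal sketch follows. It assumes you have already rendered each RC to a single YAML string (for example, by saving the kustomize build output to a file and reading it back). The function name and the tiny YAML snippets are placeholders, not the real Kubeflow manifests.

```python
import difflib
from typing import List


def manifest_diff(old_yaml: str, new_yaml: str) -> List[str]:
    """Return unified-diff lines between two rendered manifests (empty if identical)."""
    return list(
        difflib.unified_diff(
            old_yaml.splitlines(),
            new_yaml.splitlines(),
            fromfile="rc.2",
            tofile="rc.3",
            lineterm="",
        )
    )


# Placeholder manifests standing in for the rendered RC.2 and RC.3 output.
rc2 = "image: example/notebook:v1\nreplicas: 1\n"
rc3 = "image: example/notebook:v1\nreplicas: 1\n"

diff = manifest_diff(rc2, rc3)
print("identical" if not diff else "\n".join(diff))
```

An empty diff means the rendered output is the same text, which is the situation described above for RC.2 and RC.3 on IKS.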

@gkcalat
Member

gkcalat commented Sep 7, 2022

Looking good on GCP. Kudos to @annajung and @kimwnasptd!

@juliusvonkohout
Member

Please beware of two severe regressions in 1.6. Hopefully they will be fixed in 1.6.1.

  1. [bug] Pipeline metrics Invalid input error: Unknown execution spec pipelines#8256
  2. Culling prevents re-starting culled Notebooks  kubeflow#6648

@annajung
Member Author

annajung commented Oct 3, 2022

Hi everyone, @surajkota brought up a really good point about pipelines alpha.5, therefore, I want to give the community a few more days to test against Kubeflow 1.6.1-rc.0 before cutting the final release.

The release team previously mentioned that we would have a final 1.6.1 cut available today, Oct 3rd, but we will delay cutting the final release until next Monday, October 10th.

Please help test against 1.6.1-rc.0 and bring up any issues identified in the 1.6 tracking issue

@annajung
Member Author

Hi everyone, with no other issues identified, we went ahead and cut the final 1.6.1 release!

If using 1.6.1, please note that you also need to update https://www.kubeflow.org/docs/started/installing-kubeflow/

More info about the 1.6.1 release can be found in kubeflow-discuss announcement: https://groups.google.com/g/kubeflow-discuss/c/amsxyXbY_nk/m/FaWxOd4VBAAJ

@annajung
Member Author

With 1.6.1 released and 1.7 release started, closing out the issue.

FYI, the call for distribution participation in 1.7 is also available.
