[backend] Cannot get MLMD objects from Metadata store. Cannot find context (Version 1.9.0-rc.2) #2800
Comments
Hello, this should be fixed in the 1.9 branch and the final release today. Please try again with the current v1.9 branch, not the rc.2 tag, and reopen with "/reopen" if it still occurs.
Thanks @juliusvonkohout. Yes, the issue is fixed in the v1.9 stable release.
/reopen @Jithsaavvy, did you solve this issue? env: I can see the following message when I run `kubectl logs -n kubeflow metadata-grpc-deployment-c568bd446-zpptp`: `WARNING: Logging before InitGoogleLogging() is written to STDERR`. Do you know about it?
@nparkstar is this a pipelines or manifests issue? It looks like pipelines only.
Jithsaavvy said "Yes, the issue is fixed in v1.9 stable release."
@nparkstar you probably need a new issue. Please test with a fresh kind cluster, as described in our readme, to make sure that it is not specific to your Kubernetes cluster first.
I solved my issue at last, but I think the instruction above is wrong: I could not connect to the central dashboard after installing Kubeflow according to it. I succeeded after installing with the command below:

```shell
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
```

Thanks,
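The retry loop above can be sketched with a stand-in command (a toy demonstration; `fake_apply` is a hypothetical substitute for `kustomize build example | kubectl apply -f -` that fails twice before succeeding):

```shell
#!/bin/sh
# Stand-in for "kustomize build example | kubectl apply -f -":
# fails on the first two attempts, succeeds on the third.
attempts=0
fake_apply() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

# Same shape as the install loop: keep retrying until apply succeeds.
while ! fake_apply; do
  echo "Retrying to apply resources"
  sleep 1
done
echo "Applied after $attempts attempts"
```

The loop is useful because a first `kubectl apply` of the full manifests can fail on ordering (e.g. CRDs not yet registered); retrying lets later passes succeed.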
Then please create a PR to fix it.
This is my first time creating a PR. Thanks,
@juliusvonkohout I also get the same error when trying to deploy all components.
Validation Checklist
Version
master
Describe your issue
Environment
- v1.9.0-rc.2
- v1.27.4
- v2.2.0
- v2.8.0
- master branch

The v1.9.0-rc.2 version was deployed (manually checked every deployed component's version using the release notes).

Description
Kubeflow was upgraded from v1.8.1 stable to v1.9.0-rc.2 (since no stable release is available yet for v1.9) to use the latest KFP v2.2.0, as a clean redeployment. When attempting to run a pipeline via the UI, it resulted in the following error:
The same pipeline executed successfully, without any issues or errors, on the previous stable Kubeflow v1.8.1. As a sanity check, a sample pipeline from the documentation and an existing tutorial pipeline available within the KFP UI were also run; both resulted in the above error.

Upon inspection of the embedded MySQL pod, the pipeline context record was created in the `mlpipeline` database as follows:

mlpipeline table
run_details table
However, the execution-run context for the same pipeline was not created or referenced in the `metadb` database. Analysis of the pods in the kubeflow namespace revealed that the `ml-pipeline-api-server` container within the `ml-pipeline` pod uses the `mlpipeline` database as backend storage for the pipeline component, and the `metadata-controller` pod uses the `metadb` database as backend storage for the MLMD store. It appears that `metadb` cannot find or access the pipeline context record from the `mlpipeline` db, or something similar. The connection to the MySQL-db pod is stable, and the respective `pvc` is mounted, available and accessible.

Note: The above description applies to any pipeline.
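The missing-context symptom can be sketched with a toy lookup. This is a hypothetical simplification: sqlite3 stands in for the MySQL `metadb`, and the single-column `Context` table below is an invented stand-in for MLMD's real schema.

```python
import sqlite3

# Toy stand-in for the MLMD metadb: a minimal, hypothetical Context table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Context (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

# The pipeline-run context that the driver expects was never written here,
# so the lookup comes back empty -- the "Cannot find context" situation.
row = conn.execute(
    "SELECT id FROM Context WHERE name = ?", ("some-pipeline-run",)
).fetchone()
print("context found" if row else "Cannot find context")  # -> Cannot find context
```

The point of the sketch is that the record exists in the pipeline database (`mlpipeline`) but the MLMD store (`metadb`) is queried instead, so the lookup legitimately returns nothing.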
Expected Behavior
Pipeline run should succeed without any issues.
Current Behavior
When a pipeline run is triggered from the UI, a `system-dag-driver` pod is created in the KF user namespace and runs to completion successfully. After that, the KFP execution pod is created for the pipeline components and fails immediately, resulting in the above error.

Steps to reproduce the issue
Install v1.9.0-rc.2 using Kubeflow Manifests.

Additional Context

v1.9.0-rc.2's release notes state that it supports Kubernetes v1.27 - 1.29. But the README from this particular RC's release tag states that it targets Kubernetes v1.29+, which is a little confusing. My questions are:

Related Issues
Put here any screenshots or videos (optional)
No response