[backend] Cannot get MLMD objects from Metadata store. Cannot find context (Version 1.9.0-rc.2) #2800

Jithsaavvy · 2024-07-15T17:55:00Z

Version

master

Describe your issue

Environment

Kubeflow: v1.9.0-rc.2
Kubernetes: v1.27.4
Platform: On-premise Kubernetes cluster
Kubeflow Pipelines (KFP) - v2.2.0
KFP SDK - v2.8.0
OS: Ubuntu 22.04
Deployment:
- Using Kubeflow Manifests (without any specific distribution) from the master branch.
- v1.9.0-rc.2 was deployed (verified manually by checking the version of every deployed component against the release notes).

Description

Kubeflow was upgraded from v1.8.1 stable to v1.9.0-rc.2 as a clean redeployment (no stable v1.9 release is available yet) in order to use the latest KFP v2.2.0. Running a pipeline from the UI resulted in the following error:

Cannot get MLMD objects from Metadata store. Cannot find context with {"typeName":"system.PipelineRun" "contextName":"cc1bbc51-426f-4192-843a-bf4b94535a5b"}: Cannot find specified context

The same pipeline ran without any errors on the previous stable Kubeflow v1.8.1. As a sanity check, a sample pipeline from the documentation and an existing tutorial pipeline available in the KFP UI were also run; both resulted in the above error.

Inspection of the embedded MySQL pod showed that the pipeline context record was created in the mlpipeline database, as follows:

pipelines table

mysql> USE mlpipeline;
mysql> SELECT uuid, name, status from pipelines;
| uuid                                 | name                                | status |
|--------------------------------------|-------------------------------------|--------|
| 645a4823-8e01-432b-b6b1-75776d14c805 | [Tutorial] DSL - Control structures | READY  |

run_details table

mysql> SELECT uuid, displayname, pipelinecontextid, pipelineid, conditions from run_details;

| uuid                                 | displayname                                        | pipelinecontextid | pipelineid                           | conditions |
|--------------------------------------|----------------------------------------------------|-------------------|--------------------------------------|------------|
| cc1bbc51-426f-4192-843a-bf4b94535a5b | Run of [Tutorial] DSL - Control structures (be550) | 0                 | 645a4823-8e01-432b-b6b1-75776d14c805 | Failed     |

However, the execution-run context for the same pipeline was neither created nor referenced in the metadb database. Inspecting the pods in the kubeflow namespace showed that the ml-pipeline-api-server container in the ml-pipeline pod uses the mlpipeline database as backend storage for the pipeline component, while the metadata-controller pod uses the metadb database as backend storage for the MLMD store. It appears that metadb cannot find or access the pipeline context record from the mlpipeline database, or something similar. The connection to the MySQL pod is healthy, and the respective PVC is mounted, available and accessible.
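The cross-check described above can be sketched in a few lines of Python. This is only an illustration: it assumes that the MLMD context name of a run equals the run's uuid (the contextName in the error message matches the uuid in run_details), and the function name and input lists are hypothetical. Fetching the rows from the two databases is left out.

```python
def runs_missing_context(run_uuids, context_names):
    """Return the run uuids (in input order) that have no MLMD context of the same name."""
    known = set(context_names)
    return [uuid for uuid in run_uuids if uuid not in known]


# uuid copied from the run_details row above; the context list is empty
# because no matching context was found in metadb (assumed input).
run_uuids = ["cc1bbc51-426f-4192-843a-bf4b94535a5b"]
context_names = []

missing = runs_missing_context(run_uuids, context_names)
print(missing)
```

Any uuid returned here would be a run that the API server recorded but that the metadata store knows nothing about, which is the situation the error message describes.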

Note: The above description applies to any pipeline.

Expected Behavior

Pipeline run should succeed without any issues.

Current Behavior

When a pipeline run is triggered from the UI, a system-dag-driver pod is created in the KF user namespace and runs to completion successfully. After that, the KFP execution pod for the pipeline components is created and fails immediately, resulting in the above error.

Steps to reproduce the issue

Install Kubeflow v1.9.0-rc.2 using Kubeflow Manifests.
Copy the pipeline code or use the already existing tutorial pipeline from the UI and create a run from it.

Additional Context

The v1.9.0-rc.2 release notes state that it supports Kubernetes v1.27 - 1.29, but the README at this RC's release tag states that it targets Kubernetes v1.29+, which is a little confusing. My questions are:

What are the supported Kubernetes versions for KF v1.9?
Is the above issue a known bug in this RC version which will be patched in v1.9 stable release?
Is anyone else impacted by this issue or are there any solutions available?

juliusvonkohout · 2024-07-22T11:19:00Z

Hello, this should be fixed in the 1.9 branch and the final release today. Please try again with the current v1.9 branch, not the rc.2 tag, and reopen if it still occurs with "/reopen".

Jithsaavvy · 2024-07-24T10:13:33Z

Thanks @juliusvonkohout. Yes, the issue is fixed in v1.9 stable release.

nparkstar · 2024-09-04T05:21:26Z

/reopen

@Jithsaavvy, did you solve this issue?
I still have the same issue, even though I tried with Kubeflow v1.9.
I get the message "Cannot get MLMD objects from Metadata store."

env :
OS : Ubuntu 22.04
Kubernetes : v1.27
Kubeflow : upgrade to 1.9 (using "git clone -b v1.9.0 https://github.com/kubeflow/manifests.git")
KFP : kfp 2.7.0 (using "kfp --version")

I see the following messages when I run the command "kubectl logs -n kubeflow metadata-grpc-deployment-c568bd446-zpptp":

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0716 08:00:27.548267 1 metadata_store_server_main.cc:577] Server listening on 0.0.0.0:8080
W0902 12:52:33.509635 10 metadata_store_service_impl.cc:239] GetContextType failed: No type found for query, name: system.Pipeline, version: nullopt
W0902 12:52:33.568145 11 metadata_store_service_impl.cc:239] GetContextType failed: No type found for query, name: system.PipelineRun, version: nullopt
E0903 13:59:45.840103283 10 hpack_parser.cc:1216] Error parsing metadata: error=invalid value key=:method value=HEAD
E0903 14:49:58.423987147 11 hpack_parser.cc:1216] Error parsing metadata: error=invalid value key=content-type value=application/grpc-web-text
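
When the log is long, the context types reported as missing can be pulled out with a small filter. The following Python sketch is an assumption-laden illustration, not part of Kubeflow: the function name is hypothetical and the pattern is derived only from the two warning lines above. Whether these warnings are the cause of the GUI error is not established in this thread.

```python
import re

# Pattern derived from the "GetContextType failed" warnings shown above.
PATTERN = re.compile(
    r"GetContextType failed: No type found for query, name: ([^,\s]+)"
)


def missing_context_types(log_text):
    """Return the context type names reported as missing, in order, without duplicates."""
    seen = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = """\
W0902 12:52:33.509635 10 metadata_store_service_impl.cc:239] GetContextType failed: No type found for query, name: system.Pipeline, version: nullopt
W0902 12:52:33.568145 11 metadata_store_service_impl.cc:239] GetContextType failed: No type found for query, name: system.PipelineRun, version: nullopt
E0903 13:59:45.840103283 10 hpack_parser.cc:1216] Error parsing metadata: error=invalid value key=:method value=HEAD
"""

print(missing_context_types(sample))  # ['system.Pipeline', 'system.PipelineRun']
```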

Do you know what causes this?

juliusvonkohout · 2024-09-04T08:06:26Z

@nparkstar is this a pipelines or manifests issue? It looks like pipelines only.

nparkstar · 2024-09-05T03:58:30Z

> @nparkstar is this a pipelines or manifests issue? It looks like pipelines only.

Jithsaavvy said "Yes, the issue is fixed in v1.9 stable release."
I tried v1.9, but the issue remains, so I commented on this issue.
If my comments are not appropriate for this issue, I'll delete them.

juliusvonkohout · 2024-09-05T06:20:11Z

@nparkstar you probably need a new issue. Please test with a fresh kind cluster as described in our readme, to make sure that it is not specific to your Kubernetes cluster first.

nparkstar · 2024-09-05T06:41:33Z

> @nparkstar you probably need a new issue. Please test with a fresh kind cluster as described in our readme, to make sure that it is not specific to your Kubernetes cluster first.

Thank you for your response.
I'll test on a new system as you suggest and post the result.

nparkstar · 2024-09-06T06:40:01Z

I finally solved my issue.
I did a fresh install on a new machine, and the problem no longer appears.

However, I think the following instructions are wrong:
"Install individual components" at https://github.com/kubeflow/manifests/tree/v1.9.0.

I could not connect to the central dashboard after installing Kubeflow according to those instructions.

The installation succeeded with the command below.

while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
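
The one-liner re-runs the apply until it exits successfully; with a plain pipe, the shell reports the exit status of the last command, here kubectl apply. The same retry pattern can be sketched in Python. The function name and its parameters are hypothetical, and the `run` and `sleep` hooks exist only so the loop can be tried without a cluster.

```python
import subprocess
import time


def retry_until_success(command, delay_seconds=10, max_attempts=None,
                        run=None, sleep=time.sleep):
    """Run `command` through the shell until it exits with status 0.

    Returns the number of attempts used. Raises RuntimeError if
    `max_attempts` is set and every attempt fails.
    """
    if run is None:
        # Default runner: shell=True because the command contains a pipe.
        run = lambda cmd: subprocess.run(cmd, shell=True).returncode
    attempt = 0
    while True:
        attempt += 1
        if run(command) == 0:
            return attempt
        if max_attempts is not None and attempt >= max_attempts:
            raise RuntimeError(f"command still failing after {attempt} attempts")
        print("Retrying to apply resources")
        sleep(delay_seconds)


# Usage (not executed here; requires kustomize and kubectl on PATH):
# retry_until_success("kustomize build example | kubectl apply -f -", delay_seconds=10)
```

Unlike the shell loop, this version can be given a `max_attempts` limit so that a persistent failure surfaces as an error instead of looping forever.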

Thanks,

juliusvonkohout · 2024-09-06T08:56:59Z

Then please create a PR to fix it.

nparkstar · 2024-09-10T12:25:43Z

@juliusvonkohout

I created PRs.
#2873
#2874

This is my first time creating a PR.
If I did something wrong, please tell me.

Thanks,

vak890 · 2024-09-12T02:23:32Z

@juliusvonkohout I also get the same error when trying to deploy all components with:
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done

juliusvonkohout closed this as completed Jul 22, 2024

This was referenced Sep 10, 2024:

- Updating master README.md for Install individual components #2873 (Merged)
- commit message : Updating v1.9-branch README.md for Install individua… #2874 (Closed)
