cannot save parameter /tmp/outputs/condition #9678

Closed · vsoch opened this issue Jun 23, 2023 · 15 comments · Fixed by #10459
@vsoch commented Jun 23, 2023

I've created a Kubeflow Pipelines deployment on GKE with the manifests directly from the repository (the ones in the tutorial, combined with the code examples here, led to validation errors). You can see the exact steps I'm taking to deploy the cluster, install Kubeflow, and generate, compile, and install the component here: https://github.com/converged-computing/flux-operator-component/tree/add/component#kubeflow-on-gke

The issue I run into is on a run: I don't see any evidence that my workflow is running, but instead get a strange error message about a parameter path under /tmp/outputs not existing. I also see that, although I haven't defined any outputs yet, a metadata output is added automatically:

I0622 23:49:18.534358      17 main.go:224] output ExecutorInput:{
  "inputs": {
    "parameterValues": {
      "command": "echo hello world",
      "image": "ghcr.io/flux-framework/flux-restful-api:latest",
      "local": true,
      "name": "hello-world-run-123",
      "namespace": "kubeflow",
      "nnodes": 2,
      "project": "llnl-flux"
    }
  },
  "outputs": {
    "outputFile": "/tmp/kfp_outputs/output_metadata.json"
  }
}

And the error:

time="2023-06-22T23:49:19.209Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-06-22T23:49:19.209Z" level=info msg="/tmp/outputs/pod-spec-patch -> /var/run/argo/outputs/parameters//tmp/outputs/pod-spec-patch" argo=true
time="2023-06-22T23:49:19.210Z" level=info msg="/tmp/outputs/cached-decision -> /var/run/argo/outputs/parameters//tmp/outputs/cached-decision" argo=true
time="2023-06-22T23:49:19.210Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"

I am new to Kubeflow so I apologize for my ignorance! I really like the idea conceptually, and hope the implementation can empower me to build components that work; I'm still getting my feet wet with this basic development setup. Thanks!

@liangzhupic

Same issue on my KFP deployment. Any solution?

@vsoch (Author) commented Jul 14, 2023

I haven't found any yet! I've moved away from Kubeflow Pipelines because there doesn't seem to be much support for it, and a lot of what I needed is being deprecated in the move from v1 to v2.

@LeeSangJun

I'm having the same issue +1

@zijianjoy (Collaborator)

/assign @chensun
Could this be related to #7629?

@LorenzoColombi

Has anyone found a solution?

@bibekyess

Any update? I am also facing a similar issue.

@LorenzoColombi

> Any update? I am also facing a similar issue.

I didn't find a solution, but a workaround for me was moving from pipelines v1 to v2 (I installed KFP 2.0.0) and adapting the code.
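
For reference, a minimal sketch of what that v1-to-v2 adaptation can look like with the KFP SDK; the component and pipeline below are illustrative, not taken from this thread:

from kfp import dsl

# KFP v1 style (kfp < 2.0) wrapped a plain function, e.g.:
#   import kfp.components as comp
#   add_op = comp.create_component_from_func(add, base_image="python:3.9")
# KFP v2 style (kfp >= 2.0) uses the @dsl.component decorator instead:

@dsl.component(base_image="python:3.9")
def add(a: int, b: int) -> int:
    return a + b

@dsl.pipeline(name="add-demo")
def add_pipeline(a: int = 1, b: int = 2):
    add(a=a, b=b)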

@bibekyess commented Oct 9, 2023

@LorenzoColombi Thank you for your prompt response. In my case, I hit this error when passing the input as a parameter in Dict format, with the resulting output also in Dict format. When I instead passed the input as an artifact (using Input[Dataset]) and the output as Output[Dataset], the problem was solved. In some components, passing a Dict directly as an input parameter also works, so I don't know the exact reason, but it seems a safe bet to always pass large data using artifacts instead of parameters.
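
A minimal sketch of that artifact-based workaround, assuming the KFP v2 SDK; the component and output names here are illustrative:

from kfp import dsl
from kfp.dsl import Dataset, Input, Output

@dsl.component
def produce(data_out: Output[Dataset]):
    # Imports must live inside the function body for lightweight components.
    import json
    # Write the dict to the artifact's backing file instead of
    # returning it as a Dict parameter.
    with open(data_out.path, "w") as f:
        json.dump({"rows": [1, 2, 3]}, f)

@dsl.component
def consume(data_in: Input[Dataset]):
    import json
    # Read the dict back from the artifact's backing file.
    with open(data_in.path) as f:
        print(json.load(f))

@dsl.pipeline(name="artifact-passing-demo")
def demo():
    consume(data_in=produce().outputs["data_out"])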

@DnPlas commented Oct 14, 2023

Hi folks, I am having a very similar issue (if not the same) running the "Data passing" and "DSL control structures" examples, both ending in messages like this:

time="2023-10-13T14:58:08.311Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-10-13T14:58:08.311Z" level=info msg="/tmp/outputs/pod-spec-patch -> /var/run/argo/outputs/parameters//tmp/outputs/pod-spec-patch" argo=true
time="2023-10-13T14:58:08.311Z" level=info msg="/tmp/outputs/cached-decision -> /var/run/argo/outputs/parameters//tmp/outputs/cached-decision" argo=true
time="2023-10-13T14:58:08.311Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"

I'm using Pipelines v2.0.1 and Argo 3.3.10 for this.

@chensun could it be possible that #8733 is related and is in fact an issue? In the UI I get the message "Cannot find context with {"typeName":"system.PipelineRun","contextName":"a5e7085e-ef10-48b2-a0a5-1ced3b93e2e5"}: Unknown Content-type received." and then the pod logs throw the "no such file or directory" message.
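
For context, a minimal control-structure pipeline of the kind that surfaces the condition parameter; this sketch assumes the KFP v2 SDK (dsl.Condition, as of 2.0.x) and is illustrative, not taken from the linked examples:

from kfp import dsl

@dsl.component
def coin_flip() -> str:
    # Imports must live inside the function body for lightweight components.
    import random
    return random.choice(["heads", "tails"])

@dsl.component
def announce(result: str):
    print("Got", result)

@dsl.pipeline(name="control-structures-demo")
def control_demo():
    flip = coin_flip()
    # dsl.Condition compiles to an Argo step whose driver declares a
    # "condition" output parameter, the path named in the errors above.
    with dsl.Condition(flip.output == "heads"):
        announce(result=flip.output)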

github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on Jan 16, 2024
@hbelmiro (Contributor) commented Feb 8, 2024

I've also faced this issue. Besides condition, it also happens for iteration-count.

time="2024-01-04T15:09:35.338Z" level=error msg="cannot save parameter /tmp/outputs/iteration-count" argo=true error="open /tmp/outputs/iteration-count: no such file or directory"
time="2024-01-04T15:09:35.338Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"

I sent #10459 to fix it.

stale bot removed the lifecycle/stale label on Feb 8, 2024
hbelmiro added a commit to hbelmiro/data-science-pipelines that referenced this issue Feb 22, 2024
openshift-merge-bot added a commit to opendatahub-io/data-science-pipelines that referenced this issue Feb 22, 2024
fix(backend): fixes "cannot save parameter" error message. Fixes kubeflow#9678 (kubeflow#10459)
petethegreat pushed a commit to petethegreat/pipelines that referenced this issue Mar 27, 2024
petethegreat pushed a commit to petethegreat/pipelines that referenced this issue Mar 29, 2024
@thesuperzapper (Member)

@hbelmiro I am still seeing this error for the /tmp/outputs/pod-spec-patch path in KFP 2.1.0 (which contains your fix from #10459).

Specifically, all V2 pipeline runs show this error at the end:

time="2024-04-19T17:26:17.124Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"

However, the /tmp/outputs/cached-decision and /tmp/outputs/condition parameters are working now.

@hbelmiro (Contributor)

@thesuperzapper I couldn't reproduce the error with the hello world pipeline.
Do you have an example to share?

@thesuperzapper (Member)

@hbelmiro just run it again, because the issue only happens for cached steps.
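
A minimal reproduction sketch based on that observation: submit an identical pipeline twice, so the second run's steps are cache hits. The endpoint and pipeline here are hypothetical:

from kfp import dsl
from kfp.client import Client

@dsl.component
def say_hello() -> str:
    return "hello"

@dsl.pipeline(name="cache-repro")
def cache_repro():
    say_hello()

client = Client(host="http://localhost:8080")  # hypothetical KFP endpoint
client.create_run_from_pipeline_func(cache_repro)  # first run: executes normally
client.create_run_from_pipeline_func(cache_repro)  # second run: cache hit, shows the error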

@hbelmiro (Contributor)

@thesuperzapper got it. On the second run, I can see the errors.
I opened a new issue and will work on that.
Thank you for the feedback.

Status: Closed · 10 participants