[backend] "cannot save parameter" for cached steps #10729

Open
hbelmiro opened this issue Apr 23, 2024 · 8 comments

hbelmiro (Contributor) commented Apr 23, 2024

When a simple V2 pipeline is run more than once, the following errors occur:

time="2024-04-23T12:22:21.218Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-04-23T12:22:21.218Z" level=info msg="/tmp/outputs/cached-decision -> /var/run/argo/outputs/parameters//tmp/outputs/cached-decision" argo=true
time="2024-04-23T12:22:21.218Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
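
Each `cannot save parameter` line means the Argo executor tried to open an output-parameter file under `/tmp/outputs/` that did not exist, while a `... -> /var/run/argo/outputs/parameters/...` line means the file was collected. When comparing runs (first run vs. cached run, or the other logs in this thread), a small helper that classifies each parameter from the log text can make the difference obvious. This is a minimal sketch that assumes only the two log-line shapes shown above; `classify_outputs` is a hypothetical helper, not part of KFP or Argo:

```python
import re
from typing import Dict

# The two Argo executor line shapes seen in this issue (assumed format):
#   msg="cannot save parameter /tmp/outputs/NAME" ...                       -> file was missing
#   msg="/tmp/outputs/NAME -> /var/run/argo/outputs/parameters//tmp/..."    -> file was saved
_MISSING = re.compile(r'msg="cannot save parameter (/tmp/outputs/[^"\s]+)"')
_SAVED = re.compile(r'msg="(/tmp/outputs/[^"\s]+) -> ')


def classify_outputs(log_text: str) -> Dict[str, str]:
    """Map each /tmp/outputs parameter name to 'saved' or 'missing'."""
    status: Dict[str, str] = {}
    for line in log_text.splitlines():
        missing = _MISSING.search(line)
        if missing:
            status[missing.group(1).rsplit("/", 1)[-1]] = "missing"
            continue
        saved = _SAVED.search(line)
        if saved:
            status[saved.group(1).rsplit("/", 1)[-1]] = "saved"
    return status


if __name__ == "__main__":
    log = '''\
time="2024-04-23T12:22:21.218Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-04-23T12:22:21.218Z" level=info msg="/tmp/outputs/cached-decision -> /var/run/argo/outputs/parameters//tmp/outputs/cached-decision" argo=true
time="2024-04-23T12:22:21.218Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
'''
    print(classify_outputs(log))
    # {'pod-spec-patch': 'missing', 'cached-decision': 'saved', 'condition': 'missing'}
```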

Pipeline sample:

# PIPELINE DEFINITION
# Name: hello-pipeline
# Inputs:
#    recipient: str
# Outputs:
#    Output: str
components:
  comp-say-hello:
    executorLabel: exec-say-hello
    inputDefinitions:
      parameters:
        name:
          parameterType: STRING
    outputDefinitions:
      parameters:
        Output:
          parameterType: STRING
deploymentSpec:
  executors:
    exec-say-hello:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - say_hello
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef say_hello(name: str) -> str:\n    hello_text = f'Hello, {name}!'\n\
          \    print(hello_text)\n    return hello_text\n\n"
        image: python:3.7
pipelineInfo:
  name: hello-pipeline
root:
  dag:
    outputs:
      parameters:
        Output:
          valueFromParameter:
            outputParameterKey: Output
            producerSubtask: say-hello
    tasks:
      say-hello:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-say-hello
        inputs:
          parameters:
            name:
              componentInputParameter: recipient
        taskInfo:
          name: say-hello
  inputDefinitions:
    parameters:
      recipient:
        parameterType: STRING
  outputDefinitions:
    parameters:
      Output:
        parameterType: STRING
schemaVersion: 2.1.0
sdkVersion: kfp-2.7.0

This is related to #9678 (comment).

Impacted by this bug? Give it a 👍.

hbelmiro (Contributor, Author) commented:

/assign @hbelmiro

leanaha commented Jun 7, 2024

Hi @hbelmiro, any update on this?

I migrated my company's pipelines to be compliant with KFP v2, and they are throwing these errors:

time="2024-06-07T17:29:06.435Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-06-07T17:29:06.436Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-06-07T17:29:06.436Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-06-07T17:29:06.436Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"

hbelmiro (Contributor, Author) commented Jun 7, 2024

Hi @leanaha.
I haven't had time to work on it yet.
Feel free to send a PR if you know how to fix it. I can help with the review.

/unassign @hbelmiro

github-actions bot commented Aug 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on Aug 7, 2024.
AndersBennedsgaard commented:

Still relevant

stale bot removed the lifecycle/stale label on Aug 7, 2024.
hbelmiro (Contributor, Author) commented Aug 7, 2024

/lifecycle frozen
/remove-lifecycle stale

lost-io commented Aug 16, 2024

Potential solution (it may not apply to every setup).

We had a similar issue in our cluster, which is based on Rancher Kubernetes Engine 2 (RKE2).
The problem was not Kubeflow Pipelines itself, but the pipeline container being unable to communicate with the ml-pipeline controller because of networking/network policies.
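
Before changing any policies, it can help to check whether a pod in the profile namespace can open a TCP connection to the API server at all. This is a minimal sketch using only the standard library; it assumes the KFP API server is reachable as the `ml-pipeline` Service in the `kubeflow` namespace on port 8887 (the port opened in the policy in this comment), so adjust host and port for your installation. `can_connect` is a hypothetical helper:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts (socket.timeout is a
        # subclass) and DNS resolution failures (socket.gaierror is a subclass).
        return False
```

For example, call `can_connect("ml-pipeline.kubeflow.svc.cluster.local", 8887)` from a Python shell inside a pod in the profile namespace (for instance via `kubectl exec`). A `False` that only arrives after the full timeout is consistent with traffic being silently dropped by a NetworkPolicy, whereas an immediate `False` more often points to a DNS failure or a refused connection.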

We applied something like the following to the Kubeflow profile namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-ml-pipeline-controller
  namespace: profile-namespace  # replace with your Kubeflow profile namespace
spec:
  podSelector: {}  # required field; an empty selector applies the policy to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - ports:
      - port: 8887
        protocol: TCP
      to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kubeflow
      - podSelector:
          matchLabels:
            app: ml-pipeline
            app.kubernetes.io/name: kubeflow-pipelines

This may not be fine-grained enough, but you get the idea.
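
In this policy the two entries under `to:` are separate peers. Kubernetes ORs separate peers and ANDs the selectors inside a single peer, and a peer that has only a `podSelector` matches pods in the policy's own namespace. So the first peer allows port 8887 to every pod in `kubeflow`, while the second only matches pods labelled like `ml-pipeline` inside `profile-namespace`; putting both selectors in one peer would restrict egress to pods in `kubeflow` that carry those labels. The model below is a simplified sketch of those matching rules: it handles only `matchLabels` (no `matchExpressions`, `ipBlock`, or ports), and the `Pod`/`Peer` types are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Labels = Dict[str, str]


@dataclass
class Pod:
    namespace: str
    labels: Labels
    namespace_labels: Labels = field(default_factory=dict)


@dataclass
class Peer:
    # None means the selector is absent from the peer; {} means "match everything".
    namespace_selector: Optional[Labels] = None
    pod_selector: Optional[Labels] = None


def _matches(selector: Labels, labels: Labels) -> bool:
    # matchLabels semantics: every key/value pair in the selector must be present.
    return all(labels.get(k) == v for k, v in selector.items())


def peer_matches(peer: Peer, pod: Pod, policy_namespace: str) -> bool:
    if peer.namespace_selector is None:
        # No namespaceSelector: only pods in the policy's own namespace qualify.
        if pod.namespace != policy_namespace:
            return False
    elif not _matches(peer.namespace_selector, pod.namespace_labels):
        return False
    # Selectors inside one peer are AND-ed.
    if peer.pod_selector is not None and not _matches(peer.pod_selector, pod.labels):
        return False
    return True


def egress_allowed(peers: List[Peer], pod: Pod, policy_namespace: str) -> bool:
    # Separate peers in a `to:` list are OR-ed.
    return any(peer_matches(p, pod, policy_namespace) for p in peers)


if __name__ == "__main__":
    kubeflow_ns = {"kubernetes.io/metadata.name": "kubeflow"}
    ml_pipeline = {"app": "ml-pipeline", "app.kubernetes.io/name": "kubeflow-pipelines"}
    api_server = Pod("kubeflow", ml_pipeline, kubeflow_ns)
    other_pod = Pod("kubeflow", {"app": "other"}, kubeflow_ns)

    as_written = [Peer(namespace_selector=kubeflow_ns), Peer(pod_selector=ml_pipeline)]
    combined = [Peer(namespace_selector=kubeflow_ns, pod_selector=ml_pipeline)]

    for name, peers in (("as written", as_written), ("combined", combined)):
        print(name,
              egress_allowed(peers, api_server, "profile-namespace"),
              egress_allowed(peers, other_pod, "profile-namespace"))
    # as written True True
    # combined True False
```

The broader "as written" form is more permissive, which may be why it works in practice; the combined form is tighter but only works if the API server pods actually carry both labels.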


Running the say-hello example as a recurring pipeline:

Without the NetworkPolicy:

time="2024-08-16T10:38:06.360Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-08-16T10:38:06.360Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-08-16T10:38:06.360Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-08-16T10:38:06.360Z" level=info msg="/tmp/outputs/condition -> /var/run/argo/outputs/parameters//tmp/outputs/condition" argo=true
Error: exit status 1

With the NetworkPolicy:

time="2024-08-16T10:39:46.856Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-08-16T10:39:46.856Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-08-16T10:39:46.856Z" level=info msg="/tmp/outputs/cached-decision -> /var/run/argo/outputs/parameters//tmp/outputs/cached-decision" argo=true
time="2024-08-16T10:39:46.856Z" level=info msg="/tmp/outputs/condition -> /var/run/argo/outputs/parameters//tmp/outputs/condition" argo=true

Hope this solves the issue for others.

hbelmiro (Contributor, Author) commented Sep 3, 2024

/assign
