[bug] Cannot Pass Artifacts from Parent Component to Children within dsl.ParallelFor #10149

Closed
TristanGreathouse opened this issue Oct 25, 2023 · 0 comments · Fixed by #10162

TristanGreathouse commented Oct 25, 2023

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Using AWS distribution of Kubeflow.

  • KFP version:
    Built from 1.8 branch the day before 1.8-rc2 was released.

  • KFP SDK version:

kfp                        2.3.0                  
kfp-kubernetes             1.0.0                  
kfp-pipeline-spec          0.2.2                  
kfp-server-api             2.0.2

Steps to reproduce

This problem occurs when passing an artifact from a component outside a ParallelFor to a component inside it. The simple example below reproduces it. You will need to change the pipeline_root and the AWS_REGION environment variable values; other than that, it should work out of the box.

import kfp
from kfp import dsl
from kfp.dsl import Output, Input, Dataset

@dsl.component(packages_to_install=['pandas'])
def parent(output_path: Output[Dataset]):
    import pandas as pd
    data = pd.DataFrame({"a": [1,2,3], "b": [4,5,6]})
    data.to_csv(output_path.path, index=False)

@dsl.component(packages_to_install=['pandas'])
def child(input_path: Input[Dataset], output_path: Output[Dataset]):
    import pandas as pd
    df = pd.read_csv(input_path.path)
    df.to_csv(output_path.path, index=False)


@dsl.pipeline(pipeline_root="s3://beta-kf-1-8-test")
def compile_pipeline(l: list = [1, 2]):
    parent_component = parent()
    parent_component.set_env_variable(name="AWS_REGION", value="us-east-1")
    with dsl.ParallelFor(l) as args:
        child_component = child(input_path=parent_component.output)
        child_component.set_env_variable(name="AWS_REGION", value="us-east-1")
    
if __name__ == "__main__":
    kfp.compiler.Compiler().compile(compile_pipeline, "test_pipeline.yaml")

When we run a pipeline compiled from the snippet above, the artifact cannot be passed to the child node, and the driver for the child task fails with the following error.

I1024 23:57:02.344644      20 driver.go:771] parent DAG input parameters map[pipelinechannel--l-loop-item:string_value:"2"]
F1024 23:57:02.344738      20 main.go:76] KFP driver: driver.Container(pipelineName=compile-pipeline, runID=385130ec-9bfd-4d88-af2c-a0bf563d6e47, task="child", component="comp-child", dagExecutionID=77983, componentSpec) failed: failed to resolve inputs: failed to resolve input artifact input_path with spec component_input_artifact:"pipelinechannel--parent-output_path": component input artifact not implemented yet
time="2023-10-24T23:57:03.117Z" level=info msg="sub-process exited" argo=true error="<nil>"

Please let us know if we're doing something wrong, or if there is another way to work around this.
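To check whether a compiled pipeline contains the wiring named in the error, one option is to scan the pipeline spec for task inputs of kind componentInputArtifact. The sketch below is only an illustration: the SPEC dictionary is a hand-written fragment that imitates the layout of a compiled spec (the names comp-for-loop-1 and for-loop-1 are assumptions, not actual compiler output for the pipeline above), and find_component_input_artifacts is a hypothetical helper, not part of the KFP SDK.

```python
# Hand-written fragment imitating the layout of a compiled KFP v2 pipeline spec.
# Illustrative only: "comp-for-loop-1" and "for-loop-1" are assumed names,
# not real compiler output for the pipeline in this issue.
SPEC = {
    "components": {
        "comp-for-loop-1": {
            "dag": {
                "tasks": {
                    "child": {
                        "inputs": {
                            "artifacts": {
                                "input_path": {
                                    "componentInputArtifact": "pipelinechannel--parent-output_path"
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "root": {
        "dag": {
            "tasks": {
                "parent": {},
                "for-loop-1": {
                    "inputs": {
                        "artifacts": {
                            "pipelinechannel--parent-output_path": {
                                "taskOutputArtifact": {
                                    "producerTask": "parent",
                                    "outputArtifactKey": "output_path",
                                }
                            }
                        }
                    }
                },
            }
        }
    },
}


def find_component_input_artifacts(spec):
    """Return (component, task, input_name, channel) tuples for every task
    artifact input that is wired to an input of its enclosing component."""
    hits = []
    # Treat the root DAG like any other component that holds tasks.
    dags = dict(spec.get("components", {}))
    if "root" in spec:
        dags["root"] = spec["root"]
    for comp_name, comp in dags.items():
        tasks = comp.get("dag", {}).get("tasks", {})
        for task_name, task in tasks.items():
            artifacts = task.get("inputs", {}).get("artifacts", {})
            for input_name, source in artifacts.items():
                channel = source.get("componentInputArtifact")
                if channel is not None:
                    hits.append((comp_name, task_name, input_name, channel))
    return hits


hits = find_component_input_artifacts(SPEC)
for comp_name, task_name, input_name, channel in hits:
    print(f"{comp_name}: task {task_name!r} input {input_name!r} <- {channel}")
```

To run the same check against a real compiled file such as test_pipeline.yaml, load it into a dictionary first (for example with a YAML parser) and pass the result to the helper. Any hit corresponds to an input that the driver in this report would fail to resolve.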

Expected result

We should be able to pass artifacts from outside the ParallelFor to components within the ParallelFor without issue.

Materials and Reference

It seems like this may be related to #10041 and #10039, but the manifestation of the error is different.


Impacted by this bug? Give it a 👍.

@TristanGreathouse changed the title from "[bug] <Cannot Pass Artifacts from Parent Component to Children within dsl.ParallelFor>" to "[bug] Cannot Pass Artifacts from Parent Component to Children within dsl.ParallelFor" on Oct 25, 2023
@chensun self-assigned this on Oct 26, 2023