[WIP] fix(backend): implement subdag output resolution #11196
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing /approve in a comment.
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> Co-authored-by: CarterFendley <[email protected]>
Force-pushed from af3c3e1 to 1e62d0d.
```diff
@@ -125,6 +126,8 @@ func RootDAG(ctx context.Context, opts Options, mlmd *metadata.Client) (executio
 			err = fmt.Errorf("driver.RootDAG(%s) failed: %w", opts.info(), err)
 		}
 	}()
+	b, _ := json.Marshal(opts)
+	glog.V(4).Info("RootDAG opts: ", string(b))
```
We added a ton of debug level logs to make debugging stuff like this easier for the next person. We need to add some handling in the backend to support toggling level 4 logs in the driver on and off.
Happy to jump on a call if synchronous questions / feedback is easier. Although concise, these changes are quite convoluted.
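(For illustration, here's a minimal, self-contained sketch of the standard glog behavior these V(4) logs rely on; how the verbosity flag would actually be plumbed through the backend is still the open question above, so the wiring below is only an assumption.)

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// glog registers -v, -logtostderr, etc. on the default flag set;
	// running the process with -v=4 (or higher) enables the V(4) debug logs.
	flag.Parse()

	// Emitted only when verbosity >= 4.
	glog.V(4).Info("RootDAG opts: <serialized opts>")

	// Always emitted.
	glog.Info("driver finished")
	glog.Flush()
}
```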
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> Co-authored-by: CarterFendley <[email protected]>
We just pushed up a commit that implements support for multiple layers of nested subdags (i.e. subdags of subdags). We validated that it behaves as expected with the following example code:

```python
from kfp import dsl
from kfp.client import Client


@dsl.component
def small_comp() -> str:
    return "privet"


@dsl.component
def large_comp(input: str):
    print("input :", input)


@dsl.pipeline
def small_matroushka_doll() -> str:
    task = small_comp()
    task.set_caching_options(False)
    return task.output


@dsl.pipeline
def medium_matroushka_doll() -> str:
    dag_task = small_matroushka_doll()
    dag_task.set_caching_options(False)
    return dag_task.output


@dsl.pipeline
def large_matroushka_doll():
    dag_task = medium_matroushka_doll()
    task = large_comp(input=dag_task.output)
    task.set_caching_options(False)
    dag_task.set_caching_options(False)


if __name__ == "__main__":
    client = Client()
    run = client.create_run_from_pipeline_func(
        pipeline_func=large_matroushka_doll,
        enable_caching=False,
    )
```

PS. I hate matroushka dolls, they're so full of themselves.
So this PR handles subdag output parameters but not subdag output artifacts. We're going to add some logic to handle the latter as well since the problems are similar.
Signed-off-by: zazulam <[email protected]> Co-authored-by: droctothorpe <[email protected]>
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> Co-authored-by: CarterFendley <[email protected]> Co-authored-by: edmondop <[email protected]>
Force-pushed from 1cb4db8 to a0a7b7b.
We just added and validated support for output artifacts as well, which addresses #10041. Here's a screenshot from a pipeline with nested DAGs and output artifacts that executed successfully:

[screenshot]

Here's the example code:

```python
from kfp import dsl
from kfp.client import Client
from kfp.compiler import Compiler


@dsl.component
def inner_comp(dataset: dsl.Output[dsl.Dataset]):
    with open(dataset.path, "w") as f:
        f.write("foobar")


@dsl.component
def outer_comp(input: dsl.Dataset):
    print("input: ", input)


@dsl.pipeline
def inner_pipeline() -> dsl.Dataset:
    inner_comp_task = inner_comp()
    inner_comp_task.set_caching_options(False)
    return inner_comp_task.output


@dsl.pipeline
def outer_pipeline():
    inner_pipeline_task = inner_pipeline()
    outer_comp_task = outer_comp(input=inner_pipeline_task.output)
    outer_comp_task.set_caching_options(False)


if __name__ == "__main__":
    # Compiler().compile(outer_pipeline, "ignore/subdag_artifacts.yaml")
    client = Client()
    run = client.create_run_from_pipeline_func(
        pipeline_func=outer_pipeline,
        enable_caching=False,
    )
```

There's still a lot more work to be done in terms of testing, decomposition, making the code more consistent and DRY, etc., but it works and it did not work before, so hooray for progress.
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]>
Force-pushed from 58bed92 to ee7f6c9.
Signed-off-by: droctothorpe <[email protected]>
Just pushed up a commit that decomposes the graph traversal logic to improve readability, reduce complexity, and make granular testing easier. The next order of business is multiple outputs and NamedTuples.
Hey folks, love that you are doing this, amazing stuff!!
I just had a skim and left some quick thoughts, I see that it's still WIP, so apologies if the comments are premature. Haven't had a chance to try it out yet.
The approach does make sense to me. Given that we are just writing spec data as execution properties, I think it makes sense to do it in the driver, since we already have this info at pipeline creation.
```go
if flattenedTasks == nil {
	flattenedTasks = make(map[string]*metadata.Execution)
}
currentExecutionTasks, err := mlmd.GetExecutionsInDAG(ctx, dag, pipeline)
```
Question: have you considered just getting all the executions for the context instead of doing a DFS filter here?
For example, GetExecutionsInDAG() is simply doing a get-executions call for the context but with a filter; without the filter it should just get all the executions for that particular context, so we wouldn't need to do multiple DB queries.
Task names, IIRC, should be unique. I suppose the only concern here would be if the pipeline is really large and has a lot of task executions, but I would think it would have to be unrealistically large for that to be an issue.
🔥 Right now, the number of calls is equal to the number of nested sub-DAGs. If the call sans filter gets all executions AND executions in sub-DAGs, that could definitely reduce the number of database queries. We'll test it out. Thanks for the suggestion!
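(To make the direction concrete, here's a rough sketch of the single-query approach; GetExecutionsInContext is a hypothetical helper, not an existing client method, and the sketch assumes it would live alongside the current driver code.)

```go
// flattenTasksInContext indexes every execution in the pipeline context by
// task name in one pass, instead of issuing one GetExecutionsInDAG query per
// nested sub-DAG. Assumes task names are unique within the context.
func flattenTasksInContext(
	ctx context.Context,
	mlmd *metadata.Client,
	pipeline *metadata.Pipeline,
) (map[string]*metadata.Execution, error) {
	// Hypothetical unfiltered variant of GetExecutionsInDAG.
	executions, err := mlmd.GetExecutionsInContext(ctx, pipeline)
	if err != nil {
		return nil, err
	}
	tasks := make(map[string]*metadata.Execution, len(executions))
	for _, e := range executions {
		tasks[e.TaskName()] = e
	}
	return tasks, nil
}
```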
backend/src/v2/driver/driver.go (Outdated)
```go
json.Unmarshal(b, &outputParametersMap)
glog.V(4).Info("Deserialized outputParametersMap: ", outputParametersMap)
subTaskName := outputParametersMap["producer_subtask"]
glog.V(4).Infof(
	"Overriding currentTask, %v, output with currentTask's producer_subtask, %v, output.",
	currentTask.TaskName(),
	subTaskName,
)

// Reassign sub-task before running through the loop again.
currentTask = tasks[subTaskName]
```
Might need to handle some of these potential error cases, for example (see the sketch below):
- if the producer task is not in outputParametersMap
- subTaskName is not in tasks
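(Sketch of the guarded lookups, reusing the variable names from the snippet above; the exact error returns are illustrative only.)

```go
// Guard both map lookups with the comma-ok idiom before dereferencing.
subTaskName, ok := outputParametersMap["producer_subtask"]
if !ok {
	return nil, fmt.Errorf("producer_subtask not found in outputs of task %q", currentTask.TaskName())
}
currentTask, ok = tasks[subTaskName]
if !ok {
	return nil, fmt.Errorf("producer sub-task %q not found in DAG executions", subTaskName)
}
```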
backend/src/v2/driver/driver.go (Outdated)
```go
outputParameterKey := value.GetValueFromParameter().OutputParameterKey
producerSubTask := value.GetValueFromParameter().ProducerSubtask
glog.V(4).Info("outputParameterKey: ", outputParameterKey)
glog.V(4).Info("producerSubtask: ", producerSubTask)

outputParameterMap := map[string]interface{}{
	"output_parameter_key": outputParameterKey,
	"producer_subtask":     producerSubTask,
}

outputParameterStruct, _ := structpb.NewValue(outputParameterMap)
```
how come you didn't use a DagOutputParameterSpec like you did for DagOutputArtifactSpec?
The logic we added to the mlmd client handles converting the artifact struct into a format suitable for storage in the database. Our reasoning was that that was an implementation detail that wasn't really relevant to the driver. That logic holds true for the output parameters as well, but we never refactored it to apply the principle. We'll move that logic out of driver.go and into client.go. Good call!
```go
if config.OutputParameters != nil {
	e.CustomProperties[keyOutputs] = &pb.Value{Value: &pb.Value_StructValue{
		StructValue: &structpb.Struct{
			Fields: config.OutputParameters,
		},
	}}
}
if config.OutputArtifacts != nil {
	b, err := json.Marshal(config.OutputArtifacts)
	if err != nil {
		return nil, err
	}
	e.CustomProperties[keyOutputArtifacts] = StringValue(string(b))
}
```
One thing I'm thinking about is how we distinguish between outputs for container executions vs. DAG executions: for parameters in container executions they map to the actual resolved values, but for DAGs we're simply storing reference values to the producer tasks.
Similarly for artifacts: for container executions we store the artifact object-store metadata (pulled from artifact properties), but for DAG executions we are, again, storing the output producer spec data, right?
So I'm wondering if it makes sense to have a separate property entirely to distinguish these types. I don't have a concrete suggestion, maybe something like parameter_producer_tasks and artifact_producer_tasks 🤷🏾♂️ (see the sketch below).
I'm not sure, but at the moment, if we use outputs then it will probably show up on the UI for the executions page under the DAG but show something different than it does for container executions.
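(Rough sketch of that idea; the key names parameter_producer_tasks and artifact_producer_tasks are hypothetical, and the surrounding shape simply mirrors the client snippet above.)

```go
if config.OutputParameters != nil {
	// Reference spec (output_parameter_key / producer_subtask), not resolved values.
	e.CustomProperties["parameter_producer_tasks"] = &pb.Value{Value: &pb.Value_StructValue{
		StructValue: &structpb.Struct{
			Fields: config.OutputParameters,
		},
	}}
}
if config.OutputArtifacts != nil {
	// Artifact selector spec, not object-store metadata.
	b, err := json.Marshal(config.OutputArtifacts)
	if err != nil {
		return nil, err
	}
	e.CustomProperties["artifact_producer_tasks"] = StringValue(string(b))
}
```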
backend/src/v2/driver/driver.go (Outdated)
```go
ecfg.OutputParameters = map[string]*structpb.Value{
	value.GetValueFromParameter().OutputParameterKey: outputParameterStruct,
}
```
This looks like you are rewriting ecfg.OutputParameters with a new map every iteration; did you mean to update the map instead?
You're absolutely right! We would likely have hit a wall because of this when we started testing multiple outputs.
Really appreciate the time and effort you took to grok some not particularly grokkable code. Thank you for the feedback, @HumairAK!
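(For reference, the fix under discussion amounts to merging into the existing map instead of reassigning it, along these lines.)

```go
// Merge instead of overwrite, so every output parameter key survives the loop.
if ecfg.OutputParameters == nil {
	ecfg.OutputParameters = make(map[string]*structpb.Value)
}
ecfg.OutputParameters[value.GetValueFromParameter().OutputParameterKey] = outputParameterStruct
```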
Anytime! feel free to @ me here or slack if you'd like another review
Awesome! I spent like 2 hours trying to dig up where the overwriting of outputs was happening! Thank you @HumairAK
backend/src/v2/driver/driver.go (Outdated)
```go
// TODO(Bobgy): cache results
outputs, err := mlmd.GetOutputArtifactsByExecutionId(ctx, producer.GetID())
glog.V(4).Infof("Deserialized outputArtifacts: %v", outputArtifacts)
artifactSelectors := outputArtifacts["Output"].GetArtifactSelectors()
```
Key "Output" only works when a component has a single output. We should loop over the keys here.
```go
for k, value := range outputArtifacts {
	glog.V(4).Infof("k: %v", k)
	glog.V(4).Infof("outputArtifacts[k]: %v", value)
	artifactSelectors := value.GetArtifactSelectors()
	for _, v := range artifactSelectors {
		glog.V(4).Infof("v: %v", v)
		glog.V(4).Infof("v.ProducerSubtask: %v", v.ProducerSubtask)
		glog.V(4).Infof("v.OutputArtifactKey: %v", v.OutputArtifactKey)
		subTaskName = v.ProducerSubtask
		outputArtifactKey = v.OutputArtifactKey
	}
}
```
Note: This change breaks the lookup loop. I think we'll have to go to full recursion.
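(A very rough sketch of what full recursion could look like for the artifact case; getArtifactSelectors and the exact signature are hypothetical, not part of the current code.)

```go
// resolveDAGOutputArtifact follows an output key down through producer
// sub-tasks until it reaches an execution that produced the artifact itself.
func resolveDAGOutputArtifact(
	tasks map[string]*metadata.Execution,
	task *metadata.Execution,
	outputKey string,
) (*metadata.Execution, string, error) {
	selectors := getArtifactSelectors(task, outputKey) // hypothetical accessor
	if len(selectors) == 0 {
		// Base case: this execution is the actual producer.
		return task, outputKey, nil
	}
	// With multiple selectors, each would need its own resolution; take the first here.
	s := selectors[0]
	subTask, ok := tasks[s.ProducerSubtask]
	if !ok {
		return nil, "", fmt.Errorf("producer sub-task %q not found", s.ProducerSubtask)
	}
	return resolveDAGOutputArtifact(tasks, subTask, s.OutputArtifactKey)
}
```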
backend/src/v2/driver/driver.go (Outdated)
```go
// corresponding producer sub-task, reassign currentTask,
// and iterate through this loop again.
var outputParametersMap map[string]string
b, err := outputParametersCustomProperty["Output"].GetStructValue().MarshalJSON()
```
Key "Output" only works when a component has a single output. We should loop over the keys here.
```go
for k, value := range outputParametersCustomProperty {
	var outputParametersMap map[string]string
	b, err := value.GetStructValue().MarshalJSON()
	if err != nil {
		return err
	}
	json.Unmarshal(b, &outputParametersMap)
	subTaskName := outputParametersMap["producer_subtask"]
	outputArtifactKey := outputParametersMap["output_artifact_key"]
}
```
Note: This breaks the loop logic.
Description of your changes:
This is a WIP PR intended to fix #10039. Additional functionality, tests, and a more detailed PR description to follow.
Checklist: