[WIP] fix(backend): implement subdag output resolution #11196
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing /approve in a comment.
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> Co-authored-by: CarterFendley <[email protected]>
Force-pushed from af3c3e1 to 1e62d0d.
```diff
@@ -125,6 +126,8 @@ func RootDAG(ctx context.Context, opts Options, mlmd *metadata.Client) (executio
 			err = fmt.Errorf("driver.RootDAG(%s) failed: %w", opts.info(), err)
 		}
 	}()
+	b, _ := json.Marshal(opts)
+	glog.V(4).Info("RootDAG opts: ", string(b))
```
We added a ton of debug level logs to make debugging stuff like this easier for the next person. We need to add some handling in the backend to support toggling level 4 logs in the driver on and off.
Happy to jump on a call if synchronous questions / feedback is easier. Although concise, these changes are quite convoluted.
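(For illustration, here's a minimal, self-contained sketch of the standard glog behavior these V(4) logs rely on; how the verbosity flag would actually be plumbed through the backend is still the open question above, so the wiring below is only an assumption.)

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// glog registers -v, -logtostderr, etc. on the default flag set;
	// running the process with -v=4 (or higher) enables the V(4) debug logs.
	flag.Parse()

	// Emitted only when verbosity >= 4.
	glog.V(4).Info("RootDAG opts: <serialized opts>")

	// Always emitted.
	glog.Info("driver finished")
	glog.Flush()
}
```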
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> Co-authored-by: CarterFendley <[email protected]>
We just pushed up a commit that implements support for multiple layers of nested subdags (i.e. subdags of subdags). We validated that it behaves as expected with the following example code:

```python
from kfp import dsl
from kfp.client import Client


@dsl.component
def small_comp() -> str:
    return "privet"


@dsl.component
def large_comp(input: str):
    print("input :", input)


@dsl.pipeline
def small_matroushka_doll() -> str:
    task = small_comp()
    task.set_caching_options(False)
    return task.output


@dsl.pipeline
def medium_matroushka_doll() -> str:
    dag_task = small_matroushka_doll()
    dag_task.set_caching_options(False)
    return dag_task.output


@dsl.pipeline
def large_matroushka_doll():
    dag_task = medium_matroushka_doll()
    task = large_comp(input=dag_task.output)
    task.set_caching_options(False)
    dag_task.set_caching_options(False)


if __name__ == "__main__":
    client = Client()
    run = client.create_run_from_pipeline_func(
        pipeline_func=large_matroushka_doll,
        enable_caching=False,
    )
```

PS. I hate matroushka dolls, they're so full of themselves.
So this PR handles subdag output parameters but not subdag output artifacts. We're going to add some logic to handle the latter as well since the problems are similar.
Signed-off-by: zazulam <[email protected]> Co-authored-by: droctothorpe <[email protected]>
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> Co-authored-by: CarterFendley <[email protected]> Co-authored-by: edmondop <[email protected]>
Force-pushed from 1cb4db8 to a0a7b7b.
We just added and validated support for output artifacts as well, which addresses #10041. Here's a screenshot from a pipeline with nested DAGs and output artifacts that executed successfully:

[screenshot]

Here's the example code:

```python
from kfp import dsl
from kfp.client import Client
from kfp.compiler import Compiler


@dsl.component
def inner_comp(dataset: dsl.Output[dsl.Dataset]):
    with open(dataset.path, "w") as f:
        f.write("foobar")


@dsl.component
def outer_comp(input: dsl.Dataset):
    print("input: ", input)


@dsl.pipeline
def inner_pipeline() -> dsl.Dataset:
    inner_comp_task = inner_comp()
    inner_comp_task.set_caching_options(False)
    return inner_comp_task.output


@dsl.pipeline
def outer_pipeline():
    inner_pipeline_task = inner_pipeline()
    outer_comp_task = outer_comp(input=inner_pipeline_task.output)
    outer_comp_task.set_caching_options(False)


if __name__ == "__main__":
    # Compiler().compile(outer_pipeline, "ignore/subdag_artifacts.yaml")
    client = Client()
    run = client.create_run_from_pipeline_func(
        pipeline_func=outer_pipeline,
        enable_caching=False,
    )
```

There's still a lot more work to be done in terms of testing, decomposition, making the code more consistent and DRY, etc., but it works and it did not work before, so hooray for progress.
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]>
Force-pushed from 58bed92 to ee7f6c9.
Signed-off-by: droctothorpe <[email protected]>
Just pushed up a commit that decomposes the graph traversal logic to improve readability, reduce complexity, and make granular testing easier. The next order of business is multiple outputs and NamedTuples.
Hey folks, love that you are doing this, amazing stuff!!
I just had a skim and left some quick thoughts, I see that it's still WIP, so apologies if the comments are premature. Haven't had a chance to try it out yet.
The approach does make sense to me. Given that we are just writing spec data as execution properties, I think it makes sense to do it in the driver, since we already have this info at pipeline creation.
```go
if flattenedTasks == nil {
	flattenedTasks = make(map[string]*metadata.Execution)
}
currentExecutionTasks, err := mlmd.GetExecutionsInDAG(ctx, dag, pipeline)
```
Question: have you considered just getting all the executions for the context instead of doing a DFS filter here?
For example, GetExecutionsInDAG() is simply doing a get-executions call for the context but with a filter; without the filter it should just get all the executions for that particular context, so we wouldn't need to do multiple DB queries.
Task names, IIRC, should be unique. I suppose the only concern here would be if the pipeline is really large and has a lot of task executions, but I would think it would have to be unrealistically large for that to be an issue.
🔥 Right now, the number of calls is equal to the number of nested sub-DAGs. If the call sans filter gets all executions AND executions in sub-DAGs, that could definitely reduce the number of database queries. We'll test it out. Thanks for the suggestion!
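(To make the direction concrete, here's a rough sketch of the single-query approach; GetExecutionsInContext is a hypothetical helper, not an existing client method, and the sketch assumes it would live alongside the current driver code.)

```go
// flattenTasksInContext indexes every execution in the pipeline context by
// task name in one pass, instead of issuing one GetExecutionsInDAG query per
// nested sub-DAG. Assumes task names are unique within the context.
func flattenTasksInContext(
	ctx context.Context,
	mlmd *metadata.Client,
	pipeline *metadata.Pipeline,
) (map[string]*metadata.Execution, error) {
	// Hypothetical unfiltered variant of GetExecutionsInDAG.
	executions, err := mlmd.GetExecutionsInContext(ctx, pipeline)
	if err != nil {
		return nil, err
	}
	tasks := make(map[string]*metadata.Execution, len(executions))
	for _, e := range executions {
		tasks[e.TaskName()] = e
	}
	return tasks, nil
}
```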
backend/src/v2/driver/driver.go (Outdated)
```go
json.Unmarshal(b, &outputParametersMap)
glog.V(4).Info("Deserialized outputParametersMap: ", outputParametersMap)
subTaskName := outputParametersMap["producer_subtask"]
glog.V(4).Infof(
	"Overriding currentTask, %v, output with currentTask's producer_subtask, %v, output.",
	currentTask.TaskName(),
	subTaskName,
)

// Reassign sub-task before running through the loop again.
currentTask = tasks[subTaskName]
```
Might need to handle some of these potential error cases, for example (see the sketch below):
- if the producer task is not in outputParametersMap
- subTaskName is not in tasks
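(Sketch of the guarded lookups, reusing the variable names from the snippet above; the exact error returns are illustrative only.)

```go
// Guard both map lookups with the comma-ok idiom before dereferencing.
subTaskName, ok := outputParametersMap["producer_subtask"]
if !ok {
	return nil, fmt.Errorf("producer_subtask not found in outputs of task %q", currentTask.TaskName())
}
currentTask, ok = tasks[subTaskName]
if !ok {
	return nil, fmt.Errorf("producer sub-task %q not found in DAG executions", subTaskName)
}
```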
backend/src/v2/driver/driver.go (Outdated)
```go
outputParameterKey := value.GetValueFromParameter().OutputParameterKey
producerSubTask := value.GetValueFromParameter().ProducerSubtask
glog.V(4).Info("outputParameterKey: ", outputParameterKey)
glog.V(4).Info("producerSubtask: ", producerSubTask)

outputParameterMap := map[string]interface{}{
	"output_parameter_key": outputParameterKey,
	"producer_subtask":     producerSubTask,
}

outputParameterStruct, _ := structpb.NewValue(outputParameterMap)
```
how come you didn't use a DagOutputParameterSpec like you did for DagOutputArtifactSpec?
The logic we added to the mlmd client handles converting the artifact struct into a format suitable for storage in the database. Our reasoning was that that was an implementation detail that wasn't really relevant to the driver. That logic holds true for the output parameters as well, but we never refactored it to apply the principle. We'll move that logic out of driver.go and into client.go. Good call!
```go
if config.OutputParameters != nil {
	e.CustomProperties[keyOutputs] = &pb.Value{Value: &pb.Value_StructValue{
		StructValue: &structpb.Struct{
			Fields: config.OutputParameters,
		},
	}}
}
if config.OutputArtifacts != nil {
	b, err := json.Marshal(config.OutputArtifacts)
	if err != nil {
		return nil, err
	}
	e.CustomProperties[keyOutputArtifacts] = StringValue(string(b))
}
```
One thing I'm thinking about is how we distinguish between outputs for container executions vs. DAG executions: for parameters in container executions they map to the actual resolved values, but for DAGs we're simply storing reference values to the producer tasks.
Similarly for artifacts: for container executions we store the artifact object-store metadata (pulled from artifact properties), but for DAG executions we are, again, storing the output producer spec data, right?
So I'm wondering if it makes sense to have a separate property entirely to distinguish these types. I don't have a concrete suggestion, maybe something like parameter_producer_tasks and artifact_producer_tasks 🤷🏾♂️ (see the sketch below).
I'm not sure, but at the moment, if we use outputs then it will probably show up on the UI for the executions page under the DAG but show something different than it does for container executions.
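(Rough sketch of that idea; the key names parameter_producer_tasks and artifact_producer_tasks are hypothetical, and the surrounding shape simply mirrors the client snippet above.)

```go
if config.OutputParameters != nil {
	// Reference spec (output_parameter_key / producer_subtask), not resolved values.
	e.CustomProperties["parameter_producer_tasks"] = &pb.Value{Value: &pb.Value_StructValue{
		StructValue: &structpb.Struct{
			Fields: config.OutputParameters,
		},
	}}
}
if config.OutputArtifacts != nil {
	// Artifact selector spec, not object-store metadata.
	b, err := json.Marshal(config.OutputArtifacts)
	if err != nil {
		return nil, err
	}
	e.CustomProperties["artifact_producer_tasks"] = StringValue(string(b))
}
```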
backend/src/v2/driver/driver.go (Outdated)
```go
ecfg.OutputParameters = map[string]*structpb.Value{
	value.GetValueFromParameter().OutputParameterKey: outputParameterStruct,
}
```
This looks like you are rewriting ecfg.OutputParameters with a new map every iteration; did you mean to update the map instead?
You're absolutely right! We would likely have hit a wall because of this when we started testing multiple outputs.
Really appreciate the time and effort you took to grok some not particularly grokkable code. Thank you for the feedback, @HumairAK!
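(For reference, the fix under discussion amounts to merging into the existing map instead of reassigning it, along these lines.)

```go
// Merge instead of overwrite, so every output parameter key survives the loop.
if ecfg.OutputParameters == nil {
	ecfg.OutputParameters = make(map[string]*structpb.Value)
}
ecfg.OutputParameters[value.GetValueFromParameter().OutputParameterKey] = outputParameterStruct
```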
Anytime! feel free to @ me here or slack if you'd like another review
Awesome! I spent like 2 hours trying to dig up where the overwriting of outputs was happening! Thank you @HumairAK
backend/src/v2/driver/driver.go (Outdated)
```go
// TODO(Bobgy): cache results
outputs, err := mlmd.GetOutputArtifactsByExecutionId(ctx, producer.GetID())
glog.V(4).Infof("Deserialized outputArtifacts: %v", outputArtifacts)
artifactSelectors := outputArtifacts["Output"].GetArtifactSelectors()
```
Key "Output" only works when a component has a single output. We should loop over the keys here.
```go
for k, value := range outputArtifacts {
	glog.V(4).Infof("k: %v", k)
	glog.V(4).Infof("outputArtifacts[k]: %v", value)
	artifactSelectors := value.GetArtifactSelectors()
	for _, v := range artifactSelectors {
		glog.V(4).Infof("v: %v", v)
		glog.V(4).Infof("v.ProducerSubtask: %v", v.ProducerSubtask)
		glog.V(4).Infof("v.OutputArtifactKey: %v", v.OutputArtifactKey)
		subTaskName = v.ProducerSubtask
		outputArtifactKey = v.OutputArtifactKey
	}
}
```
Note: This change breaks the lookup loop. I think we'll have to go to full recursion.
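(A very rough sketch of what full recursion could look like for the artifact case; getArtifactSelectors and the exact signature are hypothetical, not part of the current code.)

```go
// resolveDAGOutputArtifact follows an output key down through producer
// sub-tasks until it reaches an execution that produced the artifact itself.
func resolveDAGOutputArtifact(
	tasks map[string]*metadata.Execution,
	task *metadata.Execution,
	outputKey string,
) (*metadata.Execution, string, error) {
	selectors := getArtifactSelectors(task, outputKey) // hypothetical accessor
	if len(selectors) == 0 {
		// Base case: this execution is the actual producer.
		return task, outputKey, nil
	}
	// With multiple selectors, each would need its own resolution; take the first here.
	s := selectors[0]
	subTask, ok := tasks[s.ProducerSubtask]
	if !ok {
		return nil, "", fmt.Errorf("producer sub-task %q not found", s.ProducerSubtask)
	}
	return resolveDAGOutputArtifact(tasks, subTask, s.OutputArtifactKey)
}
```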
backend/src/v2/driver/driver.go (Outdated)
```go
// corresponding producer sub-task, reassign currentTask,
// and iterate through this loop again.
var outputParametersMap map[string]string
b, err := outputParametersCustomProperty["Output"].GetStructValue().MarshalJSON()
```
Key "Output" only works when a component has a single output. We should loop over the keys here.
```go
for k, value := range outputParametersCustomProperty {
	var outputParametersMap map[string]string
	b, err := value.GetStructValue().MarshalJSON()
	if err != nil {
		return err
	}
	json.Unmarshal(b, &outputParametersMap)
	subTaskName := outputParametersMap["producer_subtask"]
	outputArtifactKey := outputParametersMap["output_artifact_key"]
}
```
Note: This breaks the loop logic.
Description of your changes:
This is a WIP PR intended to fix #10039. Additional functionality, tests, and a more detailed PR description to follow.
Checklist: