Hello,
One of the features of pipelines is step caching (https://www.kubeflow.org/docs/components/pipelines/caching/), which avoids rerunning costly computations.
The key for caching is:
message CacheKey {
  map<string, ArtifactNameList> inputArtifactNames = 1;
  map<string, Value> inputParameters = 2;
  map<string, RuntimeArtifact> outputArtifactsSpec = 3;
  map<string, string> outputParametersSpec = 4;
  ContainerSpec containerSpec = 5;
}
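To make the role of each field concrete, here is a minimal sketch of how a key with this shape could be reduced to a single fingerprint. It is not the kfp implementation: `cache_fingerprint`, the use of canonical JSON plus SHA-256, and all sample values are assumptions made for illustration only.

```python
import hashlib
import json


def cache_fingerprint(cache_key: dict) -> str:
    """Hash a CacheKey-shaped dict (hypothetical helper, not a kfp API)."""
    # sort_keys gives a canonical encoding, so the field order of the
    # dict itself cannot change the resulting hash.
    canonical = json.dumps(cache_key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Sample values are made up; only the field names come from the message above.
base_key = {
    "inputArtifactNames": {},
    "inputParameters": {"epochs": 10},
    "outputArtifactsSpec": {},
    "outputParametersSpec": {"accuracy": "DOUBLE"},
    "containerSpec": {
        "image": "python:3.9",
        "command": ["python", "-c", "print('train')"],
    },
}

# A deep copy with identical content, and a copy whose command differs slightly.
same_key = json.loads(json.dumps(base_key))
changed_key = json.loads(json.dumps(base_key))
changed_key["containerSpec"]["command"][2] = "print('train!')"

fp_base = cache_fingerprint(base_key)
fp_same = cache_fingerprint(same_key)
fp_changed = cache_fingerprint(changed_key)

print("identical content, same fingerprint:", fp_base == fp_same)
print("changed containerSpec, same fingerprint:", fp_base == fp_changed)
```

Under these assumptions, any byte that differs inside containerSpec yields a different fingerprint, which is why whatever gets embedded in the container command matters for cache hits.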
When using the use_code_pickling option from
use_code_pickling=False) -> ComponentSpec:
the pickle of the function gets embedded in the ContainerSpec (and hence becomes part of the cache key).
So far, all good.
However, the pickle is generated with cloudpickle, which produces non-deterministic output: the bytes can differ every time you run the pipeline, even when the function is unchanged. As you can imagine, this makes the caching feature useless, because the cache key changes (and the cache is missed) on every run.
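The general mechanism can be sketched with the standard library alone. This uses `pickle` as a stand-in and is not cloudpickle's actual code path; `container_command`, `run_pickled`, and `cache_key` are hypothetical names. The point is only that two equal objects can serialize to different bytes when their internal ordering differs, and that the difference propagates into anything hashed from the command.

```python
import base64
import hashlib
import pickle

# Two dicts that compare equal but were built in a different insertion order,
# standing in for the captured state of a pickled function.
globals_a = {"np": "numpy", "pd": "pandas"}
globals_b = {"pd": "pandas", "np": "numpy"}

# pickle writes dict items in iteration order, so the byte streams differ.
payload_a = pickle.dumps(globals_a, protocol=4)
payload_b = pickle.dumps(globals_b, protocol=4)


def container_command(payload: bytes) -> str:
    """Embed a pickle in a command string (hypothetical layout)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"python -c 'run_pickled({encoded!r})'"


def cache_key(command: str) -> str:
    """Stand-in for hashing the container spec into a cache key."""
    return hashlib.sha256(command.encode("utf-8")).hexdigest()


key_a = cache_key(container_command(payload_a))
key_b = cache_key(container_command(payload_b))

print("objects equal:", globals_a == globals_b)
print("pickles equal:", payload_a == payload_b)
print("cache keys equal:", key_a == key_b)
```

Both payloads unpickle to the same object, so the step itself would behave identically; only the cache lookup is affected.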
The non-determinism was removed from cloudpickle in
cloudpipe/cloudpickle#428, which was released as part of 2.0.0:
https://github.com/cloudpipe/cloudpickle/releases/tag/v2.0.0
Currently, kfp pins cloudpickle below v2.0.0 here:
Line 37 in 74c7773
'cloudpickle>=1.3.0,<2',
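Until the bound is relaxed, it may help to check which cloudpickle version an environment actually has. The following is a rough sketch, assuming that 2.0.0 is the first release containing the fix (as the release notes linked above indicate). `parse_release` and `has_deterministic_fix` are hypothetical helpers, and the parsing is deliberately simplistic: it ignores pre-release and local version suffixes.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '2.0.0' into (2, 0, 0), padding short versions with zeros."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.0' compares equal to '2.0.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_deterministic_fix(version: str) -> bool:
    """True if the version is at least 2.0.0 (assumed first fixed release)."""
    return parse_release(version) >= (2, 0, 0)


def installed_cloudpickle_version():
    """Return the installed cloudpickle version string, or None if absent."""
    try:
        return metadata.version("cloudpickle")
    except metadata.PackageNotFoundError:
        return None


installed = installed_cloudpickle_version()
if installed is None:
    print("cloudpickle is not installed")
else:
    print(installed, "has the fix:", has_deterministic_fix(installed))
```

With the current pin, any version that satisfies `>=1.3.0,<2` would report that the fix is missing.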
Would it be possible to make a new kfp release that allows cloudpickle 2.0.0 or later? Without that cloudpickle version, step caching is effectively unusable with use_code_pickling (or works only at the mercy of cloudpickle's dictionary insertion order).
Thanks!