
Upgrade cloudpickle to > 2.0.0 #6699

@feizerl

Hello,

One of the features of Kubeflow Pipelines is step caching (https://www.kubeflow.org/docs/components/pipelines/caching/), which avoids re-running costly computations when a step's inputs have not changed.

The key for caching is:

message CacheKey {
  map<string, ArtifactNameList> inputArtifactNames = 1;
  map<string, Value> inputParameters = 2;
  map<string, RuntimeArtifact> outputArtifactsSpec = 3;
  map<string, string> outputParametersSpec = 4;
  ContainerSpec containerSpec = 5;
}

When using the option use_code_pickling from

use_code_pickling=False) -> ComponentSpec:

the pickle of the function gets embedded in the ContainerSpec (and hence becomes part of the key).
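To make the consequence concrete, here is a minimal sketch of how a fingerprint derived from such a key reacts to the embedded pickle. Everything in it is an assumption for illustration: `cache_fingerprint`, `make_key`, the SHA-256-over-JSON scheme, the image name, and the command layout are hypothetical and are not taken from the kfp code base. The only point is that any byte-level change inside the container spec changes the resulting key.

```python
import hashlib
import json


def cache_fingerprint(cache_key: dict) -> str:
    """Hash a cache key in canonical form (hypothetical scheme, not kfp's)."""
    # sort_keys makes the JSON text independent of dict insertion order,
    # so only the actual content of the key influences the digest.
    canonical = json.dumps(cache_key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_key(pickled_function_hex: str) -> dict:
    """Build a toy key mirroring the CacheKey fields quoted above."""
    return {
        "inputArtifactNames": {},
        "inputParameters": {"n": "10"},
        "outputArtifactsSpec": {},
        "outputParametersSpec": {"result": "INT"},
        "containerSpec": {
            "image": "python:3.9",  # assumed image, for illustration only
            # The pickled function travels inside the container command.
            "command": ["python", "-c", "<launcher>", pickled_function_hex],
        },
    }


stable_a = cache_fingerprint(make_key("8004aa"))
stable_b = cache_fingerprint(make_key("8004aa"))
changed = cache_fingerprint(make_key("8004bb"))

print(stable_a == stable_b)  # same pickle bytes: same fingerprint
print(stable_a == changed)   # different pickle bytes: different fingerprint
```

With identical pickle bytes the two fingerprints match, which is the cache-hit case. A single differing byte in the pickle is enough to produce a different fingerprint, even though the inputs and the function itself are unchanged.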

So far, all good.

However, the pickle is generated with cloudpickle, which produces non-deterministic output: the same function can serialize to different bytes on each pipeline run. This makes the caching feature useless, because the key changes, and the cache is therefore invalidated, on every run.
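The general effect can be reproduced with the standard library alone. The sketch below uses plain `pickle` rather than cloudpickle, and it is not a claim about the exact mechanism inside cloudpickle: it only shows that two objects that compare equal can serialize to different bytes when their internal ordering differs, which is enough to break any cache key built from those bytes.

```python
import hashlib
import pickle

# Two dictionaries with the same content but different insertion order.
first = {"alpha": 1, "beta": 2}
second = {"beta": 2, "alpha": 1}

# pickle writes dict items in iteration order, so the byte streams differ.
bytes_first = pickle.dumps(first, protocol=4)
bytes_second = pickle.dumps(second, protocol=4)

digest_first = hashlib.sha256(bytes_first).hexdigest()
digest_second = hashlib.sha256(bytes_second).hexdigest()

print(first == second)                # equal as Python objects
print(bytes_first == bytes_second)    # not equal as serialized bytes
print(digest_first == digest_second)  # so any hash over the bytes differs too

# Both byte streams still round-trip to equal objects.
print(pickle.loads(bytes_first) == pickle.loads(bytes_second))
```

A cache keyed on the serialized bytes sees these as two different steps, even though the deserialized objects are interchangeable.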

This non-determinism was fixed in cloudpickle by
cloudpipe/cloudpickle#428, which was released as part of 2.0.0:
https://github.com/cloudpipe/cloudpickle/releases/tag/v2.0.0

Currently, kfp pins cloudpickle to versions below 2.0.0 here:

'cloudpickle>=1.3.0,<2',
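As a rough illustration of what this bound excludes, the sketch below checks whether a given cloudpickle version string falls inside the current `>=1.3.0,<2` range and whether it is at or above 2.0.0, the release that carries the fix. The helpers `parse_release`, `allowed_by_current_pin`, and `has_determinism_fix` are hypothetical names introduced here; a real project would more likely rely on a proper version-parsing library than on this simplified parser.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '1.6.0' -> (1, 6, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def allowed_by_current_pin(version: str) -> bool:
    """True if the version satisfies 'cloudpickle>=1.3.0,<2'."""
    release = parse_release(version)
    return (1, 3, 0) <= release < (2,)


def has_determinism_fix(version: str) -> bool:
    """True if the version is 2.0.0 or later (per the issue text)."""
    return parse_release(version) >= (2,)


for candidate in ("1.3.0", "1.6.0", "2.0.0"):
    print(candidate, allowed_by_current_pin(candidate), has_determinism_fix(candidate))
```

Under these assumptions no version can satisfy both checks at once, which is the conflict this issue describes.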

Would it be possible to make a new kfp release that allows cloudpickle 2.0.0 or later? Without that cloudpickle version, step caching is effectively unusable for components built this way: a cache hit happens only at the mercy of cloudpickle's dictionary insertion order.

Thanks!
