eee6afc to
9b9e85d
Compare
|
cc @jrbourbeau |
Unit Test Results 12 files ±0 12 suites ±0 7h 22m 24s ⏱️ + 7m 33s For more details on these failures, see this check. Results for commit df66b2d. ± Comparison against base commit 94ebd57. ♻️ This comment has been updated with latest results. |
|
@dask/maintenance any thoughts on this? 🙂 |
|
Isn't the default protocol level for |
|
If we are using stdlib, yes. In many cases though we use our own |
|
OK, sorry - never mind. In that case, we might fall foul of any protocol 6 that might come along! |
jrbourbeau
left a comment
we might fall foul of any protocol 6 that might come along
Yeah, I wonder if we should have a compatibility variable to control these protocol levels. So instead of protocol=4 we would have something like protocol=distributed.compatibility._PICKLE_PROTOCOL to centralize control a bit more.
|
Think that is somewhat unlikely. Pickle protocol 4 was added in Python 3.4, which came out in Mar 2014. Pickle protocol 5 was added in Python 3.8, which came out in Oct 2019. So it took 5 years between those protocols and involved a PEP, reaching out to the community, a backport package, NumPy integration, etc. Think we have plenty of time to respond if we are concerned (as was the case even around pickle protocol 5). Likely we would be engaged directly since we are seen as a group that cares about performant pickle serialization. Doubt there will be as big of a change as pickle protocol 5 (namely out-of-band communication). Though we may see coverage expanded to more builtin types. That all being said, we do a handshake to determine what the maximum commonly supported pickle version is, which is used to set the protocol in things like |
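For illustration, the negotiation idea described above can be sketched as taking the minimum of what each side reports. The function names and the `"pickle-protocol"` key below are assumptions made for this sketch, not a description of the actual handshake code:

```python
import pickle


def local_handshake_info():
    """What this process would advertise to its peer (hypothetical shape)."""
    return {"pickle-protocol": pickle.HIGHEST_PROTOCOL}


def negotiate_pickle_protocol(local, remote):
    """Pick the highest protocol that both peers can read."""
    return min(local["pickle-protocol"], remote["pickle-protocol"])
```

For example, a peer advertising protocol 5 talking to one advertising protocol 4 would settle on 4, so a future protocol 6 would only be used once both ends support it.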
This was specified as Python 3.7 had native pickle protocol 4 support with the option for pickle protocol 5 via a backport package. If there were issues with the availability of the backport package, using the latest pickle protocol might run into issues. However Python 3.8+ has native pickle protocol 5 support. No backport required. So the kinds of issues Python 3.7 had won't occur. Given this, just use the latest pickle protocol.
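As a minimal sketch of what native protocol 5 support provides on Python 3.8+ (the helper name `roundtrip_out_of_band` is hypothetical, and this uses only the stdlib `pickle` module rather than any `distributed` code):

```python
import pickle


def roundtrip_out_of_band(data):
    """Pickle a buffer with protocol 5, keeping its bytes out of the pickle stream."""
    buffers = []
    # buffer_callback receives each PickleBuffer instead of copying it in-band.
    payload = pickle.dumps(
        pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
    )
    # The same buffers must be supplied, in order, when loading.
    restored = pickle.loads(payload, buffers=buffers)
    return payload, buffers, restored


payload, buffers, restored = roundtrip_out_of_band(bytearray(b"x" * 1_000_000))
```

The pickle stream itself stays small because the large buffer travels separately, which is the out-of-band communication mentioned in the discussion.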
9b9e85d to
df66b2d
Compare
|
Putting this another way, this should not be an issue as long as users run the same Python version across Workers, as they will all have access to the same pickle protocols. The only reason this was an issue before is that we adopted a backport package that may or may not be installed. We are no longer in that situation and are unlikely to be again. |
Unit Test Results: See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 6h 19m 1s ⏱️ -4s For more details on these failures, see this check. Results for commit ac44705. ± Comparison against base commit 2dab85a. |
|
Sorry for the extremely long wait. This LGTM.
Are we even willing to support mismatched major Python versions between clients, scheduler, and workers? Everything will fall apart as soon as you hit a mismatched serializer/deserializer couple, a renamed implementation module, etc. etc. |
|
Thanks @crusaderky! 🙏
Not at all. Was unsure whether we were ready for this change based on the discussion so far. Happy to hear that we are.
Yeah think we were being more careful while still supporting Python 3.7 and optionally It's a good point on On a related point, even pickling directly to disk via protocol 5 sees some performance improvements (better memory usage) (pandas-dev/pandas#37056). |
pre-commit run --all-files