Make WriterOptions serializable#251
Conversation
38fe28e to
a2b6b76
Compare
3088ab3 to
13d9bdd
Compare
|
@ZacBlanco Could you please review this? Thanks! |
13d9bdd to
922cafb
Compare
ZacBlanco
left a comment
There was a problem hiding this comment.
Thanks for the contributions! Just a few minor comments. The main changes for serialization look OK to me. Can we also update the PR title/description and commits to reflect the fact that SpillConfig serde is also added?
| } | ||
|
|
||
| void WriterOptions::registerSerDe() { | ||
| bolt::Type::registerSerDe(); |
There was a problem hiding this comment.
Why do we call Type::registerSerDe() in this method?
There was a problem hiding this comment.
Why do we call
Type::registerSerDe()in this method?
WriterOptions::deserialize (Options.cpp:157) calls:
opts->schema = ISerializable::deserialize<bolt::Type>(*p);
I know other Classes like HiveColumnHandle::registerSerDe() didn't add this (HiveColumnHandle also has TypePtr members), but it relies on every caller to call Type::registerSerDe() pre-hand. I don't think it's a good pattern to follow. It's a hidden dependency that the caller must know the internal implementation detail that HiveColumnHandle::create calls ISerializable::deserialize. That leaks the internals and makes the API easy to misuse: forgetting Type::registerSerDe() causes a silent runtime failure, not a compile error. IMHO The better design is for registerSerDe() to chain its own dependencies internally, which is exactly what WriterOptions::registerSerDe() does with bolt::Type::registerSerDe(). The call is idempotent (it just re-registers the same entry), so there's no harm in calling it redundantly.
I forgot to add registry.Register("SpillConfig", SpillConfig::deserialize); last time. In this update it was added too.
There was a problem hiding this comment.
I'm not a huge fan of this pattern where the serializers and deserializers need to have their dependencies explicitly written in the registration methods. I am open to re-designing it at a later point though. For this PR it is fine.
I would probably prefer if we had something like a static initialization method that was executed at library load time/program start time. Registering the serializers/deserializers automatically makes maintenance a little bit easier on developers. Also, this way as long as the necessary library(s) are loaded/compiled properly all the registry entries should exist. There are some downides to this approach too but we can discuss later. This is fine for now
922cafb to
e78371f
Compare
ZacBlanco
left a comment
There was a problem hiding this comment.
LGTM, thanks for this improvement!
d814a35 to
80129a0
Compare
80129a0 to
5da4a88
Compare
|
@yingsu00 FYI I update your branch with |
I agree. Not only the serializers, but the different file formats, and connectors shall also register themselves when loaded. In facebookincubator/velox#14090 I tried to make file reader/writers register themselves using a static initialization function. This had some issue on Linux, but we can revisit them again. |
What problem does this PR solve?
First PR for #250
Type of Change
Description
Make WriterOptions serializable. This is required for future refactors
Performance Impact
No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).
Positive Impact: I have run benchmarks.
Click to view Benchmark Results
Negative Impact: Explained below (e.g., trade-off for correctness).
Release Note
N/A
Checklist (For Author)
Breaking Changes
No
Yes (Description: ...)
Click to view Breaking Changes