Make WriterOptions serializable#251

Merged
ZacBlanco merged 5 commits into bytedance:main from
yingsu00:connector_refactor_1
Mar 6, 2026

Contributor

@yingsu00 yingsu00 commented Feb 22, 2026

What problem does this PR solve?

First PR for #250

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Make WriterOptions serializable. This is required for future refactors.

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    Paste your google-benchmark or TPC-H results here.
    Before: 10.5s
    After:   8.2s  (+20%)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

N/A

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

@yingsu00 yingsu00 force-pushed the connector_refactor_1 branch from 38fe28e to a2b6b76 Compare February 22, 2026 15:58
@yingsu00 yingsu00 marked this pull request as draft February 23, 2026 15:34
@yingsu00 yingsu00 force-pushed the connector_refactor_1 branch 2 times, most recently from 3088ab3 to 13d9bdd Compare March 2, 2026 07:59
@yingsu00 yingsu00 marked this pull request as ready for review March 2, 2026 14:43
Contributor Author

yingsu00 commented Mar 2, 2026

@ZacBlanco Could you please review this? Thanks!

@ZacBlanco ZacBlanco force-pushed the connector_refactor_1 branch from 13d9bdd to 922cafb Compare March 2, 2026 18:32
Collaborator

@ZacBlanco ZacBlanco left a comment

Thanks for the contributions! Just a few minor comments. The main changes for serialization look OK to me. Can we also update the PR title/description and commits to reflect the fact that SpillConfig serde is also added?

}

void WriterOptions::registerSerDe() {
  bolt::Type::registerSerDe();
Collaborator

Why do we call Type::registerSerDe() in this method?

Contributor Author

Why do we call Type::registerSerDe() in this method?

WriterOptions::deserialize (Options.cpp:157) calls:

  opts->schema = ISerializable::deserialize<bolt::Type>(*p);

I know other classes like HiveColumnHandle::registerSerDe() don't do this (HiveColumnHandle also has TypePtr members); they rely on every caller to call Type::registerSerDe() beforehand. I don't think that's a good pattern to follow. It's a hidden dependency: the caller must know the internal implementation detail that HiveColumnHandle::create calls ISerializable::deserialize. That leaks the internals and makes the API easy to misuse: forgetting Type::registerSerDe() causes a silent runtime failure, not a compile error. IMHO the better design is for registerSerDe() to chain its own dependencies internally, which is exactly what WriterOptions::registerSerDe() does with bolt::Type::registerSerDe(). The call is idempotent (it just re-registers the same entry), so there's no harm in calling it redundantly.
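The chaining idea can be illustrated with a minimal, self-contained sketch. Everything in it (Serializable, registry(), registerFactory(), ToyType, ToyWriterOptions) is a hypothetical stand-in, not the Bolt API, and the string payload replaces the real serialized object. Unlike the silent failure described above, this toy registry throws when an entry is missing, purely to keep the example easy to check.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical base class standing in for a serializable interface.
struct Serializable {
  virtual ~Serializable() = default;
};

using Factory =
    std::function<std::shared_ptr<Serializable>(const std::string&)>;

// Function-local static so the map exists before its first use.
std::map<std::string, Factory>& registry() {
  static std::map<std::string, Factory> instance;
  return instance;
}

// Idempotent: registering the same name again overwrites the same entry.
void registerFactory(const std::string& name, Factory factory) {
  registry()[name] = std::move(factory);
}

std::shared_ptr<Serializable> deserialize(
    const std::string& name, const std::string& payload) {
  auto it = registry().find(name);
  if (it == registry().end()) {
    throw std::runtime_error("no deserializer registered for " + name);
  }
  return it->second(payload);
}

// Toy dependency, playing the role of a type object.
struct ToyType : Serializable {
  std::string name;

  static void registerSerDe() {
    registerFactory("ToyType", [](const std::string& payload) {
      auto type = std::make_shared<ToyType>();
      type->name = payload;
      return type;
    });
  }
};

// Toy options class whose deserializer depends on ToyType's deserializer.
struct ToyWriterOptions : Serializable {
  std::shared_ptr<ToyType> schema;

  static void registerSerDe() {
    // Chain the dependency so callers need not know about it.
    ToyType::registerSerDe();
    registerFactory("ToyWriterOptions", [](const std::string& payload) {
      auto opts = std::make_shared<ToyWriterOptions>();
      opts->schema =
          std::dynamic_pointer_cast<ToyType>(deserialize("ToyType", payload));
      return opts;
    });
  }
};

// Registers only the outer class, twice, then restores an instance.
void demoChainedRegistration() {
  ToyWriterOptions::registerSerDe();
  ToyWriterOptions::registerSerDe();  // redundant call is harmless
  auto restored = std::dynamic_pointer_cast<ToyWriterOptions>(
      deserialize("ToyWriterOptions", "BIGINT"));
  assert(restored && restored->schema && restored->schema->name == "BIGINT");
}
```

In this sketch the caller registers only ToyWriterOptions, and the nested ToyType lookup still succeeds because the outer registerSerDe() pulled in its dependency.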

I forgot to add registry.Register("SpillConfig", SpillConfig::deserialize); last time. This update adds it too.

Collaborator

I'm not a huge fan of this pattern where the serializers and deserializers need to have their dependencies explicitly written in the registration methods. I am open to re-designing it at a later point though. For this PR it is fine.

I would probably prefer something like a static initialization method that is executed at library load time or program start. Registering the serializers/deserializers automatically makes maintenance a little easier for developers. Also, this way, as long as the necessary libraries are loaded/compiled properly, all the registry entries should exist. There are some downsides to this approach too, but we can discuss them later. This is fine for now.
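One common way to get registration at load time is a namespace-scope registrar object whose constructor adds the entry. The sketch below only illustrates that general technique; serdeRegistry(), SerDeRegistrar, and the "ToySpillConfig" entry are hypothetical names, not part of Bolt, and the deserializer just returns the payload length to keep the example small.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy deserializer signature (an assumption for this sketch).
using Deserializer = std::function<int(const std::string&)>;

// Function-local static: the map is constructed on first use, so registrars
// in other translation units can safely use it during static initialization.
std::map<std::string, Deserializer>& serdeRegistry() {
  static std::map<std::string, Deserializer> instance;
  return instance;
}

// Hypothetical helper whose constructor performs the registration.
struct SerDeRegistrar {
  SerDeRegistrar(const std::string& name, Deserializer fn) {
    serdeRegistry()[name] = std::move(fn);
  }
};

// Namespace-scope object: its constructor runs as part of this translation
// unit's static initialization, with no explicit registerSerDe() call.
static const SerDeRegistrar kToySpillConfigRegistrar(
    "ToySpillConfig",
    [](const std::string& payload) {
      return static_cast<int>(payload.size());
    });
```

A well-known caveat of this technique, which may or may not be among the downsides meant here, is that a linker can drop an object file from a static library when nothing references its symbols, in which case the registrar never runs.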

@yingsu00 yingsu00 force-pushed the connector_refactor_1 branch from 922cafb to e78371f Compare March 3, 2026 15:05
Collaborator

@ZacBlanco ZacBlanco left a comment

LGTM, thanks for this improvement!

@yingsu00 yingsu00 force-pushed the connector_refactor_1 branch from 80129a0 to 5da4a88 Compare March 6, 2026 03:02
Collaborator

frankobe commented Mar 6, 2026

@yingsu00 FYI, I updated your branch with main to pick up some CI improvements that ensure test stability.

@ZacBlanco ZacBlanco added this pull request to the merge queue Mar 6, 2026
Merged via the queue into bytedance:main with commit 7692925 Mar 6, 2026
7 checks passed
Contributor Author

yingsu00 commented Mar 7, 2026

@ZacBlanco

I would probably prefer if we had something like a static initialization method that was executed at library load time/program start time.

I agree. Not only the serializers, but also the different file formats and connectors should register themselves when loaded. In facebookincubator/velox#14090 I tried to make file readers/writers register themselves using a static initialization function. This had some issues on Linux, but we can revisit them.


5 participants