fix: Extract serde params from additionalTableParameters in CTAS #27340

Merged
aditi-pandit merged 1 commit into prestodb:master from kewang1024:fix
Mar 17, 2026

Conversation

Collaborator

@kewang1024 kewang1024 commented Mar 16, 2026

Summary:
CTAS on Prestissimo silently drops textfile delimiters and nimble config because the C++ CreateHandle path never reads additionalTableParameters.

Fix: Add extractSerdeParameters(), which extracts the serde keys (field.delim, escape.delim, etc.) and the nimble.* keys from additionalTableParameters and passes them to HiveInsertTableHandle. No protocol changes are needed.
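
A minimal sketch of the filtering described above, assuming C++17. This is not the merged code: the key set below lists only the two keys the summary names (the rest are elided behind "etc." and are not reproduced here), and the helper names `serdeKeys` and `hasNimblePrefix` are hypothetical.

```cpp
#include <map>
#include <string>
#include <string_view>
#include <unordered_map>
#include <unordered_set>

namespace {

// Illustrative subset only: the two keys named in the PR summary.
// The real key set is longer and is an unknown here.
const std::unordered_set<std::string>& serdeKeys() {
  static const std::unordered_set<std::string> kSerdeKeys{
      "field.delim", "escape.delim"};
  return kSerdeKeys;
}

constexpr std::string_view kNimblePrefix{"nimble."};

// True when `key` begins with "nimble."; substr clamps the length, so keys
// shorter than the prefix are handled without throwing.
bool hasNimblePrefix(std::string_view key) {
  return key.substr(0, kNimblePrefix.size()) == kNimblePrefix;
}

} // namespace

// Keeps only serde-related entries; every other table parameter is dropped.
std::unordered_map<std::string, std::string> extractSerdeParameters(
    const std::map<std::string, std::string>& tableParameters) {
  std::unordered_map<std::string, std::string> serdeParameters;
  for (const auto& [key, value] : tableParameters) {
    if (serdeKeys().count(key) > 0 || hasNimblePrefix(key)) {
      serdeParameters.emplace(key, value);
    }
  }
  return serdeParameters;
}
```

Because the helper only reads a map that already travels with the output table handle, a filter of this shape is consistent with the "no protocol changes" claim.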

== NO RELEASE NOTE ==

Summary by Sourcery

Ensure Hive CTAS operations propagate relevant serde parameters from additional table properties into Velox Hive insert handles.

Bug Fixes:

  • Preserve textfile serde delimiters and Nimble configuration in Prestissimo CTAS by extracting them from additionalTableParameters into the Hive insert table handle.

Tests:

  • Add connector tests verifying that CTAS propagates textfile serde parameters only for serde-related keys.
  • Add connector tests verifying that CTAS propagates Nimble-specific serde parameters and ignores non-serde table parameters.
  • Add connector test verifying that CTAS handles empty serde parameters without errors.

@kewang1024 kewang1024 requested review from a team as code owners March 16, 2026 05:46
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Mar 16, 2026
Contributor

sourcery-ai bot commented Mar 16, 2026

Reviewer's Guide

This PR ensures CTAS operations in Prestissimo propagate Hive serde-related parameters from HiveOutputTableHandle.additionalTableParameters into the native HiveInsertTableHandle, including textfile delimiters and nimble.* options, and adds focused tests to validate the new behavior and that non-serde table parameters are ignored.

Sequence diagram for CTAS serde parameter extraction and propagation

sequenceDiagram
    actor User
    participant PrestoCoordinator
    participant HivePrestoToVeloxConnector
    participant HiveInsertTableHandle
    participant HiveMetastore

    User->>PrestoCoordinator: Submit CTAS statement
    PrestoCoordinator->>PrestoCoordinator: Plan CTAS
    PrestoCoordinator->>HivePrestoToVeloxConnector: Build HiveOutputTableHandle
    Note over PrestoCoordinator,HivePrestoToVeloxConnector: HiveOutputTableHandle.additionalTableParameters includes
    Note over PrestoCoordinator,HivePrestoToVeloxConnector: field.delim, escape.delim, nimble.* and other table params

    PrestoCoordinator->>HivePrestoToVeloxConnector: toVeloxInsertTableHandle(HiveOutputTableHandle)
    HivePrestoToVeloxConnector->>HivePrestoToVeloxConnector: toHiveColumns(inputColumns)
    HivePrestoToVeloxConnector->>HivePrestoToVeloxConnector: extractSerdeParameters(additionalTableParameters)
    HivePrestoToVeloxConnector-->>HivePrestoToVeloxConnector: serdeParameters map

    HivePrestoToVeloxConnector->>HiveInsertTableHandle: new HiveInsertTableHandle(inputColumns, locationHandle, tableStorageFormat, bucketProperty, compressionKind, serdeParameters)

    HiveInsertTableHandle->>HiveMetastore: Create table with serdeParameters
    HiveMetastore-->>HiveInsertTableHandle: Table created with correct serde config
    HiveInsertTableHandle-->>PrestoCoordinator: Insert handle ready
    PrestoCoordinator-->>User: CTAS succeeds with correct serde settings

Updated class diagram for HivePrestoToVeloxConnector serde parameter handling

classDiagram
    class HivePrestoToVeloxConnector {
        +toVeloxInsertTableHandle(hiveOutputTableHandle, typeParser) std::unique_ptr~ConnectorInsertTableHandle~
    }

    class HiveOutputTableHandle {
        +inputColumns : std::vector~HiveColumn~
        +locationHandle : LocationHandle
        +tableStorageFormat : TableStorageFormat
        +bucketProperty : std::optional~BucketProperty~
        +compressionCodec : CompressionCodec
        +additionalTableParameters : std::map~std::string, std::string~
    }

    class HiveInsertTableHandle {
        +HiveInsertTableHandle(
            inputColumns : std::vector~HiveColumn~,
            locationHandle : LocationHandle,
            tableStorageFormat : TableStorageFormat,
            bucketProperty : std::optional~BucketProperty~,
            compressionKind : std::optional~FileCompressionKind~,
            serdeParameters : std::unordered_map~std::string, std::string~
        )
    }

    class ExtractSerdeParametersUtil {
        +extractSerdeParameters(tableParameters : std::map~std::string, std::string~) std::unordered_map~std::string, std::string~
    }

    HivePrestoToVeloxConnector ..> HiveOutputTableHandle : consumes
    HivePrestoToVeloxConnector ..> HiveInsertTableHandle : produces
    HivePrestoToVeloxConnector ..> ExtractSerdeParametersUtil : calls
    ExtractSerdeParametersUtil ..> HiveOutputTableHandle : reads additionalTableParameters
    HiveInsertTableHandle ..> ExtractSerdeParametersUtil : receives serdeParameters

File-Level Changes

Change: Extract serde-related parameters from additionalTableParameters and pass them into HiveInsertTableHandle for CTAS.
Details:
  • Introduce a local extractSerdeParameters helper that mirrors Java HiveMetadata.extractSerdeParameters and filters additionalTableParameters to serde keys and nimble.* keys.
  • Use the extracted serde parameters when constructing velox::connector::hive::HiveInsertTableHandle in HivePrestoToVeloxConnector::toVeloxInsertTableHandle, adding a new serdeParameters argument to the constructor call.
  • Include string_view and unordered_set headers to support prefix checks and key-set filtering for serde parameter extraction.
Files: presto-native-execution/presto_cpp/main/connectors/HivePrestoToVeloxConnector.cpp

Change: Add unit tests to verify CTAS correctly propagates textfile and nimble serde parameters and handles empty serde parameters.
Details:
  • Extend PrestoToVeloxConnectorTest by including HiveDataSink to support HiveInsertTableHandle usage in tests.
  • Add ctasPassesTextfileSerdeParameters test to assert that textfile delimiter-related keys are propagated while non-serde keys like presto.version are ignored.
  • Add ctasPassesNimbleSerdeParameters test to assert that nimble.* configuration entries are propagated as serde parameters.
  • Add ctasEmptySerdeParameters test to assert that when no additionalTableParameters are set, the resulting HiveInsertTableHandle has an empty serdeParameters map.
Files: presto-native-execution/presto_cpp/main/types/tests/PrestoToVeloxConnectorTest.cpp
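
The three test cases listed above can be sketched as plain assertions. This is only an illustration under stated assumptions: `filterSerdeParameters` is a hypothetical stand-in for the helper under test (it knows just the two textfile keys named in the PR summary), `nimble.example.option` is an invented key, and the real tests go through HiveOutputTableHandle and HiveInsertTableHandle rather than bare maps.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <unordered_map>

using ParamMap = std::map<std::string, std::string>;
using SerdeMap = std::unordered_map<std::string, std::string>;

// Hypothetical stand-in for the helper under test.
SerdeMap filterSerdeParameters(const ParamMap& tableParameters) {
  constexpr std::string_view kNimblePrefix{"nimble."};
  SerdeMap out;
  for (const auto& [key, value] : tableParameters) {
    const bool isTextKey = key == "field.delim" || key == "escape.delim";
    const bool isNimbleKey =
        std::string_view(key).substr(0, kNimblePrefix.size()) == kNimblePrefix;
    if (isTextKey || isNimbleKey) {
      out.emplace(key, value);
    }
  }
  return out;
}

void checkCtasSerdeFiltering() {
  // Textfile case: delimiter keys survive, presto.version does not.
  const ParamMap textParams{
      {"field.delim", ","}, {"escape.delim", "\\"}, {"presto.version", "x"}};
  const SerdeMap text = filterSerdeParameters(textParams);
  assert(text.size() == 2);
  assert(text.at("field.delim") == ",");
  assert(text.count("presto.version") == 0);

  // Nimble case: any nimble.* key survives (key name is invented).
  const ParamMap nimbleParams{
      {"nimble.example.option", "1"}, {"presto.version", "x"}};
  const SerdeMap nimble = filterSerdeParameters(nimbleParams);
  assert(nimble.size() == 1);
  assert(nimble.at("nimble.example.option") == "1");

  // Empty case: no table parameters yields an empty serde map.
  assert(filterSerdeParameters(ParamMap{}).empty());
}
```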

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • In extractSerdeParameters, consider hoisting kNimblePrefix out of the loop (and possibly out of the function) so it isn’t reconstructed on every iteration and is clearly grouped with kSerdeKeys as part of the serde-filter configuration.
  • The three CTAS serde tests build HiveOutputTableHandle instances with a lot of duplicated boilerplate; consider introducing a small helper factory for the common fields so each test focuses only on the parameters it actually varies.

zacw7 previously approved these changes Mar 16, 2026
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @kewang1024 for this code. I have one comment, since this code has special paths for the Nimble format.

std::unordered_map<std::string, std::string> serdeParameters;
for (const auto& [key, value] : tableParameters) {
  static constexpr std::string_view kNimblePrefix{"nimble."};
  if (kSerdeKeys.count(key) > 0 ||
Contributor

Can we separate this into two loops: one that picks up the serde keys in kSerdeKeys, and another that retains the nimble-related serde parameters? It would also be great to abstract the nimble loop into a separate function, as it's not used by non-Meta teams.

Collaborator Author

@kewang1024 kewang1024 Mar 16, 2026

This is a good suggestion, thanks! Updated. @aditi-pandit, can you take another look?

aditi-pandit previously approved these changes Mar 16, 2026
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @kewang1024 for this fix.

zacw7 previously approved these changes Mar 17, 2026
Member

@zacw7 zacw7 left a comment

thx for the fix

@kewang1024 kewang1024 dismissed stale reviews from zacw7 and aditi-pandit via 47a9316 March 17, 2026 07:57
@kewang1024 kewang1024 force-pushed the fix branch 2 times, most recently from 47a9316 to 6b0057a Compare March 17, 2026 08:26
Member

@zacw7 zacw7 left a comment

thx for the fix

Comment on lines +810 to +819
void extractNimbleSerdeParameters(
    const std::map<std::string, std::string>& tableParameters,
    std::unordered_map<std::string, std::string>& serdeParameters) {
  static constexpr std::string_view kNimblePrefix{"nimble."};
  for (const auto& [key, value] : tableParameters) {
    if (key.compare(0, kNimblePrefix.size(), kNimblePrefix) == 0) {
      serdeParameters[key] = value;
    }
  }
}
Contributor

We probably don't need a separate method for this. Putting it inside the original extract method might be sufficient.
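
Whichever function ends up hosting the nimble loop, the prefix test in the snippet above is the part worth checking. The sketch below, assuming C++17, isolates it: the `compare(pos, count, string_view)` overload clamps `count` to the string's length, so a key shorter than "nimble." compares unequal rather than throwing. The function names are hypothetical, and the `string_view::substr` variant is an equivalent alternative, not something the PR uses. With C++20, `key.starts_with(kNimblePrefix)` would express the same check.

```cpp
#include <string>
#include <string_view>

constexpr std::string_view kNimblePrefix{"nimble."};

// Same test as the reviewed snippet: compare the first
// kNimblePrefix.size() characters of `key` against the prefix.
bool startsWithNimbleCompare(const std::string& key) {
  return key.compare(0, kNimblePrefix.size(), kNimblePrefix) == 0;
}

// Equivalent formulation on string_view; substr(0, n) also clamps n,
// so short or empty keys are safe.
bool startsWithNimbleView(std::string_view key) {
  return key.substr(0, kNimblePrefix.size()) == kNimblePrefix;
}
```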

@kewang1024 kewang1024 requested a review from aditi-pandit March 17, 2026 22:44
Contributor

@tanjialiang tanjialiang left a comment

Thanks. Left some nits.

Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @kewang1024

@aditi-pandit aditi-pandit merged commit f6e05b4 into prestodb:master Mar 17, 2026
82 of 83 checks passed

Labels

from:Meta PR from Meta

5 participants