fix: Extract serde params from additionalTableParameters in CTAS#27340
fix: Extract serde params from additionalTableParameters in CTAS#27340aditi-pandit merged 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideThis PR ensures CTAS operations in Prestissimo propagate Hive serde-related parameters from HiveOutputTableHandle.additionalTableParameters into the native HiveInsertTableHandle, including textfile delimiters and nimble.* options, and adds focused tests to validate the new behavior and that non-serde table parameters are ignored. Sequence diagram for CTAS serde parameter extraction and propagationsequenceDiagram
actor User
participant PrestoCoordinator
participant HivePrestoToVeloxConnector
participant HiveInsertTableHandle
participant HiveMetastore
User->>PrestoCoordinator: Submit CTAS statement
PrestoCoordinator->>PrestoCoordinator: Plan CTAS
PrestoCoordinator->>HivePrestoToVeloxConnector: Build HiveOutputTableHandle
Note over PrestoCoordinator,HivePrestoToVeloxConnector: HiveOutputTableHandle.additionalTableParameters includes
Note over PrestoCoordinator,HivePrestoToVeloxConnector: field.delim, escape.delim, nimble.* and other table params
PrestoCoordinator->>HivePrestoToVeloxConnector: toVeloxInsertTableHandle(HiveOutputTableHandle)
HivePrestoToVeloxConnector->>HivePrestoToVeloxConnector: toHiveColumns(inputColumns)
HivePrestoToVeloxConnector->>HivePrestoToVeloxConnector: extractSerdeParameters(additionalTableParameters)
HivePrestoToVeloxConnector-->>HivePrestoToVeloxConnector: serdeParameters map
HivePrestoToVeloxConnector->>HiveInsertTableHandle: new HiveInsertTableHandle(inputColumns, locationHandle, tableStorageFormat, bucketProperty, compressionKind, serdeParameters)
HiveInsertTableHandle->>HiveMetastore: Create table with serdeParameters
HiveMetastore-->>HiveInsertTableHandle: Table created with correct serde config
HiveInsertTableHandle-->>PrestoCoordinator: Insert handle ready
PrestoCoordinator-->>User: CTAS succeeds with correct serde settings
Updated class diagram for HivePrestoToVeloxConnector serde parameter handlingclassDiagram
class HivePrestoToVeloxConnector {
+toVeloxInsertTableHandle(hiveOutputTableHandle, typeParser) std::unique_ptr~ConnectorInsertTableHandle~
}
class HiveOutputTableHandle {
+inputColumns : std::vector~HiveColumn~
+locationHandle : LocationHandle
+tableStorageFormat : TableStorageFormat
+bucketProperty : std::optional~BucketProperty~
+compressionCodec : CompressionCodec
+additionalTableParameters : std::map~std::string, std::string~
}
class HiveInsertTableHandle {
+HiveInsertTableHandle(
inputColumns : std::vector~HiveColumn~,
locationHandle : LocationHandle,
tableStorageFormat : TableStorageFormat,
bucketProperty : std::optional~BucketProperty~,
compressionKind : std::optional~FileCompressionKind~,
serdeParameters : std::unordered_map~std::string, std::string~
)
}
class ExtractSerdeParametersUtil {
+extractSerdeParameters(tableParameters : std::map~std::string, std::string~) std::unordered_map~std::string, std::string~
}
HivePrestoToVeloxConnector ..> HiveOutputTableHandle : consumes
HivePrestoToVeloxConnector ..> HiveInsertTableHandle : produces
HivePrestoToVeloxConnector ..> ExtractSerdeParametersUtil : calls
ExtractSerdeParametersUtil ..> HiveOutputTableHandle : reads additionalTableParameters
HiveInsertTableHandle ..> ExtractSerdeParametersUtil : receives serdeParameters
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
extractSerdeParameters, consider hoistingkNimblePrefixout of the loop (and possibly out of the function) so it isn’t reconstructed on every iteration and is clearly grouped withkSerdeKeysas part of the serde-filter configuration. - The three CTAS serde tests build
HiveOutputTableHandleinstances with a lot of duplicated boilerplate; consider introducing a small helper factory for the common fields so each test focuses only on the parameters it actually varies.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `extractSerdeParameters`, consider hoisting `kNimblePrefix` out of the loop (and possibly out of the function) so it isn’t reconstructed on every iteration and is clearly grouped with `kSerdeKeys` as part of the serde-filter configuration.
- The three CTAS serde tests build `HiveOutputTableHandle` instances with a lot of duplicated boilerplate; consider introducing a small helper factory for the common fields so each test focuses only on the parameters it actually varies.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks @kewang1024 for this code. Have one comment as there are special paths for nimble format in this code.
| std::unordered_map<std::string, std::string> serdeParameters; | ||
| for (const auto& [key, value] : tableParameters) { | ||
| static constexpr std::string_view kNimblePrefix{"nimble."}; | ||
| if (kSerdeKeys.count(key) > 0 || |
There was a problem hiding this comment.
Can we separate this into 2 loops ... pick up the serde keys in kSerdeKeys and then add another loop to retain the nimble related serde parameters. Also it would be great to abstract the nimble loop into a separate function, as its not used by non-Meta teams.
There was a problem hiding this comment.
This is good suggestion, thanks! Updated @aditi-pandit, can you help take another look?
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks @kewang1024 for this fix.
47a9316 to
6b0057a
Compare
Summary: CTAS on Prestissimo silently drops textfile delimiters and nimble config because the C++ CreateHandle path never reads additionalTableParameters. Fix: Add extractSerdeParameters() that extracts serde keys (field.delim, escape.delim, etc. and nimble.*) from additionalTableParameters and passes them to HiveInsertTableHandle. No protocol changes needed.
| void extractNimbleSerdeParameters( | ||
| const std::map<std::string, std::string>& tableParameters, | ||
| std::unordered_map<std::string, std::string>& serdeParameters) { | ||
| static constexpr std::string_view kNimblePrefix{"nimble."}; | ||
| for (const auto& [key, value] : tableParameters) { | ||
| if (key.compare(0, kNimblePrefix.size(), kNimblePrefix) == 0) { | ||
| serdeParameters[key] = value; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
We probably don't need a separate method for this. Putting it inside the original extract method might be sufficient.
tanjialiang
left a comment
There was a problem hiding this comment.
Thanks. Left some nit
Summary:
CTAS on Prestissimo silently drops textfile delimiters and nimble config because the C++ CreateHandle path never reads additionalTableParameters.
Fix: Add extractSerdeParameters() that extracts serde keys (field.delim, escape.delim, etc. and nimble.*) from additionalTableParameters and passes them to HiveInsertTableHandle. No protocol changes needed.
Summary by Sourcery
Ensure Hive CTAS operations propagate relevant serde parameters from additional table properties into Velox Hive insert handles.
Bug Fixes:
Tests: