Skip to content

[Native] Allow '$file_size' and '$file_modified_time' for HiveSplits in SQL queries#21965

Merged
aditi-pandit merged 1 commit intomasterfrom
hive_file_size
Mar 8, 2024
Merged

[Native] Allow '$file_size' and '$file_modified_time' for HiveSplits in SQL queries#21965
aditi-pandit merged 1 commit intomasterfrom
hive_file_size

Conversation

@aditi-pandit
Copy link
Copy Markdown
Contributor

@aditi-pandit aditi-pandit commented Feb 19, 2024

Fixes #21867

Test Plan

e2e tests

== NO RELEASE NOTE ==

@aditi-pandit aditi-pandit requested a review from a team as a code owner February 19, 2024 21:54
@aditi-pandit aditi-pandit marked this pull request as draft February 19, 2024 21:54
facebook-github-bot pushed a commit to facebookincubator/velox that referenced this pull request Mar 5, 2024
…me' to be queried in SQL (#8800)

Summary:
$file_size and $file_modified_time are queryable synthesized columns for Hive tables in Presto. Spark also has bunch of such queryable synthesized columns (#7880).

The columns are passed by the co-ordinator to the worker in the HiveSplit.

i) Velox HiveSplit needed to be enhanced to get filesize and file_modified_time metadata in a generic map data-structure of (column name, value) from Prestissimo.
ii) These values should be populated by SplitReader into TableScanOperator output buffers.

This also needs a Prestissimo change to populate the HiveSplit with this info sent in the fragment prestodb/presto#21965

Fixes prestodb/presto#21867

gaoyangxiaozhu will have a follow up PR on the Spark integration.

Pull Request resolved: #8800

Reviewed By: mbasmanova

Differential Revision: D54512245

Pulled By: Yuhta

fbshipit-source-id: 190a97f9fcb1e869fff82e0a2264d57f9915376e
@aditi-pandit aditi-pandit force-pushed the hive_file_size branch 2 times, most recently from fd7414c to 3173c67 Compare March 7, 2024 07:55
@aditi-pandit aditi-pandit marked this pull request as ready for review March 7, 2024 07:56
@aditi-pandit
Copy link
Copy Markdown
Contributor Author

@Yuhta : This is the Prestissimo side change for info columns. PTAL.

@aditi-pandit aditi-pandit force-pushed the hive_file_size branch 2 times, most recently from fcbb73c to 0d4a6f3 Compare March 7, 2024 08:12
Comment on lines 960 to 966
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (const auto& entry : hiveLayout->predicateColumns) {
if (toHiveColumnType(entry.second.columnType) ==
velox::connector::hive::HiveColumnHandle::ColumnType::kSynthesized) {
if (assignments.count(entry.first) == 0) {
assignments.emplace(
entry.first, toColumnHandle(&entry.second, typeParser));
}
for (const auto& [name, col] : hiveLayout->predicateColumns) {
if (toHiveColumnType(col.columnType) ==
velox::connector::hive::HiveColumnHandle::ColumnType::kSynthesized) {
VELOX_CHECK(assignments.emplace(name, toColumnHandle(&col, typeParser)).second, "Duplicate assignment: {}", name);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta : We want to add these to the assignments list only if they are not present already. Its not quite the same code.

@aditi-pandit
Copy link
Copy Markdown
Contributor Author

@Yuhta : PTAL.

protocol::ColumnType columnType,
const protocol::ColumnHandle& column) {
if (toHiveColumnType(columnType) ==
velox::connector::hive::HiveColumnHandle::ColumnType::kSynthesized) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: connector::hive::HiveColumnHandle::ColumnType should be sufficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[native] Hidden columns missing in Prestissimo Hive Connector

3 participants