Allow HiveSplit info columns like '$file_size' and '$file_modified_time' to be queried in SQL by aditi-pandit · Pull Request #8800 · facebookincubator/velox

aditi-pandit · 2024-02-19T21:47:41Z

$file_size and $file_modified_time are queryable synthesized columns for Hive tables in Presto. Spark also has a number of such queryable synthesized columns (#7880).

The columns are passed by the co-ordinator to the worker in the HiveSplit.

i) The Velox HiveSplit needs to be enhanced to receive the file size and file modified time metadata from Prestissimo in a generic map data structure of (column name, value).
ii) SplitReader should populate these values into the TableScanOperator output buffers.
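
The two steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Velox implementation: `SketchSplit`, `SketchBatch`, and `fillInfoColumns` are hypothetical names, and values are kept as strings for simplicity. It assumes only what the description states, namely that the split carries a (column name, value) map and that the reader copies those values into the scan output.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a Hive split. Not the Velox class.
struct SketchSplit {
  std::string filePath;
  // Generic (column name -> value) map filled in by the coordinator,
  // for example {"$file_size", "1024"}.
  std::unordered_map<std::string, std::string> infoColumns;
};

// Hypothetical output batch: column name -> one value per row.
struct SketchBatch {
  std::size_t numRows = 0;
  std::unordered_map<std::string, std::vector<std::string>> columns;
};

// For every requested column that the split carries a value for, emit a
// constant column that repeats that value once per row. Columns without an
// entry in the map are skipped here; a real reader would read them from the
// file. Returns the number of columns filled from the split.
std::size_t fillInfoColumns(
    const SketchSplit& split,
    const std::vector<std::string>& requested,
    SketchBatch& batch) {
  std::size_t filled = 0;
  for (const auto& name : requested) {
    auto it = split.infoColumns.find(name);
    if (it == split.infoColumns.end()) {
      continue;
    }
    batch.columns[name] =
        std::vector<std::string>(batch.numRows, it->second);
    ++filled;
  }
  return filled;
}
```

Because the value is the same for every row read from one split, a constant column is enough; no per-row file access is needed.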

This also needs a Prestissimo change to populate the HiveSplit with this info, which is sent in the fragment: prestodb/presto#21965

Fixes prestodb/presto#21867

@gaoyangxiaozhu will have a follow-up PR on the Spark integration.

netlify · 2024-02-19T21:48:01Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`86ab66c`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/65e63b3958c088000893f7ad

gaoyangxiaozhu · 2024-02-21T06:00:53Z

Hey @aditi-pandit, I also have a similar PR, #7880, that lets Velox support querying the file metadata columns the Spark engine supports for Hive tables (file_path, file_size, file_name, file_modify_time, file_block_start, file_block_end, etc.).

Maybe we can work together to see whether the change can support both engines, Presto and Spark?

velox/exec/tests/TableScanTest.cpp

gaoyangxiaozhu · 2024-02-26T06:34:24Z

Hey @aditi-pandit, maybe change the PR title to "Allow info columns for HiveSplits to be queried in SQL"?

velox/connectors/hive/SplitReader.cpp

velox/connectors/hive/iceberg/IcebergSplit.cpp

velox/connectors/hive/SplitReader.cpp

aditi-pandit · 2024-03-04T21:31:01Z

@Yuhta @majetideepak : PTAL.

facebook-github-bot · 2024-03-04T22:32:52Z

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

aditi-pandit · 2024-03-05T16:59:21Z

@Yuhta: Do you need help with the linter error? Could you please give me more info about it?

facebook-github-bot · 2024-03-05T18:13:20Z

@Yuhta merged this pull request in b9afa14.

…ified_time' to be queried in SQL (facebookincubator#8800)" This reverts commit b9afa14.

…file_modified_time' to be queried in SQL (facebookincubator#8800)"" This reverts commit d3dc172.

tdcmeehan · 2024-03-07T14:43:05Z

velox/connectors/hive/HiveConnectorUtil.cpp

    //
-    // Unfortunately, Presto happens to specify a filter for $path or
-    // $bucket column. This filter is redundant and needs to be removed.
+    // Unfortunately, Presto happens to specify a filter for $path, $file_size,


Just wondering if there is an issue for this on the Presto side?
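
For context, the code comment quoted above describes dropping redundant filters on synthesized columns before they reach the file reader. A minimal sketch of that idea follows. It is not the code in HiveConnectorUtil.cpp: `SketchFilters` and `dropSynthesizedColumnFilters` are hypothetical names, and filters are modeled as plain strings keyed by column name.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical filter representation: column name -> filter text.
using SketchFilters = std::unordered_map<std::string, std::string>;

// Returns a copy of `filters` without the entries whose column is a
// synthesized column. Such columns do not exist in the data file, so a
// filter on them cannot be evaluated by the file reader.
SketchFilters dropSynthesizedColumnFilters(
    SketchFilters filters,
    const std::unordered_set<std::string>& synthesized) {
  for (auto it = filters.begin(); it != filters.end();) {
    if (synthesized.count(it->first) > 0) {
      it = filters.erase(it); // erase returns the next valid iterator
    } else {
      ++it;
    }
  }
  return filters;
}
```

A caller would pass the set of synthesized names, for example `$path`, `$bucket`, `$file_size`, and `$file_modified_time`, and use the returned map for the actual file scan.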

facebook-github-bot added the CLA Signed label Feb 19, 2024

aditi-pandit force-pushed the hive_file_metadata branch from 38310c0 to b7be310 Compare February 19, 2024 21:51

aditi-pandit requested review from Yuhta, majetideepak and yingsu00 February 19, 2024 21:53

aditi-pandit mentioned this pull request Feb 19, 2024

[native] Hidden columns missing in Prestissimo Hive Connector prestodb/presto#21867

Closed

aditi-pandit force-pushed the hive_file_metadata branch from b7be310 to 340ffbe Compare February 19, 2024 22:27

gaoyangxiaozhu mentioned this pull request Feb 21, 2024

Add file metadata columns support for spark parquet #7880

Closed

aditi-pandit commented Feb 22, 2024

velox/exec/tests/TableScanTest.cpp (resolved)

aditi-pandit force-pushed the hive_file_metadata branch from 340ffbe to 1974a77 Compare February 23, 2024 21:30

aditi-pandit changed the title ~~Allow '$file_size' and '$file_modified_time' for HiveSplits to be queried in SQL~~ Allow HiveSplit info columns like '$file_size' and '$file_modified_time' to be queried in SQL Feb 27, 2024

aditi-pandit force-pushed the hive_file_metadata branch from 1974a77 to a67125c Compare February 27, 2024 03:55

aditi-pandit requested a review from mbasmanova February 27, 2024 16:16

majetideepak reviewed Feb 29, 2024

velox/connectors/hive/SplitReader.cpp (outdated, resolved)

velox/connectors/hive/SplitReader.cpp (outdated, resolved)

velox/connectors/hive/iceberg/IcebergSplit.cpp (outdated, resolved)

Yuhta reviewed Feb 29, 2024

velox/connectors/hive/SplitReader.cpp (outdated, resolved)

velox/connectors/hive/SplitReader.cpp (outdated, resolved)

velox/connectors/hive/SplitReader.cpp (outdated, resolved)

aditi-pandit force-pushed the hive_file_metadata branch from a67125c to 43fce91 Compare February 29, 2024 23:06

Allow queries for '$file_size' and '$file_modified_time' for HiveSplits

86ab66c

aditi-pandit force-pushed the hive_file_metadata branch from 43fce91 to 86ab66c Compare March 4, 2024 21:20

facebook-github-bot closed this in b9afa14 Mar 5, 2024

facebook-github-bot added the Merged label Mar 5, 2024

aditi-pandit deleted the hive_file_metadata branch March 5, 2024 21:22

aditi-pandit mentioned this pull request Mar 6, 2024

[native] Advance Velox prestodb/presto#22091

Closed

gaoyangxiaozhu mentioned this pull request Mar 7, 2024

[VL] parquet file metadata columns support in velox apache/gluten#3870

Merged

philo-he added a commit to philo-he/velox that referenced this pull request Mar 7, 2024

Revert "Allow HiveSplit info columns like '$file_size' and '$file_mod…

d3dc172

…ified_time' to be queried in SQL (facebookincubator#8800)" This reverts commit b9afa14.

philo-he added a commit to philo-he/velox that referenced this pull request Mar 7, 2024

Revert "Revert "Allow HiveSplit info columns like '$file_size' and '$…

fbf0636

…file_modified_time' to be queried in SQL (facebookincubator#8800)"" This reverts commit d3dc172.

tdcmeehan reviewed Mar 7, 2024

aditi-pandit mentioned this pull request Apr 29, 2024

Document types of $file_size and $file_modified_time prestodb/presto#22627

Closed

aditi-pandit mentioned this pull request Aug 19, 2024

Apply info column filters during split generation prestodb/presto#23411

Merged

aditi-pandit wants to merge 1 commit into main from hive_file_metadata

6 participants
