Allow HiveSplit info columns like '$file_size' and '$file_modified_time' to be queried in SQL#8800
Closed
aditi-pandit wants to merge 1 commit into main from
✅ Deploy Preview for meta-velox canceled.
38310c0 to b7be310
b7be310 to 340ffbe
Contributor
Hey @aditi-pandit, I also have a similar PR, #7880, that lets Velox support querying the file metadata columns that Spark supports for Hive tables (file_path, file_size, file_name, file_modify_time, file_block_start, file_block_end, etc.). Maybe we can work together so that the change supports both engines, Presto and Spark?
aditi-pandit commented Feb 22, 2024
340ffbe to 1974a77
Contributor
Hey @aditi-pandit, maybe change the PR title to …
1974a77 to a67125c
Yuhta reviewed Feb 29, 2024
a67125c to 43fce91
43fce91 to 86ab66c
Collaborator
Author
@Yuhta @majetideepak: PTAL.
Contributor
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Collaborator
Author
|
@Yuhta : Do you need help with the linter error ? Please can you give me more info about it. |
Contributor
philo-he added a commit to philo-he/velox that referenced this pull request on Mar 7, 2024
…ified_time' to be queried in SQL (facebookincubator#8800)" This reverts commit b9afa14.
philo-he added a commit to philo-he/velox that referenced this pull request on Mar 7, 2024
…file_modified_time' to be queried in SQL (facebookincubator#8800)"" This reverts commit d3dc172.
tdcmeehan reviewed Mar 7, 2024
//
// Unfortunately, Presto happens to specify a filter for $path or
// $bucket column. This filter is redundant and needs to be removed.
// Unfortunately, Presto happens to specify a filter for $path, $file_size,
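The comment above says the filter Presto attaches to these synthesized columns is redundant and must be removed. A minimal sketch of that step, assuming filters are keyed by column name in a std::unordered_map. Filter and removeRedundantInfoColumnFilters are hypothetical names, not the Velox API, and because the updated comment is cut off after "$file_size,", the exact column set below (including $file_modified_time) is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a column filter; the real Velox filter type differs.
struct Filter {
  std::string description;
};

// Assumed set of synthesized columns whose filters are treated as redundant.
const std::unordered_set<std::string>& synthesizedColumns() {
  static const std::unordered_set<std::string> kColumns = {
      "$path", "$bucket", "$file_size", "$file_modified_time"};
  return kColumns;
}

// Hypothetical helper: erases every filter on a synthesized column and
// returns how many filters were removed. Filters on regular columns stay.
std::size_t removeRedundantInfoColumnFilters(
    std::unordered_map<std::string, Filter>& filters) {
  std::size_t removed = 0;
  for (auto it = filters.begin(); it != filters.end();) {
    if (synthesizedColumns().count(it->first) > 0) {
      it = filters.erase(it);  // erase returns the next valid iterator
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```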
Contributor
Just wondering: is there an issue for this on the Presto side?
$file_size and $file_modified_time are queryable synthesized columns for Hive tables in Presto. Spark also has a number of such queryable synthesized columns (#7880).
The values for these columns are passed by the coordinator to the worker in the HiveSplit.
i) The Velox HiveSplit needed to be enhanced to receive the file size and file_modified_time metadata from Prestissimo in a generic map of (column name, value).
ii) The SplitReader should populate these values into the TableScanOperator output buffers.
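Steps i) and ii) can be sketched as follows, assuming a simplified split that carries the info columns as a std::unordered_map<std::string, std::string>. HiveSplitSketch, OutputColumn and populateInfoColumns are hypothetical names for illustration, not the actual Velox HiveSplit, SplitReader or TableScan API. Since a split reads from a single file, each info column holds the same value for every row the split produces.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified split; the real Velox split has many more fields.
struct HiveSplitSketch {
  std::string filePath;
  // Generic (column name, value) map filled from what the coordinator sends,
  // e.g. {"$file_size", "<size in bytes>"}.
  std::unordered_map<std::string, std::string> infoColumns;
};

// Hypothetical stand-in for one output column of a scan result batch.
struct OutputColumn {
  std::string name;
  std::vector<std::string> values;
};

// For each output column named in split.infoColumns, writes that value once
// per row and returns the number of columns populated. Other columns are
// left untouched (they would be read from the file). Values stay strings in
// this sketch; a real reader would convert them to the column's type.
std::size_t populateInfoColumns(const HiveSplitSketch& split,
                                std::size_t numRows,
                                std::vector<OutputColumn>& columns) {
  std::size_t populated = 0;
  for (auto& column : columns) {
    auto it = split.infoColumns.find(column.name);
    if (it == split.infoColumns.end()) {
      continue;  // Regular data column.
    }
    column.values.assign(numRows, it->second);  // Constant for the split.
    ++populated;
  }
  return populated;
}
```

Keeping the values in a generic map means a new synthesized column (such as the Spark ones from #7880) needs no new split field; the trade-off is that the reader has to convert each string to the column's declared type.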
This also requires a Prestissimo change (prestodb/presto#21965) to populate the HiveSplit with this info sent in the plan fragment.
Fixes prestodb/presto#21867
@gaoyangxiaozhu will have a follow up PR on the Spark integration.