Add support for reading orc column statistics #6588
wypb wants to merge 1 commit into facebookincubator:main
  }

- std::unordered_map<uint32_t, proto::ColumnStatistics>
+ std::unordered_map<uint32_t, ColumnStatisticsWrapper>
There is a problem here: the wrapper only holds a reference, but this function returns actual values. If ORC does not need it, can we keep the signature and make it DWRF-only (with a check)?
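The "keep the signature and make it DWRF-only (with a check)" suggestion can be sketched roughly as follows. This is a minimal illustration, not the Velox code: `FileFormat`, `DwrfColumnStats`, and `readDwrfStats` are hypothetical stand-ins, and a plain struct takes the place of the protobuf message. The point is that the function keeps returning owned values and rejects other formats up front.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical stand-ins; none of these names are taken from Velox.
enum class FileFormat { kDwrf, kOrc };

// Plain struct standing in for a protobuf statistics message.
struct DwrfColumnStats {
  uint64_t numberOfValues = 0;
};

// Returns owned values keyed by node id and refuses non-DWRF input,
// so the return type never has to become a non-owning wrapper.
std::unordered_map<uint32_t, DwrfColumnStats> readDwrfStats(
    FileFormat format, uint32_t numNodes) {
  if (format != FileFormat::kDwrf) {
    throw std::invalid_argument("statistics are only supported for DWRF here");
  }
  std::unordered_map<uint32_t, DwrfColumnStats> stats;
  for (uint32_t node = 0; node < numNodes; ++node) {
    stats[node] = DwrfColumnStats{};  // the map owns each entry
  }
  return stats;
}
```

Because the map owns its entries, callers can keep the result for as long as they like; a map of reference-holding wrappers would only be valid while the wrapped messages stay alive somewhere else.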
  for (auto node = 0; node < typesSize; node++) {
    if (columnSelector_.shouldReadNode(node)) {
-     stats[node] = proto::ColumnStatistics();
+     const proto::ColumnStatistics cs = proto::ColumnStatistics();
@Yuhta Thank you for your review. The problem is probably that a temporary local variable is used here; I will fix it.
  auto cs =
      google::protobuf::Arena::CreateMessage<proto::ColumnStatistics>(
          stripeReaderBase_.getReader().arena());
  stats.emplace(node, ColumnStatisticsWrapper(cs));
We still have the same problem. I think you should keep this function as it is, and create another one returning proto::orc::ColumnStatistics if ORC requires it as well. Wrappers can only be created at call sites.
@Yuhta I looked at the code again, and ORC does not use this method. I'll keep the previous code logic. Thank you for testing this.
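The remark that wrappers can only be created at call sites comes down to object lifetime: a wrapper that stores a pointer or reference is only valid while the wrapped object is alive. A minimal sketch of that pattern, assuming a non-owning wrapper, is shown below. `OwnedStats`, `StatsView`, `Footer`, and `countValues` are hypothetical names for illustration and do not mirror the actual Velox classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a protobuf statistics message.
struct OwnedStats {
  uint64_t numberOfValues = 0;
};

// Hypothetical non-owning wrapper: it stores a pointer and never copies.
class StatsView {
 public:
  explicit StatsView(const OwnedStats* impl) : impl_(impl) {}
  uint64_t numberOfValues() const { return impl_->numberOfValues; }

 private:
  const OwnedStats* impl_;  // not owned; must outlive this view
};

// The owner keeps the message alive and hands out a const reference.
class Footer {
 public:
  explicit Footer(uint64_t n) { stats_.numberOfValues = n; }
  const OwnedStats& ownedStatistics() const { return stats_; }

 private:
  OwnedStats stats_;
};

// The wrapper is built at the call site, where the owner is known to be
// alive for the whole duration of the wrapper's use.
uint64_t countValues(const Footer& footer) {
  StatsView view(&footer.ownedStatistics());
  return view.numberOfValues();
}
```

By contrast, a function that builds the message as a local variable and returns a `StatsView` over it would hand the caller a dangling pointer, which is the hazard raised in the review.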
  }

- const ::facebook::velox::dwrf::proto::ColumnStatistics& statistics(
+ const ::facebook::velox::dwrf::proto::ColumnStatistics& statisticsByIndex(
A better name is dwrfStatistics
Hi @Yuhta, do you have any other comments on this PR? Thank you.
Hi @Yuhta, PTAL. Thank you.
Hi @kevinwilfong I moved the implementation of
@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kevinwilfong merged this pull request in 0c0a973.
Conbench analyzed the 1 benchmark run for this commit and found no benchmark performance regressions.
Currently, the dwrf module only supports reading column statistics in the DWRF format; it does not support reading column statistics in the ORC format. When I use Velox to read an ORC table, the following exception occurs:
With this PR, we support reading ORC column statistics, so we can read ORC data through Velox:
CC: @Yuhta