[C++] Support property filter pushdown by utilizing payload file formats #178
🎊 PR Preview 282e1d8 has been successfully built and deployed to https://alibaba-graphar-build-pr-178.surge.sh 🤖 By surge-preview
PTAL :) @lixueclaire @acezen
Good work! thanks for the change, we will take a look.
Well done! Thank you for completing this prototype. I agree with you that we could create a wrap with the
Good catch! Let me take a look.
You could fix the included header files by referring to arrow_chunk_writer
d253175 to 6e5370e
0c8832e to 8911968
9a090dd to 3d6d8d5
virtual ArrowExpression Evaluate() = 0;
};

class ExpressionProperty : public Expression {
It would be better to add some brief descriptions for the classes defined in this file.
Also, if you think it is necessary, you can add the new classes or methods to the API reference by updating this file.
Great tips! Let me take a look.
Signed-off-by: Ziy1-Tan <[email protected]>
LGTM! @acezen , do you have further comments on this change?
LGTM, thanks for the work.
Hi @Ziy1-Tan, if you think the PR is ready, you can remove the WIP tag.
Good. Let me add more unit tests for the reader.
This PR is about C++ SDK for OSPP 2023
Issue number: #98.
You can find more detail about this feature here
Steps
Proposed changes
Filter pushdown is a performance optimization that prunes extraneous data from a Parquet or ORC file, reducing the amount of data that GraphAr scans and reads when a query on the file contains a filter expression. We want to support this feature in the GraphAr C++ reader SDK.
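To make the pruning idea concrete, here is a minimal sketch of how a reader can skip data using stored statistics. It assumes a simplified model in which each row group carries min/max values, as Parquet does; `RowGroup`, `ScanResult`, and `ScanGreaterThan` are hypothetical names for illustration and are not part of GraphAr or Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a Parquet row group: a run of values
// plus the min/max statistics the file format stores alongside them.
struct RowGroup {
  std::int64_t min;
  std::int64_t max;
  std::vector<std::int64_t> values;
};

struct ScanResult {
  std::vector<std::int64_t> matches;  // values that satisfied the filter
  std::size_t groups_read = 0;        // row groups that had to be decoded
};

// Returns every value strictly greater than `threshold`. A row group whose
// maximum is not above the threshold cannot contain a match, so it is skipped
// without looking at its values.
inline ScanResult ScanGreaterThan(const std::vector<RowGroup>& groups,
                                  std::int64_t threshold) {
  ScanResult result;
  for (const RowGroup& group : groups) {
    assert(group.min <= group.max);        // statistics must be consistent
    if (group.max <= threshold) continue;  // pruned by statistics alone
    ++result.groups_read;
    for (std::int64_t value : group.values) {
      if (value > threshold) result.matches.push_back(value);
    }
  }
  return result;
}
```

With three row groups covering 1 to 5, 6 to 9, and 10 to 20, a filter of "greater than 9" only needs to decode the last group; the other two are rejected from their statistics.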
Forms of pushdown
Design
We enable pushdown in these ways:
Reader: ConstructVertexPropertyArrowChunkReader(options)
reader.Filter(...)
reader.Select(...)
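The two entry points listed above (options passed at construction, or set later through `Filter()` and `Select()`) can be sketched with a toy in-memory reader. This is only an illustration of the API shape under stated assumptions: `ToyChunkReader`, `ToyFilterOptions`, `GetChunk()`, and `FemaleNames` are hypothetical, rows are plain string maps, and nothing here reflects how the real reader talks to Arrow.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Row = std::map<std::string, std::string>;  // column name -> value
using RowFilter = std::function<bool(const Row&)>;

// Hypothetical options bundle: a row predicate plus the columns to keep.
struct ToyFilterOptions {
  RowFilter filter;                  // empty: keep every row
  std::vector<std::string> columns;  // empty: keep every column
};

// Toy reader over in-memory rows. Options can be supplied at construction
// or changed afterwards through Filter() and Select().
class ToyChunkReader {
 public:
  explicit ToyChunkReader(std::vector<Row> rows, ToyFilterOptions options = {})
      : rows_(std::move(rows)), options_(std::move(options)) {}

  void Filter(RowFilter filter) { options_.filter = std::move(filter); }
  void Select(std::vector<std::string> columns) {
    options_.columns = std::move(columns);
  }

  // Applies the filter first, then projects the surviving rows.
  std::vector<Row> GetChunk() const {
    std::vector<Row> out;
    for (const Row& row : rows_) {
      if (options_.filter && !options_.filter(row)) continue;
      if (options_.columns.empty()) {
        out.push_back(row);
        continue;
      }
      Row projected;
      for (const std::string& name : options_.columns) {
        auto it = row.find(name);
        if (it != row.end()) projected.insert(*it);
      }
      out.push_back(std::move(projected));
    }
    return out;
  }

 private:
  std::vector<Row> rows_;
  ToyFilterOptions options_;
};

// Usage: select firstName and lastName where gender = female.
inline std::vector<Row> FemaleNames(const std::vector<Row>& rows) {
  ToyChunkReader reader(rows);
  reader.Filter([](const Row& r) {
    auto it = r.find("gender");
    return it != r.end() && it->second == "female";
  });
  reader.Select({"firstName", "lastName"});
  std::vector<Row> chunk = reader.GetChunk();
  // The filter column does not have to be among the selected columns.
  for (const Row& r : chunk) assert(r.count("gender") == 0);
  return chunk;
}
```

Keeping the filter and the projection in one options struct means the constructor path and the setter path share the same state, so either style yields the same chunk.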
Implementation
Pushdown options are wrapped into FilterOptions:
Select columns firstName, lastName from files where gender = female
Nested expressions are also supported, e.g.
"2012-06-02T04:30:44.526+0000" < creationDate and creationDate = creationDate
Result:
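To illustrate how a nested filter such as the one above can be represented, here is a minimal sketch of an expression tree. In the PR the `Expression` classes expose `Evaluate()` returning an `ArrowExpression`; to stay dependency-free, this sketch instead evaluates directly against an in-memory row. All type and function names below (`Operand`, `Predicate`, `Property`, `Literal`, `LessThan`, `Equal`, `And`, `BuildExampleFilter`) are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Row = std::map<std::string, std::string>;  // property name -> value

// Leaf nodes produce a value: a property of the row or a constant.
struct Operand {
  virtual ~Operand() = default;
  virtual std::string Value(const Row& row) const = 0;
};

struct Property : Operand {
  explicit Property(std::string n) : name(std::move(n)) {}
  // Throws std::out_of_range if the row lacks the property.
  std::string Value(const Row& row) const override { return row.at(name); }
  std::string name;
};

struct Literal : Operand {
  explicit Literal(std::string v) : value(std::move(v)) {}
  std::string Value(const Row&) const override { return value; }
  std::string value;
};

// Inner nodes produce a boolean and can be nested arbitrarily.
struct Predicate {
  virtual ~Predicate() = default;
  virtual bool Evaluate(const Row& row) const = 0;
};

using OperandPtr = std::shared_ptr<Operand>;
using PredicatePtr = std::shared_ptr<Predicate>;

struct LessThan : Predicate {
  LessThan(OperandPtr l, OperandPtr r) : lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);
  }
  // Plain string comparison; it orders timestamps correctly only because
  // they share one fixed-width format and time zone.
  bool Evaluate(const Row& row) const override {
    return lhs->Value(row) < rhs->Value(row);
  }
  OperandPtr lhs, rhs;
};

struct Equal : Predicate {
  Equal(OperandPtr l, OperandPtr r) : lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);
  }
  bool Evaluate(const Row& row) const override {
    return lhs->Value(row) == rhs->Value(row);
  }
  OperandPtr lhs, rhs;
};

struct And : Predicate {
  And(PredicatePtr l, PredicatePtr r) : lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);
  }
  bool Evaluate(const Row& row) const override {
    return lhs->Evaluate(row) && rhs->Evaluate(row);
  }
  PredicatePtr lhs, rhs;
};

// Builds: "2012-06-02T04:30:44.526+0000" < creationDate
//         and creationDate = creationDate
inline PredicatePtr BuildExampleFilter() {
  auto date = std::make_shared<Property>("creationDate");
  auto cutoff = std::make_shared<Literal>("2012-06-02T04:30:44.526+0000");
  return std::make_shared<And>(std::make_shared<LessThan>(cutoff, date),
                               std::make_shared<Equal>(date, date));
}
```

Splitting value-producing nodes from boolean nodes keeps ill-formed trees, such as `And` applied to a literal, from compiling.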
Scope
arrow::compute::Expression
VertexPropertyArrowChunkReader
GetRange()
Filter(filter)
Select(col_names)
VertexPropertyArrowChunkReader
ConstructVertexPropertyArrowChunkReader(..., filter_options)
ConstructAdjListPropertyArrowChunkReader(..., filter_options)
ConstructVerticesCollection(..., filter_options)
ConstructEdgesCollection(..., filter_options)
TBD
Filter() and Select(), e.g. match the property name