Conversation
|
@JamesRTaylor were you able to verify the effectiveness of this patch in staging? |
|
Next step post-merge is to deploy this to next-staging on top of the branch I've already been working with so that we can test perf. |
|
Got it. This is a big patch and I'm not fully up to speed on all the upstream PRs and discussions, but I'm looking into it. Can you give a tl;dr of the key areas of the code to focus on, and which parts we have changed considerably from the upstream PR? |
|
👍 I gave this a cursory look but would need quite a bit more time to familiarize myself with the code, and I don't want to block on that. Some of my comments would be better suited to the OSS patch, like missing tests and questions about the API/object model, so they might not be that constructive here. I recommend putting it in staging so we can test it. |
|
The main part of the patch is threading through a map of NestedColumn objects to the Hive connector and then using that in ParquetPageSource to only read what's projected. I'll get another PR together for prestoinfra to push this to next staging so we can evaluate the performance. |
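For reviewers who want a mental model of the projection idea described above, here is a minimal, self-contained sketch. It is not the code in this patch: `Field`, `prune`, and `leafPaths` are hypothetical names made up for illustration, and the real `NestedColumn` and `ParquetPageSource` classes will differ. It only shows the general technique of taking a nested schema plus the dereferenced paths from a query and keeping just the subtrees that are actually projected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedColumnPruning {
    // Hypothetical stand-in for a nested schema node (not a Presto or Parquet type).
    static final class Field {
        final String name;
        final Map<String, Field> children = new LinkedHashMap<>();

        Field(String name, Field... kids) {
            this.name = name;
            for (Field kid : kids) {
                children.put(kid.name, kid);
            }
        }
    }

    // Returns a pruned copy of the schema that keeps only the dotted paths requested.
    // A path that ends at a struct keeps that struct's whole subtree.
    // Paths that do not exist in the schema are ignored.
    static Field prune(Field root, List<String> paths) {
        Field pruned = new Field(root.name);
        for (String path : paths) {
            String[] parts = path.split("\\.");
            if (!exists(root, parts)) {
                continue;
            }
            Field src = root;
            Field dst = pruned;
            for (String part : parts) {
                src = src.children.get(part);
                dst = dst.children.computeIfAbsent(part, n -> new Field(n));
            }
            copySubtree(src, dst);
        }
        return pruned;
    }

    private static boolean exists(Field root, String[] parts) {
        Field current = root;
        for (String part : parts) {
            current = current.children.get(part);
            if (current == null) {
                return false;
            }
        }
        return true;
    }

    private static void copySubtree(Field src, Field dst) {
        for (Field child : src.children.values()) {
            Field copy = dst.children.computeIfAbsent(child.name, n -> new Field(n));
            copySubtree(child, copy);
        }
    }

    // Lists the dotted paths of all leaf fields, i.e. the columns a reader would have to load.
    static List<String> leafPaths(Field root) {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Field field, String prefix, List<String> out) {
        for (Field child : field.children.values()) {
            String path = prefix.isEmpty() ? child.name : prefix + "." + child.name;
            if (child.children.isEmpty()) {
                out.add(path);
            } else {
                collect(child, path, out);
            }
        }
    }

    public static void main(String[] args) {
        Field schema = new Field("root",
                new Field("user",
                        new Field("id"),
                        new Field("name"),
                        new Field("address", new Field("city"), new Field("zip"))),
                new Field("payload", new Field("blob")));
        Field pruned = prune(schema, Arrays.asList("user.id", "user.address.city"));
        System.out.println(leafPaths(pruned));
    }
}
```

In this toy schema, a query that only touches `user.id` and `user.address.city` ends up with two leaf columns to read instead of five, which is the kind of saving we want to measure in staging.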
* Pushdown dereference expression to Parquet reader (qqibrow)
* Fix TestMergeNestedColumns
Please review, @billonahill. I've adapted the patch from qqibrow here to be on top of Presto 309. I'm hoping this will address the perf issue for pruning of nested fields and close the gap between Presto & BQ.
cc @puneetjaiswal