refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic #13860
aditi-pandit wants to merge 1 commit into main
JkSelf left a comment
LGTM, except for one small nit. Thanks.
}

void TopNRowNumber::setupNextOutput(const RowVectorPtr& output) {
  auto resetNextRank = [this]() { nextRank_ = 1; };
Why do you reset nextRank_ to 1 and replace the earlier < with <=?

@jinchengchenghh I will be updating this logic for both rank and dense_rank computation in the next PR.
In rank and dense_rank, the result value changes only when the order-by keys differ between the current row and the next one. Starting the rank at 0 and incrementing it in place whenever a row differs from its neighbor makes for complicated logic. It's simpler to start the current rank at 1 and advance it when moving to the next row, so I changed how the result value is computed.

I'm not too familiar with this operator.
We can use 1 because we have at least one row result (if it doesn't match any other row) that has rank/row_number 1.
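To make the "start at 1, advance when moving to the next row" idea concrete, here is a minimal standalone sketch. It is not the Velox implementation: `RankKind`, `computeRanks`, and the use of a plain `int` as the order-by key are all hypothetical, and the rank and dense_rank branches only illustrate the follow-up work mentioned above, which this PR does not include.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical enum for illustration; not a Velox type.
enum class RankKind { kRowNumber, kRank, kDenseRank };

// Computes the ranking value for every row of a single partition.
// 'sortKeys' holds the order-by key of each row, already sorted.
// The running value starts at 1 (the first row always gets 1) and is
// advanced only after the current row has been assigned its value.
std::vector<int64_t> computeRanks(
    const std::vector<int>& sortKeys,
    RankKind kind) {
  std::vector<int64_t> ranks;
  ranks.reserve(sortKeys.size());
  int64_t nextRank = 1;
  for (size_t i = 0; i < sortKeys.size(); ++i) {
    ranks.push_back(nextRank);
    if (i + 1 < sortKeys.size()) {
      // A "peer" is a next row with the same order-by key.
      const bool nextIsPeer = sortKeys[i + 1] == sortKeys[i];
      switch (kind) {
        case RankKind::kRowNumber:
          // row_number advances on every row.
          ++nextRank;
          break;
        case RankKind::kRank:
          // rank jumps to the 1-based position of the next row.
          if (!nextIsPeer) {
            nextRank = static_cast<int64_t>(i) + 2;
          }
          break;
        case RankKind::kDenseRank:
          // dense_rank advances by one per distinct key.
          if (!nextIsPeer) {
            ++nextRank;
          }
          break;
      }
    }
  }
  return ranks;
}
```

With keys `{10, 10, 20, 30, 30}`, this yields 1, 2, 3, 4, 5 for row_number, 1, 1, 3, 4, 4 for rank, and 1, 1, 2, 3, 3 for dense_rank. In all three cases the first row needs no special handling, which is the simplification the reply describes.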
velox/exec/TopNRowNumber.h
Outdated
- // Row number for the first row in the next output batch from the spiller.
- int32_t nextRowNumber_{0};
+ // Row number (or rank or dense_rank in the future) for the next row being
jinchengchenghh left a comment
Approving the code; I'm not familiar with TopN.
@xiaoxmeng : Could you please help with the review? I already have a bunch of approvals from maintainers.
xiaoxmeng left a comment
@aditi-pandit LGTM % minors. Thanks!
  vector_size_t index,
- SpillMergeStream* next) {
+ const SpillMergeStream* next,
+ vector_size_t startColumn,
Why do we pass startColumn and endColumn instead of 0 and numPartitionKeys_? And why is this called compareSpillRowColumns?

@xiaoxmeng : This code will be enhanced to apply the top-n optimization to the rank functions as well, and those need to compare rows on the order-by columns. So I factored out compareSpillRowColumns, which will be reused to compare the order-by columns as well.
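A rough sketch of why a column range is useful here: one helper that compares rows over `[startColumn, endColumn)` can serve both the partition-boundary check and a later peer-group check on the order-by columns. Everything below is hypothetical and simplified. `Row`, `Layout`, `columnsEqual`, and `isNewPeerGroup` are illustrative names, the rows are plain integer vectors rather than spilled Velox rows, and the layout with partition keys first and order-by keys right after them is an assumption of the sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a spilled row: one int per column.
using Row = std::vector<int>;

// Assumed column layout: partition keys first, then order-by keys.
struct Layout {
  size_t numPartitionKeys;
  size_t numSortKeys;
};

// Returns true if both rows agree on every column in
// [startColumn, endColumn).
bool columnsEqual(
    const Row& left,
    const Row& right,
    size_t startColumn,
    size_t endColumn) {
  for (size_t column = startColumn; column < endColumn; ++column) {
    if (left[column] != right[column]) {
      return false;
    }
  }
  return true;
}

// Partition boundary: the rows differ on some partition key.
bool isNewPartition(const Row& current, const Row& next, const Layout& layout) {
  return !columnsEqual(current, next, 0, layout.numPartitionKeys);
}

// Peer-group boundary (needed for rank and dense_rank): the rows
// differ on some order-by key.
bool isNewPeerGroup(const Row& current, const Row& next, const Layout& layout) {
  return !columnsEqual(
      current,
      next,
      layout.numPartitionKeys,
      layout.numPartitionKeys + layout.numSortKeys);
}
```

Hard-coding `0, numPartitionKeys_` inside the comparison would cover only the first of these two callers, which is the motivation given in the reply.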
velox/exec/TopNRowNumber.cpp
Outdated
- if (index > 0 && isNewPartition(output, index, next)) {
-   rowNumber = 0;
+ if (index > 0) {
+   computeNextRankInSpill(output, index, next);
s/computeNextRankInSpill/computeNextRankFromSpill/
void TopNRowNumber::setupNextOutput(const RowVectorPtr& output) {
  auto resetNextRank = [this]() { nextRank_ = 1; };

  auto* lookAhead = merge_->next();
Let's just call this next?

The loop below uses the next variable naming at line 595, so I stuck with the original lookAhead name here.

I mean, let's just use next. Why have two variables, next and lookAhead?
@xiaoxmeng : I have addressed your review comments. PTAL.
… for TopNRowNumber output loop
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this in D83028157.
@xiaoxmeng merged this pull request in 9578779.
Last refactoring towards #11554