Skip to content

refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic#13860

Closed
aditi-pandit wants to merge 1 commit intomainfrom
topn_get_output
Closed

refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic#13860
aditi-pandit wants to merge 1 commit intomainfrom
topn_get_output

Conversation

@aditi-pandit
Copy link
Copy Markdown
Collaborator

@aditi-pandit aditi-pandit commented Jun 24, 2025

Last refactoring towards #11554

@netlify
Copy link
Copy Markdown

netlify bot commented Jun 24, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit cb72f94
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/68ced6267eb178000850716f

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 24, 2025
@aditi-pandit aditi-pandit changed the title refactor(TopNRowNumber): Abstract computeRankInMemory and computeRankInSpill functions in getOutput() logic refactor(TopNRowNumber): Abstract computeNextRankInMemory and computeNextRankInSpill functions in getOutput() logic Jun 24, 2025
@aditi-pandit aditi-pandit changed the title refactor(TopNRowNumber): Abstract computeNextRankInMemory and computeNextRankInSpill functions in getOutput() logic refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic Jun 24, 2025
@aditi-pandit aditi-pandit marked this pull request as draft June 24, 2025 06:37
@aditi-pandit aditi-pandit changed the title refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic [Do not review] refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic Jun 24, 2025
@aditi-pandit aditi-pandit changed the title [Do not review] refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic refactor(TopNRowNumber): Abstract computeNextRankInMemory(InSpill) functions in getOutput() logic Jun 24, 2025
@aditi-pandit aditi-pandit marked this pull request as ready for review June 24, 2025 18:10
Copy link
Copy Markdown
Collaborator

@JkSelf JkSelf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Except one small nit. Thanks.

}

void TopNRowNumber::setupNextOutput(const RowVectorPtr& output) {
auto resetNextRank = [this]() { nextRank_ = 1; };
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you reset the nextRank_ = 1 and use <= to replace < before?

Copy link
Copy Markdown
Collaborator Author

@aditi-pandit aditi-pandit Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinchengchenghh I will be updating this logic for both rank and dense_rank computation in the next PR.

In rank and dense_rank the result value changes only if order by keys are different between current row and next. By setting starting rank as 0 and then incrementing in place if different from adjacent row is complicated logic...Its simpler to set current rank as 1 and increment it when advancing to the next row. So I changed how the result value is computed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with this operator.
We can use 1 because we have at least one row result (if it doesn't match any other row) that has rank/row_number 1.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@czentgr : Yes.


// Row number for the first row in the next output batch from the spiller.
int32_t nextRowNumber_{0};
// Row number ( or rank or dense_rank in the future) for the next row being
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( or -> (or

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Copy Markdown
Collaborator

@jinchengchenghh jinchengchenghh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve for the code, I'm not familiar with TopN

@aditi-pandit
Copy link
Copy Markdown
Collaborator Author

@xiaoxmeng : Please can you help with the review. Have bunch of approvals from maintainers already.

Copy link
Copy Markdown
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit LGTM % minors. Thanks!

vector_size_t index,
SpillMergeStream* next) {
const SpillMergeStream* next,
vector_size_t startColumn,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we pass startColumn and endColumn but not 0, numPartitionKeys_? And why call this as compareSpillRowColumns?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxmeng : This code will be enhanced to do the topn optimization for rank functions as well... and those require to compare the rows on the order_by columns. So I abstracted the function for compareSpillRowColumns which will be reused to compare the order by columns as well.

if (index > 0 && isNewPartition(output, index, next)) {
rowNumber = 0;
if (index > 0) {
computeNextRankInSpill(output, index, next);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/computeNextRankInSpill/computeNextRankFromSpill/

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

void TopNRowNumber::setupNextOutput(const RowVectorPtr& output) {
auto resetNextRank = [this]() { nextRank_ = 1; };

auto* lookAhead = merge_->next();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just call this next?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop below uses the next variable naming at line 595, so I stuck with the original lookAhead name here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean let's just use next why have two variables: next and lookAhead?

@aditi-pandit aditi-pandit force-pushed the topn_get_output branch 2 times, most recently from 83db96b to 9baa46f Compare September 10, 2025 05:02
@aditi-pandit
Copy link
Copy Markdown
Collaborator Author

@xiaoxmeng : Have addressed your review comments. PTAL.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this in D83028157.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@xiaoxmeng merged this pull request in 9578779.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants