Skip to content

fix(TopNRowNumber): Rank with peer computation#16190

Closed
aditi-pandit wants to merge 1 commit intofacebookincubator:mainfrom
aditi-pandit:topn_rank_peers
Closed

fix(TopNRowNumber): Rank with peer computation#16190
aditi-pandit wants to merge 1 commit intofacebookincubator:mainfrom
aditi-pandit:topn_rank_peers

Conversation

@aditi-pandit
Copy link
Copy Markdown
Collaborator

When decrementing rank values when outputting to memory, the ranks drop by the number of peers of the lower rank row than the higher one.

Also fix the computation of numTopRankRows to avoid using the priority queue container vector as that does not necessarily maintain the order of ranks.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 1, 2026
@aditi-pandit aditi-pandit requested a review from Copilot February 1, 2026 23:38
@aditi-pandit aditi-pandit changed the title fix(TopNRowNumber) : Fix rank with peer computation fix(TopNRowNumber): Fix rank with peer computation Feb 1, 2026
@netlify
Copy link
Copy Markdown

netlify bot commented Feb 1, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 5e1f0c0
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/6983b3c37db42500083880e1

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect rank() peer handling when producing in-memory output and makes counting of top-rank peer rows deterministic/correct.

Changes:

  • Adjust in-memory rank decrement logic to account for the peer-count of the next rank group.
  • Rework TopRows::numTopRankRows() to compute peer counts without relying on std::priority_queue’s underlying container ordering.
  • Add a new test case exercising peer (tie) scenarios across multiple limits and configurations.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
velox/exec/tests/TopNRowNumberTest.cpp Adds basicWithPeers to validate correctness under ties/peer rows for all rank functions.
velox/exec/TopNRowNumber.h Updates computeNextRankInMemory signature to allow mutation needed by new peer-count logic.
velox/exec/TopNRowNumber.cpp Fixes in-memory rank decrement behavior and replaces numTopRankRows() implementation to avoid relying on PQ container ordering.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1113 to +1116
auto popAndSaveTopRow = [&]() {
tempTopRankRows.push_back(rows.top());
rows.pop();
};
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda popAndSaveTopRow performs a simple two-line operation that is only called twice in close proximity. Inlining these operations directly would improve code readability and reduce unnecessary abstraction.

Copilot uses AI. Check for mistakes.
@aditi-pandit aditi-pandit force-pushed the topn_rank_peers branch 2 times, most recently from e73749e to 32046d6 Compare February 2, 2026 19:57
@kgpai kgpai requested a review from kagamiori February 3, 2026 17:27
Copy link
Copy Markdown
Contributor

@pedroerp pedroerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, but looks good to me. Thanks for the fix

@@ -578,7 +578,7 @@ TopNRowNumber::TopRows* TopNRowNumber::nextPartition() {

template <core::TopNRowNumberNode::RankFunction TRank>
void TopNRowNumber::computeNextRankInMemory(
const TopRows& partition,
TopRows& partition,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't this be const anymore? I don't see where you modify this below.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function now uses partition.numTopRankRows() which needs non const variable.

Ideally we should be able to do this logic without mutable structures, but priority_queue doesn't really offer methods beyond top of queue member access. So the only option was to call pop() and check top() and then reinsert elements back.

@majetideepak majetideepak changed the title fix(TopNRowNumber): Fix rank with peer computation fix(TopNRowNumber): Rank with peer computation Feb 3, 2026
Copy link
Copy Markdown
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit thanks for the change. I don't quite understand what's the problem of the peer tracking? Thanks!

@@ -587,21 +587,19 @@ void TopNRowNumber::computeNextRankInMemory(

// This is the logic for rank() and dense_rank().
// If the next row is a peer of the current one, then the rank remains the
// same, but the number of peers is incremented.
// same.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the peer counting has bug? Why we can't use the current tracking and it seems that numTopRankRows is a bit expensive than explicitly tracking the peers? thanks!

Copy link
Copy Markdown
Collaborator Author

@aditi-pandit aditi-pandit Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxmeng :
The peer counting bug is the fix at line 602.

To give an example:
So say the input had sort keys : 1, 2, 2, 3.
The ranks for these should be: 1, 2, 2, 4

When emitting output rows, we are going from top of the queue to lower. We know that the top rank found so far is 4.
So first row output is (3, 4). Peers of 3 = 1

We need to compute the rank of the new top of the queue (which is 2). Since 2 != 3 we know its a new rank value. All rows with sort key 2 will have the same rank, and that rank value its incremented by the number of peers when computing the rank of the current row which is 4. As there are 2 peers of 2 then their rank should be 4 - 2 = 2.
So next output is (2, 2)

The new row at top of queue is 2. As its not a new sort key the rank remains the same. So next output is (2, 2). Peers of 2 = 2

The new top of queue is 1. As 1 != 2 then its a new rank. The new rank is incremented by the number of its peers to obtain the current rank 2. As number of peers is 1, so new rank is 1.

If we were counting peers of current row (old code) then we would output
1, 3, 3, 4 which is incorrect.

We can't keep all output rows until we see next peer value when outputting, so finding number of top rank rows is the best approach.

We end up doing the numTopRankRows only once for each distinct sort key (not each row) so its not very expensive actually.

When decrementing rank values when outputting to memory, the ranks
drop by the number of peers of the lower rank row than the higher one.
Also fix the computation of numTopRankRows to avoid using the priority
queue container vector as that does not necessarily maintain the order
of ranks.
Copy link
Copy Markdown
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit thanks for the explanation!

@xiaoxmeng xiaoxmeng added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Feb 5, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Feb 6, 2026

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this in D92526503.

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Feb 7, 2026

@xiaoxmeng merged this pull request in 34338bf.

amitkdutta pushed a commit to prestodb/presto that referenced this pull request Feb 10, 2026
Updating for facebookincubator/velox#16190

```
== NO RELEASE NOTE ==
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants