Skip to content

feat(fuzzer): Add TopNRowNumberFuzzer #12103

Closed
aditi-pandit wants to merge 1 commit intomainfrom
topn_fuzzer
Closed

feat(fuzzer): Add TopNRowNumberFuzzer #12103
aditi-pandit wants to merge 1 commit intomainfrom
topn_fuzzer

Conversation

@aditi-pandit
Copy link
Copy Markdown
Collaborator

@aditi-pandit aditi-pandit commented Jan 16, 2025

topNRowNumber node is an optimized planNode for SQL with ranking window functions but which limits them to only the topN results. Add a TopNRowNumberFuzzer for plans with this planNode.

This fuzzer is closely modeled after the RowNumberFuzzer. So the common code is abstracted to a RowNumberFuzzerBase class which is used as the parent class for both RowNumberFuzzer and TopnRowNumberFuzzer.

The fuzzer generates plans only for row_number function right now. It will be enhanced to support rank and dense_rank functions after #11554

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 16, 2025
@netlify
Copy link
Copy Markdown

netlify bot commented Jan 16, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 711764a
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67bf786b33296d00075fb339

@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 3 times, most recently from 9a7590d to a9011fa Compare January 16, 2025 23:26
@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 12 times, most recently from f7beb33 to 25ac51c Compare January 31, 2025 03:48
@aditi-pandit aditi-pandit changed the title Add TopNRowNumberFuzzer feat(Fuzzer) : Add TopNRowNumberFuzzer Jan 31, 2025
@aditi-pandit aditi-pandit changed the title feat(Fuzzer) : Add TopNRowNumberFuzzer feat(Fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025
@aditi-pandit aditi-pandit changed the title feat(Fuzzer): Add TopNRowNumberFuzzer feat(fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025
@aditi-pandit aditi-pandit marked this pull request as ready for review January 31, 2025 03:59
@aditi-pandit
Copy link
Copy Markdown
Collaborator Author

Hi @aditi-pandit, thank you for adding fuzzer coverage for TopNRowNumber node. I noticed that the member methods of TopNRowNumberFuzzer are very similar to RowNumberFuzzer's. Does it make sense to create a RowNumberFuzzerBase class with the common functions and let RowNumberFuzzer and TopNRowNumberFuzzer inherit that? Similar to how we organize AggregationFuzzer, WindowFuzzer, and AggregationFuzzerBase?

@kagamiori : Most of the common logic in those classes could have even broader applicability so I moved them to FuzzerUtil class. What remains are the flags etc. I notice that AggregationFuzzerBase covers flags as well. We can try something like that. Will modify the code shortly.

@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 6 times, most recently from cafd0d1 to cfa8421 Compare February 7, 2025 03:06
@aditi-pandit
Copy link
Copy Markdown
Collaborator Author

@kagamiori : Have added a RowNumberFuzzerBase class now. PTAL. Thanks !

Copy link
Copy Markdown
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, the code looks good to me, except a few comments for small refactoring. Thanks!

Comment on lines +196 to +206
std::vector<RowVectorPtr> flatten(const std::vector<RowVectorPtr>& vectors) {
std::vector<RowVectorPtr> flatVectors;
for (const auto& vector : vectors) {
auto flat = BaseVector::create<RowVector>(
vector->type(), vector->size(), vector->pool());
flat->copy(vector.get(), 0, 0, vector->size());
flatVectors.push_back(flat);
}

return flatVectors;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can this method be replaced with BaseVector::flattenVector()?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori : BaseVector::flattenVector flattens the original input vector, while this flatten method returns a separate output vector for the flattened list of input vectors. The original vector is retained and we want to run the fuzzer on the original input vectors only.

An option could be to copy the input vector and then use BaseVector::flatten(...) on the copied vector. But the rest of the stubbing around it makes it seem simpler to use the current flatten method. wdyt ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori : Nvm... I realize after using common logVectors method from AggregationFuzzerBase that this method isn't needed. see #12300

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with #12300

Comment on lines +221 to +213
// Disable testing with TableScan when input contains TIMESTAMP type, due to
// the issue #8127.
if (type->kind() == TypeKind::TIMESTAMP) {
return false;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, I remember that for running fuzzers with PQR, the issue #8127 can be resolved by adding -Duser.timezone=America/Los_Angeles to etc/jvm.config in the directory where you installed the Presto server (e.g., presto-server-0.284/etc/jvm.config). Could you please have a try and let me know if it works?

Copy link
Copy Markdown
Collaborator Author

@aditi-pandit aditi-pandit Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori : Have tried with the user.timezone config and I don't see the errors either.

But of course it fails in Velox CI here.

So we'll have to change that separately and let this PR remain as is.

wdyt ?

Copy link
Copy Markdown
Contributor

@kagamiori kagamiori Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, I believe we already set -Duser.timezone=America/Los_Angeles in the CI jobs here, so if you still see an error when enabling Timestamp type, that needs to be looked into.

For now, could we move if (type->kind() == TypeKind::TIMESTAMP) { return false; } to a local method only used in TopNRowNumberFuzzer.cpp, so that this won't affect other fuzzers that use this isTableScanSupported() utility method? Then you can look into the error caused by Timestamp type after this PR is merged.

Comment on lines +97 to +123
void RowNumberFuzzerBase::logInput(const std::vector<RowVectorPtr>& input) {
if (VLOG_IS_ON(1)) {
// Flatten inputs.
const auto flatInput = test::flatten(input);
VLOG(1) << "Input: " << input[0]->toString();
for (const auto& v : flatInput) {
VLOG(1) << std::endl << v->toString(0, v->size());
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AggregationFuzzerBase also has a logVectors() method. We can move that to a common place to reuse the code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check #12300. I'll update this PR post its merge.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with #12300

@aditi-pandit aditi-pandit force-pushed the topn_fuzzer branch 2 times, most recently from 2d7ec1a to 6dbf273 Compare February 19, 2025 04:06
@aditi-pandit
Copy link
Copy Markdown
Collaborator Author

@kagamiori : Have updated this PR after the refactoring PR is merged. PTAL.

@kagamiori
Copy link
Copy Markdown
Contributor

LGTM. Let's look into the error caused by Timestamp type and enable Timestamp in a separate PR.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@aditi-pandit
Copy link
Copy Markdown
Collaborator Author

aditi-pandit commented Feb 22, 2025

@kagamiori : Have removed the timestamp check code from FuzzerUtil to RowNumberFuzzerBase now. PTAL.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Copy Markdown
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, I got a few linter error internally. Could you take a look and address them? Thanks!

@aditi-pandit
Copy link
Copy Markdown
Collaborator Author

@kagamiori : Thanks for providing me the warnings. Have fixed them. PTAL.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@kagamiori merged this pull request in 6632054.

@aditi-pandit aditi-pandit deleted the topn_fuzzer branch February 27, 2025 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants