feat(fuzzer): Add TopNRowNumberFuzzer by aditi-pandit · Pull Request #12103 · facebookincubator/velox

aditi-pandit · 2025-01-16T22:20:04Z

The TopNRowNumber node is an optimized plan node for SQL queries that use ranking window functions but keep only the top N results. This PR adds a TopNRowNumberFuzzer for plans containing this node.
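Here is a minimal C++ sketch of what a plan with this node is expected to compute, using only standard-library containers. The Row and NumberedRow types and the topNRowNumber function are hypothetical illustrations, not Velox APIs. The sketch spells out the semantics: number rows within each partition by the sort key, then keep only the rows numbered 1 through N.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical row shape for this sketch only: one partition key, one sort
// key, and a payload column.
struct Row {
  std::string partitionKey;
  int64_t sortKey;
  std::string payload;
};

struct NumberedRow {
  Row row;
  int64_t rowNumber;  // 1-based position within its partition.
};

// Reference semantics of
//   SELECT * FROM (
//     SELECT *, row_number() OVER (PARTITION BY partitionKey ORDER BY sortKey) AS rn
//     FROM t)
//   WHERE rn <= limit
// An optimized top-N operator can avoid numbering every row, but its output
// should match this.
std::vector<NumberedRow> topNRowNumber(const std::vector<Row>& input, int64_t limit) {
  // Group rows by partition key (std::map also orders partitions by key).
  std::map<std::string, std::vector<Row>> partitions;
  for (const auto& row : input) {
    partitions[row.partitionKey].push_back(row);
  }

  std::vector<NumberedRow> output;
  for (auto& [key, rows] : partitions) {
    // stable_sort keeps tied rows in input order; in SQL, which tied row
    // receives the lower row_number is unspecified.
    std::stable_sort(rows.begin(), rows.end(), [](const Row& a, const Row& b) {
      return a.sortKey < b.sortKey;
    });
    const int64_t count = std::min<int64_t>(limit, static_cast<int64_t>(rows.size()));
    for (int64_t i = 0; i < count; ++i) {
      output.push_back({rows[i], i + 1});
    }
  }
  return output;
}
```

For example, with a limit of 2, a partition holding sort keys 3, 1, 2 yields only the rows with keys 1 and 2, numbered 1 and 2.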

This fuzzer is closely modeled after the RowNumberFuzzer, so the common code is factored out into a RowNumberFuzzerBase class, which is the parent class of both RowNumberFuzzer and TopNRowNumberFuzzer.
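The refactoring follows the usual base-class-plus-hook pattern. The sketch below is only an outline of that inheritance shape. The class names (RowNumberFuzzerBaseSketch and its subclasses) and the single virtual makePlan hook are hypothetical. According to the discussion in this PR, the real RowNumberFuzzerBase mainly holds shared flags and helpers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: shared driver logic lives in the base class, and each
// fuzzer supplies only the part that differs (which plan to build). These are
// illustrative names, not the actual Velox class interfaces.
class RowNumberFuzzerBaseSketch {
 public:
  virtual ~RowNumberFuzzerBaseSketch() = default;

  // Loop shared by all row-number fuzzers. A real implementation would also
  // generate input and verify results against a reference; this sketch only
  // records which plan each iteration builds.
  std::vector<std::string> run(int iterations) {
    std::vector<std::string> plans;
    for (int i = 0; i < iterations; ++i) {
      plans.push_back(makePlan(i));
    }
    return plans;
  }

 protected:
  // Per-fuzzer hook.
  virtual std::string makePlan(int iteration) = 0;
};

class RowNumberFuzzerSketch : public RowNumberFuzzerBaseSketch {
 protected:
  std::string makePlan(int) override { return "RowNumber"; }
};

class TopNRowNumberFuzzerSketch : public RowNumberFuzzerBaseSketch {
 protected:
  std::string makePlan(int iteration) override {
    // A real fuzzer would choose the limit randomly; deriving it from the
    // iteration number keeps this sketch deterministic.
    return "TopNRowNumber(limit=" + std::to_string(iteration + 1) + ")";
  }
};
```

With this layout, adding rank or dense_rank support would mostly touch the subclass hook rather than the shared loop.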

The fuzzer currently generates plans only for the row_number function. It will be extended to support the rank and dense_rank functions after #11554.

netlify · 2025-01-16T22:20:23Z

✅ Deploy Preview for meta-velox canceled.

🔨 Latest commit: `711764a`
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/67bf786b33296d00075fb339

aditi-pandit · 2025-02-06T17:19:36Z

Hi @aditi-pandit, thank you for adding fuzzer coverage for TopNRowNumber node. I noticed that the member methods of TopNRowNumberFuzzer are very similar to RowNumberFuzzer's. Does it make sense to create a RowNumberFuzzerBase class with the common functions and let RowNumberFuzzer and TopNRowNumberFuzzer inherit that? Similar to how we organize AggregationFuzzer, WindowFuzzer, and AggregationFuzzerBase?

@kagamiori : Most of the common logic in those classes has even broader applicability, so I moved it to the FuzzerUtil class. What remains is mostly the flags. I notice that AggregationFuzzerBase covers flags as well, so we can try something similar. I will modify the code shortly.

aditi-pandit · 2025-02-07T03:09:57Z

@kagamiori : I have added a RowNumberFuzzerBase class now. PTAL. Thanks!

kagamiori

Hi @aditi-pandit, the code looks good to me, except for a few comments about small refactorings. Thanks!

kagamiori · 2025-02-08T01:42:11Z

velox/exec/fuzzer/FuzzerUtil.cpp

+std::vector<RowVectorPtr> flatten(const std::vector<RowVectorPtr>& vectors) {
+  std::vector<RowVectorPtr> flatVectors;
+  for (const auto& vector : vectors) {
+    auto flat = BaseVector::create<RowVector>(
+        vector->type(), vector->size(), vector->pool());
+    flat->copy(vector.get(), 0, 0, vector->size());
+    flatVectors.push_back(flat);
+  }
+
+  return flatVectors;
+}


nit: Can this method be replaced with BaseVector::flattenVector()?

@kagamiori : BaseVector::flattenVector flattens the original input vector in place, while this flatten method returns separate, flattened copies of the input vectors. The original vectors are retained, and we want to run the fuzzer on the original input vectors only.

Another option is to copy the input vector and then use BaseVector::flatten(...) on the copy, but the extra plumbing around that makes the current flatten method seem simpler. wdyt?
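The toy C++ model below makes the in-place versus copy distinction concrete. ToyColumn, flattenInPlace and flattenCopies are hypothetical names, not the Velox vector API. The model compares in-place flattening with a helper that returns flattened copies and leaves its inputs dictionary-encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of a column that is either flat or dictionary-encoded. When
// encoded, `values` is the dictionary and `indices` maps each row to it.
// This is not the Velox vector API.
struct ToyColumn {
  std::vector<int> values;
  std::vector<size_t> indices;
  bool encoded = false;

  size_t size() const { return encoded ? indices.size() : values.size(); }

  int valueAt(size_t row) const {
    return encoded ? values[indices[row]] : values[row];
  }
};

// In-place flattening (similar in spirit to BaseVector::flattenVector): the
// caller's column loses its original encoding.
void flattenInPlace(ToyColumn& column) {
  if (!column.encoded) {
    return;
  }
  std::vector<int> flat;
  flat.reserve(column.size());
  for (size_t i = 0; i < column.size(); ++i) {
    flat.push_back(column.valueAt(i));
  }
  column.values = std::move(flat);
  column.indices.clear();
  column.encoded = false;
}

// Copy-based flattening (the approach of the PR's flatten helper): returns
// new flat columns and leaves the inputs, including their encodings,
// untouched, so they can still be fed to the fuzzer as-is.
std::vector<ToyColumn> flattenCopies(const std::vector<ToyColumn>& columns) {
  std::vector<ToyColumn> result;
  result.reserve(columns.size());
  for (const auto& column : columns) {
    ToyColumn copy = column;
    flattenInPlace(copy);
    result.push_back(std::move(copy));
  }
  return result;
}
```

The copy-based helper costs an extra allocation per vector. That is acceptable for logging, where the point is to preserve the encodings that the fuzzer is exercising.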

@kagamiori : Nvm... After switching to the common logVectors method from AggregationFuzzerBase, I realized this method isn't needed. See #12300.

Updated with #12300

kagamiori · 2025-02-08T01:49:08Z

velox/exec/fuzzer/FuzzerUtil.cpp

+  // Disable testing with TableScan when input contains TIMESTAMP type, due to
+  // the issue #8127.
+  if (type->kind() == TypeKind::TIMESTAMP) {
+    return false;
+  }


Hi @aditi-pandit, I remember that for running fuzzers with PQR, the issue #8127 can be resolved by adding -Duser.timezone=America/Los_Angeles to etc/jvm.config in the directory where you installed the Presto server (e.g., presto-server-0.284/etc/jvm.config). Could you please have a try and let me know if it works?

@kagamiori : I tried the user.timezone config, and I don't see the errors either.

But of course it fails in Velox CI here.

So we'll have to change that separately and let this PR remain as is.

wdyt?

Hi @aditi-pandit, I believe we already set -Duser.timezone=America/Los_Angeles in the CI jobs here, so if you still see an error when enabling Timestamp type, that needs to be looked into.

For now, could we move if (type->kind() == TypeKind::TIMESTAMP) { return false; } to a local method only used in TopNRowNumberFuzzer.cpp, so that this won't affect other fuzzers that use this isTableScanSupported() utility method? Then you can look into the error caused by Timestamp type after this PR is merged.
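Here is a minimal C++ sketch of the suggested change. It uses a toy type model (ToyType, ToyTypeKind) and a stand-in for the shared utility, because the real Velox signatures are not shown here. The TIMESTAMP rejection lives in a fuzzer-local wrapper, so other fuzzers that call the shared helper are unaffected. Recursing into child types is an assumption of this sketch.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy type model for this sketch only; Velox's Type and TypeKind are richer.
enum class ToyTypeKind { BIGINT, VARCHAR, TIMESTAMP, ARRAY, ROW };

struct ToyType {
  ToyTypeKind kind;
  std::vector<std::shared_ptr<const ToyType>> children;
};

// Stand-in for the shared utility that other fuzzers keep calling unchanged.
// It is assumed to accept every type here so the sketch isolates the local check.
bool isTableScanSupportedShared(const ToyType& type) {
  (void)type;
  return true;
}

// Hypothetical fuzzer-local wrapper: rejects TIMESTAMP anywhere in the type
// tree before delegating to the shared check.
bool isTableScanSupportedLocal(const ToyType& type) {
  if (type.kind == ToyTypeKind::TIMESTAMP) {
    return false;
  }
  for (const auto& child : type.children) {
    if (!isTableScanSupportedLocal(*child)) {
      return false;
    }
  }
  return isTableScanSupportedShared(type);
}
```

Once the Timestamp error is understood, enabling TIMESTAMP again means deleting the local check, with no change to the shared utility.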

kagamiori · 2025-02-08T02:05:30Z

velox/exec/fuzzer/RowNumberFuzzerBase.cpp

+void RowNumberFuzzerBase::logInput(const std::vector<RowVectorPtr>& input) {
+  if (VLOG_IS_ON(1)) {
+    // Flatten inputs.
+    const auto flatInput = test::flatten(input);
+    VLOG(1) << "Input: " << input[0]->toString();
+    for (const auto& v : flatInput) {
+      VLOG(1) << std::endl << v->toString(0, v->size());
+    }
+  }
+}


nit: AggregationFuzzerBase also has a logVectors() method. We can move that to a common place to reuse the code.

Check #12300. I'll update this PR once it is merged.

Updated with #12300

aditi-pandit · 2025-02-19T04:12:23Z

@kagamiori : I have updated this PR now that the refactoring PR is merged. PTAL.

kagamiori · 2025-02-20T18:56:42Z

LGTM. Let's look into the error caused by Timestamp type and enable Timestamp in a separate PR.

facebook-github-bot · 2025-02-20T18:58:12Z

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

aditi-pandit · 2025-02-22T05:30:20Z

@kagamiori : I have moved the timestamp check code from FuzzerUtil to RowNumberFuzzerBase now. PTAL.

facebook-github-bot · 2025-02-24T20:29:23Z

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

kagamiori

Hi @aditi-pandit, I got a few linter errors internally. Could you take a look and address them? Thanks!

velox/exec/fuzzer/RowNumberFuzzerBase.h

velox/exec/fuzzer/RowNumberFuzzer.cpp

velox/exec/fuzzer/TopNRowNumberFuzzer.cpp

velox/exec/fuzzer/RowNumberFuzzerBase.cpp

velox/exec/fuzzer/TopNRowNumberFuzzerRunner.cpp

aditi-pandit · 2025-02-26T20:25:42Z

@kagamiori : Thanks for providing me the warnings. Have fixed them. PTAL.

facebook-github-bot · 2025-02-27T19:23:05Z

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-02-27T23:07:33Z

@kagamiori merged this pull request in 6632054.

facebook-github-bot added the CLA Signed label Jan 16, 2025

aditi-pandit force-pushed the topn_fuzzer branch 3 times, most recently from 9a7590d to a9011fa on January 16, 2025 23:26

aditi-pandit force-pushed the topn_fuzzer branch 12 times, most recently from f7beb33 to 25ac51c on January 31, 2025 03:48

aditi-pandit changed the title ~~Add TopNRowNumberFuzzer~~ feat(Fuzzer) : Add TopNRowNumberFuzzer Jan 31, 2025

aditi-pandit requested review from duanmeng, kagamiori, kgpai and xiaoxmeng January 31, 2025 03:52

aditi-pandit force-pushed the topn_fuzzer branch from 25ac51c to 49a43dd on January 31, 2025 03:55

aditi-pandit changed the title ~~feat(Fuzzer) : Add TopNRowNumberFuzzer~~ feat(Fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025

aditi-pandit changed the title ~~feat(Fuzzer): Add TopNRowNumberFuzzer~~ feat(fuzzer): Add TopNRowNumberFuzzer Jan 31, 2025

aditi-pandit force-pushed the topn_fuzzer branch from 49a43dd to f458e56 on January 31, 2025 03:58

aditi-pandit marked this pull request as ready for review January 31, 2025 03:59

aditi-pandit requested review from assignUser and majetideepak as code owners January 31, 2025 03:59

aditi-pandit mentioned this pull request Jan 31, 2025

Add TopNRowNumberFuzzer for TopNRowNumber operator #12017

Open

aditi-pandit force-pushed the topn_fuzzer branch 6 times, most recently from cafd0d1 to cfa8421 on February 7, 2025 03:06

kagamiori reviewed Feb 8, 2025


aditi-pandit force-pushed the topn_fuzzer branch 2 times, most recently from 2d7ec1a to 6dbf273 on February 19, 2025 04:06

aditi-pandit force-pushed the topn_fuzzer branch from 6dbf273 to 7de27c6 on February 19, 2025 05:29

kagamiori approved these changes Feb 20, 2025


aditi-pandit force-pushed the topn_fuzzer branch from 7de27c6 to b2c13f7 on February 22, 2025 05:14

aditi-pandit force-pushed the topn_fuzzer branch from b2c13f7 to 87db1fd on February 22, 2025 08:03

kagamiori reviewed Feb 26, 2025


feat(fuzzer): Add TopnRowNumberFuzzer

711764a

aditi-pandit force-pushed the topn_fuzzer branch from 87db1fd to 711764a on February 26, 2025 20:24

facebook-github-bot closed this in 6632054 Feb 27, 2025

facebook-github-bot added the Merged label Feb 27, 2025

aditi-pandit deleted the topn_fuzzer branch February 27, 2025 23:27


3 participants
