Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky fuzz tests for filtered outer SortMergeJoin #12359

Open
Tracked by #9846
korowa opened this issue Sep 6, 2024 · 5 comments
Open
Tracked by #9846

Flaky fuzz tests for filtered outer SortMergeJoin #12359

korowa opened this issue Sep 6, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@korowa
Copy link
Contributor

korowa commented Sep 6, 2024

Describe the bug

After replacing the filter for join fuzz tests with the selective one (that doesn't return 100% of input rows), it turned out that following tests may periodically fail for HashJoin vs SortMergeJoin cases:

  • test_left_join_1k_filtered
  • test_right_join_1k_filtered
  • test_full_join_1k_filtered

To Reproduce

To reproduce this one should add JoinTestType::HjSmj to mentioned above test cases and run them (likely a couple of times)

Expected behavior

SMJ and HJ should return equal resultsets for these test cases

Additional context

No response

@comphead
Copy link
Contributor

comphead commented Sep 8, 2024

left_semi_join was fixed sometime ago. Need to check whats happening with

test_left_join_1k_filtered
test_right_join_1k_filtered
test_full_join_1k_filtered

The repro case is to run the same test 1000 times and depending on data distribution between batches the problem can arise

@comphead
Copy link
Contributor

fyi @viirya

@comphead
Copy link
Contributor

comphead commented Sep 12, 2024

I believe all of them flaky because of flaky LeftAnti join, all Left, Right, Full depends on LeftAnti starting from #10892 PR.
Checking if thats the case

@comphead
Copy link
Contributor

comphead commented Sep 12, 2024

Simple test case for LeftOuter

#[tokio::test]
async fn test_left_outer() {
    let left: Vec<RecordBatch> = make_staggered_batches(1);

    let left = vec![
        RecordBatch::try_new(
            left[0].schema().clone(),
            vec![
                Arc::new(Int32Array::from(vec![1])),
                Arc::new(Int32Array::from(vec![2])),
                Arc::new(Int32Array::from(vec![10])),
                Arc::new(Int32Array::from(vec![20])),
            ],
        ).unwrap()
    ];

    let right = vec![
        RecordBatch::try_new(
            left[0].schema().clone(),
            vec![
                Arc::new(Int32Array::from(vec![1, 1])),
                Arc::new(Int32Array::from(vec![2, 2])),
                Arc::new(Int32Array::from(vec![0, 30])),
                Arc::new(Int32Array::from(vec![0, 40])),
            ],
        ).unwrap()
    ];

    JoinFuzzTestCase::new(
        left,
        right,
        JoinType::Left,
        Some(Box::new(col_lt_col_filter)),
    )
        .run_test(&[JoinTestType::HjSmj], false)
        .await;
}

@viirya cc

@comphead
Copy link
Contributor

For LeftOuter/RightOuter and FullJoin the rootcause is the same. SMJ emits a null row if the filtered match is not found.
But if the filtered match row is found it should emit the joined row instead of null row. And here we come to the same issue as for LeftAnti, so we need to know when all the right rows processed for the given left row.

Given that SMJ emits rows in freeze_streamed which can be called anytime once the output size is equal to batch size there is 2 possible options:

  • Try to calculate if the SMJ is about to switch to ordering == Less which moves left pointer down, meaning everything is processed for the row
  • Move emit to some other place, for example we can fill the buffer with rows and filtered flags until the ordering switch, and only then make a final emit based on the data in the buffer + filtered

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants