Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: fix some new warnings, fix force_hash_collisions flag propagation + comment out some tests #11467

Closed
wants to merge 1 commit into from

Conversation

nrc
Copy link
Contributor

@nrc nrc commented Jul 15, 2024

Fixes some warnings when compiling with nightly which will soon be warnings on stable too. These are multiple unexpected cfgcondition value:force_hash_collisions`` and one for loop over a &Option`. This is more readably written as an `if let` statement`

Which issue does this PR close?

NA

Rationale for this change

Preserve clean compiles

What changes are included in this PR?

Just the warning fixes

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

No

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate labels Jul 15, 2024
Signed-off-by: Nick Cameron <[email protected]>
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 15, 2024
@nrc
Copy link
Contributor Author

nrc commented Jul 15, 2024

So, this has got more interesting than I predicted. Since force_hash_collisions wasn't being properly propagated to crates, tests which thought they were forcing hash collisions actually weren't. With that corrected there are now a bunch of legitimately failing tests which are beyond my ability to fix. I've cfged or commented them out for now.

@@ -1930,6 +1931,8 @@ mod tests {
Ok(())
}

// FIXME(#TODO) test fails with feature `force_hash_collisions`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nrc -- I agree that cfg'ing these out is the right step for this PR

I'll file a ticket to look into the failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was writing up a ticket and testing locally. When I removed the cfgs

diff --git a/datafusion/physical-plan/src/joins/hash_join.rs b/datafusion/physical-plan/src/joins/hash_join.rs
index fdf062b1f..359a86856 100644
--- a/datafusion/physical-plan/src/joins/hash_join.rs
+++ b/datafusion/physical-plan/src/joins/hash_join.rs
@@ -1583,7 +1583,7 @@ mod tests {
     use rstest::*;
     use rstest_reuse::*;

-    #[cfg(not(feature = "force_hash_collisions"))]
+
     fn div_ceil(a: usize, b: usize) -> usize {
         (a + b - 1) / b
     }
@@ -1932,7 +1932,6 @@ mod tests {
     }

     // FIXME(#TODO) test fails with feature `force_hash_collisions`
-    #[cfg(not(feature = "force_hash_collisions"))]
     #[apply(batch_sizes)]
     #[tokio::test]
     async fn join_inner_two(batch_size: usize) -> Result<()> {
@@ -1989,7 +1988,6 @@ mod tests {

     /// Test where the left has 2 parts, the right with 1 part => 1 part
     // FIXME(#TODO) test fails with feature `force_hash_collisions`
-    #[cfg(not(feature = "force_hash_collisions"))]
     #[apply(batch_sizes)]
     #[tokio::test]
     async fn join_inner_one_two_parts_left(batch_size: usize) -> Result<()> {
@@ -2103,7 +2101,6 @@ mod tests {

     /// Test where the left has 1 part, the right has 2 parts => 2 parts
     // FIXME(#TODO) test fails with feature `force_hash_collisions`
-    #[cfg(not(feature = "force_hash_collisions"))]
     #[apply(batch_sizes)]
     #[tokio::test]
     async fn join_inner_one_two_parts_right(batch_size: usize) -> Result<()> {

And then ran the tests they seemed to pass for me locally:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test --lib --features=force_hash_collisions   -p datafusion-physical-plan -- join_inner
warning: /Users/andrewlamb/Software/datafusion/Cargo.toml: unused manifest key: workspace.lints.rust.unexpected_cfgs.check-cfg
   Compiling datafusion-physical-plan v40.0.0 (/Users/andrewlamb/Software/datafusion/datafusion/physical-plan)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 3.59s
     Running unittests src/lib.rs (target/debug/deps/datafusion_physical_plan-8b4d4245d6c07ebc)

running 40 tests
test joins::hash_join::tests::join_inner_one_no_shared_column_names ... ok
test joins::hash_join::tests::join_inner_one::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left_randomly_ordered ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_one_randomly_ordered ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_4_2 ... ok
test joins::sort_merge_join::tests::join_inner_two ... ok
test joins::sort_merge_join::tests::join_inner_one ... ok
test joins::sort_merge_join::tests::join_inner_two_two ... ok
test joins::sort_merge_join::tests::join_inner_output_two_batches ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_2_10 ... ok
test joins::sort_merge_join::tests::join_inner_with_nulls_with_options ... ok
test joins::sort_merge_join::tests::join_inner_with_nulls ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_5_1 ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_4_2 ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_1_8192 ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_3_5 ... ok
test joins::nested_loop_join::tests::join_inner_with_filter ... ok

test result: ok. 40 passed; 0 failed; 0 ignored; 0 measured; 688 filtered out; finished in 0.03s

@@ -238,16 +238,17 @@ SELECT * FROM t1 FULL JOIN t2 ON t1_id = t2_id
44 d 4 44 x 3
NULL NULL NULL 55 w 3

# FIXME(#TODO) fails with feature `force_hash_collisions`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the issue with SMJ with hash collisions?

@alamb
Copy link
Contributor

alamb commented Jul 17, 2024

I plan to investigate / file tickets for why the sqllogictests are failing before merging this PR -- I hope to do so today

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @nrc -- I did some local testing and I couldn't get the tests to fail for me locally. Are you sure they are failing if when run with force_hash_collisions? Maybe it was something about my machine and we should try it on CI 🤔

@@ -1930,6 +1931,8 @@ mod tests {
Ok(())
}

// FIXME(#TODO) test fails with feature `force_hash_collisions`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was writing up a ticket and testing locally. When I removed the cfgs

diff --git a/datafusion/physical-plan/src/joins/hash_join.rs b/datafusion/physical-plan/src/joins/hash_join.rs
index fdf062b1f..359a86856 100644
--- a/datafusion/physical-plan/src/joins/hash_join.rs
+++ b/datafusion/physical-plan/src/joins/hash_join.rs
@@ -1583,7 +1583,7 @@ mod tests {
     use rstest::*;
     use rstest_reuse::*;

-    #[cfg(not(feature = "force_hash_collisions"))]
+
     fn div_ceil(a: usize, b: usize) -> usize {
         (a + b - 1) / b
     }
@@ -1932,7 +1932,6 @@ mod tests {
     }

     // FIXME(#TODO) test fails with feature `force_hash_collisions`
-    #[cfg(not(feature = "force_hash_collisions"))]
     #[apply(batch_sizes)]
     #[tokio::test]
     async fn join_inner_two(batch_size: usize) -> Result<()> {
@@ -1989,7 +1988,6 @@ mod tests {

     /// Test where the left has 2 parts, the right with 1 part => 1 part
     // FIXME(#TODO) test fails with feature `force_hash_collisions`
-    #[cfg(not(feature = "force_hash_collisions"))]
     #[apply(batch_sizes)]
     #[tokio::test]
     async fn join_inner_one_two_parts_left(batch_size: usize) -> Result<()> {
@@ -2103,7 +2101,6 @@ mod tests {

     /// Test where the left has 1 part, the right has 2 parts => 2 parts
     // FIXME(#TODO) test fails with feature `force_hash_collisions`
-    #[cfg(not(feature = "force_hash_collisions"))]
     #[apply(batch_sizes)]
     #[tokio::test]
     async fn join_inner_one_two_parts_right(batch_size: usize) -> Result<()> {

And then ran the tests they seemed to pass for me locally:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test --lib --features=force_hash_collisions   -p datafusion-physical-plan -- join_inner
warning: /Users/andrewlamb/Software/datafusion/Cargo.toml: unused manifest key: workspace.lints.rust.unexpected_cfgs.check-cfg
   Compiling datafusion-physical-plan v40.0.0 (/Users/andrewlamb/Software/datafusion/datafusion/physical-plan)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 3.59s
     Running unittests src/lib.rs (target/debug/deps/datafusion_physical_plan-8b4d4245d6c07ebc)

running 40 tests
test joins::hash_join::tests::join_inner_one_no_shared_column_names ... ok
test joins::hash_join::tests::join_inner_one::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_one::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left_randomly_ordered ... ok
test joins::hash_join::tests::join_inner_one_two_parts_left::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_one_randomly_ordered ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_two::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_4_2 ... ok
test joins::hash_join::tests::join_inner_one_two_parts_right::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_3_5 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_1_8192 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_2_10 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_5_1 ... ok
test joins::hash_join::tests::join_inner_with_filter::batch_size_4_2 ... ok
test joins::sort_merge_join::tests::join_inner_two ... ok
test joins::sort_merge_join::tests::join_inner_one ... ok
test joins::sort_merge_join::tests::join_inner_two_two ... ok
test joins::sort_merge_join::tests::join_inner_output_two_batches ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_2_10 ... ok
test joins::sort_merge_join::tests::join_inner_with_nulls_with_options ... ok
test joins::sort_merge_join::tests::join_inner_with_nulls ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_5_1 ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_4_2 ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_1_8192 ... ok
test joins::hash_join::tests::partitioned_join_inner_one::batch_size_3_5 ... ok
test joins::nested_loop_join::tests::join_inner_with_filter ... ok

test result: ok. 40 passed; 0 failed; 0 ignored; 0 measured; 688 filtered out; finished in 0.03s

@alamb alamb changed the title Minor: fix some new warnings Minor: fix some new warnings, fix force_hash_collisions flag propagation + comment out some tests Jul 18, 2024
@nrc
Copy link
Contributor Author

nrc commented Jul 19, 2024

@alamb thanks for reviewing and looking into this!

This has got weirder and I may be doing something fundamentally wrong, but here's my investigation. For reference, I'm running on Linux AMD64, plenty of RAM and disk space. All these results are after only removing the cfg from hash_join.rs.

Running cargo test --lib --features=force_hash_collisions -p datafusion-physical-plan -- join_inner passed all fine, however, cargo test --lib --features=force_hash_collisions -- join_inner produced multiple failures (no failures with the cfgs restored or without cargo test --features=force_hash_collisions -- join_inner). Running the failing command inside the datafusion/physical-plan directory also passes, but running in datafusion fails (passes with -p datafusion-physical-plan).

I have no idea what is going on, but unless you have an idea, I can try and find the cause next week.

@nrc
Copy link
Contributor Author

nrc commented Jul 19, 2024

The errors:

Screenshot 2024-07-19 at 5 19 46 PM

@alamb
Copy link
Contributor

alamb commented Jul 20, 2024

I have no idea what is going on, but unless you have an idea, I can try and find the cause next week.

I think there are two possibilities:

  1. The tests are non deterministic (as in they produce different output based on some architecture specific property -- like thread scheduling or hash values or soemething)
  2. There really is a bug when there are hash collisions

Given the tests pass for some machines I am inclined to believe it is the first, but I don't really know for sure without being able to see what the test failures are (can you provide some the output of the failed tests?)

@alamb
Copy link
Contributor

alamb commented Jul 25, 2024

It appears we ran out of time on this one and the new clippy lint was released on stable: #11651

@@ -152,4 +152,5 @@ rpath = false
large_futures = "warn"

[workspace.lints.rust]
unexpected_cfgs = { level = "warn", check-cfg = ["cfg(tarpaulin)"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this related to force_hash_collisions change, or something separate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it allows tarpaulin to be used as a feature even if the crate doesn't define it

@findepi
Copy link
Member

findepi commented Jul 25, 2024

I rebased this commit (conflicts) and included it with other fixes in #11654.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants